Site Reliability Engineer
Dallas, TX
Contracted
Experienced
Title: Site Reliability Engineer (SRE)
Location: Dallas, TX
Must Have: Azure DevOps (YAML, ARM), Azure Kubernetes Service, Kubernetes (open source), Docker
JOB DUTIES
- Partner with the architecture and development teams on how to make applications highly available, reliable, and performant at global scale
- Collaborate with the architecture team to ensure reliability factors are accounted for in business features and enablers
- Guide development teams in understanding established service level objectives (SLOs) and their consequences, and in implementing appropriate service level indicators (SLIs) to support those objectives
- Collaborate with development team members to swarm on, troubleshoot, and resolve problems
- Guide ad-hoc teams to brainstorm solutions and build implementation plans based on the Root Cause Analysis of production issues
- Design and build automated solutions to optimize application/service/platform uptime with minimal human intervention
- Be available for an on-call rotation to participate in troubleshooting and communication efforts outside of normal business hours
- Implement and help create standards and best practices, and mentor other team members in order to drive adoption across development teams
- Perform other duties as assigned
- Conform with all company policies and procedures
JOB SPECIFICATION
Knowledge
- Expert in defining, implementing, and evaluating Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and their associated consequences
- Software development expertise in two or more high-level programming and scripting languages
- Experience in evolutionary database design, query performance analysis, and indexing as a cornerstone for delivering scalable, performant products and services
- Experience in designing, building, and optimizing automated pipelines with automated testing and automated security controls
- Experience in performing Root Cause Analysis and Problem Management
- Experience working in Agile Scrum teams with demonstrated success leading improvements (getting better/faster/happier)
Skills
- Help establish and maintain a culture of learning through the development and sharing of skills, knowledge, process and tools; combat traditional silos that create “us and them” environments
- A driving passion for finding solutions to hard problems at scale and operationalizing them
- Exceptional critical thinking and communication skills, with a passion for leveraging documentation as a tool for constant improvement
Additional Knowledge Skills and Abilities
- Pipeline Automation: Azure DevOps (YAML, ARM), Terraform, Jenkins, Chef, Octopus Deploy
- Code Scanning: SonarQube, Checkmarx
- Source code repositories: Git
- Containerization: Azure Kubernetes Service, Kubernetes (open source), Docker
- High-level programming languages: Java, C# (.NET MVC and .NET Core), Go
- Scripting: PowerShell, Bash
- Databases: Oracle, Microsoft SQL Server, NoSQL (e.g., Cosmos DB)
- Test Automation: Xamarin.UITest, SpecFlow, DevTest, Selenium, Test Data Manager, Postman, Maven, TestNG, JMeter
- Operating systems: Windows, Linux
- Cloud Platforms: Azure
- Metrics and Monitoring: Splunk