Senior Site Reliability Engineer

Posted at: 08/15/2025

Addison, TX

IT - Infrastructure / Network / Systems - Direct Placement - Job ID: 25-15626

Title: Senior Site Reliability Engineer
Location: Addison, TX (2 days a week onsite)
Duration: Full-time / Permanent
Compensation: $110,000 – $150,000 (there is no annual bonus for this role)
Work Authorization: Must be able to work in the United States without sponsorship (either now or in the future)
 
Job Responsibilities 

  • Lead incident response efforts and conduct blameless postmortems to identify root causes and drive systemic improvements.
  • Define, monitor, and report on service-level indicators (SLIs), objectives (SLOs), and agreements (SLAs).
  • Evolve the architecture to support future requirements based on SLIs, SLOs, and SLAs.
  • Identify and eliminate toil by automating repetitive operational tasks, thus increasing velocity and reliability.
  • Ensure management awareness of problems that are severe in nature or that are exceeding documented targets.
  • Ensure that all problems are resolved in a timely and efficient manner.
  • Own development of software to automate processes like analyzing logs, testing production environments, and responding to any issues.
  • Develop software tasks in accordance with standards and methodologies.
  • Possess deep knowledge of the entire technology stack.
  • Participate in capacity planning, performance analysis, and system tuning to ensure scalability and resilience.
  • Collaborate with development teams to ensure reliability is considered during design and implementation phases.
  • Mentor others to accelerate their career growth and encourage participation.
  • Provide technical mentoring to junior SREs.
  • Help build team spirit by assisting other staff members and promoting a positive workplace.
  • Challenge team processes, looking for ways to improve them.
  • Recognize potential areas where policies and procedures require change, or where new ones need to be developed, especially regarding future business expansion. Submit recommendations as appropriate.
  • Ensure all changes comply with change management policies and procedures.
  • Embody the philosophy of DevOps and Site Reliability Engineering by providing a prescriptive way of measuring and achieving reliability through engineering and operations work.
  • Monitor and report on any security violations involving unauthorized access to corporate data.
  • Review outstanding issues daily to ensure that troubleshooting and resolutions are current.
  • Collaborate cross-functionally with application engineering, QA, and infrastructure teams to ensure observability and reliability.
  • Perform tool evaluation and selection in support of observability and automation.

 
Qualifications

  • Education Level: Bachelor's Degree
  • Preferred experience includes AWS or Azure certifications.
  • 7+ years of total work experience in IT, software engineering, or infrastructure roles.
  • Minimum of 5 years of hands-on experience in Site Reliability Engineering, DevOps, or closely related roles.
  • At least 3 years of direct experience with AWS and/or Azure, including infrastructure provisioning, automation, and monitoring.
  • Experience with implementing, managing, and using observability tools, data visualization, and application monitoring platforms such as Dynatrace, AWS CloudWatch, Azure Monitor, Grafana, Prometheus, or Datadog.
  • Familiarity with error budgets and their role in balancing reliability and innovation.
  • Direct experience building, launching, configuring, and maintaining AWS and/or Microsoft Azure cloud resources.
    • Expertise preferred in implementing methodologies for Automation, Continuous Integration, Continuous Delivery, High Availability, High Scalability, Monitoring, Logging, Security, and Governance.
  • Experience with Terraform and a strong understanding of Infrastructure as Code (IaC) principles.
  • Strong scripting knowledge using languages such as PowerShell, Bash, Python, Groovy, etc.
  • Proficiency in at least one programming language preferred, e.g., Python, Java, or .NET.
  • Proficient in Git for version control and collaborative development.
  • Experience with GitLab or similar platforms for source code management and CI/CD.
  • Familiarity with Atlassian tools (Jira, Confluence) is a plus.
  • Proficient in administering Linux and/or Windows-based platforms.
  • Experience supporting production enterprise applications.
  • Strong understanding of complex multi-tiered environments and their integration with DevOps toolsets.
  • Experience in problem management, preventive maintenance, and analytical and conceptual problem solving.
  • Experience in business process improvement is also desired.

Job-Related Skills/Competencies

  • Excellent analytical and problem-solving skills.
  • Proven drive towards continual improvement.
  • Strong interpersonal and communication skills.
  • Strong analytical mindset for risk assessment and mitigation.
  • Ability to assess and mitigate risks to system reliability through proactive engineering.
  • Skilled in quantifying reliability metrics and communicating their impact to stakeholders.

 
About INSPYR Solutions
Technology is our focus and quality is our commitment. As a national expert in delivering flexible technology and talent solutions, we strategically align industry and technical expertise with our clients' business objectives and cultural needs. Our solutions are tailored to each client and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing on the human aspect of our business, we work seamlessly with our talent and clients to match the right solutions to the right opportunities. Learn more about us at inspyrsolutions.com.

INSPYR Solutions provides Equal Employment Opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, INSPYR Solutions complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities.
 
 
 
 
