Grafana Cloud Engineer
Job Title: Grafana Cloud Engineer
Location: Dallas, TX (Hybrid)
Duration: 12 Months
Job Summary
We are seeking an experienced Grafana Cloud Engineer with strong expertise in designing, implementing, and optimizing enterprise observability solutions using Grafana Cloud. The ideal candidate will have hands-on experience with metrics, logs, traces, dashboards, alerting, and cloud-native monitoring architectures.
Key Responsibilities
Grafana Cloud Implementation & Administration
Design, deploy, and manage Grafana Cloud environments for enterprise monitoring solutions.
Configure and manage data sources such as Prometheus, Loki, Tempo, InfluxDB, Elasticsearch, and cloud monitoring tools.
Integrate monitoring platforms with AWS CloudWatch, Azure Monitor, and Google Cloud Operations Suite.
Implement secure authentication and authorization using SSO, OAuth, LDAP, or Azure AD (now Microsoft Entra ID).
Optimize Grafana architecture for scalability, performance, and cost efficiency.
Observability & Monitoring Architecture
Design and implement end-to-end observability stacks covering metrics, logs, and distributed tracing.
Develop monitoring strategies aligned with SRE and DevOps practices (SLIs, SLOs, SLAs).
Configure alerting with Grafana Alerting, Alertmanager, and Loki alerting rules, including escalation workflows.
Dashboarding & Visualization
Develop custom Grafana dashboards for application and infrastructure monitoring.
Translate business and technical monitoring requirements into actionable visualizations.
Standardize dashboard templates and monitoring frameworks across teams.
Integration & Automation
Integrate Grafana Cloud with Kubernetes clusters, CI/CD pipelines, and cloud platforms.
Automate observability deployments using Terraform, Helm, Ansible, or GitOps workflows.
Implement OpenTelemetry instrumentation for distributed tracing.
Troubleshooting & Optimization
Diagnose and resolve issues related to metrics, logs, traces, and data source configurations.
Optimize PromQL, LogQL, and SQL queries for performance and cost efficiency.
Ensure high availability and reliability of monitoring platforms.
Documentation & Knowledge Sharing
Develop architecture documentation, runbooks, and best practice guides.
Train internal teams on Grafana dashboards, alerting, and observability practices.
Act as a Subject Matter Expert (SME) for Grafana Cloud initiatives.
Required Qualifications
5+ years of experience implementing and managing Grafana or Grafana Cloud.
Strong hands-on experience with:
Grafana Mimir
Loki
Tempo
Prometheus
Alertmanager
Experience with PromQL, LogQL, and SQL query optimization.
Hands-on experience implementing OpenTelemetry instrumentation.
Experience with cloud platforms such as AWS, Azure, or GCP.
Strong knowledge of Kubernetes, microservices architecture, and cloud-native observability.
Experience with DevOps automation tools including Terraform, Helm, Git, and CI/CD pipelines.
Strong understanding of SRE principles, monitoring design, and performance engineering.
Excellent collaboration skills for working with cross-functional teams.
Preferred Qualifications
Grafana Cloud or Prometheus certifications.
Experience designing multi-tenant monitoring solutions.
Integration experience with ServiceNow, PagerDuty, Opsgenie, or Jira.
Knowledge of RBAC, security frameworks, and compliance standards.
Experience with other observability tools such as Splunk or New Relic.
Ability to work independently in fast-paced environments and to contribute hands-on to implementation work.