sherlock-ops

Your alerts deserve a detective, not a dashboard.

SherlockOps receives an alert, queries your monitoring, logs, and infrastructure, then tells you exactly what went wrong and how to fix it.

How it works

Three steps. No dashboards involved.

01

Alert comes in

Alertmanager fires a webhook. SherlockOps picks it up.

# alertmanager.yml
receivers:
  - name: 'sherlockops'
    webhook_configs:
      - url: 'http://sherlockops:8080/webhook/alertmanager'
        send_resolved: true
        http_config:
          http_headers:
            - name: X-Environment
              values: ["prod"]
            - name: X-Channel-Slack
              values: ["#alerts"]
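Before wiring up Alertmanager itself, you can exercise the webhook with a synthetic payload. A minimal sketch in Python, assuming the endpoint from the config above; the payload shape follows Alertmanager's webhook format (version 4), and the label values are invented for the example:

```python
import json
from datetime import datetime, timezone

def build_test_alert(alertname="HighCPU", severity="critical"):
    """Build a synthetic Alertmanager webhook payload (version 4 format)."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "version": "4",
        "status": "firing",
        "receiver": "sherlockops",
        "groupLabels": {"alertname": alertname},
        "commonLabels": {"alertname": alertname, "severity": severity},
        "commonAnnotations": {},
        "externalURL": "http://alertmanager:9093",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": alertname, "severity": severity,
                           "namespace": "prod", "pod": "payment-service-0"},
                "annotations": {"summary": "CPU usage above 90% for 5m"},
                "startsAt": now,
                "endsAt": "0001-01-01T00:00:00Z",
            }
        ],
    }

payload = build_test_alert()
print(json.dumps(payload, indent=2))
# Save as payload.json, then fire it at the webhook:
#   curl -X POST http://sherlockops:8080/webhook/alertmanager \
#        -H 'Content-Type: application/json' -d @payload.json
```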
02

SherlockOps investigates

The agent picks the right tools, queries your stack, and correlates the data. No runbooks needed.

prometheus ─── cpu_usage: 94.2%, mem: 87%
kubernetes ─── pod restarts: 3 in 10min
loki       ─── OOMKilled detected, exit code 137
git        ─── commit abc123 deployed 2h ago
└─ correlating 4 data sources...
✓ Root cause identified
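The correlation step above amounts to placing findings from each source on a single timeline so that cause precedes effect. A toy sketch (the event data and timestamps are invented for illustration, not SherlockOps internals):

```python
from datetime import datetime

# Hypothetical findings, one per data source (times and values invented)
findings = [
    ("loki",       datetime(2024, 5, 1, 11, 50), "OOMKilled, exit code 137"),
    ("prometheus", datetime(2024, 5, 1, 11, 40), "cpu_usage 94.2%, mem 87%"),
    ("kubernetes", datetime(2024, 5, 1, 11, 55), "3 pod restarts in 10min"),
    ("git",        datetime(2024, 5, 1, 10, 0),  "commit abc123 deployed"),
]

# Order events on one timeline; the earliest signal is the best root-cause candidate
timeline = sorted(findings, key=lambda f: f[1])
source, when, event = timeline[0]
print(f"earliest signal: {source}: {event}")
```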
03

You get the answer

A clear diagnosis lands in Slack, Telegram, or wherever you work. With the root cause, evidence, and next steps.

SherlockOps 2 min ago
CRITICAL — HighCPU on prod/payment-service

Diagnosis: Memory leak causing OOM kills. Pod restarted 3 times in 10 min. Triggered by commit abc123 deployed 2h ago.
Next steps: Scale memory 512Mi → 1Gi · Revert commit abc123 · Check for goroutine leak in /api/checkout

Integrations

What it connects to

Monitoring

───────────
  • Alertmanager
  • Grafana
  • Zabbix
  • Datadog
  • ELK / Loki

Cloud

───────────
  • AWS CloudWatch
  • GCP Monitoring
  • Azure Monitor
  • Yandex Cloud
  • DigitalOcean
  • VMware vSphere
  • Kubernetes

Databases

───────────
  • PostgreSQL
  • MongoDB
  • (any via MCP)

Messengers

───────────
  • Slack
  • Telegram
  • MS Teams

Why SherlockOps

Less noise. More answers.

Self-hosted

Your data stays yours. Single Docker container, zero SaaS dependencies. Runs in your VPC, behind your firewall.

Any LLM

Claude, GPT-4, Ollama, vLLM. Bring your own model. Switch providers without changing a line of code.

50+ tools

From Prometheus to MongoDB, your agent investigates with real data. Not summaries, not guesses — actual queries.
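An "actual query" here means hitting the data source's API directly; for Prometheus, that is its HTTP instant-query endpoint. A sketch of how such a request is formed (the Prometheus URL and the PromQL expression are illustrative assumptions, not SherlockOps internals):

```python
from urllib.parse import urlencode
from urllib.request import Request

PROMETHEUS_URL = "http://prometheus:9090"  # hypothetical in-cluster address

def instant_query(expr: str) -> Request:
    """Build a request against Prometheus's /api/v1/query endpoint."""
    qs = urlencode({"query": expr})
    return Request(f"{PROMETHEUS_URL}/api/v1/query?{qs}")

# Example: container CPU rate for the pod from the diagnosis above
req = instant_query(
    'rate(container_cpu_usage_seconds_total{pod=~"payment-service.*"}[5m])'
)
print(req.full_url)
```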

MCP native

Connect any MCP server. Add new data sources without writing code. The protocol does the plumbing.

# connect any MCP server in 3 lines of config

mcp:
  clients:
    - name: "k8s-cluster"
      url: "https://k8s-mcp.your-infra.com/mcp"

    - name: "argocd"
      url: "https://argo-mcp.your-infra.com/mcp"

    - name: "alertmanager"
      url: "https://am-mcp.your-infra.com/mcp"

    - name: "custom-tool"
      url: "http://internal-mcp:3000"
      auth: "bearer"
      token: "your-token"

Any MCP-compatible server becomes a tool for the AI agent. ArgoCD, Alertmanager, Vault, custom APIs — if it speaks MCP, SherlockOps can use it to investigate alerts. No code changes needed.
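Under the hood, MCP messages are JSON-RPC 2.0: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`. A minimal sketch of how those requests are serialized (the tool name and arguments are hypothetical, not from any specific server above):

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need unique ids

def jsonrpc_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request, as used by the MCP protocol."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Ask an MCP server which tools it exposes
discover = jsonrpc_request("tools/list")

# Then invoke one, e.g. a hypothetical pod-listing tool
call = jsonrpc_request(
    "tools/call", {"name": "list_pods", "arguments": {"namespace": "prod"}}
)
print(discover)
```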

Quick Start

Running in 2 minutes.

Clone and configure
$ git clone https://github.com/Duops/SherlockOps.git
$ cd sherlock-ops
$ cp .env.example .env   # add your LLM API key
$ docker compose up -d
Enable your tools in config.yaml
tools:
  prometheus:
    enabled: true
    url: "http://prometheus:9090"
  kubernetes:
    enabled: true
    kubeconfig: "/root/.kube/config"
  loki:
    enabled: true
    url: "http://loki:3100"
notifications:
  slack:
    webhook_url: "https://hooks.slack.com/..."
Point Alertmanager at SherlockOps and you're done.

Architecture

What's under the hood.

Alert sources ─── Alertmanager · Grafana · Zabbix · Datadog
      │
      ▼
SherlockOps API ─── webhook receiver + worker pool
      │
      ▼
LLM Agent ─── Claude · GPT-4 · Ollama · vLLM
      │
      ├─ Prometheus ── query metrics
      ├─ Kubernetes ── pods · logs · events
      ├─ Loki ──────── search logs
      ├─ Cloud APIs ── AWS · GCP · Azure · YC · DO
      ├─ Databases ─── PostgreSQL · MongoDB
      └─ MCP ───────── any MCP server
      │
      ▼
Diagnosis ─── root cause + actionable steps
      │
      ▼
Slack (thread reply) · Telegram (edit message) · Teams (update card)