sherlock-ops

Your alerts deserve a detective, not a dashboard.

SherlockOps receives an alert, queries your monitoring, logs, and infrastructure, then tells you exactly what went wrong and how to fix it.

How it works

Three steps. No dashboards involved.

01

Alert comes in

Alertmanager fires a webhook. SherlockOps picks it up.

# alertmanager.yml
receivers:
  - name: 'sherlockops'
    webhook_configs:
      - url: 'http://sherlockops:8080/webhook/alertmanager'
        send_resolved: true
        http_config:
          http_headers:
            - name: X-Environment
              values: ["prod"]
            - name: X-Channel-Slack
              values: ["#alerts"]
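Before wiring up Alertmanager itself, you can exercise the webhook with a synthetic payload. A minimal sketch in Python, assuming the endpoint from the config above; the payload shape follows Alertmanager's webhook format (version 4), and the label values are invented for the example:

```python
import json
from datetime import datetime, timezone

def build_test_alert(alertname="HighCPU", severity="critical"):
    """Build a synthetic Alertmanager webhook payload (version 4 format)."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "version": "4",
        "status": "firing",
        "receiver": "sherlockops",
        "groupLabels": {"alertname": alertname},
        "commonLabels": {"alertname": alertname, "severity": severity},
        "commonAnnotations": {},
        "externalURL": "http://alertmanager:9093",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": alertname, "severity": severity,
                           "namespace": "prod", "pod": "payment-service-0"},
                "annotations": {"summary": "CPU usage above 90% for 5m"},
                "startsAt": now,
                "endsAt": "0001-01-01T00:00:00Z",
            }
        ],
    }

payload = build_test_alert()
print(json.dumps(payload, indent=2))
# Save as payload.json, then fire it at the webhook:
#   curl -X POST http://sherlockops:8080/webhook/alertmanager \
#        -H 'Content-Type: application/json' -d @payload.json
```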
02

SherlockOps investigates

The agent picks the right tools, queries your stack, and correlates the data. No runbooks needed.

prometheus ─── cpu_usage: 94.2%, mem: 87%
kubernetes ─── pod restarts: 3 in 10min
loki       ─── OOMKilled detected, exit code 137
git        ─── commit abc123 deployed 2h ago
└─ correlating 4 data sources...
✓ Root cause identified
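The correlation step above amounts to placing findings from each source on a single timeline so that cause precedes effect. A toy sketch (the event data and timestamps are invented for illustration, not SherlockOps internals):

```python
from datetime import datetime

# Hypothetical findings, one per data source (times and values invented)
findings = [
    ("loki",       datetime(2024, 5, 1, 11, 50), "OOMKilled, exit code 137"),
    ("prometheus", datetime(2024, 5, 1, 11, 40), "cpu_usage 94.2%, mem 87%"),
    ("kubernetes", datetime(2024, 5, 1, 11, 55), "3 pod restarts in 10min"),
    ("git",        datetime(2024, 5, 1, 10, 0),  "commit abc123 deployed"),
]

# Order events on one timeline; the earliest signal is the best root-cause candidate
timeline = sorted(findings, key=lambda f: f[1])
source, when, event = timeline[0]
print(f"earliest signal: {source}: {event}")
```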
03

You get the answer

A clear diagnosis lands in Slack, Telegram, or wherever you work. With the root cause, evidence, and next steps.

SherlockOps 2 min ago
CRITICAL — HighCPU on prod/payment-service

Diagnosis: Memory leak causing OOM kills. Pod restarted 3 times in 10 min. Triggered by commit abc123 deployed 2h ago.
Next steps: Scale memory 512Mi → 1Gi · Revert commit abc123 · Check for goroutine leak in /api/checkout

Integrations

What it connects to

Monitoring

───────────
  • Alertmanager
  • Grafana
  • Zabbix
  • Datadog
  • ELK / Loki

Cloud

───────────
  • AWS CloudWatch
  • GCP Monitoring
  • Azure Monitor
  • Yandex Cloud
  • DigitalOcean
  • VMware vSphere
  • Kubernetes

Databases

───────────
  • PostgreSQL
  • MongoDB
  • (any via MCP)

Messengers

───────────
  • Slack
  • Telegram
  • MS Teams

Why SherlockOps

Less noise. More answers.

Self-hosted

Your data stays yours. Single Docker container, zero SaaS dependencies. Runs in your VPC, behind your firewall.

Any LLM

Claude, GPT-4, Ollama, vLLM. Bring your own model. Switch providers without changing a line of code.

50+ tools

From Prometheus to MongoDB, your agent investigates with real data. Not summaries, not guesses — actual queries.
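An "actual query" here means hitting the data source's API directly; for Prometheus, that is its HTTP instant-query endpoint. A sketch of how such a request is formed (the Prometheus URL and the PromQL expression are illustrative assumptions, not SherlockOps internals):

```python
from urllib.parse import urlencode
from urllib.request import Request

PROMETHEUS_URL = "http://prometheus:9090"  # hypothetical in-cluster address

def instant_query(expr: str) -> Request:
    """Build a request against Prometheus's /api/v1/query endpoint."""
    qs = urlencode({"query": expr})
    return Request(f"{PROMETHEUS_URL}/api/v1/query?{qs}")

# Example: container CPU rate for the pod from the diagnosis above
req = instant_query(
    'rate(container_cpu_usage_seconds_total{pod=~"payment-service.*"}[5m])'
)
print(req.full_url)
```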

MCP native

Connect any MCP server. Add new data sources without writing code. The protocol does the plumbing.

# connect any MCP server in 3 lines of config

mcp:
  clients:
    - name: "k8s-cluster"
      url: "https://k8s-mcp.your-infra.com/mcp"

    - name: "argocd"
      url: "https://argo-mcp.your-infra.com/mcp"

    - name: "alertmanager"
      url: "https://am-mcp.your-infra.com/mcp"

    - name: "custom-tool"
      url: "http://internal-mcp:3000"
      auth: "bearer"
      token: "your-token"

Any MCP-compatible server becomes a tool for the AI agent. ArgoCD, Alertmanager, Vault, custom APIs — if it speaks MCP, SherlockOps can use it to investigate alerts. No code changes needed.
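Under the hood, MCP messages are JSON-RPC 2.0: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`. A minimal sketch of how those requests are serialized (the tool name and arguments are hypothetical, not from any specific server above):

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need unique ids

def jsonrpc_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request, as used by the MCP protocol."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Ask an MCP server which tools it exposes
discover = jsonrpc_request("tools/list")

# Then invoke one, e.g. a hypothetical pod-listing tool
call = jsonrpc_request(
    "tools/call", {"name": "list_pods", "arguments": {"namespace": "prod"}}
)
print(discover)
```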

Quick Start

Running in 2 minutes.

Clone and configure
$ git clone https://github.com/Duops/SherlockOps.git
$ cd sherlock-ops
$ cp .env.example .env   # add your LLM API key
$ docker compose up -d
Enable your tools in config.yaml
tools:
  prometheus:
    enabled: true
    url: "http://prometheus:9090"
  kubernetes:
    enabled: true
    kubeconfig: "/root/.kube/config"
  loki:
    enabled: true
    url: "http://loki:3100"
notifications:
  slack:
    webhook_url: "https://hooks.slack.com/..."
Point Alertmanager at SherlockOps and you're done.

Architecture

What's under the hood.

Alert sources ─── Alertmanager · Grafana · Zabbix · Datadog
      │
      ▼
SherlockOps API ─── webhook receiver + worker pool
      │
      ▼
LLM Agent ─── Claude · GPT-4 · Ollama · vLLM
      │
      ├─ Prometheus ── query metrics
      ├─ Kubernetes ── pods · logs · events
      ├─ Loki ──────── search logs
      ├─ Cloud APIs ── AWS · GCP · Azure · YC · DO
      ├─ Databases ─── PostgreSQL · MongoDB
      └─ MCP ───────── any MCP server
      │
      ▼
Diagnosis ─── root cause + actionable steps
      │
      ▼
Slack (thread reply) · Telegram (edit message) · Teams (update card)