I Didn't Want to Give My AI Agent SSH

I’ve been looking for a genuinely good use case for a personal AI agent for about a year. Most of what gets suggested (let it write your emails, let it book your meetings, let it triage your inbox) I find faintly depressing. Not bad, exactly. Just lame. Not the kind of thing that pays for the complexity.

Infrastructure monitoring turned out to be the one that landed for me.

I’d started playing with OpenClaw, Peter Steinberger’s personal agent, specifically because the use case I wanted was a patient, curious thing that pokes at my servers when I can’t be bothered. I have about a dozen Linux and macOS boxes scattered across my life: a mini PC under the stairs, a colo server, a couple of cloud VMs, a Mac mini doing media duty. Over the years I’ve bolted every variety of monitoring stack to them. Grafana. Prometheus. Telegraf. Once, briefly, Nagios. None of it ever got looked at.

What I actually do, when I want to know what’s going on, is SSH in and run top. Then df -h. Then docker ps. A thirty-year ritual. Faster than any dashboard I set up and forgot about.

The agent could do that. The question was how.

The obvious ways are the wrong ways

The two patterns most commonly suggested for wiring an AI agent into infrastructure both made me uncomfortable in ways I couldn’t shake.

Option one: let the agent gateway SSH directly into your boxes. Hand it your key, let it run arbitrary commands, trust it not to do anything stupid. As someone who’s spent three decades in security, this is close to my worst-case threat model. You now have an LLM with shell access to everything, and the blast radius of a prompt injection or a misbehaving tool call is… all of it. I can’t think of a threat modelling exercise that ends well with “and then we gave the language model sudo.”

Option two: run agent nodes on every host. OpenClaw, like most agent frameworks, can be deployed directly on the machines you want it to manage. That turns every server into an attack surface running a non-trivial codebase with reasoning capabilities and, almost certainly, some network exposure. On my list of things I want to multiply across a dozen hosts, AI agent runtime is somewhere below clickable email attachments.

Neither passed the smell test. What I wanted was something narrower and more boring. I didn’t want the agent to have SSH. I wanted it to have a small, curated set of specific questions it was allowed to ask.

snoopd: the middleman that does the boring bit

snoopd is an MCP server. It exposes about a dozen read-only tools (system health, disk usage, memory, Docker containers, listening ports, cron jobs, kernel errors), and each tool does exactly one thing: opens an SSH connection from my machine using my key, runs a curated set of commands, parses the output, disconnects.

The agent never touches SSH. It never sees my keys. It can’t run arbitrary commands. It can ask for memory usage on um790-01 and get some JSON back. That’s all.
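To make that shape concrete, here is a minimal sketch of what one such tool might look like. It is not snoopd's actual code: the names Runner, DiskUsage, parseDf and getDiskUsage are hypothetical, the SSH session is stood in for by an injected function, and the parser assumes the six-column layout that df -h prints on Linux (macOS prints extra inode columns and would need its own parser).

```typescript
// Sketch only: Runner, DiskUsage, parseDf and getDiskUsage are hypothetical
// names for illustration, not snoopd's real API.

// Runs one fixed command on one host and resolves with its stdout.
// A real implementation would open an SSH session here and close it afterwards.
type Runner = (host: string, command: string) => Promise<string>;

interface DiskUsage {
  filesystem: string;
  size: string;
  used: string;
  available: string;
  usePercent: number;
  mountedOn: string;
}

// The command is a constant. Nothing the agent sends ends up in it.
const DF_COMMAND = "df -h";

// Assumes Linux-style output:
// Filesystem  Size  Used  Avail  Use%  Mounted on
function parseDf(stdout: string): DiskUsage[] {
  return stdout
    .trim()
    .split("\n")
    .slice(1) // drop the header row
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => cols.length >= 6)
    .map((cols) => ({
      filesystem: cols[0],
      size: cols[1],
      used: cols[2],
      available: cols[3],
      usePercent: parseInt(cols[4], 10), // "42%" parses as 42
      mountedOn: cols.slice(5).join(" "), // mount points may contain spaces
    }));
}

// Connect, run the fixed command, parse, hand back plain data.
async function getDiskUsage(host: string, run: Runner): Promise<DiskUsage[]> {
  const stdout = await run(host, DF_COMMAND);
  return parseDf(stdout);
}
```

The only thing the caller chooses is the host. The structured result is what would be serialised to JSON and returned to the agent.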

Three design choices made it work.

The commands are boring. Every command snoopd runs is something I’d type by hand. cat /proc/meminfo. df -h. ps aux --sort=-%cpu. That’s deliberate. When the agent tells me load average is 4.2 and three processes are pegged, I want to be able to reproduce that without trusting any clever parsing or abstraction in my own tool. Boring, auditable commands against boring, auditable files.
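As an illustration of how little parsing a command like cat /proc/meminfo needs, here is a hedged sketch. MemoryReport and parseMeminfo are made-up names, not snoopd's, and the choice to report used memory as MemTotal minus MemAvailable is an assumption for the example. It applies to Linux only, since macOS has no /proc.

```typescript
// Sketch only: MemoryReport and parseMeminfo are hypothetical names.
interface MemoryReport {
  totalKb: number;
  availableKb: number;
  usedPercent: number; // rounded to one decimal place
}

// Parses the text of /proc/meminfo, whose lines look like
// "MemTotal:       16314520 kB".
function parseMeminfo(text: string): MemoryReport {
  const fields = new Map<string, number>();
  for (const line of text.split("\n")) {
    const match = /^(\w+):\s+(\d+)/.exec(line);
    if (match) fields.set(match[1], Number(match[2]));
  }

  const totalKb = fields.get("MemTotal");
  const availableKb = fields.get("MemAvailable");
  if (totalKb === undefined || availableKb === undefined) {
    throw new Error("MemTotal or MemAvailable missing from meminfo output");
  }

  // Assumption for this example: "used" means total minus available.
  const usedPercent =
    Math.round(((totalKb - availableKb) / totalKb) * 1000) / 10;
  return { totalKb, availableKb, usedPercent };
}
```

Every number in the report can be checked by running the same cat by hand and reading two lines of output.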

The client is the smart bit. snoopd itself does almost nothing. It wraps SSH, parses text, returns structured data. The intelligence (is this healthy, should I worry, what does this correlate with) lives in the agent, not in snoopd. Earlier iterations got this wrong. I kept trying to put alerting logic in the tool. It was always the wrong place.

The scope is enforced, not negotiated. Every snoopd tool runs a tight, pre-defined set of commands: df, ps, cat /proc/meminfo, ip addr. None of them accepts a shell command as input. None of them passes agent-supplied strings into exec. The agent can’t ask snoopd to “run this for me.” It can only invoke one of a dozen specific questions I’ve chosen to expose, each one tied to a narrow, read-only role. That enforcement lives in the MCP translation layer, in code I control. The read-only guarantee doesn’t depend on the agent behaving itself; it depends on me not writing a tool that mutates anything. Much easier to defend in a code review than “the model is supposed to be careful.”
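One way this kind of enforcement could be written is sketched below. It is an assumption-laden illustration rather than snoopd's real translation layer: resolveCall, Plan, the get_memory and get_disk_usage tool names, and the mapping from tool names to commands are all hypothetical. Only list_processes and the host um790-01 come from this post.

```typescript
// Sketch only: resolveCall, Plan and most tool names are hypothetical.

// Hosts the server is willing to talk to at all.
const ALLOWED_HOSTS = new Set<string>(["um790-01"]);

// Tool name -> the fixed commands it runs. No entry takes parameters.
// A Map avoids accidental hits on inherited object keys like "constructor".
const TOOL_COMMANDS = new Map<string, readonly string[]>([
  ["get_memory", ["cat /proc/meminfo"]],
  ["get_disk_usage", ["df -h"]],
  ["list_processes", ["ps aux --sort=-%cpu"]],
]);

interface Plan {
  host: string;
  commands: readonly string[];
}

// Turns an untrusted tool call into a plan built only from constants.
// Agent input is used to look things up, never to build a command string.
function resolveCall(input: unknown): Plan {
  if (typeof input !== "object" || input === null) {
    throw new Error("tool call must be an object");
  }
  const { tool, host, ...rest } = input as Record<string, unknown>;

  const extras = Object.keys(rest);
  if (extras.length > 0) {
    throw new Error("unexpected arguments: " + extras.join(", "));
  }
  if (typeof tool !== "string") throw new Error("tool must be a string");
  const commands = TOOL_COMMANDS.get(tool);
  if (commands === undefined) throw new Error("unknown tool: " + tool);

  if (typeof host !== "string" || !ALLOWED_HOSTS.has(host)) {
    throw new Error("unknown host");
  }
  return { host, commands };
}
```

A call such as { tool: "list_processes", host: "um790-01" } resolves to a fixed command list. A host like "um790-01; rm -rf /" or an extra command argument is rejected before anything reaches SSH.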

The result is a small TypeScript project that sits between the agent and my infrastructure, gives up the power of “run anything,” and keeps the thing I actually wanted: something patient and curious that’s willing to look around when I ask it to.

A live one

Here’s one from last week. My scheduled health sweep (snoopd runs a check across all my boxes every three hours and pipes the results into OpenClaw) flagged high CPU on the Mac mini. Which is, with some comedic timing, the box OpenClaw itself runs on. The agent noticing that its own host was struggling was a mildly funny moment.

I asked it to take a look. It called get_system_health, saw load was elevated, then called list_processes to pull the top CPU consumers. It came back with a diagnosis: the top two processes were macOS’s dynamic wallpaper engine, the thing that subtly shifts your desktop image to match the time of day.

The Mac mini is headless. There is no desktop. Nobody has ever seen that wallpaper. I turned it off. Load dropped. Happy days.

That’s the whole shape of it. A warning fires, the agent reaches for the right tool, the agent reasons about what it finds, a human makes the call. I didn’t open a terminal. I didn’t hand over my SSH key. snoopd gave the agent a narrow way to look; OpenClaw brought the reasoning.

What it’s actually about

This isn’t really a story about monitoring. It’s a story about scope.

The reason I couldn’t stomach the two default patterns wasn’t that I don’t trust AI agents. It’s that I don’t trust unbounded AI agents, and I especially don’t trust giving them SSH. Handing an agent a narrow, curated set of read-only tools (that I can audit, that speak a vocabulary I chose) changes what the risk analysis looks like. The attack surface is a dozen tools that run df -h and friends. That, I can reason about.

If you’re looking at personal agents and your first instinct is “but the security implications though,” good. That instinct is correct. The answer isn’t to avoid the use case. It’s to stop giving the agent the whole key ring and start giving it the specific questions it’s allowed to ask.

Peter Steinberger describes OpenClaw’s ethos as “we give your AI claws.” That’s the part I love about the project: tools, not waffle. Action, not chat. What I’ve tried to do with snoopd is hand over the claws carefully. The agent can reach. It can prod. It can poke. But only at the things I’ve chosen to expose, and only in the ways I’ve decided are safe to expose them.

Claws, yes. But clipped.

Glass half full.