For about a year, three open-source personal AI agents have been getting serious noise in the parts of the internet I pay attention to. OpenClaw from Peter Steinberger. Hermes-Agent from Nous Research. Feynman from Companion. All three are serious projects with real teams behind them. All three get treated as broadly the same category of thing (“personal AI agent”). And all three landed on my desk at roughly the moment I decided I was going to actually use one rather than admire them from a distance.
So I ran all three. Properly. Not a weekend each, not a demo walkthrough; enough weeks of each for the novelty to wear off and the real shape of each tool to surface.
The short version: I wanted OpenClaw to be the one. It wasn’t. The other two were.
The longer version is more interesting, because the three projects aren’t really competing. They’re three completely different answers to the question “what is a personal agent for?”, and you don’t really see the shape of the question until you’ve lived with each of the answers.
Before I get into any of them specifically, a note on framing. The reflex, when you see three projects that sound similar, is to ask “which is best?” It’s the LLM-leaderboard instinct. Rank them. Pick a winner. Move on.
It’s the wrong question. Once you’ve actually run OpenClaw, Hermes, and Feynman for a few weeks each, you stop believing that “personal AI agent” is a meaningful product category. It’s at least three categories, each represented here by one serious project, and none of the three is trying to do what the other two are trying to do. A ranking that put them on the same axis would be the intellectual equivalent of ranking a chat app, an operating system, and a research assistant on “productivity”.
So this isn’t a leaderboard. It’s three separate stories about three separate tools that happen to share a surface description. The interesting questions are which one fit my life, and what the shape of the mismatch tells you about the rest of the category.
I should say this up front, because what follows won’t read like praise. I have a lot of respect for Pete Steinberger, I like OpenClaw’s ambition, and “we give your AI claws” is a better tagline than 90% of commercial products get out of their marketing departments. If I could have made OpenClaw stick, I would have.
I couldn’t.
The honest version of my experience is that OpenClaw was in near-constant flux while I was using it. Features would land, break something adjacent, get patched, and a week later a different bit would regress. The project is moving fast, which is a good sign for a young open-source tool, but at any given moment I had a decent chance of spending more time debugging OpenClaw than being productive with it. That ratio is a killer for something whose job is supposed to be giving you time back.
The surface area doesn’t help. OpenClaw’s big pitch is channel breadth: WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Matrix, and roughly eighteen others. Each of those is its own ecosystem with its own authentication quirks, its own API eccentricities, and its own ways of breaking. Running an assistant that’s reachable through all of them is an impressive feat for one project. Keeping that assistant reliable across all of them is a different problem, and for me, for now, it wasn’t solved.
The one place OpenClaw still has a seat in my life is as the runtime my snoopd MCP server talks to. Inside that narrow scope it’s steady, partly because snoopd does exactly one boring thing and the OpenClaw runtime doesn’t need to hold much state to make it work. For anything wider, the fragility started costing me more time than the tool saved.
None of this is a takedown. A younger me, or a more patient one, would have filed bugs and stuck with it. But the bar I was measuring against isn’t “is this a credible project?”. It’s “does this give me time back?”, and at the moment, the honest answer is no.
What struck me most about Hermes, and I know this isn’t a flashy thing to say, is that it just works. The config is solid. The mental model of what the agent does and how you extend it is simple enough to hold in your head after an afternoon. Nothing has mysteriously fallen over in the weeks I’ve been running it.
“It just works” is a boring sentence, and it’s the sentence that matters. In a category where most projects are fighting to out-demo each other, a tool that boringly does what the docs say it does is quietly doing something much harder than it looks.
I’ve built a handful of small skills on Hermes, the kind of personal-scope stuff that’s a pain to wire up for a one-off but becomes genuinely useful once the agent remembers the skill is there. Its persistent memory layer does the actual thing: context accumulates across sessions in a way that makes Hermes feel like it has a history with me, rather than waking up as a different amnesiac every conversation.
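To make "context accumulates across sessions" concrete, here is a minimal sketch of the general idea in Python. It is not Hermes's actual memory layer, which I haven't described here; the `SessionMemory` class, its methods, and the JSON file layout are all hypothetical names chosen for illustration.

```python
import json
from pathlib import Path


class SessionMemory:
    """Tiny key-value memory persisted to a JSON file (illustrative only)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load whatever an earlier session wrote; start empty otherwise.
        self.facts = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        # Write through on every update so a crash loses nothing.
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts, indent=2))

    def recall(self, key: str, default: str = "") -> str:
        return self.facts.get(key, default)
```

A second `SessionMemory` opened on the same file sees what the first one stored, which is the whole trick: the agent's "history" is just state that outlives the conversation.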
It’s also the agent I reach for when I want quick-turnaround research: the kind where I want a reasonable answer and a couple of sources to check, and I’m not about to publish anything off the back of it. The line between what I send to Hermes and what I send to the third tool is the most interesting split in my stack.
Feynman is the first of these three projects that made me think “oh, this is what the role division in these systems is actually for.”
Most AI agents are one thing wearing different hats. You ask them to research, to draft, to check their work, and it’s all the same machine being polite about context-switching. Feynman makes the roles real. Researcher agents gather evidence in parallel. A Writer agent turns that evidence into a draft. A Reviewer agent critiques it. A Verifier agent checks the citations and tells you when a URL is dead or a source doesn’t actually support the claim being made. Four roles, four different jobs, four different prompts, four different concerns.
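The shape of that four-role split can be sketched in a few lines of Python. This is an assumption-heavy toy, not Feynman's code or API: `researcher`, `writer`, `reviewer`, `verifier`, `FAKE_SOURCES`, and `LIVE_URLS` are all hypothetical stand-ins, and a real system would call models and fetch URLs where this one looks things up in dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Evidence:
    claim: str
    url: str


# Hypothetical canned sources; a real researcher would query a model or search API.
FAKE_SOURCES = {
    "memory": Evidence("Agents can persist context across sessions.", "https://example.com/memory"),
    "channels": Evidence("Agents can be reached over chat channels.", "https://example.com/dead-link"),
}

# Hypothetical set of URLs that resolve; a real verifier would fetch each one.
LIVE_URLS = {"https://example.com/memory"}


def researcher(topic: str) -> Evidence:
    return FAKE_SOURCES[topic]


def writer(evidence: list[Evidence]) -> str:
    # Turn each piece of evidence into a sentence with a numbered citation marker.
    return " ".join(f"{e.claim} [{i}]" for i, e in enumerate(evidence, start=1))


def reviewer(draft: str, evidence: list[Evidence]) -> list[str]:
    # Critique the draft: flag any evidence the writer failed to cite.
    return [
        f"missing citation [{i}]"
        for i in range(1, len(evidence) + 1)
        if f"[{i}]" not in draft
    ]


def verifier(evidence: list[Evidence]) -> list[str]:
    # Report cited URLs that do not resolve.
    return [e.url for e in evidence if e.url not in LIVE_URLS]


def run_pipeline(topics: list[str]) -> dict:
    # Researchers run in parallel; the later roles run in sequence on their output.
    with ThreadPoolExecutor() as pool:
        evidence = list(pool.map(researcher, topics))
    draft = writer(evidence)
    return {
        "draft": draft,
        "review_notes": reviewer(draft, evidence),
        "dead_urls": verifier(evidence),
    }
```

Calling `run_pipeline(["memory", "channels"])` returns a draft plus two separate reports. The point of the structure is that the reviewer and verifier never share a function with the writer, so neither has any incentive to wave the draft through.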
The result is that Feynman produces things I actually want to keep. Not notes I have to rewrite. Not drafts I have to fact-check from scratch. Actual research output, with inline citations, where the citations have been validated and the reasoning has been reviewed by a separate agent whose job was to disagree with the first one.
The post you are reading right now is a small example of why I rate it. When I sat down to think through this three-agent comparison, the research pass (what do these projects say they do, what does their architecture look like, where do they overlap and diverge) was done by Feynman. It produced a source-grounded, paper-shaped document that I could sit with, disagree with, and use as a starting point for my own take. Having a tool that produces that kind of artefact on demand feels like having a quiet, patient professor on retainer.
It’s the piece of this stack I’d have been most embarrassed to miss.
Step back from the individual reviews and it becomes obvious that these three projects aren’t building the same thing.
For OpenClaw, a personal agent is a presence across every messaging surface you already use. For Hermes, it’s a durable runtime that accumulates context, skills, and scheduled work over time. For Feynman, it’s a research pipeline with evidence discipline and role separation. All three are defensible answers. None of them is a subset of the others.
The fact that three competent teams reached three such different conclusions is, I think, the most interesting finding you can pull out of this exercise. “Personal agent” isn’t a product category yet. It’s a word that people are currently using for at least three distinct products, each with its own design center, its own failure modes, and its own right answer about what matters most.
That also means the leaderboard framing is worse than wrong. It actively obscures the thing you most need to notice as a user, which is “what shape of personal agent does your life actually want?” The answer is rarely obvious upfront. In my case I had to run all three before I knew that my life wanted a durable runtime and a research pipeline more than it wanted a messaging-native always-on presence. Yours might be different.
The two tools I kept are Hermes and Feynman. Hermes is the daily driver: the one I talk to, the one that remembers my preferences, the one I hang small skills off. Feynman is the research companion: the one I turn to when a thing needs investigating properly and writing up with sources somebody else can check.
OpenClaw is the project I wanted the most and still have the most affection for. I’m not ruling out bringing it back once the instability settles. It’s also still doing a real job for me, just a narrow one: it’s the runtime snoopd talks to, and inside that tight scope it’s steady. “I love the ambition, I don’t trust it with my week yet” is an unflattering sentence. It’s also the one that matches my experience, and one of the rules of writing anything honest is that you don’t swap an unflattering true sentence for a flattering vaguer one.
I wanted OpenClaw to be the one. The other two were. And the most useful thing I got out of running all three wasn’t the answer to “which is best”. It was the slow realisation that “which is best” was never the question I was going to be able to answer, because I wasn’t asking it about the same kind of thing three times.