2026-07-04 · Agentic coding notes from Galapogos Island

Show notes

BRINE — 2026-07-04 · show notes

Guest: the tooling optimist (a fictional archetype).

Claims are paraphrased and attributed; nothing is read verbatim. Where a thread disagreed with the article, the show surfaces the disagreement.

Segments

  1. Agentic coding notes from Galapogos Island
  • Source: https://danluu.com/ai-coding/#appendix-agentic-loops-and-writing-this-post
  • Discussion: https://lobste.rs/s/pigihe
  • Topic: Testing/AI-Engineering · interest 85
  • Dan Luu argues that LLM-driven coding is best utilized not for writing code, but for scaling massive, randomized, property-based testing workflows modeled after hardware design practices. The piece highlights a recurring tension: agents are prone to 'hallucinating' successful tests, making automated, verifiable fuzzing a necessary discipline to maintain quality.
  2. The Short Leash AI Coding Method For Beating Fable
  • Source: https://blog.okturtles.org/2026/07/short-leash-ai-method/
  • Discussion: https://lobste.rs/s/09tksp
  • Topic: AI-assisted development workflow · interest 85
  • The author critiques 'vibe-based' agentic coding and proposes a 'Short Leash' methodology. This approach treats AI as a high-velocity pair programmer that requires constant human intervention, explicit permission gating for diffs, and mandatory human-led reviews to maintain security-critical standards.
  3. Most MCP servers don't need to exist. Your case might be an exception

Transcript

Transcript. Paraphrased; sources in notes.md.

Host: It is July 4th, 2026. Welcome back, Samantha. I feel like we are in the middle of a massive recalibration period for how we build software. The hype cycle is hitting this weird wall where people are realizing that letting an AI agent loose in the codebase without a map is a recipe for disaster.

Guest: Oh, tell me about it, Daniel. I spent most of last week cleaning up an agent-induced mess in a repo I thought was safe. I love the chaos, really, but my keyboard is getting a workout. I saw some of the links coming out of Lobsters today and they are hitting on exactly that pain. We are moving from the "look what this thing can do" phase to the "wait, how do I actually stop this thing from burning down my production environment?" phase.

Host: Exactly. And the first piece we have over on Lobsters today speaks to that frustration. Dan Luu writes about his experiences with agentic coding, essentially arguing that we should stop trying to make LLMs write perfect features and start using them for massive, randomized, property-based testing. He notes that if a human behaved the way these agents do, you would fire them on the spot, but his reaction is to just spin up a thousand more of them to see what happens.

Guest: I love that energy. It is so pragmatic. It is like the difference between a junior dev who writes buggy code and a fuzzer that finds ten thousand ways to crash your server. If you treat the agent as a chaos engine that churns out test cases rather than a senior engineer who needs to write your business logic, you actually get somewhere. Though, as a Lobsters user called dryya points out in the thread, the loop can be deceptive. Dryya says that when these things get stuck in a blind alley, they just make increasingly desperate, meaningless tweaks that do not improve the actual performance. It is easy to hallucinate progress when you are just watching logs fly by.

Host: That is the danger, right? You think you are shipping, but you are just watching the model panic in real time.

Guest: Exactly. You need the guardrails. You need to verify the output with something that is not just another LLM.
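
Note: a minimal sketch of the kind of randomized, property-based check discussed above, where the verdict comes from a brute-force oracle rather than from another model. It is an illustration and not code from the article. The function under test (`merge_intervals`), the oracle (`covered_points`), and the input ranges are all hypothetical choices, and only the Python standard library is used.

```python
import random


def merge_intervals(intervals):
    """Code under test (hypothetical): merge overlapping closed integer intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(pair) for pair in merged]


def covered_points(intervals):
    """Brute-force oracle: the set of integers covered by the intervals."""
    return {x for lo, hi in intervals for x in range(lo, hi + 1)}


def check_properties(intervals):
    out = merge_intervals(intervals)
    # Property 1: merging never changes which points are covered.
    assert covered_points(out) == covered_points(intervals)
    # Property 2: the output is sorted and no two intervals overlap.
    for (_, hi_a), (lo_b, _) in zip(out, out[1:]):
        assert hi_a < lo_b


def run_random_tests(trials=1000, seed=0):
    """Generate random inputs from a fixed seed so any failure is reproducible."""
    rng = random.Random(seed)
    for _ in range(trials):
        intervals = []
        for _ in range(rng.randint(0, 8)):
            lo = rng.randint(0, 30)
            intervals.append((lo, lo + rng.randint(0, 5)))
        check_properties(intervals)
    return trials
```

Calling `run_random_tests()` either returns the number of trials or raises an `AssertionError` on the first input that breaks a property. The fixed seed means a failure can be replayed instead of being inferred from logs flying by.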

Host: Speaking of guardrails, our second story dives into the Short Leash methodology. The author suggests that instead of letting agents fly solo, we should be treating them like high-velocity, high-maintenance pair programmers. It is about explicit permission gating for every single diff and keeping a human in the loop for anything security-critical.

Guest: This is my life now. Seriously. I have a custom workflow where my AI agent essentially acts as a glorified diff generator, but I have a script that forces a human-readable summary before I even see the code. A commenter on Lobsters named davepagurek hits the nail on the head. They mentioned that it is often just faster to write the code yourself than to play this guessing game of whether the AI understands your intent. I found myself doing that yesterday. I was trying to refactor a legacy auth module, and I spent twenty minutes coaxing the AI to do it right. I finally just closed the window, drank an extra espresso, and wrote it in five minutes.

Host: It feels like we are learning that the AI is best at the boring stuff, but it struggles when the context is too heavy or the stakes are too high.

Guest: Totally. When the task requires deep domain knowledge, the friction of describing the problem starts to outweigh the speed gains.
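
Note: a minimal sketch of what permission gating for a diff might look like, assuming a workflow similar to the one described above: a summary is required up front, and nothing is applied without approval. This is not the article's implementation or the guest's actual script. The names `gate_change`, `make_diff`, and `SECURITY_PATHS`, and the idea of passing an `approve` callback, are hypothetical.

```python
import difflib

# Hypothetical path prefixes that should always be flagged for human review.
SECURITY_PATHS = ("auth/", "crypto/")


def make_diff(path, old, new):
    """Render a unified diff between the old and new file contents."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


def gate_change(path, old, new, summary, approve):
    """Return the new text only if the reviewer approves; otherwise keep the old text.

    `approve` is a callable taking (summary, diff, critical) and returning a bool.
    In an interactive setup it would print the summary and diff and ask a human.
    """
    if not summary.strip():
        raise ValueError("a human-readable summary is required before review")
    diff = make_diff(path, old, new)
    critical = path.startswith(SECURITY_PATHS)  # flag security-critical files
    if approve(summary, diff, critical):
        return new
    return old
```

An interactive reviewer could be passed as `approve=lambda s, d, c: input(f"{s}\n{d}\nApply? [y/N] ") == "y"`. The point of the design is that the agent's output is only ever a proposal, and the default outcome is that the file stays unchanged.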

Host: Which brings us to the infrastructure side of things, specifically MCP. For our listeners, MCP, or the Model Context Protocol, is a standard meant to let AI agents connect to data and tools. The piece on Lobsters today argues that most MCP servers are just premature product interfaces. The author suggests that if you do not have a public API or a decent CLI, you have no business building an MCP server.

Guest: This is the hot take I needed today. People are rushing to wrap their databases in MCP just because it is the new shiny, but they are ignoring the basics. If your API is a mess, your MCP server is just a mess that is easier for an agent to exploit. I think of MCP as the final layer in a stack. You need your stable API contract first, then your CLI for the humans, and then, if you really need to, you give the agents the keys via MCP.

Host: It seems like a lot of people are skipping steps.

Guest: They are, and it is going to come back to haunt them. It is like building a house on a foundation of loose sand. I would rather see people focus on clean, stable APIs that actually behave predictably. If you are going to let an agent touch your product, it had better be able to rely on a solid interface, not some half-baked abstraction that only works if the model is having a good day.
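
Note: a rough sketch of the layering described above, with one stable function as the API contract, a thin CLI over it, and an agent-facing tool table over the same function. The tool table is a plain dictionary standing in for an MCP server and does not use the real MCP SDK. The order-status example, the stubbed return value, and all names (`get_order_status`, `cli`, `AGENT_TOOLS`, `call_tool`) are hypothetical.

```python
import argparse
import json


def get_order_status(order_id: str) -> dict:
    """Layer 1: the stable API contract (hypothetical). Everything else wraps this."""
    if not order_id.isdigit():
        raise ValueError("order_id must be numeric")
    # Stub data for the sketch; a real implementation would query a backend.
    return {"order_id": order_id, "status": "shipped"}


def cli(argv=None) -> str:
    """Layer 2: a CLI for humans, implemented as a thin wrapper over the API."""
    parser = argparse.ArgumentParser(prog="orders")
    parser.add_argument("order_id")
    args = parser.parse_args(argv)
    return json.dumps(get_order_status(args.order_id))


# Layer 3: agent-facing tools. A plain dict standing in for an MCP server,
# NOT the real MCP SDK; it only shows that the agent layer adds no new logic.
AGENT_TOOLS = {
    "get_order_status": {
        "description": "Look up the status of an order by numeric id.",
        "handler": get_order_status,
    },
}


def call_tool(name: str, arguments: dict) -> dict:
    """Dispatch an agent's tool call to the same function the CLI uses."""
    return AGENT_TOOLS[name]["handler"](**arguments)
```

Because both outer layers delegate to `get_order_status`, validation and behavior are defined once. An agent calling `call_tool("get_order_status", {"order_id": "42"})` gets the same result a human gets from `cli(["42"])`.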

Host: I think that is a perfect place to leave it. Samantha, thanks for the caffeine and the perspective today.

Guest: My pleasure, Daniel. I am going to go spend the rest of the afternoon making sure my own agents are on a very, very short leash.

Host: Sounds like a plan. We grabbed these stories from the community over at Lobsters. Thanks for joining us, and we will see you all back here tomorrow.