Notes on a Dev Using AI Tools

I started programming at age ~21/22, and I'm 35 now, though not all of those ~14 years have been spent doing dev work full-time. I'm highly capable, but I know I have a lot of room to grow after finding some incredibly smart people on X.

Notes

Just wanted to share some quick notes/thoughts on using AI tools to help with coding tasks from the perspective of someone who is still very comfortable building without AI assistance.

Loss of Control

Working with AI is interesting because it's like outsourcing your work to a cheaper contractor. Sometimes it pays off and seems magical; other times you wonder why anyone ever thought AI could fully replace a software engineer.

There is a spectrum of control you can maintain when working with AI:

  • [High control] You prompt ChatGPT or Claude via their chat interface. You ask a question, it spits out code in a markdown code block, you inspect it, then copy/paste it into your code. Similar vibes to the StackOverflow days, except every output is highly tailored to your situation.
  • [Medium control] You use a copilot or something similar (Cursor, Bolt, Windsurf, Lovable, etc). You still have full control, but the AI can more easily output a bunch of code, edit files for you, and delete stuff. So while you're still in control, it's easier to slip up and trust that the AI is giving you something good (because it has firsthand context about what you're working on). More than once I've had Cursor's "Apply" feature (which applies the diff from chat) randomly decide to delete code that shouldn't be deleted, and I didn't immediately catch it.
  • [Low control] You use something like Devin or OpenHands that acts as an engineer on your team. These apps don't really involve you at all. They make changes at the file level pretty much freely unless you interrupt them. This gives up the most control and could easily put you in a bad spot if you don't know what you're doing. It might screw up the structure of your codebase, delete entire files or directories that you need, get stuck in a loop trying to fix bugs by editing a file over and over while adding more and more garbage, etc. You need to be VERY careful when working with AI at this level. It's really easy to end up out over your skis.

To get the most from AI, we all need to start shifting from those "High Control" methods, where LLMs are just tailored StackOverflow answers, toward lower-control methods where you cede more of your code to AI. That loss of control might not be comfortable, and it might slow you down initially.

Personally, I'm very comfortable now in the "Medium Control" area using copilots (I use Cursor). I've established a good workflow where I can ask AI to make changes, quickly glance over the code and the diff it comes up with, apply it, and test it. I spot major issues the majority of the time (e.g. deleting large chunks of code) and feel like I move at least ~5x faster when using AI.

Now I'm trying to move down one more layer to Low Control methods by using OpenHands. I'm still figuring out the process for this, and I'll share more tips below. But I'm not totally sure I see the benefit yet outside of a few very niche use cases.

Narrow Tasks, One at a Time

When working with AI, I take an extreme ownership approach: if the LLM screws up, it's my fault for not giving it all the info it needed or for not explaining the task in a way it could understand. This assumes:

  1. That I'm working with a ~SOTA model (like GPT, Claude, Gemini, etc)
  2. That I'm asking it to do something it has knowledge of. E.g. I can't ask it to write me code for an API it's never seen before.

This mindset has paid off a lot. I've gotten AI to do solid work for me, but it took a lot of effort initially to figure out how to get the most out of it. I see lots of people complain online about how "dumb" AI is and how it's basically useless. In my humble opinion, it's likely a skill issue with communication.

If you treat an LLM like an eager junior dev (yes, even Sonnet 3.5), then I think you'll be in the right frame of mind. The tasks you give it need to be narrowly defined, and the prompt needs to be detailed and give it proper context (or tell it where to find that context).

If it screws up, then examine your prompt and the task. Maybe the task wasn't broken down enough. Maybe the context you provided wasn't enough. Or maybe you just need to update your prompt and append something like "Don't do <XYZ>. This path has been explored previously and didn't work out."

But take responsibility for it as if it's an employee you've hired. If it's not doing the job well, you need to figure out why and figure out how to communicate better.

Commit Often

In addition to narrowly defined tasks, you need to commit code OFTEN. I'm talking like obsessively. Give it a narrow task with a great prompt, examine its work, test it (or have it run tests), then have it commit the code if it's good and in a working state.

I've had times where I didn't commit things in their current working state because I just wanted AI to make "one more quick change," and it then took a simple task and seriously overcomplicated it. It broke a few new things, and because I hadn't committed, I couldn't just reset back to the last commit and restart the chat with a new prompt.
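
As a rough sketch of the habit (these are standard git commands; the commit message and the feature it describes are made up for illustration, not a prescribed workflow):

```sh
# After the AI finishes a narrow task and the tests pass,
# snapshot the working state immediately.
git add -A
git commit -m "Add pagination to the users endpoint (AI-assisted)"

# If the next "one more quick change" goes sideways, throw away the
# changes to tracked files and restart the chat from the last good commit.
git reset --hard HEAD
```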

AI Will Slow You Down (At First)

Like anything new, AI will slow you down at first. With the years of experience I have, I can work pretty fast at this point in my life, so it's painful to slow down to 0.5x or even 0.1x sometimes to get AI involved, learn a new tool, or try a new model.

But there are 2 possible outcomes:

  1. You slow down initially, but you find ways to get a lot of leverage from the tool after working with it for a few days. You'll get a good sense as to whether or not it will actually help you speed up down the road. And if so, then keep pushing.
  2. You slow down initially, and after working with it for a while, you realize the tool isn't great, the model isn't great (maybe not for what you're working on), or you just don't have the right use case for the tool yet. I.e. this is NOT something that will actually help you speed up and will end up being a distraction.

Just give stuff a chance. Your future likely depends on it. If you don't slow down now to experiment and figure things out, you'll miss out on huge gains that others are getting by using AI intelligently. And in order to use it intelligently, you need to start using it and figure out HOW.

Claude Sonnet 3.5 is 👑

Similar to the real world where hiring an objectively better and more expensive engineer can be an amazing investment for your company, "hiring" a better (and usually more expensive) AI model is worth it.

When using OpenHands, I started out using o3-mini with high reasoning effort. It was ~cheaper than gpt-4o, and I think it's supposed to be better at coding. But for a reasoning model, I was surprised at how easily it got tripped up by simple things.

I wrote an X post detailing a comparison between o3-mini and Sonnet 3.5. Sonnet not only did better with coding tasks, keeping my codebase clean and proposing a better setup, but it also navigated the file system better, checked for potentially duplicate files before doing its work so it wasn't duplicating effort, and more.

Sonnet 3.5 is the default in Cursor. Lex Fridman considers Sonnet 3.5 to be the best at coding as well (from what I recall of a recent podcast). OpenHands' docs also mention Sonnet 3.5 as the recommended model. And from what I recall on Reddit and X, many seem to hold Sonnet 3.5 in high regard (as well as Opus).

Opinions will differ, and models will trade places at the top as they get training updates and improvements, but Claude's Sonnet, in my opinion, has done really well at maintaining a consistently good reputation among developers.

CAVEAT: This is with regard to LLMs you can access via an API and use in a tool like Cursor or OpenHands. I've heard amazing things about o1-pro, but I haven't shelled out the $200 to try it yet.

Final Thoughts

  • There's a loss of control when outsourcing tasks to AI. Start with tools and tasks that keep you most in control (e.g. just use ChatGPT or Claude or Gemini via the chat app). Then move to copilots. Then move to a full-blown AI teammate. But BE CAREFUL. You need to inspect AI output to avoid stupid mistakes.
  • Experiment with new AI models and tools; your future likely depends on it. It will slow you down at first, but the gains you get after learning to use a good tool will be worth it.
  • Be realistic about outcomes; it's not magic, it won't make you a "better" engineer (it just lets you do more work), and if after some time you find it's not helpful or is slowing you down, then try moving to another tool or model.
  • Give it narrowly defined tasks and commit code often. If it screws up, it's your fault. Give it a better prompt and learn to communicate better with it.
  • Anthropic will likely get most of my money over GPT models via the API. And I can't wait to see what they release next.