In 2026, as I hope to use AI coding tools more, I want to follow a more disciplined approach: collecting data that I can look back on after a few weeks or months to establish what value, if any, they bring me. Here’s how I am planning to do that.
I’ve been a programmer for many years now. Over that time, I have moved from C as my main programming language (back when I was programming for most of the hours of my working day) to Ruby, which I now use most of the time: I still enjoy programming, and Ruby probably gives me the best overall outcome for the precious little time that I now get to program things.
This has also meant that I more often build smaller tools and utilities rather than full-fledged production systems. The advent and proliferation of generative AI for assistance in coding and programming tasks has been met with a wide range of reactions, from doom predictions to derision to optimism. I find it necessary to tune out the noise and reflect on where AI has been helpful in what I do and, more importantly, whether it’s actually saving me time.
Background: where I am today
For now, I use Kiro most frequently, plus a bit of Microsoft Copilot, though usually not for coding tasks. I’ve also used ChatGPT and Gemini a little. So far, usage has been on small multi-hour activities rather than longer multi-week activities.
Anecdotally, I have observed that AI tools have helped me with these tasks while programming:
- Snippets, e.g., a method that parses text with a regular expression and extracts some parts of it. This gets all the more powerful when you can show it a sample of the text to parse and let it figure the rest out, and then point it to edge cases to tune the code/expression
- Adding a README based on the code
- Explaining parts of the code and creating diagrams
- Adding documentation to methods and files
- Adding relatively standard features, e.g., export to database, create a rake task, parse some data, etc.
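As an illustration of the first item in the list above, here is the kind of regular-expression snippet involved. To be clear, this is my own hand-written sketch rather than actual AI output, and the log format and field names are made up:

```ruby
# Hypothetical example: extract the timestamp, level, and message
# from a log line like "2026-01-15 09:30:00 [ERROR] disk full".
LOG_LINE = /\A(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<level>\w+)\] (?<message>.*)\z/

def parse_log_line(line)
  return nil unless (m = line.match(LOG_LINE))
  { time: m[:time], level: m[:level], message: m[:message] }
end

parse_log_line("2026-01-15 09:30:00 [ERROR] disk full")
# => { time: "2026-01-15 09:30:00", level: "ERROR", message: "disk full" }
```

The point about edge cases applies directly: when a line doesn’t match (say, a multi-line stack trace), you show the tool that sample and let it adjust the expression.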
I’ve also seen less-than-optimal results:
- When some code did not run as expected: on one occasion, I realised that the schema for the database was wrong, and was preventing the code from “passing” – the tool kept wanting to try completely different things to get it to work somehow. I had to prompt it to “stop” and fix the code.
- Trying to repeat the same activity: I observed this while using Copilot to parse chat logs and airline itineraries. It worked fine once and for one input, and with exactly the same prompt, it gave me completely different outputs the next time it was used.
- While programming, I observed that the Ruby Gemfile it created had really old versions of the gems in it. Of course, it’s likely based on what it was trained on, but that doesn’t stop me being surprised.
- Interpreting the requirements is always tricky: in one case, it re-architected the system to meet a requirement that could have been done much more simply.
However, it does appear that there are some benefits to using AI while programming, and there often are time savings. So, I intend to continue using AI more in 2026, but I want to be more structured in evaluating whether it helps and where it helps.
RAISE: Reasonable AI Savings Estimate
Two questions come up when estimating the effort/time savings from using AI:
- Estimates are just that – estimates. I don’t always have an estimate for how long the activity will take, especially if I started the work via exploration.
- Numerous people talk about generative AI over-engineering the solution. I am not yet ready to pass judgment on that observation, but I did find that it often did more than was strictly necessary. For example, I wanted to process elements in batches of 1000; the code it wrote instead took a batch-size parameter that defaulted to 1000 and could even be passed from the command line. If I had implemented those extra features myself, it would have taken me longer to finish, so the savings look larger than if I had implemented only what was strictly necessary for the task. So, which savings are correct?
For this reason, I am trying to follow what I call RAISE: Reasonable AI Savings Estimate. This is how it works: after the work is done, I look at the output and determine three numbers.
- How long did it take to get to exactly what has been done? This is basically how long I spent on it: prompting, evaluating the output, re-prompting, letting it run, changing some of the code, committing it to the repos, etc. If I did something else in between, I try to reasonably subtract that time.
- Looking at the working solution now, I can see what meets my needs and what, if anything, exceeds them. How much longer would it have taken me, compared to the time I spent, to implement the version without all the extra bits, e.g., hard-coding some values that I don’t expect will change? I call this RAISE-1.
- To get to exactly what has been done, now that I can see what was done and how, how much longer would it have taken me compared to the time I spent? This is a gut-feel estimate covering everything that was implemented, including things that I may not have implemented if I were doing the full implementation myself. I call this RAISE-2.

With these three numbers, I now get, as an example:
- Actual time spent: 3hr
- Implementing only what I wanted (actual + RAISE-1): 8hr
- Implementing what was actually done (actual + RAISE-2): 20hr
So, I can say:
- Savings of 5hr ~ 17hr for this activity
- Minimally, it should be taken as 5hr savings
I would summarise it as: 3hr/8hr/20hr
Let’s say that one activity resulted in: 6hr/14hr/14hr – it would mean that everything that was done is what I would have eventually done. Sometimes, it may be more than what I thought I originally needed, but getting to the end, it feels like I would have eventually implemented it.
On something else, I might get it as: 3hr/1hr/2hr which would mean:
- It took 3hr including AI prompting
- Actually, I could have got to what I wanted in 1hr if I did it (so, I lost 2hr)
- What was finally implemented would have taken me 2hr (so, even with that, I lost 1hr)
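The arithmetic above is simple enough to sketch in Ruby (the method name and hash keys are mine, purely illustrative; the "3hr/8hr/20hr" summary format is the one described above):

```ruby
# Compute savings from a RAISE summary "actual/minimal-total/full-total",
# e.g. "3hr/8hr/20hr" means: 3hr actually spent, 8hr to build only what
# I needed by hand, 20hr to build everything that was actually built.
# String#to_f tolerates the trailing "hr" suffix.
def raise_savings(summary)
  actual, minimal, full = summary.split("/").map(&:to_f)
  {
    minimal_savings: minimal - actual, # vs. implementing only what I wanted
    full_savings:    full - actual     # vs. implementing what was actually done
  }
end

raise_savings("3hr/8hr/20hr") # => { minimal_savings: 5.0, full_savings: 17.0 }
raise_savings("3hr/1hr/2hr")  # => { minimal_savings: -2.0, full_savings: -1.0 } (a net loss)
```

Negative numbers, as in the second call, capture the "I lost time" cases described above.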
I use Redmine and often log time there against the different activities that I do. At the end of the comment for logged time, I intend to add text like this: ~^RAISE:+2.5/+3.5/T^~ to indicate that it saved me 2.5hr on what I needed to get done and a total of 3.5hr (including the extra items), and that it’s a T activity (T: Technical, N: Non-technical). This should be easy to extract and summarise in the future.
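Since the marker has a fixed shape, extraction really should be straightforward. Here is a sketch; the regex, method name, and the sample comment strings are hypothetical, and only the ~^RAISE:...^~ marker format comes from the convention just described:

```ruby
# Extract markers like ~^RAISE:+2.5/+3.5/T^~ from Redmine time-log
# comments and total the two savings figures.
RAISE_MARKER = /~\^RAISE:([+-]?\d+(?:\.\d+)?)\/([+-]?\d+(?:\.\d+)?)\/([TN])\^~/

def summarise_raise(comments)
  totals = Hash.new(0.0)
  comments.each do |comment|
    next unless (m = comment.match(RAISE_MARKER))
    totals[:needed] += m[1].to_f # savings on what I actually needed
    totals[:full]   += m[2].to_f # savings including the extra items
  end
  totals
end

logs = [
  "Parsed itinerary emails into CSV ~^RAISE:+2.5/+3.5/T^~",
  "Wrote rake task for export ~^RAISE:+1.0/+1.5/T^~",
  "Fixed schema by hand, AI went in circles ~^RAISE:-2.0/-1.0/T^~"
]
summarise_raise(logs) # => { needed: 1.5, full: 4.0 }
```

A version of this could later also split totals by the T/N flag captured in the third group.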
A sample collection of logs for a task is as below.

Note that these are all estimates; that’s why the “R” and “E” refer to “Reasonable” and “Estimate” respectively. It depends on me looking back at the work and judging how much help it was. It’s not perfect and it’s not an exact science, but it feels like a reasonable way to evaluate whether the net impact is positive and “worth it”.
There is probably an argument to be made that in reality, the version with more features might require more actual time (to implement, verify, document, etc.) and it’s not fair to keep the same “actual time” value for all cases. At this point, I am biased towards making progress in collecting this data, and letting that nuance pass. It might make sense in the future to see if a refinement is feasible and relevant.
The Good, Bad and Ugly Log
The other thing that I want to do is try to maintain a log of things that, in my experience with the tools I used, were done well, badly, and terribly. I hope that this will eventually give me a clearer idea of which things are best to automate with AI and which were quite bad at the time when I tried them. I don’t have a formal way or format for this; right now, it’s just expected to be rows in a spreadsheet.
What do you think? Are you using something? If you have some comments, I’d love to hear from you. Feel free to connect or share the post (you can tag me as @onghu on X or on Mastodon as @onghu@ruby.social or @onghu.com on Bluesky to discuss more).