The most important thing is to have a strong planning cycle in front of your agent work; if you do that, agents are very reliable. You need a deep research cycle that collects a covering set of code that might need to be modified for a feature, feeds it into Gemini/GPT-5 to get a broad, codebase-level understanding, and then runs a debate cycle on how to address it, with the final artifact being a hyper-detailed plan that goes file by file and outlines the changes required.
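A minimal sketch of what that research / debate / plan cycle might look like as code, assuming a generic call_model() helper standing in for whichever LLM client you use; the helper, model names, and prompts are illustrative, not part of the original comment:

```python
# Minimal sketch of the plan cycle described above. call_model(), the model
# names, and the prompts are placeholders, not a real API.
from pathlib import Path


def call_model(model: str, prompt: str) -> str:
    """Stand-in for whatever LLM client you actually use."""
    raise NotImplementedError


def plan_cycle(feature: str, candidate_files: list[Path]) -> str:
    # 1. Research: collect a covering set of code that might need to change.
    context = "\n\n".join(f"# {p}\n{p.read_text()}" for p in candidate_files)

    # 2. Broad understanding: ask a large-context model how the feature fits.
    understanding = call_model(
        "gemini", f"Explain how this code relates to '{feature}':\n{context}")

    # 3. Debate: one pass proposes an approach, another critiques it.
    proposal = call_model(
        "gpt-5", f"Propose an approach for '{feature}':\n{understanding}")
    critique = call_model("gemini", f"Critique this proposal:\n{proposal}")

    # 4. Final artifact: a hyper-detailed, file-by-file plan for the coding agent.
    return call_model(
        "gpt-5",
        f"Write a file-by-file plan of changes for '{feature}'.\n"
        f"Proposal:\n{proposal}\nCritique:\n{critique}")
```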
Beyond this, you need to maintain good test coverage, and you need to have agents red-team your tests aggressively to make sure they're robust.
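One concrete form of red-teaming is mutation testing: introduce a deliberate bug and confirm the suite actually fails. A rough sketch, assuming a pytest-based project; the paths and the example mutation are hypothetical:

```python
# Rough sketch: copy the repo, apply a deliberate mutation, and check that the
# test suite fails. Paths and the example mutation are hypothetical.
import pathlib
import shutil
import subprocess
import tempfile


def tests_catch_mutation(repo: str, target: str, original: str, mutated: str) -> bool:
    """Return True if the suite fails when `original` is replaced by `mutated`."""
    with tempfile.TemporaryDirectory() as tmp:
        copy = pathlib.Path(tmp) / "repo"
        shutil.copytree(repo, copy)
        path = copy / target
        path.write_text(path.read_text().replace(original, mutated))
        result = subprocess.run(["pytest", "-q"], cwd=copy)
        return result.returncode != 0  # robust tests should notice the bug


# Example: a robust suite should catch an off-by-one in a boundary check.
# tests_catch_mutation(".", "src/limits.py", "x <= cap", "x < cap")
```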
If you implement these two steps, your agent performance will skyrocket. The planning phase will produce plans that Claude can iterate on for 3+ hours in some cases, if you tell it to complete the entire task in one shot, and the robust test validation / change-set analysis will catch agents that solve an easier problem because they got frustrated, or that simply don't follow directions.
By that point I would have already produced the 20-line diff for the ticket. Huge commits (or change requests) are usually scaffolding, refactoring, or design changes to support new features. There's also generated code and verbose languages like CSS. In other words, the kind of work where the more knowledge you have of the code, the faster you can be.
The daily struggle was always those 10-line diffs where you have to learn a lot (from the stakeholder, by debugging, from the docs).
This may produce some successes, but it's so much more work than just writing the code yourself that it's pointless. This structured way of working with generative AI is so strict that there's no scaling it up either. It feels like it was established years ago that this is a waste of time.
If the goal is to start writing code without knowing much, it may be a good way to learn how, and to establish a similar discipline in yourself for tackling projects? Though I think there's been research showing that training wheels don't work either. Still, whatever gets people learning to write code for real can't be bad, right?
What tends to get overlooked is the actual development speeds these projects achieve.
Take the PhiCode runtime, for example: a complete programming language with code conversion, performance optimization, and security validation, built in 14 days. The commit history provides trackable evidence; manual development of comparable functionality would take a solo developer months.
The "more work" claim doesn't hold up to measurement. AI generates code faster than manual typing, while systematic constraints prevent the architectural debt that creates expensive refactoring cycles later. The 5-minute setup phase establishes foundations that enable consistent development throughout the project.
On scalability, the runtime demonstrates 70+ modules maintaining architectural consistency. The 150-line constraint forced modularization that made managing these components feasible - each remains comprehensible and testable in isolation. The approach scales by sharing core context (main entry points, configuration, constants, benchmarks) rather than managing entire codebases.
Teams can collaborate effectively under shared architectural constraints without coordination overhead.
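As a hedged illustration of what such a shared constraint can look like in practice, here is a small check that fails when any module exceeds the 150-line limit; the src directory and Python-only scope are assumptions about project layout, not details from the original comment:

```python
# Illustrative enforcement of the 150-line-per-module constraint; the source
# directory and file pattern are assumptions about project layout.
import sys
from pathlib import Path

LIMIT = 150


def oversized_modules(root: str = "src") -> list[tuple[Path, int]]:
    bad = []
    for path in Path(root).rglob("*.py"):
        line_count = sum(1 for _ in path.open())
        if line_count > LIMIT:
            bad.append((path, line_count))
    return bad


if __name__ == "__main__":
    offenders = oversized_modules()
    for path, count in offenders:
        print(f"{path}: {count} lines (limit {LIMIT})")
    sys.exit(1 if offenders else 0)
```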
This isn't about training wheels or learning syntax. The methodology treats AI as a systematic development partner focused on architectural thinking rather than ad-hoc prompting. AI handles syntax perfectly - the challenge lies in directing it toward maintainable, scalable solutions at production speed.
Previous attempts at structured AI collaboration may have failed, but this approach addresses specific failure modes through empirical measurement rather than theoretical frameworks.
The perceived 'strictness' provides flexibility within proven constraints. Developers retain complete freedom in implementation approaches, but the constraints prevent common pitfalls like monolithic files or tangled dependencies - like guardrails that keep you on the road.
The project examples and commit histories provide concrete evidence for these development speeds and architectural outcomes.
It's just a function of how much code you need to write and how much uninterrupted time you have.
Editing this kind of configuration carries far less cognitive load and context-loading time, so distractions aren't as destructive to the task as they are when coding. You can also structure your time so that productive agent coding happens while you're doing business-critical tasks like meetings and calls.
I do think this is overkill, though, and that it's a bad plan and far too early to try to formalize The One Way To Instruct AI How To Code, but every advance is an opportunity to gain career traction, so fair play.
The most important things you always need to do:
1. Plan, and review the plan.
2. Review the code during changes, before it's even finished, and fix drift as soon as you see it.
3. Then review again.
4. Add tests and use all your quality tools; don't rely 100% on the LLM (see the sketch after this list).
5. Don't trust an LLM's review of its own code, as it's very biased.
These are basic steps that you can adapt as you like.
Avoid a FULLY AUTOMATED agent pipeline where you review the code only at the end, unless it's a very small task.
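For step 4, a minimal quality gate might look like the sketch below; the specific tools (pytest, ruff, mypy) are examples, not something the comment prescribes:

```python
# Minimal quality gate: run your own tests and static checks rather than
# trusting the LLM's self-review. Tool choices here are examples.
import subprocess
import sys

CHECKS = [
    ["pytest", "-q"],        # tests
    ["ruff", "check", "."],  # lint
    ["mypy", "."],           # type checks
]


def main() -> int:
    failed = [" ".join(cmd) for cmd in CHECKS
              if subprocess.run(cmd).returncode != 0]
    if failed:
        print("Quality gate failed:", ", ".join(failed))
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```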
Modern agentic tools already draw up plans before implementation. Some even define "plan" and "build" agents: https://opencode.ai/docs/agents/#built-in