I am the person who wrote that. Sorry about the font. It is a bit outdated; AI stuff moves at high speed and there are more models now, so I will try to update it.
Every month so many new models come out. My new fav is GLM-4.5... Kimi K2 is also good, and Qwen3-Coder 480b, or 2507 instruct.. very good as well. All of those work really well in any agentic environment/in agent tools.
I made a context helper app ( https://wuu73.org/aicp ), linked from there, which helps me jump back and forth between all the different AI chat tabs I have open (which are almost always totally free, and I get the best output from those) and my IDE. The app tries to remove all the friction and annoyances of working with the native web chat interfaces for all the AIs. It's free and has been getting great feedback; criticism welcome.
It helps with going from IDE <----> web chat tabs. I made it for myself to save time, and I prefer the UI (a PySide6 UI is so much lighter than a webview).
It's got preset buttons to add text that you find yourself typing very often, plus per-project saves of the app's window size and of which files were used for context, so next time it opens in the same state.
Auto scans for code files, guesses likely ones needed, prompt box that can put the text above and below the code context (seems to help make the output better). One of my buttons is set to: "Write a prompt for Cline, the AI coding agent, enclose the whole prompt in a single code tag for easy copy and pasting. Break the tasks into some smaller tasks with enough detail and explanations to guide Cline. Use search and replace blocks with plain language to help it find where to edit"
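A minimal sketch of the "prompt above and below the code context" idea, in case it helps picture it. This is not the app's actual code: `build_context` and the `--- path ---` header format are made up for illustration.

```python
def build_context(prompt: str, files: dict[str, str],
                  top: bool = True, bottom: bool = True) -> str:
    """Wrap file contents with the same prompt above and/or below them.

    `files` maps a path to that file's text. The header format is an
    assumption for illustration, not what any particular tool emits.
    """
    parts = []
    if top:
        parts.append(prompt)
    for path, text in files.items():
        # One labelled section per file so the model can tell them apart.
        parts.append(f"--- {path} ---\n{text.rstrip()}")
    if bottom:
        # Repeating the prompt after the code is the "above and below" trick.
        parts.append(prompt)
    return "\n\n".join(parts)
```

Calling `build_context("Fix the bug in total().", {"src/cart.py": source_text})` gives one block you can paste into any web chat.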
What I do for problem solving and figuring out bugs: I'm usually in VS Code and I type aicp in the terminal to open the app. I fine-tune any files already checked, type what I am trying to do or what problem I have to fix, click the Cline button, and click Generate Context!. Then I paste into GLM-4.5, sometimes o3 or o4-mini, GPT-5, Gemini 2.5 Pro.. if it's a super hard thing I'll try 2 or 3 models. I'll look and see which one makes the most sense and just copy and paste it into Cline in VS Code, set to GPT 4.1, which is unlimited/free.. 4.1 isn't super crazy smart or anything but it follows orders... it will do whatever you ask, reliably. AND, it will correct minor mistakes from the bigger model's output. The bigger, smarter models can figure out the details, and they'll write a prompt that is a task list with how-to's and why's, perfect for 4.1 to go and do in agent mode....
You can code for free this way, unlimited, and it's the smartest the models will be. Anytime you throw some tools or MCPs at a model it dumbs it down.... AND you waste money on all the API costs of having to use Claude 4 for everything.
(relevant self promotion) I wrote a CLI tool called slupe that lets a web-based LLM dictate filesystem changes to your computer, to make it easier to do AI coding from web LLMs: https://news.ycombinator.com/item?id=44776250
Small recommendation: The diagrams on [https://wuu73.org/aicp] are helpful, but clicking them does not display the full‑resolution images; they appear blurry. This occurs in both Firefox and Chrome. In the GitHub repository, the same images appear sharp at full resolution, so the issue may be caused by the JavaScript rendering library.
I was going to downvote you but you are adding to the discussion. In this context this is free from having to spend money. Many of us don't have the option to pay for models. We have to find some way to get the state of the art without spending our food money.
>We have to find some way to get the state of the art without spending our food money.
If it's not your job: Do we "have to" find this way? What's the opportunity cost compared to a premium subscription or using non-state-of-the-art tools?
If it is your job: it's putting food on the table. So it should be a relatively microscopic cost of doing business. Maybe even a tax write-off.
There is a company that is advertising like crazy for programmers, data scientists, etc. They are looking for college kids, etc. They are paying better than McDonalds.
What are they building? A training corpus.
Are people who respond to their ads getting the money for free?
Handing your codebase to an AI company is not nothing.
> Handing your codebase to an AI company is not nothing.
It's a battle that was already lost a long time ago. Every crappy little service by now indexes everything. If you ever touch Github, Jira, Datadog, Glean (god forbid), Upwork, etc etc, they each have their own shitty little "AI" thing, which means what? Your project has been indexed, bagged and tagged. So unless you code from a cave without using any SaaS tools, you will be indexed no matter what.
I appreciate your consideration, disagree != downvote.
To your point, "free from having to spend money" is exactly it. It's paid for with other things, and I get that some folks don't care. But being more open about this would be nice. You don't typically hide a monetary cost either, and everybody trying to do that is rightfully called out on it by being called a scam. Doing that with non-monetary costs would be a nice custom.
I don't trust any AI company not to use and monetise my data, regardless of how much I pay or what their terms of service say. I know full well that large companies ignore laws with impunity and no accountability.
I would encourage you to rethink this position just a little bit. Going through life not trusting any company isn't a fun way to live.
If it helps, think about those companies' own selfish motivations. They like money, so they like paying customers. If they promise those paying customers (in legally binding agreements, no less) that they won't train on their data... and are then found to have trained on their data anyway, they won't just lose that customer - they'll lose thousands of others too.
Which hurts their bottom line. It's in their interest not to break those promises.
> they won't just lose that customer - they'll lose thousands of others too
No, they won't. And that's the problem in your argument. Google landed in court for tracking users in incognito mode. They also were fined for not complying with the rules for cookie popups. Facebook lost in court for illegally using data for advertising. Did it lose them any paying customer? Maybe, but not nearly enough for them to even notice a difference. The larger outcome was that people are now more pissed at the EU for cookie popups that make the greed for data more transparent. Also in the case of Google most money comes from different people than the ones that have their privacy violated, so the incentives are not working as you suggest.
> Going through life not trusting any company isn't a fun way to live
Ignoring existing problems isn't a recipe for a happy life either.
Landing in court is an expensive thing that companies don't want to happen.
Your examples also differ from what I'm talking about. Advertising supported business models have a different relationship with end users.
People getting something for free are less likely to switch providers over a privacy concern than companies that are paying thousands of dollars a month (or more) for a paid service under the understanding that it won't train on their data.
>Landing in court is an expensive thing that companies don't want to happen.
"If the penalty is a fine, it's legal for the rich". These businesses also don't want to pay taxes or even workers, but in the end they will take the path of least resistance. If they determine fighting in court for 10 years is more profitable than following regulations, then they'll do it.
Until we start jailing CEOs (a priceless action), this will continue.
>companies that are paying thousands of dollars a month (or more) for a paid service under the understanding that it won't train on their data.
Sure, but are we talking about people or companies here?
CEO says the action was against policy and they didn't know, so the blame passes down until you get to a scapegoat that can't defend themselves.
The underlying problem is that we have companies with more power than sovereign states, before you even include the power over the state the companies have.
At some point in the next few decades of continued transfer of wealth from workers to owners, more and more workers will snap and bypass the courts. This is what happened with the original fall of feudalism and warlords. This wasn't guaranteed though -- if the company owners keep themselves and their allies rich enough they will be untouchable, same as drug lords.
>Going through life not trusting any company isn't a fun way to live.
Isn't that the Hacker mindset, though? We want to trailblaze solutions and share it with everyone for free. Always in liberty and oftentimes in beer too. I think it's a good mentality to have, precisely because of your lens of selfish motivations.
Wanting money is fine. If it was some flat $200 or even $2000 with legally binding promises that I have an indefinite license to use this version of the software and they won't extract anything else from me: then fine. Hackers can be cheap, but we aren't opposed to barter.
But that's not the case. Wanting all my time and privacy and data under the veneer of something hackers would provide with no or very few strings is not fine. Tricks to push people into that model are all the worse.
> If they promise those paying customers (in legally binding agreements, no less) that they won't train on their data... and are then found to have trained on their data anyway, they won't just lose that customer - they'll lose thousands of others too.
I sure wish they did. In reality, they get a class action, pay off some $100m to lawyers after making $100b, and the lawyers maybe give me $100 if I'm being VERY generous, while the company extracted $10,000+ of value out of me. And the captured market just keeps on keeping on.
Sadly, this is not a land of hackers. It is a market of passive people from various walks of life: of students who do not understand what is going on under the hood (I was here when Facebook was taking off), of businessmen too busy with other stuff to understand the sausage in the factory, of ordinary people who just want to fire and forget. This market may never even be aware of what occurred here.
I live a pretty frugal life, and reached the FI part of FIRE in my early 30s as an averagely compensated software engineer.
I am very skeptical anytime something is 'free'. I specifically avoid using a free service when the company profits from my use of the service. These arrangements usually start mutually beneficial, and almost always become user hostile.
Why pay for something when you can get it for free? Because the exchange of money for service sets clear boundaries and expectations.
This sounds pedantic, but I think it's important to spell this out: this sort of stuff is only free if you consider what you're producing/exchanging for it to have 0 value.
If you consider what you're producing as valuable, you're giving it away to companies with an incentive to extract as much value from your thing as possible, with little regard towards your preferences.
If an idiot is convinced to trade his house for some magic beans, would you still be saying "the beans were free"?
I should add a section to the site/guide about privacy, just letting people know they have somewhat of a choice with that.
As for sharing code, most of the parts of a project/app/whatever have already been done and if an experienced developer hears what your idea is, they could just make it and figure it out without any code. The code itself doesn't really seem that valuable (well.. sometimes). Someone can just look at a screenshot of my aicodeprep app and just make one and make it look the same too.
Not all the time of course - If I had some really unique sophisticated algorithms that I knew almost no one else would or has figured out, I would be more careful.
Speaking of privacy.. a while back a thought popped into my head about Slack, and all these unencrypted chats businesses use. It kinda does seem crazy to do all your business operations over unencrypted chat, Slack rooms.. I personally would not trust Zuckerberg to not look in there and run lots of LLMs through all the conversations to find anything 'good'! Microsoft.. I kinda doubt they would do that on purpose, but what's to stop a rogue employee from finding out some trade secrets etc.. I'd be surprised if it hasn't been done. Security is not usually a priority in tech. They half-ass care about your personal info.
>Someone can just look at a screenshot of my aicodeprep app and just make one and make it look the same too.
To some extent. But without your codebase they will make different decisions in the back which will affect a myriad of factors. Some may actually be better than your app, others will end up adding tech debt or have performance impacts. And this isn't even to get into truly novel algorithms; sometimes just having the experience to make a scalable app with best practices can make all the difference.
Or the audience doesn't care and they take the cheaper app anyway. It's not always a happy ending.
I don't think that's true. It's not that it has zero value, it's that it has zero monetizable value.
Hackernews is free. The posts are valuable to me and I guess my posts are valuable to me, but I wouldn't pay for it and I definitely don't expect to get paid.
For YC, you are producing content that is "valuable" that brings people to their site, which they monetize through people signing up for their program. They do this with no regard for what your preferences are when they choose companies to invest in.
They sell ads (Launch, Hire, etc.) against the attention that you create. You ARE the product on HackerNews, and you're OK with it. As am I.
Same with OpenAI: I don't need to monetize them training on my data, and I am happy for them to do it, as I would like to use the services for free.
>Hackernews is free. The posts are valuable to me and I guess my posts are valuable to me, but I wouldn't pay for it and I definitely don't expect to get paid.
At this point, we may need future forums to be premium so we can avoid the deluge of AI bots plaguing the internet. A small, one-time cost is a guaranteed way to make such strategies untenable. SomethingAwful had a point decades ago.
But like any other business, you need to follow the money and understand the incentives. Hackernews has ads, but ads for companies with us as the audience. It's also indirectly an ad for YCombinator itself as bringing awareness of the accelerator (note what "hackernews.com" redirects to).
I'm fine with a company advertising itself; if I weren't, the idea of a company would cease to really function. And in this structure for companies, I can also get benefits by potentially getting jobs from here. So I don't mind that either. Everything aligns. I agree and support the structure. I can't say that about many other "free" websites.
As for me, I do want to monetize my data one day. I can't stop the scraping of the entire internet (that's for the courts), but I sure as heck won't hand it to them on a silver platter.
Definitely to each their own. I will never have a job at a YC company and I will also never apply to YC, so the ads are completely useless. I did discover some of my favorite shoes from an IG ad, though.
It wouldn't ever be worth me getting $.0001431 dollars for my data, and individual data will always be worthless on its own because 1. taking away one individual's data from a model does not make the model worse. 2. the price of an individual's data will always be zero because you have people like me who are willing to give it away for free in exchange for a free service (aka hackernews or IG)
One user's LTV on IG may be $34, but one user's data is worth $0. Which I think a lot of people struggle with.
From a more moral standpoint, the best part about the advertising business model is that it makes the internet open to everyone, not just those who can pay for every site they use.
I'm not sure if I'd ever have a job at YC (my industry isn't very "investor friendly"). But I like the idea of having a bunch of opportunities with such companies. It also encourages an environment of people I want to be around as well. So that indirectly serves my interests.
I will even use an ad example with conventions and festivals. You can argue an event like Comic-con is simply a huge ad. And it is. But I'm there "for the ad" in that case. It gathers other people "for the ad". It collectively benefits all of us to gather and socialize among one another.
Ads aren't bad, but many ads primarily exist to distract, not to facilitate an experience. And as a hot take, maybe we do need to gatekeep a bit more in this day and age. I don't want a "free internet" if it means 99% of my interactions are with bots instead of humans. If it means that corporations determine what is "worthy" of seeing instead of peers. If credit cards get to determine what I can spend my money on instead of my own personal (and legal) taste.
>It wouldn't ever be worth me getting $.0001431 dollars for my data and individual data will always be worthless on its own
On top of being a software engineer who's contributed millions in value with my data, I also strive to be an artist. That is an industry that has spent decades being extracted from, and most often is not fortunate enough to be compensated with a living wage. People can argue that "art is worthless", yet it also props up multiple billion dollar industries on top of societal culture. An artisan these days can even sustain themselves as an individual, with much faster turnaround than trying to program a website or app.
By all metrics, it's hard to argue this sector's value is zero. Maybe having that lens only strengthened my stance, as a precursor to what software can become if you don't push against abuse early on.
I understand the point people are trying to make with this argument, but we are so far into a nearly universal scam economy where corporations see small (relative to their costs of business) fines as just part of normal expenses that I also think anyone who really believes the AI companies aren't using their data to train models, even if it is against their terms, is wildly naive.
This is not only a privacy concern (in fact, that might be a tiny part since the code might end up public anyway?).
There is an element of disclosure of personal data, there are ownership issues in case that code was not - in fact - going to be public and more.
In any case, not caring about the cost (at a specific time) doesn't make the cost disappear.
If you consider watching an hour of Youtube and 30 minutes of ads to be "free videos", then be my guest. Not everything can be measured in a dollar value.
Sophistry. "many" according to which statistic? And just because some people consider that a trade is very favorable for them, that doesn't mean it is not a trade, and it doesn't mean they are correct - who's so naïve as to think they can beat business people at their own game?
Plenty of people can also afford to subscribe to these without any issue. They don’t even know the price, they probably won’t even cancel it when they stop using it as they might not even realize they have a subscription.
By your logic, are the paid plans not sometimes free?
Anecdotal, but Grok seems to have just introduced pretty restrictive rate limits. They’re now giving free users access to Grok 4 with a low limit and then making it difficult to manually switch to Grok 3 and continue. Will only allow a few more requests before pushing an upgrade to paid plans. Just started happening to me last night.
Look up LLM7 and Pollinations AI. Both offer free GPT 4.1, but I am not sure how limited it is. They have tons more models but the names are different (openai-large = gpt-4.1)
Meta has free and generous APIs for the crappy Llama 4 models... they're okay at summarizing things but I have no idea if they're any good for code. Prob not since no one even talks about those anymore.
FYI: the first AI you link to, " z.ai's GLM 4.5", actually links to zai.net, which appears to be a news site, instead of "chat.z.ai", which is what I think you intended.
Oops. I was using AI to try to fix some of the bugs and update it real fast with some newer models, since this post was trending here. Hopefully it's scrolling better. Link fixed. I know some of the page is still ridiculous looking, but at least it's readable for now.
Note that the website scrolls very slowly, sub-1 fps on Firefox Android. I'm also unable to scroll the call-out about Grok. Also, there's a strange large green button reading "CSS loaded" at the top.
I would be very interested in an in-depth account of the differences you've experienced between Roo Code and Cline, if you feel you can share that. I've only tried Roo Code (with interesting but mixed results) thus far.
Not sure if GLM-4.5 Air is good, but the non-Air one is fabulous. I know for free API access there is the pollinations ai project. Also llm7. If you just use the web chats you can use most of the best models for free without an API. There are ways to 'emulate' an API automatically.. I was thinking about adding this to my aicodeprep-gui app so it could automatically paste and then cut. Some MCP servers exist that will automatically paste or cut from those web chats and route it to an API interface.
OpenAI offers free tokens for most models, 2.5mil or 250k depending on model. Cerebras has some free limits, Gemini... Meta has plentiful free API for Llama 4 because.. let's face it, it sucks, but it is okay/not bad for stuff like summarizing text.
If you really wanted to code for exactly $0 you could use pollinations ai in the Cline extension (for VS Code), set to use "openai-large" (which is GPT 4.1). If you plan using all the best web chats like Kimi K2, z.ai's GLM models, Qwen 3 chat, Gemini in AI Studio, or the OpenAI playground with o3 or o4-mini, you can go forever without being charged money. Pollinations 'openai-large' works fine in Cline as an agent to edit files for you etc.
I built a relevant tool (approved by Apple this week) which may help reduce the friction of you having to constantly copy paste text between your app and the AI assistant in browser.
It's called SelectToSearch and it reduces my friction by 85% by automating all those copy paste etc actions with a single keyboard shortcut:
And to anyone who has ever used it, it appears more like opening smoothbrain.
For a long time it was the only allowed model at work and even for basic cyber security questions it was sometimes completely useless.
It’s really hit and miss for me. Well defined small tasks seem ok. But every time I try some “agentic coding”, it burns through millions of tokens without producing anything working.
My experience lines up with the article. The agentic stuff only works with the biggest models. (Well, "works"... OpenAI Codex took 200 requests with o4-mini to change like 3 lines of code...)
For simple changes I actually found smaller models better because they're so much faster. So I shifted my focus from "best model" to "stupidest I can get away with".
I've been pushing that idea even further. If you give up on agentic, you can go surgical. At that point even 100x smaller models can handle it. Just tell it what to do and let it give you the diff.
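To make "surgical" a bit more concrete, here is a minimal sketch. It uses a single search/replace pair as a stand-in for a real diff, which is my simplification, and `apply_edit` is a made-up helper, not part of any tool mentioned here.

```python
def apply_edit(source: str, search: str, replace: str) -> str:
    """Replace exactly one occurrence of `search` in `source`.

    Raises ValueError when the search text is empty, missing, or ambiguous,
    so a bad edit from the model fails loudly instead of corrupting the file.
    """
    if not search:
        raise ValueError("search text must not be empty")
    matches = source.count(search)
    if matches != 1:
        raise ValueError(f"expected exactly 1 match, found {matches}")
    return source.replace(search, replace, 1)
```

The point of insisting on exactly one match is that a small model only has to quote the lines it wants changed, and anything it misquotes gets rejected instead of applied.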
Also I found the "fumble around my filesystem" approach stupid for my scale, where I can mostly fit the whole codebase into the context. So I just dump src/ into the prompt. (Other people's projects are a lot more boilerplatey so I'm testing ultra cheap models like gpt-oss-20b for code search. For that, I think you can go even cheaper...)
Aider as a non-agentic coding tool strikes a nice balance on the efficiency vs effectiveness front. Using tree-sitter to create a repo map of the repository means less filesystem digging. No MCP, but shell commands mean it can use utilities I myself am familiar with. Combined with Cerebras as a provider, the turnaround on prompts is instant; I can stay involved rather than waiting on multiple rounds of tool calls. It's my go-to for smaller scale projects.
It's a shame MCP didn't end up using a sandboxed shell (or something similar, maybe even simpler.) All the pre-MCP agents I built just talked to the shell directly since the models are already trained to do that.
I am developing the same opinion. I want something fast and dependable. Getting into a flow state is important to me, and I just can't do that when I'm waiting for an agentic coding assistant to terminate.
I'm also interested in smaller models for their speed. That, or a provider like Cerebras.
Then, if you narrow the problem domain you can increase the dependability. I am curious to hear more about your "surgical" tools.
Well, most of the time, I just dump the entire codebase in if the context window is big and it's a good model. But there are plenty of times when I need to block one folder in a repo or disable a few files because the files might "nudge" it in a wrong direction.
The surgical context tool (aicodeprep-gui): there are at least 30 similar tools, but most (if not all) are CLI only with no UI. I like UIs; I work faster with them for things like choosing individual files out of a big tree. At least it uses the PySide6 library, which is "lite" (could maybe go lighter); I HATE that too many things use webviews/browsers. All the options on it are there for good reasons. It's all focused on things that annoy me and slow things down, like doing something repeatedly (copy, paste, copy, paste) or typing the same sentence over and over every time I have to do a certain thing with the AI and my code.
If you have not run 'aicp' in a folder before (that's the command I gave it, but there is also an OS installer menu that will add a Windows/Mac/Linux right-click context menu to the file manager), it will scan recursively to find code files, skipping things like node_modules or .venv. Otherwise it assumes most types of code files will probably be wanted, so it checks them. You can fine-tune it and add some .md or txt files or other stuff in there that isn't code but might be helpful. When you generate the context block it puts the text inside the prompt box on the top AND/OR bottom - doing both can get better responses from AI.
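Roughly what that first scan looks like, as a sketch rather than the app's real code: `find_code_files`, the skip list beyond node_modules and .venv, and the extension set are all my own guesses for illustration.

```python
import os

# node_modules and .venv come from the description above;
# .git and __pycache__ are additions assumed for this sketch.
SKIP_DIRS = {"node_modules", ".venv", ".git", "__pycache__"}
# A guess at "most types of code files"; adjust to taste.
CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".c", ".cpp", ".h", ".java"}


def find_code_files(root: str) -> list[str]:
    """Return paths (relative to root) of likely code files, sorted."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in CODE_EXTS:
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

Everything this returns would start out checked, and you uncheck or add files from there.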
It saves every file that is checked, and saves the window size and other window prefs, so you don't have to resize the window again. It saves the state of which files are checked, so it's less work / time next time. I have been just pasting the output from the LLMs into an agent like Cline, but I am wondering if I should add browser automation / a browser extension that does the copy pasting, and also add an option to edit / change files right after grabbing the output from a web chat. It's probably about good enough as it is though; not sure I want to make it into a big thing.
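The per-project state idea can be sketched with a small JSON file in the project folder. The file name, the keys, and the default window size here are hypothetical, picked only for the example; the real app may store this differently.

```python
import json
from pathlib import Path

STATE_FILE = ".context_state.json"  # hypothetical name, not the app's real file


def save_state(project_dir: str, checked_files: list[str],
               window_size: tuple[int, int]) -> None:
    """Write the checked files and window size into the project folder."""
    state = {"checked_files": sorted(checked_files),
             "window_size": list(window_size)}
    Path(project_dir, STATE_FILE).write_text(json.dumps(state, indent=2),
                                             encoding="utf-8")


def load_state(project_dir: str) -> dict:
    """Read saved state, falling back to defaults on first run or bad JSON."""
    defaults = {"checked_files": [], "window_size": [900, 700]}
    path = Path(project_dir, STATE_FILE)
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return defaults
    if not isinstance(saved, dict):
        return defaults
    # Saved values win; missing keys fall back to the defaults.
    return {**defaults, **saved}
```

Keeping the file inside the project folder is what makes the state per-project: open the tool in another repo and you get that repo's checked files back.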
---
Yeah, I just keep coming back to this workflow; it's very reliable. I have not tried Claude Code yet but I will soon, to see if they solved any of these problems.
Strange this thing has been at the top of hacker news for hours and hours.. weird! My server logs are just constant scrolling
- https://ferdium.org - I open all the LLM webapps here as separate "apps", my one place to go to talk with LLMs, without mixing it with regular browsing
- https://www.cherry-ai.com - chat API frontend, you can use it instead of the default webpages for services which give you free API access - Google, OpenRouter, Chutes, Github Models, Pollinations, ...
I really recommend trying a chat API frontend, it really simplifies talking with multiple models from various providers in a unified way and managing those conversations, exporting to markdown, ...
For those who don't know, OpenAI Codex CLI will now work with your ChatGPT plus or pro account. They barely announced it but it's on their github page. You don't have to use an api key.
I agree. I find even Haiku good enough at managing the flow of the conversation and consulting larger models - Gemini 2.5 Pro or GPT-5 - for programming tasks.
For the last few days I have been experimenting with using Codex (via MCP ${codex mcp}) from Gemini CLI, and it works like a charm. Gemini CLI is mostly using Flash underneath, but this is good enough for formulating problems and re-evaluating answers.
Same with Claude Code - I am asking (via MCP) for consulting with Gemini 2.5 Pro.
Never had much success using Claude Code as MCP though.
The original idea comes of course from Aider - using main, weak and editor models all at once.
I use a 500 million parameter model for editor completions because I want those to be nearly instantaneous and the plugin makes 50+ completion requests every session.
What editor do you use, and how did you set it up? I've been thinking about trying this with some local models and also with super low-latency ones like Gemini 2.5 Flash Lite. Would love to read more about this.
Neovim with the llama.cpp plugin and heavily quantized qwen2.5-coder with 500 (600?) million parameters. It's almost plug and play although the default ring context limit is way too large if you don't have a GPU.
They don't allow model switching below GPT-5 in codex cli anymore (without API key), because it's not recommended. Try it with thinking=high and it's quite an improvement from o4-mini. o4-mini is more like gpt-5-thinking-mini but they don't allow that for codex. gpt-5-thinking-high is more like o1 or maybe o3-pro.
Maybe optimistic, but reading posts like this makes me hopeful that AI-assisted coding will drive people to design more modular and sanely organized code, to reduce the amount of context required for each task. Sadly pretty much all code I have worked with has been a giant mess of everything being connected to everything else, causing the entire project to be potential context for anything.
Guess the name: In 2015 I was preaching that ____ simplifies the mental model of your web app, makes everything performant, the api will dominate and endure for decades. Answer: React. Soon, we were feeling the induced demand for features and timelines till we regressed to our natural "hair on fire" state. AI is (not just) a better footgun.
That's always how it works no matter how good the model is. I'm surprised people keep forgetting this. If no one has the theory then the artifacts are almost unmaintainable.
You can end up doing this with entirely human written code too. Good software devs can see it from a mile away.
It's really very good at that. Frequently, I'll have something I've been working on over the years that has turned into an interconnected mess. "Split this code into modules of separated concerns". Bam, done. I used Claude for the first time last week and gave it a 2k line PowerShell script and it neatly pulled it apart into 5 working modules on the first try. Worked exactly the same, and ended up with better comments too.
So I've done that sort of refactoring a lot, albeit on real code in much bigger systems, not a script. Lots of coders won't do this, they'll just keep adding to the crap, crazy big module.
I always end up with a vastly smaller code base. Like 2000 lines turns into 800 lines or something like that.
Did that happen too or did the AI just do a glorified 'extract method', that any decent IDE can already do without AI?
I use AI, I'm not anti it, but on the other hand I keep seeing these gushing posts where I'm like 'but your ide could already do that, just click the quick refactoring button'.
It ended up being less, yeah, but not by that much, maybe 15%. Thing is, there was no "extract methods" possible. It was an old user script creation tool that had been modified over the last 10 years. If it were that easy, I would have just done it myself.
What this shows me is that it truly understands all the things this script was supposed to do and was able to organize it better, while not breaking any functionality.
These tricks are a little too much for me. I'd rather just write the code myself instead of opening 20 tabs with different LLM chats each.
However, I'd like to mention a tool called repomix (https://repomix.com/), which will pack your code into a single file that can be fed to an LLM's web chat. I typically feed it to Qwen3 Coder or AI Studio with good results.
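For anyone curious what "pack your code into a single file" amounts to, here is a minimal sketch of the idea. It is not how repomix works internally; the function name `pack_files`, the header format, and the default suffix list are all made up for illustration.

```python
from pathlib import Path


def pack_files(root, suffixes=(".py", ".md")):
    """Concatenate matching files under root into one string.

    Each file is preceded by a header line with its relative path, so the
    model can tell where one file ends and the next begins.
    (Hypothetical helper; the header format is an assumption.)
    """
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"===== {rel} =====\n{text.rstrip()}\n")
    return "\n".join(parts)
```

The returned string can be pasted into a web chat as-is. A real tool would also want ignore rules and some kind of size limit, which this sketch leaves out.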
I think there’s huge potential for a fully local “Cursor-like” stack — no cloud, no API keys, just everything running on your machine.
The setup could be:
• Cursor CLI for agentic/dev stuff (example: https://x.com/cursor_ai/status/1953559384531050724)
• A local memory layer compatible with the CLI — something like LEANN (97% smaller index, zero cloud cost, full privacy, https://github.com/yichuan-w/LEANN) or Milvus (though Milvus often ends up cloud/token-based)
• Your inference engine, e.g. Ollama, which is great for running OSS GPT models locally
With this, you’d have an offline, private, and blazing-fast personal dev+AI environment. LEANN in particular is built exactly for this kind of setup: tiny footprint, semantic search over your entire local world, and Claude Code/Cursor-compatible out of the box, with Ollama for generation. I guess this solution is not only free but also does not need any API.
I do agree that this needs some effort to set up, but maybe someone can make it easy and fully open-source.
Yeah, this seems a really fantastic summary of our ideal local AI stack. A powerful, private memory layer has always felt like the missing piece for tools like Cursor or aider.
The idea of this tiny, private index like what the LEANN project describes, combined with local inference via Ollama, is really powerful. I really like this idea about using it in programming, and a truly private "Cursor-like" experience would be a game-changer.
It might be free, private, and blazing fast (if you choose a model with a parameter count that fits your GPU).
But you'll quickly notice that it's not even close to matching the quality of output, thought, and reflection that you'd get from running the same model with a significantly higher parameter count on a GPU capable of providing over 128GB of actual VRAM.
There isn't anything available locally that will let me load a 128GB model and get anything above 150 tps.
The only thing a local AI model makes sense for right now seems to be Home Assistant, to replace your Google Home/Alexa.
Happy to be proven wrong, but the effort-to-reward ratio just isn't there for local AI.
Because most of the people squeezing that highly quantized small model into their consumer gpu don't get how they have left no room for the activation weights, and are stuck with a measly small context.
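The "small context" point can be put in rough numbers. This is a back-of-the-envelope sketch assuming a standard transformer KV cache (one key tensor and one value tensor per layer, 2 bytes per value); the model shape below is hypothetical and not taken from any specific model.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Memory needed to hold keys and values for every layer at a given context length."""
    # The leading 2 accounts for storing both keys and values.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


GIB = 1024 ** 3

# Hypothetical shape: 32 layers, 8 KV heads of dimension 128, 32k-token context.
example = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_tokens=32768)
print(example / GIB)  # 4.0
```

Under those assumptions, a 32k context costs about 4 GiB on top of the weights, and the cost scales linearly with context length, so a card that is already full of weights has little left for context.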
If you're looking for free API access, Google offers access to Gemini for free, including for gemini-2.5-pro with thinking turned on. The limit is... quite high, as I'm running some benchmarking and haven't hit the limit yet.
Open weight models like DeepSeek R1 and GPT-OSS are also made available with free API access from various inference providers and hardware manufacturers.
I'm getting consistently good results with Gemini CLI and the free 100 requests per day and 6 million tokens per day.
Note that you'll need to either authorize with a Google Account or with an API key from AI Studio, just be sure the API key is from an account where billing is disabled.
Also note that there are other rate limits for tokens per request and tokens per minute on the free plan that effectively prevent you from using the whole million token context window.
It's good to exit or /clear frequently so every request doesn't resubmit your entire history as context or you'll use up the token limits long before you hit 100 requests in a day.
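To see why clearing matters, here is a rough sketch of the arithmetic. It assumes every request resends the whole history and that each turn adds a fixed number of tokens; the 2,000 tokens per turn is a made-up figure, and real sessions also carry model replies and tool output.

```python
def tokens_sent(num_requests, tokens_per_turn, clear_every=None):
    """Total tokens submitted when each request resends the full history.

    clear_every: if set, the history is wiped every that-many requests
    (the equivalent of running /clear or restarting the session).
    """
    total = 0
    history = 0
    for i in range(num_requests):
        if clear_every and i > 0 and i % clear_every == 0:
            history = 0  # start a fresh session
        history += tokens_per_turn
        total += history  # the whole history goes out with this request
    return total


print(tokens_sent(100, 2000))                  # 10100000
print(tokens_sent(100, 2000, clear_every=10))  # 1100000
```

With these assumed numbers, 100 requests without clearing add up to about 10 million tokens, well past a 6 million per day budget, while clearing every 10 requests keeps the same 100 requests at roughly 1.1 million.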
I think it'll be hard to find an LLM that actually respects your privacy, regardless of whether or not you pay. Even with the "privacy" enterprise Copilot from Microsoft, with all their promises of respecting your data, it's still not deemed safe enough by legislation to be used in parts of the European energy sector. The way we view LLMs on any subscription is similar to how I imagine companies in the USA view DeepSeek. Don't put anything into them you can't afford to share with the world. Of course with the agents, you've probably given them access to everything on your disk.
Though to be fair, it's kind of silly how much effort we go through to protect our mostly open source software from AI agents, while at the same time, half our OT has built-in hardware backdoors.
I don't care. From what I understand of LLM training, there's basically 0 chance a key or password I might send it will ever be regurgitated. Do you have any examples of an LLM actually doing anything like this?
I agree, Google is definitely the champion of respecting your privacy. Will definitely not train their model on your data if you pay them. I mean you should definitely just film yourself and give them everything, access to your files, phone records, even bank accounts. Just make sure to pay them those measly $200 and absolutely they will not share that data with anybody.
You're thinking of Facebook. A lot of companies run on Gmail and Google Docs (easy to verify with `dig MX [bigco].com`), and they would not if Google shared that data with anybody.
It’s not really in either Meta or Google’s interests to share that data. What they do is to build super detailed profiles of you and what you’re likely to click on, so they can charge more money for ad impressions.
> When you use AI in web chat's (the chat interfaces like AI Studio, ChatGPT, Openrouter, instead of thru an IDE or agent framework) are almost always better at solving problems, and coming up with solutions compared to the agents like Cline, Trae, Copilot.. Not always, but usually.
I completely agree with this!
While I understand that it looks a little awkward to copy and paste your code out of your IDE and into a web chat interface, I generally get better results that way than with GitHub copilot or cursor.
Either agentic with access to your whole project, “lives” in GitHub, a fine tune, or RAG, or whatever… having access to all of the context drastically reduces hallucinations.
There is a big difference between “write x” and “write x for me in my style, with all y dependencies, and considering all z code that exists around it”.
I honestly don’t understand a defense of copy and paste AI coding… this is why agents are so massively popular right now.
Agreed that it’s all about context — but my experience is that pasting into web chat allows me to manage context much more than if I drop the whole project/whole filesystem into context. With the latter approach the results tend to be hit-and-miss as the model tries to guess what’s right. All about context!
Why is Mistral not mentioned? Is there any reason? I have the impression that they are often ignored by media, bloggers, and devs when it comes to comparing or showcasing LLM thingies.
It comes with a free tier and the quality is quite good. (But I am not an AI power user.)
https://chat.mistral.ai/chat
To the OP: I highly recommend you look into Continue.dev and ollama/lmstudio and running models on your own. Some of them are really good at autocomplete-style suggestions while others (like gpt-oss) can reason and use tools.
really - no monthly subscriptions? i hate those but i am fine with bringing my own API URLs etc and paying. I'm building a router that will track all the free tokens from all the different providers and auto rotate them when daily tokens or time limits run out.
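A router like that could look roughly like the sketch below. This is only one possible shape, not the commenter's actual code: the class names, the 24-hour rolling window, and the limits in the usage note are all assumptions, and real providers each report and reset quotas in their own way.

```python
import time
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    daily_limit: int           # free tokens per day (hypothetical figure)
    used: int = 0
    window_start: float = 0.0  # when the current 24h window began


class QuotaRouter:
    DAY_SECONDS = 86400

    def __init__(self, providers, clock=time.time):
        self.providers = list(providers)
        self.clock = clock  # injectable so the rotation can be tested

    def pick(self, tokens_needed):
        """Return the first provider name with enough quota left, or None."""
        now = self.clock()
        for p in self.providers:
            # Reset the counter once the provider's daily window has elapsed.
            if now - p.window_start >= self.DAY_SECONDS:
                p.used = 0
                p.window_start = now
            if p.used + tokens_needed <= p.daily_limit:
                p.used += tokens_needed
                return p.name
        return None
```

Usage would be something like `QuotaRouter([Provider("a", 100_000), Provider("b", 50_000)]).pick(2_000)`, which returns the name of the provider to send the request to. Providers are tried in list order, so the preferred one goes first.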
Continue and Zed.. gonna check them out, prompts in Cline are too long. I was thinking of just making my own VS Code extension but I need to try Claude Code with GLM 4.5 (heard it pairs nicely)
Wow, there's a lot here that I didn't know about. Just never drilled that far into the options presented. For a change, I'm happy that I read the article rather than only the comments on HN. ;)
And lots of helpful comments here on HN as well. Good job everyone involved. ;)
It’s not free FREE but if you deposit at least $10 on OpenRouter, you can use their free models without credit withdrawals. And those models are quite powerful, like DeepSeek R1. Sometimes, they are rate limited by the provider due to their popularity but it works in a pinch.
Glad to see I'm not the only one who prefers to work like that. I don't need many different models though, the free version of Gemini 2.5 Pro is usually enough for me. Especially the 1.000.000 token context length is really useful. I can just keep dumping full code merges in.
I'll have a look at the alternatives mentioned though. Some questions just seem to throw certain models into logic loops.
You mean SWE-1? I used it like a dozen times and I gave up because the responses were so bad. Not even sure whether it’s good enough for autocomplete because it’s the slowest model I’ve tested in a while.
Not my experience for slowness. For smartness I am typically using it for simple "not worth looking that up" stuff rather than even feature implementation. Got it to write some MySQL SQL today, for example.
As the post says, the problem with coding agents is they send a lot of their own data + almost your entire code base for each request: that's what makes them expensive. But when used in a chat the costs are so low as to be insignificant.
I only use OpenRouter which gives access to almost all models.
Sonnet was my favorite until I tried Gemini 2.5 Pro, which is almost always better. It can be quite slow though. So for basic questions / syntax reminders I just use Gemini Flash: super fast, and good for simple tasks.
The answers would be similar to the question "why is Javascript so popular". It was not fast to run, not safe, not optimized and poor in most areas except for being almost universal and having results faster either due to js developers availability, or due to it being a high level language, even if it did try to multiply a "dog" string by 2 sometimes in some spaghetti codebase. It got better, but even before that this formula was "delivery > quality". It's also why almost no one writes assembly for production. Or C, and we get tons of bloated electron apps.
(If it was not clear, I have no love for JS and I never really programmed in it, but you have to admit, it did allow us to have more stuff. Even if 99% of it should be torched by fire if evaluated purely from engineering perspective)
I bet it's crazy to some people that others are okay with giving up so much of their data for free tiers. Like yeah, it's better to self-host, but it takes so many resources to run a good enough LLM at home that I'd rather give up my code for some free usage. That code will eventually end up open source anyway.
Nice write-up, especially the point about mixing different models for different stages of coding.
I’ve been tracking which IDE/CLI tools give free or semi-free access to pro-grade LLMs (e.g., GPT-5, Claude code, Gemini 2.5 Pro) and how generous their quotas are. Ended up putting them side-by-side so it’s easier to compare hours, limits, and gotchas: https://github.com/inmve/free-ai-coding
Without tricks, Google AI Studio definitely has limits, though pretty high ones. gemini.google.com, on the other hand, gives you less than a handful of free 2.5 Pro messages.
Also, well, I mean... If there's all that time/effort involved... Just get yourself some tea, coffee, doodle on some piece of paper, do some push-ups, some yoga, pray, meditate, breathe and then... Code, lol!
> Beta technology disclaimer
> Rovo Dev in the CLI is a beta product under active development. We can only support a certain number of users without affecting the top-notch quality and user experience we are known for providing. Once we reach this limit, we will create a waiting list and continue to onboard users as we increase capacity. This product is available for free while in beta.
Qwen3-Coder-30B-A3B-Instruct-FP8 is a good choice ('qwen3-coder:30b' when you use ollama). I have also had good experiences with https://mistral.ai/news/devstral (built under a collaboration between Mistral AI and All Hands AI)
DeepSeek Coder 33B or Llama 3 70B with GGUF quantization (Q4_K_M) would be optimal for your specs, with Mistral Large 2 providing the best balance of performance and resource usage.
I only use LLMs as a substitute for stackexchange, and sometimes to write boilerplate code. The free chat provided by deepseek works very well for me, and I've never encountered any usage limits. V3 / R1 are mostly sufficient. When I need something better (not very often), I use Claude's free tier.
If you really need another model / a custom interface, it's better to use openrouter: deposit $10 and you get 1000 free queries/day across all free models. That $10 will be good for a few months, at the very least.
Now all we need is a wrapper/UI/manager/aggregator for all these "free" AI tools/pages so that we can use them without going into the hassle of changing tabs ;-)
I don't like or love many things in life, but something about AI triggered that natural passion I had when I was first learning to code as a kid. It's just super fun. Coding without AI stopped being fun a looong time ago. Unlucky brain or genetics maybe. AI sped up the dopamine feedback iteration loop to where my brain can really feel it again. I can get an idea in my head and just an hour later, have it 80% done and functioning. That gives me motivation, I won't get bored of the idea before I write the code.. which is what would happen a lot. Halfway done, get bored, then don't wanna continue.. AI fixed that
I jump between Claude Sonnet 4 on GitHub Copilot Pro and now GPT-5 on ChatGPT. That seems to get me pretty far. I have gpt-oss:20b installed with ollama, but haven't found a need to use it yet, and it seems like it just takes too long on an M1 Max MacBook Pro 64GB.
Claude Sonnet 4 is pretty exceptional. GPT-4.1 asks me too frequently whether it should move forward. Yes! Of course! Just do it! I'll reject your changes or do something else later. The former gets a whole task done.
I wonder if anyone is getting better results, or comparable for cheaper or free. GitHub Copilot in Visual Studio Code is so good, I think it'd be pretty hard to beat, but I haven't tried other integrated editors.
Let's just be honest about what it is we actually do: The more people maximize what they can get for free, the more other people will have to shoulder the higher costs or limitations that follow. That's completely fine, not trying to pass judgement – but that's certainly not "free" unless you mean exactly "free for me, somebody else pays".
OpenAI offering 2.5M free tokens daily for small models and 250k for big ones (tier 1-2) is so useful for random projects. I use them to learn Japanese, for example (by having a program that lists information about what the characters are saying: vocabulary, grammar points, nuances).
You probably only saw the cursive backup font and not the intended Roboto font that the page actually uses (which took about half a second to load for me); here's a snippet from the CSS:
Not all browsers have a reader mode. And for the record I usually don’t like these design-related comments, but it was literally unreadable. Like some sort of 17th century chicken scratch cursive.
For anyone else confused - there is a page 2 and 3 in the post that you need to access via arrow thing at bottom.
I am the person that wrote that. Sorry about the font. This is a bit outdated, AI stuff goes at high speed. More models so I will try to update that.
Every month so many new models come out. My new fav is GLM-4.5... Kimi K2 is also good, and Qwen3-Coder 480b, or 2507 instruct.. very good as well. All of those work really well in any agentic environment/in agent tools.
I made a context helper app ( https://wuu73.org/aicp ) which is linked to from there which helps jump back and forth from all the different AI chat tabs i have open (which is almost always totally free, and I get the best output from those) to my IDE. The app tries to remove all friction, and annoyances, when you are working with the native web chat interfaces for all the AIs. Its free and has been getting great feedback, criticism welcome.
It helps the going from IDE <----> web chat tabs. Made it for myself to save time and I prefer the UI (PySide6 UI so much lighter than a webview)
Its got Preset buttons to add text that you find yourself typing very often, per-project state saves of window size of app and which files were used for context. So next time, it opens at same state.
Auto scans for code files, guesses likely ones needed, prompt box that can put the text above and below the code context (seems to help make the output better). One of my buttons is set to: "Write a prompt for Cline, the AI coding agent, enclose the whole prompt in a single code tag for easy copy and pasting. Break the tasks into some smaller tasks with enough detail and explanations to guide Cline. Use search and replace blocks with plain language to help it find where to edit"
What i do for problem solving, figuring out bugs: I'm usually in VS Code and i type aicp in terminal to open the app. Fine tune any files already checked, type what i am trying to do or what problem i have to fix, click Cline button, click Generate Context!. Paste into GLM-4.5, sometimes o3 or o4-mini, GPT-5, Gemini 2.5 Pro.. if its a super hard thing i'll try 2 or 3 models. I'll look and see which one makes the most sense and just copy and paste into Cline in VS Code - set to GPT 4.1 which is unlimited/free.. 4.1 isn't super crazy smart or anything but it follows orders... it will do whatever you ask, reliably. AND, it will correct minor mistakes from the bigger model's output. The bigger smarter models can figure out the details, and they'll write a prompt that is a task list with how-to's and why's perfect for 4.1 to go and do in agent mode....
You can code for free this way, unlimited, and it's the smartest the models will be. Anytime you throw some tools or MCPs at a model it dumbs them down.... AND you waste money on all the API costs having to use Claude 4 for everything
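The "prompt above and below the code context" trick is simple enough to sketch. This is not aicp's actual code, just an illustration of the layout; `build_prompt` is a hypothetical helper name.

```python
def build_prompt(instruction, context):
    """Place the same instruction before and after the code context.

    Repeating the instruction at the end means the request is still the
    last thing in the prompt after a long block of pasted code.
    """
    instruction = instruction.strip()
    return f"{instruction}\n\n{context.strip()}\n\n{instruction}"


prompt = build_prompt(
    "Write a prompt for Cline, the AI coding agent.",
    "===== app.py =====\nprint('hello')",
)
print(prompt)
```

The output is what gets pasted into the web chat tab. Whether the repetition helps will depend on the model, so treat it as something to try rather than a rule.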
(relevant self promotion) i wrote a cli tool called slupe that lets web based llm dictate fs changes to your computer to make it easier to do ai coding from web llms https://news.ycombinator.com/item?id=44776250
Small recommendation: The diagrams on [https://wuu73.org/aicp] are helpful, but clicking them does not display the full‑resolution images; they appear blurry. This occurs in both Firefox and Chrome. In the GitHub repository, the same images appear sharp at full resolution, so the issue may be caused by the JavaScript rendering library.
Another data point: On Android Chrome they render without problem.
thx - i did not know that. Will try to fix.
> You can code for free this way
vs
> If you set your account's data settings to allow OpenAI to use your data for model training
So, it's not "for free".
I was going to downvote you but you are adding to the discussion. In this context this is free from having to spend money. Many of us don't have the option to pay for models. We have to find some way to get the state of the art without spending our food money.
>We have to find some way to get the state of the art without spending our food money.
If it's not your job: Do we "have to" find this way? What's the opportunity cost compared to a premium subscription or using non-state-of-the-art tools?
If it is your job: it's putting food on the table. So it should be a relatively microscopic cost to doing business. Maybe even a tax write-off.
There is a company that is advertising like crazy for programmers, data scientists, etc. They are looking for college kids, etc. They are paying better than McDonalds.
What are they building? A training corpus.
Are people who respond to their ads getting the money for free?
Handing your codebase to an AI company is not nothing.
> Handing your codebase to an AI company is not nothing.
It's a battle that was already lost a long time ago. Every crappy little service by now indexes everything. If you ever touch Github, Jira, Datadog, Glean (god forbid), Upwork, etc etc they each have their own shitty little "AI" thing which means what? Your project has been indexed, bagged and tagged. So unless you code from a cave without using any SaaS tools, you will be indexed no matter what.
I feel like this was understood. SaaS has your data, and the pan is very hot. Two lessons that you learn quickly with experience.
I appreciate your consideration, disagree != downvote.
To your point, "free from having to spend money" is exactly it. It's paid for with other things, and I get that some folks don't care. But being more open about this would be nice. You don't typically hide a monetary cost either, and everybody trying to do that is rightfully called out on it by being called a scam. Doing that with non-monetary costs would be a nice custom.
I don't trust any AI company not to use and monetise my data, regardless how much I pay or regardless what their terms of service say. I know full well that large companies ignore laws with impunity and no accountability.
I would encourage you to rethink this position just a little bit. Going through life not trusting any company isn't a fun way to live.
If it helps, think about those companies' own selfish motivations. They like money, so they like paying customers. If they promise those paying customers (in legally binding agreements, no less) that they won't train on their data... and are then found to have trained on their data anyway, they won't just lose that customer - they'll lose thousands of others too.
Which hurts their bottom line. It's in their interest not to break those promises.
> they won't just lose that customer - they'll lose thousands of others too
No, they won't. And that's the problem in your argument. Google landed in court for tracking users in incognito mode. They also were fined for not complying with the rules for cookie popups. Facebook lost in court for illegally using data for advertising. Did it lose them any paying customer? Maybe, but not nearly enough for them to even notice a difference. The larger outcome was that people are now more pissed at the EU for cookie popups that make the greed for data more transparent. Also in the case of Google most money comes from different people than the ones that have their privacy violated, so the incentives are not working as you suggest.
> Going through life not trusting any company isn't a fun way to live
Ignoring existing problems isn't a recipe for a happy life either.
Landing in court is an expensive thing that companies don't want to happen.
Your examples also differ from what I'm talking about. Advertising supported business models have a different relationship with end users.
People getting something for free are less likely to switch providers over a privacy concern compared with companies paying thousands of dollars a month (or more) for a paid service under the understanding that it won't train on their data.
>Landing in court is an expensive thing that companies don't want to happen.
"If the penalty is a fine, it's legal for the rich". These businesses also don't want to pay taxes or even workers, but in the end they will take the path of least resistance. If they determine fighting in court for 10 years is more profitable than following regulations, then they'll do it.
Until we start jailing CEOs (a priceless action), this will continue.
>companies paying thousands of dollars a month (or more) for a paid service under the understanding that it won't train on their data.
Sure, but are we talking about people or companies here?
> Until we start jailing CEOs (a priceless action)
In the context of the original thread here: If all you need to do is go to jail then whatever that's for was "for free"!
CEO says the action was against policy and they didn't know, so the blame passes down until you get to a scapegoat that can't defend themselves.
The underlying problem is that we have companies with more power than sovereign states, before you even include the power over the state the companies have.
At some point in the next few decades of continued transfer of wealth from workers to owners, more and more workers will snap and bypass the courts. This is what happened with the original fall of feudalism and warlords. This wasn't guaranteed though -- if the company owners keep themselves and their allies rich enough they will be untouchable, same as drug lords.
>Going through life not trusting any company isn't a fun way to live.
Isn't that the Hacker mindset, though? We want to trailblaze solutions and share it with everyone for free. Always in liberty and oftentimes in beer too. I think it's a good mentality to have, precisely because of your lens of selfish motivations.
Wanting money is fine. If it was some flat $200 or even $2000 with legally binding promises that I have an indefinite license to use this version of the software and they won't extract anything else from me: then fine. Hackers can be cheap, but we aren't opposed to barter.
But that's not the case. Wanting all my time and privacy and data under the veneer of something hackers would provide with no or very few strings is not fine. Tricks to push people into that model make it all the worse.
> If they promise those paying customers (in legally binding agreements, no less) that they won't train on their data... and are then found to have trained on their data anyway, they won't just lose that customer - they'll lose thousands of others too.
I sure wish they did. In reality, they get a class action, pay off some $100m to lawyers after making $100b, and the lawyers maybe give me $100 if I'm being VERY generous, while the company extracted $10,000+ of value out of me. And the captured market just keeps on keeping on.
Sadly, this is not a land of hackers. It is a market of passive people of various walks of life: of students who do not understand what is going on under the hood (I was here when Facebook was taking off), of businessmen too busy with other stuff to understand the sausage in the factory, of ordinary people who just want to fire and forget. This market may never even be aware of what occurred here.
This is so naive
Hm why pay for something when I can get it for free? Being miserly is a skill that can save a lot of money.
I live a pretty frugal life, and reached the FI part of FIRE in my early 30s as an averagely compensated software engineer.
I am very skeptical anytime something is 'free'. I specifically avoid using a free service when the company profits from my use of the service. These arrangements usually start mutually beneficial, and almost always become user hostile.
Why pay for something when you can get it for free? Because the exchange of money for service sets clear boundaries and expectations.
Remember: if you're not paying for the product, you ARE the product.
If you're fine with compromising your privacy and having others extract wealth from you, you can go the "free" route.
You are the product no matter how much you pay tbh
I built a simple little CRUD app for somebody the other day. They were very appreciative of the free app. So they bought me a pizza.
I got a free pizza just for coding a little app. That saved me a lot of money.
Many folks, especially if they are into getting things free, don't really care much about privacy narrative.
So yes, it is free.
> So yes, it is free.
This sounds pedantic, but I think it's important to spell this out: this sort of stuff is only free if you consider what you're producing/exchanging for it to have 0 value.
If you consider what you're producing as valuable, you're giving it away to companies with an incentive to extract as much value from your thing as possible, with little regard towards your preferences.
If an idiot is convinced to trade his house for some magic beans, would you still be saying "the beans were free"?
I should add a section to the site/guide about privacy, just letting people know they have somewhat of a choice with that.
As for sharing code, most of the parts of a project/app/whatever have already been done and if an experienced developer hears what your idea is, they could just make it and figure it out without any code. The code itself doesn't really seem that valuable (well.. sometimes). Someone can just look at a screenshot of my aicodeprep app and just make one and make it look the same too.
Not all the time of course - If I had some really unique sophisticated algorithms that I knew almost no one else would or has figured out, I would be more careful.
Speaking of privacy.. a while back a thought popped into my head about Slack, and all these unencrypted chats businesses use. It kinda does seem crazy to do all your business operations over unencrypted chat, Slack rooms.. I personally would not trust Zuckerberg to not look in there and run lots of LLMs through all the conversations to find anything 'good'! Microsoft.. kinda doubt would do that on purpose but what's to stop a rogue employee from finding out some trade secrets etc.. I'd be surprised if it hasn't been done. Security is not usually a priority in tech. They half-ass care about your personal info.
>Someone can just look at a screenshot of my aicodeprep app and just make one and make it look the same too.
To some extent. But without your codebase they will make different decisions in the back which will affect a myriad of factors. Some may actually be better than your app, others will end up adding tech debt or have performance impacts. And this isn't even to get into truly novel algorithms; sometimes just having the experience to make a scalable app with best practices can make all the difference.
Or the audience doesn't care and they take the cheaper app anyway. It's not always a happy ending.
I don't think that's true. It's not that it has zero value, it's that it has zero monetizable value.
Hackernews is free. The posts are valuable to me and I guess my posts are valuable to me, but I wouldn't pay for it and I definitely don't expect to get paid.
For YC, you are producing content that is "valuable" that brings people to their site, which they monetize through people signing up for their program. They do this with no regard for what your preferences are when they choose companies to invest in.
They sell ads (Launch, Hire, etc.) against the attention that you create. You ARE the product on HackerNews, and you're OK with it. As am I.
Same as OpenAI, I don't need to monetize them training on my data, and I am happy for them to do so, as I would like to use the services for free.
>Hackernews is free. The posts are valuable to me and I guess my posts are valuable to me, but I wouldn't pay for it and I definitely don't expect to get paid.
At this point, we may need future forums to be premium so we can avoid the deluge of AI bots plaguing the internet. A small, one-time cost is a guaranteed way to make such strategies untenable. SomethingAwful had a point decades ago.
But like any other business, you need to follow the money and understand the incentives. Hackernews has ads, but ads for companies with us as the audience. It's also indirectly an ad for YCombinator itself as bringing awareness of the accelerator (note what "hackernews.com" redirects to).
I'm fine with a company advertising itself; if I wasn't, the idea of a company ceases to really function. And in this structure for companies, I can also get benefits by potentially getting jobs from here. So I don't mind that either. Everything aligns. I agree and support the structure. I can't say that about many other "free" websites.
As for me, I do want to monetize my data one day. I can't stop the scraping across the entire internet (that's for the courts), but I sure as heck won't hand it to them on a silver platter.
Definitely to each their own. I will never have a job at a YC company and I will also never apply to YC, so the ads are completely useless. I did discover some of my favorite shoes from an IG ad, though.
It wouldn't ever be worth me getting $.0001431 dollars for my data and individual data will always be worthless on its own because 1. taking away one individual's data from a model does not make the model worse. 2. the price of an individual's data will always be zero because you have people like me who are willing to give it away for free in exchange for a free service (aka hackernews or IG)
One user's LTV on IG may be $34, but one user's data is worth $0. Which I think a lot of people struggle with.
From a more moral standpoint, the best part about the advertising business model is that it makes the internet open to everyone, not just those who can pay for every site they use.
I'm not sure if I'd ever have a job at YC (my industry isn't very "investor friendly"). But I like the idea of having a bunch of opportunities with such companies. It also encourages an environment of people I want to be around as well. So that indirectly serves my interests.
I will even use an ad example with conventions and festivals. You can argue an event like Comic-con is simply a huge ad. And it is. But I'm there "for the ad" in that case. It gathers other people "for the ad". It collectively benefits all of us to gather and socialize among one another.
Ads aren't bad, but many ads primarily exist to distract, not to facilitate an experience. And as a hot take, maybe we do need to gatekeep a bit more in this day and age. I don't want a "free internet" if it means 99% of my interactions are with bots instead of humans. If it means that corporations determine what is "worthy" of seeing instead of peers. If credit cards get to determine what I can spend my money on instead of my own personal (and legal) taste.
>It wouldn't ever be worth me getting $.0001431 dollars for my data and individual data will always be worthless on it's own
On top of being a software engineer who's contributed millions in value with my data, I also strive to be an artist. An industry that has spent decades being extracted from, but is most often not fortunate enough to be compensated a living wage. People can argue that "art is worthless", yet it also props up multiple billion dollar industries on top of societal culture. An artisan these days can even sustain themselves as an individual, with much faster turnaround than trying to program a website or app.
By all metrics, it's hard to argue this sector's value is zero. Maybe having that lens only strengthened my stance, as a precursor to what software can become if you don't push against abuse early on.
I understand the point people are trying to make with this argument, but we are so far into a nearly universal scam economy where corporations see small (relative to their costs of business) fines as just part of normal expenses that I also think anyone who really believes the AI companies aren't using their data to train models, even if it is against their terms, is wildly naive.
This is not only a privacy concern (in fact, that might be a tiny part since the code might end up public anyway?). There is an element of disclosure of personal data, there are ownership issues in case that code was not - in fact - going to be public and more.
In any case, not caring about the cost (at a specific time) doesn't make the cost disappear.
The point they are making is, that some people know that, and are not as concerned as others about it.
Not being concerned doesn't make the statement "it's free" more true.
If you consider watching an hour of YouTube and 30 minutes of ads to be "free videos", then be my guest. Not everything can be measured in a dollar value.
I understand. I get the point. I disagree
Privacy absolutely does not matter, until it does, and then it is too late
It's a transaction—a trade. You give them your personal data, and you get their services in exchange.
So no, it's not free.
Tech companies are making untold fortunes from unsophisticated people like you.
Sophistry. "many" according to which statistic? And just because some people consider that a trade is very favorable for them doesn't mean it is not a trade, and it doesn't mean they are correct - who's so naïve they can beat business people at their own game?
they +think they+ can beat business people
Plenty of people can also afford to subscribe to these without any issue. They don’t even know the price, they probably won’t even cancel it when they stop using it as they might not even realize they have a subscription.
By your logic, are the paid plans not sometimes free?
While it is true that sometimes you are the product even if you're paying, I don't think anyone is trying to argue that obviously paid plans are free.
Anecdotal, but Grok seems to have just introduced pretty restrictive rate limits. They’re now giving free users access to Grok 4 with a low limit and then making it difficult to manually switch to Grok 3 and continue. Will only allow a few more requests before pushing an upgrade to paid plans. Just started happening to me last night.
do you really have 20+ tabs of LLMs open at a time?
some days.. it varies but a whole browser window is dedicated to it and always open
I tried Cline with chatgpt 4.1 and I was charged - there are some free credits when you sign up for Cline that it used.
Not sure how you got it for free?
look up LLM7, and Pollinations AI. Both offer free GPT 4.1, but I am not sure how limited it is. They have tons more models but the names are different (openai-large = gpt-4.1)
Meta has free and generous APIs for the crappy Llama 4 models... they're okay at summarizing things but I have no idea if its any good for code. Prob not since no one even talks about those anymore.
GH Copilot is my guess. Not free, but $10 a month or free for students
FYI: the first AI you link to, " z.ai's GLM 4.5", actually links to zai.net, which appears to be a news site, instead of "chat.z.ai", which is what I think you intended.
Fun fact, zai[.]net seems to be an Italian school magazine. As an Italian I've never known about it, but the word pun got me laughing.
zai[.]net -> zainet -> zainetto -> which is the Italian word for "little school backpack"
oops. was using AI trying to fix some of the bugs and update it real fast with some newer models, since this post was trending here. Hopefully it's scrolling better. Link fixed. I know it's still ridiculous looking with some of the page but at least it's readable for now.
Note that the website is scrolling very slow, sub-1 fps on Firefox Android. I'm also unable to scroll the call-out about grok. Also, there's this strange large green button reading CSS loaded at the top.
Works fine, Firefox Android 142.0b9
I scroll just fine on Vanadium, Duck browser and brave.
On Android?
Very nice article and thx for the update.
I would be very interested in an in-depth write-up of your experiences with the differences between Roo Code and Cline if you feel you can share that. I've only tried Roo Code (with interesting but mixed results) thus far.
Just use lmstudio.ai, it's what everyone is using nowadays
LM Studio is great, but it's a very different product from an AI-enabled IDE or a Claude Code style coding agent.
LM Studio is awesome
Is glm-4.5 air useable? I see it's free on Openrouter. Also pls advise what you think is the current best free openrouter model for coding. Thanks!
Well, if you download Qwen Code https://github.com/QwenLM/qwen-code it is free up to 2000 api calls a day.
Not sure if GLM-4.5 Air is good, but non-Air one is fabulous. I know for free API access there is pollinations ai project. Also llm7. If you just use the web chats you can use most of the best models for free without API. There are ways to 'emulate' an API automatically.. I was thinking about adding this to my aicodeprep-gui app so it could automatically paste and then cut. Some MCP servers exist that you can use and it will automatically paste or cut from those web chats and route it to an API interface.
OpenAI offers free tokens for most models, 2.5mil or 250k depending on model. Cerebras has some free limits, Gemini... Meta has plentiful free API for Llama 4 because.. let's face it, it sucks, but it is okay/not bad for stuff like summarizing text.
If you really wanted to code for exactly $0 you could use pollinations ai, in Cline extension (for VS Code) set to use "openai-large" (which is GPT 4.1). If you plan on using all the best web chats like Kimi K2, z.ai's GLM models, Qwen 3 chat, Gemini in AI Studio, OpenAI playground with o3 or o4-mini, you can go forever without being charged money. Pollinations 'openai-large' works fine in Cline as an agent to edit files for you etc.
Very cool, a lot to chew on here. Thanks so much for the feedback!
bro you are final boss of free tier users lol
damn right !!!!
I built a relevant tool (approved by Apple this week) which may help reduce the friction of you having to constantly copy paste text between your app and the AI assistant in browser.
It's called SelectToSearch and it reduces my friction by 85% by automating all those copy paste etc actions with a single keyboard shortcut:
https://apps.apple.com/ca/app/select-to-search-ai-assistant/...
Have you seen Microsoft's copilot? It is essentially free openai models
And to anyone who has ever used it, it appears more like opening smoothbrain. For a long time it was the only allowed model at work and even for basic cyber security questions it was sometimes completely useless.
I would not recommend it to anyone.
Which of their many Copilot products do you mean?
The regular copilot, copilot.microsoft.com
It was a bit difficult to trust the source after seeing the phrase "Nazi-adjacent" used in relation to Grok.
Qwen is totally useless for any serious dev work.
It’s really hit and miss for me. Well defined small tasks seem ok. But every time I try some “agentic coding”, it burns through millions of tokens without producing anything working.
Which Qwen? They have over a dozen models now.
My experience lines up with the article. The agentic stuff only works with the biggest models. (Well, "works"... OpenAI Codex took 200 requests with o4-mini to change like 3 lines of code...)
For simple changes I actually found smaller models better because they're so much faster. So I shifted my focus from "best model" to "stupidest I can get away with".
I've been pushing that idea even further. If you give up on agentic, you can go surgical. At that point even 100x smaller models can handle it. Just tell it what to do and let it give you the diff.
Also I found the "fumble around my filesystem" approach stupid for my scale, where I can mostly fit the whole codebase into the context. So I just dump src/ into the prompt. (Other people's projects are a lot more boilerplatey so I'm testing ultra cheap models like gpt-oss-20b for code search. For that, I think you can go even cheaper...)
Patent pending.
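For anyone wondering what "surgical" can look like in practice, here is a minimal sketch of applying one search/replace edit that a model hands back. This is only my guess at the general idea, not the commenter's actual tool; the function name `apply_search_replace` and the sample snippet are made up for illustration. It refuses to edit when the search text is missing or ambiguous, which is the main failure mode with small models.

```python
def apply_search_replace(text: str, search: str, replace: str) -> str:
    """Apply one search/replace edit, refusing missing or ambiguous matches."""
    count = text.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match for the search block, found {count}")
    return text.replace(search, replace, 1)


# Hypothetical file content and edit, just to show the flow.
source = "def area(w, h):\n    return w + h\n"
patched = apply_search_replace(source, "return w + h", "return w * h")
print(patched)
```

Requiring exactly one match keeps the edit small and reviewable, so you can eyeball the change instead of waiting on an agent loop.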
Aider as a non-agentic coding tool strikes a nice balance on the efficiency vs effectiveness front. Using tree-sitter to create a repo map of the repository means less filesystem digging. No MCP, but shell commands mean it can use utilities I myself am familiar with. Combined with Cerebras as a provider, the turnaround on prompts is instant; I can stay involved rather than waiting on multiple rounds of tool calls. It's my go-to for smaller scale projects.
Just added a fork of aider that does do agentic commands: https://github.com/sutt/agent-aider
In testing I've found it to be underwhelming at being an agent compared to claude code, wrote up some case-studies on it here: https://github.com/sutt/agro/blob/master/docs/case-studies/a...
It's a shame MCP didn't end up using a sandboxed shell (or something similar, maybe even simpler.) All the pre-MCP agents I built just talked to the shell directly since the models are already trained to do that.
I am developing the same opinion. I want something fast and dependable. Getting into a flow state is important to me, and I just can't do that when I'm waiting for an agentic coding assistant to terminate.
I'm also interested in smaller models for their speed. That, or a provider like Cerebras.
Then, if you narrow the problem domain you can increase the dependability. I am curious to hear more about your "surgical" tools.
I rambled about this on my blog about a week ago: https://hpincket.com/what-would-the-vim-of-llm-tooling-look-...
well, most of the time, I just dump the entire codebase in if the context window is big and it's a good model. But there are plenty of times when I need to block one folder in a repo or disable a few files because the files might "nudge" it in a wrong direction.
The surgical context tool (aicodeprep-gui) - there are at least 30 similar tools but most (if not all) are CLI only/no UI. I like UIs, I work faster with them for things like choosing individual files out of a big tree. At least it is using the PySide6 library, which is "lite" (could go lighter maybe); i HATE that too many things use webview/browsers. All the options on it are there for good reasons, it's all focused on things that annoy me and slow things down: like doing something repeatedly (copy paste copy paste), or typing the same sentence over and over every time i have to do a certain thing with the AI and my code.
If you have not run 'aicp' in a folder before (that's the command i gave it, but there is also an OS installer menu that will add a Windows/Mac/Linux right click context menu in their file managers), it will try to scan recursively to find code files, but it skips things like node_modules or .venv. Otherwise it assumes most types of code files will probably be needed, so it checks them. You can fine tune it, add some .md or txt files or stuff in there that isn't code but might be helpful. When you generate the context block it puts the text from the prompt box on the top AND/OR bottom - doing both can get better responses from AI.
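A rough sketch of that scan-and-wrap idea, for readers who want to see the shape of it. This is not the actual aicodeprep-gui code; the skip list, the extension list, the `--- path ---` header format and the function names are all assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path

SKIP_DIRS = {"node_modules", ".venv", ".git", "__pycache__"}  # assumed skip list
CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".md"}  # assumed extensions


def find_code_files(root: Path) -> list[Path]:
    """Walk root recursively, pruning skipped directories in place."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix in CODE_EXTS:
                found.append(path)
    return found


def build_context(root: Path, prompt: str, top: bool = True, bottom: bool = True) -> str:
    """Concatenate files with path headers; put the prompt above and/or below."""
    parts = []
    if top:
        parts.append(prompt)
    for path in find_code_files(root):
        rel = path.relative_to(root).as_posix()
        parts.append(f"--- {rel} ---\n{path.read_text(encoding='utf-8')}")
    if bottom:
        parts.append(prompt)
    return "\n\n".join(parts)


# Tiny demo on a throwaway project tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.py").write_text("print('hi')\n", encoding="utf-8")
    (root / "node_modules").mkdir()
    (root / "node_modules" / "dep.js").write_text("ignored\n", encoding="utf-8")
    context = build_context(root, "Fix the bug in app.py")
    print(context)
```

Pruning `dirnames` in place is what stops `os.walk` from descending into node_modules at all, which matters a lot for speed on big repos.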
It saves every file that is checked, and saves the window size, other window prefs, so you don't have to resize the window again. It saves the state of which files are checked so it's less work / time next time. I have been just pasting the output from the LLMs into an agent like Cline but I am wondering if I should add browser automation / a browser extension that does the copy pasting, and also add an option to edit / change files right after grabbing the output from a web chat. It's probably about good enough as it is though, not sure I want to make it into a big thing.
--- Yeah I just keep coming back to this workflow, its very reliable. I have not tried Claude Code yet but I will soon to see if they solved any of these problems.
Strange this thing has been at the top of hacker news for hours and hours.. weird! My server logs are just constant scrolling
Have you seen this? https://github.com/robertpiosik/CodeWebChat
Thanks for the article. I'm also doing a similar thing, here are my tips:
- https://chutes.ai - 200 requests per day if you deposit (one-time) $5 for top open weights models - GLM, Qwen, ...
- https://github.com/marketplace/models/ - around 10 requests per day to o3, ... if you have the $10 GitHub Copilot subscription
- https://ferdium.org - I open all the LLM webapps here as separate "apps", my one place to go to talk with LLMs, without mixing it with regular browsing
- https://www.cherry-ai.com - chat API frontend, you can use it instead of the default webpages for services which give you free API access - Google, OpenRouter, Chutes, Github Models, Pollinations, ...
I really recommend trying a chat API frontend, it really simplifies talking with multiple models from various providers in a unified way and managing those conversations, exporting to markdown, ...
With chutes.ai, where do you see a one-time $5 for 200 requests/day?
aicodeprep-gui looks great. I will try it out
For those who don't know, OpenAI Codex CLI will now work with your ChatGPT plus or pro account. They barely announced it but it's on their github page. You don't have to use an api key.
I agree. I find even Haiku good enough at managing the flow of the conversation and consulting larger models - Gemini 2.5 Pro or GPT-5 - for programming tasks.
Last few days I am experimenting with using Codex (via MCP ${codex mcp}) from Gemini CLI and it works like a charm. Gemini CLI is mostly using Flash underneath but this is good enough for formulating problems and re-evaluating answers.
Same with Claude Code - I am asking (via MCP) for consulting with Gemini 2.5 Pro.
Never had much success using Claude Code as MCP though.
The original idea comes of course from Aider - using main, weak and editor models all at once.
I use a 500 million parameter model for editor completions because I want those to be nearly instantaneous and the plugin makes 50+ completion requests every session.
What editor do you use, and how did you set it up? I've been thinking about trying this with some local models and also with super low-latency ones like Gemini 2.5 Flash Lite. Would love to read more about this.
Neovim with the llama.cpp plugin and heavily quantized qwen2.5-coder with 500 (600?) million parameters. It's almost plug and play although the default ring context limit is way too large if you don't have a GPU.
Can you share which model you are using?
Which model and which plugin, please?
You should try GLM 4.5; it's better in practice than Kimi K2 and Qwen3 Coder, but it's not getting much hype.
They don't allow model switching below GPT-5 in codex cli anymore (without API key), because it's not recommended. Try it with thinking=high and it's quite an improvement from o4-mini. o4-mini is more like gpt-5-thinking-mini but they don't allow that for codex. gpt-5-thinking-high is more like o1 or maybe o3-pro.
> (Well, "works"... OpenAI Codex took 200 requests with o4-mini to change like 3 lines of code...)
Let’s keep things within reason: I have multiple times in my life spent days on what would end up to be maybe three lines of code.
Maybe optimistic, but reading posts like this makes me hopeful that AI-assisted coding will drive people to design more modular and sanely organized code, to reduce the amount of context required for each task. Sadly pretty much all code I have worked with has been a giant mess of everything being connected to everything else, causing the entire project to be potential context for anything.
Guess the name: In 2015 I was preaching that ____ simplifies the mental model of your web app, makes everything performant, the api will dominate and endure for decades. Answer: React. Soon, we were feeling the induced demand for features and timelines till we regressed to our natural "hair on fire" state. AI is (not just) a better footgun.
It does, you're essentially forced to write good coding guidelines and documentation.
LLMs will write code this way if you ask but you have to know to ask.
At that(/what) point does it become harder for a human to grok a project?
That's always how it works no matter how good the model is. I'm surprised people keep forgetting this. If no one has the theory then the artifacts are almost unmaintainable.
You can end up doing this with entirely human written code too. Good software devs can see it from a mile away.
It depends if you're willing to drop the $30 for the super version :)
It's really very good at that. Frequently, I'll have something I've been working on over the years that has turned into an interconnected mess. "Split this code into modules of separated concerns". Bam, done. I used Claude for the first time last week and gave it a 2k line PowerShell script and it neatly pulled it apart into 5 working modules on the first try. Worked exactly the same, and ended up with better comments too.
So I've done that sort of refactoring a lot, albeit on real code in much bigger systems, not a script. Lots of coders won't do this, they'll just keep adding to the crap, crazy big module.
I always end up with a vastly smaller code base. Like 2000 lines turns into 800 lines or something like that.
Did that happen too or did the AI just do a glorified 'extract method', that any decent IDE can already do without AI?
I use AI, I'm not anti it, but on the other hand I keep seeing these gushing posts where I'm like 'but your ide could already do that, just click the quick refactoring button'.
It ended up being less, yeah, but not by that much, maybe 15%. Thing is, there was no "extract methods" possible. It was an old user script creation tool that had been modified over the last 10 years. If it were that easy, I would have just done it myself.
What this shows me is that it truly understands all the things this script was supposed to do and was able to organize it better, while not breaking any functionality.
These tricks are a little too much for me. I'd rather just write the code myself instead of opening 20 tabs with different LLM chats each.
However, I'd like to mention a tool called repomix (https://repomix.com/), which will pack your code into a single file that can be fed to an LLM's web chat. I typically feed it to Qwen3 Coder or AI Studio with good results.
I think there’s huge potential for a fully local “Cursor-like” stack — no cloud, no API keys, just everything running on your machine.
The setup could be:
- Cursor CLI for agentic/dev stuff (example: https://x.com/cursor_ai/status/1953559384531050724)
- A local memory layer compatible with the CLI, something like LEANN (97% smaller index, zero cloud cost, full privacy, https://github.com/yichuan-w/LEANN) or Milvus (though Milvus often ends up cloud/token-based)
- Your inference engine, e.g. Ollama, which is great for running OSS GPT models locally
With this, you’d have an offline, private, and blazing-fast personal dev+AI environment. LEANN in particular is built exactly for this kind of setup: tiny footprint, semantic search over your entire local world, and Claude Code/Cursor-compatible out of the box, with Ollama for generation. I guess this solution is not only free but also does not need any API.
I do agree that this needs some effort to set up, but maybe someone can make this easy and fully open-source
Yeah, this seems a really fantastic summary of our ideal local AI stack. A powerful, private memory layer has always felt like the missing piece for tools like Cursor or aider.
The idea of this tiny, private index like what the LEANN project describes, combined with local inference via Ollama, is really powerful. I really like this idea about using it in programming, and a truly private "Cursor-like" experience would be a game-changer.
You should probably disclose everywhere you comment that you're advertising for Leann.
it might be free, private, blazing fast (if you choose a model with appropriate parameters to match your GPU).
but you'll quickly notice that it's not even close to matching the quality of output, thought and reflection that you'd get from running the same model with a significantly higher parameter count on a GPU capable of providing over 128gb of actual vram.
There isn't anything available locally that will let me load a 128gb model and provide anything above 150tps
The only thing that a local ai model makes sense for right now seems to be Home Assistant in order to replace your Google Home/Alexa.
happy to be proven wrong, but the effort to reward just isn't there for local ai.
Because most of the people squeezing that highly quantized small model into their consumer gpu don't get that they have left no room for the activations and KV cache, and are stuck with a measly small context.
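To put a rough number on how much memory a long context needs on top of the model weights, here is a back-of-envelope sketch of the KV cache size for a standard transformer with grouped-query attention. The model shape below is hypothetical and not taken from any specific release; plug in the values from your model's config.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Keys and values (the factor 2) for every layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical model shape, fp16 cache (2 bytes per value), 32k context.
cache = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_tokens=32_768)
print(f"{cache / 2**30:.1f} GiB")  # 4.0 GiB
```

So if the quantized weights already fill the card, those extra gigabytes have nowhere to go and the usable context shrinks accordingly.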
If you're looking for free API access, Google offers access to Gemini for free, including for gemini-2.5-pro with thinking turned on. The limit is... quite high, as I'm running some benchmarking and haven't hit the limit yet.
Open weight models like DeepSeek R1 and GPT-OSS are also made available with free API access from various inference providers and hardware manufacturers.
Gemini 2.5 pro free limit is 100 requests per day.
https://ai.google.dev/gemini-api/docs/rate-limits
I'm getting consistently good results with Gemini CLI and the free 100 requests per day and 6 million tokens per day.
Note that you'll need to either authorize with a Google Account or with an API key from AI Studio, just be sure the API key is from an account where billing is disabled.
Also note that there are other rate limits for tokens per request and tokens per minute on the free plan that effectively prevent you from using the whole million token context window.
It's good to exit or /clear frequently so every request doesn't resubmit your entire history as context or you'll use up the token limits long before you hit 100 requests in a day.
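A quick bit of arithmetic shows why clearing helps so much. This is a simplified model with made-up numbers (every turn adds the same amount of history, and each request resends all of it); real sessions vary, and the function name is just for illustration.

```python
def total_tokens_sent(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when each request resends the full history so far."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))


# Hypothetical session: 40 turns that each add about 5,000 tokens of history.
without_clear = total_tokens_sent(40, 5_000)
with_clear = 4 * total_tokens_sent(10, 5_000)  # /clear after every 10 turns
print(without_clear, with_clear)  # 4100000 1100000
```

Under these assumptions the uncleared session eats most of a 6 million token daily budget in 40 requests, while the cleared one uses roughly a quarter of that.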
Doesn't it swap to a lower power model after that?
Not automatically but you can switch to a lower power model and access more free requests. I think Gemini 2.5 Flash is 250 requests per day.
I'm assuming it isn't sensitive for your purposes, but note that Google will train on these interactions, but not if you pay.
I think it'll be hard to find an LLM that actually respects your privacy regardless whether or not you pay. Even with the "privacy" enterprise Co-Pilot from Microsoft with all their promises of respecting your data, it's still not deemed safe enough by legislation to be used in part of the European energy sector. The way we view LLMs on any subscription is similar to how I imagine companies in the USA view DeepSeek. Don't put anything into them you can't afford to share with the world. Of course with the agents, you've probably given them access to everything on your disk.
Though to be fair, it's kind of silly how much effort we go through to protect our mostly open source software from AI agents, while at the same time, half our OT has built-in hardware backdoors.
I don't care. From what I understand of LLM training, there's basically 0 chance a key or password I might send it will ever be regurgitated. Do you have any examples of an LLM actually doing anything like this?
I agree, Google is definitely the champion of respecting your privacy. Will definitely not train their model on your data if you pay them. I mean you should definitely just film yourself and give them everything, access to your files, phone records, even bank accounts. Just make sure to pay them those measly $200 and absolutely they will not share that data with anybody.
You're thinking of Facebook. A lot of companies run on Gmail and Google Docs (easy to verify with `dig MX [bigco].com`), and they would not if Google shared that data with anybody.
It’s not really in either Meta or Google’s interests to share that data. What they do is to build super detailed profiles of you and what you’re likely to click on, so they can charge more money for ad impressions.
LLMs add a new threat model. If trained on your data, they might very well leak some of its information in some future chat.
Meta, Alphabet might not want that, but it is impossible to completely avoid with current architectures.
Meta certainly shares the data internally. https://www.techradar.com/computing/cyber-security/facebooks...
Big companies can negotiate their own terms and enforce them with meaningful legal action.
> When you use AI in web chat's (the chat interfaces like AI Studio, ChatGPT, Openrouter, instead of thru an IDE or agent framework) are almost always better at solving problems, and coming up with solutions compared to the agents like Cline, Trae, Copilot.. Not always, but usually.
I completely agree with this!
While I understand that it looks a little awkward to copy and paste your code out of your IDE and into a web chat interface, I generally get better results that way than with GitHub copilot or cursor.
100% opposite experience.
Whether agentic, not… it’s all about context.
Either agentic with access to your whole project, “lives” in GitHub, a fine tune, or RAG, or whatever… having access to all of the context drastically reduces hallucinations.
There is a big difference between “write x” and “write x for me in my style, with all y dependencies, and considering all z code that exists around it”.
I honestly don’t understand a defense of copy and paste AI coding… this is why agents are so massively popular right now.
Agreed that it’s all about context — but my experience is that pasting into web chat allows me to manage context much more than if I drop the whole project/whole filesystem into context. With the latter approach the results tend to be hit-and-miss as the model tries to guess what’s right. All about context!
I’m also surprised by this take. I found copy/paste between editor and external chats to be way less helpful.
That being said, I think everyone has probably different expectations and workflows. So if that’s what works for them, who am I to judge?
Why is Mistral not mentioned? Is there any reason? I have the impression that they are often ignored by media, bloggers, devs when it comes to comparing or showcasing LLM thingies. Comes with free tier and quality is quite good. (But I am not an AI power user) https://chat.mistral.ai/chat
Because Mistral is very bad; Qwen, Kimi and GLM are just better.
Off topic but I use Mistral in production for various one shot tasks (mostly summarizing), it's incredibly cheap, fast and effective.
Bonus: it's European, kinda tired of always giving money to the American overlords.
To the OP: I highly recommend you look into Continue.dev and ollama/lmstudio and running models on your own. Some of them are really good at autocomplete-style suggestions while others (like gpt-oss) can reason and use tools.
It's my go-to copilot.
I've found Zed to be a step up from continue.dev - you can use your own models there also
really - no monthly subscriptions? i hate those but i am fine with bringing my own API URLs etc and paying. I'm building a router that will track all the free tokens from all the different providers and auto rotate them when daily tokens or time limits run out.
Continue and Zed.. gonna check them out, prompts in Cline are too long. I was thinking of just making my own VS Code extension but I need to try Claude Code with GLM 4.5 (heard it pairs nicely)
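For the router idea mentioned above (track free tokens per provider and rotate when a daily limit runs out), a minimal sketch of the bookkeeping could look like this. Provider names and limits are placeholders rather than real quotas, and the class names are hypothetical; actual API calls and time-based limits are left out.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    daily_token_limit: int
    used_today: int = 0

    def remaining(self) -> int:
        return self.daily_token_limit - self.used_today


class FreeTierRouter:
    """Pick the first provider with enough free quota left; rotate when one runs out."""

    def __init__(self, providers: list[Provider]) -> None:
        self.providers = providers

    def pick(self, tokens_needed: int) -> Provider:
        for provider in self.providers:
            if provider.remaining() >= tokens_needed:
                return provider
        raise RuntimeError("all free quotas are exhausted for today")

    def record(self, provider: Provider, tokens_used: int) -> None:
        provider.used_today += tokens_used

    def reset_daily(self) -> None:
        for provider in self.providers:
            provider.used_today = 0


# Placeholder names and limits, not real quotas.
router = FreeTierRouter([Provider("provider-a", 1_000), Provider("provider-b", 5_000)])
first = router.pick(800)
router.record(first, 800)
second = router.pick(800)  # provider-a has only 200 left, so this rotates
print(first.name, second.name)  # provider-a provider-b
```

Keeping the providers in priority order means the best free model gets used first and the fallbacks only kick in once its quota is gone.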
Can you use your GH Copilot subscription with Zed to leverage the Copilot subscription-provided models?
Yes, you can. IIRC both for the assistant/agent and code completions.
Zed is supreme but I have a need that Zed can’t scratch so I’m in VSCode :(
Same! I’ve been using Continue in VSCode and found most of the bigger Qwen models plus gpt-oss-120b to be great in agentic mode!
Do you use openrouter models with continue?
AI Studio at https://aistudio.google.com/ is unlimited.
I also use kiro which I got access for completely free because I was early on seeing kiro and actually trying it out because of hackernews!
Sometimes I use cerebras web ui to get insanely fast token generation of things like gpt-oss or qwen 480 b or qwen in general too.
I want to thank hackernews for kiro! I mean, I am really grateful to this platform y'know. Not just for free stuff but in general too. Thanks :>
Wow, there's a lot here that I didn't know about. Just never drilled that far into the options presented. For a change, I'm happy that I read the article rather than only the comments on HN. ;)
And lots of helpful comments here on HN as well. Good job everyone involved. ;)
Kinda reads like old style blog but worse but good tips.
Also grok disclaimer lol
The qwen coder CLI gives you 1000 free requests per day to the qwen coder model (480b). Probably the best free option right now.
Qwen CLI uses a whole-file edit format, which is slow and burns credits fast; Gemini CLI has the same issue.
Do opencode/crush also have this problem?
I use opencode and never experienced it going through tokens at the speed I experienced Gemini CLI go through.
Can't speak to qwen or crush as I have not used them
It’s not free FREE but if you deposit at least $10 on OpenRouter, you can use their free models without credit withdrawals. And those models are quite powerful, like DeepSeek R1. Sometimes, they are rate limited by the provider due to their popularity but it works in a pinch.
Actually nowadays they allow unlimited usage of free models without depositing anything.
https://claude.ai https://chat.z.ai https://chatgpt.com https://chat.qwen.ai https://chat.mistral.ai https://chat.deepseek.com https://gemini.google.com https://dashboard.cohere.com https://copilot.microsoft.com
Ha, I'm working on a similar tool: https://github.com/DrSiemer/codemerger
Glad to see I'm not the only one who prefers to work like that. I don't need many different models though, the free version of Gemini 2.5 Pro is usually enough for me. The 1,000,000-token context length is especially useful. I can just keep dumping full code merges in.
I'll have a look at the alternatives mentioned though. Some questions just seem to throw certain models into logic loops.
Windsurf has a good free model. Good enough for autocomplete level work for sure (haven't tried it for more as I use Claude Code)
You mean SWE-1? I used it like a dozen times and I gave up because the responses were so bad. Not even sure whether it’s good enough for autocomplete because it’s the slowest model I’ve tested in a while.
Not my experience for slowness. For smartness I am typically using it for simple "not worth looking that up" stuff rather than even feature implementation. Got it to write some MySQL SQL today, for example.
Assuming you have to at least be logged into a windsurf account though?
Yeah. I didn't see not logged in as a requirement.
As the post says, the problem with coding agents is they send a lot of their own data + almost your entire code base for each request: that's what makes them expensive. But when used in a chat the costs are so low as to be insignificant.
I only use OpenRouter which gives access to almost all models.
Sonnet was my favorite until I tried Gemini 2.5 Pro, which is almost always better. It can be quite slow though. So for basic questions / syntax reminders I just use Gemini Flash: super fast, and good for simple tasks.
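To put rough numbers on the agent-versus-chat cost point above, here is a back-of-the-envelope sketch. Every figure in it (request counts, tokens per request, price per million tokens) is a made-up illustration, not a measurement or any provider's actual pricing.

```python
def input_cost_usd(requests: int, tokens_per_request: int, usd_per_million: float) -> float:
    """Cost of the input tokens only, for a number of similar requests."""
    return requests * tokens_per_request * usd_per_million / 1_000_000


PRICE = 3.0  # hypothetical price in USD per million input tokens

# Hypothetical agent session: 50 requests, each resending ~40k tokens of
# system prompt, tool definitions, and code context.
agent = input_cost_usd(50, 40_000, PRICE)

# Hypothetical chat session: 5 questions of ~2k tokens each.
chat = input_cost_usd(5, 2_000, PRICE)

print(f"agent: ${agent:.2f}, chat: ${chat:.2f}, ratio: {agent / chat:.0f}x")
```

Under these assumptions the difference comes almost entirely from how much context is resent per request, which is the point the comment makes.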
Why are people still drawn to using pointless AI assistants for everything? What time do we save by making the code quality worse overall?
The answers would be similar to those for the question "why is JavaScript so popular". It was not fast to run, not safe, not optimized, and poor in most areas, except that it was almost universal and delivered results faster, either because JS developers were easy to find or because it is a high-level language, even if it did sometimes try to multiply a "dog" string by 2 in some spaghetti codebase. It got better, but even before that the formula was "delivery > quality". It's also why almost no one writes assembly for production. Or C, and we get tons of bloated Electron apps.
(If it was not clear, I have no love for JS and I never really programmed in it, but you have to admit, it did allow us to have more stuff. Even if 99% of it should be torched if evaluated purely from an engineering perspective.)
JS is cool because the browser interprets it to show nice effects on website. AI agents are pointless
All the AI corps have a free model, that's enough to use it for free, no?
I bet it's crazy to some people that others are okay with giving up so much of their data for free tiers. Like yeah, it's better to self-host, but it takes so many resources to run a good enough LLM at home that I'd rather give up my code for some free usage. Anyway, that code will eventually end up open source.
And as far as I’m concerned if my work is happy for me to use models to assist with code, then it’s not my problem
Nice write-up, especially the point about mixing different models for different stages of coding. I’ve been tracking which IDE/CLI tools give free or semi-free access to pro-grade LLMs (e.g., GPT-5, Claude code, Gemini 2.5 Pro) and how generous their quotas are. Ended up putting them side-by-side so it’s easier to compare hours, limits, and gotchas: https://github.com/inmve/free-ai-coding
Looks like somebody is a tad bit over reliant on these tools but other than that there is a lot of value in this article
You might find this repo helpful, it compares popular coding tools by hours with top-tier LLMs like Claude Sonnet: https://github.com/inmve/free-ai-coding
Was the page done with AI? The scrolling is kinda laggy. Firefox/m3 pro.
yeah i tried fixing it - the websites were more of an afterthought or annoying thing i had to do and definitely did it way too fast
The chatgpt free tier doesn't seem to expire unlike claude or mistral ai, they just downgrade it to a different model
Without tricks, Google AI Studio definitely has limits, though pretty high ones. gemini.google.com, on the other hand, offers fewer than a handful of free 2.5 Pro messages.
Slightly off topic: What are good open weight models for coding that run well on a macbook?
This all sounds a lot more complicated and time consuming than just writing the damn code yourself.
I'd love to see a thread that also takes advantage of student offers - for example, GitHub Copilot is free for university and college students
OP must be a master of context switching! I can’t imagine opening that number of tabs and still focus
Also, well, I mean... If there's all that time/effort involved... Just get yourself some tea, coffee, doodle on some piece of paper, do some push-ups, some yoga, pray, meditate, breathe and then... Code, lol!
I replicate the SDD (spec-driven development) workflow from Kiro. It works wonders when switching between models because I can just re-fetch the context from the specs folder.
Just use Rovodev CLI. Gives you 20 million tokens for free per 24 hours and you can switch between sonnet 4 / gpt-5.
What is the catch?
> Beta technology disclaimer
> Rovo Dev in the CLI is a beta product under active development. We can only support a certain number of users without affecting the top-notch quality and user experience we are known for providing. Once we reach this limit, we will create a waiting list and continue to onboard users as we increase capacity. This product is available for free while in beta.
From https://community.atlassian.com/forums/Rovo-for-Software-Tea...
Isn't this only available to a current Jira cloud/service subscription?
As of today, what is the best local model that can be run on a system with 32gb of ram and 24gb of vram?
Qwen3-Coder-30B-A3B-Instruct-FP8 is a good choice ('qwen3-coder:30b' when you use ollama). I have also had good experiences with https://mistral.ai/news/devstral (built under a collaboration between Mistral AI and All Hands AI)
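For anyone who wants to script against the Ollama model mentioned above rather than use the interactive prompt, here is a minimal sketch. It assumes Ollama is running locally on its default port and that the model was already fetched with `ollama pull qwen3-coder:30b`; the helper names `build_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen3-coder:30b") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


req = build_request("Write a function that reverses a string.")
print(req.get_method(), req.full_url)
```

Calling `ask(...)` only works with the server running; building the request is kept separate so the payload can be inspected without a model loaded.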
DeepSeek Coder 33B or Llama 3 70B with GGUF quantization (Q4_K_M) would be optimal for your specs, with Mistral Large 2 providing the best balance of performance and resource usage.
Start with Qwen of a size that fits in the vram.
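A quick way to check "fits in the VRAM" is to estimate the size of the weights from the parameter count and the bits per weight. The sketch below is only a rule of thumb: the 4 GB of headroom for context cache and runtime overhead is an assumption, and real quantization formats such as Q4_K_M use somewhat more than 4 bits per weight.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if weights plus an assumed headroom fit entirely in VRAM."""
    return weights_gb(params_billions, bits_per_weight) + headroom_gb <= vram_gb


VRAM_GB = 24  # the card from the question above

for name, params, bits in [
    ("30B at 8-bit", 30, 8),
    ("30B at 4-bit", 30, 4),
    ("70B at 4-bit", 70, 4),
]:
    size = weights_gb(params, bits)
    verdict = "fits" if fits(params, bits, VRAM_GB) else "does not fit"
    print(f"{name}: ~{size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Models that do not fit can still run with part of the weights offloaded to system RAM, but generation gets noticeably slower.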
To stop tab switching I built an extension to query all free models all at once: https://llmcouncil.github.io/llmcouncil/
Is it possible to have the source code? I see that there is a github icon at the bottom of the page but it doesn't work.
But isn't it in the extension store?
I only use LLMs as a substitute for stackexchange, and sometimes to write boilerplate code. The free chat provided by deepseek works very well for me, and I've never encountered any usage limits. V3 / R1 are mostly sufficient. When I need something better (not very often), I use Claude's free tier.
If you really need another model / a custom interface, it's better to use openrouter: deposit $10 and you get 1000 free queries/day across all free models. That $10 will be good for a few months, at the very least.
It's another advertisement article
This is nightmarish, whether or not you like LLMs.
Just use Amazon Q Dev for free which will cover every single area that you need in every context that you need (IDE, CLI, etc.).
Now all we need is a wrapper/UI/manager/aggregator for all these "free" AI tools/pages so that we can use them without going into the hassle of changing tabs ;-)
A lot of work to evaluate these models. Thank you
I don't like or love many things in life, but something about AI triggered that natural passion I had when I was first learning to code as a kid. It's just super fun. Coding without AI stopped being fun a looong time ago. Unlucky brain or genetics maybe. AI sped up the dopamine feedback iteration loop to where my brain can really feel it again. I can get an idea in my head and just an hour later, have it 80% done and functioning. That gives me motivation, I won't get bored of the idea before I write the code.. which is what would happen a lot. Halfway done, get bored, then don't wanna continue.. AI fixed that
I jump between Claude Sonnet 4 on GitHub Copilot Pro and now GPT-5 on ChatGPT. That seems to get me pretty far. I have gpt-oss:20b installed with ollama, but haven't found a need to use it yet, and it seems like it just takes too long on an M1 Max MacBook Pro 64GB.
Claude Sonnet 4 is pretty exceptional. GPT-4.1 asks me too frequently if I want it to move forward. Yes! Of course! Just do it! I'll reject your changes or do something else later. The former gets a whole task done.
I wonder if anyone is getting better results, or comparable for cheaper or free. GitHub Copilot in Visual Studio Code is so good, I think it'd be pretty hard to beat, but I haven't tried other integrated editors.
Let's just be honest about what it is we actually do: The more people maximize what they can get for free, the more other people will have to shoulder the higher costs or limitations that follow. That's completely fine, not trying to pass judgement – but that's certainly not "free" unless you mean exactly "free for me, somebody else pays".
OpenAI offering 2.5M free tokens daily for small models and 250k for big ones (tier 1-2) is so useful for random projects. I use them to learn Japanese, for example, with a program that lists information about what the characters are saying: vocabulary, grammar points, nuances.
I wonder how much energy this is wasting.
Probably not as much as you think: https://www.sustainabilitybynumbers.com/p/ai-energy-demand
You are better off worrying about your car use and your home heating/cooling efficiency, all of which are significantly worse for energy use.
> You’ll notice that this figure is for 2022, and we’ve had a major AI boom since then
I might as well read LLM gibberish instead of this article.
Untradable carbon tax (or carbon price for people who hate the T word) is needed.
Right - free to you maybe.
who cares. we can build more. energymaxx or the us will become like germany.
You probably only saw the cursive backup font and not the intended Roboto font that the page actually uses (which took about half a second to load for me); here's a snippet from the CSS:
Site really needs to drop the completely unreadable backup font.
Commenters need to stop complaining and hit the reader mode button on their browsers.
Reader mode isn't an option for me on this page using Firefox mobile.
The better solution is that web devs and designers should either stop changing fonts or learn how to do so without making people's eyes bleed.
> Reader mode isn't an option for me on this page using Firefox mobile.
It is for me. Page was great when I opened it in Telegram browser (my default) though, and then I saw the crazy when I opened in Firefox.
Not all browsers have a reader mode. And for the record I usually don’t like these design-related comments, but it was literally unreadable. Like some sort of 17th century chicken scratch cursive.
UA's FTW.
You could get them all to idly chat to each other.