Am I being dumb or does this not actually contain the facts about the tax code? Is the /demo/all-facts file supposed to be the “real” facts? Are the XML fact files provided in another location?
It’s pretty cool to see the way that the IRS handles defining and maintaining its tax calculations, but also a machine-readable tax code seems cool too.
I believe the actual IRS tax code implementation is in a separate repo here: https://github.com/IRS-Public/direct-file while the originally linked repo is the fact graph tooling decoupled from the tax implementation.
Looks like many of them are specifically the XML files here:
https://github.com/IRS-Public/direct-file/tree/e0d5c84451cc5...
I was just reading through those! A bit dizzying
specifically here https://github.com/IRS-Public/direct-file/tree/main/direct-f...
From https://github.com/IRS-Public/fact-graph/blob/main/docs/fact...:
> Standardize Fact Dictionaries as a canonical format for declaratively modeling tax logic
As far as I am aware, fact just means shared assumption. This seems entirely reasonable for a tax code.
I’ve had frustrating experiences with TurboTax due to its overly complex interface, aggressive data collection under the guise of saving money (which it doesn’t deliver), and a convoluted pricing structure that rivals the IRS’s own complexity.
I hope this initiative is good enough to enable domain experts and good people to build transparent, user-friendly alternatives to challenge TurboTax’s market grip.
Has anyone encountered promising tools or approaches that tackle these pain points?
DirectFile was quite good for the one year I was able to use it and addressed your concerns. Don't worry, that's since been taken care of.
https://apnews.com/article/irs-direct-file-tax-returns-free-...
I can totally see the minds running TurboTax spending a lot of money to make this happen.
Intuit spent $3.8 million on lobbying against Direct File in 2023, H&R Block another $3 million. In total, the tax prep industry has spent $93 million lobbying against the Free File program since 2003 (through 2023, couldn't find a more recent source).
https://www.opensecrets.org/news/2024/02/turbotax-maker-intu...
It's sad to see how little money needs to be spent to make the lives of millions of taxpayers more miserable.
Just goes to show how cheap our politicians really are. In both heart and bank.
Just a heads up, your URL 404s
Thanks. Fixed. I stripped what I thought was a tracker without testing.
The fun part is you can change the text before what you thought was tracking to anything you want:
http://apnews.com/article/apnews-declares-trump-stupid-4bb0b...
TurboTax’s advertising is borderline fraudulent in my opinion.
Freetaxusa.com (no affiliation) is just as good and legitimately free.
Note: Freetaxusa.com has not done a good job with Form 3921 (ISO grant exercises) and AMT carryover. If you have exercised ISO grants or later sold stock purchased in years in which you paid AMT, do not use freetaxusa.com. You will lose more in tax costs vs. finding a real CPA willing to go through your nuanced math.
FreeTaxUSA is legitimately fantastic!
I love em, I don’t know why anybody uses TurboTax. Good product, generous freemium model with transparent pricing
I used Cash App Taxes last year (after years of H&R Block and TurboTax before that), and it worked great and covered all my needs which are more complex than the average tax filer. 100% free.
The H&R Block software is better imo.
Cash App Taxes
It's nice to see an open-source implementation of the US tax code! This was part of the IRS Direct File codebase that allowed people to file their taxes for free, directly with the IRS. It was canceled earlier this year by the Trump administration. It looks like the Fact Graph was already open-sourced a couple of months ago, and that version of the Fact Graph lives here: https://github.com/IRS-Public/direct-file/tree/main/direct-f...
I'm curious why a second repository was created for this.
I wonder too. Perhaps the intent is for it to be standalone for general usage and not just as a part of the direct file project?
Seems so, according to this file: https://github.com/IRS-Public/fact-graph/blob/main/docs/from...
> The main changes are: [...] converting the fact-graph to a standalone library [...]
I'm still disappointed that they got rid of Direct File, such a promising start...
Big W for the tax lobby, big L for the rest of us
It's still there. They like saying things and not doing them.
https://directfile.irs.gov
So it's always possible they'll just forget to shut it off.
Having talked at length with one of the developers from 18F at a conference who was fired along with many of the other folks that worked on Direct File, I can assure you that it's no longer being worked on.
The 2024 site remains up so people can file their taxes for that year, but it will no longer be updated.
I'm far beyond disappointed for that. I'm fucking pissed. Such stupid politicking that makes all of our lives shittier.
It's more than "stupid politicking". Follow the money.
https://directfile.irs.gov ??
Did you try actually using it?
Build it and release for free.
I wonder how this can be used with an LLM to provide interesting tax advice? I'd love to regularly ask questions of the tax code...
patio11's already saved over $2k apparently, maybe he'll do a more formal write-up at some point. (A couple threads here https://x.com/patio11/status/1977425626584711668 and here https://x.com/patio11/status/1978168404793037087 )
Any idea what the actual deduction it supposedly found for private school?
You can pay for K-12 with 529 or Coverdell ESA funds. But neither allows deductions for contributions. Only growth in either is tax free (assuming it’s spent on education expenses).
Many states allow a state tax deduction for 529 contributions, which could net you up to an 8ish% discount if you’re in a high tax locality (e.g. NYC).
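To put rough numbers on that: the "discount" is just the deductible contribution times your marginal state (plus local) rate. A back-of-the-envelope sketch, where the function name, the 8% rate, and the cap are illustrative assumptions and not any state's actual rules:

```python
from typing import Optional


def state_529_savings(contribution: float,
                      marginal_rate: float,
                      deduction_cap: Optional[float] = None) -> float:
    """Estimate state tax saved by deducting a 529 contribution.

    marginal_rate is the combined state/local marginal rate (assumed).
    deduction_cap is a hypothetical limit on the deductible amount.
    """
    deductible = contribution
    if deduction_cap is not None:
        deductible = min(contribution, deduction_cap)
    return deductible * marginal_rate


# $10,000 contributed at an assumed 8% marginal rate, no cap
uncapped = state_529_savings(10_000, 0.08)

# Same contribution, but only the first $5,000 is deductible (assumed cap)
capped = state_529_savings(10_000, 0.08, deduction_cap=5_000)
```

With a cap in place the effective discount on the full contribution drops below the marginal rate, so check your own state's instructions before counting on the full 8ish%.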
I've also saved a bit of money on taxes just by thinking about possible deductions and asking LLMs whether they exist. Of course to actually claim such deductions I need to follow instructions from the IRS/state tax agencies so it's hallucination proof: I'm still manually reading the instructions from the tax agencies to understand how to claim them.
Makes me wonder if someone has already trained a model on the tax code. Would be interesting for sure.
Model training data already contains all the text there is[0], so they can already answer questions like this (especially with web search), but they aren't good at tax calculations.
https://arxiv.org/abs/2507.16126v1
[0] but it's quite possible the conversion from HTML to text is bad
The problem is that the text of US tax code isn't enough to know the correct action to take. The IRS has semi-formal policies based on how it has chosen to interpret the statutes. There are areas of gray that they don't clearly specify. Some of this is in supplementary publications but it still has subjective elements. One example is that settlements for "serious injuries" are regarded as non-taxable income. What constitutes serious is a squishy concept.
Yeah you'd have to pull in a lot of case law and perform a lot of fine tuning on expert tax advice (you'd probably have to create this training data).
Would be neat (and still legally fraught!).
You can technically use the language model as a data model. That was the quick hack that started it all, autocomplete on a question produces the answer, yes.
However, it's clear that we are moving towards separating the data from the language model. Even base ChatGPT is given search tools and Python tools instead of producing their results as text, though the tool call itself may be generated by the model.
You can for sure ask a pure LLM questions about the tax code, but we'll probably see specific tools that only contain canon law and kosher case law, and source it properly. Y'know, instead of hallucinating.
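A toy sketch of what "separate the data from the model" could look like: the model only gets to call a lookup tool over a vetted corpus, and the tool either returns cited passages or says it found nothing. Everything here is made up for illustration (the `Passage` type, the placeholder corpus, the naive keyword match); a real tool would use proper retrieval over actual statutes and rulings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Passage:
    citation: str  # where the text comes from (placeholder values below)
    text: str


# Placeholder corpus; NOT real tax law, just stand-in strings.
CORPUS = [
    Passage("EXAMPLE-SOURCE §1", "Placeholder text about education savings accounts."),
    Passage("EXAMPLE-SOURCE §2", "Placeholder text about injury settlements."),
]


def lookup(query: str, corpus: List[Passage] = CORPUS) -> List[Passage]:
    """Return passages containing any query word longer than 3 characters."""
    terms = [t for t in query.lower().split() if len(t) > 3]
    return [p for p in corpus if any(t in p.text.lower() for t in terms)]


def answer(query: str) -> str:
    """Answer only from the corpus, always with a citation, or decline."""
    hits = lookup(query)
    if not hits:
        return "No vetted source found."
    return "\n".join(f"[{p.citation}] {p.text}" for p in hits)
```

The point is the refusal path: if nothing in the corpus matches, the tool says so instead of letting the model fill the gap from memory.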
I guess as long as it's for entertainment purposes only. I'm going to file "actually following tax/legal advice from a potentially hallucinating LLM" under NOPE.
The super obvious workflow is to query for an idea in natural English and then verify or ask the LLM to provide the paths it was following.
It begs the question why you assume the parent comment was going to blindly follow the LLM's output.
> It begs the question why you assume the parent comment was going to blindly follow the LLM's output.
Many people do
> As a work of the United States Government, this project is in the public domain within the United States.
What does it mean for the license to say "within the US"?
Does this mean this software cannot be used outside the US?
> What does it mean for the license to say "within the US"?
It means exactly what it says; you have to read the whole thing (or at least the two sentences before the CC 1.0 Universal text, which is the operative mechanism by which the second sentence is effected), not a fraction of the first sentence.
> Does this mean this software cannot be used outside the US?
No. The license explains two things:
(1) Without any license, this is automatically public domain in the US because it is a federal government work.
(2) The federal government (as the owner of the copyright at creation outside the United States, at least anywhere that applies the common rules underlying the Berne Convention) waives copyright worldwide, and does so via the CC 1.0 Universal declaration (the text of which is then included.)
So, it is, to the extent that this is legally possible, copyright-free globally.
Some countries don't recognize the concept of Public Domain works. In the US, many government works are Public Domain as a matter of law. This creates complications internationally in those countries that don't recognize the legitimacy of Public Domain as a legal concept. Nonetheless, the US still wants to make it available internationally.
To satisfy these conflicting requirements, the US government places it in the Public Domain in the US to satisfy US law. Additionally, they make it available internationally under a license that approximates the intent of Public Domain while still being recognized as a legally valid thing.
Good question. Copyright laws are country-specific, right? So perhaps it is just trying to be clear that there is no license being asserted outside of the US.
Licenses are offered or granted (they are permissions from the copyright holder), not asserted.
My eyes read Scala but my brain was thinking Clojure, so for the first couple of seconds of looking at the source I was a bit confused about why there weren’t any parentheses.
Are the rules versioned somehow? I didn't see that.
This was such a fun neat part of the Direct File code drop 5 months ago. https://news.ycombinator.com/item?id=44131901
In particular there's a pretty nice inline tutorial that's still there in that release: https://github.com/IRS-Public/direct-file/blob/main/direct-f...
How can this be used to hack to save money !!
Surprised to learn we still have an IRS
Scala mentioned
Why would I want to use this over Prolog/Datalog?
Because prolog/datalog don't offer a list of questions that you can ask based on context to calculate someone's US taxes.
That's the database you consult(). Doing income taxes is well-suited to traditional logic programming.
This is a bit like asking "why would I use my car's schematics instead of a wrench".
This is the rules engine's details. You could use it to build the logic and traversal in whatever language you like.
I think they're asking why you would build a rules engine and fact graph instead of "just" encoding it in Datalog.
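For anyone wondering what the "list of questions you can ask based on context" part buys you, here is a tiny sketch of the idea in Python. To be clear, this is not the Fact Graph API or its XML format; the fact names, the rules, and the `evaluate` helper are all hypothetical. Derived facts declare their dependencies, and evaluating one either yields a value or tells you which inputs you still need to ask the user for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Derived:
    deps: Tuple[str, ...]       # names of the facts this one depends on
    fn: Callable[..., float]    # how to compute it from those facts


# Hypothetical rules, loosely in the spirit of a fact dictionary.
RULES: Dict[str, Derived] = {
    "total_income": Derived(("wages", "interest"), lambda w, i: w + i),
    "taxable_income": Derived(("total_income", "deduction"),
                              lambda t, d: max(0, t - d)),
}


def evaluate(name: str, answers: Dict[str, float],
             missing: Set[str]) -> Optional[float]:
    """Compute a fact, recording any unanswered inputs in `missing`."""
    if name in answers:
        return answers[name]
    rule = RULES.get(name)
    if rule is None:
        missing.add(name)  # a user-supplied fact nobody has answered yet
        return None
    values = [evaluate(dep, answers, missing) for dep in rule.deps]
    if any(v is None for v in values):
        return None        # incomplete: caller should ask for `missing`
    return rule.fn(*values)


# Only wages known so far: the result is incomplete and we learn what to ask.
still_needed: Set[str] = set()
partial = evaluate("taxable_income", {"wages": 50_000}, still_needed)

# All inputs answered: the derived fact resolves to a number.
complete = evaluate(
    "taxable_income",
    {"wages": 50_000, "interest": 100, "deduction": 10_000},
    set(),
)
```

You could express the same derivations in Datalog; the part a plain query doesn't hand you for free is the "what's still missing" bookkeeping that drives the interview.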