Ask HN: What license to choose in an open source project with code from an AI?
I have been working for several days on a small package that could be useful to other developers, as it has been useful to me.
The problem is that the package uses an obscure built-in API that I asked a coding AI to help me with, and I built on top of its output. The structure is still the same, but I changed the internal behavior (think of a callback system with all the internal code re-implemented by me).
Could I release the project as open source? What would be a good license? Would it be ethical? How different does the code need to be from what the AI tool output? What do you do in these cases?
In the US, to my understanding:
1. The Copyright Office requires human authorship for a work to be protected by copyright. So in theory, someone else could copy the parts of your project that were LLM-generated without obeying your license's requirements (such as giving credit). The parts that you added, and the work as a whole, would still be protected.
2. Regarding potential infringement of the training data: from progress in ongoing cases so far, copyright's requirement for substantial similarity has been upheld[0] for model output. So if your code doesn't resemble some protected work (and I think common coding LLM services make some attempt at preventing this), you should be in the clear there.
So I don't see anything against using whatever open-source license you normally would.
[0]: E.g., in the Stable Diffusion case, Judge William H. Orrick agreed with the defendants that "plaintiffs cannot plausibly allege the Output Images are substantially similar or re-present protected aspects of copyrighted Training Images, especially in light of plaintiffs' admission that Output Images are unlikely to look like the Training Images."
It's a legal mess. Unfortunately, no one really knows the answer, because it's a brand-new problem.
Once someone challenges the status quo[1] with a big splash, then we will have a precedent at least.
[1] - The status quo is kind of pretending that there is nothing to see here, and that it will stay like that. AI code is kind of treated like open code on GitHub with a random OSS license (without actually having an explicit license).
Would ignoring the issue be the most sensible approach? I'm scared about possible damage to my reputation…
The AI adds no copyright of its own. Either the output carries no copyright at all, or some unattributed copyright remains from whatever training set was used.
The rest is under your copyright as it is the result of your creative work.
1) Yes, you can release it as open source.
2) These days I release as GPL. My experience releasing MIT/BSD has been joyless.
3) It's a shade unethical. OTOH, given the popularity of these tools, it seems your future won't be negatively affected.
4) Enough to convince a judge.
5) I don't use AI tools. Legally speaking, does the AI-generated code show enough creativity (the term for what makes material eligible for copyright) that it could be traced back to an originator? If not, no worries. If so, how much are you willing to deal with that happening?
No, it does not show enough creativity, as it just created the skeleton for parsing some code. I suppose the original code came from some official docs.