

FreeBSD Project Isn't Ready To Let AI Commit Code Just Yet (theregister.com) 21
The latest status report from the FreeBSD Project says no thanks to code generated by LLM-based assistants. From a report: The FreeBSD Project's Status Report for the second quarter of 2025 contains updates from various sub-teams that are working on improving the FreeBSD OS, including separate sub-projects such as enabling FreeBSD apps to run on Linux, Chinese translation efforts, support for Solaris-style Extended Attributes, and for Apple's legacy HFS+ file system.
The thing that stood out to us, though, was that the core team is working on what it terms a "Policy on generative AI created code and documentation." The relevant paragraph says: "Core is investigating setting up a policy for LLM/AI usage (including but not limited to generating code). The result will be added to the Contributors Guide in the doc repository. AI can be useful for translations (which seems faster than doing the work manually), explaining long/obscure documents, tracking down bugs, or helping to understand large code bases. We currently tend to not use it to generate code because of license concerns. The discussion continues at the core session at BSDCan 2025 developer summit, and core is still collecting feedback and working on the policy."
They should let AI do peer review though. (Score:3, Interesting)
Re: (Score:2)
That's not AI generating code or documentation but AI generating text based on code and a prompt.
Perhaps it is harmless, so long as it does not occur prior to appropriate peer review by an actual human being.
If done too early, the AI might taint the review process by leading people to believe the machine when they shouldn't, or reviewers might look no further than what the machine said (that is, if the AI reviewed the code first, human reviewers might not analyze the code as diligently).
Re: (Score:2)
Translation (Score:2, Interesting)
TFS:"explaining long/obscure documents,"
How about for translating help/man/info pages?
It's a cliché that programmers dislike writing documentation, so I guess translating documentation is an even lower priority.
Re: (Score:2)
In the best case you have community members who can translate into their native language and would rather do that than code. Not everyone needs to be a programmer to help a project.
The licensing complaint is pretty simple to solve (Score:4, Interesting)
Train a coding AI model only on BSD-, MIT-, ISC-, Apache-, WTFPL-, CC0-, and compatibly licensed code, and only accept code generated by that model.
Easy peasy.
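As a rough sketch of what building such a corpus might look like: one simple (and admittedly incomplete) heuristic is to keep only files carrying an SPDX license tag from a permissive allow-list. The allow-list, the `*.c` file pattern, and the reliance on SPDX headers are all assumptions here; plenty of permissively licensed code has no SPDX tag at all.

```python
import re
from pathlib import Path

# Hypothetical allow-list of permissive SPDX identifiers, matching the
# licenses named above. Real-world vetting would need far more than this.
PERMISSIVE = {"BSD-2-Clause", "BSD-3-Clause", "MIT", "ISC",
              "Apache-2.0", "WTFPL", "CC0-1.0"}

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.\-]+)")

def is_permissive(source_text: str) -> bool:
    """Accept a file for the training corpus only if it carries an
    SPDX tag from the allow-list. Files without a tag are rejected."""
    m = SPDX_RE.search(source_text)
    return bool(m) and m.group(1) in PERMISSIVE

def collect_corpus(root: Path) -> list[Path]:
    """Walk a source tree and keep only permissively tagged C files."""
    return [p for p in root.rglob("*.c")
            if is_permissive(p.read_text(errors="ignore"))]

print(is_permissive("/* SPDX-License-Identifier: MIT */\nint main(void){}"))
print(is_permissive("/* SPDX-License-Identifier: GPL-2.0-only */"))
```

Note that, as replies below point out, even a corpus filtered this way doesn't remove the attribution obligations those licenses impose on redistribution.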
Re: (Score:3)
Dunno if that's the ticket (those are different, though similar, licenses), but I love the idea of coding LLMs trained only on a targeted codebase. For example, train one only on Linux kernel source for use in working with Linux kernel code... I imagine the code style and such would be a better fit, and that codebase is big enough to learn a lot from.
As a counter-example, if a coding assistant was trained with a lot of obfuscated C, I wouldn't want the results going into my production codebase.
Re: (Score:2)
You could try to train an LLM to de-obfuscate code, though. I am still waiting for the JavaScript un-minifier. You can format minified code nicely and undo some of the optimizations, like scientific notation for (smallish) integers, but an LLM could infer readable variable names.
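The mechanical half of un-minifying (restoring line breaks and indentation) needs no LLM at all; only the variable-name inference does. A toy sketch of that mechanical half, assuming simple brace/semicolon structure and ignoring comments and regex literals:

```python
def unminify(src: str) -> str:
    """Re-introduce line breaks and indentation lost during minification.
    Variable-name inference (the part an LLM might do) is NOT attempted."""
    out = []
    indent = 0
    in_str = None  # current string delimiter, if inside a string literal
    line = ""

    def flush():
        nonlocal line
        if line.strip():
            out.append("    " * indent + line.strip())
        line = ""

    i = 0
    while i < len(src):
        ch = src[i]
        if in_str:
            line += ch
            if ch == "\\":            # keep escaped character verbatim
                i += 1
                if i < len(src):
                    line += src[i]
            elif ch == in_str:        # closing quote ends the literal
                in_str = None
        elif ch in "'\"`":
            in_str = ch
            line += ch
        elif ch == "{":               # open block: break line, indent
            line += ch
            flush()
            indent += 1
        elif ch == "}":               # close block: dedent, own line
            flush()
            indent = max(indent - 1, 0)
            line = ch
            flush()
        elif ch == ";":               # statement end: break line
            line += ch
            flush()
        else:
            line += ch
        i += 1
    flush()
    return "\n".join(out)

print(unminify("function f(a){if(a>0){return a*2;}return 0;}"))
```

This handles only the layout; `a` stays `a`. Inferring that it should be called `count` or `depth` is exactly the fuzzy renaming task where an LLM could plausibly help.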
Re:The licensing complaint is pretty simple to sol (Score:4, Insightful)
Those licenses still require you to reproduce the correct terms when you redistribute the code.
You cannot ship code and state that the terms are the BSD license when that code is under the MIT license, etc.
Also, these licenses require including the author's copyright statement, so your redistribution can be infringing if you don't include it.
Thus training only on that group of licenses does not give a free pass. You would have to train only on code where the author has provided a distribution license that allows the distribution method you are planning.
Re: (Score:2)
If you really want to follow the path of considering the training code's licenses, you still have unmet conditions, like mentioning the names of contributors.
On the other hand, if you take two lines from a program and put them into your program, you usually don't need a license, as it is too trivial. I guess there is no clear line, but for copyright to be relevant, you need a non-trivial amount of code. Even if one were to build an AI that is just remixing originals, each work would probably contain only a few keywords of each author.
Re: (Score:2)
for copyright to be relevant, you need a non-trivial amount of code.
That is not necessarily true; the importance of the portion used is a major factor.
You are perhaps thinking of copyright in terms of the number of lines of identical code, but copyright on computer software does not work exclusively that way.
each work would probably contain only a few keywords of each author
Copyright over software does not look solely at direct 1:1 copies. The keywords can be different and still infringing. It is referred
Re: (Score:2)
I'm thinking of copyright more or less in entropy.
If you copy three lines that you could have written yourself without knowing the other code, it is probably under the threshold of copyrightability.
For more code it can become difficult. You may have a full algorithm that is just the pseudocode from a textbook put into, let's say, Python. Now there are only a few ways (modulo different variable names) to put that code verbatim into Python, and only a few variations that make sense. There is little to copyright there.
Re: (Score:2)
but your implementation of quicksort is probably not as unique as you may think
It may well not be unique, but copyright applies based on originality, not novelty.
You may be contemplating an issue here that copyright technically does not even have.
If two or three or four people happen to write the exact same program, that is perfectly fine under copyright law, so long as they did not actually have access to each other's works or copy from one another. They are then in fact all entitled to copyright protection.
AI code commits? (Score:2)
No one should let AI commit code (Score:2)
Indeed (Score:5, Interesting)
I've tried some of the AI coding tools. They work OK for some really basic stuff. If you need a quick 10-line function that does something very specific, and you can describe that fairly accurately, it's good. With anything remotely complex, though, it tends to confidently spit out code full of bugs, or even code that won't compile.
Sometimes it even makes up calls to library functions that don't exist (my only guess is that somewhere it parsed someone talking about trying to call such a function on the assumption it existed, and that worked its way into the training data as a function call).
Overall, it can be OK for some basic stuff, but it's far from ready to be turned loose on anything of value.
Re: (Score:2)
The non-existent functions probably do exist in other sources the LLM is copying from.
LLM... (Score:3)
Please generate text that looks as close as possible to the text I actually need, such that if it's wrong, I won't be able to spot it unless I manually check it with my advanced debugging skills.
And I hope they ... (Score:2)
... never will. Human control 4 ever.