F-Droid policy on libre "AI"

strypey · September 20, 2025, 7:22am

I’ve noticed at least one app installed via F-Droid thay asked me to opt-in to on-device “AI”. Which made me curious about whether the F-Droid team has formulated a policy on this.

Are apps with on-device “AI” built from source by F-Droid volunteers, or are they Reproducible Builds? If it’s the latter, what’s the standard for reproducibility here?

I imagine the bar would be higher than in the watered down "Open Source AI Definition” that Mufelli pushed through the OSI. Which suggests a need for FSF or another uncaptured software freedom organisation to draft a libre AI definition. Teasing out how the FSD applies to generative models, and other forms of “AI” whose reproduction from source fundamentally depend on elements other than just their full code.

What about vibe coding? Cory Doctorow reports that all outputs of generative models are public domain by default. Subject to copyright only if a human author can prove substantial creative input. This creates a number of dark spots on the map we need to explore;

Is auto-generated and therefore public domain software suitable for inclusion in F-Droid?
If copyright is claimed on vibe coded software, is the author subject to copyleft licenses on code used to ”train” the model they used?
If so, how would we determine whether that applies to Vibe App A?
If we determine it does, which license applies?

Any model trained on GH code has ingested software under many incompatible licenses. Potentially including Source Available licenses like MongoDB’s SPPL.

Does that mean software it produces is automatically under under the most restrictive license in the training data if someone claims authorship for copyright purposes?
If there are multiple Source Available licenses in the “training” data, and they contradict, which applies?

BorgSoft have opened a huge can of worms here, and I doubt they’re going to like what comes out ; )

Also, Is there a need for a new anti-feature flag here? Something like “contains a generative model sometimes called’AI’. Or would that be excessive?

Licaon_Kter · September 20, 2025, 8:42am

Is the app name secret?

The whole text is confusing to me.

You talk about at least 3 issues but all mixed up in one big issue?

Is this about “on device AI processing”? Privacy or something

About “app coded with AI”? License issue, maybe

About “app uses AI model”? Non free asset issue, I guess

About “app generates stuff on device”? License issue

strypey · September 20, 2025, 6:53pm

I could tell you, but I’d have to wipe your memory afterwards

But seriously, I’ll try to remember, but it’s kind of beside the point. What I’m asking about is the policy approach to the tech described in the OSAID.

When I have that experience I have a think about, read it again, and think about it some more before coming back with a reply. But that’s just me

Licaon_Kter · September 20, 2025, 7:38pm

Ah, you wrote with an AI just to prove a point.

strypey · September 22, 2025, 1:56am

So … am I right in thinking that F-Droid has no policy on when “AI” counts as libre and when it doesn’t, and it hasn’t even been discussed?

vdbhb59 · September 22, 2025, 3:27am

QQ: is there actually Libre in AI/LLM etc, & is AI actually doing any hard work by itself?

shuvashish76 · September 22, 2025, 3:40am

Somewhat related topic…Gentoo added no-AI policy for their wiki.
Project:Council/AI policy - Gentoo wiki | comments

It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools. This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.

Offtopic:

https://archive.is/VJ1DN | The Job Market Is Hell

Young people are using ChatGPT to write their applications; HR is using AI to read them; no one is getting hired.

AverageGoogleHater · September 28, 2025, 5:23pm

I mean… These concerns and issues aside.

There’s a good amount of people that look at AI as what would describe an anti feature regardless. And honestly in it’s current condition I do too. But that’s a topic for another day.

So it might be good to consider making a anti feature for it for the time being anyways? After all these anti features are meant to allow the user to make an informed decision. Not punish the developer.

I don’t know maybe I’m wrong here I just think it could be something worth considering.

linsui · September 29, 2025, 1:28pm

Yes, but it should still be an Anti-Feature instead of a feature. If the AI model is not under a free license then it’s covered by NonFreeAsset. If it’s a free model then I don’t know why it’s an Anti-Feature.

namark · October 19, 2025, 6:48am

The OP and the thread are bit all over the place, as is expected from a buzzword topic, but the overall concern is certainly justified. I don’t think it’s enough to just treat neural networks as assets. It’s a program that runs on your device that cannot be audited or modified. Exactly how undesirable it can be depends on what it is allowed to interface with. It can be as harmless as an asset or as dangerous as a binary blob, and slapping a creative commons license on a binary blob doesn’t suddenly make it free software.

I hope F-Droid (and/or other even better established names in FOSS like debian or gnu) will someday become big enough to be able to provide verifyable reproducable neural netwoks trained on their servers, then you could treat the models as any other project, requiring the training setup as source code and data sets as assets. Untill then I think it warrants a new anti-feature tag, that will allow an informed user to infer the level of danger: if it’s an image generator, then yeah - equivalent of a non free asset; if it’s a personal assistant that would take over your entire device, then need to be absolutely sure you trust where the models come from.

I personally ocasionally use Sayboard which is a speech recognition keyboard. I have no complaints about it specifically, and the models they are using come from Vosk, which at a glance seems promising, as its main “selling feature” seems to be small models than can run without problems on low end devices, and therfore should also be not as resource intensive to train from scratch independently, and their FAQ seems to encourage that, but just to illustrate my point in theory, the provided models could be trained to, for example, intentionally modify or completely cencor certain words or phrases. I would notice that and be mildly annoyed and go back to a normal keyboard, but someone who is visually impaired might not have that option or might not even notice it happening… so yeah, even something as seemingly harmless as speech to/from text can be a problem in some contexts.

shuvashish76 · October 24, 2025, 7:49am

Other examples:

Developer guide - chezmoi Chezmoi introduces ban on LLM-generated contributions | comments
AI apps are not allowed at IzzyOnDroid : Inclusion Policy · Wiki · IzzyOnDroid / repo · GitLab
Fedora Will Allow AI-Assisted Contributions With Proper Disclosure & Transparency

strypey · December 12, 2025, 12:38am

All the issues I laid out there stem from the various copyright and software freedom questions inherent in generative models, or as I like to call it, MOLE Training. I thought it might be helpful to get a general sense of where the community’s thinking is on all this.

From reading the responses so far, I can see that I’m not alone in being concerned about this. I think it would be worth the F-Droid community’s time to do some reading and thinking about this, in some depth.

I’m happy to help with this is any way I can. Some good reading materials are linked in my MOLE Training blog piece, and I particular recommend David Chapman’s book.

EDIT: Drew DeVault’s blog piece on the software freedom issues created by “machine learning“ also gives a basic intro.

EDIT 2: The Register article on copyright violations potentially inherent in using source code under libre licenses for MOLE Training.

namark · December 12, 2025, 5:38pm

You don’t need a separate definition of open source AI anymore than you need a seperate definition of open source java bytecode. It’s all already covered by the normal definition of FOSS. The datasets that it’s trained on are no different than any dataset any other type of software might rely on, like a map or a weather forecast. There are only practical differences that make it difficult for any grassroots ecosystem to support

the training is computationally intensive and pretty much requires specialized hardware
most existing projects rely on proprietary toolchains for said hardware (thanks LLVM)
the datasets are hugenourmous

you need big $$$ to train (or even run) state of the art models, and even if you stick to smaller/simpler ones you’ll probably blow your budget and still be a sad snail everyone would laugh at for being a snail. This is the biggest threat to FOSS from this tech, that no definitions are going to save you from, doesn’t matter how freelicious and opentastic the source is if the big bros model is always going to be better than yours cause they got all the GPUs.

The AI infirging on copyright is a completely different issue, that has nothing to do with definition of FOSS. As far as I’m concerned inanimate objects have no rights or obligation, all right to publish anything and obligations to follow licenses lie on the individual or the organization that does it. If you can prove that what they have published is a copy or a derivative work it doesn’t matter what they used to produce it. I can take a piece of your code compile, decompile, transpile, fuzz, cross-compile and publish as binary blob, then you’d have to resort to foreniscs to prove it’s derivative. If AI is the new shiny haxor tool that defeats your inspector gadget forensics suite, well then get yourself a better AI powered one or something. If anything it’s those who would overuse the AI that should be paranoid, lest their little helper spits out someone elses code verbatim and brings a lawsuit on their heads.

strypey · December 16, 2025, 2:50am

A fully libre Goggle Maps app, for example, could be tweaked to use OSM as its data source instead. The source code gives you everything you need to reproduce the app from scratch.

As Tara’s blog post explains (and mine, links above), with an app that includes a generative model, the original training data is required to reproduce the app. If that data is not under compatible libre licenses, you can’t reproduce the app from scratch without complying with (or breaking) proprietary licensing terms.

These are very different cases.

namark · December 16, 2025, 4:00am

Tweaked google maps that uses osm is not a reproduction of google maps. If the propriatary map data set is somehow superior to the libre data set then the the map app that uses it will be superior. Same with these models, if you have an equivalent libre data set you can produce an equivalent (“tweaked” as you call it) model, it wouldn’t be a reproduction and it doesn’t need to be a reproduction, it only needs to be competitive.

namark · December 16, 2025, 8:32am

Sidenote, while I concider the various bans on machine assited contributions totally “off-topic, ban, reeeeee” in discussion of F-Droid policies, I have to point out that they are aditionally extremely short sighted. You don’t need something the size or scope of copilot to help someone write/dictate and edit code comfortably and efficiently. That is what the thing is actually good for, and smaller lighter solution can be competitive on those merits and you should hope the author of that new project feels included enough in FOSS movement to release the relevant parts under AGPL, instead of being alienated by all this political BS to put it under MIT and get promptly snached up by the big bro next door. If you want to boycot a specific company and their product, then you should do it by name.

Licaon_Kter · December 19, 2025, 12:37pm

github.com/emersion/.github

CONTRIBUTING.md

main

# Contributing

## Preparing the work

If you're planning to work on a large feature, please come discuss in
[#emersion on Libera Chat] before starting the implementation. This will ensure
everybody is on the same page. For very large features, writing a design
document can be very helpful for maintainers.

## Commit log

A [linear, "recipe" style][linear-log] history is used. This means that every
commit should be small, digestible, stand-alone, and functional. Rather than a
purely chronological commit history like this:

    doc: final docs for view transforms
    fix tests when disabled, redo broken doc formatting
    better transformed-view iteration (thanks Hannah!)
    try to catch more cases in tests
    tests: add new spline test

This file has been truncated. show original

strypey · January 28, 2026, 9:45pm

I suggest you do some reading about how generative models are produced (I recommend Gradient Dissent) before diving into discussions about it.

The key difference is that in Goggle Maps or OSMAnd+ the app and the data are two separate things. In a generative model they are not, the “training” data is inherent in the working model.

If you compile the source code of a libre app designed to display GoggleMaps data, performance may vary if you use it with OSM data, but it’s the same app. Every time you compile it. Switch it back to the GM data and it will work exactly as before.

If you create a generative model using the same source code and weights, but a different “training” dataset, you will end up with a totally different model. A different piece of software, that works in a totally different way, and outputs totally different responses to prompts. To reproduce a model, rather than create a different model, you need to use exactly the same dataset as the original.

There is significant evidence that software generators like CoPilot spit out unaltered chunks of the code they’re “trained” on. Which means there’s a risk that any software produced using them violates someone’s copyright. There’s no way to know when this happens, or what license the software being reused is covered by. Banning such software from F-Droid (or any libre software repo) is just common sense.

FYI Copyright offices around the world have ruled that works produced by generative models are not covered by copyright law, because it only cover human expression. So all software produced by generative models is in the public domain, and can’t be copyleft (eg AGPL). Unless there is evidence of substantial human modification of the code generated by the model.

namark · January 28, 2026, 11:37pm

I suggest you basic logic before joining any discussion.

no they are not separate things, the app will be useless without the data, it’s an integral part of the app, and the whole point of the app is to display and work with the data.

what? do you even know what compile means? you can’t compile an app that’s designed to work with OSM to magically work with GM, and if you could you wouldn’t need to compile, you could just freaking switch the datasets at runtime. And once again a map app that is designed to use OSM is not in any sense of the word a “reproduction” of a map that uses GM, it’s a completely different alternative that just does the same thing, how much more can I chew it up for you?

so, if you train a model on 100000 images of dogs it will recognize dogs, and then train the exact same model with different 100000 images of dogs it will suddenly start recognizing cats? If it’s the same kind of data you’re gonna get the same kind of result, not “totally different”. It doesn’t need to be exactly the same to be just as good. OSM is not exactly the same as GM, it’s ok, the universe had not imploded.

that’s about original work produced by the model on it’s own, read original, as in not copies of anything, if chat gpt spits out a copy of Lord of the Rings tomorrow, it wouldn’t suddenly become public domain, sorry to disappoint. Copying is still copying, nobody except you canceled basic logic.

strypey · January 29, 2026, 12:31am

You’re willfully missing the point and being really rude. You’ve done the same thing in the topic about GH. This has become an obstacle to productive discussion, and I’d really like the forum mods to step in here to prevent discussion on these important software freedom topics from being derailed.