F-Droid policy on libre "AI"

I’ve noticed at least one app installed via F-Droid that asked me to opt in to on-device “AI”, which made me curious about whether the F-Droid team has formulated a policy on this.

Are apps with on-device “AI” built from source by F-Droid volunteers, or are they Reproducible Builds? If it’s the latter, what’s the standard for reproducibility here?

I imagine the bar would be higher than in the watered-down “Open Source AI Definition” that Maffulli pushed through the OSI. That suggests a need for the FSF or another uncaptured software freedom organisation to draft a libre AI definition, teasing out how the FSD applies to generative models and other forms of “AI” whose reproduction from source fundamentally depends on elements other than just their full code.

What about vibe coding? Cory Doctorow reports that all outputs of generative models are public domain by default, subject to copyright only if a human author can prove substantial creative input. This creates a number of dark spots on the map we need to explore:

  • Is auto-generated and therefore public domain software suitable for inclusion in F-Droid?
  • If copyright is claimed on vibe-coded software, is the author subject to copyleft licenses on code used to “train” the model they used?
  • If so, how would we determine whether that applies to Vibe App A?
  • If we determine it does, which license applies?

Any model trained on GitHub code has ingested software under many incompatible licenses, potentially including source-available licenses like MongoDB’s SSPL.

  • Does that mean software it produces is automatically under the most restrictive license in the training data if someone claims authorship for copyright purposes?
  • If there are multiple Source Available licenses in the “training” data, and they contradict, which applies?

BorgSoft have opened a huge can of worms here, and I doubt they’re going to like what comes out ; )

Also, is there a need for a new anti-feature flag here? Something like “contains a generative model, sometimes called ‘AI’”. Or would that be excessive?

1 Like

Is the app name secret?

The whole text is confusing to me. :person_shrugging:

You talk about at least 3 issues, but all mixed up into one big post?

Is this about “on device AI processing”? Privacy or something

About “app coded with AI”? License issue, maybe

About “app uses AI model”? Non free asset issue, I guess

About “app generates stuff on device”? License issue

:slight_smile:

2 Likes

I could tell you, but I’d have to wipe your memory afterwards :rofl:

But seriously, I’ll try to remember, but it’s kind of beside the point. What I’m asking about is the policy approach to the tech described in the OSAID.

When I have that experience I have a think about it, read it again, and think about it some more before coming back with a reply. But that’s just me :grin:

Ah, you wrote it with an AI just to prove a point. :rofl:

So … am I right in thinking that F-Droid has no policy on when “AI” counts as libre and when it doesn’t, and it hasn’t even been discussed?

QQ: is there actually anything libre in AI/LLMs etc., and is AI actually doing any hard work by itself?

Somewhat related topic… Gentoo added a no-AI policy for their wiki.
Project:Council/AI policy - Gentoo wiki | comments

It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools. This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.

Offtopic:

https://archive.is/VJ1DN | The Job Market Is Hell

Young people are using ChatGPT to write their applications; HR is using AI to read them; no one is getting hired.

2 Likes

I mean… These concerns and issues aside.

There’s a good number of people who look at AI as what I would describe as an anti-feature regardless. And honestly, in its current condition, I do too. But that’s a topic for another day.

So it might be good to consider making an anti-feature for it for the time being anyway? After all, these anti-features are meant to allow the user to make an informed decision, not punish the developer.

I don’t know, maybe I’m wrong here; I just think it could be something worth considering.

1 Like

Yes, but it should still be an Anti-Feature instead of a feature. If the AI model is not under a free license then it’s covered by NonFreeAssets. If it’s a free model then I don’t know why it’s an Anti-Feature.
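To make that concrete: anti-feature tags end up in the repo index, so clients and scripts can filter on them. A minimal sketch of what filtering on a hypothetical new tag could look like (the tag name `AIModel` is made up here, and the `index-v1.json` layout is assumed):

```python
import json

# Hypothetical tag for this thread's proposal; existing tags include
# e.g. "NonFreeAssets", "Tracking", "Ads".
TAG = "AIModel"

# Layout assumed from an F-Droid repo's index-v1.json.
with open("index-v1.json") as f:
    index = json.load(f)

flagged = [
    app["packageName"]
    for app in index.get("apps", [])
    if TAG in app.get("antiFeatures", [])
]
print("\n".join(flagged))
```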

3 Likes

The OP and the thread are a bit all over the place, as is expected from a buzzword topic, but the overall concern is certainly justified. I don’t think it’s enough to just treat neural networks as assets. It’s a program that runs on your device that cannot be audited or modified. Exactly how undesirable it can be depends on what it is allowed to interface with. It can be as harmless as an asset or as dangerous as a binary blob, and slapping a Creative Commons license on a binary blob doesn’t suddenly make it free software.

I hope F-Droid (and/or other, even better established names in FOSS like Debian or GNU) will someday become big enough to provide verifiable, reproducible neural networks trained on their servers; then you could treat the models like any other project, requiring the training setup as source code and the data sets as assets. Until then I think it warrants a new anti-feature tag that will allow an informed user to infer the level of danger: if it’s an image generator, then yeah, it’s the equivalent of a non-free asset; if it’s a personal assistant that would take over your entire device, then you need to be absolutely sure you trust where the models come from.
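To spell out what “verifiable, reproducible” would mean in practice: same training code + same data + same seed should give a bit-identical model, so anyone can re-train and compare digests. A toy sketch, stdlib only, with a one-weight “model” standing in for a real network (all names made up for illustration):

```python
import hashlib
import random
import struct

def train(dataset: list[tuple[float, float]], seed: int) -> float:
    """Fit y = w * x by toy SGD; deterministic given dataset and seed."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)
    for _ in range(1000):
        x, y = dataset[rng.randrange(len(dataset))]
        w -= 0.01 * (w * x - y) * x  # gradient step on squared error
    return w

dataset = [(x / 10, 3.0 * x / 10) for x in range(10)]  # y = 3x
w = train(dataset, seed=42)

# Publish this digest alongside the model; anyone re-running the same
# training setup on the same data must reproduce it bit for bit.
digest = hashlib.sha256(struct.pack("<d", w)).hexdigest()
print(w, digest)
```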

I personally occasionally use Sayboard, which is a speech recognition keyboard. I have no complaints about it specifically, and the models it uses come from Vosk, which at a glance seems promising: its main “selling feature” seems to be small models that can run without problems on low-end devices, which should therefore also not be as resource-intensive to train from scratch independently, and their FAQ seems to encourage that. But just to illustrate my point in theory, the provided models could be trained to, for example, intentionally modify or completely censor certain words or phrases. I would notice that, be mildly annoyed, and go back to a normal keyboard, but someone who is visually impaired might not have that option, or might not even notice it happening… so yeah, even something as seemingly harmless as speech to/from text can be a problem in some contexts.
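To illustrate why such a model is effectively a blob from the app’s point of view, here is roughly what transcription looks like with the Vosk Python bindings (paths are placeholders): the auditable code is just plumbing, and everything that decides the output text lives inside the model directory.

```python
import json
import wave

from vosk import KaldiRecognizer, Model

# All the app's code can "audit" is a path to an opaque model directory;
# the weights inside decide what text comes out.
model = Model("vosk-model-small-en-us")  # placeholder model path
wf = wave.open("speech.wav", "rb")       # placeholder 16-bit mono PCM recording

rec = KaldiRecognizer(model, wf.getframerate())
while True:
    data = wf.readframes(4000)
    if not data:
        break
    rec.AcceptWaveform(data)

# Whether a word is transcribed, altered, or silently dropped is entirely
# up to the model, not this code.
print(json.loads(rec.FinalResult())["text"])
```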

1 Like

All the issues I laid out there stem from the various copyright and software freedom questions inherent in generative models, or as I like to call it, MOLE Training. I thought it might be helpful to get a general sense of where the community’s thinking is on all this.

From reading the responses so far, I can see that I’m not alone in being concerned about this. I think it would be worth the F-Droid community’s time to do some reading and thinking about this, in some depth.

I’m happy to help with this in any way I can. Some good reading materials are linked in my MOLE Training blog piece, and I particularly recommend David Chapman’s book.

EDIT: Drew DeVault’s blog piece on the software freedom issues created by “machine learning” also gives a basic intro.

EDIT 2: The Register article on copyright violations potentially inherent in using source code under libre licenses for MOLE Training.

1 Like

You don’t need a separate definition of open source AI any more than you need a separate definition of open source Java bytecode. It’s all already covered by the normal definition of FOSS. The datasets it’s trained on are no different from any dataset any other type of software might rely on, like a map or a weather forecast. There are only practical differences that make it difficult for any grassroots ecosystem to support:

  • the training is computationally intensive and pretty much requires specialized hardware
  • most existing projects rely on proprietary toolchains for said hardware (thanks LLVM)
  • the datasets are hugenormous

You need big $$$ to train (or even run) state-of-the-art models, and even if you stick to smaller/simpler ones you’ll probably blow your budget and still be a sad snail everyone would laugh at for being a snail. This is the biggest threat to FOSS from this tech, one that no definitions are going to save you from: it doesn’t matter how freelicious and opentastic the source is if the big bro’s model is always going to be better than yours, because they’ve got all the GPUs.

AI infringing on copyright is a completely different issue, one that has nothing to do with the definition of FOSS. As far as I’m concerned, inanimate objects have no rights or obligations; all rights to publish anything and all obligations to follow licenses lie with the individual or organization that does the publishing. If you can prove that what they have published is a copy or a derivative work, it doesn’t matter what they used to produce it. I can take a piece of your code, compile, decompile, transpile, fuzz, and cross-compile it, and publish it as a binary blob; then you’d have to resort to forensics to prove it’s derivative. If AI is the new shiny haxor tool that defeats your Inspector Gadget forensics suite, well then get yourself a better AI-powered one or something. If anything, it’s those who would overuse the AI that should be paranoid, lest their little helper spit out someone else’s code verbatim and bring a lawsuit down on their heads.

A fully libre Google Maps app, for example, could be tweaked to use OSM as its data source instead. The source code gives you everything you need to reproduce the app from scratch.

As Tara’s blog post explains (and mine, links above), with an app that includes a generative model, the original training data is required to reproduce the app. If that data is not under compatible libre licenses, you can’t reproduce the app from scratch without complying with (or breaking) proprietary licensing terms.

These are very different cases.

A tweaked Google Maps that uses OSM is not a reproduction of Google Maps. If the proprietary map data set is somehow superior to the libre data set, then the map app that uses it will be superior. Same with these models: if you have an equivalent libre data set you can produce an equivalent (“tweaked”, as you call it) model. It wouldn’t be a reproduction, and it doesn’t need to be a reproduction; it only needs to be competitive.

Sidenote: while I consider the various bans on machine-assisted contributions totally “off-topic, ban, reeeeee” in a discussion of F-Droid policies, I have to point out that they are additionally extremely short-sighted. You don’t need something the size or scope of Copilot to help someone write/dictate and edit code comfortably and efficiently. That is what the thing is actually good for, and a smaller, lighter solution can be competitive on those merits. You should hope the author of that new project feels included enough in the FOSS movement to release the relevant parts under the AGPL, instead of being alienated by all this political BS into putting it under MIT and getting promptly snatched up by the big bro next door. If you want to boycott a specific company and their product, then you should do it by name.