Chinese pronunciation guide apps

Hi, I’ve had some Chinese pronunciation-guide browser apps on Google Play since 2014 (basically a web browser + EPUB reader that, when given Chinese text, adds pronunciation guides, dictionary entries, text-to-speech options etc.). I’m wondering about submitting them to F-Droid, but I’m not sure if they’re suitable, so it would be nice to get some feedback from the community first.

The compilation process involves a form of “machine learning” I made in 2012 (before current LLMs were all the rage). It takes a corpus of example text already marked up with pronunciation guides, and a Python script runs on this for several hours to generate bytecode for a simple invented virtual machine. The app’s Java code then executes that bytecode to quickly convert Chinese to pinyin (or whatever other annotation is wanted) while taking context into account.
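To make the "bytecode executed by Java" idea concrete, here is a minimal sketch of what a context-sensitive annotator VM could look like. It is not the real bytecode format, which isn't described in this post: the opcodes, the class name `TinyAnnotatorVm` and the hand-written three-rule program are all invented for illustration, and the real program would be generated from the corpus rather than typed in. The example rule reads 长 as "zhǎng" when it is followed by 大 and as "cháng" otherwise.

```java
// Hypothetical sketch only: opcodes and program layout are assumptions,
// not the format used by the actual apps.
class TinyAnnotatorVm {
    static final int IF_CHAR = 0; // if the char at pos is not arg, jump to target
    static final int IF_NEXT = 1; // if the char at pos+1 is not arg, jump to target
    static final int EMIT = 2;    // output arg as the reading for the char at pos
    static final int COPY = 3;    // no rule matched: output the char unchanged

    static final class Instr {
        final int op;
        final String arg;
        final int target;
        Instr(int op, String arg, int target) {
            this.op = op;
            this.arg = arg;
            this.target = target;
        }
    }

    // Hand-written stand-in for the generated bytecode.
    static final Instr[] PROGRAM = {
        new Instr(IF_CHAR, "长", 4),    // 0: is the current char 长?
        new Instr(IF_NEXT, "大", 3),    // 1: context check: is it followed by 大?
        new Instr(EMIT, "zhǎng", -1),  // 2: 长 before 大 (as in 长大)
        new Instr(EMIT, "cháng", -1),  // 3: 长 in any other context
        new Instr(IF_CHAR, "大", 6),    // 4: is the current char 大?
        new Instr(EMIT, "dà", -1),     // 5
        new Instr(COPY, null, -1),     // 6: fallback
    };

    // Runs the program once per character and joins the results with spaces.
    // Assumes all characters are in the Basic Multilingual Plane.
    static String annotate(String text) {
        StringBuilder out = new StringBuilder();
        for (int pos = 0; pos < text.length(); pos++) {
            if (out.length() > 0) out.append(' ');
            out.append(runAt(text, pos));
        }
        return out.toString();
    }

    static String runAt(String text, int pos) {
        int pc = 0;
        while (true) {
            Instr in = PROGRAM[pc];
            switch (in.op) {
                case IF_CHAR:
                    pc = text.startsWith(in.arg, pos) ? pc + 1 : in.target;
                    break;
                case IF_NEXT:
                    // startsWith returns false when pos + 1 is past the end
                    pc = text.startsWith(in.arg, pos + 1) ? pc + 1 : in.target;
                    break;
                case EMIT:
                    return in.arg;
                case COPY:
                    return text.substring(pos, pos + 1);
                default:
                    throw new IllegalStateException("unknown opcode " + in.op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(annotate("长大")); // zhǎng dà
        System.out.println(annotate("长城")); // cháng 城
    }
}
```

The point of the sketch is only that the shipped artifact can be a flat table of instructions plus a small interpreter loop, so the app itself stays small and fast while all the learned context rules live in the data file.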

The obvious problems are (1) you folks aren’t going to want a 5-hour build process and (2) the corpus includes copyrighted material, obtained using a UK legal exemption for responsible private download for non-commercial text mining (as well as informal permission from someone who worked for the organisation it came from, for use in my apps only). So I do not have permission to redistribute the corpus in its “source” form, and I’m not sure if I can give permission for any models generated from it to be used in other apps.

So it looks like the best I can currently give you is full source code of the app + binary blob data file (the data file currently gets updated every month, and on Google Play I just incorporate it in the apk rather than set up separate download infrastructure).

I was thinking an F-Droid release is fairly low priority, because anyone who doesn’t have Google Play can still side-load the APKs, and the source is available separately (both for the “machine learning” part and the app). It just won’t rebuild as-is without my corpus, which I can’t release.

Does F-Droid have any established policies for “source is open but uses a binary blob of model data” situations?

Thanks.

Binary data is allowed.
