Thanks for the positive update! I suppose there’s no way to get access to these failed build artifacts though? I guess I would have to set up the F-Droid build environment myself if I wanted to try to resolve this kind of issue?
Thanks for getting back. For instance, both x86 and x86_64 are still not building reproducibly, and therefore the updates to element-x are not migrating to F-Droid. Presumably, to fix that issue, someone would need to understand what exactly is causing the differences in those built APKs, and then modify the build scripts so they do not trip over that issue.
I do not know where else to begin diagnosing that issue without the unreproducibly-built artifact. There are tools to disassemble binaries and archives that can help identify what exactly is different, going beyond just reporting that particular bits in the file are different. And, if those tools do not yet support particular files, those tools could be extended so that they do.
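To make that concrete, here is a minimal sketch of the first step such a comparison could take, assuming you had both the reference APK and the non-matching rebuilt one on disk. It treats each APK as a plain zip archive and lists the entries whose uncompressed content differs. The function names `entry_hashes` and `diff_apks` are made up for illustration and are not part of any F-Droid tooling.

```python
import hashlib
import zipfile


def entry_hashes(apk_path):
    """Map each archive entry name to the SHA-256 of its uncompressed content."""
    with zipfile.ZipFile(apk_path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def diff_apks(reference_apk, rebuilt_apk):
    """Return entries present in only one APK, or present in both with different content."""
    ref = entry_hashes(reference_apk)
    new = entry_hashes(rebuilt_apk)
    return {
        "only_in_reference": sorted(ref.keys() - new.keys()),
        "only_in_rebuilt": sorted(new.keys() - ref.keys()),
        "content_differs": sorted(
            name for name in ref.keys() & new.keys() if ref[name] != new[name]
        ),
    }
```

This only tells you which files inside the APK differ, not why. It also ignores zip-level metadata such as entry order and compression. A tool like diffoscope would be the next step for looking inside the differing files themselves.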
Stated differently: how did you (I was originally going to say “someone” but you’re the hero of this story!) figure out that limiting the number of cores on which you compile Element X to 7 caused the APK to compile reproducibly? Did you not look at the broken APK and notice that some sections of some bit of some binary within it were out of order, and then guess that it was some race that might be mitigated by reducing the amount of parallelization?
We have some contrib machines so we can build, besides the CI. The VMs allow us to enter and look at the state directly, maybe try some tricks there. As opposed to the CI or the main build server, where you get a log and an APK (if one is produced).
Now, not all machines are the same: we have several with 24 cores/32 GB RAM but also a smaller one with 8 cores/16 GB.
In past apps we saw the same error, a non-deterministic baseline.prof file, so when testing I used the smaller machine and its 6 cores/12 GB RAM VM and was confused to see that… the error was not present there.
Then I retried on the bigger ones and voilà… error again.
The difference being cores? Let’s try limiting them.
Badum-tss… discovered that even on a 24-core machine, IF WE LIMIT to 6 cores (or 7 for some reason), suddenly apps that errored out on the main server either error out less (say 3 times out of ten instead of ten out of ten) or don’t error out at all.
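For anyone wanting to repeat that kind of experiment, here is a rough sketch of how it could be scripted. How the core limit was actually applied on the F-Droid machines is not stated here. This sketch assumes Linux and uses CPU affinity as one possible way to do it. The names `limit_cores`, `mismatch_count` and the `build` callable are hypothetical placeholders for whatever actually produces the APK bytes.

```python
import hashlib
import os


def limit_cores(max_cores):
    """Pin this process, and any child it starts afterwards, to at most max_cores CPUs.

    Linux only: relies on os.sched_getaffinity / os.sched_setaffinity.
    """
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    allowed = sorted(os.sched_getaffinity(0))[:max_cores]
    os.sched_setaffinity(0, allowed)
    return allowed


def mismatch_count(build, reference_sha256, runs=10):
    """Run build() `runs` times and count outputs whose SHA-256 differs from the reference.

    `build` is a hypothetical callable returning the built artifact as bytes.
    """
    mismatches = 0
    for _ in range(runs):
        digest = hashlib.sha256(build()).hexdigest()
        if digest != reference_sha256:
            mismatches += 1
    return mismatches
```

Calling `limit_cores(6)` once and then `mismatch_count(...)` would give a number comparable to the "3 times out of ten" above. Whether the build tool also sizes its own worker pool to the restricted CPU set is not something this sketch checks.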