@Joost great find on those graphs! I can’t believe there was a day with 1TB of traffic to the mirror! No wonder f-droid.org is overloaded, that’s a lot of traffic. There are three mirrors plus f-droid.org, so multiplying that fau number by four gives us a rough total.
With the caching servers acting as mirrors, we’re in effect building our own CDN. It would be a great recipe to have documented, so people can easily set up a low-cost regional mirror. @Joost any interest in trying to run a caching server anywhere outside of the EU? f-droid.org and almost all the mirrors are in the EU.
@Ox0p54 you need about 400GB of disk space to run a full mirror, currently.
About profiling the bandwidth usage: some big spikes could be from people making a local mirror from ftp.fau.de. That would be a single, very large transfer. I think we can spot those by looking at the file count graph:
@Joost it would be very useful to have the ideas that you are proposing in this thread broken out into issues on GitLab so that we can track and remember them. You can file them in Issues · F-Droid / admin · GitLab, then I can move any that belong better elsewhere.
The icon and feature graphic are now served with filenames based on the sha256 of the file contents. That means they can safely be cached forever without even a network check. For example, see the image URL of the icon here:
The Android client was always set to cache graphics forever based on name, and graphics with SHA256 are now set to have 1 year cache time on the website.
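As a rough sketch of what this content-addressed naming buys you (the file name, contents, and hashing command below are illustrative, not the site’s actual build step):

```shell
# Illustrative only: derive a cache-forever filename from file contents.
# Any change to the bytes changes the hash, hence the URL, so the old
# URL can safely be served with a long immutable Cache-Control header.
printf 'icon bytes' > icon.png                 # stand-in for a real icon
hash=$(sha256sum icon.png | cut -d ' ' -f 1)   # 64-hex-char digest
echo "icon_${hash}.png"                        # content-addressed name
```

Since the name changes whenever the content changes, clients never need a “did this change?” round trip for these files.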
That is a good step forward. I checked with curl and got “Cache-Control: max-age=31536000, public, immutable” for that icon.
Filing issues is on my todo list.
I noticed two days ago that Aurora Droid has changed its configuration for the F-Droid repo: it now uses a mirror as the default instead of f-droid.org. This change likely took a significant load off the servers.
Wow, that’s a stunning improvement. Unfortunately, I think that change comes from something besides our efforts. Someone mentioned that Aurora Store recently switched to using mirror sites. I’m guessing that happened around then. But then again, that change looks too sudden to have happened via software updates. So maybe it was something else.
As part of fixing the thousand paper cuts of inefficiency of the website, I’m pushing two changes to the .htaccess, one makes the HEAD cache check responses ~1/3 the size and the other sets permanent caching for the extracted icons.
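For the record, a sketch of what such .htaccess rules could look like (the directives below are assumptions to illustrate the idea, not the exact change that was pushed):

```apacheconf
# Hypothetical sketch, not the actual f-droid.org .htaccess change.
# Permanent caching for content-addressed images (requires mod_headers).
<FilesMatch "\.(png|jpg|jpeg)$">
    Header set Cache-Control "max-age=31536000, public, immutable"
</FilesMatch>

# Drop headers that aren't needed, shrinking every HEAD/304 response.
Header unset X-Powered-By
FileETag None
Header unset ETag
</FilesMatch>
```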
Sorry for the very slow response, I expected an email notification, but it seems it never came.
Good to hear! There are still key optimizations to be done to make the site more resilient. For example, if the site used the Jekyll plugin to add a hash to the file names for all the JS, CSS, and PNG files, then those files could be set to cache forever, and there would only be a cache check if those files changed. That means for most page loads, there would only be a cache check on the initial index.html load; the rest would load straight from the local cache.
Yes I did, but that was around 2-10-2019, two weeks later, and as you said, the change would not be that sudden. We did not rule out a DDoS; perhaps there was an attack that suddenly stopped?
That raises the question again, how to monitor servers and debug issues like this?
We don’t know if that will be a major improvement or not. Maybe the actual APK packages are 99% of the data and we are focusing on the 1%. We don’t know.
We do know that cache headers can improve performance and reduce traffic, at least between the origin server and the caches, so I suggest we focus on applying cache headers on all content and then ask the server maintainer to look at the logs and tell us what load remains.
If we can cache website assets for only 1 hour with the immutable flag, the origin server only sees (number of caches) / (caching time) = 3 requests per hour per object. Extending this to a year or so may not be a massive improvement. It is mostly the immutable flag that prevents a possibly huge number of “304 Not Modified” responses.
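To make that arithmetic concrete (the numbers below just restate the assumption that each caching frontend refetches an object at most once per cache lifetime):

```shell
# Origin requests per hour for one object, assuming each caching
# frontend refetches at most once per cache lifetime (TTL).
caches=3          # caching frontends in front of the origin
ttl_hours=1       # Cache-Control max-age, expressed in hours
echo $(( caches / ttl_hours ))   # → 3 requests per hour per object
```

With a one-year TTL the same formula gives roughly 3 origin requests per year per object, which is why the immutable flag (suppressing the flood of 304 revalidations) matters more than stretching the TTL further.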
Further downstream are the client web browsers accessing the website and the F-Droid client app. We do not currently know the traffic ratio between those two.
Mirrors are only used by the client app for downloading APK files. That leaves a lot of other data, like the index, icons, screenshots, etc., going through f-droid.org even if the user disabled the main mirror.
In case f-droid.org is experiencing slowness or downtime for whatever reason, people cannot simply disable the main mirror or rely on the mirror logic to select another one for all requests. The only way around that is to completely remove the F-Droid repo and re-add it from a mirror link.
Ok, the traffic dropoff is a mystery, perhaps it was a DDoS then.
Caching was rolled out at different times around then.
My guess is that those files are rarely downloaded, especially in a quantity that would affect the load. Plus they are security-sensitive, so I’d rather be sure people are getting the correct file than save a little by caching them. fdroidclient already has its own caching logic for those.
fdroidclient will automatically try a mirror if it cannot download the index file from the main site. That’s also a security-sensitive area: the index is the key to getting updates, so reliability and timeliness need to come before caching for the index.
Since there is already lots of great info in this thread, I wanted to reopen it. We now have a host that can give us instances in Amsterdam, Hong Kong, and Miami, as well as an Anycast IP. So we just need a reproducible design for a caching frontend server. @Joost you have that running already, can you publish the config for it? Ideally it would be put into Ansible, since the rest of our servers are managed with Ansible.
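Until that config is published, here is a rough sketch of what such a caching frontend could look like in nginx. Everything here is an assumption (zone name, sizes, TTLs, hostname, certificate paths); @Joost’s actual setup may look quite different:

```nginx
# Hypothetical caching frontend for f-droid.org; all values are guesses.
proxy_cache_path /var/cache/nginx/fdroid levels=1:2
                 keys_zone=fdroid:50m max_size=500g inactive=30d;

server {
    listen 443 ssl;
    server_name mirror.example.org;               # placeholder hostname
    ssl_certificate     /etc/ssl/mirror.crt;      # placeholder paths
    ssl_certificate_key /etc/ssl/mirror.key;

    location / {
        proxy_pass https://f-droid.org;
        proxy_set_header Host f-droid.org;
        proxy_cache fdroid;
        proxy_cache_valid 200 1h;                 # short TTL keeps the index fresh
        proxy_cache_use_stale error timeout;      # serve stale if origin is down
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

A config like this respects the origin’s Cache-Control headers for the immutable assets while keeping the index TTL short, which matches the reliability-before-caching point made earlier in the thread.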
I have been tweaking the SSL parameters to get the best combination of performance, security, and compatibility. The site uses my own acme.sh-based certificate setup to get both RSA and EC certificates from Let’s Encrypt. I think it is an improvement over the current F-Droid SSL configuration. You may run the site through SSL Labs to check.
That organisation is an extension of the US government. Consider all traffic and everything on their servers compromised.
From their own site “Partners: Open Technology Fund”
→ Parent organisation: U.S. Agency for Global Media
→ Predecessor: United States Information Agency
→ Superseding agencies: United States Department of State, U.S. Agency for Global Media.
They attract “Liberation technology developers”. I had never heard of that neoconservative slang, but “Liberation technology” appears to be the art of toppling foreign governments and instigating wars all over the planet using digital means.