Slow webserver response

hans · September 20, 2019, 3:51pm

@Joost great find on those graphs! I can’t believe there was a day with 1TB of traffic to the mirror! No wonder f-droid.org is overloaded, that’s a lot of traffic. There is 3 mirrors and f-droid.org, so 4x that fau number gives us a rough total.

With the caching servers acting as mirrors, we’re in effect building our own CDN. It would be a great recipe to have documented, so people can easily setup a low cost regional mirror. @Joost any interest in trying to run a caching server anywhere outside of the EU? f-droid.org and almost all the mirrors are in the EU.

@Ox0p54 you need about 400GB of disk space to run a full mirror, currently.

hans · September 20, 2019, 3:53pm

About profiling the bandwidth usage, some big spikes could be when people make a local mirror from ftp.fau.de. That would be a very large single transfer. I think we can see those by looking at the file count graph:

hans · September 20, 2019, 4:12pm

For anyone who wants to work with an archive of f-droid.org index files, I maintain an unofficial one here: f-droid.org - Google Drive

hans · September 30, 2019, 9:58am

@Joost it would be very useful to have the ideas that you are proposing in this thread broken out into issues on gitlab so that we can track them, and remember them. You can file them in Issues · F-Droid / admin · GitLab, then I can move any if they belong better elsewhere.

hans · October 4, 2019, 8:55am

The website was feeling pretty fast yesterday afternoon, when it would normally be slow. I think the fixes are starting to add up. One specific fix hit recently: Graphic hash filename for caching (!669) · Merge requests · F-Droid / fdroidserver · GitLab

The icon and feature graphic now is served with a filename based on the sha256 of the file contents. So that means they can safely be cached forever without even a network check. For example, see the image URL of the icon here:

The Android client was always set to cache graphics forever based on name, and graphics with SHA256 are now set to have 1 year cache time on the website.

Joost · October 4, 2019, 9:49am

That is a good step forward. I checked with curl and got “Cache-Control: max-age=31536000, public, immutable” for that icon.

Filing issues is on my todo list.

I noticed two days ago that Aurora Droid has changed its configuration for the F-Droid repo. They now use F-Droid as the default mirror instead of F-Droid. This change likely took a significant load off the servers.

hans · October 23, 2019, 7:46am

Just checking in on whether you’ve found time to file those issues.

Also, since those graphs you made are quite nice, here’s a recent data dump from my monitor box:
5.9.48.82.csv.zip (381.2 KB) 107.150.51.2.csv.zip (350.5 KB) 148.251.140.42.csv.zip (377.7 KB) webserver.csv.zip (355.6 KB)

Joost · October 23, 2019, 10:12am

Did someone change anything on 19-9?

hans · November 4, 2019, 8:52am

Wow, that’s a stunning improvement. Unfortunately, I think that change comes from something besides our efforts. Someone mentioned that Aurora Store recently switched to using mirror sites. I’m guessing that happened around then. But then again, that change looks too sudden to have happened via software updates. So maybe it was something else.

As part of fixing the thousand paper cuts of inefficiency of the website, I’m pushing two changes to the .htaccess, one makes the HEAD cache check responses ~1/3 the size and the other sets permanent caching for the extracted icons.

Sorry for the very slow response, I expected an email notifcation, but it seems in never came.

Licaon_Kter · November 4, 2019, 9:28am

Aurora Droid maybe, Store is about a different “store”

Anyway, lately the site loads rather instantly most of the time, I’m impressed given the old situation.

hans · November 4, 2019, 9:46am

good to hear There are still key optimizations to be done to make
the site more resilient. For example, if the site used the Jekyll
plugin to add a hash to the file names for all the JS, CSS, PNG files,
then those files could be set to cache forever, and there would only be
a cache check if those files changed. That means for most page loads,
there would only be a cache check on the initial index.html load, then
the rest would load straight from the local cache.

Joost · November 4, 2019, 12:28pm

Yes I did, but that was around 2-10-2019, two weeks later and as you said, the change would not be that sudden. We did not rule out DDOS and perhaps there was an attack that suddenly stopped?

That raises the question again, how to monitor servers and debug issues like this?

We don’t know if that will be a major improvement or not. Maybe the actual apk packages are 99% of the data and we are focussing on the 1%. We don’t know.

We do know that cache headers can improve performance and reduce traffic, at least between the origin server and the caches, so I suggest we focus on applying cache headers on all content and then ask the server maintainer to look at the logs and tell us what load remains.

If we can cache website assets for only 1 hour with immutable flag, that means that the origin server only sees (number of caches) / (caching time) = 3 requests per hour for each object. Extending this to a year or so may not be a massive improvement. It is mostly the immutable flag that prevents a possibly huge number of “304 Not Modified” responses.
Further downstream are the client web browsers accessing the website and the F-droid client app. We do not currently know what the traffic ratio is between those two.

Joost · November 4, 2019, 12:56pm

I notice many objects have the cache header now, did that change on 19-9?
Some missing are:
*.apk
*.apk.asc
*.tar.gz

The apk is the most important I think.

Example:
$ curl -IsS “https://f-droid.org/assets/favicon-16x16.png” | grep -i cache
Cache-Control: max-age=43200, public, immutable

Joost · November 4, 2019, 2:14pm

This would also improve performance: Repo Details does not show all mirrors, its a scroll within a scroll (#1865) · Issues · F-Droid / Client · GitLab
(point 3)

Mirrors are only used by the client app for downloading APK files. That leaves a lot of other data like index, icons, screenshots, etc. going through f-droid.org even if the user disabled the main mirror.

In case F-droid is experiencing slowness or downtime for whatever reason, people cannot just disable the main mirror or rely on the mirror logic to select another for all requests. The only way to get around that is to completely remove the fdroid repo and re-add it from a mirror link.

hans · November 4, 2019, 4:36pm

Ok, the traffic dropoff is a mystery, perhaps it was a DDoS then.
Caching was rolled out at different times around then.

My guess is that those files are rarely downloaded, especially in a
quantity that would affect the load. Plus they are security-sensitive,
so I’d rather be sure people are getting the correct file than save a
little by caching these. _fdroidclient already has its own caching
logic for those.

fdroidclient will automatically try a mirror if it cannot download the
index file from the main site. That’s also a security-sensitive area,
the index is the key to getting updates, so reliability and timeliness
need to come before caching for the index.

system · January 3, 2020, 4:36pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

hans · February 11, 2020, 8:38am

Since there is already lots of great info in this thread, I wanted to reopen it. We know have a host that can give us instances in Amsterdam, Hong Kong, and Miami as well as an Anycast IP. So we just need a reproducible design for a caching frontend server. @Joost you have that running already, can you publish the config for that? Ideally that would be put into ansible, since the rest of our servers are in ansible.

Here’s the host:
https://eclips.is/

Joost · February 19, 2020, 6:13pm

Hi @hans this config file is from https://fdroidmirror.net/ and it includes a cached copy of the fdroid website. The comments should explain things, but just ask if it is not clear.

I have been tweaking the ssl parameters to get the best combination of performance, security and compatibility. The site is using my own acme.sh certificate manager to get both RSA and EC certificates from letsencrypt. I think it is an improvement over the current fdroid ssl configuration. You may run the site through ssllabs to check.

Joost · February 19, 2020, 6:45pm

That organisation is an extension of US government. Consider all traffic and everything on their servers compromised.

From their own site “Partners: Open Technology Fund”
→ Parent organisation: U.S. Agency for Global Media
→ Predecessor: United States Information Agency
→ Superseding agencies: United States Department of State, U.S. Agency for Global Media.

They attract “Liberation technology developers”. I had never heard of that neoconservative slang, but “Liberation technology” appears to be the art of toppling foreign governments and instigating wars all over the planet using digital means.