Slow webserver response

Those boxes are most likely Xen VMs, so the kernel is updated without rebooting guest instances. The load does seem to reliably follow index updates, so I think this issue is most likely to be related to that. But that is something to test. Could you run your tests on a schedule? Then we can compare that to the schedule of the index updates.

I crafted a simple script to schedule the test and plotted the response time in seconds for fetching the tiny favicon from 5.9.48.82. Timeout is set to 60s. It has been running for about a day now.
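
For anyone wanting to reproduce measurements like these, here is a minimal sketch of a probe. It is not the script used in this thread (that one wraps curl in the shell's `time` command). Instead it asks curl for the total request time via `-w '%{time_total}'`. The function name `probe`, its arguments and the `epoch,seconds` CSV layout are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical response-time probe: appends "unix_epoch,seconds" to a CSV file.
# On any curl failure (timeout, connection refused, ...) it writes "error"
# instead of a time, so failed requests stay visible in the log.
probe() {
  local url="$1" out="$2" t
  # -s: no progress meter; -o /dev/null: discard the body;
  # --max-time 60: give up after 60 s, the same timeout as described above;
  # -w '%{time_total}': print the total request time in seconds.
  # LC_ALL=C keeps '.' as the decimal separator whatever the system locale is.
  if ! t=$(LC_ALL=C curl -s -o /dev/null --max-time 60 -w '%{time_total}' "$url"); then
    t="error"
  fi
  printf '%s,%s\n' "$(date +%s)" "$t" >> "$out"
}

# Run only when executed directly with exactly two arguments: URL and CSV path.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
  probe "$1" "$2"
fi
```

Scheduling could then be a cron entry such as `*/5 * * * * /usr/local/bin/probe.sh <url> /var/tmp/timelog.csv` (paths are placeholders). When probing an HTTPS server by bare IP address, as with 5.9.48.82, the certificate name check will fail unless you add curl's `-k` or use `--resolve <host>:443:5.9.48.82`.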

The results are weird and I am not sure what to make of them. Just some thoughts:
Certain response times are much more common than others. It looks like a threshold effect, as if something hits a certain limit. I know it sounds vague; I am just guessing and speculating. We could be looking at a server misconfiguration, an underpowered VM acting up, or perhaps other things running on the VM that eat resources. I probed a couple of other nearby IPs for HTTPS services and those are very fast, probably just idling. It does not look like a general ISP issue.
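
One way to check whether certain response times really are much more common than others is to bucket the logged times and count them; a few tall, isolated buckets would be the "bands". A minimal sketch, assuming a log of `unix_epoch,seconds` lines with `error` in place of a time for failed requests (an assumed format); `time_histogram` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: histogram of response times from "epoch,seconds" lines.
# Usage: time_histogram BUCKET_WIDTH < timelog.csv
# Prints "bucket_lower_edge count" pairs sorted by bucket; lines whose second
# field is not a plain number (e.g. "error") are skipped.
time_histogram() {
  local width="$1"
  # LC_ALL=C so awk and sort both use '.' as the decimal separator.
  LC_ALL=C awk -F, -v w="$width" '
    $2 ~ /^[0-9]+(\.[0-9]+)?$/ {
      b = int($2 / w) * w   # lower edge of the bucket this sample falls into
      count[b]++
    }
    END { for (b in count) printf "%.2f %d\n", b, count[b] }
  ' | LC_ALL=C sort -n
}
```

For example, `time_histogram 0.5 < timelog.csv` counts samples in half-second buckets; a narrower width makes sharp peaks easier to spot.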

But no matter what the cause, this should not be happening in the first place. It has been going on for a very long time. Apparently nobody is monitoring those two proxies and carefully tuning their performance. The caches report “Server: Apache” in their HTTP response headers, and a quick Google search shows there are better options available.

Since I got a nearly Google-free phone, I depend on F-Droid for many apps and I would like to see things improve. My home server has only 30 Mbit/s upload, probably not fast enough for a cache or mirror, but VMs are cheap and perhaps I could get one to tinker with. Please let me know if I can be of any help.

Nice plot! It looks like the majority of those are very fast. It will be interesting to see it run through whole release cycles to see if clear patterns emerge.

We have been monitoring this and know at least one promising solution: replacing the wiki with static files and switching fully to a CDN model for hosting. That requires a chunk of work, so that’s the most effective place to contribute right now. That mostly means replacing the code in fdroidserver that posts build updates to the wiki and making it instead post static files, e.g. build logs.

I just set your script to monitor both IPs on another server.

Also, another thing that someone could take on as a project is improving the website caching. That is a clear win no matter what the server configuration is.

I have been away for a few days and just created this new chart. It does not look good.
Note that times are in CEST (UTC +2)

I fully agree with improving the caching. Unfortunately I am not much of a programmer, so I would hesitate to contribute to the F-Droid sources, but I have some experience managing Linux (Fedora) servers. Perhaps I can contribute to the caching side and read up on CDNs.

Some thoughts about that:

  1. I don't think we need geographically distributed caches at this point, but we do need enough capacity. Currently the delays are mostly in the servers, not in the transit over the internet.
  2. Cache-Control: immutable can be implemented for static content without changing anything else.
    When a client supporting immutable sees this attribute it should assume that the resource, if unexpired, is unchanged on the server and therefore should not send a conditional revalidation for it (e.g. If-None-Match or If-Modified-Since) to check for updates.
    The current configuration for the favicon is “Cache-Control: max-age=43200, public”, which is a 12-hour expiration time. To get started, we could bump the expiration to 24 hours (max-age=86400) and apply ‘immutable’ only to images; getting an outdated image would be relatively harmless, I suppose. Something like:

        <FilesMatch "\.(png|jpg)$">
            Header set Cache-Control "public, max-age=86400, immutable"
        </FilesMatch>
    Then monitor the effect on the servers. Later we could add other types like css, js and eventually the actual apk downloads.
  3. The Hetzner VPS may be very resource limited, they are cheap and you get what you pay for I suppose. Unless someone provides the cash for a more powerful solution, we can only get more performance with careful tuning. I could get a VPS and see what I can do if you think that is a good idea.
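
Because `<FilesMatch>` takes a regular expression, the dot before the extension needs escaping; an unescaped `.` matches any character. A quick way to see which file names a pattern selects is to run the same expression through `grep -E` (Apache uses PCRE, but for a simple pattern like this one ERE and PCRE agree). The helper name and sample file names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print which of the given file names match an
# extended regular expression, mimicking what a FilesMatch pattern selects.
matching_files() {
  local pattern="$1"; shift
  printf '%s\n' "$@" | grep -E -- "$pattern"
}

# Escaped dot: only real .png/.jpg files match.
matching_files '\.(png|jpg)$' icon.png photo.jpg notes.txt iconpng
# Unescaped dot: "iconpng" matches too, because '.' matches any character.
matching_files '.(png|jpg)$' icon.png photo.jpg notes.txt iconpng
```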

Cache-Control: immutable sounds like the best quick solution. I temporarily enabled it on f-droid.org, until the next website auto-deploy. That should give us about a day to test.

I opened this to discuss the details:

Thanks for the quick action.
The F-Droid app displays the images instantly if they were loaded in a previous session.
But in my desktop browser I don't see the immutable directive anywhere.
(Tested with Opera/Chrome DevTools: open the Network tab, then force-reload the page with Shift-F5.)
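
Besides browser devtools, the response headers can be fetched from a shell with `curl -sI` and the `Cache-Control` line inspected. A minimal sketch, assuming curl is installed; `cache_control_of` and `is_immutable` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting Cache-Control headers.
# Usage: curl -sI https://f-droid.org/assets/ic_repo_app_default.png | cache_control_of

# Print the value of the Cache-Control header from raw response headers on stdin.
cache_control_of() {
  # Header names are case-insensitive (HTTP/2 sends them lowercase);
  # tr strips the CR that terminates each HTTP header line.
  tr -d '\r' | awk 'tolower($0) ~ /^cache-control:/ { sub(/^[^:]*:[ \t]*/, ""); print }'
}

# Succeed (exit 0) if the given Cache-Control value contains the immutable directive.
is_immutable() {
  # Remove spaces and wrap in commas so "immutable" must be a whole directive.
  [[ ",${1// /}," == *,immutable,* ]]
}
```

For example, `is_immutable "$(curl -sI <url> | cache_control_of)" && echo immutable` prints `immutable` only when the server actually sends the directive.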

I just found a bug in the HTML caching: the pattern is *.html, but the actual files have the locale appended, e.g. index.html.en and index.html.zh_hans, so they were not getting Cache-Control headers.
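
The mismatch is easy to reproduce without touching the server: the glob `*.html` only matches names that end in `.html`, while a regular expression can allow an optional locale suffix. A sketch, assuming locale suffixes consist of letters and underscores as in `index.html.en` and `index.html.zh_hans`; the regex is an illustration, not the pattern deployed on f-droid.org.

```shell
#!/bin/bash
# Compare the old glob with a regex that tolerates a locale suffix.
# Assumed regex: ".html" optionally followed by ".<locale>".
html_re='\.html(\.[A-Za-z_]+)?$'

classify() {
  local f="$1" glob=no re=no
  [[ $f == *.html ]] && glob=yes
  [[ $f =~ $html_re ]] && re=yes
  printf '%s glob=%s regex=%s\n' "$f" "$glob" "$re"
}

for f in index.html index.html.en index.html.zh_hans logo.png; do
  classify "$f"
done
```

In Apache this would correspond to something like `<FilesMatch "\.html(\.[A-Za-z_]+)?$">` (untested). Note that it would also match names such as `page.html.bak`, so a stricter locale pattern may be preferable.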

Since the webserver looks far from overloaded, the issue must be either in the caching front-facing servers or in the interaction between those and the webserver. I think it would be worthwhile, if you are still up for it, to set up an Apache reverse-proxy caching webserver and see how that performs.

I ran your timing script on a server in Oregon. It runs against the two caching servers and the webserver. The data is attached here:

Could you make your nice graphs out of those?

Unfortunately the files you posted are a bit messed up, because we have different locales on our systems and the ‘time’ command behaves differently.
With a US locale it outputs numbers with a decimal point, and with (I guess) a German locale it uses a decimal comma. On yours it also echoes the curl command line.
This modified version may work better (posted on Pastebin; only its first lines survive here):

    #!/bin/bash
    export LC_ALL=en_US.UTF-8
    FILE="/mnt/data/temp/timelog.csv"
    CMD="

But don't worry, the information is in there; it just needs some text filtering. I am working on it.

Edit: Here they are!

The banding pattern from my first charts is not very visible in yours, maybe that is because ping times are longer. I seem to be very close to the server in terms of network delays.

The single dot in the 5.9.48.82 graph at 0.2s is a curl error 7: “Failed to connect() to host or proxy.”

All three show the same kind of delays, only “Webserver” much less, likely due to much lower traffic volume.

To find the cause of the delays, someone should log into one of the proxies and do some detective work. There is not enough traffic on the main server to do it there. If this is not possible, then we need to set up another proxy as you suggested. I could do that.

Only @CiaranG has access to those proxies, and these days, he’s rarely available for this kind of thing.

If these are private servers, it makes perfect sense that he is not sharing accounts, but IMHO F-Droid should have at least one proxy under its full control. That means at least two people with sudo access and one actively maintaining it.

With your permission I will register and get the € 3.01 VPS from Hetzner, create two accounts and we’ll see what we can do with it. That’ll be my contribution to f-droid.

I have seen the immutable directive in action, that is great! Thanks for your efforts. When hitting F5 (not Shift-F5, which forces a full reload), the images and other resources load in 25-150 microseconds from the cache on my PC. The one slow resource request appeared to be a bug that results in “500 Internal Server Error”:

Some packages listed on the F-Droid website (like /d/gapps and Addi) have no associated icon, which results in a broken image URL pointing to F-Droid, while it should probably point to a default icon like https://f-droid.org/assets/ic_repo_app_default.png

Another round of profiling data, which should also cover the roll out of the new cache settings:

I updated my copy of the script with your changes. Sorry, the files I just posted are still in the old format; hopefully you still have your script to clean them up.

Also, I’ve started asking @CiaranG to add another caching instance. I already have sponsored VMs from a trusted host for the official one. Maybe it is worthwhile to have you set up another caching webserver for staging.f-droid.org? Then we would have a full copy of the setup to test with.

I found it in the command history:

perl -pe 's/\+\+.*\n//' webserver.csv | sed 's/,/./2' > webserver_fixed.csv
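
The one-liner above drops the trace lines starting with `++` (the echoed curl command) and turns the second comma on each line, the decimal comma, into a dot. Here are the same steps as a commented function, assuming each data line is `epoch,seconds` and the decimal separator may be a comma. `normalize_timelog` is a hypothetical name, and unlike the perl step it simply deletes any line that starts with `+`.

```shell
#!/bin/bash
# Hypothetical cleanup for timing logs produced under different locales.
# Input lines look like "1530000000,0,123" (decimal comma) or
# "1530000000,0.123" (decimal point); shell trace lines ("++ curl ...") are dropped.
normalize_timelog() {
  # 1) delete lines beginning with '+' (the xtrace echo of the command)
  # 2) replace the 2nd comma on a line, i.e. the decimal comma, with a dot;
  #    lines that already use a decimal point have no 2nd comma and pass unchanged
  sed -e '/^+/d' -e 's/,/./2'
}
```

Usage: `normalize_timelog < webserver.csv > webserver_fixed.csv`.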

OK, I will get a VPS then. :)

If you need some hosting, ping me.
