Troubleshooting f-droid.org slowness periods

website

#1

The website goes through periods of slowness, so we need to map them out to figure out how best to handle them. The good news is that we have solid mirrors, with more coming online. Before we push more users to mirrors, we have to make sure that doing so will actually improve things.

There is a nice ping monitor that I think @krombel set up:
https://monitor.msg-net.de/d/kXzI4Jliks/worldping-endpoint-f-droid-org

@Bubu @krombel nice service! That should be helpful. Can it ping per IP? f-droid.org is usually mapped to two IPs on two different servers via round-robin DNS. Also, given that we’ve been seeing successful sessions that require ~8s to connect, it would be worthwhile to raise the ping timeout error level in that monitor app.
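For anyone who wants to check the two round-robin IPs by hand, here is a rough Python sketch of the per-IP idea. It is not how the monitor works; the function names are made up for illustration, and it only looks at IPv4. It resolves the hostname, then times a TCP connect plus TLS handshake against each address separately while still validating the certificate for the hostname.

```python
import socket
import ssl
import time


def resolve_ipv4(host, port=443):
    """Return the distinct IPv4 addresses the resolver gives for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def tls_connect_seconds(ip, host, port=443, timeout=10.0):
    """Time a TCP connect plus TLS handshake to one specific IP.

    The certificate is still verified against host (sent via SNI).
    Returns None if the attempt fails or times out.
    """
    context = ssl.create_default_context()
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return time.monotonic() - start
    except OSError:  # covers timeouts, refused connections and TLS errors
        return None


def report(host, port=443):
    """Print one line per IP behind host with its TLS connect time."""
    for ip in resolve_ipv4(host, port):
        elapsed = tls_connect_seconds(ip, host, port)
        print(ip, "failed" if elapsed is None else f"{elapsed:.2f}s")
```

Calling `report("f-droid.org")` prints one line per address, which should make it visible if only one of the two servers is slow. The 10 second timeout here is just a default chosen so that ~8s sessions still count as successes.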


#2

I added another endpoint so you are able to view them separately.
While doing so I also was able to increase the timeout to 10s


#3

Looks like https is kinda down ?!


#4

Turns out I’m working on making the client app aggressively switch to mirrors. There will be an alpha soon, within a week if all goes well.


#5

So I was thinking about setting the default timeout for installs to something like 1-5 seconds for TLS connects and reads. This would make fdroidclient rapidly switch to mirrors when f-droid.org is slow.

I tested this by setting the default mirror timeout to 1ms, and it went smoothly through all the failures until it hit the second tier timeout value. Then it successfully downloaded.

The tricky part is someone on slow internet, where it takes slightly longer than the default timeout to connect to a TLS site. There would be quite a bit of churn on every connection, since the client would try every mirror first, waiting the default timeout for each one, then cancel and move on. So someone on slow internet would have a delay of (number of mirrors) x (default timeout) added to every connection.
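To put rough numbers on that churn, here is a small Python sketch of the arithmetic. The mirror count, the tier values, and the function name are assumptions for illustration, not the client's real settings. It also assumes the user's own link dominates, so every mirror takes about the same time to connect.

```python
def added_delay_seconds(connect_seconds, mirror_count, timeout_tiers):
    """Time spent on timed-out attempts before one connection succeeds.

    Every mirror is tried once at each tier whose timeout is shorter than
    the user's real connect time. The first tier that is long enough
    succeeds on its first mirror. Returns None if no tier is long enough.
    """
    wasted = 0.0
    for timeout in timeout_tiers:
        if connect_seconds <= timeout:
            return wasted
        wasted += mirror_count * timeout
    return None


# Hypothetical setup: 6 mirrors, a short first tier, then a 10s second tier.
for first_tier in (1, 5):
    delay = added_delay_seconds(3.5, 6, [first_tier, 10])
    print(f"first tier {first_tier}s -> {delay}s wasted per connection")
```

With a connect time of 3.5 seconds, a 1 second first tier costs 6 wasted seconds per connection under these assumptions, while a 5 second first tier costs nothing because the first attempt succeeds.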

I also checked the mirrors and some sites around the world to get a rough idea of what a normal connect time for TLS is. Connecting to a site on the other side of the world can easily take 1-2 seconds even on a fast connection:

$ for f in f-droid.org fdroid.tetaneutral.net mirror.cyberbits.eu bubu1.eu ftp.fau.de mirror.jarsilio.com www.baidu.com cafebazaar.ir cnnic.com.cn www.ecured.cu; do echo $f; tlsping ${f}:443;done
f-droid.org
tlsping: TLS connection to server f-droid.org:443 (10 connections)
tlsping: min/avg/max/stddev = 4.05s/4.36s/4.69s/186.88ms
fdroid.tetaneutral.net
tlsping: TLS connection to server fdroid.tetaneutral.net:443 (10 connections)
tlsping: min/avg/max/stddev = 273.39ms/278.35ms/284.09ms/3.41ms
mirror.cyberbits.eu
tlsping: TLS connection to server mirror.cyberbits.eu:443 (10 connections)
tlsping: min/avg/max/stddev = 322.27ms/517.76ms/673.33ms/120.69ms
bubu1.eu
tlsping: TLS connection to server bubu1.eu:443 (10 connections)
tlsping: min/avg/max/stddev = 134.85ms/138.98ms/141.55ms/2.54ms
ftp.fau.de
tlsping: TLS connection to server ftp.fau.de:443 (10 connections)
tlsping: min/avg/max/stddev = 184.73ms/188.55ms/190.24ms/1.74ms
mirror.jarsilio.com
tlsping: TLS connection to server mirror.jarsilio.com:443 (10 connections)
tlsping: min/avg/max/stddev = 371.07ms/452.02ms/505.62ms/39.30ms
www.baidu.com
tlsping: TLS connection to server www.baidu.com:443 (10 connections)
tlsping: min/avg/max/stddev = 621.47ms/634.71ms/641.88ms/6.81ms
cafebazaar.ir
tlsping: TLS connection to server cafebazaar.ir:443 (10 connections)
tlsping: min/avg/max/stddev = 348.31ms/358.49ms/366.57ms/5.80ms
cnnic.com.cn
tlsping: TLS connection to server cnnic.com.cn:443 (10 connections)
tlsping: min/avg/max/stddev = 1.21s/1.23s/1.24s/8.94ms
www.ecured.cu
tlsping: TLS connection to server www.ecured.cu:443 (10 connections)
tlsping: min/avg/max/stddev = 665.08ms/668.58ms/671.53ms/2.65ms

I also looked around a little bit for data on the normal latency of mobile networks. It seems that 3500ms is pretty common for 3G connections.

So based on this, I think the default timeout should remain 10 seconds.


#6

So it will wait 10 seconds before using the next one? Ain’t nobody got time for that :confused:


#7

It’s not waiting for 10 seconds, it’s a timeout. It waits for a connection for up to 10 seconds. The timeout is then considered an error, so it switches to the next mirror. It will then stick with that mirror until it errors out, or F-Droid is restarted. I’m adding a new feature where it will remember the current mirror across restarts, with an Expert pref to toggle it on until it proves stable.
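The behaviour described here can be sketched roughly like this in Python. This is not fdroidclient's code; the class, its method names, and the `download` callback are hypothetical, and the timeout itself is assumed to live inside `download`, which signals any failure (including a timeout) by raising `OSError`.

```python
class MirrorSelector:
    """Stick with one mirror until it errors, then move to the next."""

    def __init__(self, mirrors, remembered=None):
        if not mirrors:
            raise ValueError("at least one mirror is required")
        self.mirrors = list(mirrors)
        # Resume on the remembered mirror if it is still in the list.
        if remembered in self.mirrors:
            self.index = self.mirrors.index(remembered)
        else:
            self.index = 0

    @property
    def current(self):
        return self.mirrors[self.index]

    def fetch(self, path, download):
        """Call download(mirror, path), advancing on each OSError.

        Each mirror is tried at most once per call. If all of them fail,
        the last error is re-raised.
        """
        last_error = None
        for _ in range(len(self.mirrors)):
            try:
                return download(self.current, path)
            except OSError as error:  # TimeoutError is an OSError too
                last_error = error
                self.index = (self.index + 1) % len(self.mirrors)
        raise last_error
```

A fast connection never pays the timeout: `fetch` returns as soon as `download` succeeds. After a failure, later calls start from the mirror that last worked, and passing `remembered=` at startup stands in for the "remember across restarts" pref.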


#8

Then I’d use a random mirror from the start, as we know that for a good number of people it’ll otherwise take 10 seconds until anything happens.


#9

Sticking to one repo until it fails would have the benefit that we can assume there won’t be inconsistencies between the index and the available apps.

So do not require an index update on each app start - only on fail/switch to be absolutely secure.

But I would pick a random mirror in case that “stick mirror” is unset.


#10

Just for usability and simplicity, is it possible for F-Droid to use a CDN instead of complex interactions between the main server and mirrors, plus keeping track of what is up and what is down? What would be the estimated cost to F-Droid of using a decent CDN?