I am reposting this because Reddit's Anti-Evil Operations Team shadowbanned me for several days without giving a reason, which means nobody saw this, even after it was restored:
Relevant content on the failure of anonymization:
Our experimental analysis estimates the rate of such collisions and shows that hashing and truncation fails to prevent re-identification when a user visits small-sized domains or certain URLs of larger domains. We further materialize this in the form of an algorithm that Google and Yandex could potentially employ to track users. We conclude this work by providing an analysis of the databases of Google and Yandex (Section 7). By crawling their databases, we detect a number of “suspicious” prefixes that we call orphans. Orphans trigger communication with the servers, but no full digest corresponds to them. We also observe several URLs which have multiples prefixes included in the blacklists. These provide concrete examples of URLs and domains that can be easily tracked by Google and Yandex.
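To make the "hashing and truncation" point concrete, here is a minimal sketch, assuming SHA-256 digests truncated to a 4-byte prefix. The function names and the example.org URLs are hypothetical, and real Safe Browsing clients also canonicalize URLs and hash several host/path decompositions, which is left out here. It only illustrates why a prefix says a lot when the observer already knows the few URLs a small site has:

```python
import hashlib

def hash_prefix(expression: str, prefix_len: int = 4) -> bytes:
    """Return the first prefix_len bytes of the SHA-256 digest of a URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:prefix_len]

def candidates_for_prefix(prefix: bytes, known_expressions: list[str]) -> list[str]:
    """List every known expression whose truncated hash matches the observed prefix."""
    return [e for e in known_expressions if hash_prefix(e, len(prefix)) == prefix]

# Hypothetical small site: the observer can enumerate all of its URLs.
small_site = ["example.org/", "example.org/about", "example.org/contact"]

# The prefix a client would send after visiting one of those pages.
observed = hash_prefix("example.org/about")

# With so few candidates, the prefix almost certainly narrows down to one page.
print(candidates_for_prefix(observed, small_site))
```

The truncation only hides the URL among everything else that shares the same prefix. For a site with a handful of pages, that crowd is very small.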
Does it have the potential to send full URLs to Google? Yes, it does. From the page you linked:
“Otherwise, send the binary file’s metadata to the remote application reputation server (browser.safebrowsing.downloads.remote.url) and block the download if the server indicates that the file isn’t safe.”
The link on “metadata” leads to the part of the code that sets the origin URL in the request properties. If I read the code correctly (https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reputationservice/ApplicationReputation.cpp#1306), the query parameters are stripped, but the full hostname and path ARE included in this case.
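To show what "stripped of query params but hostname + path included" leaves behind, here is a rough Python sketch. This is not the Firefox code (which is C++); strip_query is a hypothetical helper and the download URL is made up:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical download URL with a secret token in the query string.
url = "https://downloads.example.com/private/report-2024.exe?token=abc123#top"

print(strip_query(url))
# prints "https://downloads.example.com/private/report-2024.exe"
```

The token is gone, but the host and the exact file path still identify where the download came from.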
#ff #firefox #Google