Date: 2024-03-17
For more than 15 years, Google Safe Browsing has been protecting users from phishing, malware, unwanted software and more, by identifying and warning users about potentially abusive sites on more than 5 billion devices around the world. As attackers grow more sophisticated, we’ve seen the need for protections that can adapt as quickly as the threats they defend against. That’s why we’re excited to announce a new version of Safe Browsing that will provide real-time, privacy-preserving URL protection for people using the Standard protection mode of Safe Browsing in Chrome.
↫ Jasika Bawa, Xinghui Lu, Jonathan Li, and Alex Wozniak on the Google blog
Reading through the description of how this new feature works, it does indeed seem to respect one’s privacy, but there could be so many devils in so many details here that you’d really need to be a specialist in these matters to truly gauge if Google isn’t getting its hands on the URLs you visit through this feature.
But even if all that is true, it doesn’t really matter because Google has tons of other ways to collect more than enough data on you to build an exact profile of who you are, and what advertisements will work well on you. Any time Google goes out of its way to announce it’s not collecting some type of data – like here, the URLs you type into the Chrome URL bar – it’s not because they care so much about your privacy, but because they simply don’t need this data to begin with.
This uses both a whitelist and a blacklist. I don’t know what criteria google uses to populate either list, but it’s not relevant to whether the mechanism itself leaks data. “Safe” sites that are in the whitelist will come up faster than “safe” sites that aren’t in the whitelist. Arguably this could be used to give google sites an unfair performance boost over non google sites. But in terms of privacy, the whitelist leaks nothing.
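To illustrate why a whitelist hit is both faster and leak-free, here is a minimal Python sketch of the two-list flow as I understand it from the description. It is not Chrome's actual code: the names `prefix32`, `check` and `remote_lookup` are made up for this example, and SHA-256 as the hash function is an assumption.

```python
import hashlib
from typing import Callable


def prefix32(expression: str) -> bytes:
    """First 32 bits of the SHA-256 digest (hash function assumed for illustration)."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4]


def check(expression: str,
          allowlist: set[str],
          remote_lookup: Callable[[bytes], bool]) -> str:
    """Hypothetical lookup flow: local whitelist first, remote blacklist query second."""
    if expression in allowlist:
        # Answered locally: no network round trip, so nothing leaves the device.
        return "safe (local)"
    # Not whitelisted: only the truncated hash is sent out, never the URL itself.
    flagged = remote_lookup(prefix32(expression))
    return "unsafe" if flagged else "safe (remote)"


# Usage with a stand-in for the remote service that records what it receives.
seen_by_server: list[bytes] = []


def fake_remote(prefix: bytes) -> bool:
    seen_by_server.append(prefix)
    return False


print(check("example.com/", {"example.com/"}, fake_remote))  # whitelist hit
print(check("other.test/", {"example.com/"}, fake_remote))   # goes to the network
print(len(seen_by_server))  # only the second call produced a query
```

Only the second call reaches `fake_remote`, which is the whole point: the whitelist path costs no round trip and exposes nothing, while every miss produces one observable query.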
They’ve taken certain steps to mitigate leaking via the blacklist queries:
1. The urls are all hashed and the hashes are truncated to 32bit. This is not enough to statistically identify a page by itself. However, the paper leads me to believe that chrome doesn’t just perform a single query per url, but uses the following algorithm to generate several hashed queries per url.
If this is so, then the amount of information leaked is actually somewhat higher than a single 32bit hash would leak.
The google paper did not cover this threat model, but IMHO it should have.
2. If a browsing session contains a long sequence of such hashes over time, then collectively they might significantly whittle down the set of possible candidates. Enough queries might correlate the session to a set of related urls.
3. The 32bit url hash queries are sent through an intermediary party to strip off user metadata and provide anonymity. Furthermore, chrome encrypts the hashes so that this intermediary party doesn’t know which hashes are being queried, only google. This can be considered private at face value. But it does assume that neither the 3rd party nor google are working in cahoots with each other or with government agencies, which obviously breaks the model. As such there’s a degree of trust in play.
4. Given that google controls chrome, clearly they could configure chrome to bypass the 3rd party if they wanted to. Or even more devious would be to use the encrypted “safe browsing” channel as a secret channel for leaking information right through the oblivious 3rd party. Not saying they actually do this but just pointing out how we have to trust that chrome’s implementation is actually faithful to the published spec and doesn’t take liberty in introducing hidden “features”.
5. Assuming the NSA somehow got the private key to decrypt chrome’s hash requests, then theoretically the safe browsing hash queries might help the NSA identify urls in otherwise fully encrypted TLS traffic. With a wiretap they’d know the server and hostname the TLS traffic is connected to, so the 32bit url hashes along with other metadata would likely prove extremely valuable for their signals intelligence operations.
If you believe governments have access to the safe browsing hash decryption key, then safe browsing could be an additional risk for those who are being tracked by covert government entities.
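To put rough numbers on points 1 and 2, here is a small Python sketch. It is a simplification and not Chrome's implementation: SHA-256 is assumed as the hash, the host-suffix/path-prefix expansion in `url_expressions` is a guess at what "several hashed queries per url" means, and all function names are hypothetical. The estimate in `expected_false_candidates` further assumes the prefixes behave like independent uniform random values, which real url expressions only approximate.

```python
import hashlib
from urllib.parse import urlsplit

PREFIX_BYTES = 4  # 32 bits, as described above


def hash_prefix(expression: str) -> str:
    """Hex of the first 32 bits of the SHA-256 digest (hash function assumed)."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:PREFIX_BYTES].hex()


def url_expressions(url: str) -> list[str]:
    """Simplified, hypothetical expansion of one url into host-suffix/path-prefix pairs."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Host suffixes: the full host, then shorter suffixes down to two labels.
    hosts = [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]
    # Path prefixes: "/", each parent directory, then the full path.
    segments = [s for s in parts.path.split("/") if s]
    paths = ["/"]
    for i in range(1, len(segments) + 1):
        is_file = i == len(segments) and not parts.path.endswith("/")
        paths.append("/" + "/".join(segments[:i]) + ("" if is_file else "/"))
    return [h + p for h in hosts for p in paths]


def expected_false_candidates(total_expressions: float, prefixes_seen: int) -> float:
    """Expected number of unrelated items matching every observed prefix by chance."""
    return total_expressions / (2 ** (32 * prefixes_seen))


for expr in url_expressions("https://a.b.example.com/x/y.html"):
    print(hash_prefix(expr), expr)

# 1e12 is an arbitrary illustrative figure for the number of url expressions
# on the web, not a measured one.
print(expected_false_candidates(1e12, 1))  # a couple of hundred candidates
print(expected_false_candidates(1e12, 2))  # far below one
```

Under these assumptions a single 32bit prefix is ambiguous among a few hundred candidates, but two prefixes known to belong to the same page already leave essentially none. That is the reason the several-queries-per-url detail and long sessions matter.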