
Crawling on internet scale
Brief info
Crawling at the scale of the internet seems like an easy task: jump from one link to the next, download the HTML and images, and save them for analysis. But how do you do all that without breaking websites and without crawling duplicates?
In this talk we'll focus on how Google's crawling systems figure out what to crawl, when to crawl it, and how often to revisit a page.
Level: Everybody