
Crawling on internet scale

Brief info


Crawling on the scale of the internet seems like an easy task: jump from one link to the next, download the HTML and images, and save them for analysis. But how do you do all that without breaking websites and without crawling duplicates?

In this talk we'll focus on how Google's crawling systems figure out what to crawl, when to crawl it, and how often to revisit a page.
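The two problems raised in the abstract, not overloading websites and not fetching duplicates, can be illustrated with a toy crawl frontier. This is a minimal sketch under stated assumptions, not a description of how Google's crawlers work: the in-memory `web` dictionary stands in for real downloads, the names `normalize` and `crawl` and the one-second per-host `delay` are hypothetical choices made for illustration, and URL normalization here only covers host case, empty paths, and fragments.

```python
import heapq
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Canonicalize a URL so trivially different spellings share one key."""
    p = urlsplit(url)  # urlsplit already lowercases the scheme
    # Lowercase the host, treat an empty path as "/", and drop the #fragment,
    # which never changes what the server returns.
    return urlunsplit((p.scheme, p.netloc.lower(), p.path or "/", p.query, ""))


def crawl(seeds, links, delay=1.0):
    """Simulate a polite, deduplicating crawl; return (time, url) fetches.

    links: hypothetical in-memory web mapping a normalized URL to the raw
           hrefs found on that page (stands in for download + HTML parsing).
    delay: assumed minimum seconds between two fetches from the same host.
    """
    seen = set()      # every URL ever queued, so duplicates are never queued
    host_free = {}    # host -> earliest simulated time it may be hit again
    frontier = []     # min-heap of (earliest_fetch_time, tie_breaker, url)
    seq = 0

    def enqueue(raw, t):
        nonlocal seq
        url = normalize(raw)
        if url not in seen:  # deduplicate before queueing, not after fetching
            seen.add(url)
            heapq.heappush(frontier, (t, seq, url))
            seq += 1

    for s in seeds:
        enqueue(s, 0.0)

    fetched = []
    while frontier:
        t, n, url = heapq.heappop(frontier)
        host = urlsplit(url).netloc
        allowed = host_free.get(host, 0.0)
        if allowed > t:  # host was hit too recently: push back until it is free
            heapq.heappush(frontier, (allowed, n, url))
            continue
        fetched.append((t, url))
        host_free[host] = t + delay
        for href in links.get(url, []):  # newly discovered links
            enqueue(href, t)
    return fetched


if __name__ == "__main__":
    web = {
        "https://a.example/": ["https://a.example/x", "https://b.example/",
                               "https://A.example/#top"],
        "https://a.example/x": ["https://a.example/"],
        "https://b.example/": [],
    }
    for t, url in crawl(["https://a.example"], web):
        print(t, url)
```

With the sample `web`, `b.example` is fetched immediately while the second page on `a.example` waits one simulated second, and `https://A.example/#top` is never queued because it normalizes to a URL that has already been seen. Deciding *when to revisit* a page, the other topic of the talk, is not modeled here.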

Level: Everybody

About Gary