Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of July, August, and September 2025. The host-level graph consists of 628.7 million nodes and 6.9 billion edges, and the domain-level graph consists of 184.6 million nodes and 5.4 billion edges.
Laurie Burchell
Laurie is a Senior Research Engineer with Common Crawl.