Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of February, March, and April 2026. The graphs consist of 269.0 million nodes and 9.4 billion edges at the host level, and 124.6 million nodes and 4.8 billion edges at the domain level.
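To put these sizes in perspective, the following minimal sketch derives the average number of edges per node for each graph. It uses only the rounded figures quoted above, so the results are approximate; the names `GRAPHS` and `mean_edges_per_node` are illustrative and not part of any Common Crawl tooling.

```python
# Node and edge counts as quoted in the announcement (rounded figures).
GRAPHS = {
    "host": {"nodes": 269.0e6, "edges": 9.4e9},
    "domain": {"nodes": 124.6e6, "edges": 4.8e9},
}


def mean_edges_per_node(nodes: float, edges: float) -> float:
    """Return the average number of edges per node (edges / nodes)."""
    return edges / nodes


for level, counts in GRAPHS.items():
    avg = mean_edges_per_node(counts["nodes"], counts["edges"])
    print(f"{level}-level graph: about {avg:.1f} edges per node")
```

With these rounded inputs, both graphs come out at a few dozen edges per node on average, with the domain-level graph slightly denser than the host-level graph.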