Common Crawl Blog

Common Crawl Foundation Opt-Out Registry
April 2021 crawl archive now available
June 2021 crawl archive now available
May 2021 crawl archive now available
February/March 2021 crawl archive now available
Host- and Domain-Level Web Graphs October, November/December 2020 and January 2021
Host- and Domain-Level Web Graphs Jul/Aug/Sep 2020
November/December 2020 crawl archive now available
January 2021 crawl archive now available
October 2020 crawl archive now available
Interactive Webgraph Statistics Notebook Released
September 2020 crawl archive now available
August 2020 crawl archive now available
July 2020 crawl archive now available
Host- and Domain-Level Web Graphs Feb/Mar/May 2020
May/June 2020 crawl archive now available
February 2020 crawl archive now available
March/April 2020 crawl archive now available
Host- and Domain-Level Web Graphs Nov/Dec/Jan 2019 – 2020
January 2020 crawl archive now available
December 2019 crawl archive now available
Host- and Domain-Level Web Graphs May/June/July 2019
November 2019 crawl archive now available
Host- and Domain-Level Web Graphs Aug/Sep/Oct 2019
October 2019 crawl archive now available
September 2019 crawl archive now available
August 2019 crawl archive now available
July 2019 crawl archive now available
May 2019 crawl archive now available
June 2019 crawl archive now available
Host- and Domain-Level Web Graphs Feb/Mar/Apr 2019
April 2019 crawl archive now available
February 2019 crawl archive now available
March 2019 crawl archive now available
Host- and Domain-Level Web Graphs Nov/Dec/Jan 2018 - 2019
October 2018 crawl archive now available
January 2019 crawl archive now available
November 2018 crawl archive now available
December 2018 crawl archive now available
Host- and Domain-Level Web Graphs Aug/Sep/Oct 2018
June 2018 Crawl Archive Now Available
September 2018 crawl archive now available
August Crawl Archive Introduces Language Annotations
3.25 Billion Pages Crawled in July 2018
Host- and Domain-Level Web Graphs May/June/July 2018
May 2018 Crawl Archive Now Available
Host- and Domain-Level Web Graphs Feb/Mar/Apr 2018
April 2018 Crawl Archive Now Available
Index to WARC Files and URLs in Columnar Format
February 2018 Crawl Archive Now Available
March 2018 Crawl Archive Now Available
Host- and Domain-Level Web Graphs Nov/Dec/Jan 2017-2018
January 2018 Crawl Archive Now Available
December 2017 Crawl Archive Now Available
November 2017 Crawl Archive Now Available
Host- and Domain-Level Web Graphs Aug/Sept/Oct 2017
October 2017 Crawl Archive Now Available
September 2017 Crawl Archive Now Available
August 2017 Crawl Archive Now Available
June 2017 Crawl Archive Now Available
Now Available: Host- and Domain-Level Web Graphs
July 2017 Crawl Archive Now Available
May 2017 Crawl Archive Now Available
Common Crawl's First In-House Web Graph
April 2017 Crawl Archive Now Available
March 2017 Crawl Archive Now Available
February 2017 Crawl Archive Now Available
February 2016 Crawl Archive Now Available
January 2017 Crawl Archive Now Available
December 2016 Crawl Archive Now Available
October 2016 Crawl Archive Now Available
September 2016 Crawl Archive Now Available
News Dataset Available
May 2015 Crawl Archive Available
Data Sets Containing Robots.txt Files and Non-200 Responses
August 2016 Crawl Archive Now Available
July 2016 Crawl Archive Now Available
June 2016 Crawl Archive Now Available
May 2016 Crawl Archive Now Available
April 2016 Crawl Archive Now Available
Welcome, Sebastian!
August 2015 Crawl Archive Available
November 2015 Crawl Archive Now Available
5 Good Reads in Big Open Data: February 27 2015
Web Image Size Prediction for Efficient Focused Image Crawling
September 2015 Crawl Archive Now Available
July 2015 Crawl Archive Available
June 2015 Crawl Archive Available
5 Good Reads in Big Open Data: March 6 2015
April 2015 Crawl Archive Available
March 2015 Crawl Archive Available
Announcing the Common Crawl Index!
Evaluating graph computation systems
February 2015 Crawl Archive Available
5 Good Reads in Big Open Data: March 20 2015
5 Good Reads in Big Open Data: March 26 2015
5 Good Reads in Big Open Data: March 13 2015
Analyzing a Web graph with 129 billion edges using FlashGraph
January 2015 Crawl Archive Available
Lexalytics Text Analysis Work with Common Crawl Data