Common Crawl Blog

Common Crawl Foundation Opt-Out Registry




Trip Report: AI_dev (Linux Foundation) August 2025




Common Crawl Foundation at Stanford HAI: A Shared Legacy of Data and Innovation




July/August 2025 Newsletter




Host- and Domain-Level Web Graphs June, July, and August 2025




August 2025 Crawl Archive Now Available




Common Crawl Foundation at ACL 2025




AI Optimization Is Here: Are You Ready for Search 2.0?




IETF 123 Report




Host- and Domain-Level Web Graphs May, June, and July 2025




July 2025 Crawl Archive Now Available




WMDQS Shared Task on Language Identification




The First WMDQS-Masakhane LangID Hackathon




Host- and Domain-Level Web Graphs April, May, and June 2025




Common Crawl at the United Nations Open Source Week, June 2025




June 2025 Crawl Archive Now Available




May/June 2025 Newsletter




Announcing the Whirlwind Tour of Common Crawl's Datasets using Python




Host- and Domain-Level Web Graphs March, April, and May 2025




May 2025 Crawl Archive Now Available




Announcing the First Workshop on Multilingual Data Quality Signals




Host- and Domain-Level Web Graphs February, March, and April 2025




April 2025 Crawl Archive Now Available




Introducing the Host Index




IIPC General Assembly & Web Archiving Conference 2025




March/April 2025 Newsletter




Providing Authenticity & Data Provenance for Common Crawl Using Blockchain: Our Work with Constellation Network




Host- and Domain-Level Web Graphs January, February, and March 2025




March 2025 Crawl Archive Now Available




Introducing Common Crawl AI Agent by ReadyAI




Submission to the UK’s Copyright and AI Consultation




Host- and Domain-Level Web Graphs December 2024 and January/February 2025




February 2025 Crawl Archive Now Available




Opening the Gates to Online Safety




January/February 2025 Newsletter




Host- and Domain-Level Web Graphs November/December 2024 and January 2025




January 2025 Crawl Archive Now Available




Introducing cc-downloader




Host- and Domain-Level Web Graphs October, November, and December 2024




December 2024 Crawl Archive Now Available




Common Crawl Foundation at NeurIPS 2024: Expanding Horizons and Building Connections




Expanding the Language and Cultural Coverage of Common Crawl




October/November 2024 Newsletter




Host- and Domain-Level Web Graphs September, October, November 2024




November 2024 Crawl Archive Now Available




Reflections on Recent Talks at the Turing Institute and UCL




Introducing the Common Crawl Errata Page for Data Transparency




Host- and Domain-Level Web Graphs August, September, and October 2024




October 2024 Crawl Archive Now Available




White House Briefing on Open Data’s Role in Technology




IAB Workshop on AI-CONTROL




Host- and Domain-Level Web Graphs July, August, and September 2024




September 2024 Crawl Archive Now Available




August/September 2024 Newsletter




Host- and Domain-Level Web Graphs June, July, and August 2024




August 2024 Crawl Archive Now Available




The Increase of Common Crawl Citations in Academic Research




Host- and Domain-Level Web Graphs May, June, and July 2024




July 2024 Crawl Archive Now Available




Common Crawl Statistics Now Available on Hugging Face




The Environmental Impact of the Cloud - the Common Crawl Case Study




Host- and Domain-Level Web Graphs April, May, and June 2024




June 2024 Crawl Archive Now Available




Dialog and Discovery at AI_dev 2024




May/June 2024 Newsletter




Host- and Domain-Level Web Graphs February/March, April, and May 2024




May 2024 Crawl Archive Now Available




Host- and Domain-Level Web Graphs November/December 2023, February/March 2024, and April 2024




April 2024 Crawl Archive Now Available




March/April 2024 Newsletter




Host- and Domain-Level Web Graphs September/October, November/December 2023 and February/March 2024




February/March 2024 Crawl Archive Now Available




Web Archiving File Formats Explained




A Further Look Into the Prevalence of Various ML Opt–Out Protocols




Balancing Discovery and Privacy: A Look Into Opt–Out Protocols




Host- and Domain-Level Web Graphs May/Sep/Nov 2023




November/December 2023 Crawl Archive Now Available




Oct/Nov 2023 Performance Issues




Host- and Domain-Level Web Graphs Mar/May/Oct 2023




September/October 2023 crawl archive now available




Bridging Digital Exploration and Scientific Frontiers




May/June 2023 crawl archive now available




March/April 2023 crawl archive now available




Host- and Domain-Level Web Graphs September/October, November/December 2022 and January/February 2023




January/February 2023 crawl archive now available




September/October 2022 crawl archive now available




Host- and Domain-Level Web Graphs February/March, April and May 2021




November/December 2022 crawl archive now available




June/July 2022 crawl archive now available




Host- and Domain-Level Web Graphs May, June/July and August 2022




August 2022 crawl archive now available




July/August 2021 crawl archive now available




May 2022 crawl archive now available




Host- and Domain-Level Web Graphs October, November/December 2021 and January 2022




January 2022 crawl archive now available




Introducing CloudFront as a new way to access Common Crawl data as part of Amazon Web Services’ registry of open data




November/December 2021 crawl archive now available




October 2021 crawl archive now available




Host- and Domain-Level Web Graphs June, July/August and September 2021



