Common Crawl Blog

CC-Citations: A Visualization of Research Papers Referencing Common Crawl




CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

Host- and Domain-Level Web Graphs November/December 2025 and January 2026




January 2026 Crawl Archive Now Available




Web Archives for Social Sciences Datathon, Bristol




How SEOs Are Using Common Crawl's Web Graph Data for AI Ranking Signals




GneissWeb Annotations Examples




Common Crawl at the Mozilla Festival 2025




Host- and Domain-Level Web Graphs October, November, December 2025




December 2025 Crawl Archive Now Available




A Sampling of 2025 Research Referencing Common Crawl

Host- and Domain-Level Web Graphs September, October, and November 2025




November 2025 Crawl Archive Now Available




Common Crawl Celebrates World Digital Preservation Day




Setting the Record Straight: Common Crawl’s Commitment to Transparency, Fair Use, and the Public Good




October/November 2025 Newsletter




Common Crawl Foundation at Stanford HAI




Host- and Domain-Level Web Graphs August, September, and October 2025




October 2025 Crawl Archive Now Available




Common Crawl Foundation at COLM 2025




Announcing GneissWeb Annotations




Web Languages Needing Review by Native Speakers




Host- and Domain-Level Web Graphs July, August, and September 2025




From SEO to AIO: Why Your Content Needs to Exist in AI Training Data




September 2025 Crawl Archive Now Available




Common Crawl Foundation Opt-Out Registry




Trip Report: AI_dev (Linux Foundation) August 2025




Common Crawl Foundation at Stanford HAI: A Shared Legacy of Data and Innovation




July/August 2025 Newsletter




Host- and Domain-Level Web Graphs June, July, and August 2025




August 2025 Crawl Archive Now Available




Common Crawl Foundation at ACL 2025




AI Optimization Is Here: Are You Ready for Search 2.0?




IETF 123 Report




Host- and Domain-Level Web Graphs May, June, and July 2025




July 2025 Crawl Archive Now Available




WMDQS Shared Task on Language Identification




The First WMDQS-Masakhane LangID Hackathon




Host- and Domain-Level Web Graphs April, May, and June 2025




Common Crawl at the United Nations Open Source Week, June 2025




June 2025 Crawl Archive Now Available




May/June 2025 Newsletter




Announcing the Whirlwind Tour of Common Crawl's Datasets using Python




Host- and Domain-Level Web Graphs March, April, and May 2025




May 2025 Crawl Archive Now Available




Announcing the First Workshop on Multilingual Data Quality Signals




Host- and Domain-Level Web Graphs February, March, and April 2025




April 2025 Crawl Archive Now Available




Introducing the Host Index




IIPC General Assembly & Web Archiving Conference 2025




March/April 2025 Newsletter




Providing Authenticity & Data Provenance for Common Crawl Using Blockchain: Our Work with Constellation Network




Host- and Domain-Level Web Graphs January, February, and March 2025




March 2025 Crawl Archive Now Available




Introducing Common Crawl AI Agent by ReadyAI




Submission to the UK’s Copyright and AI Consultation




Host- and Domain-Level Web Graphs December 2024 and January/February 2025




February 2025 Crawl Archive Now Available




Opening the Gates to Online Safety




January/February 2025 Newsletter




Host- and Domain-Level Web Graphs November/December 2024 and January 2025




January 2025 Crawl Archive Now Available




Introducing cc-downloader




Host- and Domain-Level Web Graphs October, November, and December 2024




December 2024 Crawl Archive Now Available




Common Crawl Foundation at NeurIPS 2024: Expanding Horizons and Building Connections




Expanding the Language and Cultural Coverage of Common Crawl




October/November 2024 Newsletter




Host- and Domain-Level Web Graphs September, October, November 2024




November 2024 Crawl Archive Now Available




Reflections on Recent Talks at the Turing Institute and UCL




Introducing the Common Crawl Errata Page for Data Transparency




Host- and Domain-Level Web Graphs August, September, and October 2024




October 2024 Crawl Archive Now Available




White House Briefing on Open Data’s Role in Technology




IAB Workshop on AI-CONTROL




Host- and Domain-Level Web Graphs July, August, and September 2024




September 2024 Crawl Archive Now Available




August/September 2024 Newsletter




Host- and Domain-Level Web Graphs June, July, and August 2024




August 2024 Crawl Archive Now Available




The Increase of Common Crawl Citations in Academic Research




Host- and Domain-Level Web Graphs May, June, and July 2024




July 2024 Crawl Archive Now Available




Common Crawl Statistics Now Available on Hugging Face




The Environmental Impact of the Cloud - the Common Crawl Case Study




Host- and Domain-Level Web Graphs April, May, and June 2024




June 2024 Crawl Archive Now Available




Dialog and Discovery at AI_dev 2024




May/June 2024 Newsletter




Host- and Domain-Level Web Graphs February/March, April, and May 2024




May 2024 Crawl Archive Now Available




Host- and Domain-Level Web Graphs November/December 2023, February/March 2024, and April 2024




April 2024 Crawl Archive Now Available




March/April 2024 Newsletter




Host- and Domain-Level Web Graphs September/October, November/December 2023 and February/March 2024




February/March 2024 Crawl Archive Now Available




Web Archiving File Formats Explained




A Further Look Into the Prevalence of Various ML Opt-Out Protocols