Read about the increase in Common Crawl citations in academic research
Research Papers
Research on Free Expression Online
Jeffrey Knockel, Jakub Dalek, Noura Aljizawi, Mohamed Ahmed, Levi Meletti, and Justin Lau
Banned Books: Analysis of Censorship on Amazon.com
Improved Trade-Offs Between Data Quality and Quantity for Long-Horizon Model Training
Dan Su, Kezhi Kong, Ying Lin, Joseph Jennings, Brandon Norick, Markus Kliegl, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
Web Graph Strategies Against Unreliable News
Peter Carragher, Evan M. Williams, Kathleen M. Carley
Misinformation Resilient Search Rankings with Webgraph-based Interventions
Analyzing the Australian Web with Web Graphs: Harmonic Centrality at the Domain Level
Xian Gong, Paul X. McCarthy, Marian-Andrei Rizoiu, Paolo Boldi
Harmony in the Australian Domain Space
The Dangers of Hijacked Hyperlinks
Kevin Saric, Felix Savins, Gowri Sankar Ramachandran, Raja Jurdak, Surya Nepal
Hyperlink Hijacking: Exploiting Erroneous URL Links to Phantom Domains
Enhancing Computational Analysis
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Computation and Language
Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé, David Griol, Zoraida Callejas
esCorpius: A Massive Spanish Crawling Corpus
The Web as a Graph (Master's Thesis)
Marius Løvold Jørgensen, UiT Norges Arktiske Universitet
BacklinkDB: A Purpose-Built Backlink Database Management System
Internet Censorship
Sadia Nourin et al., University of Maryland
Measuring and Evading Turkmenistan’s Internet Censorship
Internet Security: Phishing Websites
Asadullah Safi, Satwinder Singh
A Systematic Literature Review on Phishing Website Detection Techniques
More on Google Scholar
Curated BibTeX Dataset