February 23, 2026

Introducing the New Examples & Resources Browser

We've replaced our old Examples and Use Cases pages with a single searchable, filterable browser. 119 resources from 115 contributors, all in one place. Search, filter by type or language, sort, and share links. We welcome community submissions.
Thom Vaughan
Thom is a Principal Engineer at the Common Crawl Foundation.
An image showing an assortment of colourful gems
We’ve put together a collection of wonderful stuff.

For a long time, finding tools, code, and community projects built on Common Crawl data meant browsing two separate pages on our site: /examples and /use-cases. Both were static, manually maintained lists, and (honestly) starting to show their age. If you wanted to find, say, a Python library for working with WARC files, you had to scroll through one giant list. We weren’t happy with that.

We've replaced both pages with a single, searchable, sortable, filterable Examples & Resources browser.  It's a much better way to explore what the community has built with Common Crawl data, and we think you’ll love it.

What’s new

Today, the browser brings together 119 resources spanning tools, code examples and libraries, articles, presentations, and videos from 115 contributors across the Common Crawl ecosystem. Everything is now in one place, and you can actually find what you're looking for.

Search everything.  The search bar matches across titles, descriptions, authors, and keywords. Looking for something that works with CDX indexes?  Type "CDX" and you'll have your answer in a few keystrokes.
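As an illustration only (this is not the browser's actual code), a search that matches across those four fields can be sketched in a few lines of Python. The `Resource` record shape, the `matches` helper, and the case-insensitive substring matching are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical record shape; the real data model is not described in this post.
    title: str
    description: str = ""
    authors: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def matches(resource: Resource, query: str) -> bool:
    """Case-insensitive substring match across title, description, authors, keywords."""
    needle = query.strip().lower()
    if not needle:
        return True  # an empty search box hides nothing
    # Join fields with newlines so a query cannot match across a field boundary.
    haystack = "\n".join(
        [resource.title, resource.description, *resource.authors, *resource.keywords]
    ).lower()
    return needle in haystack


# Made-up sample entry, for demonstration only.
sample = Resource(
    title="example-index-client",
    description="Query CDX indexes from the command line",
    authors=["A. Contributor"],
    keywords=["index", "cli"],
)
print(matches(sample, "CDX"))
```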

Filter by what matters.  Multi-select dropdown filters let you narrow down by resource type (Tool, Code, Presentation, Video, Article), programming language, keyword, and license.  Filters combine, so you can find all Apache-licensed Python tools with a couple of clicks.
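To make "filters combine" concrete, here is a minimal sketch of one plausible reading: values selected within a single dropdown are alternatives (any may match), while separate dropdowns must all be satisfied. That OR-within, AND-across behaviour, the dictionary field names, and the `passes_filters` helper are assumptions for illustration, not a description of how the browser is implemented.

```python
def passes_filters(resource: dict, selected: dict[str, set[str]]) -> bool:
    """Return True if the resource satisfies every active filter.

    `selected` maps a field name (e.g. "type", "language", "license") to the
    set of values ticked in that dropdown. An empty set means the filter is off.
    """
    for field_name, wanted in selected.items():
        if not wanted:
            continue  # filter not in use
        values = resource.get(field_name, [])
        if isinstance(values, str):
            values = [values]  # treat single-valued fields like one-item lists
        if not wanted.intersection(values):
            return False  # this dropdown has selections and none of them match
    return True


# Made-up entry: an Apache-licensed Python tool.
entry = {"type": "Tool", "language": ["Python"], "license": "Apache-2.0"}
active = {"type": {"Tool"}, "language": {"Python"}, "license": {"Apache-2.0"}}
print(passes_filters(entry, active))
```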

Sort your way.  Sort by title, type, author, or date in either direction.  The default sort puts the newest entries first, because that's usually what you want.
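A rough sketch of that sorting behaviour, with newest-first as the default, might look like the following. The dictionary keys and the `sort_resources` helper are hypothetical, and folding case for text columns is an assumption.

```python
from datetime import date

SORT_KEYS = {"title", "type", "author", "date"}


def _sort_value(resource: dict, key: str):
    value = resource[key]
    # Compare text case-insensitively; dates compare naturally.
    return value.casefold() if isinstance(value, str) else value


def sort_resources(resources: list[dict], key: str = "date", descending: bool = True) -> list[dict]:
    """Return a new list sorted by one column; the default is newest entries first."""
    if key not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {key!r}")
    return sorted(resources, key=lambda r: _sort_value(r, key), reverse=descending)


# Made-up entries, for demonstration only.
entries = [
    {"title": "older tool", "type": "Tool", "author": "B", "date": date(2024, 1, 1)},
    {"title": "Newer talk", "type": "Presentation", "author": "A", "date": date(2025, 6, 1)},
]
print([e["title"] for e in sort_resources(entries)])
```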

Official projects.  Resources maintained by the Common Crawl team are marked with a small gem icon.  You can filter for official projects using the "Official" toggle to quickly find our recommended starting points, including our cc-downloader tool.

Shareable views.  Every combination of search, filters, and sort order is encoded in the URL.  Found a useful filtered view?  Copy the link and share it.  Bookmark it.  Put it in your project's README.  The URL preserves everything.
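For readers curious how view state can live in a URL, here is a small round-trip sketch using Python's standard `urllib.parse`. The parameter names (`q`, `sort`, `dir`, and the filter names) and the `example.org` base URL are placeholders invented for the example; the browser's real query-string format is not documented in this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_view_url(base, query="", filters=None, sort="date", direction="desc"):
    """Encode search text, multi-select filters, and sort order as a query string."""
    params = []
    if query:
        params.append(("q", query))
    # Sort names and values so the same view always yields the same URL.
    for name, values in sorted((filters or {}).items()):
        for value in sorted(values):
            params.append((name, value))  # repeated keys carry multi-select values
    params.append(("sort", sort))
    params.append(("dir", direction))
    return f"{base}?{urlencode(params)}"


def parse_view_url(url):
    """Recover the view state from a URL produced by build_view_url."""
    raw = parse_qs(urlsplit(url).query)
    query = raw.pop("q", [""])[0]
    sort = raw.pop("sort", ["date"])[0]
    direction = raw.pop("dir", ["desc"])[0]
    filters = {name: set(values) for name, values in raw.items()}
    return {"query": query, "filters": filters, "sort": sort, "direction": direction}


url = build_view_url(
    "https://example.org/resources",  # placeholder base URL
    query="CDX",
    filters={"type": {"Tool"}, "language": {"Python"}},
)
print(url)
print(parse_view_url(url))
```

Because the whole state is in the query string, anyone opening the link gets the same search, filters, and sort order without any server-side session.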

Works on mobile.  The desktop view shows a sortable data table, but on smaller screens it switches to a mobile-optimised card layout.  It’s built with phones and tablets in both orientations in mind, so you can browse on the train or settle an argument in a meeting.

A quick tour

Here's the browser in action.  Try searching, clicking the filter buttons, or just browsing:

By the numbers

The collection currently includes 55 code libraries, 37 tools, 12 presentations, 8 videos, and 7 articles.  Python leads the language chart (as it does everywhere in data science), followed by Java, Spark, and Go.  There's plenty for JavaScript, Rust, SQL, and Bash users too.

We’d love your submissions

This page is for the community, and we want it to reflect the full range of what people are building with Common Crawl data.  If you've written a tool, published code, given a talk, or written about working with our datasets, we'd love to include it.

To submit a resource:

  • Use our contact form and tell us about your project.  Include the URL, a brief description, and any relevant details (language, license, etc.).
  • Or join our Discord server and share it in the community channels.  Discord is also a great place to ask questions, get help with Common Crawl data, and connect with other users.

We review submissions and add them to the collection regularly.

What happened to the old URLs?

They're retired, but they still work.  Both old pages now redirect to the new Examples & Resources browser with an HTTP 301 (permanent) redirect, so any bookmarks pointing to either one will land on the new page.  We’re all too familiar with dead links.  All the resources from both pages are included in the new collection, along with a number of entries that were never listed on the old pages.
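If you want to confirm the redirect yourself, one way is to send a request that does not follow redirects and inspect the status code and `Location` header. The sketch below uses Python's standard `http.client`, which never follows redirects on its own. The helper names are made up for the example, and the host in the commented-out call is an assumption about where the old pages lived.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url, timeout=10.0):
    """Send a HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect(status, location):
    """True for a 301 response that also says where the resource moved to."""
    return status == 301 and bool(location)


# Requires network access; the host below is an assumption, not taken from this post.
# status, location = fetch_redirect("https://commoncrawl.org/examples")
# print(status, location, is_permanent_redirect(status, location))
```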

What's next

This is a living resource.  We'll keep adding new projects as we find them and as the community submits them.  If you spot an entry that's out of date, has a broken link, or is missing important details, get in touch.

Cheers! 🎉


Erratum: Content is truncated

Some archived content is truncated due to fetch size limits imposed during crawling. This is necessary to handle infinite or exceptionally large data streams (e.g., radio streams). Prior to March 2025 (CC-MAIN-2025-13), the truncation threshold was 1 MiB. From the March 2025 crawl onwards, this limit has been increased to 5 MiB.

For more details, see our truncation analysis notebook.
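As a rough aid when working with older and newer crawls side by side, the thresholds above can be expressed as a lookup on the crawl ID. This is a sketch under two assumptions: that crawl IDs follow the `CC-MAIN-YYYY-WW` pattern, and that CC-MAIN-2025-13 is the first crawl to use the 5 MiB limit. The `truncation_limit_bytes` helper is hypothetical.

```python
import re

MIB = 1024 * 1024
# Assumption: CC-MAIN-2025-13 is the first crawl with the larger limit.
FIRST_5_MIB_CRAWL = (2025, 13)

_CRAWL_ID = re.compile(r"CC-MAIN-(\d{4})-(\d{2})")


def truncation_limit_bytes(crawl_id: str) -> int:
    """Return the fetch size limit, in bytes, that applied to a given crawl."""
    match = _CRAWL_ID.fullmatch(crawl_id)
    if match is None:
        raise ValueError(f"unrecognised crawl ID: {crawl_id!r}")
    year_week = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so this orders by year, then week.
    return 5 * MIB if year_week >= FIRST_5_MIB_CRAWL else 1 * MIB


print(truncation_limit_bytes("CC-MAIN-2025-13"))
```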