A non-profit called Current AI has released the Gap Map, a first attempt to index the entire open source AI ecosystem in one place. Simon Willison reports that the group launched version 0.1 on July 3rd, a couple of days ago. Current AI describes itself as “a global partnership building a public option for AI.” It was founded at the AI Action Summit in Paris in February 2025, and it’s not working on a shoestring: roughly $400 million has already been committed to the effort.
What stands out here is the scale of the cataloging job, and the fact that the raw data is open for anyone to use.
What the Gap Map covers
The v0.1 release documents 421 products in depth, produced by 228 organizations. The breakdown:
- 266 software tools and libraries, the bulk of the map, covering the code layer that most builders touch daily.
- 85 models, the open weights and architectures people actually run.
- 50 datasets, the training and evaluation data underneath the models.
- 20 hardware projects, a smaller but notable slice, since open hardware rarely gets indexed at all.
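As a quick sanity check, the four product types add up to the 421 headline figure. The minimal Python sketch below restates the counts reported for v0.1 and computes each type's share. The dictionary and function names are illustrative, not part of the Gap Map dataset.

```python
# Per-type product counts as reported for Gap Map v0.1.
PRODUCT_COUNTS: dict[str, int] = {
    "software tools and libraries": 266,
    "models": 85,
    "datasets": 50,
    "hardware projects": 20,
}


def total_products(counts: dict[str, int]) -> int:
    """Sum the per-type counts."""
    return sum(counts.values())


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Each type's share of the total, as a percentage."""
    total = total_products(counts)
    return {kind: 100 * n / total for kind, n in counts.items()}


if __name__ == "__main__":
    print(total_products(PRODUCT_COUNTS))  # 421
    for kind, pct in shares(PRODUCT_COUNTS).items():
        print(f"{kind}: {pct:.1f}%")
```

Software accounts for about 63% of the detailed entries, which is what "the bulk of the map" means in practice. Hardware is under 5%.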
Everything is sorted into 14 categories across three layers of the stack: model components, product and UX, and infrastructure. That structure is the point. It’s an attempt to show where open source is strong and where the gaps are, hence the name.
The long tail is huge
The 421 detailed entries are just the researched core. Current AI says another 24,400 artifacts sit in an uncategorized long tail. Those carry no score until someone researches and cites them. So the map you see today is an early cut, not a finished survey. Expect the numbers to grow as more of that backlog gets processed.
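The numbers show how early this cut is. The text says the long tail is "another" 24,400 artifacts, so this sketch assumes those are in addition to the 421 researched entries. On that reading, the index holds roughly 24,800 items, and fewer than one in fifty of them is scored. The additive reading is an assumption, and the function name is illustrative.

```python
# Figures reported for Gap Map v0.1.
RESEARCHED = 421    # detailed, scored entries
LONG_TAIL = 24_400  # uncategorized, unscored artifacts


def scored_fraction(researched: int, long_tail: int) -> float:
    """Share of all indexed artifacts that have been researched and scored.

    Assumes the long tail does not overlap the researched entries.
    """
    return researched / (researched + long_tail)


if __name__ == "__main__":
    print(f"{scored_fraction(RESEARCHED, LONG_TAIL):.1%}")  # 1.7%
```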
The data matters more than the map
Willison says the interactive map is worth a look, but he’s more interested in what sits underneath it. Current AI released the whole dataset under an MIT license in the currentai-org/os-ai-map repository on GitHub: 1,184 YAML files, plus the notebooks, schemas, and scripts used to gather them.
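Because the data is plain YAML, a local clone of currentai-org/os-ai-map is enough to start exploring it. The sketch below uses only the Python standard library. It counts the YAML files and groups them by top-level folder, which gives a first view of how the repository is organized. The `.yaml`/`.yml` extensions, the folder layout, and the script name are assumptions, not anything the project documents here.

```python
import sys
from collections import Counter
from collections.abc import Iterator
from pathlib import Path

# Assumption: data files may use either extension.
YAML_SUFFIXES = {".yaml", ".yml"}


def iter_yaml_files(root: Path) -> Iterator[Path]:
    """Yield paths (relative to root) of YAML files, skipping Git internals."""
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue
        if path.is_file() and path.suffix.lower() in YAML_SUFFIXES:
            yield rel


def yaml_counts_by_folder(root: Path) -> Counter:
    """Count YAML files per top-level folder ('.' for files at the root)."""
    counts: Counter = Counter()
    for rel in iter_yaml_files(root):
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[top] += 1
    return counts


if __name__ == "__main__":
    # Usage (script name is hypothetical): python count_yaml.py path/to/os-ai-map
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    by_folder = yaml_counts_by_folder(repo)
    print("total:", sum(by_folder.values()))
    for folder, n in by_folder.most_common():
        print(f"{folder}: {n}")
```

If the total lands near 1,184, the files are where the script expects them. Any schemas written in YAML would also be counted, so a small difference is not surprising.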
An MIT license on the underlying data is the real story. It means researchers, tool builders, and journalists can pull the dataset, remix it, and build their own views without asking permission. A pretty front-end is useful. Open, reusable data is durable.
Because the files live on GitHub, you can poke at them without any local setup. Willison points to Datasette Lite as a quick way in, and highlights a CSV of 16,185 GitHub repos the project is tracking, loaded straight into the browser-based tool.
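Datasette Lite runs entirely in the browser, so it needs no install. If you would rather check the repository list offline, a few lines of standard-library Python can report the columns and row count of a downloaded copy. The post does not describe the CSV's layout, so the file name and the assumption that the first row is a header are both hypothetical.

```python
import csv
import sys
from pathlib import Path


def read_header(csv_path: Path) -> list[str]:
    """Return the first row of the CSV, assumed to hold the column names."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


def count_data_rows(csv_path: Path) -> int:
    """Count non-empty rows after the header row."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the (assumed) header row
        return sum(1 for row in reader if row)


if __name__ == "__main__":
    # Usage (file name is hypothetical): python repo_rows.py path/to/repos.csv
    path = Path(sys.argv[1])
    print("columns:", read_header(path))
    print("rows:", count_data_rows(path))
```

A count of 16,185 would confirm that you have the same snapshot Willison loaded. A different number may mean the list has changed since then.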
Why this is significant
The open source AI world moves fast and stays messy. Models, datasets, and tools scatter across thousands of repos with no shared map. A funded, non-profit attempt to index all of it, and then hand over the data freely, is the kind of infrastructure the ecosystem has been missing. The $400 million backing suggests this isn’t a weekend side project.
The caveat is that v0.1 is early. With 421 entries scored against a 24,400-item backlog, most of the ecosystem is still uncharted. The value will come from how quickly Current AI works through that tail, and how many people build on the open data in the meantime.
For the full breakdown and links to explore the map and the dataset yourself, check the original write-up from Simon Willison.