How to Get Cardano Data Without Infrastructure
Anyone can use this quick solution to get structured tables with clear information about what happens on the Cardano blockchain
With the ongoing support of the Cardano Foundation, BloxBean has just released Yaci Store 3.0.0-beta3, featuring the new analytics module. It lets developers export the full chain to open Parquet files that they can easily query on a laptop.
Public data but not quite usable
Cardano is an open, public blockchain: every block, transaction, reward, and vote is there for anyone to see. The catch is that visible is not the same as usable.
If someone wants to study how staking rewards have changed over time, chart transaction fees across epochs, or even look at how governance voting is shaping up, they usually have to:
• run their own Cardano node,
• run an indexer next to it to turn raw chain data into something queryable,
• wait days for it to sync the full history,
• keep a large database running and pay for the servers, storage, and maintenance behind it.
That's a lot of work and real cost just to ask a few questions. For a hobbyist or a student, even for a researcher, a journalist, or a small startup, it often becomes enough of a hurdle to make them give up before they begin. The data may be public, but it has a very heavy door at the entrance.
The Analytics Store in Yaci Store was built to close this gap.
Exporting the chain into open data files
Yaci Store is an open-source Cardano indexer from the BloxBean project. It reads the blockchain and organizes it into clean, structured tables — blocks, transactions, addresses, stake, rewards, governance, and so on.
The Analytics Store adds one important capability on top of that: it continuously exports those tables into Parquet files.
Parquet is a widely used open file format for analytics. It's compact, fast to scan, and supported by almost every data tool in existence. The wider data industry already uses this exact same format for large datasets. So instead of locking Cardano's history inside one running database that only a few can reach, the Analytics Store turns it into a folder of standard data files anyone can copy, share, and analyze.
The entire Cardano mainnet history, exported and compressed this way, currently fits in about 135 GB. The size will grow over time as the chain does, but that’s still impressively little for almost nine years of Cardano history. Every transaction, every reward, every stake snapshot, all in a set of files small enough to fit on a cheap USB drive. You can copy it, hand it to a colleague, and start asking questions in minutes.
Why having the blockchain’s history in your pocket matters
Today, developers typically need the full stack to analyze Cardano: node + indexer + database. The Analytics Store does that heavy lifting once and produces a self-contained set of files ready to use and reuse whenever necessary.
Once the Analytics Store produces the set of files, you don't need the node, the indexer, or the live database to explore the data; a laptop is enough. In other words, you can do serious analytics without running infrastructure.
Upcoming Model Context Protocol (MCP) support, planned for the next beta release, will make access even easier. AI assistants and applications will be able to query the Parquet datasets directly using natural language, helping journalists, researchers, governance participants, hobbyists, and other non-technical users explore Cardano data without writing SQL.
Right now, the Analytics Store already makes the data shareable and reproducible. Because the export is just files in an open format, the dataset is portable. You can copy it, archive it, or publish it. This also means that two people analyzing the same set of files will get the same answers, exactly as needed for research, audits, and reporting.
Anyone running Yaci Store can produce these files themselves. But in the future the files should become readily available even outside the application: A next step will host the Parquet files in a shared, public location everyone can download from.
Imagine a public library of Cardano's analytical history. A developer who wants to create a dashboard, a DeFi tracker, a governance explorer, or a rewards calculator could simply download the files and start building, without ever running their own indexer. That removes one of the biggest costs and complexity barriers in the Cardano ecosystem and lets builders focus on their product instead of plumbing.
Making blockchain data genuinely accessible to everyone — not just to teams who can afford the infrastructure — is the whole point.
How it works
Users don't need to understand the internals to benefit from the data, but here's the shape of it for the curious.
Tables become folders of files
Out of the box, the Analytics Store ships with 47 ready-made exporters — one for each table in Yaci Store. That covers the breadth of Cardano data, including:
• core activity – blocks, transactions, transaction outputs (UTXOs), inputs, scripts, datums, metadata, assets;
• staking and rewards – stake registrations, delegations, epoch stake snapshots, rewards, reserves and treasury movements;
• protocol and economics – the ada pots (reserves, treasury, deposits), protocol parameters, cost models, epoch info;
• governance – DReps, votes, governance proposals, committee activity, constitution, and more.
Each table is written into its own folder of Parquet files.
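To get a feel for what such an export looks like on disk, here is a minimal Python sketch that walks one exported schema folder and reports how many Parquet files and bytes each table holds. It assumes the layout implied by the query paths used later in this article (export path, then schema, then table, then partition folders); the function name summarize_export is hypothetical and not part of Yaci Store, and the layout may differ in DuckLake mode.

```python
from collections import defaultdict
from pathlib import Path


def summarize_export(schema_dir):
    """Return {table_name: (file_count, total_bytes)} for one exported schema folder.

    Assumed layout (taken from the article's example paths, not from Yaci Store's code):
        <schema_dir>/<table>/<partition>=<value>/<file>.parquet
    """
    root = Path(schema_dir)
    totals = defaultdict(lambda: [0, 0])
    for parquet_file in root.rglob("*.parquet"):
        parts = parquet_file.relative_to(root).parts
        if len(parts) < 2:
            # A file directly under the schema folder belongs to no table; skip it.
            continue
        table = parts[0]
        totals[table][0] += 1
        totals[table][1] += parquet_file.stat().st_size
    # Sort by table name so the report is stable between runs.
    return {table: (count, size) for table, (count, size) in sorted(totals.items())}
```

Calling summarize_export("data/analytics/main") would then give one entry per exported table, which is a quick way to check that an export is complete before querying it.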
Getting started
The Analytics Store is part of Yaci Store 3.0.0-beta3. Anyone already running Yaci Store can start exporting in three simple steps:
- Enable the analytics profile. Analytics export is turned on by activating the analytics Spring profile:
# Docker: set in config/env
SPRING_PROFILES_ACTIVE=analytics

# Zip (JARs): pass to the start script
./bin/start.sh analytics
If you also want rewards and stake-snapshot data in your export, enable the ledger-state profile alongside it (ledger-state,analytics). Files are written to ./data/analytics by default; change the location with yaci.store.analytics.export-path.
- Let it run. On mainnet, exports begin automatically once the initial sync reaches the chain tip. Yaci Store then writes finalized data to Parquet, partitioned by day and epoch, staying about two days behind the tip.
- Query it with DuckDB (or any Parquet tool):
SELECT epoch, COUNT(*)
FROM read_parquet('data/analytics/main/transaction/**/*.parquet', hive_partitioning=true)
GROUP BY epoch;
Two ways the data is partitioned and organized
Big datasets are easier to work with when split into sensible chunks. The Analytics Store uses two natural strategies for Cardano.
• Daily partitions — used for the 31 tables that flow continuously, like transactions, UTXOs, blocks and address activity. Files land in folders like date=2026-06-01/.
• Epoch partitions — used for the 16 tables naturally tied to Cardano's ~5-day epochs, like the ada pots, stake snapshots and rewards. Files land in folders like epoch=450/.
This layout tidies things up. More importantly, though, it makes queries faster because a tool can skip straight to the days or epochs it requires instead of reading everything.
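The sketch below spells out what "skipping straight to the days or epochs" amounts to: given a date range or an epoch range, only a handful of folder paths are relevant. It is a minimal Python illustration that assumes the date=YYYY-MM-DD/ and epoch=N/ folder naming shown above; the helper names are hypothetical, and in practice a Parquet-aware engine does this pruning for you when you filter on the partition column.

```python
from datetime import date, timedelta


def daily_partition_globs(table_dir, start, end):
    """Glob patterns for every daily partition from start to end, inclusive."""
    days = (end - start).days
    if days < 0:
        raise ValueError("end must not be before start")
    return [
        f"{table_dir}/date={(start + timedelta(days=offset)).isoformat()}/*.parquet"
        for offset in range(days + 1)
    ]


def epoch_partition_globs(table_dir, first_epoch, last_epoch):
    """Glob patterns for every epoch partition from first_epoch to last_epoch, inclusive."""
    return [
        f"{table_dir}/epoch={epoch}/*.parquet"
        for epoch in range(first_epoch, last_epoch + 1)
    ]
```

For three days of transactions, daily_partition_globs("data/analytics/main/transaction", date(2026, 6, 1), date(2026, 6, 3)) yields three patterns, one per day folder, no matter how many years of history sit beside them. DuckDB's read_parquet also accepts a list of paths, so a list like this could be passed to it directly.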
Only finalized data, no rollback surprises
Blockchains can briefly roll back their very latest blocks as the network settles. To keep the exported files stable and trustworthy, the Analytics Store deliberately stays a little behind the chain tip and doesn’t provide real-time alerting. It only exports finalized, immutable data.
By default, the export lags the tip by about two days. This period, however, can be configured with the finalization-lag-days setting.
The approach trades a little freshness for a guarantee: What's in the Parquet files will never suddenly change. For analytics and research especially, such stability is crucial.
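As a rough illustration of the lag, the following Python sketch works out which daily partitions would count as finalized on a given day. The cutoff rule here (today minus the lag, inclusive) is an assumption made for the example; the exact boundary Yaci Store applies may differ, and both function names are hypothetical.

```python
from datetime import date, timedelta


def latest_exportable_day(today, finalization_lag_days=2):
    """Most recent calendar day treated as finalized under the assumed cutoff rule."""
    if finalization_lag_days < 0:
        raise ValueError("finalization_lag_days must not be negative")
    return today - timedelta(days=finalization_lag_days)


def is_exportable(partition_day, today, finalization_lag_days=2):
    """True if the daily partition is old enough to be written under the assumed rule."""
    return partition_day <= latest_exportable_day(today, finalization_lag_days)
```

With the default lag, a run on 3 June 2026 would treat 1 June as the newest exportable day and leave 2 June for later.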
Bring your own tool to use the data
Because Parquet is an industry-standard format, developers are not tied to any single piece of software when using the Analytics Store. Virtually any modern data tool that understands Parquet can read these files directly.
The easy on-ramp: DuckDB
The simplest way to start is DuckDB, a free, lightweight analytics engine that runs directly on a laptop without any server to set up. You can point it straight at the files and run plain SQL.
For example, transaction statistics per epoch will look like this:
SELECT epoch,
       COUNT(*) AS tx_count,
       SUM(fee) AS total_fees,
       AVG(fee) AS avg_fee
FROM read_parquet('data/analytics/main/transaction/**/*.parquet', hive_partitioning=true)
GROUP BY epoch
ORDER BY epoch;
Or consider instead the top stake pools by delegated stake in a given epoch:
SELECT pool_id,
       COUNT(DISTINCT address) AS delegators,
       SUM(amount) AS total_stake
FROM read_parquet('data/analytics/main/epoch_stake/epoch=450/*.parquet')
GROUP BY pool_id
ORDER BY total_stake DESC
LIMIT 10;
Note that the main in the path is just the source schema name; adjust the path to wherever you pointed your export.
These examples run against the full mainnet dataset. Because each query reads only the slice of data it needs, partition-aware queries like these typically return in well under a second.
Two storage modes: DuckLake (default) and plain Parquet
Out of the box, the Analytics Store runs in DuckLake mode. DuckLake is a catalog layer that tracks the exported tables and their metadata and adds niceties like ACID transactions, time-travel queries, and schema evolution. You can treat the whole export like a tidy, queryable data lake and reference tables by name (for example, analytics.block) instead of juggling file paths.
If you prefer the simplest possible setup, you can switch to plain Parquet mode (yaci.store.analytics.storage.type=parquet), which writes the same partitioned Parquet files without a catalog.
Either way the output is open Parquet you can query with any tool. DuckLake even writes its files under the same export path, so you can always bypass the catalog and read them directly.
And whenever your needs grow, the same files will work with the heavyweight tools of the data world: Apache Spark, Trino, ClickHouse, Polars, pandas, and cloud query engines like Amazon Athena or Google BigQuery all read Parquet natively. You can start on a laptop and, without changing the data, move the exact same files into a large cloud setup. Nothing about the format locks you in.
Custom exporters for every need
The 47 built-in exporters mirror Yaci Store's tables one-to-one. But real analytics questions often span several tables at once, and you may require a dataset tailored specifically to your use case.
For that, the Analytics Store lets you define custom exporters entirely through a simple YAML configuration file, with no code required. You write the SQL query you want (including joins across tables), give it a name and a partition strategy, and the Analytics Store produces a fresh Parquet dataset for it on the same schedule as the built-in tables.
See the example of a dataset that joins transactions with their metadata:
yaci:
  store:
    analytics:
      custom-exporters:
        - name: tx_with_metadata
          partition-strategy: DAILY
          query: >-
            SELECT t.tx_hash, t.block, t.slot,
            to_timestamp(t.block_time) AS block_time,
            t.fee, m.label, m.body
            FROM {source}.transaction t
            JOIN {source}.transaction_metadata m ON t.tx_hash = m.tx_hash
            WHERE t.slot >= {start_slot} AND t.slot < {end_slot}
Your query can use a few placeholders that the exporter fills in for each partition: {source} (the source schema), {start_slot} / {end_slot} (the partition's slot range), and {epoch} (for epoch partitions).
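To make the placeholder idea concrete, here is a small Python sketch of how a query template could be filled in for one partition. It only illustrates the substitution and is not Yaci Store's actual implementation; the function name fill_placeholders and the sample slot values are hypothetical.

```python
def fill_placeholders(query, source, start_slot=None, end_slot=None, epoch=None):
    """Substitute the documented placeholders into a custom-exporter query template.

    Illustrative only: plain string replacement is used instead of str.format so
    that any other braces in the SQL are left untouched.
    """
    values = {
        "{source}": source,
        "{start_slot}": start_slot,
        "{end_slot}": end_slot,
        "{epoch}": epoch,
    }
    for placeholder, value in values.items():
        if placeholder not in query:
            continue
        if value is None:
            raise ValueError(f"query uses {placeholder} but no value was given")
        query = query.replace(placeholder, str(value))
    return query
```

For a daily partition, a call such as fill_placeholders(template, "main", start_slot=100, end_slot=200) would turn the WHERE clause of the example above into t.slot >= 100 AND t.slot < 200, so each run reads only the rows belonging to that partition.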
Custom exporters are switched on with the custom-exporters profile alongside analytics (for example, ./bin/start.sh analytics,custom-exporters). Because it's just configuration, anyone running Yaci Store can tailor the exported data to their own analytics or product without touching the indexer's code.
Who this is for
Developers creating applications that interact with the Cardano blockchain might seem like the primary users for Yaci’s Analytics Store, but we’ve actually built it with a much larger audience in mind. The solution addresses the needs of multiple groups, such as:
• researchers and academics studying staking economics, decentralization, fee markets, or governance participation;
• builders prototyping dashboards, explorers, and analytics products without first standing up a full indexing stack;
• data scientists plugging Cardano history straight into the tools they already use — pandas, Spark, notebooks — for modeling and machine learning;
• the general audience as a whole, sharing a common, reproducible dataset so everyone works from the same numbers, and potentially saving real infrastructure cost by downloading published files instead of every team running its own indexer.
Dune has already put the Analytics Store to use. Cardano datasets published on Dune show how Parquet-based exports can power dashboards, research, and ecosystem analytics without requiring every team to maintain its own indexing infrastructure.
One honest caveat for the current version of the Analytics Store: until a public community mirror is available, you still need to run Yaci Store yourself to generate the files.
Cardano's data has always been public. The Analytics Store makes it genuinely open. It’s small enough to copy, simple to query, and free for anyone to build on. Check the repository of this BloxBean open-source Cardano indexer, developed in collaboration with the Cardano Foundation, and discover all details.