Accessing Cardano Blockchain Data with Ledger Sync

09 November 2023 • Activities & Updates
Sebastian Bode
Director of Engineering

Enhancing development with streamlined data provisioning and open-source support

The Cardano Foundation's Engineering Team has recently taken on the work of developing a Java-based data provisioning tool called Ledger Sync, which gives access to Cardano blockchain data. Ledger Sync aims to achieve the same level of data completeness as Cardano DB Sync—also referred to as db-sync—while supplying an option in a programming language well-established among diverse companies as well as within a broad user base. In line with the Foundation’s efforts to foster Cardano’s open source maturity, the framework will be released under an open source license to equip developers and partners with an additional tool for chain indexing, increasing the diversity of the Cardano developer ecosystem.

The Foundation has chosen to publish the repository under the Apache 2.0 license, which allows the use of the framework in commercial and non-commercial applications as well as in other open source tools. The license also provides legal certainty in the form of a patent grant for tools built on or with the framework. Changes to the framework itself have to be marked as such but do not necessarily have to be disclosed publicly. This arrangement ensures optimal accessibility and legal certainty for companies as well as other developers and community members.

Blockchain as a data structure

Many blockchain-based systems face a common challenge: data cannot be retrieved efficiently through random access because the storage is organized as a linked list. For instance, to access the transactions stored in block number 200 on Cardano, an application would have to iterate over the first 199 blocks before reaching the block that contains the desired information. As a blockchain network continuously expands, this approach becomes infeasible. Cardano currently has more than 9.5 million blocks minted on mainnet.
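To make the lookup cost concrete, here is a minimal Python sketch. It is not Ledger Sync code: the Block class and the helpers build_chain, find_by_scan, and build_index are hypothetical names introduced for illustration, and the toy block carries none of the fields of a real Cardano block. The sketch contrasts walking the linked list with looking up a block in an index built once by a chain indexer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Block:
    """Toy block (hypothetical structure, not the Cardano block format)."""
    number: int
    transactions: list
    next: Optional["Block"] = None  # link to the following block


def build_chain(length):
    """Create a linked list of `length` toy blocks, numbered from 1."""
    head = Block(1, ["tx-1"])
    current = head
    for number in range(2, length + 1):
        current.next = Block(number, [f"tx-{number}"])
        current = current.next
    return head


def find_by_scan(head, number):
    """Walk the list from the first block; also report how many blocks were passed."""
    visited = 0
    current = head
    while current is not None:
        if current.number == number:
            return current, visited
        visited += 1
        current = current.next
    raise KeyError(number)


def build_index(head):
    """One full pass over the chain, after which lookups no longer need a scan."""
    index = {}
    current = head
    while current is not None:
        index[current.number] = current
        current = current.next
    return index


head = build_chain(300)
block, visited = find_by_scan(head, 200)  # passes 199 blocks before the target
index = build_index(head)
same_block = index[200]                   # direct lookup by block number
```

The index costs one full pass and extra storage, which is the trade-off every chain indexer makes in exchange for fast queries.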

The fact that certain information is not explicitly stored on the blockchain presents yet another challenge. Due to Cardano’s EUTxO-based accounting model, this applies even to the current or historical balance of a wallet. To determine a balance, an application has to track every transaction output that has gone into and out of the wallet, then aggregate the outputs that remain unspent (UTxOs).
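The aggregation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: transactions are plain dictionaries with hypothetical keys (id, inputs, outputs), addresses are placeholder strings, amounts are in lovelace, and fees, native tokens, and staking are ignored. It does not reflect how Ledger Sync or db-sync store data.

```python
from collections import defaultdict


def replay(transactions):
    """Replay transactions in order and return the outputs that remain unspent.

    Each unspent output is keyed by (transaction id, output index).
    """
    utxos = {}
    for tx in transactions:
        for ref in tx["inputs"]:
            utxos.pop(ref)  # an input consumes a previously created output
        for index, (address, lovelace) in enumerate(tx["outputs"]):
            utxos[(tx["id"], index)] = (address, lovelace)
    return utxos


def balances(utxos):
    """Sum the unspent outputs per address."""
    totals = defaultdict(int)
    for address, lovelace in utxos.values():
        totals[address] += lovelace
    return dict(totals)


# Hypothetical toy history; the first transaction has no inputs for simplicity.
history = [
    {"id": "tx1", "inputs": [], "outputs": [("alice", 100)]},
    {"id": "tx2", "inputs": [("tx1", 0)], "outputs": [("bob", 30), ("alice", 68)]},
    {"id": "tx3", "inputs": [("tx2", 0)], "outputs": [("alice", 29)]},
]

unspent = replay(history)
totals = balances(unspent)
```

No balance is stored anywhere in the history itself. It only emerges after every transaction has been replayed, which is why an indexer that maintains the UTxO set is so useful.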

Until roughly the beginning of 2022, db-sync had been the only available option for builders in the ecosystem to access data from the Cardano blockchain in an efficient way. However, in order to reach the tip of the chain, db-sync also requires the processing of an immense amount of data during the initial sync. As a result, running db-sync is not feasible for every project. Additionally, in many cases projects do not need all the data provided by db-sync, a reality that led to the development of so-called scoped chain indexers, which allow users to specify precisely which data needs to be indexed and made available for a certain application. Projects like Kupo, Scrolls, Oura, and Carp all furnish the community with different ways of doing so.
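The idea behind a scoped chain indexer can be illustrated with a short Python sketch. It does not reproduce the API or matching rules of Kupo, Scrolls, Oura, or Carp. The function index_scoped, the dictionary layout of blocks, and the address strings are all assumptions made for this example.

```python
def index_scoped(blocks, watched_addresses):
    """Index only outputs paid to the watched addresses and skip everything else."""
    index = {address: [] for address in watched_addresses}
    for block in blocks:
        for tx in block["transactions"]:
            for output_index, (address, lovelace) in enumerate(tx["outputs"]):
                if address in index:
                    index[address].append(
                        (block["number"], tx["id"], output_index, lovelace)
                    )
    return index


# Hypothetical toy blocks with placeholder addresses.
blocks = [
    {"number": 1, "transactions": [
        {"id": "tx1", "outputs": [("alice", 100), ("carol", 5)]},
    ]},
    {"number": 2, "transactions": [
        {"id": "tx2", "outputs": [("bob", 30), ("alice", 68)]},
    ]},
]

scoped = index_scoped(blocks, {"alice"})
```

Because only matching outputs are stored, the resulting index stays small, at the price of not being able to answer questions outside the chosen scope.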

On top of those more technical requirements, businesses also frequently have needs regarding the reliability of such data provisioning services. Generally, these requirements relate not to the actual implementation in source code but to how such a system is operated. The Cardano Foundation developed Ledger Sync to address precisely those kinds of business requirements. While it shares some similarities with db-sync, certain design choices make it a better fit for distributed architectures, as they allow for high availability setups. In fact, the Foundation’s new explorer has already put Ledger Sync to use as its underlying data pipeline.

If projects require very specific data from the Cardano blockchain but can forgo features like in-built high availability and offloading of data consolidation, modular chain indexers like Yaci Store, Scrolls, and Kupo present suitable choices. Ledger Sync, on the other hand, equips projects with a reliable option for a Java-based data layer on top of Cardano, especially when projects need dependable access to blockchain-related data based on a framework built in an easily accessible programming language.

The initial design of Ledger Sync aimed to achieve three goals:

  1. Implement the network mini protocols in Java so that network data is accessible in a way native to the Java programming language, without dependencies on other frameworks.
  2. Decouple the pure crawling and data consolidation steps to enable scale-out architectures, meaning that parts of the computation can be offloaded to multiple instances, such as containers or servers. This requires weighing the gains against the overhead of transferring data via a network instead of keeping it locally on a single server.
  3. Implement a node-independent version of the treasury, reserves, and rewards calculation, which the Foundation has recently open sourced.

The original implementation that went live with the beta version of the Foundation’s explorer covered the first two points. The Foundation’s Engineering Team is currently working on the third point as part of another open source project.

Journey to the Ledger Sync repository

The team learned a number of lessons during the first development phase, all of which are already addressed in the version now released under an open source license.

We started developing the network mini protocols in Java from scratch because, at the time the project began, the well-known Yaci library was not yet available in a state we deemed usable. Nonetheless, we replaced the previous network mini protocol implementation with Yaci in the current version to help push the adoption of this community project. The change means that only one Java tech stack needs to be maintained and kept up-to-date with any changes to the protocol.

For coordinating tasks between different services, such as decoupling the crawling from the downstream processing and data consolidation, the team initially used Redis and Apache Kafka, a widely known open source streaming data platform. While this approach proves effective, and both Kafka and Redis provide robust components in a distributed architecture, the deployments are quite costly and can become cumbersome, especially for smaller scale setups. With the latest release, the team removed the hard dependency on Kafka and made it optional. Although Ledger Sync operates without Kafka, developers can use an in-app event mechanism made available by Yaci Store and opt to publish block data to various messaging platforms, such as Kafka and RabbitMQ, then consume it from another application.
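The pattern of an in-app event mechanism with an optional external publisher can be sketched as follows. This is a conceptual Python illustration only and does not mirror the Yaci Store or Ledger Sync API. EventBus, InMemoryBroker, and run are hypothetical names, and the in-memory broker merely stands in for a platform such as Kafka or RabbitMQ.

```python
import json
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe mechanism (hypothetical)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_type, listener):
        self._listeners[event_type].append(listener)

    def publish(self, event_type, payload):
        for listener in self._listeners[event_type]:
            listener(payload)


class InMemoryBroker:
    """Stand-in for an external messaging platform; keeps messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def send(self, topic, message):
        self.topics[topic].append(message)


def run(blocks, broker=None):
    """Crawl blocks and hand them to listeners; the broker is optional."""
    bus = EventBus()
    stored = []
    bus.subscribe("block", stored.append)  # local consolidation step
    if broker is not None:
        # Forward each block as JSON so another application can consume it.
        bus.subscribe("block", lambda block: broker.send("blocks", json.dumps(block)))
    for block in blocks:
        bus.publish("block", block)
    return stored


blocks = [{"number": 1, "txs": ["tx1"]}, {"number": 2, "txs": ["tx2"]}]

local_only = run(blocks)               # works without any external platform
broker = InMemoryBroker()
with_broker = run(blocks, broker)      # same result, plus forwarded messages
```

The crawling loop only knows about the event bus. Whether an external platform is attached is a deployment decision, which keeps small setups lightweight.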

We also initially started with the same database schema used by db-sync. This approach proved useful while developing the Foundation’s explorer in parallel, as it enabled the explorer backend to use real data for testing. However, it also resulted in design challenges for Ledger Sync, as we deviated from the Domain-Driven Design approach and did not create a purpose-built schema for the first Ledger Sync version. While the database schema of db-sync serves its purpose, it supports neither the chosen architecture nor the aim of parallelizing the data processing in Ledger Sync. We are therefore currently working on designing a new schema to replace the present one.

Furthermore, the majority of recent source code additions include support for the Conway era in Yaci. This update enables Ledger Sync, as well as other Java-based tools that rely on Yaci, to work with the latest ledger and network mini protocol version.

The Foundation’s Engineering Team also started a workstream to implement parallel processing of multiple blocks, as well as of transactions within single blocks, during the initial sync process. Parallelizing those computations is not straightforward, as dependencies between UTxOs have to be taken into account, which reduces the performance gained from the parallelization. Additionally, ongoing development efforts include offering blueprints for hosting Ledger Sync in various deployment scenarios, like in Kubernetes-based setups or simply via docker-compose.
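Why UTxO dependencies limit parallelism can be shown with a small Python sketch. It is one possible way to think about the problem, not the approach implemented in Ledger Sync. The function dependency_levels and the transaction layout are assumptions for this example. Transactions are grouped into batches so that each batch only spends outputs created in earlier batches or before the current block, which means the transactions inside one batch could be processed in parallel.

```python
def dependency_levels(transactions):
    """Group transactions (given in block order) into batches.

    A transaction lands one batch after the latest batch that
    contains a transaction whose output it spends.
    """
    level_of = {}
    batches = []
    for tx in transactions:
        # Only inputs created earlier in the same list count as dependencies.
        parents = {ref_tx for ref_tx, _ in tx["inputs"] if ref_tx in level_of}
        level = 1 + max((level_of[parent] for parent in parents), default=-1)
        level_of[tx["id"]] = level
        if level == len(batches):
            batches.append([])
        batches[level].append(tx["id"])
    return batches


# Hypothetical block: txB spends txA, and txD spends both txB and txC.
mixed = [
    {"id": "txA", "inputs": [("old1", 0)]},
    {"id": "txB", "inputs": [("txA", 0)]},
    {"id": "txC", "inputs": [("old2", 1)]},
    {"id": "txD", "inputs": [("txB", 0), ("txC", 0)]},
]

# Worst case: every transaction spends the previous one.
chained = [
    {"id": "tx1", "inputs": [("old", 0)]},
    {"id": "tx2", "inputs": [("tx1", 0)]},
    {"id": "tx3", "inputs": [("tx2", 0)]},
]

mixed_batches = dependency_levels(mixed)
chained_batches = dependency_levels(chained)
```

In the mixed example, four transactions collapse into three batches. In the chained example, every batch holds a single transaction, so no parallel speedup is possible at all.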

The road ahead

Besides its technical features, Ledger Sync comes with a commitment from the Cardano Foundation to continuously maintain and operate the repository as an open source project. Beyond the framework itself as a library and tool, this effort aims at working with community groups to define a standard and a potential reference implementation for decentralized data APIs, where Ledger Sync might be used as the underlying data layer.

Delivering a hosted data API that guarantees high availability, accuracy, and correctness with a professional support model stands as an additional goal. Guaranteeing this would address the needs of organizations and enterprises, making it easier for them to seamlessly implement blockchain solutions and thus fostering greater enterprise adoption. Such an undertaking, however, requires alignment and collaboration with existing projects. Only by doing so does it become possible to leverage the full capacity of the already existing Cardano ecosystem while simultaneously ensuring an approach that supports the diversity of emerging business models.

The Cardano Foundation encourages everyone to try Ledger Sync and engage with our team either via Discord or by filing a feature request or bug report in the Ledger Sync repository. We look forward to your feedback and contributions.