The Rise of Blockchain Indexers: Optimizing Data Retrieval to Drive the Development of Web3 Applications

2025-07-29 05:25:04

The Importance of Blockchain Data and the Rise of Indexers

Data plays a key role in blockchain technology and is the foundation for developing decentralized applications. While most discussions currently focus on data availability, data accessibility is equally important but often overlooked.

In the era of modular blockchains, data availability (DA) solutions have become an indispensable component. They ensure that all participants can access transaction data, enabling real-time verification and preserving network integrity. However, a data availability layer works more like a billboard than a database: data is published so that it can be checked, but it is not stored indefinitely and is pruned over time.

Data accessibility, in contrast, concerns the ability to retrieve historical data, which is crucial for building decentralized applications and conducting blockchain analysis. Although it is discussed less often, it is as important as data availability. The two play different but complementary roles in the blockchain ecosystem, and a comprehensive data management approach must address both to support robust and efficient blockchain applications.

Traditional Blockchain Data Retrieval Methods

The emergence of blockchain technology has driven the creation of decentralized applications in many fields. However, building these applications requires access to large amounts of blockchain data, which is difficult and costly to obtain.

For developers, one option is to host and run their own archive nodes. These nodes store all historical blockchain data since the genesis block, giving full access to the data. However, archive nodes are costly to maintain and offer limited query capabilities. Running cheaper non-archive nodes is another option, but these nodes have limited data retrieval capabilities, which may affect how applications operate.

Another approach is to use commercial remote procedure call (RPC) node providers. These providers bear the cost of running and managing the nodes and serve data through RPC endpoints. Public endpoints are free but rate-limited, which can degrade an application's user experience. Private endpoints offer better performance by reducing congestion, but even simple data retrieval can require many round trips between client and node. This makes them request-heavy and inefficient for complex queries. Furthermore, private endpoints are often difficult to scale and lack compatibility across different networks.
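To illustrate why remote procedure call (RPC) access becomes request-heavy, consider collecting every ERC-20 `Transfer` event emitted by one contract over a million blocks using Ethereum's `eth_getLogs` JSON-RPC method. Providers commonly cap the block range or result size per call, so the client has to page through the chain window by window. The sketch below assumes a hypothetical cap of 2,000 blocks per request (real limits vary by provider) and a placeholder contract address; it only builds the request payloads and does not send them.

```python
# keccak256("Transfer(address,address,uint256)"), the topic0 of ERC-20/ERC-721 Transfer events
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

# Hypothetical placeholder contract address, for illustration only
TOKEN_ADDRESS = "0x" + "11" * 20


def build_get_logs_requests(address, topic0, from_block, to_block, max_range=2000):
    """Split [from_block, to_block] into inclusive windows of at most max_range
    blocks and build one eth_getLogs JSON-RPC payload per window.
    max_range is an assumed provider limit, not a protocol constant."""
    if max_range < 1:
        raise ValueError("max_range must be at least 1")
    requests = []
    start = from_block
    request_id = 1
    while start <= to_block:
        end = min(start + max_range - 1, to_block)
        requests.append({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "eth_getLogs",
            "params": [{
                "address": address,
                "topics": [topic0],
                # JSON-RPC block numbers are hex-encoded quantities
                "fromBlock": hex(start),
                "toBlock": hex(end),
            }],
        })
        start = end + 1
        request_id += 1
    return requests


# One million blocks under the assumed 2,000-block cap
reqs = build_get_logs_requests(TOKEN_ADDRESS, TRANSFER_TOPIC, 18_000_000, 18_999_999)
print(len(reqs), "separate eth_getLogs calls")
```

Under these assumptions the client needs 500 calls before it can even start filtering or joining results locally; an indexer that has already extracted these events can answer the same question with a single query.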

Blockchain Indexer: A Better Alternative

Blockchain indexers organize on-chain data and load it into a database for easier querying, which is why they are often called "the search engine of the blockchain." They index blockchain data and make it available through SQL-like query languages. By offering a unified query interface, indexers let developers retrieve the information they need quickly and accurately, greatly simplifying the process.
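A minimal sketch of the core idea follows, assuming the swap events have already been extracted and decoded (the sample rows are hypothetical). The indexer loads events into a relational table once; after that, a question such as "how many swaps and how much input volume per pool over a block range" becomes a single SQL query. Python's built-in `sqlite3` module stands in for the indexer's database here; production indexers use their own storage and query layers.

```python
import sqlite3

# Hypothetical decoded swap events: (block_number, pool, sender, amount_in, amount_out)
EVENTS = [
    (100, "pool_A", "0xalice", 10.0, 9.5),
    (101, "pool_B", "0xbob", 5.0, 4.9),
    (102, "pool_A", "0xbob", 2.5, 2.4),
    (150, "pool_A", "0xcarol", 1.0, 0.9),
]


def build_index(events):
    """Load decoded events into an in-memory table with an index on (pool, block)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE swaps (block_number INTEGER, pool TEXT, sender TEXT, "
        "amount_in REAL, amount_out REAL)"
    )
    conn.execute("CREATE INDEX idx_swaps_pool_block ON swaps (pool, block_number)")
    conn.executemany("INSERT INTO swaps VALUES (?, ?, ?, ?, ?)", events)
    conn.commit()
    return conn


def volume_by_pool(conn, from_block, to_block):
    """Swap count and total input volume per pool within an inclusive block range."""
    rows = conn.execute(
        "SELECT pool, COUNT(*), SUM(amount_in) FROM swaps "
        "WHERE block_number BETWEEN ? AND ? GROUP BY pool ORDER BY pool",
        (from_block, to_block),
    )
    return rows.fetchall()


conn = build_index(EVENTS)
print(volume_by_pool(conn, 100, 120))
```

The one-time extraction cost is paid by the indexer, so each later query is a local database lookup rather than a series of node requests.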

Different types of indexers optimize data retrieval in various ways:

Full Node Indexers: These indexers run full blockchain nodes and extract data directly, ensuring data completeness and accuracy, but they require significant storage and processing power.
Lightweight Indexers: These indexers rely on full nodes to fetch specific data as needed, thus reducing storage requirements but potentially increasing query time.
Specialized Indexers: These indexers are designed for particular types of data or specific blockchains, optimizing retrieval for use cases such as non-fungible token data or decentralized finance transactions.
Aggregated Indexers: These indexers extract data from multiple blockchains and sources, including off-chain information, providing a unified query interface, which is particularly useful for cross-chain applications.
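The "unified query interface" of an aggregated indexer can be sketched as two steps: normalize records from differently shaped sources into one schema, then serve queries over the merged data. The two source formats, the chain names, and the `AggregatedIndex` class below are all hypothetical, chosen only to show the normalization step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Chain-agnostic transfer record exposed by the unified interface."""
    chain: str
    block: int
    sender: str
    recipient: str
    amount: int


def normalize_chain_a(raw: dict) -> Transfer:
    # Hypothetical source A returns dicts with hex-encoded block numbers.
    return Transfer("chain_a", int(raw["blockNumber"], 16), raw["from"], raw["to"], int(raw["value"]))


def normalize_chain_b(raw: tuple) -> Transfer:
    # Hypothetical source B returns (height, src, dst, qty) tuples.
    height, src, dst, qty = raw
    return Transfer("chain_b", height, src, dst, qty)


class AggregatedIndex:
    """One query interface over transfers collected from several chains."""

    def __init__(self) -> None:
        self._by_address = defaultdict(list)

    def add(self, transfer: Transfer) -> None:
        # Index each transfer under both participants.
        self._by_address[transfer.sender].append(transfer)
        if transfer.recipient != transfer.sender:
            self._by_address[transfer.recipient].append(transfer)

    def transfers_for(self, address: str) -> list:
        """All transfers touching an address, across chains, ordered by (chain, block)."""
        return sorted(self._by_address.get(address, []), key=lambda t: (t.chain, t.block))


idx = AggregatedIndex()
idx.add(normalize_chain_a({"blockNumber": "0x10", "from": "0xabc", "to": "0xdef", "value": "5"}))
idx.add(normalize_chain_b((7, "0x123", "0xabc", 9)))
print(idx.transfers_for("0xabc"))
```

A cross-chain application then calls one method instead of integrating a separate client per chain; the per-chain differences stay inside the normalizers.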

Ethereum alone already requires about 3 TB of storage, and as the chain keeps growing, the storage required by archive nodes will keep increasing. An indexing protocol deploys many indexers that can efficiently index large amounts of data and serve queries quickly, something remote procedure calls cannot achieve.

Indexers also support complex queries, making it easy to filter data by different criteria and extract it for further analysis. Some indexers can aggregate data from multiple sources, so cross-chain applications do not need to integrate a separate interface for each chain. Because they are distributed across multiple nodes, indexer networks offer better security and performance, whereas remote procedure call providers may suffer interruptions and downtime due to their centralized nature.

Overall, compared with remote procedure call node providers, indexers make data retrieval more efficient and reliable while sparing developers the cost of running their own nodes. This makes blockchain indexing protocols the preferred choice for decentralized application developers.

Application Scenarios of Indexers

Decentralized applications need to retrieve and read blockchain data to operate their services. This applies to every kind of application, from decentralized finance and non-fungible token platforms to games and even social networks, because these platforms must read data before executing other transactions.

Decentralized Finance

Decentralized finance protocols require various kinds of information to quote prices, rates, fees, and so on. Automated market makers need price and liquidity information for specific liquidity pools to calculate swap rates, while lending protocols need utilization rates to set borrowing rates and debt-to-collateral ratios to trigger liquidations. This information must be fed into the application before it can calculate the rates at which user transactions execute.
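To make concrete which indexed values feed these calculations, here is a sketch using common textbook formulas: a constant-product (x * y = k, Uniswap v2-style) swap quote with the fee taken from the input, pool utilization as borrowed over supplied, and a simplified liquidation check. These are illustrative assumptions; real protocols differ in fee tiers, rate curves, and liquidation rules. The inputs (pool reserves, total borrowed and supplied, debt and collateral values) are exactly the kind of state an indexer would provide.

```python
def constant_product_amount_out(amount_in: int, reserve_in: int, reserve_out: int,
                                fee_bps: int = 30) -> int:
    """Output amount for an x*y=k pool, with a fee in basis points (30 bps = 0.3%)
    deducted from the input. Integer math, rounding down like on-chain code."""
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amount and reserves must be positive")
    amount_in_after_fee = amount_in * (10_000 - fee_bps)
    numerator = amount_in_after_fee * reserve_out
    denominator = reserve_in * 10_000 + amount_in_after_fee
    return numerator // denominator


def utilization_rate(total_borrowed: float, total_supplied: float) -> float:
    """Share of supplied liquidity currently borrowed; lending protocols map this to a borrow rate."""
    return 0.0 if total_supplied == 0 else total_borrowed / total_supplied


def is_liquidatable(debt_value: float, collateral_value: float,
                    max_debt_to_collateral: float) -> bool:
    """Simplified check: liquidate once debt exceeds the allowed fraction of collateral."""
    return debt_value > collateral_value * max_debt_to_collateral


# Hypothetical pool with 100,000 units of token X and 200,000 units of token Y
print(constant_product_amount_out(1_000, 100_000, 200_000))
print(utilization_rate(80, 100))
print(is_liquidatable(85, 100, 0.8))
```

Every argument here is on-chain state that must be read before the quote is computed, which is why stale or slow data translates directly into wrong prices.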

Games

Blockchain games require fast indexing and data access so that users can play smoothly. Only with very fast data retrieval and execution can Web3 games match traditional games in performance and thereby attract more users. These games need data such as land ownership, in-game token balances, and in-game actions. Indexers help ensure a stable data flow and high uptime, supporting a smooth gaming experience.

Non-Fungible Tokens

Non-fungible token (NFT) marketplaces and lending platforms need indexed data to access information such as token metadata, ownership and transfer history, and royalty information. With this data indexed, they can avoid querying each token individually to find its owner or attributes.
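A sketch of how an indexer can derive current ownership without querying each token: replay ERC-721 `Transfer(from, to, tokenId)` events in chain order, treating a transfer from the zero address as a mint and a transfer to the zero address as a burn. The event tuples are assumed to be already decoded; the sample data and helper names are hypothetical.

```python
ZERO_ADDRESS = "0x" + "0" * 40


def build_ownership(transfers):
    """Replay (block, log_index, token_id, from_addr, to_addr) events in chain
    order and return {token_id: current_owner}."""
    owners = {}
    # Sorting by (block, log_index) restores the on-chain ordering of events.
    for _block, _log_index, token_id, from_addr, to_addr in sorted(transfers):
        # Every non-mint transfer must come from the current owner.
        if from_addr != ZERO_ADDRESS and owners.get(token_id) != from_addr:
            raise ValueError(f"inconsistent transfer for token {token_id}")
        if to_addr == ZERO_ADDRESS:
            owners.pop(token_id, None)  # burn
        else:
            owners[token_id] = to_addr  # mint or regular transfer
    return owners


def tokens_owned_by(owners, address):
    """All token IDs currently held by an address, in ascending order."""
    return sorted(token_id for token_id, owner in owners.items() if owner == address)


# Hypothetical decoded events, deliberately out of order
events = [
    (12, 0, 1, "0xa", "0xb"),
    (10, 0, 1, ZERO_ADDRESS, "0xa"),
    (10, 1, 2, ZERO_ADDRESS, "0xa"),
    (15, 3, 2, "0xa", ZERO_ADDRESS),
]
owners = build_ownership(events)
print(owners, tokens_owned_by(owners, "0xb"))
```

Once the replay is done, "which tokens does this wallet hold" is a dictionary lookup instead of one contract call per token.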

Analysis

Indexers provide a way to extract specific data from raw blockchain data, including the smart contract events in each block. This enables more targeted data analysis and, in turn, more comprehensive insights.

For example, perpetual trading protocols can identify which tokens have high trading volume and generate fees, and then decide whether to list perpetual contracts for those tokens on their platforms. Decentralized exchange developers can build dashboards for their products to see which liquidity pools offer the highest returns or the deepest liquidity. They can also create public dashboards that let developers freely and flexibly query any data they want to display in charts.
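As an example of the "highest returns" view such a dashboard might show, the sketch below ranks pools by a simple annualized fee yield: one day of fees times 365, divided by total value locked (TVL). The formula is a common non-compounding approximation rather than any specific dashboard's method, and the pool figures are hypothetical values that would normally come from indexed swap and liquidity events.

```python
def pool_fee_apr(fees_24h_usd: float, tvl_usd: float) -> float:
    """Simple (non-compounding) annualized fee yield: daily fees * 365 / TVL."""
    return 0.0 if tvl_usd <= 0 else fees_24h_usd * 365 / tvl_usd


def rank_pools(pools: dict) -> list:
    """pools maps name -> (fees_24h_usd, tvl_usd); returns [(name, apr)] best first."""
    scored = [(name, pool_fee_apr(fees, tvl)) for name, (fees, tvl) in pools.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical figures an indexer might aggregate from swap and liquidity events
pools = {
    "pool_A": (100.0, 365_000.0),
    "pool_B": (50.0, 73_000.0),
    "pool_C": (0.0, 1_000.0),
}
for name, apr in rank_pools(pools):
    print(f"{name}: {apr:.1%}")
```

Note that the smaller pool_B ranks first: absolute fees alone would mislead, which is why dashboards need both fee and liquidity data from the indexer.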

Overview of Main Blockchain Indexers

The Graph

The Graph was the first indexing protocol launched on Ethereum, making it easy to query transaction data that was previously hard to access. It uses subgraph definitions and filters to collect subsets of blockchain data, such as all transactions related to a specific liquidity pool on a decentralized exchange (DEX).

Using proofs of indexing, indexers stake the network's native token, GRT, to provide indexing and query services, and delegators can delegate their tokens to indexers. Curators signal on high-quality subgraphs, helping indexers decide which subgraphs to index in order to earn the most query fees. As part of its transition toward greater decentralization, The Graph will eventually shut down its hosted service and require subgraphs to migrate to its decentralized network, while providing an upgrade indexer to support the move.

Its infrastructure reduces the average cost per million queries to $40, much lower than the cost of self-hosted nodes. Through file data sources, it also supports indexing on-chain and off-chain data in parallel for efficient data retrieval.

The Graph's indexer rewards have risen steadily over the past few quarters. This is partly due to growing query volume and partly due to a rising token price, amid plans to integrate AI-assisted queries in the future.

Subsquid

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates large amounts of on-chain and off-chain data, protected by zero-knowledge proofs. It operates as a decentralized network of workers in which each node stores data for a specific subset of blocks, so data retrieval is accelerated by quickly identifying the nodes that hold the required data.
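The routing step described above can be sketched as follows. Subsquid's actual assignment and routing logic is not described here, so this assumes a simplified, hypothetical layout in which each worker holds one contiguous, non-overlapping block range; a binary search finds the first responsible worker, and the query range is then split across neighbors.

```python
import bisect


class BlockRangeRouter:
    """Hypothetical router: maps a block range to the workers that store it.
    Assumes each worker holds one contiguous, non-overlapping block range."""

    def __init__(self, assignments):
        # assignments: list of (first_block, last_block, worker_id)
        self._ranges = sorted(assignments)
        self._starts = [first for first, _, _ in self._ranges]

    def workers_for(self, from_block, to_block):
        """Return [(worker_id, start, end)] covering the inclusive range."""
        result = []
        # Index of the last range starting at or before from_block.
        i = bisect.bisect_right(self._starts, from_block) - 1
        block = from_block
        while block <= to_block:
            if i < 0 or i >= len(self._ranges):
                raise LookupError(f"no worker holds block {block}")
            first, last, worker = self._ranges[i]
            if not first <= block <= last:
                raise LookupError(f"no worker holds block {block}")
            end = min(last, to_block)
            result.append((worker, block, end))
            block = end + 1
            i += 1
        return result


router = BlockRangeRouter([(0, 999, "w1"), (1000, 1999, "w2"), (2000, 2999, "w3")])
print(router.workers_for(500, 2100))
```

Because lookups touch only the workers whose ranges overlap the query, a client can fan out requests in parallel instead of scanning the whole chain on one node.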

Subsquid also supports real-time indexing, allowing data to be indexed before a block is finalized. It also supports storing data in formats chosen by developers, making it easier to analyze with various tools. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling no-code deployment.
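Indexing before finality means recent blocks can still be replaced by a chain reorganization, so an indexer must be able to undo data derived from them. Below is a minimal sketch of one common approach, not Subsquid's actual implementation or API (the class and method names are hypothetical). It buffers unfinalized block headers, reports which block hashes to revert when a competing block arrives at an already-seen height, and drops buffered blocks once they are finalized.

```python
class UnfinalizedBuffer:
    """Hypothetical buffer of unfinalized blocks for real-time indexing."""

    def __init__(self):
        self.blocks = []  # (number, block_hash, parent_hash), oldest first
        self.finalized_height = -1

    def add_block(self, number, block_hash, parent_hash):
        """Append a new block; return hashes of blocks whose indexed data must be reverted."""
        if number <= self.finalized_height:
            raise ValueError("cannot reorganize at or below the finalized height")
        keep = [b for b in self.blocks if b[0] < number]
        # The new block must extend the last block we are keeping.
        if keep and keep[-1][1] != parent_hash:
            raise ValueError("parent mismatch: fetch the missing ancestors first")
        reverted = [b[1] for b in self.blocks if b[0] >= number]
        self.blocks = keep + [(number, block_hash, parent_hash)]
        return reverted

    def finalize(self, height):
        """Mark blocks up to height as final; they can no longer be reverted."""
        self.finalized_height = height
        self.blocks = [b for b in self.blocks if b[0] > height]


buf = UnfinalizedBuffer()
buf.add_block(1, "a1", "a0")
buf.add_block(2, "a2", "a1")
buf.add_block(3, "a3", "a2")
print(buf.add_block(2, "b2", "a1"))  # competing block 2 replaces a2 and a3
```

The trade-off is freshness versus rework: data appears immediately, but anything derived from the reverted hashes has to be rolled back when a reorg occurs.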

While still in its testnet phase, Subsquid achieved impressive numbers: more than 80,000 testnet users, over 60,000 Squid indexers deployed, and more than 20,000 verified developers on the network. On June 3, Subsquid launched the mainnet of its data lake.

Beyond indexing, the Subsquid Network data lake can also replace remote procedure calls in use cases such as analytics, ZK/TEE co-processors, AI agents, and oracles.

SubQuery

SubQuery is a decentralized middleware infrastructure network that provides remote procedure call and indexed data services. It initially supported Polkadot and Substrate networks and has since expanded to more than 200 chains. It works much like The Graph, using proofs of indexing: indexers index data and serve query requests, while delegators delegate their tokens to indexers. However, rather than relying on curators, it lets consumers submit purchase orders, which guarantees indexers' revenue.

It plans to introduce SubQuery data nodes that support sharding, so that each node no longer has to continuously sync all new data; this improves query efficiency while moving the network toward greater decentralization. Users can pay roughly 1 SQT as a computation fee per 1,000 requests, or set custom fees for indexers through the protocol.
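The fee arithmetic can be sketched directly from the approximate rate of 1 SQT per 1,000 requests quoted above. The rate is a parameter because indexers can set custom fees; the function names are hypothetical, and converting to USD would require an SQT price, which is not given here. `Decimal` avoids floating-point rounding in token amounts.

```python
from decimal import Decimal, ROUND_FLOOR

# Approximate default rate from the text; custom indexer fees may differ.
SQT_PER_1000_REQUESTS = Decimal("1")


def sqt_fee(num_requests: int, rate: Decimal = SQT_PER_1000_REQUESTS) -> Decimal:
    """Computation fee in SQT for a given number of requests."""
    return Decimal(num_requests) * rate / 1000


def requests_for_budget(budget_sqt: Decimal, rate: Decimal = SQT_PER_1000_REQUESTS) -> int:
    """Whole number of requests a budget covers at the given rate."""
    return int((budget_sqt * 1000 / rate).to_integral_value(rounding=ROUND_FLOOR))


print(sqt_fee(2_500_000))               # fee for 2.5 million requests
print(requests_for_budget(Decimal("10"), Decimal("3")))  # custom rate of 3 SQT per 1,000
```

At the default rate, 2.5 million requests cost about 2,500 SQT; a higher custom rate shrinks the number of requests a fixed budget covers proportionally.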

Since SubQuery launched its token earlier this year, the issuance rewards for nodes and delegators have increased in USD value month over month, reflecting a continuous rise in the number of query services offered on the platform. Since the token generation event, total staked SQT has grown from 6 million to 125 million, highlighting growing network participation.

Covalent

Covalent is a decentralized indexing network in which block specimen producer nodes create copies of blockchain data through batch exports and publish proofs on the Covalent Layer blockchain. Block result producer nodes then refine this data according to established rules, filtering out the data that meets the requirements.

With a unified application programming interface (API), developers can extract relevant blockchain data in a consistent request and response format without writing complex custom queries. The CQT token, settled on Moonbeam, serves as a means of payment for extracting these pre-configured datasets from network operators.

Covalent's rewards appear to show an overall upward trend from the first quarter of 2023 to the first quarter of 2024, partly due to the rise in the price of its token, CQT.

Considerations for Choosing an Indexer

Customizability of Data

Some indexers are general-purpose indexers that provide only standard, pre-configured datasets through application programming interfaces. While they may be fast, they lack the flexibility to provide the custom datasets developers need. An indexer framework, by contrast, allows more customized data processing to meet application-specific needs.

Security

Indexed data must be secure; otherwise, decentralized applications built on these indexers may also be vulnerable to attacks. For example, if transactions


This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
