Speeding Up Ethereum: A New Database Design

Author: Denis Avetisyan


Researchers have developed a novel database architecture that dramatically boosts transaction throughput and reduces storage demands for Ethereum-compatible blockchains.

The Ethereum state database evolves through iterative updates, constructing a new state, $x_{2}$, from a prior version, $x_{1}$, by modifying the value $v_{4}$ to $v_{4}^{\prime}$ associated with a specific key, $b \cdot f$.

This paper introduces a forkless database leveraging a mutable Merkle Patricia Trie and optimized data pruning to minimize read amplification and storage overhead.

Modern blockchain design prioritizes fast consensus and finality, yet Ethereum’s original state database architecture – built for forking chains – remains a performance bottleneck for newer, non-forking systems. This paper introduces ‘A Fast Ethereum-Compatible Forkless Database’, a novel state database implementation optimized for modern blockchains while maintaining compatibility with the Ethereum ecosystem. By employing a native database design with a mutable tree structure and optimized data organization, our approach achieves tenfold speedups and 99% space reductions for validators. Could this represent a critical step towards scaling blockchain technology and reducing the resource demands of network participation?


The Inevitable Storage Bottleneck

As blockchain technology matures and adoption increases, traditional database approaches struggle to accommodate the exponentially growing volume of data. Each new block added to the chain necessitates replication across numerous nodes, creating substantial storage demands and network congestion. This inherent design, while ensuring immutability and security, results in diminishing scalability; the more data stored, the slower transaction processing becomes. Consequently, throughput is constrained, and the cost of maintaining a fully replicated database becomes prohibitive, hindering the widespread implementation of blockchain solutions for applications requiring high transaction rates or large datasets. The limitations of current architectures necessitate innovative data storage solutions capable of handling the increasing demands without compromising the core principles of decentralization and security.

Current blockchain data storage solutions frequently encounter performance limitations due to a phenomenon known as read amplification. This occurs when a single logical read operation necessitates multiple physical reads from the storage device, significantly increasing latency and reducing throughput. The root cause often lies in the immutable nature of blockchain data and the way it’s structured – data is appended sequentially, leading to fragmented storage and inefficient access patterns. Coupled with suboptimal storage utilization – where a substantial portion of storage capacity remains unused due to data organization – these factors create a bottleneck that hinders scalability. Each transaction requires accessing and verifying historical data, and as the blockchain grows, the overhead associated with these amplified reads and inefficient storage becomes increasingly pronounced, ultimately impacting the speed and cost of processing transactions.

The future scalability of blockchain technology hinges on resolving limitations within its underlying data storage mechanisms. Current blockchain systems, while secure and transparent, struggle to process transactions at a rate commensurate with widespread adoption; typical throughput is capped around 165 transactions per second. This bottleneck arises from the inefficient manner in which data is stored and accessed, demanding significantly more reads and writes than necessary. Without a more efficient data storage layer – one that minimizes read amplification and optimizes storage utilization – blockchain applications will continue to face constraints, hindering their potential in areas like decentralized finance, supply chain management, and beyond. Innovations in storage are therefore not merely incremental improvements, but rather foundational requirements for unlocking the full capabilities of blockchain technology and ensuring its viability as a mainstream platform.

LiveDB utilizes a distributed architecture enabling real-time data access and collaborative editing.

Mutable Trees: A Pragmatic Approach

LiveDB departs from the immutable data structures typically employed in blockchain technology by utilizing a mutable tree-based database. This approach allows for direct data modification, eliminating the need for complete data replication with each state change and enabling significantly faster data access and processing. Unlike traditional blockchain storage which relies on appending new data to an ever-growing chain, LiveDB’s tree structure facilitates targeted updates and deletions, improving storage efficiency and reducing the computational burden associated with data retrieval. The mutable nature of the database necessitates mechanisms for maintaining data consistency and integrity, which are addressed through techniques such as versioning and checkpointing.
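
As a rough illustration of the difference this design makes, the Go sketch below (hypothetical names, not the LiveDB implementation) contrasts a copy-on-write trie update, which clones every node along the key path for each state change, with an in-place update that only touches the affected leaf.

```go
// Minimal sketch (not the paper's code): copy-on-write vs. in-place trie updates.
package main

import "fmt"

// node is a simplified trie node keyed by one byte of the path.
type node struct {
	children map[byte]*node
	value    []byte
}

// putImmutable returns a fresh copy of every node on the path (copy-on-write),
// the pattern used by classic immutable Merkle Patricia Trie implementations.
func putImmutable(n *node, key, val []byte) *node {
	cp := &node{children: map[byte]*node{}, value: n.value}
	for k, v := range n.children {
		cp.children[k] = v
	}
	if len(key) == 0 {
		cp.value = val
		return cp
	}
	child := cp.children[key[0]]
	if child == nil {
		child = &node{children: map[byte]*node{}}
	}
	cp.children[key[0]] = putImmutable(child, key[1:], val)
	return cp
}

// putMutable modifies nodes in place; only the touched leaf changes,
// so no copies of the path accumulate between checkpoints.
func putMutable(n *node, key, val []byte) {
	if len(key) == 0 {
		n.value = val
		return
	}
	child := n.children[key[0]]
	if child == nil {
		child = &node{children: map[byte]*node{}}
		n.children[key[0]] = child
	}
	putMutable(child, key[1:], val)
}

func main() {
	root := &node{children: map[byte]*node{}}
	putMutable(root, []byte{0xb, 0xf}, []byte("v4'"))
	root2 := putImmutable(root, []byte{0xb, 0xf}, []byte("v4''"))
	// The mutable root still holds v4'; the copy-on-write update yielded a new root.
	fmt.Printf("%q %q\n",
		root.children[0xb].children[0xf].value,
		root2.children[0xb].children[0xf].value)
}
```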

LiveDB utilizes a key-value store as its foundational data storage layer, coupled with a file-mapped array to facilitate rapid data access. This combination enables direct memory mapping of data files, eliminating the overhead associated with traditional disk I/O and significantly reducing latency. By structuring data in this manner, LiveDB achieves a reported 10x increase in throughput compared to conventional blockchain data storage solutions. The file-mapped array allows for efficient random access and modification of data, while the key-value store provides a flexible and scalable approach to managing blockchain state. This architecture prioritizes speed and efficiency in handling blockchain transactions and data requests.
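
The file-mapped array idea can be illustrated with a short, Unix-only Go sketch; the file name and fixed record layout are assumptions for the example, not LiveDB’s on-disk format. Once the file is mapped, reading a record is a slice index rather than an explicit I/O call.

```go
// Hedged sketch (Unix-only): memory-mapping a file of fixed-size records.
package main

import (
	"fmt"
	"os"
	"syscall"
)

const recordSize = 32 // assumed fixed record width, e.g. one hash per slot

func main() {
	f, err := os.Open("nodes.dat") // hypothetical data file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// Map the whole file read-only; the OS page cache serves repeated reads.
	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(data)

	// "Read" record i with plain slicing: no explicit read syscall, no copy.
	i := 4
	if int64((i+1)*recordSize) > info.Size() {
		panic("file too small for this sketch")
	}
	record := data[i*recordSize : (i+1)*recordSize]
	fmt.Printf("record %d: %x\n", i, record)
}
```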

LiveDB’s architecture relies on cryptographic hash functions, specifically SHA2-256, to guarantee data integrity by creating unique fingerprints of data blocks; any modification to a block results in a different hash, immediately revealing tampering. Efficient serialization is achieved through Recursive Length Prefix (RLP) encoding, a compact binary format that minimizes data size and optimizes transmission and storage. RLP encodes data by prefixing the length of each element with its value, allowing for efficient parsing and reconstruction of data structures. The combination of hashing and RLP encoding ensures both the authenticity and compactness of data within the LiveDB system, contributing to its performance characteristics.
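
The two primitives described above can be shown in a few lines of Go. The sketch below handles only the simple RLP cases (a single byte below 0x80 encodes as itself; strings of up to 55 bytes get a 0x80+length prefix) and then fingerprints the encoding with SHA2-256; real node serialization also covers lists and long strings.

```go
// Minimal sketch: short-string RLP encoding plus a SHA2-256 fingerprint.
package main

import (
	"crypto/sha256"
	"fmt"
)

// rlpEncodeShort encodes byte strings of 0..55 bytes per the RLP rules;
// longer strings and lists are omitted from this sketch.
func rlpEncodeShort(b []byte) []byte {
	if len(b) == 1 && b[0] < 0x80 {
		return b
	}
	if len(b) > 55 {
		panic("long-string RLP form omitted in this sketch")
	}
	return append([]byte{0x80 + byte(len(b))}, b...)
}

func main() {
	payload := []byte("account-state") // stand-in for a serialized trie node
	encoded := rlpEncodeShort(payload)
	digest := sha256.Sum256(encoded) // any change to the payload changes this hash
	fmt.Printf("rlp: %x\nsha256: %x\n", encoded, digest)
}
```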

Checkpointing and pruning are integral to LiveDB’s data management strategy. Checkpointing periodically saves a consistent snapshot of the mutable tree to persistent storage, providing a recovery point and enabling efficient validation. Pruning removes obsolete data, specifically historical states no longer required for current consensus, thereby reducing storage demands without compromising the ability to reconstruct recent states. The combined implementation of these methods allows LiveDB to sustain a demonstrated transaction throughput of 1479 transactions per second, representing a significant performance improvement over traditional blockchain storage solutions, while managing storage overhead.
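
A checkpoint-then-prune cycle of this kind might look like the Go sketch below; the interval, the store layout, and all names are assumptions for illustration rather than the paper’s API.

```go
// Hedged sketch of a checkpoint/prune loop over recent states.
package main

import "fmt"

const checkpointInterval = 1000 // assumed: snapshot every 1000 blocks

type store struct {
	liveStates map[uint64][]byte // recent states kept for validation
	lastSnap   uint64
}

// commitBlock records the new state, snapshots on the interval boundary,
// and drops states older than the latest snapshot.
func (s *store) commitBlock(height uint64, state []byte) {
	s.liveStates[height] = state

	if height%checkpointInterval == 0 {
		// Checkpoint: persist a consistent snapshot (stubbed here).
		fmt.Printf("checkpoint at height %d (%d bytes)\n", height, len(state))
		s.lastSnap = height

		// Prune: states older than the snapshot are obsolete for consensus.
		for h := range s.liveStates {
			if h < s.lastSnap {
				delete(s.liveStates, h)
			}
		}
	}
}

func main() {
	s := &store{liveStates: map[uint64][]byte{}}
	for h := uint64(1); h <= 3000; h++ {
		s.commitBlock(h, []byte("state"))
	}
	fmt.Println("states retained:", len(s.liveStates))
}
```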

LiveDB organizes data using keys and custom fields (C fields) for flexible data management.

ArchiveDB: Separating the Wheat from the Chaff

ArchiveDB functions as a distinct storage layer specifically designed for historical blockchain data, operating in conjunction with LiveDB, which manages current state. This architectural separation optimizes storage efficiency by allowing historical data, which is rarely accessed during active node operation, to be stored using different compression and data structures than live data. By isolating historical data, ArchiveDB avoids the performance overhead of searching through large volumes of infrequently used information, leading to improved query speeds for current data within LiveDB. This approach allows for a significant reduction in overall storage requirements for full nodes, as demonstrated by a 66% decrease from 2TB to 730GB when using ArchiveDB alone, and a 99% reduction when integrated with LiveDB.
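
One way to picture this separation is a reader that routes state queries by block height: recent heights go to the live store, everything older to the archive. The Go sketch below is illustrative only; the split point and interfaces are assumptions, not the systems’ actual APIs.

```go
// Illustrative sketch: routing state reads between live and archive stores.
package main

import "fmt"

type reader interface {
	Get(height uint64, key []byte) ([]byte, bool)
}

type router struct {
	live, archive reader
	liveFrom      uint64 // oldest height still served by the live store
}

// Get sends recent heights to the LiveDB-style store and older heights
// to the ArchiveDB-style historical store.
func (r *router) Get(height uint64, key []byte) ([]byte, bool) {
	if height >= r.liveFrom {
		return r.live.Get(height, key)
	}
	return r.archive.Get(height, key)
}

// mem is a trivial in-memory reader used only to exercise the router.
type mem map[string][]byte

func (m mem) Get(_ uint64, key []byte) ([]byte, bool) {
	v, ok := m[string(key)]
	return v, ok
}

func main() {
	r := &router{
		live:     mem{"balance": []byte("live value")},
		archive:  mem{"balance": []byte("archived value")},
		liveFrom: 20_000_000, // hypothetical archive boundary
	}
	v, _ := r.Get(19_999_999, []byte("balance")) // historical read
	fmt.Printf("%s\n", v)
}
```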

ArchiveDB employs Vector Commitment schemes to guarantee the integrity and verifiability of historical blockchain data. This cryptographic technique allows for efficient proof that a specific data block is included within a larger dataset without revealing the entire dataset. A commitment is generated for each data block, and these commitments are aggregated into a single, compact proof. Any modification to the original data will invalidate this proof, ensuring data tamper-evidence. Verification can be performed with constant computational cost, regardless of the dataset size, allowing nodes to efficiently validate the authenticity of archived data without requiring full data downloads or extensive computations.
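
The commit / open / verify shape of such a scheme can be sketched with a Merkle tree, as below. This is only a stand-in: its proofs grow logarithmically with the vector length, whereas the vector commitment schemes discussed above aim for constant-cost verification. The tamper-evidence property is the same, though: changing any committed block invalidates the proof.

```go
// Illustrative Merkle-tree stand-in for a vector commitment's interface.
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func hashPair(a, b []byte) []byte {
	h := sha256.Sum256(append(append([]byte{}, a...), b...))
	return h[:]
}

// commit builds the tree bottom-up; levels[last][0] is the commitment.
func commit(leaves [][]byte) [][][]byte {
	level := make([][]byte, len(leaves))
	for i, l := range leaves {
		h := sha256.Sum256(l)
		level[i] = h[:]
	}
	levels := [][][]byte{level}
	for len(level) > 1 {
		if len(level)%2 == 1 { // duplicate the last node on odd levels
			level = append(level, level[len(level)-1])
		}
		next := make([][]byte, len(level)/2)
		for i := 0; i < len(level); i += 2 {
			next[i/2] = hashPair(level[i], level[i+1])
		}
		levels = append(levels, next)
		level = next
	}
	return levels
}

// open returns the sibling path proving that leaf i is in the vector.
func open(levels [][][]byte, i int) [][]byte {
	var proof [][]byte
	for _, level := range levels[:len(levels)-1] {
		if len(level)%2 == 1 {
			level = append(level, level[len(level)-1])
		}
		proof = append(proof, level[i^1])
		i /= 2
	}
	return proof
}

// verify recomputes the root from the leaf and its sibling path.
func verify(root, leaf []byte, i int, proof [][]byte) bool {
	h := sha256.Sum256(leaf)
	cur := h[:]
	for _, sib := range proof {
		if i%2 == 0 {
			cur = hashPair(cur, sib)
		} else {
			cur = hashPair(sib, cur)
		}
		i /= 2
	}
	return bytes.Equal(cur, root)
}

func main() {
	blocks := [][]byte{[]byte("b0"), []byte("b1"), []byte("b2"), []byte("b3")}
	levels := commit(blocks)
	root := levels[len(levels)-1][0]
	proof := open(levels, 2)
	fmt.Println("valid:", verify(root, []byte("b2"), 2, proof))
	fmt.Println("tampered:", verify(root, []byte("bX"), 2, proof))
}
```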

The integration of Zero-Knowledge Proofs (ZKPs) into the ArchiveDB system significantly enhances data privacy and security during the archival process. ZKPs allow for verification of data integrity without revealing the underlying data itself. Specifically, ArchiveDB utilizes ZKPs to prove that archived blockchain data conforms to predefined rules and hasn’t been tampered with, without disclosing the transaction details or account balances. This is achieved by generating a succinct proof that can be independently verified by any party, ensuring data authenticity and confidentiality. The use of ZKPs minimizes the risk of data breaches and maintains user privacy while providing a robust and verifiable historical record of blockchain transactions.

Separating live blockchain data from historical data in ArchiveDB significantly reduces the storage requirements for full node operation. Current full node databases typically require approximately 2TB of storage; ArchiveDB reduces this to 730GB, representing a 66% storage reduction. When integrated with LiveDB, which manages current state, the overall storage reduction reaches 99%. This streamlined architecture lowers the operational costs associated with maintaining a full node by minimizing disk space and related infrastructure needs, while still retaining complete blockchain history for verification and analysis.

LiveDB and ArchiveDB exhibit comparable throughput, demonstrating that archiving does not significantly impact performance.

Beyond Blockchain: A Foundation for Efficient Data Handling

The LiveDB and ArchiveDB systems represent a deliberate departure from conventional database architectures, most notably those employing the Log-Structured Merge Tree (LSM Tree) found in systems like LevelDB. While LSM Trees offer advantages in write-heavy workloads, they can introduce performance bottlenecks due to amplification – the need to rewrite data multiple times as it moves through different layers of the tree. This inherent limitation becomes particularly pronounced in the context of blockchain data, where immutability and consistent read performance are paramount. LiveDB and ArchiveDB address these challenges by prioritizing mutability and employing optimized data structures designed to minimize write amplification and enhance read speeds, ultimately offering a more efficient foundation for managing the ever-growing demands of blockchain applications.

The architecture of LiveDB and ArchiveDB prioritizes data mutability and employs specialized data structures to overcome challenges inherent in blockchain data management. Traditional databases often struggle with the constant writing and modification of data blocks characteristic of blockchain technology; this design directly addresses those limitations. By embracing mutability as a core principle, the system efficiently handles data updates without the performance penalties associated with immutable data stores. Furthermore, optimized data structures, distinct from conventional approaches, minimize storage overhead and accelerate data access, leading to substantial improvements in overall system efficiency and scalability. This focus on adaptable data handling isn’t merely about speed; it’s about building a foundation for blockchain solutions that can realistically accommodate growing datasets and increasing transaction volumes.

The novel database architecture delivers substantial gains in both speed and efficiency for blockchain applications. Performance benchmarks reveal a remarkable tenfold increase in throughput, enabling significantly more transactions per second. Simultaneously, the optimized data structures achieve a 99% reduction in storage requirements compared to the previously utilized database. These improvements aren’t merely incremental; they represent a fundamental shift towards scalability, allowing blockchain systems to handle growing datasets and user bases without prohibitive costs or performance bottlenecks. This advancement positions the architecture as a key enabler for the next generation of blockchain technologies and potentially opens doors for broader applications in data management.

The architectural innovations within LiveDB and ArchiveDB, initially conceived for blockchain data management, extend far beyond that single application. The core principles of optimized mutability and specialized data structures address common bottlenecks in any system grappling with large, rapidly changing datasets. Industries such as financial modeling, real-time analytics, and scientific simulations – all characterized by intensive data processing and storage demands – stand to benefit from this approach. By prioritizing efficient data handling over strict immutability, the design unlocks performance gains and reduces storage footprints, offering a compelling alternative to traditional database solutions that often struggle with scalability and responsiveness in data-rich environments. This suggests a broader applicability, potentially revolutionizing how various data-intensive applications are built and maintained.

LiveDB provides persistent storage, ensuring data is retained even after system interruptions.

The pursuit of database optimization, as detailed in this work, echoes a sentiment felt across countless production deployments. The paper’s focus on minimizing read amplification and storage – through a mutable tree structure – isn’t about achieving elegance, but about delaying the inevitable entropy. It’s a pragmatic battle against the forces of scale. As Edsger W. Dijkstra observed, ā€œIt’s not enough to have good intentions; one must also be lucky.ā€ Luck, in this case, manifesting as clever data pruning and hash organization. The core idea – a forkless blockchain – is merely a postponement of complexity, a temporary reprieve from the constant rebuilding that defines the lifecycle of any system. It’s a memory of better times, briefly extended.

What’s Next?

The pursuit of a faster, smaller blockchain state database invariably encounters the law of diminishing returns. This work, by rearranging hashes and muttering incantations over Merkle Patricia Tries, achieves notable gains. But every optimization will, inevitably, be optimized back. The real challenge isn’t merely reducing read amplification – it’s acknowledging that production will always discover novel ways to amplify it. Future iterations will likely focus not on fundamentally new data structures, but on adaptive pruning strategies – intelligent forgetting, if you will – that balance historical completeness against the relentless march of block height.

The promise of a ā€˜forkless’ blockchain is a compelling one, but the term itself deserves scrutiny. Architecture isn’t a diagram; it’s a compromise that survived deployment. Truly seamless upgrades are a mirage. The goal isn’t to eliminate forks entirely, but to reduce their cost – to make them surgical rather than catastrophic. Research will likely shift towards mechanisms for safely and efficiently rolling back or correcting state in the face of unforeseen consequences – essentially, building in the capacity for controlled failure.

Ultimately, this work, like all its predecessors, adds another layer to the complexity. It doesn’t solve the blockchain trilemma; it simply reshuffles the trade-offs. The field doesn’t refactor code; it resuscitates hope. The next iteration won’t be about a better database, but about a more honest accounting of its limitations.


Original article: https://arxiv.org/pdf/2512.04735.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
