Securing Web3 Data Availability with Blockchain
1/ Making Data Available for Web3 Users with Blockchain
The centralized nature of Web2 data hosting frameworks leaves web users without meaningful access to their own data. Web2 cloud hosting companies, notably Google Cloud, offer Back-end-as-a-Service (BaaS) by hosting the backend code that powers web applications. Centralized control of this backend code enables Google Cloud to build walled gardens that restrict users’ access to their data, which in turn allows Google to secure and monetize that data. All this is set to change with the dawn of Web3, which promises users autonomy and control over their data through the use of blockchain for data availability.
Leading the transition of data hosting frameworks from the Web2 model to the Web3 model is Amazon, whose Amazon Managed Blockchain platform lets users join public blockchain networks or create and manage scalable private blockchain networks using popular open-source frameworks such as Hyperledger Fabric and Ethereum.
With this, let’s explore the concept of blockchain data availability including the underlying consensus protocols, the pertinent risks as well as how these risks can be addressed.
2/ Data Availability vs Data Retrievability
Conceptually, data availability is related to, yet distinct from, data retrievability. Data availability refers to the access that nodes of a blockchain network have to transaction data while that data is still awaiting consensus to be added to a block. Data retrievability, on the other hand, refers to the access that nodes have to the network’s historical data, i.e. data that has already secured the consensus required to be added to the chain.
In terms of technical complexity, data availability is harder to guarantee than data retrievability. Retrievability only requires that at least one honest node stores the historical data, whereas availability requires a majority of the network’s nodes to be able to obtain and validate the data proposed for a new block.
Data Availability Landscape of Ethereum (Source: Blog.Celestia.org)
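To make the difference in honesty assumptions concrete, here is a minimal Python sketch. Nodes are modelled as plain dictionaries with hypothetical `honest`, `stores_history` and `has_block_data` flags; it illustrates the two thresholds described above (one honest node versus an honest majority), not how any real client tracks its peers.

```python
def is_retrievable(nodes: list[dict]) -> bool:
    """Historical data is retrievable if at least one honest node still stores it (1-of-N)."""
    return any(n["honest"] and n["stores_history"] for n in nodes)


def is_available(nodes: list[dict]) -> bool:
    """Proposed block data counts as available only if a strict majority of nodes can validate it."""
    validators = sum(1 for n in nodes if n["honest"] and n["has_block_data"])
    return 2 * validators > len(nodes)


# Hypothetical three-node network.
nodes = [
    {"honest": True, "stores_history": True, "has_block_data": False},
    {"honest": False, "stores_history": False, "has_block_data": True},
    {"honest": True, "stores_history": False, "has_block_data": False},
]
print(is_retrievable(nodes))  # True: one honest archive node is enough
print(is_available(nodes))    # False: no honest majority can see the new block's data
```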
3/ Blockchain Network Communications for Data Availability
The decentralised nature of blockchain networks and the distributed nature of blockchain-based records mean that network data is shared among participants, i.e. the nodes of the network. Combined with the automated recording functions of blockchain networks and the immutability of their records, this makes it clear why blockchain has much to offer Web3 users in terms of data availability. The main features of blockchain protocols that support data availability are as follows.
(1) Peer-to-Peer (P2P) Structure
The P2P structure of blockchain networks allows nodes to communicate directly with one another. This bypassing of any central entity facilitates not only the fluidity of communications but also the transparency of the data communicated.
(2) Consensus Agreement Mechanisms
The [consensus agreement mechanisms](https://www.investopedia.com/terms/c/consensus-mechanism-cryptocurrency.asp) of blockchains, such as Proof-of-Work (PoW) and Proof-of-Stake (PoS), require transactions to be validated. This supports data availability by identifying the validating nodes, which are the nodes from which a transaction’s data would originate.
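As an illustration of how a PoS-style mechanism can tell every node which validator is responsible for the next block, here is a minimal sketch of stake-weighted proposer selection from a shared seed. The function name, the stake table and the seed format are hypothetical, and real chains use more elaborate schemes (Ethereum, for example, derives its randomness from RANDAO); the point is only that every node computes the same answer independently.

```python
import hashlib


def select_proposer(stakes: dict[str, int], seed: bytes) -> str:
    """Pick a validator with probability proportional to stake, deterministically from a shared seed."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    # Every node hashes the same seed, so every node derives the same point in [0, total).
    point = int.from_bytes(hashlib.sha256(seed).digest(), "big") % total
    cumulative = 0
    for validator in sorted(stakes):  # fixed iteration order on every node
        cumulative += stakes[validator]
        if point < cumulative:
            return validator
    raise AssertionError("unreachable: point is always below the total stake")


stakes = {"alice": 60, "bob": 30, "carol": 10}
proposer = select_proposer(stakes, b"block-1024")
print(proposer)  # one of the three validators, identical on every node using this seed
```

Sorting the validator names matters: without a fixed order, two nodes holding the same stake table could walk it differently and disagree on the proposer.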
(3) Gossip Protocol
Certain blockchain networks use the gossip protocol, which is modelled on how information spreads through social networks and how viruses spread in epidemics. A node shares data with a particular group of peers, which then pass the information onwards to other groups of nodes. This asynchronous mode of data distribution is efficient, since each node only sends data to a few peers rather than broadcasting to the whole network, and it has a high degree of fault tolerance, since the data travels along multiple distribution paths at once.
Gossip Protocol (Source: Twitter.com/Proassetz Exchange)
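The fan-out behaviour of gossip can be sketched as a small simulation. This is a simplified push-gossip model under our own assumptions: time is divided into discrete rounds for simplicity (real gossip is asynchronous), each node forwards to a fixed, hypothetical `fanout` of random peers, and failed nodes silently drop messages. Production implementations such as libp2p’s gossipsub add peer scoring, deduplication and mesh maintenance on top of this idea.

```python
import random


def gossip(num_nodes: int, fanout: int, failed: set[int],
           max_rounds: int = 50, seed: int = 0) -> tuple[int, set[int]]:
    """Push gossip: each round, every informed node forwards the message to `fanout` random peers."""
    if 0 in failed:
        raise ValueError("the originating node (0) must be live")
    rng = random.Random(seed)
    live = set(range(num_nodes)) - failed
    informed = {0}  # node 0 originates the message
    rounds = 0
    while informed != live and rounds < max_rounds:
        rounds += 1
        # Iterate over a snapshot so nodes informed this round start forwarding next round.
        for node in list(informed):
            for peer in rng.sample(range(num_nodes), fanout):
                if peer in live:  # messages sent to failed nodes are simply lost
                    informed.add(peer)
    return rounds, informed


rounds, informed = gossip(num_nodes=100, fanout=3, failed={5, 17, 42})
print(f"{len(informed)} of 97 live nodes informed after {rounds} rounds")
```

Even with three nodes down, the message typically reaches every live node within a handful of rounds, because each node can receive it along several independent paths.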
4/ Risks to Data Availability
Whilst the decentralised nature of blockchain networks helps ensure their data availability, it also leaves them vulnerable to the following risks.
(1) Data Withholding Attack
A data withholding attack occurs when a malicious block producer publishes a block (or its header) but withholds some or all of the underlying block data. Besides undermining data availability, this prevents other nodes from verifying the contents of the block, which allows the malicious node to subvert the network’s protocol rules by pushing through invalid state transitions. Data withholding is related to, but distinct from, selfish mining, in which a miner withholds entire blocks to gain an advantage in block rewards.
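Why withheld data blocks verification can be sketched with a Merkle commitment: the header carries only a root hash, and a node can check that header only by recomputing the root from the full data. The chunking, function names and the simplified Merkle construction below are hypothetical and not any specific chain’s block format.

```python
from __future__ import annotations

import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: list[bytes]) -> bytes:
    """Simplified binary Merkle root; an odd level duplicates its last hash."""
    level = [sha256(c) for c in chunks] or [sha256(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_block(header_root: bytes, received: list[bytes | None]) -> bool:
    """A node can only check the header's commitment if every chunk was actually published."""
    if any(chunk is None for chunk in received):
        return False  # data withheld: nothing to verify the header against
    return merkle_root(received) == header_root


chunks = [b"tx1", b"tx2", b"tx3"]
root = merkle_root(chunks)  # the producer publishes only this root in the header
print(verify_block(root, chunks))                  # True: full data published
print(verify_block(root, [b"tx1", None, b"tx3"]))  # False: tx2 withheld
```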
(2) Hard Fork
In contrast with soft forks, which introduce backward-compatible changes to a blockchain’s rules, hard forks introduce changes that are not backward-compatible and can therefore result in the creation of a new blockchain. After a hard fork, the data of the original blockchain branches into two separate sets that share a common history up to the fork. In terms of data availability, this means that nodes of the network may be unable to access the data on the newly forked blockchain.
Soft Forks vs Hard Forks (Source: Koinly.io)
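The “common history, then two branches” structure of a hard fork can be illustrated by comparing two chains block by block. Here each chain is a hypothetical list of block identifiers; a real client would compare block hashes and walk parent links rather than list positions.

```python
def fork_point(chain_a: list[str], chain_b: list[str]) -> int:
    """Return how many leading blocks the two chains share before they diverge."""
    shared = 0
    for block_a, block_b in zip(chain_a, chain_b):
        if block_a != block_b:
            break
        shared += 1
    return shared


original = ["genesis", "b1", "b2", "b3-old-rules", "b4-old-rules"]
forked = ["genesis", "b1", "b2", "b3-new-rules"]

height = fork_point(original, forked)
print(height)                              # 3: genesis, b1 and b2 are common history
print(original[height:], forked[height:])  # data that exists on only one branch
```

Everything before `height` is readable from either branch; everything after it exists on only one, which is exactly the data a node following the other branch may be unable to reach.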
(3) Node Failures
The [failure of nodes](https://www.ibm.com/docs/en/spectrum-scale/5.0.5?topic=clusters-node-failure) on a blockchain network, whether caused by technical issues such as hardware failures and software bugs or by security threats such as malicious attacks, could make the data stored on the failed nodes inaccessible.
5/ Addressing the Risks to Data Availability
The risks to the data availability of blockchain networks can be addressed through the use of the mechanisms below.
(1) Full Nodes
The risks to data availability arising from data withholding attacks can be addressed through full nodes, which maintain a complete copy of every transaction carried out on a blockchain network. Because full nodes verify the network’s data independently, they can detect any inconsistency in the chain of data caused by a malicious node’s failure to publish data. In this way, full nodes help address the risks to data availability arising from data withholding attacks.
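The independent check a full node performs can be sketched as walking its local copy of the chain and confirming that each block points to the hash of the block before it. The `Block` layout and helper names are hypothetical, and real full nodes also re-execute every transaction, which this sketch omits.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    height: int
    prev_hash: str
    payload: str

    def digest(self) -> str:
        data = f"{self.height}|{self.prev_hash}|{self.payload}".encode()
        return hashlib.sha256(data).hexdigest()


def build_chain(payloads: list[str]) -> list[Block]:
    """Link each block to its predecessor by storing the predecessor's hash."""
    chain = [Block(0, "0" * 64, payloads[0])]
    for height, payload in enumerate(payloads[1:], start=1):
        chain.append(Block(height, chain[-1].digest(), payload))
    return chain


def first_broken_link(chain: list[Block]) -> int | None:
    """Return the index of the first block whose parent link does not match, or None if intact."""
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].digest():
            return i
    return None


chain = build_chain(["genesis", "tx-a", "tx-b", "tx-c"])
print(first_broken_link(chain))  # None: the full node's copy is consistent
chain[1].payload = "tampered"
print(first_broken_link(chain))  # 2: block 2 no longer links to the altered block 1
```

A missing block shows up the same way: if block 1 were never published, block 2’s `prev_hash` would not match anything the node holds.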
A project that addresses data availability risks using the full node mechanism is Polygon Avail. Polygon Avail performs data availability checks that ensure block producers can only release block headers if the required data can be found in the underlying full nodes. In this way, Polygon Avail ensures that a block is only agreed upon, i.e. considered valid by the network, if the data behind it is accessible, so that it can be checked for consistency against the data held by other full nodes on the network.
Types of Nodes (Source: GeekFlare.com)
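Avail’s published design pairs erasure coding with data availability sampling (DAS), in which light clients download a few random chunks of a block instead of the whole block. The sketch below illustrates only the sampling idea, under our own assumptions: the block is extended with erasure coding so that any half of its chunks suffices to rebuild it, which means a producer must withhold more than half of the chunks to make the block unrecoverable. It ignores the polynomial commitments real systems use to prove each chunk is correctly encoded, and the function names are hypothetical.

```python
import random


def detection_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that at least one of `samples` independent uniform samples hits a withheld chunk."""
    return 1 - (1 - withheld_fraction) ** samples


def light_client_accepts(available: list[bool], samples: int, rng: random.Random) -> bool:
    """Accept the block only if every randomly sampled chunk can be downloaded."""
    return all(available[rng.randrange(len(available))] for _ in range(samples))


# A 256-chunk extended block where the producer withholds 129 chunks (just over half).
available = [True] * 127 + [False] * 129
rng = random.Random(42)
trials = 10_000
rejected = sum(not light_client_accepts(available, 5, rng) for _ in range(trials))

print(f"simulated detection rate: {rejected / trials:.3f}")
print(f"analytic detection rate:  {detection_probability(129 / 256, 5):.3f}")
print(f"with 20 samples:          {detection_probability(0.5, 20):.7f}")
```

Because the chance of being fooled falls geometrically (roughly halving with each extra sample in this model), a light client can gain high confidence from a few dozen samples without downloading the block.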
(2) Ledger Replication
The risks to data availability arising from hard forks can be addressed through ledger replication, which involves replicating the ledger data across the different chains of a blockchain network. In this way, when the chains separate upon a hard fork, each forked network retains a full set of the original network’s historical data.
A project that addresses data availability risks using the ledger replication mechanism is Covalent. Covalent’s software indexes the entire ledger of a blockchain network, including its smart contracts, wallet addresses and transaction details. The indexed data is then converted into standardized formats, i.e. block-specimens, which allow the data to be queried through a unified API. In this way, Covalent addresses data availability risks by replicating the ledger records of blockchain networks, facilitating cross-blockchain data availability, including for nodes of newly forked blockchains.
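The “normalize, then query through one interface” idea can be sketched as follows. The two raw block formats, the `TxRecord` schema and the query function are all hypothetical; they are not Covalent’s block-specimen format or its API, only an illustration of why a common schema makes data from different chains, including a newly forked one, queryable in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxRecord:
    """Hypothetical chain-agnostic transaction record."""
    chain: str
    block_height: int
    tx_hash: str
    from_address: str
    to_address: str
    value: int


def normalize_chain_a(raw: dict) -> list[TxRecord]:
    # Hypothetical raw format A: {"number": ..., "transactions": [{"hash", "from", "to", "value"}]}
    return [TxRecord("chain-a", raw["number"], tx["hash"], tx["from"], tx["to"], tx["value"])
            for tx in raw["transactions"]]


def normalize_chain_b(raw: dict) -> list[TxRecord]:
    # Hypothetical raw format B: {"height": ..., "txs": [{"id", "sender", "recipient", "amount"}]}
    return [TxRecord("chain-b", raw["height"], tx["id"], tx["sender"], tx["recipient"], tx["amount"])
            for tx in raw["txs"]]


def query_by_address(index: list[TxRecord], address: str) -> list[TxRecord]:
    """One query function works for every chain once records share a schema."""
    return [r for r in index if address in (r.from_address, r.to_address)]


block_a = {"number": 10, "transactions": [
    {"hash": "0xaa", "from": "0xabc", "to": "0xdef", "value": 5},
]}
block_b = {"height": 7, "txs": [
    {"id": "b-01", "sender": "0x123", "recipient": "0xabc", "amount": 2},
    {"id": "b-02", "sender": "0x123", "recipient": "0x456", "amount": 9},
]}

index = normalize_chain_a(block_a) + normalize_chain_b(block_b)
for record in query_by_address(index, "0xabc"):
    print(record.chain, record.block_height, record.tx_hash, record.value)
```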
(3) Redundant Storage
The risks to data availability arising from node failures can be addressed through redundant storage, in which the data of a blockchain network is duplicated into multiple copies that are stored on different nodes. In this way, the failure of some nodes would not significantly affect the data availability of the network, as other nodes would still hold a copy of the data.
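A back-of-the-envelope calculation shows why extra copies help. Assuming, purely for illustration, that each node fails independently with the same probability p, data held as r full replicas is lost only if all r nodes fail, with probability p^r. Real failures are often correlated (shared data centres, shared software bugs), so real-world durability is usually lower than this estimate.

```python
def loss_probability(node_failure_prob: float, replicas: int) -> float:
    """Probability that every replica is lost, assuming independent node failures."""
    return node_failure_prob ** replicas


# Hypothetical 1% per-node failure probability.
for replicas in (1, 2, 3):
    print(replicas, loss_probability(0.01, replicas))
```

The catch is cost: r replicas need r times the storage, which is one common reason storage networks turn to erasure coding instead of plain copies.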
A project that addresses data availability risks using the redundant storage mechanism is BNB Greenfield. BNB Greenfield divides data from blockchain networks into segments, encodes the segments using erasure codes and then saves the encoded data on multiple nodes. Data segmentation and encoding are usually slow because they are computationally intensive. BNB Greenfield addresses this inefficiency through parallel processing, which speeds up the process by encoding data segments concurrently according to their shard sizes.
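The segment, encode and distribute pipeline can be sketched with the simplest possible erasure code: a single XOR parity piece per segment (as in RAID 5), which can rebuild any one lost piece. This is a stand-in chosen for brevity, not Greenfield’s scheme; production systems use stronger codes, such as Reed-Solomon, that survive several lost pieces. The segment size, the number of data pieces `k` and all function names are hypothetical, and the thread pool marks where independent segments can be encoded in parallel.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from functools import partial


def xor_pieces(pieces: list[bytes]) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(pieces[0]))
    for piece in pieces:
        for i, byte in enumerate(piece):
            out[i] ^= byte
    return bytes(out)


def encode_segment(segment: bytes, k: int) -> list[bytes]:
    """Split a segment into k equal pieces (zero-padded) and append one XOR parity piece."""
    size = -(-len(segment) // k)  # ceiling division
    padded = segment.ljust(size * k, b"\0")
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    return data + [xor_pieces(data)]


def recover(pieces: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing piece: it equals the XOR of all the others."""
    missing = [i for i, piece in enumerate(pieces) if piece is None]
    if len(missing) > 1:
        raise ValueError("a single parity piece can only repair one loss")
    repaired = list(pieces)
    if missing:
        repaired[missing[0]] = xor_pieces([p for p in pieces if p is not None])
    return repaired


def encode_object(data: bytes, segment_size: int, k: int) -> list[list[bytes]]:
    """Cut data into segments and encode the independent segments in parallel."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(partial(encode_segment, k=k), segments))


encoded = encode_object(b"example object stored on the network" * 4, segment_size=32, k=4)
damaged = list(encoded[0])
damaged[2] = None  # simulate losing the node that held piece 2
assert recover(damaged) == encoded[0]
print(f"{len(encoded)} segments, {len(encoded[0])} pieces each; piece 2 of segment 0 rebuilt")
```

In CPython, threads will not speed up this pure-Python XOR loop because of the global interpreter lock; swapping in `ProcessPoolExecutor` (a `partial` over a module-level function can be pickled) would spread the work across CPU cores.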
Based on the discussion above, the decentralized nature of blockchain networks and the distributed nature of blockchain-based records clearly facilitate web users’ access to their data. The data availability of blockchain networks therefore allows them to play a central role in realizing the Web3 ideal of user autonomy, which entails granting web users access to and control over their data, both of which require that data to be available in the first place.