Jason C


Historical Blockchain Data 2025-07-06

There is a debate in the big-blocker community about whether nodes need to keep the entire blockchain history. I call the two sides "Pruners" and "Archivists".

  1. Pruners argue that it is not necessary to keep the entire history of the blockchain, and at scale it is not feasible to do so.

  2. Archivists argue that it is important to keep the entire history of the blockchain, and that archiver companies can exist to provide this service.

SPV and Transaction Verification

The original bitcoin whitepaper talks about "Simplified Payment Verification" (SPV) clients, which do not require the entire blockchain history to verify transactions. SPV clients only require block headers and merkle proofs to verify transactions.
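To make the merkle-proof idea concrete, here is a minimal sketch in Python. It assumes Bitcoin-style trees (double SHA-256, with the last hash duplicated when a level has an odd count) and works on raw hash bytes. The function names (dsha256, merkle_root, merkle_proof, verify_proof) are mine for illustration, not taken from any wallet or node software.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for merkle trees."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level):
    """Hash adjacent pairs; duplicate the last hash if the count is odd."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Merkle root of a non-empty list of transaction hashes."""
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to rebuild the root from one leaf."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1  # partner in the pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(tx_hash, proof, root):
    """What an SPV client does: rebuild the root using only the proof."""
    current = tx_hash
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            current = dsha256(sibling + current)
        else:
            current = dsha256(current + sibling)
    return current == root
```

The client only needs the root (from the block header) and a proof whose length grows with the logarithm of the number of transactions in the block, which is why SPV works without the full history.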

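One limitation of SPV is that a merkle proof only shows that a transaction was included in a block. It does not tell you whether an output has since been spent or, more importantly, whether it is still unspent. This means that when you receive a transaction directly from a peer (P2P), you must ask a miner whether it is valid.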

Archivists can check if an output is spent by consulting the full blockchain history. Pruners instead rely on the UTXO set, which contains all unspent outputs and allows transaction verification without full history. However, some Archivists argue that the UTXO set alone is insufficient without cryptographic guarantees.
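As a rough sketch of how a Pruner-style check works, the UTXO set can be modeled as a dictionary keyed by (txid, output index). This is a simplification I am assuming for illustration: it ignores scripts, signatures, and fees, and the name apply_tx is hypothetical.

```python
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]  # (txid, output index)


def apply_tx(
    utxos: Dict[Outpoint, int],
    txid: str,
    inputs: List[Outpoint],
    output_values: List[int],
) -> None:
    """Validate a transaction against the UTXO set, then update the set.

    Raises ValueError if any input is missing (already spent or never
    existed), is listed twice, or if the outputs are worth more than the inputs.
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("transaction spends the same output twice")
    for outpoint in inputs:
        if outpoint not in utxos:
            raise ValueError(f"output {outpoint} is spent or never existed")
    if sum(output_values) > sum(utxos[outpoint] for outpoint in inputs):
        raise ValueError("outputs are worth more than inputs")

    # Spent outputs leave the set; new outputs enter it.
    for outpoint in inputs:
        del utxos[outpoint]
    for index, value in enumerate(output_values):
        utxos[(txid, index)] = value


utxos = {("a", 0): 50}
apply_tx(utxos, "b", [("a", 0)], [30, 20])  # spends a:0, creates b:0 and b:1
```

Note that nothing here looks at history: a second attempt to spend ("a", 0) fails simply because the entry is no longer in the set.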

Wallet Backup Considerations

Archivists also argue that pruning puts too much burden on wallets and users. Instead of being able to back up a short seed phrase that can be written down on paper, users must now back up their UTXO set too.

According to the Archivists, if miners stop providing historical data then businesses will likely show up to fill the gap. In that case users could back up just their seed phrase and then pay a business to restore their wallet.

EDIT: Since Pruners would have the UTXO set, in theory that can be used to restore an old wallet – just without historical transactions.

Scaling and Feasibility

One question is how far blockchain usage will scale. Could we get to where transactions are like TCP/UDP network packets, where almost all internet traffic is on-chain?

If blockchain scaled to a certain point, then it would be infeasible to keep the entire history of the blockchain. That would be like keeping every single packet that has ever been sent over the internet.

Data Storage and UTXO Set Size

Many people store data on-chain as an "immutable record", expecting it to persist forever. But if pruning becomes common, that data might disappear. The blockchain would timestamp that the data existed, but not necessarily retain the data itself.

How much scaling improvement do you get from pruning? Some wallets may have thousands of transactions but only a few UTXOs. In those cases you could get a scaling improvement on the order of 1,000:1. The larger the transaction-to-UTXO ratio, the more scaling improvement you get.
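A toy simulation shows where a ratio like that comes from. It assumes a wallet that always spends its single change output when paying, and it counts records rather than bytes, so treat the result as a shape, not a measurement. The function name simulate_wallet is made up for this example.

```python
def simulate_wallet(payments: int):
    """Return (transactions in history, UTXOs the wallet still holds)."""
    history = ["funding"]       # every transaction that touched the wallet
    utxos = {("funding", 0)}    # what a Pruner would actually need to keep

    for n in range(payments):
        txid = f"pay{n}"
        utxos.pop()             # spend the current change output
        utxos.add((txid, 1))    # output 1 is the new change output
        history.append(txid)

    return len(history), len(utxos)


tx_count, utxo_count = simulate_wallet(2999)
ratio = tx_count / utxo_count  # transaction-to-UTXO ratio
```

With 2,999 payments after the funding transaction, the history holds 3,000 transactions while the wallet holds one UTXO, so the ratio is 3,000:1. A wallet that accumulates many small unspent outputs would see a much smaller ratio.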

Data Incentives

In a Pruner world, if non-UTXOs are removed and people still want to use blockchain for data storage, then one option is to put data in UTXOs. Instead of using OP_RETURN, or similar storage mechanisms, you could put the data in a small spendable output. Then as long as that output is not spent, miners will keep it in the UTXO set, making the data available. Could this incentive lead to large growth of the UTXO set for data storage?

Perhaps counter-incentives would arise. For example, miners could evict UTXOs that carry a lot of data but little value once they have been dormant for long enough. Or maybe the sat/B transaction fee alone would be enough to keep the UTXO set size in check.

UTXO Set Verification

If the UTXO set is used to verify transactions, then how do you verify the UTXO set itself? If you have the entire blockchain history, you can verify the UTXO set by checking each transaction. But if you don't have the entire history, how do you know which outputs should be in the UTXO set?

Some people suggest adding UTXO-set "commitments" to the blockchain, arguing they would provide a method to verify the UTXO set without needing the entire history. Others argue it is too computationally expensive and unnecessary.
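To show what a commitment means in the simplest possible terms, here is a naive sketch: hash the whole UTXO set in a canonical order, so that two nodes with the same set get the same digest. The serialization below is my own assumption, not a format from any proposal, and real designs would need something incremental so the digest does not have to be recomputed from scratch every block (which is part of the cost objection).

```python
import hashlib
from typing import Dict, Tuple

Outpoint = Tuple[str, int]  # (txid, output index)


def utxo_commitment(utxos: Dict[Outpoint, int]) -> str:
    """SHA-256 over the UTXO set, serialized in sorted order.

    Sorting makes the digest independent of insertion order. Each field is
    fixed-width or length-prefixed so entries cannot run into each other.
    """
    digest = hashlib.sha256()
    for (txid, index), value in sorted(utxos.items()):
        txid_bytes = txid.encode("utf-8")
        digest.update(len(txid_bytes).to_bytes(2, "little"))
        digest.update(txid_bytes)
        digest.update(index.to_bytes(4, "little"))
        digest.update(value.to_bytes(8, "little"))
    return digest.hexdigest()
```

If a digest like this were embedded in a block, a node that downloaded the UTXO set from a stranger could recompute it and compare, relying on proof of work for the commitment instead of replaying the full history.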

One level of verification is to check the total satoshis in the UTXO set. If that total does not match the total satoshis issued by the chain so far (the sum of the block subsidies up to the current height), then something is wrong. This would not prove 100% that the UTXO set is correct, but it would provide a level of assurance. If there were a dispute about the UTXO set, then the total satoshis could at least be a clue as to who is correct.
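Here is a sketch of that supply check, assuming the chain follows Bitcoin's issuance schedule (50 coins per block, halving every 210,000 blocks). The function names are illustrative. In practice the UTXO total can fall slightly below issuance, for example when outputs are provably unspendable, so this sketch treats exceeding issuance as the hard failure rather than demanding an exact match.

```python
from typing import Dict, Tuple

HALVING_INTERVAL = 210_000             # blocks per subsidy era (assumed schedule)
INITIAL_SUBSIDY = 50 * 100_000_000     # 50 coins, in satoshis


def block_subsidy(height: int) -> int:
    """Subsidy for the block at a given height."""
    halvings = height // HALVING_INTERVAL
    return 0 if halvings >= 64 else INITIAL_SUBSIDY >> halvings


def issued_supply(tip_height: int) -> int:
    """Total satoshis issued in blocks 0 through tip_height inclusive."""
    total = 0
    remaining = tip_height + 1
    era = 0
    while remaining > 0:
        subsidy = block_subsidy(era * HALVING_INTERVAL)
        if subsidy == 0:
            break
        blocks = min(remaining, HALVING_INTERVAL)
        total += blocks * subsidy
        remaining -= blocks
        era += 1
    return total


def supply_is_plausible(utxos: Dict[Tuple[str, int], int], tip_height: int) -> bool:
    """False if the UTXO set claims more satoshis than were ever issued."""
    return sum(utxos.values()) <= issued_supply(tip_height)
```

A UTXO set that passes can still be wrong (coins could be assigned to the wrong owners with the same total), which is why this is only a clue and not a proof.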

Conclusion

The debate between Pruners and Archivists will have a big impact on the future of blockchain technology. There are trade-offs to both sides. It will be interesting to see how this debate plays out.

Personally, I have been more of an Archivist. If blockchain is a global financial ledger, then I think it is important to keep all the accounting records. But I also see the practical challenges of doing so. Balancing scalability, usability, and data integrity is a complex problem.

