Storing Documents on the Blockchain

One question that often comes up when building smart contracts is how to attach documents to them. For non-technical users, attaching documents to payments is a feature that is taken for granted. After all, if you look at your email inbox you will almost surely find an invoice attached to each of your payment confirmations.

Attaching documents to blockchain transactions is not simple. The main problem is that documents (especially PDF documents) can be huge compared to the size of a transaction sent to the network. Blockchains are designed to store small transactions, typically a few hundred bytes in size, not documents that can weigh several megabytes!

Another issue is that the data on the blockchain is visible to everyone. This is great from a transparency point of view, but documents may contain sensitive information that we are not willing to share with the world. This poses an additional challenge: we may want to prove the existence of a document (for example, an itemized invoice) while keeping its content private, to be shared only with trusted parties.

You can use a service like LockerX to create smart-contract transactions that also include document attachments, without being an expert yourself. In the next section, we will describe how we implemented document attachments in LockerX.

Hash it!

As mentioned in the previous section, the challenge in attaching data to blockchain transactions is the limited amount of storage space and the need to keep documents private. Luckily, both problems can be solved with the same tool: hash functions!

Hash functions are mathematical functions that map data of arbitrary size (for example, documents) to values of a fixed size, so even a multi-megabyte PDF is reduced to a short value that fits easily in a transaction. Cryptographic hash functions are hash functions with some additional guarantees:

  • Given a hash value, it’s infeasible to generate a message that produces that hash value,
  • It’s infeasible to produce two different messages that have the same hash value, that is, the function is collision-resistant,
  • A small change to the message produces a completely different hash value.
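
To make these properties concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash function (this post does not say which function LockerX uses), and the sample documents are made up for illustration.

```python
import hashlib


def document_hash(data: bytes) -> str:
    """Return the SHA-256 hash of a document as a hex string (assumed hash function)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical documents, used only for illustration.
small_doc = b"Invoice #1: 10 EUR"
large_doc = b"x" * 5_000_000          # roughly 5 MB of data
tampered_doc = b"Invoice #1: 11 EUR"  # one character changed

for label, doc in [("small", small_doc), ("large", large_doc), ("tampered", tampered_doc)]:
    digest = document_hash(doc)
    # Every digest is 64 hex characters (32 bytes), whatever the input size.
    print(f"{label:>8}: {len(doc):>9} bytes -> {digest}")
```

Whatever the size of the input, the output is always 32 bytes, and the hashes of the original and the tampered invoice have nothing visibly in common.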

Link your document

Computing the hash of a document is not enough; we also need to make sure the hash is included on the distributed ledger. Storing the hash on the ledger is important because it proves that the document existed, in exactly this form, at the time the transaction was recorded:

  • Blockchain ledgers are immutable. Storing a hash value in a ledger is enough to prove that we had that value at the time, since it is practically infeasible to retroactively change a ledger.
  • The document was not altered in any way. Any small change to a document produces a completely different hash value, so a document that still produces the hash stored on the ledger has not been modified. Additionally, since it’s infeasible to find two documents that produce the same hash value, the document must be the same as the original.
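
The check described above can be sketched as follows. This is an illustration only, not LockerX's implementation: it assumes SHA-256, and the `ledger_record` dictionary with its `tx_id` and `document_hash` fields is a hypothetical stand-in for a value read back from a real ledger.

```python
import hashlib
import hmac


def document_hash(data: bytes) -> str:
    """Return the SHA-256 hash of a document as a hex string (assumed hash function)."""
    return hashlib.sha256(data).hexdigest()


def matches_ledger(data: bytes, ledger_hash: str) -> bool:
    """Return True if the document hashes to the value stored on the ledger."""
    return hmac.compare_digest(document_hash(data), ledger_hash.lower())


original = b"Invoice #1: 10 EUR"

# Hypothetical ledger entry: only the hash is stored, never the document itself.
ledger_record = {
    "tx_id": "hypothetical-tx-1",
    "document_hash": document_hash(original),
}

# The unmodified document matches the ledger; an altered copy does not.
print(matches_ledger(original, ledger_record["document_hash"]))
print(matches_ledger(b"Invoice #1: 11 EUR", ledger_record["document_hash"]))
```

Anyone who holds the document and can read the ledger can run this check on their own, without having to trust whoever stores the file.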

One last question is where to store the document itself, since only its hash is stored on the ledger. The answer depends on your use case and on how private your document is. Publicly available documents can be stored unencrypted on distributed file systems such as IPFS, while private documents should be stored encrypted and shared only with trusted parties.

On LockerX, documents are stored securely by us and are shared only with the parties you give access to.

Why not give it a try?

In this post, we described a strategy for using a blockchain to prove the existence of a document at a given point in time. If this sounds exciting to you, make sure to check out a live version of these ideas on LockerX.