Why would you want to use blockchain to build a database solution? And how would you actually do that? BigchainDB has answers.
First Wall Street, then the database world. While most people are still trying to wrap their heads around blockchain and its difference from Bitcoin, others are using it in a wide range of domains. Is it hype, a case of having a hammer and seeing problems as nails, or could blockchain actually have a purpose in the database world?
BigchainDB’s creators argue there is a reason, and a way, for blockchain and databases to live happily ever after.
Blockchain was introduced by Bitcoin, which despite its oft-discussed issues has illustrated a novel set of benefits: decentralized control, where “no one” owns or controls the network; immutability, where written data is “forever” tamper-resistant; and the ability to create and transfer assets on the network without relying on a central entity.
The initial excitement surrounding Bitcoin stemmed from its use as a token of value, for example as an alternative to government-issued currencies. Now that the separation between Bitcoin and the underlying blockchain technology is better understood, the scope of the technology and its applications is being extended.
With this increase in scope, single monolithic blockchain technologies are being re-framed into building blocks at four levels of the stack:
1. Applications
2. Decentralized (blockchain) computing platforms
3. Decentralized processing (smart contracts), decentralized storage (file systems, databases), and decentralized communication
4. Cryptographic primitives, consensus protocols, and other algorithms.
Blockchain operations work with data, and that data is also stored as part of the blockchain. For example, when transferring assets from one node to another, the amounts transferred as well as the sender, receiver, and time of transfer are stored. So the option to leverage the benefits blockchain brings by using it as a database is tempting.
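To make the idea of data living inside the chain concrete, here is a minimal Python sketch of transfer records linked by hashes. It is an illustration of the general hash-chaining technique, not Bitcoin's or BigchainDB's actual data format; the function names and field names (`append_transfer`, `is_intact`, `sender`, `receiver`, `amount`, `timestamp`) are assumptions made for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block (assumption)


def block_hash(prev_hash: str, transfer: dict) -> str:
    """SHA-256 over the previous block's hash plus this block's transfer data."""
    payload = json.dumps({"prev": prev_hash, "transfer": transfer}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_transfer(chain: list, sender: str, receiver: str,
                    amount: int, timestamp: str) -> None:
    """Store who sent what to whom, and when, as a new block on the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    transfer = {"sender": sender, "receiver": receiver,
                "amount": amount, "timestamp": timestamp}
    chain.append({"prev": prev_hash, "transfer": transfer,
                  "hash": block_hash(prev_hash, transfer)})


def is_intact(chain: list) -> bool:
    """Recompute every hash; any edit to stored data breaks the links."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(prev_hash, block["transfer"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each block's hash covers the previous block's hash, changing an amount in an old transfer makes `is_intact` return `False` for the whole chain, which is the tamper-resistance property described above.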
The problem is, the blockchain as a database is awful, measured by traditional database standards: throughput is just a few transactions per second (tps), latency before a single confirmed write is 10 minutes, and capacity is a few dozen GB. Furthermore, adding nodes causes more problems: with a doubling of nodes, network traffic quadruples with no improvement in throughput, latency, or capacity. Plus, the blockchain essentially has no querying abilities.
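The quadrupling of traffic can be checked with a few lines of arithmetic. The sketch below assumes a simple model that is not spelled out in the article: every node relays each update to every other node, so the number of messages grows with the square of the node count. The function names are illustrative.

```python
def broadcast_messages(nodes: int) -> int:
    """Messages sent when each node relays one update to every other node."""
    return nodes * (nodes - 1)


def traffic_ratio_when_doubling(nodes: int) -> float:
    """How much traffic grows when the node count doubles, under this model."""
    return broadcast_messages(2 * nodes) / broadcast_messages(nodes)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        ratio = traffic_ratio_when_doubling(n)
        print(f"{n} -> {2 * n} nodes: traffic grows {ratio:.2f}x")
```

Under this assumed model the ratio approaches 4 as the network grows, while every node still stores and processes the same data, so none of the extra traffic buys additional throughput or capacity.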
How could that possibly ever work? Trent McConaghy and his co-founders at BigchainDB have tackled this issue by turning it on its head: instead of using blockchain as a database, they are taking a database and adding blockchain features to it. They initially built on RethinkDB because of its clean and efficient node update protocol.
Under the hood, BigchainDB utilizes two distributed databases, S (transaction set or “backlog”) and C (blockchain), connected by the BigchainDB Consensus Algorithm (BCA). The BCA runs on each signing node, with signing nodes forming a federation. Non-signing clients may connect to BigchainDB, and depending on permissions they may be able to read, issue assets, transfer assets, and more.
Each of the distributed DBs, S and C, is an off-the-shelf big data DB. BigchainDB does not interfere with their internal workings, so it gets to leverage their scalability properties, as well as features like revision control and benefits like battle-tested code. Each DB is running its own internal consensus algorithm for consistency.
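The flow between the two stores can be sketched in a few lines of Python. This is a toy model, not the BigchainDB Consensus Algorithm itself: plain lists stand in for the distributed databases S and C, and the majority-vote rule, the class name `Federation`, and the method names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Federation:
    signing_nodes: list                            # names of the signing nodes
    backlog: list = field(default_factory=list)    # stands in for database S
    chain: list = field(default_factory=list)      # stands in for database C

    def submit(self, tx: dict) -> None:
        """A client writes a transaction into the backlog (S)."""
        self.backlog.append(tx)

    def make_block(self, votes: dict) -> bool:
        """Move the backlog into a new block on the chain (C).

        `votes` maps a signing node's name to True/False. Requiring a strict
        majority of signing nodes is an assumption of this sketch.
        """
        approvals = sum(1 for node in self.signing_nodes if votes.get(node, False))
        if not self.backlog or approvals * 2 <= len(self.signing_nodes):
            return False  # rejected: transactions stay in the backlog
        block = {"height": len(self.chain), "transactions": list(self.backlog)}
        self.chain.append(block)
        self.backlog = []
        return True
```

In this sketch a block approved by only one of three signing nodes is rejected and its transactions remain in the backlog; once two of three approve, the transactions move onto the chain. Votes from names outside `signing_nodes` are ignored, mirroring the distinction between signing nodes and non-signing clients.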
At this point BigchainDB has moved towards using MongoDB, and is in fact in a partnership with them. But why MongoDB? It could have been any other open source distributed database. “We did consider a number of DBs, but we wanted a document DB to begin with as we’re working with JSON at this point, and MongoDB is an obvious choice.”
But, again, isn’t BigchainDB afraid that combining the notorious blockchain with the recently targeted MongoDB could raise multiple red flags in terms of security? McConaghy has openly acknowledged that the underlying DB may be a security vulnerability at this point, but is neither critical of MongoDB nor apologetic.
“MongoDB has been clear about providing ease of access by removing hard security, so it’s not their fault if people left their installations on the internet unsecured. As for us, at this point we are no better or worse than a centralized solution, and we will definitely add improved security features before moving to production,” he says.
BigchainDB works by offering an API on top of the underlying database, with the aim of acting as a substrate-agnostic layer that adds the key blockchain features of decentralization, immutability, and asset transferability. But that leads to some interesting issues.
For example, what if users would like to use a different database as a substrate? BigchainDB offers a Service Provider Interface that can be used to plug in other databases. It is what has been used to integrate and operate on top of MongoDB, and according to McConaghy it could also be used to do the same with any other database, be it relational, key-value, or anything else.
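A plug-in layer of this kind typically boils down to a small contract that each database adapter implements. The Python sketch below shows the general pattern only; `StorageBackend`, `InMemoryBackend`, and `Ledger` are hypothetical names invented for this example and are not BigchainDB's actual Service Provider Interface.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StorageBackend(ABC):
    """Hypothetical contract that each underlying database adapter implements."""

    @abstractmethod
    def put(self, key: str, record: dict) -> None:
        """Persist a record under the given key."""

    @abstractmethod
    def get(self, key: str) -> Optional[dict]:
        """Return the record stored under key, or None if absent."""


class InMemoryBackend(StorageBackend):
    """Stand-in adapter; a real one would wrap a document or relational store."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def put(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def get(self, key: str) -> Optional[dict]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None


class Ledger:
    """Substrate-agnostic layer: append-only writes on top of any backend."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def record(self, tx_id: str, tx: dict) -> None:
        if self._backend.get(tx_id) is not None:
            raise ValueError(f"transaction {tx_id} already exists and cannot be overwritten")
        self._backend.put(tx_id, tx)

    def lookup(self, tx_id: str) -> Optional[dict]:
        return self._backend.get(tx_id)
```

Swapping databases then means writing another `StorageBackend` subclass while `Ledger` stays unchanged. Note that only the simplest operations fit such a narrow contract, which hints at why a unified query interface is the harder part.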
Of course, that is easier said than done, and brings up another issue: querying. Although BigchainDB’s querying support is not fully operational at this point, the goal is to offer one unified querying interface over whatever underlying database nodes BigchainDB may be using. That is a hard problem to solve, as not all databases have the same query languages or capabilities.
However, the current trend towards feature convergence in the database world, and in particular the renewed interest in SQL as the standard for querying, may offer a way out of this. Even so-called NoSQL databases like MongoDB offer SQL capabilities these days, so this is the most promising way forward for BigchainDB as well: a SQL interface.
At this point, BigchainDB queries are mostly done by directly using MongoDB’s API, but this is a sort of hack that tightly couples BigchainDB to MongoDB, so it is seen as an interim solution that will eventually give way to querying via BigchainDB’s own API.
As should be evident by now, BigchainDB is not a typical database by any measure. It is also not a typical startup run by a typical founder. McConaghy has a hacker ethos and a rich background in AI that dates from before it was cool: “doing AI in the 90s was one of the least popular things one could possibly do, so I certainly didn’t do it for the hype.”
McConaghy could have been part of the Facebooks of the world had he chosen to, as he has actually turned down such offers. This is not what drives him, and by extension BigchainDB. The drive behind BigchainDB is not getting to a successful exit or IPO, but rather reshaping the internet and the world at large.
McConaghy believes that centralization leads to concentration of power, citing examples such as social media ownership and control of data or the conundrum that both creators and consumers of art, and content in general, face on the internet.
This is what McConaghy’s previous venture, Ascribe, was about: helping digital artists transfer ownership of their work to customers. Although whether this is really applicable to everyday art like music or videos is unclear, Ascribe aims to provide a solution for digital artists with unique creations and collectors that want to own them, and uses decentralization to achieve this. At some point Ascribe’s evolution gave birth to BigchainDB.
Some might say this is an overly complicated solution, but McConaghy is not one to shy away from complexity. When asked for his take on Numerai, for example, and the criticism that has been leveled at it, he is adamant: “I don’t think it’s overly complicated, on the contrary, I think it’s brilliant, maybe the best combination of blockchain and AI out there. I think they are doing a really good job of aligning incentives for founders, employees and users. Think of Facebook, what if it operated on the basis of giving its users a stake in the value it generates? This is what Numerai is doing, and in the process it is bringing a shift in the power structure and creating incentives for cooperation. So it is turning a zero-sum game to a positive-sum game.”
So where on that long and winding road is BigchainDB at the moment? Berlin-based BigchainDB has raised a total of 5 million euros, with a recent series A of 3 million. It is working in close collaboration with a number of early adopter clients, including the likes of RWE and the Internet Archive.
The Internet Archive, along with other organizations such as Open Media and the Human Data Commons Foundation, is also one of the caretakers of IPDB, or Inter-Planetary DB: a public instance of BigchainDB, used to collectively store and manage content in a safe and decentralized way. IPDB has an equally grand vision: its goal is to be a database for the internet.
For the Internet Archive, for example, it would mean moving away from traditional storage technology and towards the decentralized and cooperative storage model that BigchainDB stands for. As the Internet Archive is looking into options such as moving its data to Canada to avoid data sovereignty issues, the potential of adding immutability on top of decentralized storage is appealing.
For RWE, on the other hand, the stakes are a bit different. Traditionally, large electric utilities would connect energy producers with energy consumers. Deregulation changes things, as anyone can now connect to anyone. RWE is getting in front of that by exploring several blockchain projects, such as energy exchanges, electric car charging, and billing.
BigchainDB has recently released version 0.9, and its roadmap for 2017 is to reach a stable version 1.0 in the summer and to have fully operational, production-ready open-source and enterprise versions available by the end of the year.
Whether that goal is feasible, and whether BigchainDB’s grand vision is likely to be achieved, remains to be seen. The company certainly does not lack ambition or skill, however.
Addendum, March 8th 2017: After the article was published, we received the following clarification from BigchainDB’s CEO regarding scalability:
“When we first released BigchainDB, we gave too strong of an impression that it was *already* doing 1M writes/s whereas that was actually just in the underlying database (RethinkDB at the time), though we had designed the algorithm such that BigchainDB could eventually hit that (after more hardening and optimizations).
“After feedback, we revised things to set a more appropriate expectation: *towards* 1M writes/s. And we also discovered that users didn’t care as much about that benefit compared to other benefits, like high capacity and usability; so we spent more of our resources towards user asks than towards 1M writes/s so far. (That is however still in the roadmap; it’s just not a priority).”