A step closer to a (semi) permanode.
The discussion about permanodes has a long history and always seems to come back. With the release of local snapshots it becomes more and more uncertain whether your data will stay alive in the tangle for longer than 30 days. However, local snapshots also make separation of concerns possible. And this is really great!
For our project www.mysoundsafe.com, just like many other projects, we want to choose whether or not we keep data available on the tangle for application-specific purposes. Snapshotting is, simply put, a pain in the ass when it comes to this, and with local snapshotting activated it becomes harder and harder to keep data alive in the tangle. Ideally we want to selectively keep data and prune what we don’t want to store, but to do so we first need to be able to store everything and move on from there.
So I decided to run a little experiment: implementing a persistence provider other than RocksDB and ZeroMQ, one for an external database. Couchbase!
By using an external persistence provider we can keep the running instance of IRI relatively small without sacrificing storage and data availability capabilities.
- We can run multiple IRI nodes that use a clustered back-end storage, so we can run with near 100% uptime.
- We can set up IRI nodes with different roles: the larger part active on the live tangle and a few IRI nodes exclusively for accessing historical parts of the tangle.
- The API didn’t change!
- We can independently scale the back-end storage service without ever needing to shut down our IRI nodes.
- It opens up research into semi permanodes that can store data selectively.
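The idea behind the list above, several nodes with different roles sharing one external store, can be sketched roughly like this. Every name here (`TransactionStore`, `SharedInMemoryStore`, `Node`) is hypothetical and made up for illustration; this is not IRI's actual persistence interface, and the in-memory map only stands in for an external clustered database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class PluggableStorageSketch {

    /** Hypothetical storage abstraction; NOT IRI's real interface. */
    public interface TransactionStore {
        void save(String hash, byte[] data);
        Optional<byte[]> load(String hash);
    }

    /** Stand-in for an external clustered database (assumption: a plain in-memory map). */
    public static class SharedInMemoryStore implements TransactionStore {
        private final Map<String, byte[]> data = new ConcurrentHashMap<>();

        @Override
        public void save(String hash, byte[] bytes) {
            data.put(hash, bytes.clone()); // copy so callers cannot mutate stored data
        }

        @Override
        public Optional<byte[]> load(String hash) {
            byte[] found = data.get(hash);
            return found == null ? Optional.empty() : Optional.of(found.clone());
        }
    }

    /** Hypothetical node that keeps no data itself and delegates to the store. */
    public static class Node {
        private final TransactionStore store;

        public Node(TransactionStore store) {
            this.store = store;
        }

        public void receive(String hash, byte[] data) {
            store.save(hash, data);
        }

        public Optional<byte[]> lookup(String hash) {
            return store.load(hash);
        }
    }

    public static void main(String[] args) {
        TransactionStore backend = new SharedInMemoryStore();
        Node live = new Node(backend);       // role: follows the live tangle
        Node historical = new Node(backend); // role: serves historical lookups

        live.receive("HASH9A", new byte[] {1, 0, -1});
        // The second node can read what the first one wrote.
        System.out.println(historical.lookup("HASH9A").isPresent());
    }
}
```

Because the nodes only hold a reference to the store, either node could be restarted or replaced without touching the stored data, which is the separation of concerns described above.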
A few months ago, during a great community effort to create a permanode (local snapshotting etc. was all unknown back then), I already wrote a document with some considerations that still hold in this discussion (wide-column, graph, key-value etc. vs CAP): https://docs.google.com/document/d/1cnXv8-aUigxgXpc0aX_WIfq_KEY2REouoreEMPtpOH0/edit?usp=sharing
The conclusion there was to choose RiakDB, with a personal bias: I had a better experience with it than with Couchbase, and its more advanced features (like cross data center replication) are part of the open source version. Community members then pointed out that the Riak team at Basho was in some trouble, with a big chance of Riak being discontinued.
So now… why Couchbase?
- As concluded in the document: a key-value store is the best all-round storage for tangle-type data.
- Couchbase has a master-less setup; for us this means all nodes can participate in write operations. This is in contrast to other setups like MongoDB, which are master-slave and where writes are a bottleneck. Master-less allows for proper horizontal scaling of both writing and reading.
- Since the tangle is asynchronous in nature, having backend nodes and indexes that are not immediately consistent is not an issue. (It is writer consistent, meaning that if you write something to a node, that same node will have a consistent and updated view of the data.)
- It allows for key-index eviction: most systems keep primary keys in memory. For something like the tangle, with very high-cardinality data, this feature is of utmost importance to keep the memory usage of an ever-growing database system in check. (Most data is unused.)
- It is darn easy to set up and manage.
- It can handle binary data. This is important since storing trits in a UTF-8 encoded text format is just not efficient on a massive scale.
- Services (storage, indexing, caching, etc.) are independently scalable. This is very important for performance tuning each specific use case.
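To make the binary-data point a bit more concrete, here is a minimal sketch of why a binary encoding beats tryte text. It assumes one common packing scheme, 5 trits per byte (3^5 = 243 values fit in a byte), and balanced trits with values -1, 0 and 1. The class and method names are made up for this example, and this is not necessarily the encoding IRI or the Couchbase provider uses.

```java
public class TritPackingSketch {

    // 3^5 = 243 <= 256, so five trits fit into one byte.
    static final int TRITS_PER_BYTE = 5;

    /** Packs balanced trits (-1, 0, 1) into bytes, five trits per byte. */
    static byte[] pack(int[] trits) {
        byte[] out = new byte[(trits.length + TRITS_PER_BYTE - 1) / TRITS_PER_BYTE];
        for (int i = 0; i < out.length; i++) {
            int value = 0;
            // Walk from the highest trit down so the lowest index is least significant.
            for (int j = TRITS_PER_BYTE - 1; j >= 0; j--) {
                int idx = i * TRITS_PER_BYTE + j;
                // Shift -1..1 to 0..2; positions past the end are padded with trit 0.
                int digit = idx < trits.length ? trits[idx] + 1 : 1;
                value = value * 3 + digit;
            }
            out[i] = (byte) value;
        }
        return out;
    }

    /** Reverses pack(); tritCount is needed to drop the padding again. */
    static int[] unpack(byte[] bytes, int tritCount) {
        int[] trits = new int[tritCount];
        for (int i = 0; i < bytes.length; i++) {
            int value = bytes[i] & 0xFF; // read the byte as unsigned 0..242
            for (int j = 0; j < TRITS_PER_BYTE; j++) {
                int idx = i * TRITS_PER_BYTE + j;
                if (idx < tritCount) {
                    trits[idx] = value % 3 - 1;
                }
                value /= 3;
            }
        }
        return trits;
    }

    public static void main(String[] args) {
        int tritCount = 8019; // one transaction: 2673 trytes of 3 trits each
        int[] trits = new int[tritCount];
        for (int i = 0; i < tritCount; i++) {
            trits[i] = i % 3 - 1; // arbitrary test pattern
        }
        int textBytes = tritCount / 3; // one ASCII/UTF-8 character per tryte
        int packedBytes = pack(trits).length;
        System.out.println("tryte text: " + textBytes + " bytes, packed: " + packedBytes + " bytes");
    }
}
```

For a single 2673-tryte transaction this comes down to 1604 bytes packed versus 2673 bytes as text, roughly 40% less per transaction before any database overhead.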
This was an effort to show myself, mysoundsafe and the community that it can be done, and I would totally love to get some feedback on it. The project is in its absolute infancy (1 commit) and far from finished, even though most API functionality works normally. Some side notes on the current version:
- No new tests yet (all previous tests pass and it is backwards compatible!)
- It has only been tested on a personal testnet (with coo and all), easily dealing with 40 tx per second (cheating a bit with MWM 3)
- It has been a very long time since I did Java, so constructive feedback is welcome 😉
- The ordering of transactions changed: with IRI the ordering depends on the order in which IRI ‘saw’ the transactions, while with Couchbase the ordering is based on the ASCII order (9ABCDE…) of the transaction IDs
- More things will probably pop up
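The ordering difference mentioned in the notes above can be illustrated with a small sketch. It uses shortened, made-up transaction IDs and plain Java maps as stand-ins: a `LinkedHashMap` for "the order in which they were seen" and a `TreeMap` for "sorted by key". It assumes plain ASCII ordering of the keys as described above and says nothing about how Couchbase sorts internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderingSketch {

    /** Returns the IDs in the order they arrived (insertion order). */
    static List<String> arrivalOrder(List<String> ids) {
        Map<String, Boolean> seen = new LinkedHashMap<>();
        for (String id : ids) {
            seen.put(id, Boolean.TRUE);
        }
        return new ArrayList<>(seen.keySet());
    }

    /** Returns the IDs sorted by key; for ASCII strings, '9' sorts before 'A'..'Z'. */
    static List<String> keyOrder(List<String> ids) {
        Map<String, Boolean> byKey = new TreeMap<>();
        for (String id : ids) {
            byKey.put(id, Boolean.TRUE);
        }
        return new ArrayList<>(byKey.keySet());
    }

    public static void main(String[] args) {
        // Hypothetical, shortened transaction IDs in the order a node saw them.
        List<String> seen = Arrays.asList("MZ9", "9AB", "AB9");
        System.out.println("arrival order: " + arrivalOrder(seen));
        System.out.println("key order:     " + keyOrder(seen));
    }
}
```

Anything that relied on getting transactions back in arrival order would see a different sequence once the keys are sorted, so client code should not depend on either order.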
Test it yourself!
Follow the normal installation instructions for building from git, but use my repo instead: https://github.com/ovanwijk/iri