Abstract
We summarize the results and perspectives from a companion article, where we presented and evaluated an alternative architecture for data storage in distributed networks. We name this bio-inspired architecture RAIN. It offers a file storage service that, in contrast with current centralized cloud storage, has privacy by design, is open source, is more secure, is scalable, is more sustainable, has community ownership, is inexpensive, and is potentially faster, more efficient, and more reliable. We propose that a RAIN-style architecture could form the backbone of the Internet of Things, which will likely integrate multiple current and future infrastructures ranging from online services and cryptocurrency to parts of government administration.
1 Background
Recently our physical technologies (e.g., the converging bio-, info-, nano-, and cognotechnologies) have started to advance beyond our social technologies (e.g., governance, laws, educational systems, and social norms). This rapidly growing gap generates challenges and opportunities within most areas of modern society [9], including privacy and security in cyberspace as well as environmental issues.
The Internet was originally designed with robustness in mind, as a means to guarantee communications in times of war. Instead of focusing on the protection of central points of failure, its protocols allowed redundancy, self-repair, and self-organization: while single nodes can fail, and new nodes can be connected, the overall functionality of the network is guaranteed by a resilience rooted in ecology.
Despite the ecosystem-like nature of the underlying infrastructure, Internet services are becoming progressively more centralized, with fewer and fewer organizations in charge of managing information on a planetary scale, thus creating monopolies and raising significant issues of privacy, security, and democracy.
The Internet data storage services provided today violate privacy, are expensive, and come at a high environmental cost. Today more than 3% of the world's power consumption is attributed to data centers, with a CO2 footprint surpassing that of global air traffic and a rapidly growing power consumption rate [1]. The high entrance cost to the data storage market creates monopolies, in that only the largest companies are capable of offering scalable, cost-efficient services (e.g., [11]).
2 Basic Design Concepts
This article summarizes the results and perspectives from [7], outlining how current Internet of Things (IoT) technology could enable further decentralization and a more bio-inspired, distributed paradigm not only for information delivery, but also for storage and processing. We offer preliminary results on the development of RAIN (see Note), an alternative and potentially superior software backbone for storage of data in distributed networks.
Our network architecture offers a distributed file storage service that is faster, is more efficient and reliable, is more secure, offers privacy by design as well as community ownership, and is open source, scalable, and more sustainable and less expensive than the current, centralized paradigm.
Owned by the community of its users (e.g., citizens, businesses, and organizations), this network service will be lower cost, democratic, and designed to guarantee the privacy of the data it stores. With citizen-owned embedded computing devices (e.g., inexpensive Raspberry Pis with flash drives), it is now possible to have cheap, energy-efficient, always-online computing nodes in our homes and businesses. The RAIN network design leverages the collective storage capacity of these devices: every node stores parts of other nodes' data to guarantee redundancy and reliability, and a cryptographically secured architecture prevents unwanted access to the stored data.
Such a bio-inspired architecture offers redundancy, distributed control, error correction, self-repair, and obvious potential for autonomous adaptation (learning) in later versions, with no central point of failure or trusted third parties. Each node operates via local interactions with a limited set of other nodes that it does not need to trust a priori.
Similarly to blockchains eliminating many banks as middlemen for standard financial transactions (see, e.g., [8]), RAIN could disrupt current cloud storage facilities and eliminate the need for centralized data centers overseeing many market segments, offering a solution to growing concerns about personal privacy and democracy, stemming from increasingly pervasive and unnecessary surveillance by private and public organizations.
Our preliminary results (see [7]) include a feasibility study, in which we quantitatively estimate the reliability of a decentralized storage network in comparison with a data-center-based architecture; an overview of the main security challenges to developing this infrastructure; an account of how new security mechanisms can be designed to guarantee data security, even against government-grade attackers; and an outlook for the potential applications of this network to a broader spectrum of services than cloud-based storage can provide.
In particular, we estimate network size requirements to port to a distributed paradigm: a content delivery network for public Web content (with a more in-depth study of the resources needed to host Wikipedia); an end-to-end encrypted, peer-to-peer messaging platform; a social network without centralized control, free from targeted advertising and surveillance; and a distributed search engine (we argue on the one hand the high performance of distributed Web crawling, and on the other the limited querying capabilities of a high-latency, peer-to-peer distributed database).
Finally, we discuss how high-uptime, low-power nodes enable the development of a highly efficient cryptocurrency based on authenticated hash tables (see, e.g., [6]) instead of blockchains, with logarithmic space, time, and communication complexity, and no need for proof-of-work-based mining for initial currency distribution. Thus such a cryptocurrency should be significantly more memory and energy efficient than blockchains.
3 How is RAIN Different?
RAIN lies at the intersection of the well-explored field of decentralized and distributed systems security and that of low-cost, pervasive networked computation. A paradigm shift from software instances running on personal computers to permanently online, but still unreliable, dedicated low-energy nodes may seem minor, but allows us to ground our architecture design in far more stringent reliability assumptions. Until a few years ago, only expensive, dedicated servers could guarantee such reliabilities.
Peer-to-peer file distribution, for example, is a well-known technology that today aids the distribution of open-source operating systems and creative commons media. The challenge of translating this download-only paradigm to one where data can be reliably uploaded to a network of nodes has so far been undertaken only by storage-trading projects (like Storj; see [12]) that make stronger reliability assumptions than those offered by personal computers.
As we have seen, globally used, blockchain-based cryptocurrencies and distributed ledgers exist today, but limited uptime assumptions force their architectures to a paradigm where consensus needs to be verifiable asynchronously. This often leads to CPU-intensive security procedures and limited overall transaction throughput. Our preliminary results show that using proofs of space (see, e.g., [3]) on semi-reliable devices, we can guarantee security at a significantly smaller hardware and energy cost.
Finally, as is often seen in biological systems, subsystem integration and multipurpose interaction play a significant role in RAIN. This is in contrast, for example, to the Bitcoin mining process, which has to run dedicated hardware whose sole purpose is to solve costly and otherwise useless computational challenges. RAIN, a community-owned, distributed storage network, could make use of its spare storage space to collectively guarantee its own security, while offering a variety of useful services to the community of its users.
4 RAIN Architecture Highlights
Optimal erasure codes based on polynomial oversampling and interpolation exist (e.g., [10]) that allow us to organize an S-byte-long string of data in K = rN (with r > 1) blocks of size S/N, so that the original string can be recovered from any N of those blocks. The design of our network (which leverages only local, scalable interactions between the nodes and requires no mediation of a central decision-making authority) organizes embedded computers, persistently connected through home-grade Internet connections, in villages of size K. Within the same village, each node trades its storage space with the others in a peer-to-peer fashion, offering to store redundancy blocks for the other nodes in exchange for space to store its own.
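To make the idea of polynomial oversampling and interpolation concrete, the following is a minimal sketch of a toy erasure code in Python. It is an illustration under stated assumptions, not the code used in [7]: the prime field GF(257), the one-symbol-per-block simplification, and the function names `encode` and `decode` are all hypothetical choices made here. The N data symbols are treated as the values of a polynomial of degree below N at x = 0..N-1, and the K shares are its values at x = 0..K-1.

```python
# Toy erasure code over the prime field GF(257). Illustrative assumption only:
# a real deployment would stripe whole blocks and use an optimized field.
P = 257  # prime modulus; every byte value 0..255 is a field element


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (x_i, y_i) points, modulo P (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k):
    """Oversample N data symbols into K >= N shares (x, y), with K <= P.
    The first N shares carry the data itself; the rest are redundancy."""
    n = len(data)
    base = list(enumerate(data))
    return [(x, data[x] if x < n else _interpolate_at(base, x))
            for x in range(k)]


def decode(shares, n):
    """Recover the N data symbols from any N distinct shares."""
    points = shares[:n]
    return [_interpolate_at(points, x) for x in range(n)]
```

For example, `encode([82, 65, 73, 78], 6)` corresponds to N = 4 and r = 1.5, and `decode` applied to any four of the six resulting shares returns the original list. Note that redundancy symbols may take the value 256, which does not fit in a byte; practical codes avoid this by working in a field of size 2^8.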
A village-wide distributed ledger is kept between the villagers to keep each file under real-time control. Nodes securely monitor each other's data availability (which can be done with logarithmic time and communication complexity using Merkle tree hashes; see [5]) to readily detect failures. When a node experiences an unrecoverable failure (e.g., hardware failure or permanent disconnection), the village signals new nodes to join it. When the number of available blocks of a file drops to a threshold value T = hN (with 1 < h < r), a self-repairing, distributed recovery procedure is triggered and new redundancy blocks are generated.
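The logarithmic cost of a Merkle-tree availability check can be illustrated with a short Python sketch. This is one way such a check might look, not the protocol of [7]: the choice of SHA-256, the padding rule for odd levels, and the function names `build_tree`, `prove`, and `verify` are assumptions made for this example. A verifier that keeps only the root hash can challenge a node for one block and check the answer using about log2(number of blocks) sibling hashes.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return all tree levels, from the leaf hashes up to the single root."""
    levels = [[_h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:              # odd count: pair the last hash with itself
            cur = cur + [cur[-1]]
        levels.append([_h(cur[i] + cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):     # padded position: sibling is the node itself
            sibling = index
        proof.append(level[sibling])
        index //= 2
    return proof


def verify(root, block, index, proof):
    """Recompute the root from a block and its proof; True if it matches."""
    node = _h(block)
    for sibling in proof:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root
```

With five blocks the proof contains three hashes, whereas re-sending the whole data set would grow linearly with its size. This sketch omits hardening that a real design would need, such as domain separation between leaf and internal hashes.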
Using experimental data for hard disk drive (HDD) and solid state drive (SSD) failure rates (see [2] and [4]) and gathering experimental data on home-grade Internet connection uptime and speed (see [7]), from the above model we could determine the expected lifetime L∗ and the expected downtime (i.e., the fraction of time something is unreachable due to temporary malfunctioning of its connection) d∗ of a file in our network.
Figures 1 and 2 show the expected data lifetime and downtime for a file stored by a village, determined by our analytical model, as a function of the village size K and the recovery ratio h = T/N. Here each node contributes Z = 100 GB of storage space. Note how, without changing the storage ratio r (which determines how efficiently data is stored on the network), we can make the data lifetime arbitrarily large, and the data downtime arbitrarily small, just by changing the size of the village.
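The dependence of downtime on village size at fixed r can be illustrated with a deliberately simplified calculation. The sketch below is not the analytical model of [7]: it assumes, hypothetically, that each node is online independently with the same probability `p_online`, ignores permanent failures and the recovery threshold h, and therefore says nothing about the lifetime L∗. Under those assumptions a file is unreachable exactly when fewer than N = K/r of its K blocks are online, which is a binomial tail.

```python
from math import comb


def downtime_fraction(k, r, p_online):
    """Probability that fewer than N = K / r of the K blocks are reachable,
    assuming nodes are online independently with probability p_online
    (a simplifying assumption, not taken from the source)."""
    n = round(k / r)
    return sum(comb(k, i) * p_online**i * (1 - p_online)**(k - i)
               for i in range(n))
```

Comparing, say, `downtime_fraction(12, 2.0, 0.9)` with `downtime_fraction(36, 2.0, 0.9)` shows the toy downtime shrinking as K grows while r stays at 2, provided the per-node uptime exceeds 1/r.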
5 Discussion
Our proposed bottom-up, low-energy, bio-inspired technology offers a more cooperative, civic-centered ownership structure to preserve critical aspects of online privacy as well as freedom from the steering power of today's invasive marketing, behavior manipulation, and well-financed data attackers. We have demonstrated, for example, the feasibility of our proposed architecture, based on Reed-Solomon redundancy, in which 36 nodes provide an expected data lifetime of the same order of magnitude as the age of the Earth [7].
Additionally, RAIN could support the development of communitarian services, including telecommunication, content delivery, cryptocurrency, and distributed administration (nation-state and regional governmental), which currently are services managed in a centralized manner through trusted third parties [7]. Implementation of a RAIN-style architecture could thus distribute the power from global centralized trusted third parties to local citizens and businesses, while at the same time presumably reducing the significant energy requirement and resulting CO2 burden of centralized data storage.
Acknowledgment
We are grateful for constructive suggestions from Alex Penn and Piper Stover, and we thank Lucinda Voldsgaard for proofreading the manuscript. Partial financial support was provided by the European Commission-sponsored SYNENERGENE project.
Note
RAIN is a metaphor for what comes after the clouds.
References
Author notes
Center for Fundamental Living Technology, University of Southern Denmark; Santa Fe Institute, Santa Fe, NM 87501, USA.