Concept
What's commonly referred to as "cloud computing" is really just "running everything on someone else's server". That's more like a mainframe than a cloud.
Real cloud computing involves sharing computing and storage resources with other people. It involves migrating data to the places where it'll most efficiently serve the people who need it -- locality-based routing. Secure, private, shared, distributed storage and computing.
That's a cloud, baby!
Here's an idea for the distributed storage and security parts.
Basic Features
- Content-oriented data cloud (Van Jacobson - A New Way to look at Networking)
- Summary: The internet should be rearchitected so that it's content-oriented instead of endpoint-oriented. People don't care where things are stored; they just want to know the content is authentic.
- trust-network based key authorization scheme
- LH*RS P2P file transfers (Witold Litwin - LH*RS P2P: Scalable Distributed Data Structure for P2P)
Is this scheme performant enough to stream content? If not, how could it be modified to stream?
Can it take advantage of IPv6 Multicast?
- Universal Metadata format
- incrementally-indexable
- queryable
- taggable
- popular spatial organization algorithms for the format
- fingerprinting (using SIFT features, for videos or images)
- fingerprints could feed machine learning clustering algorithms!
What's a good metadata storage/incremental-indexing scheme? (It must be searchable and extensible enough to support any future features.) An existing open-source project for extracting metadata: Hachoir
- Buckets
- stores objects or other buckets
- same metadata system as files
- Dynamic Licenses
- authors can create programmatic licenses for their works
- content owners can change the license on an existing work, if the existing license permits it
- Royalties
- Users own their data (see: Personal Data Ecosystem) and can sell their usage statistics or withhold them
- Limited access to personal data (Time-to-live)
- Simple auditing of personal data
- Charges for access to data
- Reverse-spamming (data hoovering) can be blocked and punished
- Filesystem Integration
- A lot of these features would be amazing with filesystem support
- Secure Identity
url-based identity, like OpenID?
cryptographic ID?
ID based on public votes/karma/trust?
- Privacy
- Encrypted anonymous storage / scratch areas
- Computations on encrypted data (using homomorphic encryption)
- Long-tail Micropayment scheme (Jaron Lanier on connected media, universal micropayments and attribution)
- Users who enjoyed some content can find the author and give them money
- The authors can publish information about their current financial needs
- Revisioning and versioning system
- for both metadata and content
- for modifications by the original author as well as by the public
- popular but untrusted "patches" to a document can be optionally viewed
- copyrights and royalties can be fairly dispensed amongst those who contributed
- theft of ideas will be much more difficult
- Meshable
- Doesn't require centralized servers; queries for content can be sent to your neighbour who will relay it for you
- Optional anonymity & cryptographic security
- Cloud storage protocol
- Allows any service (local or on the web) to store data on your remote "data stores" (via a simple standard protocol)
- A "data store" can be a local machine or a remote server
- A "data store" can be run independently, or offered by an institution
- Synching/Backup
- Crates/categories of objects, or just indexes, can be automatically synched
- Objects can have properties such as "redundancy" (a.k.a. multiplicity or cardinality), which causes them to mirror themselves automatically
- Sharing
- Apply security level (only share on trusted machines, share with friends, share with everyone)
- Distributed collaboration (Clay Shirky: Institutions vs. collaboration)
- Backwards compatibility
- All the content can be searchable/accessible via HTTP.
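
The content-oriented idea at the top of this list can be sketched in a few lines. This is only an illustration under assumptions: the `ContentStore` class and its `put`/`get` methods are hypothetical names, and SHA-256 is just one possible choice of hash. Each object is named by the hash of its bytes, so whoever fetches it can check it no matter which machine served it.

```python
import hashlib


class ContentStore:
    """Toy in-memory store that names each object by the SHA-256 of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, not from where it is stored.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read, so a faulty or malicious store cannot
        # hand back different bytes under the same key.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its key")
        return data


store = ContentStore()
key = store.put(b"hello cloud")
print(key[:16], store.get(key))
```

Note that this only covers integrity. Knowing who published the content would still need signatures from the trust-network key scheme, which this sketch does not attempt.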
Might've been left out
(Imported from another page)
- Open distributed database
- Limited access to personal data (Time-to-live)
- Simple auditing of personal data
- Charges for access to data
- Reverse-spamming (data hoovering) can be blocked and punished
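
As a rough sketch of the time-to-live and auditing items above, a piece of personal data could carry its own grants and access log. Everything here is an assumption made for illustration: the `PersonalData` class, its field names, and the choice to log denied reads as well as allowed ones.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PersonalData:
    value: str
    grants: dict = field(default_factory=dict)     # requester -> expiry timestamp
    audit_log: list = field(default_factory=list)  # (timestamp, requester, allowed)

    def grant(self, requester, ttl_seconds, now=None):
        """Give a requester access that lapses after ttl_seconds."""
        now = time.time() if now is None else now
        self.grants[requester] = now + ttl_seconds

    def read(self, requester, now=None):
        """Return the value if the grant is still live; log every attempt."""
        now = time.time() if now is None else now
        expiry = self.grants.get(requester)
        allowed = expiry is not None and now < expiry
        self.audit_log.append((now, requester, allowed))
        if not allowed:
            raise PermissionError(f"{requester} has no live grant")
        return self.value
```

The owner can audit usage by reading `audit_log`. Repeated denied entries from one requester would be one possible signal of data hoovering.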
Migrating from HTTP/Bittorrent
The content cloud can contain content addressable by HTTP URL, or magnet hash.
The cloud could replace:
- the local HTTP cache (locality based routing for content keyed by HTTP)
- a neighbourhood's common HTTP pages (news, weather, sports)
- bittorrent metadata storage (both .torrent files, and info. about media in the torrents) addressable by Magnet Hash
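
One way to picture addressing by HTTP URL or by magnet hash is a function that maps either kind of address onto a single lookup key. The `cloud_key` function, the `("url", ...)` and `("btih", ...)` namespaces, and the lower-casing rules below are all assumptions made for this sketch, not part of any existing protocol.

```python
from urllib.parse import parse_qs, urlparse

MAGNET_PREFIX = "magnet:?"
BTIH_PREFIX = "urn:btih:"


def cloud_key(address: str) -> tuple:
    """Map an HTTP URL or a magnet link to a (namespace, key) pair."""
    if address.startswith(MAGNET_PREFIX):
        params = parse_qs(address[len(MAGNET_PREFIX):])
        for xt in params.get("xt", []):
            if xt.startswith(BTIH_PREFIX):
                # Assumption: lower-case the hash so one torrent has one key.
                return ("btih", xt[len(BTIH_PREFIX):].lower())
        raise ValueError("magnet link has no BitTorrent info-hash")
    parsed = urlparse(address)
    if parsed.scheme in ("http", "https"):
        # Host names are case-insensitive; path and query are kept as given.
        return ("url", parsed._replace(netloc=parsed.netloc.lower()).geturl())
    raise ValueError("unsupported address: " + address)


print(cloud_key("http://Example.COM/weather?city=x"))
print(cloud_key("magnet:?xt=urn:btih:ABCDEF0123&dn=example"))
```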
Design Principles
To ensure that the system is adopted, it would be a good idea to learn from the design principles of successful internet protocols like HTTP/HTML, Bittorrent, and MPEG. The principles are outlined here: In Praise of Evolvable Systems by Clay Shirky.
The basic idea: "Strong, well-designed protocols are bound to fail. Infrastructure built on evolvable protocols will always be partially incomplete, partially wrong, and ultimately better designed than its competition."
Summarized principles:
Partial implementation: The protocol is partially implementable so that it can partially work right away and then grow. (It's bound to fail if it requires everyone to adopt it and implement it before it can work.)
Subsume other protocols: If everyone already uses some other protocols, embrace and extend them (the way the Web did with FTP, Gopher, NNTP, etc.)
Decentralized: Central coordination makes it harder to set up, harder to scale, and slow to evolve.
Work with your neighbours: "Failed designs were internally cohesive, and each operated in a kind of hermetically sealed environment where they didn't interact with their neighbors at all."
Evolution is cleverer than you are: You can't put everything into the spec because you haven't thought of it yet!
"Centrally designed protocols start out strong and improve logarithmically. Evolvable protocols start out weak and improve exponentially."
Potential Features
- Distributed, decentralized, networked applications
- Application runs locally
- Can communicate with other neighbouring applications
- Similar to Git's decentralized version control scheme -- people collect cool diffs from each other and the good things naturally aggregate together
- Able to support novel media
- Future uses like streaming 3D models and maps for online virtual worlds
- Negotiate direct connections between people or groups of people for collaboration (just implemented: wave.google.com)
- Whiteboards
- Chat
- Screencasting
- Editing game levels
Initial Implementation
Something that can run on shitty modern OSes like Windows, Linux and Mac.
Features:
- Basic GBridge functionality
- Metadata/indexes stored in a .metadb file in the root of each device
- automatically updated based on FS events
- stores git-like revisions and such
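
A minimal sketch of what the `.metadb` file might hold, assuming it is an SQLite database with an append-only revisions table. The schema and the `open_metadb`, `record`, `current` and `history` helpers are hypothetical, and the filesystem-event hook is left out: something else would have to call `record` whenever a file changes.

```python
import sqlite3
import time


def open_metadb(path=":memory:"):
    """Open (or create) the metadata index; ':memory:' is handy for trying it out."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS revisions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               file_path TEXT NOT NULL,
               field_name TEXT NOT NULL,
               field_value TEXT,
               recorded_at REAL NOT NULL
           )"""
    )
    return db


def record(db, file_path, field_name, field_value):
    """Append a new revision instead of overwriting the previous one."""
    db.execute(
        "INSERT INTO revisions (file_path, field_name, field_value, recorded_at)"
        " VALUES (?, ?, ?, ?)",
        (file_path, field_name, field_value, time.time()),
    )
    db.commit()


def current(db, file_path):
    """Latest value of every metadata field for one file."""
    rows = db.execute(
        "SELECT field_name, field_value FROM revisions"
        " WHERE file_path = ? ORDER BY id",
        (file_path,),
    )
    return dict(rows)  # later revisions overwrite earlier ones


def history(db, file_path, field_name):
    """Every value a field has had, oldest first."""
    rows = db.execute(
        "SELECT field_value FROM revisions"
        " WHERE file_path = ? AND field_name = ? ORDER BY id",
        (file_path, field_name),
    )
    return [value for (value,) in rows]
```

Keeping every revision makes the index larger than a plain key/value table, but it is what allows older metadata to be inspected or restored later.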
Applications
Web 3.0
Distributed Internet Archive
Public Music Distribution Network
Distributed Multimedia Wiki
GNUcash: Real Virtual Money
- Make GNUcash the brand for a new kind of crypto-currency.
- Visualize and track trends in the GNUcash markets
- Encrypt/decrypt/print cryptomoney.
Research
Elliptics (DHT replicated object storage)
Swarmplayer (Bittorrent streaming player)
Tahoe-LAFS (distributed, encrypted cloud storage which can be used securely by web applications)
Apps that can be integrated with online storage
- All IM clients (contacts, logs, etc.)
- All browsers (history, bookmarks)
- Documents (shared storage of duplicated data)
- Music
- Videos
IRC discussion about identity, distributed storage, trust rings, authentication
1 18:20 <+epitron> have you guys noticed a lot of social-network-aggregators coming out lately?
2 18:20 <+Skiz_> like over the past few years? yes
3 18:20 <+epitron> not all social actually... but lots of "manage all your logins in one place" things
4 18:21 <+epitron> there seems to have been a spike in the last 3 months :)
5 18:21 <+Skiz_> yeah lots of places are doing that. I don't like it
6 18:21 <+Skiz_> I prefer to have a bunch of passwords and different logins
7 18:21 <+epitron> i think it's revealing something i've thought we needed for a long time... some kind of user-controlled personal online appserver like thing
8 18:22 <+Skiz_> like openid on crack?
9 18:22 <+epitron> exactly
10 18:22 <+epitron> some kind of standard that lots of 3rd parties can implement
11 18:22 <+epitron> and users can pick who they like
12 18:22 <+epitron> "your computer" isn't as important as "your data"
13 18:23 <+epitron> and currently, everyone else owns "your data"
14 18:23 <+epitron> :)
15 18:24 <+epitron> i have no idea how you'd go about convincing people to use it
16 18:24 <+epitron> or implement compatability with it
17 18:24 <+epitron> it seems like it requires a total rethinking of online data storage and how software works
18 18:28 <+epitron> but, of course, nobody's going to do that..
19 18:28 <+epitron> we're just going to mess around, and eventually someone will find a hacky patch that works okay for now and that will become the permanent standard
20 18:28 <+epitron> go humans \o/
21 18:29 <+Skiz_> why would it?
22 18:29 <+epitron> why would what
23 18:30 <+Skiz_> redefine storage... if everyone had a s3 or sdb(ish) that they trusted (set up your own or use someone elses) as a standard for data storage, a simple AMQP client for passing data around would work fine
24 18:31 <+epitron> oh
25 18:31 <+epitron> i just mean, the general concept of cloud-based storage
26 18:31 <+epitron> basically the internet should be rearchitected so that it's content oriented instead of endpoint oriented
27 18:31 <+epitron> people dont' care about where things are stored, they just wanna know it's authentic
28 18:31 <+Skiz_> right
29 18:32 <+Skiz_> it would also handle authentication requests using keys and such. I think zapnap and I touched on that back at rc08. that and everything being signed as in requests and all data
30 18:32 <+epitron> what happened at rc08?
31 18:33 <+Skiz_> too many beers. just discussing a twitter killer and it ended up refining the entire way data is stored and requested
32 18:33 <+epitron> haha
33 18:33 <+epitron> yeah
34 18:34 <+epitron> i think google is working on this
35 18:34 <+epitron> but i don't know how they're going to get it off the ground
36 18:34 <+epitron> (i'm sure they CAN, i just don't know what their plan is)
37 18:34 <+epitron> they could be doing it through those phones, or their dark fiber cache nodes...
38 18:35 <+epitron> (previously dark fiber :)
39 18:35 <+Skiz_> tcp and udp is fine. we already have a transport.
40 18:35 <+epitron> haha
41 18:35 <+epitron> SCTP is better
42 18:35 <+epitron> and that's not really the issue
43 18:36 <+epitron> the hard thing is that hashing data has possibility of collisions
44 18:36 <+epitron> and that authentication is tricky
45 18:36 <+epitron> at the very least, you'd need those two components to be replaceable
46 18:37 <+epitron> (by authentication i mean.. "identity")
47 18:37 <+epitron> like.. imagine someone cracked or stole the new york times' private key
48 18:37 <+epitron> and started releasing fake articles by them
49 18:38 <+Skiz_> who says you cant do that now?
50 18:38 <+epitron> ok, so you solve that with a mechanism where the new york times can invalidate their old key and publish a new one
51 18:38 <+epitron> but now that means that someone who stole their private key can FAKE an invalidation of the old key
52 18:38 <+Skiz_> we already have 2048bit gpg
53 18:38 <+epitron> pfft :)
54 18:38 <+epitron> all encryption protocols are temporary
55 18:38 <+Skiz_> yup
56 18:39 <+Skiz_> and so is your binary data
57 18:39 <+epitron> but identity is not
58 18:39 <+epitron> (i suppose that's also debateable)
59 18:39 <+Skiz_> neither is your hacked gmail account or anything else
60 18:39 <+epitron> (but in this scenario it's longer lived than your encryption keys or your data)
61 18:39 <+Skiz_> right
62 18:39 <+epitron> so the problem is the key changeover
63 18:40 * Skiz_ thinks about ssl signing for sources but eww
64 18:40 <+epitron> i suppose you could associate some kind of site of "authority"
65 18:40 <+epitron> like, nytimes' key is authentic if it comes from verisign
66 18:40 <+epitron> :)
67 18:40 <+Skiz_> public trust circle as authority
68 18:40 <+epitron> or some bullshit
69 18:40 <+epitron> yeah..
70 18:40 <+epitron> hmmm
71 18:41 <+epitron> maybe the problem is that this is too virtual
72 18:41 <+Skiz_> it is
73 18:41 <+epitron> what if key exchange was rooted in reality :)
74 18:41 <+Skiz_> thought about that but too slow
75 18:41 <+epitron> slow can be overcome with prediction
76 18:41 <+epitron> if you know your key will have to be changed in 6 months, send the new one tomorrow :)
77 18:42 <+epitron> hmmmm
78 18:43 <+epitron> this might work...
79 18:43 <+Skiz_> associate yourself to an authority, get accepted and get your key. sign your data. sounds like ssl. your signed data would reference the provider
80 18:43 <+Skiz_> hrmm internet sucks. I'm going to buy a house instead.
81 18:44 <+epitron> what if every time someone *physically* interacted with the entity authenticate (eg. nytimes.com), that data is barfed onto the internet
82 18:44 <+Skiz_> at least I can physically lock the doorts
83 18:44 <+epitron> so there will be thousands of these interactions
84 18:44 <+epitron> and it would be really hard to fake
85 18:44 <+epitron> unless you had an army of people
86 18:44 <+Skiz_> or a botnet
87 18:44 <+epitron> but then the attack would be totally obvious
88 18:44 <+epitron> because you have 2 huge pools of people
89 18:45 <+epitron> and it would probably become obvious who was the real nytimes
90 18:45 <+epitron> and if it WASN'T obvious, then they deserve to be the real nytimes ;)
91 18:45 <+Skiz_> well I was thinking for my collaborative searching thing would be about the same
92 18:45 <+epitron> haha.. that would be funny.. if someone was just better at being the nytimes than the nytimes, and they replaced 'em
93 18:45 <+Skiz_> no more squatters!
94 18:46 <+epitron> totally
95 18:46 -!- Skiz_ is now known as Skiz
96 18:46 <+epitron> what's your collaborate searching thing?
97 18:46 <+epitron> gnutella style?
98 18:46 <+Skiz> kinda like delicious but only from friends or friends of friends for your search with higher ranks for more bookmarks which are referenced to pages, etc.
99 18:47 <+Skiz> kinda like if I wanted to see what new js libs there are. I'd search it, and use kind of like a linked-in type network of people I know or trusted authorities
100 18:47 <+Skiz> more of a trend search than a global data store
101 18:48 <+epitron> you know
102 18:48 <+epitron> this idea of "friends" is overrated
103 18:48 <+epitron> i think what we want is "truested people" :)
104 18:48 <+Skiz> exactly
105 18:48 <+epitron> friends are different
106 18:48 <+dagbrown> I think of my livejournal friends list as more a "subscription" list
107 18:48 <+epitron> i'm not friends with 90% of my internet "Friends"
108 18:49 <+epitron> they're more acquaintances
109 18:49 <+Skiz> people you trust to provide good judgement and suggestions from anyway
110 18:49 <+Skiz> to a point
111 18:49 <+epitron> but yeah... that trust network thing is great
112 18:49 <+epitron> the problem is that trust can only be allocated to certain areas
113 18:50 <+epitron> like, "i trust this guy's judgement about cars, but i don't trust his judgement about brain surgery"
114 18:50 <+Skiz> right which is your place to see that
115 18:50 <+epitron> or "i trust this guy to lie about his social policies"
116 18:50 <+Skiz> and if you searched for brain surgery and theres only 1 hit from 10 levels of people you know...
117 18:50 <+Skiz> and its his... eh..
118 18:50 <+epitron> well, if you're doing automated filtering... the automated filter has to know that too
119 18:50 <+epitron> it can't just treat everyone in your "trusted list" as the same
120 18:51 <+epitron> i guess it could cast a really wide net, and then you could look down the list
121 18:51 <+Lars_G> br
122 18:51 <+epitron> reputation/karma for certain activities might make more sense
123 18:51 <+epitron> i guess what we need is for the internet to be a brain :)
124 18:52 <+Skiz> oh gawd no.
125 18:52 <+epitron> haha
126 18:52 <+epitron> sorry!
127 18:52 <+epitron> that's the way it's gotta be
128 18:52 <+Skiz> I've tried, ran my cpu through the roof
129 18:52 <+Skiz> word association systems and such with search stuff. it gets hectic.
130 18:53 <+Skiz> anyway gtg for a few sorry
131 18:53 <+epitron> np ttyl
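
The point raised near the end of the log, that trust has to be scoped to a subject ("cars" but not "brain surgery"), could be sketched roughly like this. The `rank` function, the data layout, and the weights are invented for illustration and are not something proposed in the discussion itself.

```python
from collections import defaultdict


def rank(recommendations, trust, topic):
    """Score items by how much their recommenders are trusted on one topic.

    recommendations: list of (person, item) pairs
    trust: {person: {topic: weight}}; a missing entry counts as zero trust
    """
    scores = defaultdict(float)
    for person, item in recommendations:
        scores[item] += trust.get(person, {}).get(topic, 0.0)
    # Drop items nobody trusted on this topic vouched for, best first.
    ranked = [(item, score) for item, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


trust = {
    "alice": {"cars": 0.9},
    "bob": {"brain surgery": 0.8},
}
recommendations = [("alice", "clinic-a"), ("bob", "clinic-b")]
print(rank(recommendations, trust, "brain surgery"))
```

Here alice's opinion counts for nothing on brain surgery even though she is trusted on cars, so only bob's recommendation survives the filter.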
