Concept/Rationale
It's hard to archive, organize, and index all the content that independent users on the net produce. Commonly, content will stay around for 1-2 years after it's posted, but people's interests shift over time, and users often stop maintaining their sites. The domain will lapse, or the user will switch to blog software that nukes their old blog archive, or they'll stop paying to host a site they no longer update. There are myriad reasons content disappears. However, this content can still have value.
A recent example I ran into (October, 2009) was in trying to find a great blog post by Scott Adams from 2006. He was discussing how he was afflicted with a neurological problem that caused him to lose his ability to speak (unless he was rhyming, or pinching his nose), which doctors said was incurable, which he then cured through clever retraining of his speech center (using the rhyming as an anchor to reconnect his old speech center back up). It was an awesome post, and I was reminded of it when I saw a news item about a woman who had a similar neurological disorder, except she couldn't walk forward or speak properly anymore. She could walk backwards, and run just fine, and even speak normally when running, but as soon as she tried to walk or stand, she'd be crippled. Unfortunately, Scott Adams' blog archive now only goes back to 2007, and the only account I could find of his experience was a puff-piece from some newspaper at the time. 3 years later, and it's gone!
This incredible information churn is understandable -- there is a lot of crap on the internet -- but high-quality, public, independent works need to be preserved, if only for historical purposes. There are works from the early internet of 1996 that are still valuable today (e.g. the Esperanto dictionary), but they only exist because dedicated fans mirror and preserve them.
I mean, the author has already done their part by writing the piece. Expecting them to maintain it in perpetuity is not very supportive or communal.
Users should be able to archive high-quality content in a more reliable, permanent, automated way. Mirroring should take no extra effort, akin to bookmarking. If a work is released under a Creative Commons license (or another permissive license), mirroring and self-organizing should be built-in. This way, the commons can harness the value produced by people's free labours, without having that labour be duplicated over and over.
A specific version of a specific work should have some kind of GUID that can be permanently linked to, and retrieved from the archive. The link would only die if all the caching heuristics of the archive determined the work to be disposable. (Offline archival would definitely be possible as well; e.g., the work could be "requested" to be put back online if it disappeared and enough people wanted to see it again. The original author could see the requests, and organizations like the Internet Archive could maintain offline archives.)
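One way to get a GUID that is tied to a specific version is to derive it from the content itself. The following is only a sketch under that assumption; the function name `work_guid`, the `sha256:` prefix, and the choice of SHA-256 are illustrative and not part of the proposal.

```python
import hashlib


def work_guid(content: bytes) -> str:
    """Return a stable identifier for one exact version of a work.

    Assumption: the identifier is the SHA-256 digest of the work's bytes,
    so any edit to the work produces a different GUID.
    """
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Two versions of the same post get two different permanent identifiers.
v1 = work_guid(b"My blog post, first draft.")
v2 = work_guid(b"My blog post, second draft.")
```

Because the identifier depends only on the bytes, any mirror holding the same version can answer a request for it, and a reader can verify that the copy was not altered.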
The goal of this system is to reduce the wasted human labour that goes into rethinking the same ideas over and over. Rethinking ideas is important, but there's a big difference between valuable duplication of effort and wasteful duplication of effort.
Proposal
A content cloud, distributed across regular users' desktops or webservers, that maintains articles based on heuristics that estimate their value and quality.
Heuristics
Access over time
Generally, a work gets a burst of hits when it's first published and spread around, and that popularity falls off after everyone finishes reading it.
If the popularity was the result of "hype" (a passing fad in the group conversation), then the article will generally stop being accessed after the hype has dissipated and the group has moved on. For example, nowadays, you're not going to get much value out of most business articles published during the internet bubble. However, there are a few great ones that stand the test of time. Google manages to bring these articles to the top of the search ranking because people are still linking to them after all these years. These articles have real value, as indicated by the number of people still referencing them.
If a document keeps getting accessed consistently, year after year, it has value, and should be preserved.
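The "accessed consistently, year after year" test could be approximated by counting how many distinct years a work received a meaningful number of hits. This is a rough sketch, not a tuned heuristic; the function name and both thresholds (`min_years`, `min_hits_per_year`) are made-up assumptions.

```python
from collections import Counter
from datetime import date


def has_lasting_value(access_dates, min_years=3, min_hits_per_year=10):
    """Return True if the work was accessed steadily across several years.

    Both thresholds are hypothetical and would need tuning on real logs.
    """
    hits_per_year = Counter(d.year for d in access_dates)
    steady_years = [y for y, n in hits_per_year.items() if n >= min_hits_per_year]
    return len(steady_years) >= min_years


# A hyped article: many hits, all in a single year.
hype = [date(2000, 1, 1)] * 500
# An evergreen article: a modest number of hits every year for five years.
evergreen = [date(year, 6, 1) for year in range(2005, 2010) for _ in range(12)]
```

Here `has_lasting_value(hype)` is false despite the large hit count, while `has_lasting_value(evergreen)` is true, which matches the distinction drawn above between hype and real value.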
Bookmarking as voting
Broken bookmark-links are annoying. If you bookmark something, it should be immediately mirrored on your computer in case the site goes down before you need the bookmark again.
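A minimal sketch of "bookmark means mirror" might look like the following. The `bookmark` function, the mirror directory layout, and the file-naming scheme are all assumptions for illustration; a real version would also need to capture images and stylesheets.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Download the raw bytes of a page."""
    with urlopen(url) as response:
        return response.read()


def bookmark(url, mirror_dir, fetch=fetch_url):
    """Bookmark a URL and immediately store a local copy of the page.

    The file name is a hash of the URL (an arbitrary, hypothetical scheme)
    so that any URL maps to a safe file name.
    """
    mirror_dir = Path(mirror_dir)
    mirror_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = mirror_dir / name
    path.write_bytes(fetch(url))
    return path
```

The `fetch` parameter exists only so the download step can be swapped out, for example in tests or for a different transport.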
Popularity-caching
The Slashdot effect is an amusing waste of resources. Too many users viewing a single site can take it down, yet every user who views that site automatically gets a copy of the page in their cache. Why can't they share that copy with other users? P2P mirroring would be a useful application of this system.
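The idea of falling back to other users' caches can be sketched with in-memory stand-ins. Everything here is hypothetical: `PeerCache` represents one user's browser cache, and `fetch_with_peers` ignores the hard parts (peer discovery, trust, and verifying that a peer's copy is genuine).

```python
class PeerCache:
    """In-memory stand-in for one user's browser cache (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.pages = {}

    def store(self, url, body):
        self.pages[url] = body


def fetch_with_peers(url, fetch_origin, peers):
    """Try the origin site first; if it is down, ask peers for a cached copy.

    Returns a (body, source) pair, where source is "origin" or a peer name.
    """
    try:
        return fetch_origin(url), "origin"
    except OSError:
        for peer in peers:
            if url in peer.pages:
                return peer.pages[url], peer.name
        raise  # no peer has a copy, so surface the original failure
```

In this sketch the origin is still tried first; a real system might prefer peers whenever a page is under heavy load, so the origin never gets overwhelmed in the first place.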
Implementation
Many aspects are already covered on the Content Cloud page.
- A work (e.g. a blog post) is encapsulated in a single standalone "object" that contains all the media and formatting of the original page
- The content creator can attach metadata to the "object" which contains their identity, licensing terms, etc.
- All objects can be served up via HTTP (apache plugin, doc->html converter, etc.)
- Versioning of objects allows the author (or other authors) to modify a public work.
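The object described in the list above could be modelled roughly as follows. This is a sketch only; the class name `WorkObject`, its field names, and the `revise` method are assumptions, not a defined format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WorkObject:
    """A standalone work plus its metadata (hypothetical layout)."""

    body: str                                              # formatted text of the page
    media: Dict[str, bytes] = field(default_factory=dict)  # embedded files, by name
    author: str = ""                                       # creator identity
    license: str = ""                                      # licensing terms
    version: int = 1
    parent: Optional["WorkObject"] = None                  # previous version, if any

    def revise(self, new_body, author=None):
        """Return a new version; the original object is left untouched."""
        return WorkObject(
            body=new_body,
            media=dict(self.media),
            author=author or self.author,
            license=self.license,
            version=self.version + 1,
            parent=self,
        )


original = WorkObject(body="Hello", author="alice", license="CC BY 3.0")
revised = original.revise("Hello, world", author="bob")
```

Keeping each version as a separate object with a link to its parent means old versions stay retrievable, which fits the goal of permanent links to a specific version of a work.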
Possibly Interested Sponsors
- The Internet Archive
- Library of Congress
