I assume that hardware can't just sit around for decades without little mutations sneaking into the code
I... um... what? Code is software; it will never change. What happens to hardware is that the physical media degrade, so that when the computer goes to read a bit of information that's at a specific physical location, it can't.
But I suspect that that explanation just leads to more questions...
Really long-term digital storage has a built-in plan for periodic testing and verification of the data, and assumes that the bits will need to be migrated to new types of media at as yet unknown intervals. Specific digital representations are assumed to have relatively short effective lifespans driven by obsolescence of retrieval software and devices as much as the media itself.
Oh yeah, and LiveJournal isn't a storage company. They probably have backups, but long-term storage isn't one of their core competencies or part of their business model, so if there's anything there you really care about, make a copy of it yourself.
1: No, I get that. I was trying to say that in a quick way. Not that there are little mutations in software, but at some point there's something that can get rusty or damp, and then when you try to use that part of the server, you get error messages.
Oh yeah, and LiveJournal isn't a storage company. They probably have backups, but long-term storage isn't one of their core competencies or part of their business model, so if there's anything there you really care about, make a copy of it yourself.
I used to, and then the website that makes a book out of your journal stopped working on my journal, and I haven't done so in a while.
But you'd think that, given that they're storing people's personal diaries, that they'd put a high premium on not losing them.
ONE STUDENT LEFT. She has a 110 homework average. There is no way that she needs this last half hour. But I'm a kind and decent teacher, so I will sit here patiently and blog about her.
I work in this area. We basically have multiple copies, on multiple types of hardware, replicated at more than one physical site. With a bunch of error-checking, checksumming, and so on. That's the theory, anyway. For cost reasons we have quite a bit of stuff on a tape store which is gradually getting migrated to the new storage infrastructure.
I recently pulled about 150,000 images from tape [at up to 200MB per image] and checksummed the lot, comparing the checksums to checksums generated when the images were pushed to tape [about 10 years ago]. Four images failed, and those are probably repairable. So even tape storage isn't necessarily that bad.
Having the data replicated at more than one site would be pretty standard. And the storage at each site isn't really going to be a single copy, either.
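For anyone wondering what "checksummed the lot" amounts to in practice, here is a minimal sketch in Python. It is not the tooling described above: SHA-256, the manifest-as-a-dict layout, and the function names are all assumptions made for illustration. The idea is just to record a checksum per file when it goes into storage, then recompute and compare years later.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large image never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (done at ingest time)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose checksum changed."""
    failures = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

An empty list from `verify` means every file still matches what was recorded; anything it returns is a candidate for repair from another copy.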
Long long term digital storage is expensive. Libraries and other research institutions are having to deal with this problem.
6: You would think that, but maintaining long-term storage is *hard*, and it's one of the first things I'd expect a company that's been through as many owners as LiveJournal has to de-prioritize.
6: But you'd think that, given that they're storing people's personal diaries, that they'd put a high premium on not losing them.
A) To a point.
B) They probably have not thought as far ahead as you think. They look to be a bog-standard Internet service.
C) If you really care about preserving your diary long-term, it is up to you to do the steps in 2. Does not require anything sophisticated, just diligence.
Yes, what 9 said. This stuff costs serious money. I don't really want to quote how much we, in theory, charge (largely internally) for storing X terabytes of data, indefinitely, but it's not like going to PC World and buying a 1TB firewire drive.
I do see that the Wayback Machine has pictures of Jammies' ass! So you're good on that front, at least.
Last student done! I'm outtie. Smell you later.
Semi-OT regarding finding stuff on Facebook (which I almost never use). Was searching for something with regard to LB's Scrabble thread and one result came up on Google and it was in Facebook. However, my (not very extensive) attempts to get to it have failed. I would think that my being logged in and then clicking on Google would do it (or at least get a Facebook error about permissions), but I always end up redirected to my home page, even if I enter the address while in Facebook. Maybe that is the behavior if you do not have access.
then the website that makes a book out of your journal stopped working on my journal,
Does anyone have any recommendations or experience with this kind of service that works with Blogspot?
re:7 : This is why she has a 110 average.
Of course, this is also why I rarely aced stuff. I am not that patient.
re: 17
One of those authors is my boss. Or rather my boss's boss. And I work with the developers on this program. They do the born-digital stuff, and I'm working with the digitised special collections material (manuscripts, etc).
||
Heebie, are you prepared for when HP and HP are the age where they ask "Is infinity-plus-one a number?" and "Is googol plex like infinity?" and "does googol come after a million?" over and over and over?
|>
Asking isn't bad. When they insist that infinity plus one is bigger than infinity until you are forced to not only agree with them but give them a reason of your own devising for why, that gets old.
When I give a test for employment, anyone who hands it in early had best do a damn good job.
re: 17
I'm actually quite surprised to see his name on it, actually. He's more a management/policy guy, than a techie. Although I suppose he may be responsible for that side of the report.
Also, it's worth noting that truly long-term digital storage hasn't been a concern for that long. Tape drives weren't invented until the '60s, I don't think, and solid-state storage of a reasonable size is a much newer development. But there's nothing out there with the proven longevity of archival-quality paper stored well. ttaM probably knows horror stories -- the one that I've heard of is the BBC's Domesday Book project, which chose a technological dead end in 1986.
I don't know anything about LJ since it got sold to Russian ownership, but the backup plans in America were pretty bog-standard (servers at different facilities, etc.). If you're concerned about data loss, the best reasonably easy thing to do would be to export it regularly, save the data to a USB drive, and throw it up on Dropbox. Choosing standard file formats -- ASCII, TeX, HTML -- is probably a useful way to ensure that Hawaiian Punch's kids will be able to read the files and marvel at Grandma's weird technological wasteland.
Yeah. One of our teams has done work with archiving the software itself, rolling file format conversion/migration, and so on, to try and get around the problem of obsolete software or file formats. It's less relevant in my area as everything tends to be ascii text (containing metadata) or TIFF images. Although there's some concern re: the move to jpeg2000 as an archival format. We, like others, are doing it, but it's not proven in terms of long-term use.
But there's nothing out there with the proven longevity of archival-quality paper stored well.
This stuff is so fascinating to me, from an entirely non-technical perspective. I don't want to derail the thread from the technicalities -- it is a science! thread, after all -- but in my capacity as a book dealer, it's a marvel to note how many people believe that digitized content means forever, not like those crummy old paper books, which take up so much space, which is expensive and labor-intensive, unlike these lovely e-things.
Don't get me wrong, I wouldn't make the argument that paper books are clearly superior; rather, a digitized informational format is just different, coming with its own not-minor set of expenses.
(This comment brought to you partly by the fact that I heard today from an old acquaintance in the university press trade, who disclosed that sales figures trade-wide for new books since this time last year show a 50% reduction in the sale of paper books; made up for by the sale of e-books. Gosh, who's maintaining all those servers? We're gonna need more electricity!)
Personal archiving advice. I haven't read through it, but it's probably pretty useful.
But there's nothing out there with the proven longevity of archival-quality paper stored well
A friend of mine with a house full of Victrolas points out that while magnetic media degrade and CDs scratch very easily, 78s sound much as they did when they were made, 60-100 years ago.
Gosh, who's maintaining all those servers? We're gonna need more electricity!
Iceland seems like a good bet for this sort of thing: lots of geothermal power, and it's naturally cold besides.
Incidentally, one of my new housemates is an Army languages guy, currently in college so as to jump to the officer track--anyway, I was kind of shocked that he learned Icelandic for his job. There are only 310k native speakers, and they all speak better English than I do!
The link in 29 is interesting. It's from two years ago; I wonder how things have gone since then.
Also interesting, speaking of Iceland, is this article I read today. Icelanders: maybe not actually the whitest people in the world?
Actually they probably still are, even if they aren't quite 100% white.
They don't seem to be on Benjamin Franklin's whitelist.
They don't seem to be on his tawny list either.
Ben Franklin to Iceland: you don't exist.
On the original post topic, I think people sometimes underestimate* how much electricity/energy is needed to maintain current and future paper/print/photographic/etc. collections. Lots of stuff has survived for decades and centuries pre-electricity, of course, and if you knew you wouldn't have electricity for climate control, you could probably design buildings** that would still hold up pretty well for preservation purposes, but nevertheless, pretty much all modern long-term preservation strategies at major institutions are dependent on keeping up climate and environmental controls, and lots of materials in collections are not on archival-quality substances.
*Some digital advocates might be estimating too far the other way, but I don't think this is common.
**Or modify caves.
That's a good point. One of the things that's really sobering about studying archaeology is realizing that what has actually survived is such a small part of what once existed. In terms of the kinds of media we're talking about here, the most instructive comparison is probably the amount of textual material surviving from Mesopotamia (where they wrote on clay tablets and frequently burned each other's cities to the ground, baking the tablets) and Egypt (where they wrote on papyrus and were usually united in a single centralized kingdom with relatively little internal strife). There's tons of stuff from the former (still a relatively small proportion of what was actually produced, of course) and virtually nothing from the latter. Note that these are both societies that lived in river valleys surrounded by deserts, not too far from each other, so climatic variation isn't a big factor. The Greeks and Romans also wrote mainly on papyrus, and most of their output is similarly lost.
The Greeks and Romans also wrote mainly on papyrus, and most of their output is similarly lost.
There's a good preservation record for papyrus that actually got dumped in the desert (e.g. Oxyrhynchus) as opposed to being kept carefully in buildings in cities with librarians and such to look after it. Which goes to show that when it comes to the best laid plans of mice and men, the mice win hands down. I wonder if there's a small money spinner for some N.African country archiving paper out in the dunes.
Maybe a failsafe archive library should be built in the middle of the Atacama desert.
||
This guy has to be among the 'Tariat, right?
|>
Which goes to show that when it comes to the best laid plans of mice and men, the mice win hands down.
Yeah. That was part of my point about the ooooh-isn't-digital-content-awesome perspective according to which, in confused fashion, that content won't be subject to loss the way paper is. Well ... not so much. Digital content may be awesome in various ways, but that's not really one of them. We can do our best, of course, as we (collectively) often have done, but that's about it.
re: 37
Yeah, I've handled some of those (Oxythingie stuff) at work. And also other cool things like the Cairo azineG.* Parchment is amazingly robust. Not cheap to produce, though.
re: 35
FWIW, most of our stacks aren't particularly climate controlled. They are/were** just built in a way that kept them at the right temperature, and at the appropriate humidity level most of the year round with just a bit of heating and ventilation. I expect that's easier in a temperate climate like the UK than in some places, though.
* google-proofed because there's a new big public (and very cool) site going up soon, which I was partly involved in.
** they just got partly knocked down
I think the big difference is that digital stuff requires active maintenance and a continuing spend. Books and the like, it's nice if they are kept in decent conditions and have preservation specialists working on them, but acid-free paper and anything older on vellum/parchment is pretty robust and can survive quite a lot of neglect.
digital stuff requires active maintenance and a continuing spend
I know! Is this really a good idea in the long run?! (Yeah, I realize there are many people more knowledgeable than I who discuss these matters, and in fact I'm not particularly of an archaeological bent, but still! You know?)
/end consternation
Is there research going on into some kind of archival-quality digital storage? I keep on reading versions of this -- that anything digital breaks down without being recopied onto new media and so forth every decade or so. And it doesn't seem like the sort of problem that should be insoluble -- there should be some sort of (probably much more expensive) method of storing digital information that while it might not match paper, in terms of being capable of withstanding a thousand years of neglect, should at least be much better than what we've got now.
re: 43
Well, in our case, it's not as if the digital stuff is created and the originals flung away. The original stuff is preserved just as before, in many cases in better conditions and with less wear and tear as they are accessed/used much less often. The digital stuff acts as a surrogate for that, from a preservation point of view, but also can be disseminated across the internet to anyone who wants to look at it. And scholars can annotate it, and add transcriptions and translations, and reorder, and so on.*
For example, this http://tinyurl.com/c7xebme
(anonymised link) is the oldest Plato manuscript [for some of Plato's stuff] in the world, and anyone with a web browser can look at it. That's not the case with the original.
* if the stuff is presented in the appropriate way.
re: the manuscript linked in the previous thing. It sort of blows me away that someone who can read Greek [alameida, oudemia?] can read that like it was written yesterday. The hand is so clear, and the thing so well preserved and clean, and yet it's well over 1000 years old.
44: Well, you could print it all using good-quality ink on decent paper and rely on future scanning/OCR technology being up to the task ...
45.1: Well, in our case, it's not as if the digital stuff is created and the originals flung away.
Right. I'm thinking more of the time to come, perhaps the not very distant future, at which there are very few non-digital originals produced in the first place.
45.the rest: Oh, absolutely. What you and yours are doing is great.
re: 47
Accuracy and scan rates on good quality prints in modern fonts are already pretty amazingly good. Even with fuggly stuff like 17th century black-letter fonts (fraktur, etc) on warped paper, they are getting over 90% accuracy.
acid-free paper and anything older on vellum/parchment is pretty robust and can survive quite a lot of neglect
Right, but a lot of stuff isn't in those formats. I'm thinking mostly of cheap paper produced more recently* - newspaper, dime novels - or things like film and photographs, which ideally would be in cold/cool storage. You can still walk away and probably have a better shot at keeping most of it than with digital stuff, of course. But anything related to long-term preservation isn't cheap.
*Although it's held up a hell of a lot better than the doomsayers of the "microfilm everything!" era predicted.
re: 48.1
Yeah, I think we are pretty screwed with a lot of that 'born-digital' stuff. I've said pessimistic things in the past about it, and I still think it's true.
A lot of born-digital stuff just can't be printed.
Multiply pwned on that one.
In the spring I'm finally going to get some experience with digitization and born digital stuff (at different institutions). And then maybe I'll finally get a real job.
A physical/digital interface where you get to deal with a similar issue (differing rates of obsolescence in this case) is in manufacturing. For instance a number of our locations have stocked up on spares of things like this bad boy when it was clear they were going end-of-life. What gets increasingly cumbersome on that era of stuff is getting data to/from the rest of the network.
There's an old (as in, been around for a while) listserv out of Stanford, I think but am not sure, populated by people in the special collections and antiquarian and sheer MIS realm of the library business. They changed servers a while back and I lost track of the shift, but they talked a lot about this sort of thing. I should find them again.
55: Let me guess, it folded and the archives are lost!
Read, mark, learn, inwardly digest!
Shorter: Hard disk reliability is statistical. Acceptable error rates are defined in the SCSI spec, and the firmware in your HDD will make that work. Occasional read errors are normal and are fixed by forward error correction, and will be retried without bothering the operating system.
Write errors, though, are serious and should be treated as diagnostic of imminent failure. The smartctl command line utility lets you query and work with the drive's internal logging.
When I read the comment, I immediately ran smartctl -t short /dev/sda and was delighted to find 4 read errors in 4 years and no writes. Then I backed up my shit.
There's some journal that I end up having to read articles from for courses here that doesn't appear to do a digital version - it's possible that the library doesn't subscribe, but I don't think it's even a possibility - and the result is that when you go to the stacks, the heavily used volumes are coming apart from being photocopied so often, and some things are either missing or being rebound.
56: I don't think so. They just changed to a different locale, and you had to resubscribe or something, and I had a hard drive crash around the same time, and I never followed up. They're still around, since I see posts from there linked sometimes.
digital stuff requires active maintenance and a continuing spend
There's a case that this is a good thing. Stuff that gets looked up gets kept - stuff that you file in drawer Z and never think about again vanishes. The danger with perfect preservation is that the perfectly preserved stuff is forgotten and perhaps eventually destroyed.
This is also why people love old buildings that are still with us. It's not that the ancients were master architects - it's that we keep using the good buildings and we fix them when they break. The shitty ones, well, they get demolished and built over, or there's an insurance-fire, or something tests their structural resilience and it comes up wanting.
The report I linked in 17 is pretty good as an overview of what kind of digital recovery is possible.* But it's really expensive to run the full-scale tools.
*Sometimes it's too good: oh, did you not mean to donate all the banking account numbers and passwords that are still saved on your hard drive that you thought had been deleted?
60: Nope. I should check when I'm at the shop; I only ever read it from there.
45. What a beautiful hand! Even I can read it phonetically, though I've no idea what it means because my Greek lexicon is currently propping up my monitor for reasons of comfort.
What worries me is that previous civilisation meltdowns have permitted some texts to survive exactly because they're incredibly low tech. We can find accidentally fired tablets from Mesopotamia and Crete and work our way back into them. Likewise inscriptions from Mesoamerica (which weren't even supposed to be proper writing 25 years ago) and papyri from the Egyptian desert.
But when we lose the power lines, that's it. Our civilisation will be as lost as the people who did the cave paintings.
As a case in point of my 61, check out my 57. SMART-enabled hard disks protect your data like so: at the factory, they run a battery of tests to find bad sectors, and build up a table of bad sectors on the disk. That won't find them all, though. When you send it a write, the disk head tries to do it - if it hits a bad sector, it does a random jump away from it, retries the write, and then records the bad sector so it won't be used again.
Hence the bathtub curve; HDDs mostly fail in the first three months or years later, because if something is seriously awful this process will break down early. Otherwise, it'll be OK until the thing physically won't spin up or the table of bad sectors is full.
But you only have assurance that the thing will work so long as you're using it and specifically so long as you're doing writes. I think this is interesting and telling.
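A toy sketch of that remapping idea, in Python, for anyone who wants to see the shape of it. This is emphatically not real drive firmware: the `ToyDisk` class and all its names are made up for illustration, and instead of a "random jump" it takes the next sector from a small pool of spares, which is an assumption on my part. What it does show is the point above: a bad sector is only discovered, and routed around, when a write actually lands on it.

```python
class ToyDisk:
    """Toy model of bad-sector remapping; hypothetical, not real firmware."""

    def __init__(self, sectors: int, spares: int, faulty: set[int]):
        self.sectors = sectors
        self.data: dict[int, bytes] = {}   # physical sector -> contents
        self.faulty = set(faulty)          # sectors that physically fail
        self.remap: dict[int, int] = {}    # logical sector -> spare sector
        # Spare sectors live past the end of the normal address range.
        self.free_spares = list(range(sectors, sectors + spares))

    def _try_write(self, physical: int, payload: bytes) -> bool:
        """Attempt one physical write; report failure on a faulty sector."""
        if physical in self.faulty:
            return False
        self.data[physical] = payload
        return True

    def write(self, logical: int, payload: bytes) -> None:
        """Write, remapping to a spare each time the target turns out bad."""
        if not 0 <= logical < self.sectors:
            raise ValueError("sector out of range")
        physical = self.remap.get(logical, logical)
        while not self._try_write(physical, payload):
            if not self.free_spares:
                raise IOError("remap table full: the drive is failing")
            physical = self.free_spares.pop(0)
            self.remap[logical] = physical  # remember so reads follow along

    def read(self, logical: int) -> bytes:
        """Read through the remap table."""
        return self.data[self.remap.get(logical, logical)]
```

Once the spares run out, the next bad write raises an error, which is the toy version of "the table of bad sectors is full".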
chris y gets it right. But, uh, I'm sure we'll figure something out.
See, there were thousands of literate civilizations back then, but only one used clay tablets to write on and then burned the archive down.
67 I wish I was sure. As Alex says, specifically so long as you're doing writes. I think it's chilling.
I don't know, I'm pretty much with the pessimists in terms of the really long run, but you can't really do much more preservation-wise than try to keep things going as long as the society you're in keeps going, after which you're going to have some other problems to deal with. So within the time-scale of "as long as civilization still exists" I guess I'm less pessimistic than I was a couple of years ago, now that I know more (still without having any experience to back up my gut impressions).
re: 61
Of course you're right that there's a role for active preservation, although, ironically, in the case of the Cairo azineG, and Oxythingie stuff, both of those are/were recovered literally from stuff people threw away.
I do think it's at least kind of funny that far into the future some historians may end up debating how to characterize our society based on the printed out email correspondence of the people who refused to trust computers.
And FWIW, I've been impressed by the stuff I've read from/by people who work at ttaM's institution.
71: both of those are/were recovered literally from stuff people threw away.
I love that! I've been tangentially fascinated by palimpsests as well.
Re: 72, I was charmed, and some combination of other words, to discover after my mom died that she'd printed out those emails of mine she apparently decided to print out. Jeez. We should all pay more attention to what we say, I guess.
60 was right after all: the listserv is/was ExLibris, which is linked from that page. This information is provided from the link. That's a frowner after all.
I'm going to ask around what's up with that. I had no idea.
re: 73
FWIW, there's some vaporware-y stuff out there, too. Presentations people gave on methods for how we were doing things, or going to do things that have gone a bit by the wayside. But yeah, there's a fair bit of actually-delivered and about-to-be-delivered stuff that's both conceptually quite clever, and functional. There's a couple of very cool things that exist and are in alpha/beta that will go public soon.
However, it's easy to get a bit depressed when I go to euro-meetings on this stuff and meet people from the BnF and Nb.no, and places like that. They have _real_ money to spend, as well as clever people working for them. And there's cool interesting stuff being done at some of the eastern european institutions. Places that don't have a huge amount of money but are doing a lot of good stuff with what they have.
I've been tangentially fascinated by palimpsests as well.
You know about this presumably? The book is really informative, but I guess they've moved on since then.
What worries me is that previous civilisation meltdowns have permitted some texts to survive exactly because they're incredibly low tech. We can find accidentally fired tablets from Mesopotamia and Crete and work our way back into them. Likewise inscriptions from Mesoamerica (which weren't even supposed to be proper writing 25 years ago) and papyri from the Egyptian desert.
But when we lose the power lines, that's it. Our civilisation will be as lost as the people who did the cave paintings.
Well, yes and no. As I was saying earlier, there's a huge difference in the amount of stuff preserved on those clay tablets and on the handful of papyri that happened to get dumped in the desert. And it's not like we're completely digital now; there's still tons of paper out there, not to mention inscriptions on buildings and so forth. I think our position is pretty similar to that of Egypt, Greece, Rome, Mesoamerica, etc. I.e., almost everything will be lost eventually, but some of it will survive fortuitously under unusually good preservation conditions even if there's no continuity between our civilization and whatever comes next, and if there is some continuity more stuff will be preserved because people continue to use and copy it.
Take the manuscript ttaM linked. That's the oldest copy of Plato we have, but it's closer in time to us than it is to him. Almost all of the texts we have from the Classical world have come down to us from medieval copies, and as a result we know way more about those societies than we would if we were relying purely on the handful of original documents that have survived. Egypt is a different story, and a more common one.
So should we copy the unfogged archives onto clay tablets, so that 2000 years from now we will be the record of our civilization?
There's a new Assyrian dictionary out that will make it easier to translate the posts.
77: Yeah. There's a show about it at the Walters Museum here in Baltimore, which I haven't seen yet, but I hear it's impressive and absorbing.
"As there civilization collapsed, a group of their most distractible minds gathered to await the return of the one they called Ogged."
Spelling and grammar will be different in the distant future.
54 reminds me of a) the continuing existence of vacuum tube manufacturers in Russia (that first one is the corporate parent of Tung-Sol, I believe) and b) an old article (from the WSJ, no less) on the last American maker of reel-to-reel tape folding (that quotes, among others, Jeff Tweedy and notable digital-hater Steve Albini).
I think I've mentioned the joy I felt when I watched a How It's Made episode where they made the rolls for player pianos using a process that was computerized on an Apple IIe.
re: 84
Valves are heavily used in the making guitars louder community, so I think certain core valves (EL34s, 6L6s, 12AX7s, etc) will be around for decades to come.
Famously, Radio 4 long-wave is going off air because they can't get a massive valve any more.
http://www.guardian.co.uk/media/2011/oct/09/bbc-radio4-long-wave-goodbye
certain core valves
I read this as "certain core values". Heh.
One of my Facebook friends is a player-piano expert.
Re 66, 69:
That's true to the extent you're relying on the hardware alone. There are numerous checksummable storage layers on top of that though - things like ZFS do auto-validation of data on each read, or en masse on request, with data replicated across machines and data centers. The real danger is N-year flood or system-failure in nature; something that knocks out a big chunk of one macro system with no real backup or external replication. Think Gmail accounts of famous men here, subject to deletion or a meteor striking an entire data center.
Not quite so dire as all that; still concerning that the whole system relies on more or less continual maintenance though.
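To make the "checksum layer on top of the hardware" idea concrete, here is a minimal Python sketch. It is not how ZFS is implemented: the `ReplicatedStore` class, the in-memory dicts standing in for separate machines, and the use of SHA-256 are all assumptions for illustration. It validates on every read, repairs bad copies from a good one, and offers a scrub that checks everything en masse.

```python
import hashlib


class ReplicatedStore:
    """Toy checksummed store with replicas; hypothetical, not ZFS."""

    def __init__(self, replica_count: int = 3):
        # Each dict stands in for one machine or data center.
        self.replicas: list[dict[str, bytes]] = [{} for _ in range(replica_count)]
        self.checksums: dict[str, str] = {}  # key -> expected SHA-256 digest

    def put(self, key: str, value: bytes) -> None:
        """Store the value on every replica and record its checksum."""
        self.checksums[key] = hashlib.sha256(value).hexdigest()
        for replica in self.replicas:
            replica[key] = value

    def _is_good(self, key: str, value) -> bool:
        """True if the value exists and still matches the recorded checksum."""
        return (
            value is not None
            and hashlib.sha256(value).hexdigest() == self.checksums[key]
        )

    def get(self, key: str) -> bytes:
        """Validate on read; heal damaged replicas from the first good copy."""
        good = next(
            (r[key] for r in self.replicas if self._is_good(key, r.get(key))),
            None,
        )
        if good is None:
            raise IOError(f"no intact copy of {key!r} on any replica")
        for replica in self.replicas:
            if not self._is_good(key, replica.get(key)):
                replica[key] = good
        return good

    def scrub(self) -> list[str]:
        """Check every key en masse; return the keys that cannot be recovered."""
        lost = []
        for key in self.checksums:
            try:
                self.get(key)
            except IOError:
                lost.append(key)
        return lost
```

The failure case the comment worries about shows up here too: if every replica of a key is damaged at once, `scrub` can only report the loss, not fix it.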
88: Has he appeared on Canadian TV shows?
They don't really show faces on How It's Made and they don't mention names (just a subtle-ish view of the logo for the factory). Nobody talks except the narrator. It's for people who like to watch assembly lines.
I love how they call everybody "worker".
Some are called artisans or craftsmen.
"As there civilization collapsed, a group of their most distractible minds gathered to await the return of the one they called Ogged."
"I say, Carruthers, did these people ever talk about anything but dating and food?"