Seminar Materials 7/9 - Digital Medievalist Discussion


Posted on 03/09/2010

2) Digital Medievalist DEBATE ON LONG TERM PRESERVATION (with links to relevant documents)

 

I have a question about preservation of digital content, especially medieval
manuscripts. I am writing a small article on the topic and I have consulted
a lot of sources (papers, handbooks) but most of them do not say anything about
the "life span" of the data in specific formats. To clarify: a .doc
file created in 1995 will most likely be unreadable in 2010. What about
other formats? Has anyone done some research on the "life span" of a specific
version of a digital format, and on when it becomes clear that the new version
and the old one are no longer compatible? Here I am talking about pdf,
rtf, doc (and all office files), djvu, tiff, jpg, mpg etc. (texts and
images especially).
In my work I am also making a small remark on XML as a data container since
it is, in my opinion, the best way to go and the standard will surely be
around for years. But what kind of steps do you take to ensure the
preservation of documents that have been encoded in XML?
I would also like to hear if there are opposing views on XML.
I also have the same question about the media. I found some research about
the longevity of CDs and DVDs but I am also interested in other media like
older hard disks, zip drives and magnetic media.
Thank you in advance

Daniel Mondekar

 

 

Dear Daniel,
I understand your question about the 'preservation of digital content'. As far as my memory serves, it seems to be a Microsoft-specific problem. The DOC format, native to MS, has gone through two 'major' changes in the last two decades as MS Word has kept improving. A .doc file created by Word 97 was unreadable in older versions, but one created by Word 95 or earlier was readable in Word 97. This is called downward compatibility, designed to serve the 'business ethic' of not ruining valuable digital content. The same problem happened again when Word 2007 was released: Word 2007 is downward-compatible, whilst Word 2007-specific doc files are unreadable in Word 2003. However, I have no idea whether Word 2007 reads doc files from before Word 97.
Other popular formats have also gone through improvements, though compatibility issues are rarely heard of. I remember that when I opened a PDF file created by a newer version of Photoshop with an older version of the programme, the file opened properly despite the message 'Some information will lose, etc'. That's why I say the compatibility issue is probably an MS-specific issue.
I think, the life of a popular file format, e.g. jpg or mpg, is rather long. There was a debate over compatibility when MS was planning its 2nd generation GUI, i.e. Win 95, to succeed Win 3.x. They seem to have come to an agreement about downward-compatibility, as stated. That's why many 20-yr old formats are still in use and 20-yr old files of those formats are still readable. For example, JPEG format was there when I was in high school. Now I have no difficulty reading those archaic files on this computer, though their 65k-colour palette violates my eyes.
I wish I could say something about XML, which is way too modern for a historian, um, politically. Personally I like databases and the .txt format more than new standards, only because I am used to them.
So far CD and DVD are the most reliable media. Their life span is longer than 15 years as long as they are treated tenderly. A hard drive is efficient when it is cool. It can, however, turn into a nightmare when it is naughty. That's why IT experts advise everybody to make backup CD/DVDs of the HD. Older hard disks are usable as long as they have an IDE interface and are NOT broken. The average life span of an older-generation HD of, say, 20 GB is about 5 years. Don't shake it and don't feed it water, and it may live longer. It is hard to tell how long the IDE interface will survive, though, as SATA is getting popular. ZIP drives! They were on the market for maybe half a year? They were gone immediately when CD-R was commercialised. Magnetic tapes were terminated by CD-R, too.
Whatever media you use, regular backup is the rule. Hope this helps.

Best wishes
Gerald Liu
PhD student in medieval history, Durham
Working on late medieval manorial management and farm workers.
Personal website
http://www.durham.ac.uk/gerald.liu/



Hi Daniel,
I have been haunting this listserv for six years now, and this is one question that I can help answer! I am the digital initiatives coordinator at a special collections library, and the field and theory of digital preservation is changing rapidly. Terms you'll want to define and liberally use in your article include, but are not limited to:
- Bitstream copying: this is what you do when you back up your files onto an external hard drive; bitstream copying is the most basic level of digital preservation, and although it is necessary, it misses a lot about information itself.
-Digital sustainability: incorporates a number of actions intended to keep digital artifacts accessible to an nth year. The best rule of thumb I've heard someone use is 10 years: "If I create this JPEG now, will its format be sustainable and readable ten years from now?"
-Digital encapsulation: this refers to two things, the digital artifact itself and its descriptive information. For instance, what good is a JPEG image of a cuneiform tablet if there is no descriptive information--metadata--attached to it? Encapsulation is the idea that information must always accompany the item. (In addition, encapsulation includes preservation metadata, which tracks file integrity information such as checksums.)
-Migration: implies copying information from one technology to another; this copying includes both the digital artifact itself and any metadata attached to it. 
-Emulation: If you have that doc file, and you need a system that reads it, you eventually will find an emulator to be useful. Emulation provides a simulacrum of a digital object's original software environment, which not only allows access to the file, but it also preserves the original digital experience. Emulation and migration are often discussed as oppositional ideas, but they both play an important role in digital preservation.
-Standards and normalization, consistency: Standards are key to good digital preservation strategy.  Normalization is the dedicated adherence to those standards.  Consistency is always key to digital repositories and their preservation.
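
To make the first few terms concrete, here is a minimal sketch in Python of bitstream copying combined with encapsulation: it copies a file bit for bit, checks the copy against the original, and writes a small metadata record (description plus checksum) next to it. The function names, the ".json" sidecar convention, and the choice of SHA-256 are assumptions made for illustration only; they are not taken from any particular repository system or standard.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve(source: Path, archive_dir: Path, description: str) -> Path:
    """Copy a file bit for bit and write a sidecar metadata record.

    The sidecar layout (hypothetical) is: <file name>.json next to the copy.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.copyfile(source, target)
    checksum = sha256_of(target)
    # Refuse to record a copy that differs from the original.
    if checksum != sha256_of(source):
        raise IOError(f"copy of {source} does not match the original")
    record = {"file": target.name, "description": description, "sha256": checksum}
    sidecar = target.with_name(target.name + ".json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def verify(sidecar: Path) -> bool:
    """Re-check a preserved file against the checksum stored in its sidecar."""
    record = json.loads(sidecar.read_text(encoding="utf-8"))
    return sha256_of(sidecar.with_name(record["file"])) == record["sha256"]
```

Running verify() on a schedule is one simple way to notice silent damage to a stored file; it says nothing about whether the format itself will still be readable, which is where migration and emulation come in.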
This will get you started. I'd recommend taking a listen to the new(ish) Library of Congress digital preservation podcast series:
http://www.loc.gov/podcasts/digitalpreservation/index.html
Best of luck.
Best,
Ana Krahmer


 

There’s a useful list of some “canonical” references in the discussion on sustainability, a “souvenir” of a paper by Bella Millett to be printed out for reasons of preservation:


http://www.i-d-e.de/wordpress/wp-content/uploads/2010/05/bella-millett-sustainable-souvenir-eets-2010.pdf

 http://www.ria.ie , http://dho.ie/confessio , http://www.i-d-e.de


Franz Fischer (Dr des.)
Royal Irish Academy
19 Dawson Street, Dublin 2, Ireland; email: f.fischer@ria.ie, tel.: +353 1 6090605

 Dear Daniel,

 The issue of long-term preservation of digital content coming from medieval manuscripts as the source is, as far as I can tell, exactly the same as it is for any other digital data anywhere, and vast amounts of ink (real and virtual) have been used discussing the issue. Although it's a few years old now, you might start with the article "Architecture and Technologies for Trusted Digital Repositories," Jantz and Giarlo, D-Lib Magazine 2005 (http://www.dlib.org/dlib/june05/jantz/06jantz.html - not really as technical as it may sound, and includes some important definitions), then move your research on from there. Although I'm not sure that there are many real and true TDRs even now, it's a fine ideal to start with. Although Dan O'Donnell is correct that many older file formats can still be read far past the time we might expect, the issue of "digital preservation" is much more extensive than simply "can I still have access to the data on this file".

 

Dot

Stuart Lee and I wrote an article that touched on this a little in Gail
Owen's book on Anglo-Saxon Manuscripts. And I had a couple of brief
columns on different aspects of the problem in Heroic Age a little while back.
I'm not 100% sure I share your premises, BTW. I think most things from
1995 would still be recoverable, if you knew the right software. I
recently helped a colleague restore a whole bunch of very old
WordPerfect files using Open Office. And while I've not tried a 1995
.doc file in it, I'd be amazed if it couldn't read it.
My rule of thumb is anything for a PC or Mac is recoverable, no matter
how old, unless it is in a minor proprietary format. So most image
files, most WordPerfect and Word files, I'm guessing most Wordstar files
should be fine. I'd have my doubts about ChiWriter files, though that
might have been a pre-PC program for the SuperPet. As a rule, you're
better off in recovery with something like OpenOffice, since the stakes
surrounding compatibility are much higher for them than for proprietary
software like Word: people use Word whether or not it reads other
formats, but nobody would use OpenOffice if it didn't read Word. In fact,
in the case of old .doc files, Open Office was a better interpreter than
Word: when I was typesetting Caedmon's Hymn, which I did in Word from
SGML masters, I had some trouble where Word would get confused in
displaying complex tables. Opening the files in OO and then saving them
immediately as Word Files again was usually enough to solve the problem.
It is useful to read Nicholas Barker on preservation anxiety, BTW. I
think a lot of what he says about misplaced fears of obsolescence with
regard to 19th Century paper is also true of things like CD-ROMs and
file formats. You'd be amazed how much works just fine.

-dan

 

"I'd have my doubts about ChiWriter files, though that might have been a pre-PC program for the SuperPet."

This was a popular PC program for writing scientific, especially mathematical,
texts:

       
http://www.horstmann.com/ChiWriter/

Its internal format was rather simple and easy to reverse-engineer,
you can find its short description e.g. at

   
http://mirror.ctan.org/support/chi2tex/read.me

I should still have the C source code of a converter to TeX by
Horstmann

         
http://www.tug.org/TUGboat/Articles/tb12-3-4/tb33horstman.pdf

but its legal status is not clear to me. I received it
indirectly without any conditions, but later its author started to
sell the program. This

    
http://www.ctan.org/tex-archive/support/chi2tex/

unfortunately contains only binaries (which you can however still run,
at least in principle, under Free-DOS in a virtual machine).

This

    
http://www.ctan.org/tex-archive/support/chi2ltx/

may also be of some use.

Best regards

JSB

dr hab. Janusz S. Bien, prof. UW -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsbien@uw.edu.pl, jsbien@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/

 _________________________________________________________________________


Università degli Studi di Siena - Via Banchi di Sotto 55, 53100 Siena - Italia