Digression: the demon MARC

January 9, 2009

MARC stands for MAchine Readable Cataloguing. The cheapness of the acronym should tell you something about the quality of the standard.

Many well-intentioned people (and many charlatans) try to convince us that you can learn more from failures than from successes. I disagree. There are, however, some lessons that can be learned from the dismal & crumbling institution of MARC.

Lesson one: when it was invented, MARC was excellent. 

That, however, was in the mid-1960s. This is not.

Today’s theme is obsolescence. Technologies that don’t die at the appropriate time are frustrating, dangerous, and impede development of new and better technologies.

This is what a MARC record looks like: 

001 4520371

005 19990823210448.0

008 990108s1999 cou b 001 0 eng

035 $a(DLC) 99011493

906 $a7$bcbc$corignew$d1$eocip$f19$gy-gencatlg

955 $apc14 to la00 01-08-99; lj11 to subj. 01-11-99; lj07 01-11-99; lk02 01-12-99; CIP ver. lh04 to SL 08-03-99

010 $a 99011493

020 $a1563087723 (hardbound)

020 $a1563087022 (softbound)

040 $aDLC$cDLC$dDLC

043 $an-us---

050 00$aZ675.S3$bW8735 1999

082 00$a025.1/978$221

100 1 $aWoolls, Blanche.

245 14$aThe school library media manager /$cBlanche Woolls.

250 $a2nd ed.

260 $aEnglewood, CO :$bLibraries Unlimited,$c1999.

300 $axiv, 340 p. ;$c26 cm.

490 1 $aLibrary and information science text series

504 $aIncludes bibliographical references and index.

650 0$aSchool libraries$zUnited States$xAdministration.

650 0$aMedia programs (Education)$zUnited States$xAdministration.

830 0$aLibrary science text series.

985 $eGAP

991 $bc-GenColl$hZ675.S3$iW8735 1999$oam$tCopy 1$wBOOKS

(Record taken from http://www.lili.org/forlibs/ce/able/course8/03whatmarc.htm)

I would strongly recommend not trying to understand any of that. The thing to notice is that it’s ugly. Ugliness matters in metadata design, as in all design – as a general rule, ugly things don’t work properly. (Elegance is no guarantee of functionality, but it’s a damn good start.)

It’s not MARC’s fault. In fact, MARC was amazing for its time. It was the first ever effort at capturing reusable cataloguing information. (Well done libraries – another first!) But not even a librarian-genius could have foreseen the type of acrobatics we ask of our information today, and so MARC is barely capable of sitting down and standing up again.

They call it ‘the curse of the innovator’. Whoever innovates first is, inevitably, saddled with the oldest & clunkiest system in the long term.

One big problem is that when MARC was developed, disk space was at a real premium – that’s why we have ‘245’ as a field and not ‘Title Statement’ – ‘245’ is shorter and easier to store. Similarly, 245$a is ‘Title’. This contraction has a nasty knock-on effect on human readability, which is irreplaceable in metadata management.
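To make the readability cost concrete, here is a rough Python sketch that turns one line of the display above back into something a human can read. It is only an illustration: the names `parse_data_field`, `describe`, `FIELD_LABELS` and `SUBFIELD_LABELS` are my own inventions, the lookup tables cover just the labels mentioned in this post, and it assumes the simple "tag, indicators, $-delimited subfields" layout shown in the sample record (control fields such as 001 and 008 have no subfields and are not handled).

```python
# Hypothetical lookup tables: only the labels mentioned in this post.
FIELD_LABELS = {"245": "Title Statement"}
SUBFIELD_LABELS = {
    ("245", "a"): "Title",
    ("245", "c"): "Statement of responsibility, etc.",
}


def parse_data_field(line):
    """Split one display-format data field into (tag, indicators, subfields)."""
    tag, rest = line[:3], line[3:]
    # Everything before the first "$" is indicator text; the rest is subfields.
    head, _, body = rest.partition("$")
    indicators = head.strip()
    # Each chunk starts with a one-character subfield code, then the value.
    subfields = [(chunk[0], chunk[1:].strip()) for chunk in body.split("$") if chunk]
    return tag, indicators, subfields


def describe(line):
    """Return human-readable 'label: value' strings, falling back to raw codes."""
    tag, _, subfields = parse_data_field(line)
    field_label = FIELD_LABELS.get(tag, tag)
    return [
        f"{field_label} / {SUBFIELD_LABELS.get((tag, code), '$' + code)}: {value}"
        for code, value in subfields
    ]


for text in describe("245 14$aThe school library media manager /$cBlanche Woolls."):
    print(text)
```

Notice that the lookup tables are doing all the work. Without them, the record itself tells a human reader nothing about what ‘245’ or ‘$a’ means.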

The 245 subfields alone carry a lot – to get a feel for it, have a look at the full outline:


First Indicator: Title added entry
0 – No added entry
1 – Added entry

Second Indicator: Nonfiling characters
0 – No nonfiling characters
1-9 – Number of nonfiling characters

Subfield Codes
$a – Title (NR)
$b – Remainder of title (NR)
$c – Statement of responsibility, etc. (NR)
$f – Inclusive dates (NR)
$g – Bulk dates (NR)
$h – Medium (NR)
$k – Form (R)
$n – Number of part/section of a work (R)
$p – Name of part/section of a work (R)
$s – Version (NR)
$6 – Linkage (NR)
$8 – Field link and sequence number (R)
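The second indicator is a good example of how much meaning hides in a single character. In the sample record the 245 line begins ‘245 14’, so the second indicator is 4: skip the first four characters (‘The ’) when filing the title. A minimal sketch of that rule, with a made-up function name `filing_title` and the assumption that a blank or non-digit indicator means "skip nothing":

```python
def filing_title(title, second_indicator):
    """Drop the leading nonfiling characters counted by the 245 second indicator."""
    # Assumption: anything that is not a digit (e.g. a blank) means skip nothing.
    skip = int(second_indicator) if second_indicator.isdigit() else 0
    return title[skip:]


print(filing_title("The school library media manager /", "4"))
# prints: school library media manager /
```

So one digit, in one fixed position, decides whether the book is shelved under T or under S.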


For me, the real problem with MARC can be summed up by looking at $c. ‘Statement of Responsibility’, it says. Not author. ‘Statement of Responsibility’. Author? Editor? Contributor? Compiler? Composer? Who knows? But with all the detail that is contained in the record, you can’t help but assume that all the useful information must be there somewhere.

I spent a long time with MARC, and I can’t remember if it’s there. Leading us to a general rule of data: If it is there and you can’t find it, it might as well not be there at all. 

Herein lies the danger of bad cataloguing: systems that look specific, but actually aren’t. The less specific, the less useful, and the more prone to misinterpretation – by humans and machines alike.

(Very brave people are invited to delve into the mysteries of control field 008 - http://www.loc.gov/marc/bibliographic/bd008.html - but don’t say I didn’t warn you).

So – there’s MARC. Clunky, unfriendly to human readers, outdated, inflexible, insufficiently specific. Also: Old. So why bother complaining about it?

Because it’s still out there, that’s why. That’s a problem. Outdated combined with obsolete throws up all sorts of awfulness, on which more later.

