- Researchers contribute to the ever-expanding network of human knowledge.
- Curators of human knowledge must preserve that knowledge for future generations.
- Today that responsibility falls upon the digital preservation community.
- CLOCKSS is a digital archive for academic publishers and research libraries.
- Curators of human knowledge in a digital era are in a state of perpetual motion.
It’s fair to say most researchers don’t do what they do for the money. Instead, what drives them is their passion for contributing to the scholarly record. This ‘record’ is not a simple sequential chronicle of academic information. Rather, it is an ever-expanding, highly interconnected network of all human knowledge racing outwards in myriad directions. Scholars build upon the work of those who published before so that those who follow them may do the same. Those tasked with curating such knowledge can never rest: the responsibility of preserving it is immense, and as the scholarly record has become progressively digital, it has become increasingly mercurial and the demands on curators’ skills more intense.
For hundreds of years, until the last part of the previous century, the record of scholarly output lay in printed publications – journals and books; the network of knowledge existed through the citations and references listed in them. Access to these publications was largely restricted, controlled by the publishers who owned the rights and the academic libraries that held the books and journals. Librarians played vital roles in archiving and curating the knowledge they contained. Much of this changed as scholarly output became more digital. Not only did barriers to publication fall away, accelerating content generation, but the forms of that content changed and multiplied. Barriers to accessing the content fell away too. Critically, digital preservation technology emerged and then evolved rapidly as one format replaced another. It is still changing. This has fostered the development of specialist collaborative organisations such as CLOCKSS – a digital archive for academic publishers and research libraries. Their work is anything but simple: they are in a continual race against obsolescence, safeguarding knowledge for those yet to be born with technology that will be outdated by the time those readers arrive.
The ‘knowledge graph’
Dr Alicia Wise is the Executive Director of CLOCKSS and uses the analogy of perpetual motion to describe the work of the digital preservation community. As Wise points out, ensuring organised access for current scholarly enquiry is not the core mission of archives entrusted with curating human knowledge; ensuring its long-term preservation is. That was relatively simple when it meant filing away printed publications. Today, digital scholarly knowledge exists in multiple forms and formats – audio, visual, code, data, software, workflows, and methodologies. Each item must be catalogued and cross-referenced, its authorship and other rights established, and its records kept accurate and up to date. The necessary information could be a line of code buried in metadata.
For organisations such as CLOCKSS, digital preservation is more than an electronic snapshot of scholarly output. It’s about constructing, maintaining, and preserving all the connections between that output – something Wise calls the ‘knowledge graph’. But this graph is not linear; it is continually mutating, built upon with inputs from an ever-changing ecosystem of stakeholders, all using technology that is a mere blink in the progress of time. Not so long ago, libraries would have been on the Rolodex of laser disc manufacturers. Today, data sits as binary code in silicon chips; tomorrow, it could be written into glass by femtosecond laser pulses or encoded in strands of DNA.
The responsibility of maintaining this morphing knowledge graph falls to the digital preservation community, and as more and more of that graph finds its way onto the web, its decentralised nature presents the community with several challenges. One is respecting the copyright, privacy, and information-security regulations of every jurisdiction the web touches. Every form of output also requires internationally agreed validation and recognised registries to determine what is held where and whether it is preserved. This helps identify gaps in human knowledge and reduces the risk of unnecessary repetition.
Another challenge is versioning – the once relatively simple process of maintaining different versions or iterations of scholarly output over time. Performing this for digital scholarly output is a critical function of publishers and organisations like CLOCKSS. Publishers need to track this rapid evolution of output to ensure the reproducibility, transparency, and sometimes retraction of research results. And while academia may be their primary audience, a wider audience awaits, fuelled by demands for open access; citations are no longer the sole metric of a study’s ‘impact’. Scholarly output must therefore be discoverable beyond the traditional search tools of libraries and academic publishers. And who picks up the baton when a publisher is no longer able or willing to look after the scholarship? This is where effective digital scholarly archiving is critical. Unlike a tangible publication sitting in plain sight on a shelf, a digital record is hidden away unless someone draws a path to its door and lays a map for further exploration on the welcome mat. That means caring for digital scholarship over time is a relay race, with the baton handed from publisher to archive.
The everlasting need for knowledge
For Wise and her colleagues within the digital preservation community, permanence is a promise never realised. They are well aware that material preserved for decades, let alone centuries, is likely to outlast whichever services are preserving it at any given time. Few technologies they touch will survive, but the need for human knowledge is everlasting. Every preservation service needs to remember its method is temporary, but its purpose is eternal. Cross-functional collaboration is, therefore, critical for the digital preservation community. They must reach beyond archival science to interested parties in communication, IT and core systems, copyright and licensing, metadata, publishing, and software design.
Changes in the technologies, formats, standards, content versions, ownership, and business practices involved in the digital preservation of the scholarly record mean the community can never rest. It is like a relay race with no end, and the baton gets heavier with every step. At no point do you cross the line, and dropping the baton is not an option; passing it on to the next partnership for the next generation is the sole point. Today’s scholars trust their output will find its rightful place in the ever-expanding network of human knowledge. In the words of Wise, if preservation is your calling, anticipate perpetual motion.
What is the biggest challenge in the digital preservation of scholarly output?
Engaging, inspiring, and motivating libraries, publishers, authors, and others in the scholarly community to actively support digital preservation. It never seems compelling and then suddenly it’s too late.
What is the biggest reward?
Helping a researcher or publisher regain access to precious material they believed lost forever.
The technology may have changed, but what fundamentals in preserving scholarly output remain?
It is important to have an interest in the past and a commitment to the future. It’s not just about the content but also always about the people who create and use that content and the technology they use to help them do so.
If your mission is one of perpetual motion, what constitutes success?
Cheerfulness and tenacity!