Pádraic Moran, ‘Designing sustainable Digital Editions’, pmoran.ie, 10 June 2025, http://www.pmoran.ie/posts/sustainable-digital-editions/
This is a lightly revised version of a talk I gave at the Digital Epigraphy Workshop in Maynooth on 26 May 2025. Many thanks to Nora White for the invitation and to the audience for discussion and feedback.
I have been building digital editions for some time now. In fact, I was somewhat surprised, recently, to realise that the first digital edition I built was in 2005, twenty years ago this year!
I’m still building digital editions—in fact I’m working on two at the moment. So it seems reasonable to wonder what these editions will look like in twenty years’ time—in the year 2045! That number may sound like the stuff of science fiction, but presumably so did the year 2025 when I started out early in the new millennium.
Back in 2005, I remember well that the long-term sustainability of digital resources was a matter of broad concern. And now twenty years on, I feel that many of the same fundamental issues remain.
So, for this post, I want to reflect on the theme of sustainability for digital editions. Sustainability is of course a broad-ranging term, so I will focus mainly on two aspects:
- Technical sustainability—maintaining access to resources. Put most simply, how can we ensure that our work will continue to exist in the future? If it exists, will people be able to use it? Will they be able to derive any benefit from using it?
- Scholarly sustainability—maintaining quality of resources. How do we maintain the same high standards in digital media as we expect in traditional publishing? How can we manage time and energy on publications that are never really final? And how can we make sure that researchers—particularly, but not exclusively, early-career researchers—receive adequate recognition for all of the time and energy they invest in digital work?
1. My background and perspective
1.1 Disciplinary perspective
The reflections I will offer will be based on my own experience of working in the intersections of Celtic and Classical philology.
It’s important to clarify that the focus of my work is philology—the close study of texts, including their linguistic, literary and cultural contexts, and all aspects concerning their historical transmission. So the goal of my work is to produce some new knowledge about cultural history.
I am interested in digital editions as means to an end, and not as objects of study in themselves. I emphasise this because I suspect that scholars who consider their primary discipline to be Digital Humanities may approach some of the issues below differently.
1.2 Experience in building digital editions
The reflections that follow will draw on my own work on digital resources to date and on how I see them developing in the future.
These projects include:
- Early Irish Glossaries Database (EIGD): first launched in 2005, now in version 3.3. A collection of diplomatic transcriptions of texts in Old and Middle Irish with citations from Latin, Greek and other languages. The resource includes integration with manuscript images (IIIF) and search tools, as well as a feature to generate text concordances according to user requirements.
- St Gall Priscian Glosses: launched in 2010, now in version 2.1. Diplomatic transcriptions (by Rijcklof Hofman) of glosses in Latin and Old Irish, presented alongside a critical edition of the glossed text, with links to manuscript images and other resources, as well as search tools. In 2018, version 2.0 added Bernhard Bauer’s morphological data for the Old Irish glosses.
- Gloss Corpus: available in a test version since 2019, with version 1.0 launched in 2024. A platform for Open Access publication of glosses, allowing comparison of diplomatic transcriptions of parallel glosses in many manuscripts. Additional analytical tools are currently in development as part of the GLOSSAM project.
- Armarium Digital Edition: launched in 2025. A platform for Open Access critical editions, intended especially for shorter texts.
- Manuscripts with Irish Associations (MIrA): not a digital edition as such, but a catalogue of manuscript metadata, incorporating IIIF images and data visualisations, including network graphs. Launched in 2021 and still in beta development.
1.3 Why editions are precious
Editions take an enormous amount of work—probably no one fully realises this until they undertake one themselves!
Editions are especially precious in Celtic Studies, where the number of working researchers is very low compared to other disciplines, and where the linguistic complexity of the textual traditions is arguably higher than, say, in the Classical languages, which at least remained reasonably consistent over the centuries, unlike the vernaculars.
Editions in Celtic Studies are all the more precious because of the rarity with which they appear: in my own case, I published an edition of De origine scoticae linguae (or O’Mulconry’s Glossary) in 2019. There was previously one edition only, published by Whitley Stokes in 1900. On that basis, the next major edition of this text will be due to appear around the year 2138.
Now, WorldCat says that my print edition is available in 83 libraries. Since some of these libraries are already centuries old, I can be reasonably confident that most of them will still have it in a century’s time. What if I had published this as a digital edition only? What confidence would I have that a digital version will still survive in 100 years’ time, or even in twenty years’ time?
2. How vulnerable are digital resources?
All media degrade over time, including electronic media. The chips in solid-state storage—memory sticks, for example, and the solid-state drives in modern computers—gradually lose their electrical charge. Data on a USB drive might typically be expected to last 5–10 years before degradation (‘bit rot’) sets in and you start to lose data. Magnetic hard drives are not much better (c. 10–15 years), and magnetic tape might last 15–30 years.
Various factors influence the rate of degradation, including the quality of the product, frequency of use (write cycles), and storage conditions (temperature, humidity, electromagnetic interference).
There are solutions to this, but they are time- and resource-intensive. You can improve the longevity by storing media in ideal conditions, by re-writing the media periodically, and by keeping multiple copies that can be compared in order to correct random data loss.
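To make the multiple-copies strategy concrete, here is a minimal sketch in Python (the file paths and the three-copy setup are hypothetical): it checksums each copy and takes the majority as authoritative, so that a single degraded copy can be detected and restored from the others.

```python
import hashlib
from collections import Counter
from pathlib import Path

def checksum(path: Path) -> str:
    """Fingerprint a file so that silent corruption ('bit rot') is detectable."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def restore_from_copies(copies: list[Path]) -> bytes:
    """Compare independent copies of the same file and return the majority
    version. With three copies, one degraded copy is outvoted by the two
    intact ones."""
    digests = [checksum(p) for p in copies]
    majority, count = Counter(digests).most_common(1)[0]
    if count <= len(copies) // 2:
        raise RuntimeError("No majority: too many copies have degraded")
    return copies[digests.index(majority)].read_bytes()

# Hypothetical usage: three copies kept on separate media, verified periodically.
# data = restore_from_copies([Path("a/edition.xml"), Path("b/edition.xml"),
#                             Path("c/edition.xml")])
```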
If you are really committed to preserving your data, you can deposit it in the Arctic World Archive, a facility in an abandoned coal mine in Svalbard. Data is transferred onto polyester film, in a visual format comparable to QR codes, and stored in a steel vault deep underground. The film is expected to last for 500 to 1,000 years—so still not as good as parchment and ink. (Pricing currently starts at €19 per month.)
It’s clear that most of us, as individuals, will probably not have the capacity to curate our personal digital archives throughout our own lifetimes. In fact, it is doubtful whether most institutions could either.
3. Longevity for web applications
3.1 Problems
The format of most digital editions presents further problems. Editions are generally presented as web applications, in other words, software running on a public-facing web server. And software, by its nature, is not designed for long-term use. Anyone who has a computer knows that its operating software is continually updating itself.
The most important reason for the continual updating of software is security. The Early Irish Glossaries Database, hosted by the University of Cambridge, was regularly threatened due to security issues.
- In 2018 the web server software was updated, and all of the code on the website had to be reviewed and rewritten to avoid the site being shut down.
- Several times, security reviewers, both professional and amateur, pointed out potential vulnerabilities on the site, which likewise required re-coding to prevent shutdown.
- And in 2020 the site was hit with a denial-of-service attack, in which the most resource-intensive pages (text concordances) received around 1,000 requests per minute, incapacitating the website and the other sites hosted on the same server. This also required a coding solution.
Fortunately, I was able to give my time pro bono in order to ensure the continuation of the site. But otherwise, the resource would almost certainly have died.
3.2 Mitigation
How can we extend the shelf life of web applications? What is a realistic Best Before End date? Here are three strategies for mitigation.
1) Use Open Source software wherever possible. For example, University of Galway web applications are now built using a free and Open Source framework named Laravel. This offers managed security, including regular security updates into the future. It also imposes best practices with regard to writing software, promoting easier maintenance in future.
Nonetheless, using Laravel requires active maintenance, and its long-term stability is uncertain. Its current Long Term Support (LTS) commitment amounts to two years! And according to a list of LTS software on Wikipedia, this is fairly standard for the industry: commitments to maintain updates for ten years, as for Windows and Linux, seem to be the outer limit.
2) Archive web application code, e.g. on GitHub. This will at least allow users (with the necessary technical knowledge) to recreate resources even if they go offline.
3) Minimal computing is a general philosophy that emphasises minimalism in as many areas of computing as possible. For example, minimal design keeps things as simple as possible for the user, avoiding fussy distractions and cognitive overload, and ensuring faster loading times. Keeping code as simple as possible promotes minimal maintenance and minimal obsolescence. In an era when vast amounts of energy are used for generating AI models and storing data (by 2026, it is estimated that c. 32% of Ireland’s electricity production will be consumed by data centres), minimal computing also promotes minimal energy use. And, not least, minimal computing aims to promote maximum justice: providing easier access to digital resources for users who do not have the luxury of high-speed connections and fast processors.
This is very much the philosophy that inspired Armarium. I initially decided on a fully monochrome design (it works great for books!), before conceding to occasional touches of colour. The site has absolutely minimal functionality—at the moment not even a search function—and is aimed solely at facilitating reading. The server-side functionality is also minimal: most of the site is pre-generated and served to the user as static pages.
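For illustration, pre-generation can be as simple as the following Python sketch (the directory names and page template are hypothetical, not Armarium’s actual build script): every page is rendered once at publication time, and the server only ever hands out static files.

```python
import html
from pathlib import Path

SOURCE = Path("texts")    # hypothetical: one plain-text file per edited text
OUTPUT = Path("public")   # served as-is by any ordinary web server

TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>{title}</title></head>
<body><main><h1>{title}</h1><pre>{body}</pre></main></body>
</html>"""

def build() -> None:
    """Render every page once, at publication time. There is no database
    and no server-side code to patch afterwards."""
    OUTPUT.mkdir(exist_ok=True)
    for src in sorted(SOURCE.glob("*.txt")):
        page = TEMPLATE.format(title=html.escape(src.stem),
                               body=html.escape(src.read_text(encoding="utf-8")))
        (OUTPUT / f"{src.stem}.html").write_text(page, encoding="utf-8")

if __name__ == "__main__":
    build()
```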
The minimalist approach seems to be at odds with some theorists of digital editions, who call for more functionality, not less. For example (Sahle 2016, my italics):
‘Amplification and change of functionalities is one of the most obvious aspects in comparing traditional to digital editions. The book is a perfect device for the passive consumption of a limited amount of one-dimensional static information. Digital media, with its complex, multimedia, networked content, is in principle interactive and adaptive. It asks for more sophisticated browse and search functions to access all the material and information of an edition. A printed edition can be read. A digital edition is more like a workplace or a laboratory where the user is invited to work with the texts and documents more actively.’
There is probably a place for highly interactive editions with sophisticated functionality. But there is also a pressing need for access to high-quality information and complex ideas that can be digested slowly, ideally in a medium with minimal fuss. Slow, quiet consumption is not necessarily passive consumption. Dynamic information is not necessarily high-quality information. Which approach will stand the test of time?
4. Keeping data alive
It’s probably obvious that a key strategy in the long-term sustainability of digital resources is the separation of data from applications.
4.1 Archiving data
Raw data in formats such as plain text will probably outlive all of us. The ASCII plain text format has been around since 1963. Unicode has been around since the early 1990s and is so universal now as to be equally trustworthy.
Where should you store your data?
1) The option to download from web applications is fine, but not an ideal solution, seeing that the web application itself will probably have a relatively short shelf-life.
2) GitHub is an industry standard for software archiving, in particular for version control and collaboration. You can use it to archive your web application and your data too, which is very convenient. And GitHub has even deposited a snapshot of its public repositories in the Arctic World Archive!
However, GitHub is not ideal in other respects:
- It was not designed as a data archive. So, for example, it captures very little metadata and does not assign DOIs.
- It is a private company (a subsidiary of Microsoft), so may well change its policies in the future.
3) Zenodo is a publicly funded, long-term data archive, administered by the CERN particle physics laboratory in Switzerland. It is highly trusted and offers features such as DOI assignment and data versioning. (A sketch of a scripted Zenodo deposit follows below, after this list.)
4) National data repositories, such as the Digital Repository of Ireland (DRI). The DRI was founded in 2011 as a ‘national infrastructure that provides long-term preservation and access to Ireland’s humanities, cultural heritage, and social sciences digital data’ (source). It follows several international best practices for long-term archiving, including the Open Archival Information System, an ISO standard for the preservation of digital information. If any one organisation in Ireland is likely to guarantee long-term access to humanities data, it is surely the DRI.
Depositing data with the DRI is not as straightforward as with Zenodo, as it is oriented towards institutional relationships rather than individual initiatives.
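For those comfortable with scripting, a Zenodo deposit can be automated through its public REST API. The sketch below follows the workflow described in Zenodo’s developer documentation (create a deposition, upload a file, attach metadata, publish); the token, file name and metadata are placeholders, and the current documentation should be checked before use.

```python
import requests

TOKEN = "YOUR-ZENODO-ACCESS-TOKEN"   # placeholder personal access token
BASE = "https://zenodo.org/api/deposit/depositions"

# 1) Create an empty deposition.
dep = requests.post(BASE, params={"access_token": TOKEN}, json={}).json()

# 2) Upload the data file into the deposition's file bucket.
with open("edition-data.xml", "rb") as fh:
    requests.put(f"{dep['links']['bucket']}/edition-data.xml",
                 params={"access_token": TOKEN}, data=fh)

# 3) Attach descriptive metadata; Zenodo mints a DOI on publication.
metadata = {"metadata": {
    "title": "Example edition: source data",
    "upload_type": "dataset",
    "description": "TEI XML source files for the edition.",
    "creators": [{"name": "Editor, Example"}],
}}
requests.put(f"{BASE}/{dep['id']}", params={"access_token": TOKEN}, json=metadata)

# 4) Publish (irreversible: this step assigns the DOI).
requests.post(f"{BASE}/{dep['id']}/actions/publish",
              params={"access_token": TOKEN})
```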
4.2 Using data
So assuming only the data will survive, who is going to use it and how?
The standard encoding for digital editions is XML following the guidelines of the TEI. Simple mark-up may be reasonably readable, but very rich mark-up makes the core content cluttered and difficult to read. In my opinion, XML is not particularly accessible without some technical knowledge to improve its presentation.
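A small example makes the point. The TEI-style fragment below is hypothetical, and the script shows only the basic principle (a real edition would use XSLT or a proper TEI toolchain): a few lines of Python suffice to strip the mark-up and recover a readable text, but the reader does need those few lines.

```python
import xml.etree.ElementTree as ET

# A hypothetical, lightly marked-up TEI-style fragment: even this much
# mark-up interrupts the reading text.
TEI = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <w lemma="adrad">adrad</w> <gloss xml:lang="la">cultus</gloss>
  <note type="editorial">cf. Sg. 59a6</note>
</p>"""

def reading_text(xml: str) -> str:
    """Discard the tags and return only the running text."""
    root = ET.fromstring(xml)
    return " ".join("".join(root.itertext()).split())

print(reading_text(TEI))   # -> adrad cultus cf. Sg. 59a6
```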
Of course, people who know how to work with data will not be daunted in the least by raw XML. Having started to work with Data Scientists over the past few years, I’ve been very struck by their general approach. Since they’re trained to work with data, they have very little interest in the web applications that most of us rely on. Instead they prefer to start with the data itself. In the long-term, Data Scientists may play a pivotal role in helping humanities scholars to unlock the richness of complex datasets.
I also think, personally, that AI could completely replace the need for project-specific web applications. Already, commercial AI tools such as ChatGPT have an excellent understanding of the semantic mark-up found in TEI XML. So we should be able to use AI to take the raw data of digital editions and present it to us according to our reading preferences (e.g. with minimum, maximum or selected apparatus; with images in parallel; in tables of text concordances). We could also use AI to analyse texts for us and produce results in an accurate and structured way.
This would put data—including the editor’s intellectual contribution—at the very centre of our digital resources, and remove dependence on presentation tools. It would allow for complete personalisation of interfaces to suit individual preferences and research goals. And it would ensure that the raw data from our project, which we hope will survive, will remain relevant to future generations.
(As pointed out in the discussion after the lecture, AI tools for academic purposes need to be developed responsibly. We need to be accountable for energy consumption, copyright and intellectual property, the biases of training data, and quality and consistency. However, I believe that responsible exploration of AI for mediating digital editions is possible.)
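By way of illustration only, the shape of such a workflow might be as simple as the sketch below, which hands a TEI file and a reading preference to a commercial model via its API (the model name and file are placeholders, and any output would need scholarly verification):

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

tei = open("edition-data.xml", encoding="utf-8").read()

# The prompt expresses a reading preference; the model, rather than a
# project-specific interface, mediates the raw TEI data.
prompt = ("Below is a TEI XML edition file. Present the edited text as "
          "plain reading text, noting variant readings in brackets only "
          "where the apparatus records disagreement between witnesses.\n\n" + tei)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for any capable model
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```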
5. Scholarly sustainability
I want to turn now to consider another dimension of sustainability, focusing not on the artefact but on the researcher producing it.
Given the pressures on academics—the pressure on early-career researchers to publish, the pressure on established academics to find any research time at all—how much time is worth putting into building digital editions? Does creating digital editions enhance the career prospects of early-career researchers? More generally, does this work produce good value for money for the taxpayers who fund public research?
5.1 Albatross projects
One of the difficulties is that digital editions are, in a way, never finished. The same is probably true for all editions, and indeed all academic work. The difference, of course, is that once a work is printed on paper, it is most definitely too late to add another paragraph or to correct typos and other errors.
Editors of digital editions will always feel the urge to tweak and improve existing work, since usually their updates will take effect more or less immediately. And apart from editorial improvements, technical maintenance, as discussed above, is an ongoing task.
This is a project-management issue. In research projects, we often over-promise results and under-estimate the complexity of tasks. For digital projects, as end dates start to loom and progress reports come due, it is tempting to release resources that are incomplete and/or untested—‘beta versions’ in software development parlance. New projects can then easily end up in ‘perpetual beta’ or ‘beta purgatory’.
Here again, I speak from experience: the Early Irish Glossaries Database has still not fully delivered on its goals and the website was therefore in beta version from 2009 to 2020—a full 11 years—when I finally accepted that the resource that was there was the actual resource and removed the beta designation.
Beta purgatory can quickly become a cumulative problem: as we take on more projects (especially as early-career researchers), we grow a longer list of jobs that will eventually need to be finished—and maintained. Over time it becomes increasingly hard to get other things done.
There are two solutions to these problems:
- The first is version control. Planning for timed releases—for example, with major versions 1, 2, 3 for significant advances (e.g. content expansions, major new functionality) and minor versions .1, .2, .3 for minor enhancements and fixes—not only helps to manage researcher time and focus, but also makes it fully transparent to readers what version of the resource they are reading—and, hopefully, citing. (And if you use a publicly accessible version control system such as GitHub, there should—in theory at least—be traceability for earlier versions.)
- More fundamentally, the solution is better project management. Project goals that turn out to be untenable for the time and resources available—which, let’s face it, nearly always happens—should be adjusted to be more realistic. Let the great not be the enemy of the good. Better to have good-quality resources that were slightly scaled back than very ambitious resources that never fully deliver.
5.2 Recognition for work
Another important issue for all researchers—and again early-career researchers above all—is recognition.
There are now some simple best practices to follow:
- All resources should be citable with a permanent locator reference, ideally a DOI.
- Readers should be able to cite a digital resource easily. If we do not provide a sample citation text for our digital resources, how can we complain if nobody cites them? So, in the case of Armarium, I have provided the citation text as prominently as possible, at the very top of the page, under the header, including the DOI (the web address is then not necessary).
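Generating such a citation text is trivial once the metadata exists; here is a minimal sketch (with placeholder values, not Armarium’s actual wording):

```python
def citation(author: str, title: str, version: str, year: int, doi: str) -> str:
    """Assemble a recommended citation; given a DOI, no web address is needed."""
    return f"{author}, ‘{title}’, version {version} ({year}), https://doi.org/{doi}"

# Placeholder values for illustration only:
print(citation("Example Editor", "Example Digital Edition",
               "1.0", 2025, "10.xxxx/xxxxx"))
```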
A bigger issue is peer-review. This is another problem that has been around for at least twenty years and is apparently still not resolved. None of my editions, so far, have been peer-reviewed or (to my knowledge) reviewed after publication.
How would you even begin to review a digital edition? There are various guidelines available. For example, Patrick Sahle, with collaborators, published ‘Criteria for Reviewing Scholarly Digital Editions, version 1.1’ (June 2014). The main recommendations are grouped into three sections (2–4).
- ‘2. Subject and content of the edition’ (118 words) contains three sub-headings: Selection; Previous and project’s achievements; Content.
- ‘3. Aims and methods’ (335 words) contains seven sub-headings, including: Documentation; Scholarly objectives; Method; Text criticism, indexing and commentary.
- ‘4. Publication and presentation’ (772 words) contains sixteen sub-headings, including: Technical infrastructure; Interface and usability; Browse; Search; Technical interfaces; Social integration; etc.
While the criteria do indeed include consideration of the quality of the edited text, a much larger part focuses on aspects of the web application.
To my mind, this would be like reviewing a print edition and commenting mostly on the quality of the paper, the generosity of the margins, the readability of the typefaces, and the clarity of the contents page and indexes. Any of these topics might be worth a mention if they seem to be problematic or limit access to the intellectual content in any way, but ordinarily would not feature in a review at all. Indeed, in the case of peer-reviews before publication, reviewers often evaluate awkwardly formatted Word documents before they are typeset and professionally presented.
In the same vein, I would recommend that peer-reviews of digital editions give 90% of their focus to the quality of scholarly contribution. In fact (pushing this further), I think a review might be entirely medium agnostic. In other words, it might be acceptable for a reviewer to review a paper version of an edition and not even see the digital presentation.
Applications and interfaces could be reviewed separately by Digital Humanities specialists, if appropriate—that is, if the interface aimed to be original or ground-breaking in any way. (This would surely be very occasional.)
I am currently working on peer-review guidelines for Armarium and Gloss Corpus.
5.3 Potential for impact
I turn now to the last consideration in relation to scholarly sustainability: the value of digital editions to the broader society that funded them.
In general, research resources have more social value the more widely available they are. Digital resources have the potential to be far more widely accessible than print. They also have the potential to be much less accessible, if they are over-complicated, badly designed and/or buggy.
The FAIR principles (findability, accessibility, interoperability, and re-usability), drafted in 2016, now provide a very well-articulated set of guidelines for ensuring that data is accessible and usable by others. In relation to public value, I will offer thoughts on one aspect of FAIR in particular: re-usability.
When we publish our research in print, we hope that somebody will eventually cite it. In that situation, we expect to receive direct acknowledgement for our work. In an ideal scenario, however, our research might also make some impression on the public consciousness. Perhaps our ideas might be discussed in popular media and eventually become commonplace. The ultimate success of any idea might be if everybody and nobody owns it.
By making our resources available in formats suitable for Linked Data, especially Linked Open Data, our work has the potential to contribute to much bigger collections of cultural data. At present, the Databases of Early Latin Manuscripts (DELM) network is exploring how to link manuscript data in this way.
One such large collection of data is WikiData, a free and open knowledge base, and a source of information for projects including Wikipedia. By transferring data gathered from projects like MIrA into WikiData, we increase the likelihood that important reference works such as Wikipedia will contain high-quality data, gathered and critically evaluated by professional researchers. (And the same data will be available in any language: compare e.g. entries for Sanas Cormaic in WikiData and Italian Wikipedia.)
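Once data is in WikiData, it becomes programmatically discoverable by anyone. For example, this small sketch looks up entities matching ‘Sanas Cormaic’ via WikiData’s documented public search API (the results naturally depend on what has been contributed):

```python
import requests

# Look up entities by label using WikiData's public MediaWiki API.
resp = requests.get("https://www.wikidata.org/w/api.php", params={
    "action": "wbsearchentities",
    "search": "Sanas Cormaic",
    "language": "en",
    "format": "json",
})
for hit in resp.json()["search"]:
    print(hit["id"], "-", hit.get("description", "(no description)"))
```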
By sharing data, we may well choose to sacrifice control as well as acknowledgement. But there is also potential, in the process, to make a greater contribution to public understanding and discourse.
6. Conclusion
I have reflected here on aspects of sustainability for digital editions that have stood out to me, particularly drawing on my experience as researcher and creator of digital editions.
In academia today we often work towards short-term goals. This is conditioned by the research-funding landscape and especially the precarious nature of postdoc research. This environment is not conducive to reflecting on the sustainability of digital editions, including their long-term survival, the value of the scholarship, and the benefits to researchers and to the public.
But the survival of our intellectual work is, in fact, very precarious, and that of the interfaces we build for it is extremely so.
In the development of my own thoughts, I have come to focus on these ideas:
- Plan for the long term.
- Prioritise data.
- Focus on core research goals.
- Keep things simple.
Energy—both human and mechanical, in all its forms—is a very finite resource, and we need to use it well.