“How to…?”: Training resources for (preserving, curating, accessing and analysing) born-digital archives and collections [fully booked]
ABSTRACT. Born-digital archives and collections span multiple, heterogeneous digital formats and types, and pose numerous challenges for their preservation, curation and access. The growing size and variety of born-digital cultural data will only increase the complexity of these tasks, especially for cultural heritage institutions responsible for maintaining our “future past” in the form of born-digital records. In addition, humanities researchers, especially (digital) historians, already use born-digital records as primary sources and will increasingly rely on them in their research.
However, researchers and cultural heritage professionals working with born-digital archives and collections currently lack the modular frameworks, sustainable infrastructures, communities of practice and training resources needed to support their work. This workshop aims to bring together digital preservation specialists, cultural heritage professionals, digital historians, digital humanities practitioners, and computer engineers to discuss the availability of resources and training courses for preserving, curating, accessing, and analysing born-digital archives and collections. It will focus on integrating hands-on training with existing historiographies to strengthen the link between traditional historical methods and work with born-digital sources.
Through its discussions, the workshop seeks to develop a series of modular “Capsule courses” on various aspects of born-digital archives and collections, combining (historiographical/research) context, datasets, hands-on tasks and exercises, for submission to the special issue of the Journal of Digital History (https://journalofdigitalhistory.org/en/cfp/teaching).
Participants in the workshop will be asked to think about three components for a “capsule”: 1) datasets, 2) preliminary questions which could be analysed in a training session, 3) background articles that could help students/trainees link the questions to wider historiography within a given area. Workshop participants will discuss how to best construct these capsules, starting from loose ideas and working towards shared didactic formats.
Curating born-digital heritage in precarious times [fully booked]
ABSTRACT. This workshop takes a critical heritage perspective to understand processes of archivalisation, meaning “the conscious or unconscious choice to consider something worth archiving” (Ketelaar, 1999), and applies this perspective to the challenges of curating born-digital materials in a polycrisis. Violent international conflicts, political polarisation, global pandemics, and climate crises affect archivalisation values in various ways. To disentangle the interdependencies within valuation processes, we expand on heritagisation, meaning the process by which, and the reasons why, historical artefacts become heritage, “filtered through collection, institutionalisation, commodification, and protection” (Thouki, 2022). This allows us to engage productively with questions such as: How do selection criteria come into being, and how do moments of crisis disrupt or stimulate new directions in the appraisal of born-digital materials? In what ways does platformization affect memory, and how should cultural heritage institutions respond to the increasing infrastructural power of Big Tech? Lastly, how do we deal with the tensions between competing values related to born-digital heritage and an ongoing ecological crisis?
To encourage a diversity of voices, we invite participants ranging from professional archivists and technical experts to researchers and policymakers. To enhance interaction during the discussion, we will ask participants to share relevant materials in advance of the conference (case studies, newspaper articles, videos, policy papers, etc.).
The workshop lasts 90 minutes and consists of a brief introduction (15 minutes) and two separate sessions (30 minutes each), followed by a final wrap-up (15 minutes).
During the first session, participants are divided into three small groups connected to the following themes: 1) crisis collecting and canonization, 2) platformization and memory, and 3) born-digital archives and ecological impact. The workshop organizers will moderate each group using (material) prompts to generate active discussions that result in concrete outputs (e.g., decision trees, visualizations, mind maps, collaborative whiteboards).
In the second session, the three groups bring their results into conversation and collectively begin developing a manifesto on what is needed to curate born-digital heritage in precarious times.
Supporting Computational Research on Born-Digital Collections with the Archives Research Compute Hub (ARCH) [fully booked]
ABSTRACT. Target Audience: Professionals in library/archive services who manage born-digital collections, scholars using born-digital collections, and digital library technical staff.
Technical Requirements: ARCH is a cloud-hosted service that can be accessed from any device.
Abstract: Each year, more scholars are conducting research on terabytes and even petabytes of digital library and archive collections using computational methods. Supporting computational use of born-digital archives, however, poses many challenges. Few platforms handle the vagaries of digital archive collections while also providing a high level of automation, a seamless user experience, and support for both technical and non-technical users. In 2020, Internet Archive Research Services and a group of university-based researchers from the Archives Unleashed project collaborated to build an end-to-end platform supporting data mining of born-digital archives. The platform, Archives Research Compute Hub (ARCH), is currently used by dozens of libraries, archives and researchers interested in supporting computational scholarship on born-digital collections. This workshop will offer hands-on training covering the full lifecycle of supporting computational research on born-digital collections, along with an in-depth tutorial on using the ARCH platform and its suite of data analysis, generation, and publishing tools.
Learning Outcomes:
● Participants will receive training on using the ARCH platform to analyze born-digital collections, from the perspective of a collection manager and of a researcher.
● Participants will gain knowledge of how to work with computational researchers, how to make collections available as data, and how to work with technical teams.
● Participants will use ARCH to generate datasets and data visualizations, publish datasets, and conduct analysis with data mining tools.
● Participants will learn about digital methods through exposure to use cases from ARCH users conducting and supporting digital humanities and computational research.
Whose bias? Demystifying human and machine interactions in Machine Learning [fully booked]
ABSTRACT. While machine learning promises powerful capabilities, it also comes with significant challenges, one of the most discussed being the bias embedded in AI systems. But what is bias in this context, and where does it come from? This workshop turns the machine learning process into a physical reality: through interactions between human and machine, and by training your own AI models, you will develop a deeper understanding of, and discuss, the biases at play in machine learning and how they manifest in AI applications. No prior technical knowledge or coding skills are needed to participate in this interactive exhibition; only curiosity is required!
This workshop will run as two 1-hour sessions: the first from 10:30 to 11:30 AM and the second from 11:30 AM to 12:30 PM. You will be asked to select a specific session when you register.
Danish video game history - collection, preservation and access
ABSTRACT. In this workshop you will dive into sixty years of Danish video game history. You will learn how the Royal Danish Library handles games as born-digital material in terms of collection, preservation and access. Using R, you will extract trends and developments in the platforms, publishers and titles of Danish video games. The workshop is for absolute beginners and does not require any previous experience with R. Attendees will need to bring their own laptop.
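The workshop itself works in R; as a rough illustration of the kind of trend extraction it describes, the minimal Python sketch below counts titles per platform by release decade and ranks publishers by number of titles. The catalogue rows and the field layout (title, platform, publisher, year) are hypothetical placeholders, not data or structures from the Royal Danish Library.

```python
from collections import Counter, defaultdict

# Hypothetical catalogue rows: (title, platform, publisher, release year).
# Placeholder values for illustration only, not Royal Danish Library records.
GAMES = [
    ("Game A", "Commodore 64", "Publisher X", 1984),
    ("Game B", "Commodore 64", "Publisher Y", 1986),
    ("Game C", "Amiga", "Publisher X", 1991),
    ("Game D", "PC", "Publisher Z", 1997),
    ("Game E", "PC", "Publisher Z", 2004),
    ("Game F", "Mobile", "Publisher Y", 2012),
]

def titles_per_platform_by_decade(games):
    """Count titles per platform, grouped by release decade (sorted by decade)."""
    by_decade = defaultdict(Counter)
    for _title, platform, _publisher, year in games:
        decade = (year // 10) * 10  # e.g. 1986 -> 1980
        by_decade[decade][platform] += 1
    return dict(sorted(by_decade.items()))

def top_publishers(games, n=3):
    """Return the n publishers with the most titles, as (publisher, count) pairs."""
    return Counter(pub for _t, _p, pub, _y in games).most_common(n)

if __name__ == "__main__":
    for decade, counts in titles_per_platform_by_decade(GAMES).items():
        print(decade, dict(counts))
    print(top_publishers(GAMES))
```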
Transmediating Immersive VR Worlds to Omeka 2D Collections
ABSTRACT. The Decameron Collective emerged from the Covid-19 crisis, when in March 2020 nine Canadian feminist artist/scholars began meeting weekly over Zoom to create and theorize. Inspired by Giovanni Boccaccio’s plague narrative The Decameron (1348-1353), the nine members of the Decameron Collective found solace, care, and community through digital storytelling. Since then, the members of the Decameron Collective have made over 100 born-digital creative works that memorialise pandemic experiences, losses, and new emergent possibilities; these were then collaboratively developed, curated, and installed in two distinct digital collections: Decameron 2.0 (WebGL) and Memory Eternal (Virtual Reality). Grounded in our experience creating, curating, exhibiting, and documenting Decameron 2.0 and Memory Eternal, this roundtable discussion focuses on two main topics:
1. The challenges and opportunities of developing online, born-digital collections and curation: Together, Decameron 2.0 and Memory Eternal form a record of a distinctive cultural-historical moment, a sort of diary in which the entries are poems, videos, images, music, digital objects and more, and in which the events recorded are emotions and stages of grief played out over three years of personal and social change, reflecting creative women’s memory work. We will discuss our creative-research uses of born-digital co-creation, our approaches to collecting our pandemic experiences, and how we curate them in custom digital environments.
2. Documentation and work towards preservation of born-digital artworks: Aware of the inevitable technological obsolescence that will affect access to Decameron 2.0 and Memory Eternal, we have been cataloguing and documenting our works. We will share details of our Dublin Core-based metadata application profile, theoretical issues related to our cataloguing processes, our use of persistent identifiers, and our use of Omeka as a relatively stable way to share documentation of our more ephemeral born-digital works.
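As an illustration of what a Dublin Core-based description might look like, the minimal sketch below builds a simple record with Python's standard xml.etree.ElementTree module, using the Dublin Core elements namespace (http://purl.org/dc/elements/1.1/). The element selection, the wrapping "record" element and all values, including the identifier URL, are hypothetical; the Collective's actual application profile is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set 1.1 namespace, serialized with the "dc" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a simple XML record from (element, value) pairs.

    A list (not a dict) is used so that repeatable elements, such as several
    dc:creator entries, are preserved in order.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root

# Hypothetical description of one born-digital work; every value is a placeholder.
example = dublin_core_record([
    ("title", "Untitled pandemic video diary"),
    ("creator", "Collective member A"),
    ("date", "2021"),
    ("type", "MovingImage"),  # term from the DCMI Type Vocabulary
    ("format", "video/mp4"),
    ("identifier", "https://example.org/id/placeholder-001"),
    ("relation", "Memory Eternal (Virtual Reality)"),
])

print(ET.tostring(example, encoding="unicode"))
```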
Latin American Feminist Organizations in the Archived Web
ABSTRACT. Feminist movements in Latin America are diverse, with different initiatives to address issues such as feminicide, sexual violence, and reproductive justice. However, the archived web does not reflect this diversity because the websites of NGOs and government agencies for women are more frequently preserved on curated web archives, while grassroots organizations, when they have a website, are less likely to be archived. This gap is important because grassroots organizations, posting mostly in Spanish about local concerns, have different priorities than large NGOs and government agencies, which have been criticized for their tacit support of neoliberalism and austerity policies.
In this presentation, we will share the results of a computational analysis of the archived PDFs in Spanish found in two collections on Archive-It: the Human Rights collection by Columbia University Libraries and the Feminist Activisms in Latin America collection by Huellas Incómodas. The former has archived more websites from institutional feminist organizations, while the latter has archived more grassroots organizations. The Human Rights collection was filtered so that only PDF files about Latin American feminism were analyzed.
The results show that, while there are some similarities in the topics of the two collections, grassroots organizations are more likely to address sexual education and political rights for women, while NGOs and government agencies are more likely to discuss migration, girlhood, and housing. Abortion was a recurring topic, but the Human Rights collection discusses it as a right, while grassroots organizations also discuss it as a medical procedure and a public health issue.
These differences not only underscore the varying concerns and needs of grassroots and large feminist organizations but also highlight the need for more inclusive digital preservation strategies to ensure that the full spectrum of feminist voices and movements in Latin America is accurately and equitably represented in the digital cultural record.
Modelling archived web data-objects as Semantic entities to support sustainable practice, effective versioning, and contextualisation: a conceptual framework
ABSTRACT. Standard digital preservation reference models such as OAIS and LOCKSS conceive preservation of born-digital artefacts either as the creation of substitute representations of original 'Information Objects' which have undergone a series of transformations (OAIS), or as the provision of multiple, identical copies of an original bitstream (i.e., 'Data Replicas' in LOCKSS). The two standard models share a realist ontological commitment towards original data-entities, the existence of which must be integrally conserved (LOCKSS) and the access to which must be preserved (OAIS). Their underpinning theoretical framework conceives data objects as ‘self-contained’ and (insofar as they are endowed with contextual information) ‘self-describing’ Information Objects.
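To make the LOCKSS idea of 'Data Replicas' concrete, the minimal sketch below compares SHA-256 digests of several supposedly identical copies and flags the copies that disagree with the majority. This is a simplification under stated assumptions: LOCKSS itself uses a peer-to-peer polling and repair protocol, and the node names and byte strings here are hypothetical.

```python
import hashlib
from collections import Counter

def sha256_hex(data: bytes) -> str:
    """Fixity value for one copy of the bitstream."""
    return hashlib.sha256(data).hexdigest()

def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the IDs of replicas that disagree with it.

    Simplified majority comparison only; not the actual LOCKSS polling protocol.
    """
    digests = {rid: sha256_hex(blob) for rid, blob in replicas.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(rid for rid, d in digests.items() if d != majority_digest)
    return majority_digest, damaged

# Hypothetical replicas held on three nodes; node-3 simulates bit rot.
replicas = {
    "node-1": b"original bitstream",
    "node-2": b"original bitstream",
    "node-3": b"original bitstreaM",
}
digest, damaged = audit_replicas(replicas)
print(damaged)  # ['node-3']
```

A majority vote cannot resolve a tie (for example, two copies that differ from each other), so this kind of comparison needs more than two replicas to identify the damaged copy.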
Hypertext information objects transferred over HTTP, such as websites, show, however, that ‘self-containment’ and ‘self-description’ are not inherently complementary qualities of web-based data objects. Best represented as intrinsically dynamic networks that change over time, websites in fact undermine these two key theoretical tenets and, in a complementary fashion, challenge the practices of web archive creation and preservation that are built on the standard digital preservation reference models.
This paper analyses instances of born-digital content archived in the UK Government Web Archive (UKGWA) and investigates the intrinsically dynamic and diachronic structure of web Information Objects in order to identify and assess these challenges. This analysis points to the need for a novel approach to access provision and metadata description in web archives based on network graph modelling. By defining and representing the position of a resource on the web, such an approach aims to convey the idiosyncratic properties of web data objects, and thus to address the issues of identity and decontextualisation that become manifest in web archives’ versioning and timelining practices.
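One way to picture the versioning problem is to treat each capture as a node identified by (URL, capture time) and to resolve each hyperlink to the capture of its target that is nearest in time, similar to the nearest-capture heuristic common in Wayback-style replay. The minimal Python sketch below does this and reports the time gap on each edge; the URLs, timestamps and link data are hypothetical, and this is not the network graph model proposed in the paper.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical capture index: URL -> sorted list of capture timestamps.
CAPTURES = {
    "https://example.org/": [datetime(2015, 1, 10), datetime(2018, 6, 1)],
    "https://example.org/policy": [datetime(2014, 3, 2), datetime(2019, 9, 9)],
}

# Hypothetical outlinks recorded in each capture: (URL, timestamp) -> linked URLs.
LINKS = {
    ("https://example.org/", datetime(2015, 1, 10)): ["https://example.org/policy"],
    ("https://example.org/", datetime(2018, 6, 1)): ["https://example.org/policy"],
}

def nearest_capture(url, when):
    """Return the capture time of `url` closest to `when`, or None if never captured."""
    stamps = CAPTURES.get(url, [])
    if not stamps:
        return None
    i = bisect_left(stamps, when)
    # Only the captures immediately before and after `when` can be nearest.
    candidates = stamps[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda ts: abs(ts - when))

def version_edges(url, when):
    """Edges from one capture node to the nearest capture of each linked URL,
    with the gap in days between the two captures."""
    edges = []
    for target in LINKS.get((url, when), []):
        ts = nearest_capture(target, when)
        if ts is not None:
            gap_days = abs((ts - when).days)
            edges.append(((url, when), (target, ts), gap_days))
    return edges

for when in CAPTURES["https://example.org/"]:
    for source, target, gap in version_edges("https://example.org/", when):
        print(source[1].date(), "->", target[0], target[1].date(), f"({gap} days apart)")
```

In this toy data, the 2015 version of the home page resolves its link to a policy page captured 314 days earlier, while the 2018 version resolves the same link to a capture made more than a year later, so the "same" link points to different content depending on the version being viewed.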
Empowering Scholars to Study Web Archives: Search & Discovery in SolrWayback
ABSTRACT. Web archives are considered inaccessible for most digital humanities researchers partly due to a lack of easy-to-use tools and the limitations of URL-centric search engines. Currently, this is a significant obstacle for scholars wanting to research the archived web and gain insight from data at scale. This is where SolrWayback becomes relevant.
In this presentation we will showcase the strengths of SolrWayback, a powerful web application for searching and exploring data in web archives (ARC/WARC files). It has an advanced search syntax that provides full-text and metadata search, in addition to a replay service similar to the Wayback Machine. SolrWayback offers network analysis of domains, visualization tools such as N-gram visualization, map-based search by location using EXIF metadata in images, and, not least, the ability to export data and derivatives from search results. In short, SolrWayback is a tool that unlocks the potential of web archive collections.
SolrWayback is open source and gaining users, but active engagement with the researcher community is essential for raising awareness of the tool’s existence and potential, and for gaining insight into researchers’ expectations. This showcase builds on SolrWayback workshops and presentations at conferences such as the IIPC Web Archiving Conference and, most recently, the DHNB conference, and will include lessons learned from those events. It will include examples ranging from work with small, private collections to large-scale efforts such as the IIPC collaborative collections. One of the goals of this showcase is to build a thriving community around SolrWayback that involves users in further enhancing a tool that makes it easy to engage with these complex but fascinating collections.
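SolrWayback computes its N-gram visualization from its own Solr index; as a rough, hypothetical analogue of the idea, the Python sketch below calculates, for each year, the share of documents whose text contains a given word. The documents are made-up placeholders, and normalising by the number of documents per year is an assumption chosen so that years with more captures do not dominate.

```python
import re
from collections import Counter

def term_share_by_year(docs, term):
    """Fraction of documents per year whose text contains `term` as a whole word.

    `docs` is an iterable of (year, text) pairs; matching is case-insensitive.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    totals, hits = Counter(), Counter()
    for year, text in docs:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    # Normalise by documents per year so that crawl volume does not dominate.
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Hypothetical archived page texts with their capture years.
docs = [
    (2005, "Welcome to our homepage"),
    (2005, "Read our blog about web archiving"),
    (2010, "Follow us on social media"),
    (2010, "Our blog has moved"),
    (2010, "Latest blog post"),
]
print(term_share_by_year(docs, "blog"))
```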
The Millennial Archive: Born Digital, Made Digital, and Digital by Necessity
ABSTRACT. This paper uses auto-ethnography and multimodal discourse analysis as part of a series of phases for understanding the “born-digital” collections of Millennial lifecycle documentation and experience. The researcher (born 1985) excavates their own record of digital artifacts, from their childhood to their career as an academic studying digital communities and online boundary maintenance. Born-digital files from as early as 1992 (some retained as paper printouts, some with the original file still digitally accessible) are explicated as a life lived simultaneously online and off, but always interconnected with the digital (a film photo of the author as an infant on their mother’s lap as she works on her doctoral dissertation in front of a computer, circa 1987; a scanned copy of that same photo created in the 1990s by the author’s father; a smartphone photograph of the printout of that scan, sent in the 2010s on the day of the author’s own dissertation defense at the same university as their mother, 30 years later). Materials under discussion include middle and high school class assignments, personal journaling, online public journaling, social media presences, early digital photography, smartphone photography, chat log excerpts, undergraduate papers, a dissertation, an open access university press book, and job talk presentation slides. Concepts of repair (Jackson, 2014) and restorative and reflective nostalgia (Boym, 2001) serve as key frameworks for understanding the relationships, potentials, and challenges between the researcher/subject and the collection. This project is one of the first phases of a more broadly conceived research agenda interviewing and working with Millennials who have some iteration of their digital archives at hand.
A bottom-up inquiry into the (in)vulnerabilities of personal digital heritage
ABSTRACT. Google Photos promises “A safe home for life’s memories”. Its corporate discourse taps into the fear of forgetting and the promise of control over our personal digital past. This attitude is also reflected in the policies of cultural heritage institutions that strive to preserve personal digital heritage before it becomes obsolete. To uncover the assumptions behind corporate and institutional preservation infrastructures, we conducted an innovative bottom-up project on personal digital heritage that combines long-term ethnographic research with participatory action research involving individuals, cultural heritage institutions and corporate organizations. We define personal digital heritage as personal, born-digital or digitized material that is either consciously preserved, because we ascribe emotional, monetary or other kinds of value to it, or unconsciously preserved through the save-by-default logics of various personal technologies. This paper presents first results that show the importance of questioning whose personal digital heritage is preserved. We conducted over 150 participant observations and 20 interviews at a local community centre in a markedly poor Dutch neighbourhood that welcomes people in precarious situations. We show that creating, storing and sharing personal digital heritage is not straightforward: What if there is no one to leave your digital legacy to? Or if you lack the financial or technical means to ensure safe (cloud) storage? We then compared these personal stories with data from over 20 interviews with professionals in archival and cultural institutions and corporate organizations, such as digital estate planners. Bringing together individuals’ perceptions of digital legacy with institutionalised conceptions of personal heritage allows us to critically reflect on common conceptions of ‘the archive’ (Carbajal & Caswell, 2021).
Furthermore, we hope to advise heritage institutions on more inclusive policies to safeguard the future of our digital past (Ketelaar, 2002) in times of Big Tech hegemony (Nesmith, 2023).
What do we mean when we talk about access? Born digital collections and access for disabled researchers
ABSTRACT. This paper sketches some early findings from an accessibility project based at a UK national museum. The project asks what access to digital collections (including the born digital) looks like when working with disabled, chronically ill, and neurodivergent researchers.
Access to the born digital is tricky. Research rooms often present physical barriers, such as having to use a networked laptop to access born-digital collections. Digital barriers to access are also rife. Born-digital images, sound recordings, and PDFs at our museum, and elsewhere, are hostile to people who use assistive technology, especially when they must be accessed through an internal system that does not have the specific software installed.
Born-digital items hosted on web servers are not much more accessible. Since 2018, publicly funded organisations, or those with an influx of public funding, have had to meet the Web Content Accessibility Guidelines (currently WCAG 2.1 level AA). However, heritage collections are explicitly exempt from these requirements. In practice, this means that while a publicly funded archive must have an accessible website, the born-digital heritage content it hosts is exempt.
Recognising these issues, we started a project to tackle them in early 2024. While some of the barriers to access it has identified are specific to our institution, many will be of interest to the broader sector. This paper a) outlines the barriers to access the project has found, both in-person and online, and b) begins to sketch some solutions and considerations for making born-digital heritage more accessible. It is designed to spark conversation about what access looks like and how we provide it.
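As a small example of the kind of automated check that can reveal web barriers, the sketch below uses Python's standard html.parser to list img elements that have no alt attribute, which relates to WCAG 2.1 success criterion 1.1.1 (Non-text Content). It is a hypothetical illustration, not the project's method: an empty alt="" is accepted because it marks decorative images, and automated checks of this kind cover only a small part of WCAG conformance, not the PDFs, sound recordings or assistive-technology issues described above.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collect the src of <img> elements that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... /> via handle_startendtag.
        if tag == "img":
            attributes = dict(attrs)
            # alt="" is allowed: it signals a decorative image to assistive technology.
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))

def images_missing_alt(html: str) -> list[str]:
    """Return the src values of images lacking an alt attribute, in document order."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing

# Hypothetical page fragment from a collection website.
page = """
<p>Catalogue entry</p>
<img src="poster.jpg" alt="Exhibition poster, 2019">
<img src="scan-001.png">
<img src="divider.gif" alt="">
"""
print(images_missing_alt(page))  # ['scan-001.png']
```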
Rethinking Ethical Approaches to Acquisitions and Weeding for Born-Digital Archives
ABSTRACT. Donors of physical collections are increasingly likely to include physical media storage or digital files with their donations, requiring libraries to grapple with their ability to properly curate the materials. This paper explores the ethical challenges faced in the acquisition of born-digital archives in order to develop a preliminary framework that would provide archivists and librarians with a toolkit for working with donors on their born-digital collections. Transferring files typically requires some technological knowledge or involves costs that may deter donations. To ensure a more efficient and comprehensive process, libraries should develop and fund dedicated born-digital acquisitions practices that are transparent with donors about how their materials may be used. Topics examined in the paper will include working with community patrons, copyright, managing heightened anxieties over sharing digital files, the environmental and cost impacts of digital archives, the impact of AI-driven metadata generation, and the lack of clarity among academic library foundation and property offices on how to value the donations.
ABSTRACT. In this talk, we will introduce some of the challenges faced at The National Archives when working with data at scale and how AI can help with practical challenges such as supporting OCR of documents and document selection. We will provide some practical examples and demonstrate the benefits of using an LLM to support searching our Web Archive. We will highlight key challenges of working with the metadata required to support provenance and explainability. Finally, we will discuss some of the environmental and engineering challenges we are likely to face in conducting this work.
Preserving Digital Humanities Prototype Projects Using the Knowledge Commons
ABSTRACT. Digital humanities (DH) projects have long suffered from limited funding, which often leaves prototype applications orphaned. This frequently happens when teams are redirected from unfunded projects to funded ones or to new project proposals. Even funded projects face challenges, as existing funding models primarily support the early stages of development (planning, designing, and launching), leaving little for maintenance.
Among the core problems that arise from this are the disappearance of valuable work and ideas, such as codebases, user interfaces, or interaction design; a limited ability for scholars to iterate on prior knowledge; and issues of discoverability and citation, because little documentation tends to be produced as project teams are redirected.
This talk will introduce an emerging workflow for archiving born-digital web applications using Knowledge Commons (KC), a scholarly communications platform with a global audience of nearly 50,000 users. KC is academy-owned and operated, free to use, and provides robust tools for open scholarly collaboration, communication, and archiving. The workflow offers a lightweight process for archiving prototype applications in an open, socially connected environment. This approach ensures that previously hidden work becomes discoverable, citable, and visible long-term, thereby preserving the contributions of the projects and their creators.
The Timely Archiving of Translation Technologies: Transformation, Challenges and Trends
ABSTRACT. This talk presents the launch of a project to archive a profession-specific technology, namely translation tools. In addition to the timely preservation of an accessible yet slowly disappearing generation of both obsolete machines and early versions of more recent software, the project aims to show that translation, a profession often seen as antagonistic towards new tools, has always been tech-savvy, and to highlight transversal trends in the evolution of these tools across decades, types of software, and platforms. This endeavour involves multiple challenges, starting with the combination of non-digital, digitized and born-digital tools. While born-digital tools are by far the most common today, they add a further layer of complexity tied to the platformization of the profession and its practices, the growing proportion of subscription-based services, the long-established presence of AI and automation in the field, the arrival of multimodal software, and more.
Facilitating Access to Social Media Data in the Humanities: SOMAR’s Solutions and Collaborative Partnerships
ABSTRACT. Social media play a pivotal role in our interconnected society. Analyzing social media data provides key insights into human behavior, social dynamics, and political opinions. However, growing privacy concerns and costs have made accessing this important cultural data difficult in recent years. The Social Media Archive (SOMAR) tackles these challenges head-on. SOMAR partners with social media platforms to develop software, improve researcher access to data, and gather community feedback.
This talk will highlight SOMAR's collaborations to create innovative software for archiving, exploring, and analyzing various social media data. It will also note SOMAR's efforts to support research, including in the humanities, by facilitating the analysis of qualitative and quantitative data. SOMAR's recent accomplishments in implementing API proxies in a secure environment to provide free access to restricted data and ensure user privacy will also be mentioned. Additionally, the talk will cover ongoing improvements in application and data use experiences based on user and community feedback. SOMAR hopes to engage with organizations and researchers at the conference to discuss available resources for social media data research and explore potential collaborations or new data sources for researchers.
Creative Approaches to Publishing Born-Digital Photographs in JSTOR
ABSTRACT. This illustrated paper presentation introduces a new digital image collection in JSTOR entitled “The Streets are Talking: Public Forms of Creative Expression from Around the World,” which features documentary photographs by St. Lawrence University (Canton, NY, USA) faculty, staff, students, and alumni who are working and studying off campus in conjunction with a citizen journalism project called “Weaving the Streets” (https://www.weavenews.org/series/weaving-the-streets) and a sister project called “People's History Archive” (http://peopleshistoryarchive.org/). Recently curated groupings of 8-12 images in the JSTOR collection have focused on street art in Germany, Jordan, Lebanon, Mexico, Spain, and the United States, though projects now underway investigate everything from public gardens to political protests.
Organizers continue to explore the digital image collection’s potential for helping participants make connections among social issues, their lived experiences, and movements across geographic and cultural boundaries. They are also exploring the use of generative AI to catalogue images, a process that can yield contextual information that non-natives in new locales may initially find difficult to see or understand. AI can offer fairly sophisticated and nuanced metadata and “authority lists,” such as U.S. Library of Congress subject headings, as a starting point for further analysis and interpretation of born-digital images, though organizers recognize the fraught history of such lists.
In current efforts to highlight digital image collections in JSTOR, its parent organization, ITHAKA, has stated that it wants JSTOR to be the academic “go-to” version of Google Images, in which researchers and others can access millions of high-quality primary research materials through institutional and community-based resources. Publishing in JSTOR gives participants the opportunity to share their images, analyses, and interpretations from the ground up on an important international scholarly platform, as the overall collection builds brick by brick.
To view the collection in JSTOR, visit https://www.jstor.org/site/stlawu/streetart/.
Attempts for an Autonomous Archive of Instant Messaging: Popular Cultural Heritage Beyond Platformisation
ABSTRACT. As The Institute for Technology in the Public Interest has pointed out, the new dependencies created by adopting digital tools for life management risk undermining the ability of individuals and organizations to take responsibility, on their own terms, for their interactions, tasks, missions, or numbers. This is of direct concern to the Humanities, which since the beginning of the century have unwillingly delegated the management of their popular memory to Silicon Valley, with disastrous results for the study of cultural history, including the archival loss of the spaces of creation and popular interaction that were MySpace, GeoCities or Flickr.
With these scenarios in mind, since the beginning of 2024 the project “Ex-User Experience” has been developed at the artistic research space Hangar (Barcelona), in collaboration with initiatives such as “After Memory” (Karlsruhe University of Arts and Design). Within the framework of research on digital infrastructures and sensibilities, “Ex-User Experience” takes a historical approach to analysing the polarized and counter-communitarian interactive patterns of social networks like Instagram and messaging platforms like WhatsApp, in conjunction with their different business decisions. With a critical perspective towards the digital accumulation of lifelogging and techno-utopian archivism, “Ex-User Experience” has carried out a range of tasks that test the conversion of the notion of “data” into “work” in order to develop an autonomous cultural heritage of the current century, while laying the groundwork for an archive of instant messaging. Negotiating the hybrid condition of digital writings, this talk seeks to offer a theoretical-practical contribution to thinking on born-digital collections, focusing particularly on the materials that have come to replace the epistolary condition of the 19th and 20th centuries.
Web archiving after platformisation: reading archived social media along the grain
ABSTRACT. This paper draws on ethnographic and historical research at two libraries, the National Library of Australia (NLA) and the State Library of New South Wales (SLNSW), to show how the platformisation of the web has altered the content, character, and potential future utility of web archives. I argue that, in attempting to collect social media, web archiving institutions like the NLA and the SLNSW face a double bind: web crawling, as undertaken at the NLA, allows immediate but often incomplete or inconsistent access, while the API-based approach taken by the SLNSW enables masses of structured data to be collected but constrains access because of shifting licence agreements that are established and enforced by platforms. By examining the constraints of current strategies to collect, preserve, and make available social media content, I illustrate how changes to platform design and policies significantly influence what is included in web archives and how they are made available. Because the ruptures and inconsistencies of social media collections in web archives are often opaque both to the creators of web archives and to those using them, I argue that web archives can be read “along the archival grain” for evidence of the platformisation of the web. This approach, which draws on anthropologist Ann Stoler’s critical readings of the form and placement of colonial archives (rather than just their contents), allows an assessment of how the gaps, silences, densities, and distributions of web archives are shaped by the shifting power dynamics between the different actors involved in the production, circulation, distribution, and use of information on the web.
ABSTRACT. The twin scholarly booms of the 2010s, digital humanities and public humanities, introduced a relatively accessible suite of platforms for managing digital and digitized content. Sudden interest in developing and implementing digital collections was paired with the popularity of public-facing (if not purporting to be public-serving) community projects, which were often framed as educational endeavours as they gained momentum. Since the Internet also turns over (at minimum) every 5-7 years, huge swaths of internet (and, by extension, digital project) history are gone. For example, Quinn Dombrowski has catalogued a number of digital projects that have effectively gone extinct, and more will follow as time goes on. Naturally, creators and curators want to encourage the use of their collections. The Santa Barbara Statement and the many use-case scenarios that Collections as Data has produced offer potential routes for users, but without adequate discovery layers, we are at a loss if all these collections are created only never to be used again.
In this presentation, I will argue that investment in longer-term digital preservation becomes a question of vulnerability, using three examples of platforms that are well loved by communities of users: Omeka (for digital collections), GitHub (for code and software, often used in the analysis of born-digital content), and digital storytelling platforms (such as Scalar and ArcGIS StoryMaps) for multimodal dissemination. Even with the best of intentions, web archiving techniques can rarely capture these kinds of digital projects adequately. I will show how these sites contribute to the danger of creating what I call “cultural heritage wastelands”: islands of lovingly created content that have been abandoned, with no champions left to sustain them.
ABSTRACT. Over 99.5% of museums in the UK have a dedicated web page. Whether it is a site of their own or a listing by the local government authority, it is the most prevalent digital tool in a museum’s arsenal. Yet, while innovative sites and collections platforms by the likes of the British Museum have become widely discussed case studies (Ross and Terras 2011), we know remarkably little about how webpages and collections software are used across the wider sector.
Combining computer vision applied to screenshots of digital collections with an analysis of the HTML code, we have investigated the hosting solutions, languages, collections information, and most prominent visual attributes of over 3,400 UK museums (Schmidt et al. 2020). By creating a comprehensive dataset, we have been able to explore which museums are absent from the online cultural landscape entirely, and which collections have been rendered digitally inaccessible. Aiming to provoke methodological reflection, we highlight the potential of digital infrastructure to shape our perception of the resources available to researchers and of the surviving historic record.
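The HTML side of this kind of survey can be illustrated with a minimal sketch. It assumes, for illustration only, that the `lang` attribute of the `<html>` tag and a `<meta name="generator">` tag serve as rough proxies for page language and content-management software, and that script sources hint at third-party hosting; the authors' actual pipeline is not described in the abstract, and the names `SiteSignals`, `extract_signals` and `tally_generators` are hypothetical. Only the Python standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser


class SiteSignals(HTMLParser):
    """Collect a few simple signals from one page's HTML (hypothetical feature set)."""

    def __init__(self):
        super().__init__()
        self.lang = None        # value of <html lang="...">, if declared
        self.generator = None   # value of <meta name="generator" content="...">
        self.scripts = []       # src attributes of <script> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased; values may be None
        if tag == "html" and self.lang is None:
            self.lang = attrs.get("lang")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            self.generator = attrs.get("content")
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])


def extract_signals(html: str) -> dict:
    """Parse one HTML document and return the collected signals."""
    parser = SiteSignals()
    parser.feed(html)
    parser.close()
    return {"lang": parser.lang, "generator": parser.generator, "scripts": parser.scripts}


def tally_generators(pages) -> Counter:
    """Count declared generators across many pages; pages without one count as 'unknown'."""
    return Counter(extract_signals(p)["generator"] or "unknown" for p in pages)
```

For example, `extract_signals('<html lang="en"><head><meta name="generator" content="WordPress 6.4"></head></html>')` reports `"en"` as the language and `"WordPress 6.4"` as the generator. Many sites omit or strip the generator tag, so in a sketch like this, "unknown" marks the limits of self-declared metadata rather than the absence of a platform.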
Addressing the Born-Digital Processing Backlog with Generative AI Solutions
ABSTRACT. Starting in mid-2023, ITHAKA began investing in and engaging directly with generative artificial intelligence (AI) in three broad areas: a collaborative research project led by Ithaka S+R; a generative AI research tool on the JSTOR platform; and a proof-of-concept for GenAI-enabled processing of library special collections as part of ITHAKA’s nascent infrastructure services. These technologies are so consequential for our futures that working with them directly, to learn about their impact both positive and negative, is essential.
This presentation will share what we’ve learned from the proof-of-concept (PoC) work we have been doing over the past ten months to address the “backlog” problem that most academic libraries face in triaging and processing special, distinctive, and archival collections, both physical and born-digital. The findings will be contextualized with the cross-institutional learning and landscape-level research conducted by JSTOR and Ithaka S+R. By pairing data on the insights and expectations of library leaders with feedback from special collections and archives professionals using the PoC, the session will share early signals of how this technology-enabled evolution is taking shape.
Addressing the Challenge of Indexing Vast Digital Multimedia Archives: The Role of AI
ABSTRACT. In an era when the term 'artificial intelligence' (AI) is frequently used as a buzzword, our conference talk aims to provide a tangible foundation for discussing AI, particularly in relation to the audio-visual collections that increasingly populate archives. Forward-thinking archivists are turning to innovative methods to use and access these collections effectively. The talk will provide concrete examples of AI deployment in federal, local, and corporate archives, illustrating not only the theoretical foundation but also the practical benefits of AI in enhancing user experiences.
This presentation will examine the core functionalities of current AI tools and demonstrate their significant impact on accessibility and user engagement in archives. A live demonstration will show how AI-powered search can change the way users interact with born-digital archive content, enabling intuitive navigation and responsive search without months of laborious manual indexing. We will also acknowledge that AI is not a cure-all, discussing its current limitations and offering an outlook on anticipated developments.
Key Learning Outcomes:
● An understanding of the practical impact of AI in transforming digital archives into dynamic, user-centric platforms.
● Insights into the real-world applications of AI, evidenced by improved operational efficiency and enriched visitor engagement.
● A recognition of the current limitations and future potential of these technologies.
This presentation goes beyond abstract discussions of AI, focusing instead on its practical benefits and the lessons learned from its implementation. By sharing our experiences, we aim to give archive professionals the inspiration and knowledge to use AI to revolutionise their collections and enhance the archival experience. Join us to explore how AI can be more than a topic of discussion: it can be a catalyst for transformative change in the field of digital audio-visual archives.
Keynote Lecture by the Digital Curator for the Smithsonian National Museum of African American History and Culture
ABSTRACT. The American idiom “hindsight is 20/20” refers to the clarity with which we see pitfalls when reflecting on the past. That idea can be extended to the scholarly understanding of the theoretical archive as having holes, gaps, and silences. Certain documentation that was actively or passively excluded from collecting now seems obviously important. At the point of creation and preservation, however, materials are appraised and disposed of according to the beliefs of the moment.
While hindsight may be “20/20,” to quote Robert Darnton, “What was proverbial wisdom for our ancestors is completely opaque to us.” Darnton was referring to the everyday shared understandings of 18th-century France, but in the ever-changing digital archive our glasses become opaque over shorter and shorter time-frames. This talk seeks to take a step back from the immediacy of born-digital archives and to examine them within the larger context of how and why we have created archives for centuries.
Bio: Dorothy Berry is an archivist and writer. She is currently the Digital Curator of the National Museum of African American History and Culture in Washington, D.C., where her portfolio includes all of the museum’s digital interpretation and digital scholarship strategy. Her forthcoming book The House Archives Built and Other Thoughts on Black Archival Possibilities will be the first release from We Here Press, expected in autumn 2025.