RESAW 2025: THE DATAFIED WEB
PROGRAM FOR THURSDAY, JUNE 5TH

10:45-11:00 Coffee Break
11:00-12:30 Session 5A: Platforms
11:00
Metrics on the Inside: How Platform Employees Understand Platform Health

ABSTRACT. Where have employees at social media platforms looked to get a ‘status update’ about the health of the platforms they work for? This paper draws on interviews conducted with 53 former employees of platforms that have shuttered over time, including GeoCities, Friendster, MySpace, and Vine, to understand the ways that employees used analytics, broadly construed, to make sense of the success and decline of a platform. Originating in a broader project about platform closure, this presentation adds to RESAW 2025’s focus on histories of the Datafied Web by describing the ways that platform employees used both traditional notions of quantitative analytics to understand their platform’s success and decline, as well as less-discussed, qualitative markers of health. This paper thus complicates an understanding of analytics as purely quantitative, instead showing the ways that employees and the organizations they worked for integrated information from quantitative analytics programs with sometimes surprising metrics like press coverage, whether they needed additional computing infrastructure, visits from public figures to organization offices, and public reception of company merchandise.

From GeoCities to Vine, the most tangible evidence of decline for these organizations was through quantitative user metrics. Digital metrics (alternatively, digital analytics) included information like page views, amount of viewing time, number of comments, posts, or likes, and paths through numerous pages, measurements describing user behavior as users interact with a site (Tandoc, 2014). These metrics were in turn used to locate value and attach meaning to user behaviors on a site (Beer, 2017). The importance of digital metrics in media production cultures has been discussed most in the context of online journalism, wherein figures around audience engagement have been shown to shape the editorial process at numerous stages (Beer, 2017; Christin, 2020).

Research on metrics and social media platforms shows how analytics displayed to users–like the number of likes on Instagram or the number of retweets on Twitter–act as markers of social distinction (Paßmann & Schubert, 2020), while it is also known that user metrics influence the design of algorithms on platforms, which in turn shape user experience on those sites (Couldry & Powell, 2014). Despite these studies, there is limited empirical evidence showing how social media platform employees have made sense of user metrics within organizations. Interviews with platform employees show the significance of digital metrics for internal comprehension of a platform’s value and its competitiveness with like entities, especially when this value appears to be shifting from success to decline.

Yet an understanding of how employees digested this information into conclusions about a platform’s overall health, this paper argues, must also consider the qualitative and affective information that platform employees were interpreting. Primary themes across the interviews included the importance of: (1) media and press coverage, especially in magazines and newspapers with high cultural capital; (2) technical capacity, especially the counterintuitive notion that if the site kept crashing it was because it was growing at an unprecedented pace; and (3) public reception of company merchandise, for instance, how strangers would respond to an employee wearing a company-branded t-shirt. By surfacing the varied means through which employees and organizations understand organizational health through both quantitative and qualitative analytics, a more complete story of platforms, and their place in web history, can be told.

11:20
The platformization of the follower factory: para-platforms, automation, and labor in the market for social media engagements

ABSTRACT. This paper examines the evolution of an illicit, sprawling, yet obfuscated global market for artificial social media engagements, which inflates follower counts and engagement metrics on social media profiles and posts. The organization of this market has previously been characterized using industrial metaphors such as 'click farms,' 'follower factories,' and 'digital sweatshops' primarily based in the Global South. These descriptions emphasize exploitative and often informal labor conditions. Using a mixed-methods approach that integrates ethnography with digital methods, this research delineates the platformization of the follower factory, focusing on the shift towards automation rather than manual interaction, which has facilitated the rapid expansion of this multi-sided market. As resellers have scaled up, they have come to require a more complex labor organization involving marketing, customer service, and administrative work, which has shaped cottage industries across regions such as Indonesia, India, and Nigeria.

A key element of this research is its use of Internet history methods, including historical reverse IP lookups and the Internet Archive’s Wayback Machine, to trace the evolution of this market over time. By employing historical reverse IP lookups, we were able to map the growth of panel websites (websites used for reselling engagement services) and their global distribution, highlighting how the market has expanded since 2016. This process revealed the central role of platform providers such as Perfect Panel, which offers pre-built platforms for reselling social media engagements. This technical infrastructure has allowed even users with limited technical knowledge to set up and scale engagement reselling businesses, contributing to the market's rapid proliferation.

Moreover, the Wayback Machine allowed us to track the evolution of engagement services offered by platforms such as Just Another Panel (JAP) over time. By capturing historical snapshots of the services offered on JAP, we observed the diversification of engagement types and their associated pricing, revealing both the volatility of the market and the persistence of certain services. Through this historical lens, we explore how the para-platform ecosystem has developed alongside and remains reliant on corporate social media platforms.
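
For readers who want to reproduce this kind of longitudinal tracing, the sketch below shows one possible way to list archived snapshots of a panel website through the Internet Archive's public CDX API. It is an illustration only, not the authors' actual pipeline; the target domain, date range, and monthly sampling are assumptions made for the example.

import requests

# Illustrative sketch (not the authors' pipeline): list archived snapshots of a
# panel website via the Internet Archive's public CDX API, so that service
# listings and prices can be compared across years.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
params = {
    "url": "justanotherpanel.com",   # example target; any panel site could be used
    "from": "2016",                  # the market expansion discussed above dates from 2016
    "to": "2024",
    "output": "json",
    "fl": "timestamp,original,statuscode",
    "filter": "statuscode:200",      # keep only successful captures
    "collapse": "timestamp:6",       # at most one capture per month (YYYYMM prefix)
}

rows = requests.get(CDX_ENDPOINT, params=params, timeout=60).json()
for timestamp, original, status in rows[1:]:     # the first row is the field header
    # Each capture can be replayed at /web/<timestamp>/<original URL>.
    print(f"https://web.archive.org/web/{timestamp}/{original}")

Snapshots listed this way can then be parsed for the service menus and prices whose diversification and volatility are described above.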

At the core of this market lies what we term a para-platform ecosystem, which, while operating outside the immediate control of corporate social media platforms, still depends on their infrastructure for the delivery of engagement services. This ecosystem exists in a conflictual and asymmetrical yet productive relationship with social media platforms. On one hand, it disrupts the platform’s organization of users and their activities; on the other, it generates activity metrics that align with platforms’ economic models by boosting user engagement.

By examining platformization and platform ecosystems ‘from below,’ this paper challenges dominant platform theory, which typically focuses on corporate platforms. It argues that the para-platform ecosystem complicates conventional narratives by demonstrating how platforms are not only centers of economic power and governance but also spaces where informal and illicit economies thrive. By exposing this illicit backend of the datafied web, the study provides critical insights into the hidden infrastructures and practices of online engagement markets at the intersections between formal platform economies and their shadow counterparts.

11:40
Super-App Histories: Tracing Alipay, Meituan, and WeChat through App Repositories

ABSTRACT. This paper offers a historical exploration of the phenomenon of ‘super-appification’ in China, focused on a comparative analysis of Alipay, Meituan, and WeChat. While discourses around ‘super-apps’ are often accompanied by promotional narratives and hype, recent research in digital media studies has suggested the term nevertheless reflects an increasing concentration of corporate media power within the global platform and app economy, a process characterized by dual tendencies of platformization and appification leading to the emergence of integrated service ecosystems encompassing communication, financial transactions, transportation and delivery, and more (Pitre, 2022; van der Vlist, 2024). Notably, apps like WeChat have been frequently cited as ‘the poster-child’ of this trend (Chan, 2022), resonating with critical examinations of such platforms that have broadly considered their governance structures and infrastructural implications (Plantin & de Seta, 2018; de Kloet, et al. 2019), including their pervasive integration into everyday life (Harwit, 2017; Chen, et al. 2018). With a focus on these new Asian megacorps (Steinberg, et al. 2022), we thus aim to contribute to the late history of ‘the datafied web,’ a period when platforms like Meituan and Alipay evolved from websites into app-based ecosystems, and WeChat introduced the potential for internal mini-programs. While still relying on HTTP and HTTPS protocols, their growth marks a shift to mobile-first ecosystems that pursue datafication through proprietary protocols, custom software development kits (SDKs), and closed infrastructures. The development of ‘super-apps’ like Alipay, Meituan, and WeChat, accordingly, highlights the continued entanglement of the web with platformization and the balkanization of the internet, signaling the emergence of new digital fiefdoms.

Methodologically, we contribute to platform and app historiography by expanding multi-situated app studies (Dieter, et al. 2019) to new modes of diachronic analysis. Taking inspiration from biographical studies of websites (Rogers, 2017) and platforms (Burgess & Baym, 2020; Helmond & van der Vlist, 2019), we consider how apps such as Alipay, Meituan, and WeChat have played an active role in ‘authoring’ their own historical trajectories through being situated within digital infrastructures (Helmond & van der Vlist, 2021). To operationalize this perspective, we leverage traces of software versioning sourced from industry data, web archives, and app repositories in conjunction with digital tools like scrapers, decompilers, and code inspectors. A key resource in our work is AndroZoo, a large-scale app repository hosted by the University of Luxembourg, which contains over 24 million Android application packages and their metadata collected from various marketplaces. While AndroZoo has mainly supported research on app descriptions, malware detection, app permissions, and GDPR compliance (Alecci, et al. 2024), its potential for interdisciplinary studies of media concentration and the phenomenon of super-appification remains underexplored. We will present several initial findings from this exploratory research, including the large expansion of device permissions by ‘super-apps’ to facilitate datafication across an increasing range of services; their deep integration with dominant smartphone manufacturers; the parallel platformization strategies taken up to expand beyond mainland China; and the patterning of infrastructural traces with corporate acquisitions. In addition to documenting these specificities of Chinese ‘super-app’ development, our inquiry reflexively considers the challenges associated with utilizing such complex archives, including their technical limitations, and the need for diverse methodological considerations to adequately ground research findings.
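
The abstract names scrapers, decompilers, and code inspectors without specifying a toolchain. Purely as one hedged illustration of how the expansion of device permissions across app versions might be charted, the sketch below uses the open-source androguard library on Android packages such as those retrievable from AndroZoo; the file names are placeholders, import paths vary between androguard versions, and this is not necessarily the authors' own setup.

from androguard.misc import AnalyzeAPK

# Hypothetical local copies of successive releases of one 'super-app'
# (e.g., downloaded from AndroZoo); the file names are placeholders.
apk_paths = ["superapp_v6.apk", "superapp_v7.apk", "superapp_v8.apk"]

previous = set()
for path in apk_paths:
    a, _, _ = AnalyzeAPK(path)            # 'a' exposes manifest-level metadata
    perms = set(a.get_permissions())      # permissions declared in AndroidManifest.xml
    newly_added = perms - previous
    print(f"{a.get_package()} {a.get_androidversion_name()}: "
          f"{len(perms)} permissions, {len(newly_added)} newly added")
    previous = perms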

11:00-12:30 Session 5B: Panel: The Challenge of Archival Practices' Context for a Better Understanding of Data Web Archives at Aix-Marseille University
11:00
The Challenge of Archival Practices' Context for a Better Understanding of Data Web Archives at Aix-Marseille University

ABSTRACT. The Challenge of Archival Practices' Context for a Better Understanding of Data Web Archives at Aix-Marseille University

Web archives have become a key data source within universities—produced and reused like any other type of data. One particular aspect of this issue is that the teams working on it are diverse and multidisciplinary, and in the case of this session, all of them focus on the Mediterranean region. Chaired by Sophie Gebeil (UMR TELEMMe), this panel session addresses the theme of the conference ‘Web Archiving Data Practices and Challenges’ by discussing the differentiated practices and challenges of archived web data for Mediterranean studies. It will introduce three speakers from different professions at Aix-Marseille University (AMU), each with distinct expertise in web data. They incorporate web archives into their work in various ways, such as tracking the progress of research programs, supporting PhD students with their theses, or archiving researchers’ outputs. Christine Mussard, historian and deputy director of the IREMAM research laboratory, demonstrates how investigative practices are transformed through contact with web archives. Véronique Ginouvès, head of the MMSH archives, shares the challenges related to the preservation of online databases managed by the MMSH. Finally, JC Peyssard, head of the MMSH media library, shows how his own expert practices intersect with the support requests from novice researchers who are unaware that understanding the datafied web also requires a hermeneutic reflection. At the MMSH, web archiving practices span from considering web archives as historical material (Brügger, 2012) and as a critical method (Weber, 2020), to viewing web archives as a domain of expertise in data analysis within the humanities and social sciences, in order to address the challenges faced by a research community dealing with impeded fieldwork. The presentation aims to highlight the necessity of working together in order to understand and use data web archives over the long term, while documenting who is involved and how they do it. It demonstrates the necessity of an inter- and transdisciplinary approach that combines the specialized expertise of academics and non-academics to transform practices and address the challenges posed by the datafied web in the study of Mediterranean societies.

The web archive as a resource in an impeded field: from a substitute source to a major trace in the development of colonial history
Christine Mussard (MCF HDR - INSPÉ - UMR IREMAM)

The practice of social science research presupposes a regular relationship with the field of study, which is seen as a place where the researcher can become immersed in the subject matter and where a wide range of data can be collected. In recent years, the Covid epidemic has hampered access to these experimental spaces, forcing researchers to invent other ways of reaching them. For researchers involved in projects in the Arab world, geopolitical tensions have exacerbated these obstacles even further. The Institute of Research and Study on the Arab and Islamic Worlds (IREMAM) conducts research on the entire Middle East and North Africa region in all the social sciences and humanities. Access to the field is regularly restricted or even forbidden as conflicts erupt between Middle Eastern states. The use of web archives and, more generally, digital data has become more widespread among researchers, who have had to develop new skills to make use of them. In this presentation, I propose to show, through the prism of a research experience in the history of Algeria under French domination, the evolving approach I have taken to the web archive, initially considered as a substitute material pending access to the source in situ, then envisaged as a central piece of my documentation. This reflection therefore looks at the way in which a constrained context affects the historian's relationship with his sources, including testimonies, generating an unexpected renewal of their uses, and a revision of the way in which they are related and ranked. However, in addition to the odd photo and memory I took from these websites, I was also able to see the different ways of presenting the school memories which occupied a large part of these community sharing platforms. They revealed how the contributors were attempting to reunite classes in today’s very different French Algeria, providing insights into the social connections of the past and the form they take today. The aim of this presentation is, therefore, to investigate the different ways of using these websites which tell memory-packed stories, sorting the real from the fake in terms of the sources and understanding the practice of memory expression as a research topic.

Neglect, Stammering, Focus: Processes of an Archival Experience of Archive Collections and Audiovisual Projects at AMU Posted on the Web Over the Past 30 Years
Véronique Ginouvès (IRHC CNRS, UAR3125)

The presentation aims both to offer a reflective history of practices and to highlight the challenges faced by a Mediterranean sound archives and research center at Aix-Marseille University, as the world of the web and its uses unfolds. The first email written from the UMR TELEMMe, my laboratory created in 1994 at the Université de Provence, was sent from the Sound Archives Department in 1995, yet it was never archived. The first online sound archives database was created using MySQL in 1997; it left no trace. The first relational database software that facilitated the online publication and documentation of the sound archives dates back to the early 2000s, but it was only in 2005 that we thought to archive parts of this documentation on the Wayback Machine. In 2020, the database software the sound archives center had been using was acquired and a retroconversion of all metadata was required. We did it in EAD format and made an export in DC format from the OAI-PMH repository; we hope the metadata of the new platform will soon be archived at CINES, the platform of the Ministry of Higher Education and Research. The editorial projects of the "Pôle image-sons, pratiques du numérique", in which I have been involved since 1998, could now be described using terms like "new media" or "alternative narratives." These projects were online and operated with Adobe Flash. Fortunately, this time, we archived them using Conifer and the Wayback Machine before Flash's discontinuation in 2019. In recent months, we contacted INA for legal web deposit, and we are awaiting confirmation on whether the database will be included. The blog for the "Pôle images-sons pratiques du numérique" project, which captured content from the older site (the form is available on the Wayback Machine), is saved on the CINES servers by the platform Hypotheses itself. The web archiving process for projects from an archival center requires careful foresight. However, the first challenge is simply remembering to think about it. Using a database daily fosters a form of negligence: tomorrow, I will still have access to the site, and the day after that as well. Yet, there comes a day when everything stops. As I write this summary, the Wayback Machine has been forced to suspend operations due to a cyberattack—a sobering reminder of the fragility of our web archiving efforts and the limited tools available to us. This presentation also aims to highlight the issue of web archive hosting, considering the various archiving spaces while recognizing the risks inherent in large, centralized spaces that host data and which are often seen as high-value targets for hackers.

Web Archives as a Substitute for Fieldwork: Lessons from a Decade of Research in the MENA Region
Jean-Christophe Peyssard (IR CNRS, UAR3125)

For nearly a decade, numerous researchers and students have turned to web sources for their research. The centrality of the web and social media in global culture, coupled with increasing difficulties in accessing field sites—or complete inaccessibility in certain areas—has significantly contributed to this trend. Impeded fieldwork is now often replaced by digital fieldwork across all disciplines of the humanities and social sciences. Archaeologists have become engrossed in digital archives, while anthropologists, political scientists, and sociologists have spread across social networks and online video platforms, collecting disordered and context-lacking data on their hard drives during their explorations. Simultaneously, research libraries are beginning to recognize the crucial issues related to the consultation, collection, and preservation of web-based corpora (Neal, 2014), and are witnessing an increasing number of users confronted with these new field materials. The field of web archives has become highly structured since the creation of the IIPC in 2003. The tools, methods (Brügger, 2018), and research collectives are now well-established and have proven their relevance. However, vernacular and improvised uses of web sources remain largely the norm within the research community, as observed from the research library of the MMSH (AMU). After a decade of experience in training, supporting, and conducting projects using web archives focused on the Middle East and North Africa region, this presentation aims to provide an assessment and offer perspectives on the difficulties and challenges encountered. Can digital fieldwork substitute for impeded fieldwork? Under what conditions can web archives provide genuinely useful knowledge in the study of a society and its social and cultural realities? What are the necessary skills and knowledge prerequisites for an ethical and effective use of web archives in the context of impeded fieldwork?

References

Brügger, Niels. 2012. “Web History and the Web as a Historical Source.” Zeithistorische Forschungen/Studies in Contemporary History, Online-Ausgabe, 9(2). https://zeithistorische-forschungen.de/2-2012/4426. https://doi.org/10.14765/zzf.dok-1588.

Brügger, Niels. 2018. The Archived Web: Doing History in the Digital Age. Cambridge, Massachusetts: The MIT Press. https://search.worldcat.org/fr/search?q=bn:9780262039024.

Gebeil, Sophie. 2021. Website Story: Histoire, Mémoires et Archives du Web. Bry-sur-Marne: INA.

Gebeil, Sophie, and Jean-Christophe Peyssard, eds. 2023. Exploring the Archived Web during a Highly Transformative Age: Proceedings of the 5th International RESAW Conference, Marseille, June 2023. FUP. https://doi.org/10.36253/979-12-215-0413-2.

Mussard, Christine. 2024. “Websites as Historical Sources? The Benefits and Limitations of Using the Websites of Former Repatriates for the History of Schooling in Colonial Algeria.” In Exploring the Archived Web during a Highly Transformative Age, edited by Sophie Gebeil and Jean-Christophe Peyssard. https://doi.org/10.36253/979-12-215-0413-2.27.

Neal, James G. 2014. “The Integrity of Research Is at Risk: Capturing and Preserving Web Sites and Web Documents and the Implications for Resource Sharing.” Lyon, France. http://library.ifla.org/id/eprint/907.

Weber, Matthew S. 2020. Web Archives: A Critical Method for the Future of Digital Research. Published by the research network WARCnet, Aarhus.

11:00-12:30 Session 5C: Technologies and Datafication
11:00
CD-ROMs versus Online in the 90s: Hybrid Paths to Datafication

ABSTRACT. This proposal aims to retrace part of the history of data-sharing, storage, and information management by focusing on the 1990s, on the hybridization of CD-ROMs and online databases, and on the transition from the former to the latter. Our starting point is anchored in the debates within the library community, where librarians, as early adopters, engaged critically with these emerging technologies. Drawing on these secondary sources, as well as archives from the EU Publications Office, and insights from a dozen oral interviews, we aim to analyze the role of CD-ROMs as a “transient technology”, then the hybrid data management practices, based on the case study of the EU Publications Office, and finally the broader impact of CD-ROMs on the process of datafication.

The first part focuses on the discussion within the library community about CD-ROMs as a “transient technology”. The adoption of CD-ROMs was influenced by the need to store, manage and disseminate large amounts of data efficiently. However, as online databases also emerged, the role of CD-ROMs was increasingly questioned. Scholars like Stratton (1994) and Bevan (1994) engaged in discussions based on an original article published in 1990 by McSean and Law entitled “Is CD-ROM a Transient Technology?”. These discussions reflected a broader uncertainty about the longevity of CD-ROMs and the evolving landscape of data management.

The second part examines the hybridization of data management practices within a specific context, using the EU Publications Office as a detailed (and in progress) case study. This institution exemplifies the complex interplay between traditional print, CD-ROMs, and online platforms during the 1990s (Schafer, 2020). The EU Publications Office, tasked with disseminating large amounts of data, notably the daily publication of the Official Journal as well as public tenders, faced the challenge of managing multiple formats simultaneously. The Office itself, as a user of these technologies, had to navigate the challenges of integrating different formats into a cohesive information management strategy. At the same time, the end-users of the Office’s data were also adapting to the new formats, illustrating the dual-user dynamic in this transitional period. The case of the EU Publications Office thus provides a concrete example of how institutions managed the shift from analog to digital data and the co-existence of print, CD-ROM-based, and online information.

Finally, we will conclude with the broader role of CD-ROMs in the process of datafication. CD-ROMs were instrumental in the conversion, dissemination, and transmedia movement of data, which set the stage for the web. In this way, CD-ROMs served as a catalyst for the broader process of datafication.

References

Bevan, N. (1994). Transient Technology? The Future of CD‐ROMs in Libraries. Program, 28(1), 1-14. https://doi.org/10.1108/eb047155.

McSean, T., and Law, D. (1990). Is CD-ROM a Transient Technology? Library Association Record, 92(11), 837-841.

Schafer, V. (2020). From Print to Digital, from Document to Data: Digitalisation at the Publications Office of the European Union. Open Information Science, 4, 204-217. https://doi.org/10.1515/opis-2020-0015.

Stratton, B. (1994). The Transiency of CD-ROM? A Reappraisal for the 1990s. Journal of Librarianship and Information Science, 26(3), 157-164.

11:20
«Finally the big internet connected to tonet»: infrastructure, websites, user practices and imaginaries as the components of the «-net»

ABSTRACT. This paper explores a specific period in the history of the internet in the Russian city of Tomsk. The city-wide internet, known as «tonet», existed from 1998 to 2008/2010. Established through a peering agreement among ISPs in 1998, tonet enabled affordable, high-speed access to local websites — and slower and much more expensive access to all other websites. After the introduction of home networks in 2001, the user base expanded rapidly. In 2008, unlimited data plans were introduced, which made the previous advantages in speed and price obsolete. Over the next two years, unlimited plans became widespread, and tonet became history. I focus on the crucial period from 2008 to 2010 when these changes in ISPs’ policies fundamentally altered user experiences. I have conducted 29 interviews on this topic in addition to 48 already done in previous years and analyzed local press publications, advertisements, and website archives to reconstruct users' reactions to the city’s internet infrastructure changes. This paper is a follow-up to my presentation at RESAW-2019 and to an article by my colleagues Polina Kolozaridi and Dmitry Muravyov (Internet Histories, Volume 4, 2020/1).

This study contributes to both Internet Histories and Infrastructure Studies. I introduce the concept of «-nets» to describe networks like tonet, which sheds light on the interconnection of physical infrastructure (wires, routers) with digital infrastructure (websites, forums) and a discourse (imaginary, self-descriptions, community narratives). I am aware of Kevin Driscoll and Camille Paloque-Bergès's work on «The Net» and have translated it into Russian; nonetheless, I propose a more infrastructural notion of this concept. «-nets» can be a productive way of distinguishing the internet segments corresponding to those three parameters as a special type of object — having natural boundaries, self-descriptions, and, importantly, a scale smaller than a country. The vast majority of work in this area focuses on the scale of countries, and the «-net» may be the missing element in a more granular understanding of the scale of networks.

Furthermore, I address the gap between infrastructure design and users’ practices. Thomas Hughes's work shows that early infrastructure studies focused on the builders and overlooked users (Hughes 1983, 1986, Joerges, 1999:18). However, later research demonstrated the productivity of addressing the user. It shifted the focus from the inventors to the practices of using infrastructures, affirming the relational rather than essentialist nature of infrastructures: «there are only observed infrastructural relationships» (Slota, Bowker, 2017:531). This second conceptualization is not particularly interested in how infrastructures were built. Instead, it focuses on the impact of infrastructures on user practices and vice versa (a good example is Shah and Sandvig, 2008). In my paper, I draw on the resources of both conceptualizations to show tonet as a set of interconnected infrastructures. I will demonstrate how the material network becomes the infrastructure for urban websites and forums, which in turn become the infrastructure for user practices and generate discourse about tonet. In addition, I will outline the theoretical work that has been done to connect two conceptualizations in a consistent way.

11:40
A Hidden Track? The start of implementation and the current use of trac(k)ing methods concerning the Internet of Things basic technology Bluetooth.

ABSTRACT. Bluetooth technology now forms the cornerstone of numerous applications used by digital societies in their environments. Sensor-based interactions (coupling) between different mobile devices with integrated Bluetooth chips give rise to mobile communication networks, or co-operations between various participating human and technical actors. From its original claim as a "[u]niversal radio interface for ad hoc, wireless connectivity" (Haartsen 1998) – aimed at replacing cables and fostering wireless connections – the endeavor and possibilities for further adaptations and applications of this technology have significantly expanded over the past three decades. Examples include Internet of Things (IoT) applications through fitness trackers (AirTags) or smart watches (wearables). A variety of specialized applications have emerged from this universal technology. When the WPANs (Wireless Personal Area Networks) or piconets generated by Bluetooth devices were introduced in 2000, the simple claim "to ensure the best use of a shared medium" was pursued (Braley et al. 2000: p. 26). However, due to increasingly digitalized living environments, the focus of interoperability has shifted to adaptations for purposes of the IoT. It is now centered on the "development of open consensus standards addressing wireless networking for the emerging Internet of Things (IoT), allowing these devices to communicate and interoperate with one another, mobile devices, wearables; Optical Wireless Communications (OWC), Autonomous Vehicles, etc." (IEEE 802.15 Working Group on Wireless Specialty Networks). The pairing of different devices and the sharing of information are media practices facilitated by Bluetooth that organize the environments of digital societies. Media practices associated with Bluetooth and their interaction with existing and emerging environments (Sprenger 2019; Sprenger/Engemann 2015), i.e., the practice of environing – an active modification of the environments via the involved actors (humans or technologies) (cf. Cubasch et al. 2021) – need to be considered when examining how digital infrastructures and data capturing are developing. Our environments have been translated into data since the advent of Computer Supported Cooperative Work (CSCW). And the data streams surrounding Bluetooth-enabled devices with integrated chips (i.e., sensor-based media), which bring about the computerization of physical environments, continue to evolve.

When contact-trac(k)ing methods were introduced very quickly via apps during the Covid pandemic (e.g. Germany’s Corona-Warn-App), users tended to be cautious about their use, especially with regard to privacy concerns. Meanwhile, the technology has come to be ‘always-on’ in smart devices, and its ongoing tracking functions no longer seem to be an issue. But surveillance studies cannot be excluded from the discussion of Bluetooth technology, given its foundational role in tracking and tracing apps and various IoT applications often used for advertising and marketing purposes (e.g., Bluetooth Beacons) and corresponding practices such as a possible capturing of data (Agre 1994).

But since when has the possibility of tracing or tracking been part of Bluetooth technology? And how secure are the permanent 'visibility' and the long-term activation of Bluetooth devices with regard to private, personal data?

Through methods such as oral history interviews and archival research (e.g., company archives), the historical reconstruction as part of the project focuses on (1) the development of Bluetooth technology and (2) the period when trac(k)ing capabilities became part of the technical specifications and the business model of the Bluetooth SIG. The project adopts a praxeological approach to basic research in media studies, focusing on digital infrastructures and levels of co-operation through Boundary Objects (Star/Griesemer 1989) in the origins of the technologies that we integrate almost seamlessly into our everyday digital lives.

The contribution aims to raise critical questions about this ubiquitous technology, its use practices and the options for digital contact trac(k)ing.

References:

Agre, Philip E. (04/1994): „Surveillance and Capture: Two Models of Privacy“, in: The Information Society Bd. 10(2), 101–127.

Braley, Richard C./Gifford, Ian C./Heile, Robert F. (2000): „Wireless Personal Area Networks: An Overview of the IEEE P802.15 Working Group“, in: Mobile Computing Communications Review 4(1), DOI: 10.1145/360449.360465, 20–27.

Cubasch, Alvin J./Engelmann, Vanessa/Kassung, Christian (2021): „Theorie des Filterns. Zur Programmatik eines Experimentalsystems“, Zenodo (Preprint 04/2021), DOI: 10.5281.

Haartsen, Jaap (1998): „Bluetooth – The Universal Radio Interface for ad hoc, Wireless Connectivity“, in: Ericsson Review, The Telecommunications Technology Journal, 1998(3), 110–117.

Star, Susan L./Griesemer, James R. (1989): „Institutional Ecology, ‚Translations‘ and Boundary Objects: Amateurs and Professionals in Berkeley‘s Museum of Vertebrate Zoology, 1907-39“, in: Social Studies of Science, 19(3) (1989-08-01), 387–420.

Sprenger, Florian (2019): Epistemologien des Umgebens: Zur Geschichte, Ökologie und Biopolitik künstlicher environments, 1. Aufl., Bd. 65, Bielefeld: transcript.

Sprenger, Florian/Engemann, Christoph (2015): Internet der Dinge: über smarte Objekte, intelligente Umgebungen und die technische Durchdringung der Welt. Digitale Gesellschaft, Bielefeld: transcript.

12:30-14:00 Lunch Break (Mensa Food Court, across the street)
14:00-15:30 Session 6A: Gender and Intimacy
14:00
“The flames are 50/50 right now”: content moderation practices at the onset of the HIV/AIDS epidemic in the United States (1982–1990)

ABSTRACT. This paper received The Journal of Internet Histories Early Career Researcher Award for 2024.

The timeline for the onset of the HIV/AIDS epidemic in the United States occurred parallel to the domestic shift of computing and the advent of DIY computer networking efforts. During this critical time, many activists and community organisers within lesbian, gay, bisexual, transgender, and queer (LGBTQ+) spaces utilised computer networking, such as bulletin board systems (BBSs) and Usenet boards to facilitate information exchange within their affected communities. Due to the sensitive nature of the epidemic and often-vital need for up-to-date information, content moderation became an increasingly important issue on these boards. This paper uses varying archival methods to explore the development of content moderation practices, and the influence of HIV/AIDS culture, on bulletin board systems and Usenet boards, with a special focus on boards dedicated to LGBTQ+ content and HIV/AIDS information exchange.

14:20
A Marriage of Convenience: Transgender Websites within LGBT+ Hyperlink Networks, 2009-2022

ABSTRACT. The acronym LGBT+ suggests that a natural alliance exists between lesbians, gays, bisexuals, and transgenders. Indeed, relatively recently, “[m]any formerly LGB organizations began to ‘add the T’”—highlighting that these marginalized groups fight a common cause (Stone, 2009, p. 336). At the same time, however, transgenders and the other ‘letters’—gays and lesbians in particular—are “odd bedfellows” (Ros & Motmans, 2015). Transgender people “destabilize the otherwise easy division of men and women into the categories of straight and gay because they are both and/or neither” (Devor & Matte, 2004, p. 179). This has resulted in “a contradictory environment simultaneously welcoming and hostile:” “Transgender relations to gay and lesbian community formations necessarily became strategic—sometimes oppositional, sometimes aligned” (Stryker, 2008, pp. 146 and 149).

My talk sheds light on this uneasy alliance by means of hyperlink analyses. I analyzed the special LGBT+ web collection of the Dutch National Library. Perhaps nowhere else is the online LGBT landscape as rich and diverse as in the Netherlands. This collection (2009-present) contains over 200 websites of Dutch LGBT+ organizations (each harvested once annually); some catering to specific ‘letters,’ others to the entire LGBT+ community. It is unique in size and richness, but has not yet been researched.

Addressing web archiving data practices and challenges head-on, I will discuss how I analyzed the queer network that these websites formed and how this network evolved over time (2009-2022). Per year, I extracted and scrutinized all hyperlinks of these websites, for hyperlinks yield insights into “hyperlinked identities” (Szulc, 2015, p. 121). I concentrated on the thousands of links that pointed to websites catering to LGBT+ people. Gephi was then used to visualize and analyze – i.e., distant-read – the resulting queer network.
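
The analysis itself was carried out in Gephi; as a rough, code-based equivalent (an illustration under assumed inputs, not the author's workflow), the per-year link pairs could also be loaded into a directed graph and its centralities computed as follows. The file name and the "source,target" column layout are assumptions.

import csv
import networkx as nx

# Illustrative sketch: build a directed hyperlink network for one harvest year
# from a hypothetical "source,target" edge list and identify central and bridging sites.
G = nx.DiGraph()
with open("links_2015.csv", newline="", encoding="utf-8") as f:
    for source, target in csv.reader(f):
        G.add_edge(source, target)

in_centrality = nx.in_degree_centrality(G)    # most linked-to LGBT+ websites
betweenness = nx.betweenness_centrality(G)    # sites bridging clusters, e.g. trans and other queer sites

top_bridges = sorted(betweenness.items(), key=lambda kv: kv[1], reverse=True)[:10]
for site, score in top_bridges:
    print(f"{site}\tbetweenness={score:.3f}\tin-degree={in_centrality[site]:.3f}")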

My findings will highlight that transgender websites formed a clear cluster within the network. Transgender organizations indeed were a community within a community. No other cluster was tied as strongly together—underscoring the aforementioned tension within the LGBT+ community. Capitalizing on Gephi’s functionalities, I will interpret this main finding, e.g. show which websites had a high centrality and which bridged transgender with other queer websites, and will discuss changes over time.

14:40
Flashing Intimate Things in People's Faces. Intimate Computing and the Datafied Web.

ABSTRACT. Computers have been envisioned early on as devices that would be able to aid people in their lives and augment their cognitive capacities. Sociotechnical imaginaries of human-machine interaction often refer to the intimate or speak of an intimate computer as a desirable mode of interaction. In contrast, Lauren Berlant traces the advent of an “intimate public sphere” with the rise of the Reaganite right and its mass-media rhetoric of sentimentality and a traumatized national identity: “Now everywhere in the United States intimate things flash in people's faces” (Berlant, 1997, p.1). Writing in the late 1990s, however, Berlant does not yet consider the development of a digital public sphere of the datafied web. What Berlant observed will be continued and complicated here, in a technically realized public space. Early on, digital artists were experimenting with new forms of intimacy in the public sphere, and software companies were designing their products to simulate spaces of sharing. Using the once ubiquitous meta-platform for web-based applications Macromedia/Adobe Flash as a case study, my talk will explore how spaces of intimate privacy were opened up on the early internet. I will offer insights both into the aesthetics and technical structures of this development. I will first introduce the artist-duo Auriea Harvey and Michaël Samyn, known as “Entropy8Zuper!”, and explore their Flash-based cyberperformance of online-intimacy, “Wirefire”, as an experiment with new forms of relating to one another. I will then trace the formalization and commodification of forms of interaction involving the sharing of digital objects in the development of the Flash Communication Server (FlashComm), culminating in the misuse of Flash’s “Local Shared Object”, the so-called Flash-Cookie, in the context of behavioral advertising. I thereby propose an alternative history of intimate computing that draws a parallel between practices of navigating technological and interpersonal vulnerabilities. "Wirefire" and Flash developers were searching, simultaneously, but from very different starting points, for ways of digital communication that could support the imaginary of a shared, intimate space. In this sense, Flash offers a valuable case study of what Ara Wilson called an "infrastructure of intimacy", tracing how "infrastructure offers a useful category for illuminating how intimate relations are shaped by, and shape, materializations of power" (Wilson, 2016, p.263). Flash offered tools to program intimacy in a simulated public sphere of sharing digital data and introduced new technical vulnerabilities into the protocols of the internet. Speaking with Michel Serres, a relation always involves a third party, a para-site. My example of Flash shows how the navigation of vulnerabilities can be both the entry point of exploitation and extraction in what Berlant called a heteronormative metaculture, as well as the condition and foundation of trusting, resilient relationships.

14:00-15:30 Session 6B: Web archives practices
14:00
Bulk access to web-archived data using APIs

ABSTRACT. In the context of archiving large datasets, ensuring that the data is both accessible and searchable is paramount for facilitating research and discovery. In recognition of this need, we implemented Application Programming Interface (API) access to Arquivo.pt in 2018. This initiative not only improved accessibility but also enabled the development of microservices that operate on our platform. As a result, nearly half of the web traffic to Arquivo.pt is now generated through API requests. Each year, a diverse array of projects spanning disciplines such as economic analysis, artistic endeavors, and computer science emerges, all leveraging the capabilities of our APIs. Currently, we offer four distinct APIs, each designed to meet the varying needs of our community. In recent years, there has been a growing demand from the research and education sectors for the ability to perform bulk downloads of web-archived data and index files. This demand arises from a range of applications, including the training of artificial intelligence models, optimizing the routing of web archive requests, and retrieving information from specific websites, such as news outlets. In response to these requests, Arquivo.pt has made all its index files publicly available in real time, significantly facilitating the bulk download of web-archived data. This decision has opened new avenues for researchers, evidenced by a more than sixtyfold increase in our network bandwidth usage since the introduction of bulk download access. This enhancement not only streamlines data retrieval but has also made it possible for researchers to utilize our data in the development of Large Language Models (LLMs). One notable outcome of this effort is GlórIA, an LLM designed for processing European Portuguese, whose training corpus comprises an impressive 35 billion tokens. The integration of our archived data into AI research exemplifies how access to comprehensive datasets can drive innovation and advancement in various fields. Through this paper, we aim to underscore the critical importance of providing users with broad access to archived data and to detail how our APIs and services are actively utilized within the research community. By sharing our experiences and insights, we hope to demonstrate the transformative impact of accessible data on research and development.
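
As a minimal sketch of the kind of API request discussed in this paper, the example below queries Arquivo.pt's full-text search service. It assumes the TextSearch endpoint at https://arquivo.pt/textsearch with parameters q, from, to, and maxItems, returning JSON with a response_items list; endpoint and field names should be verified against the current Arquivo.pt API documentation.

import requests

# Minimal sketch of a full-text query against Arquivo.pt (endpoint, parameter and
# field names as assumed in the text above; verify against the current API docs).
resp = requests.get(
    "https://arquivo.pt/textsearch",
    params={
        "q": "eleições",             # example query term
        "from": "19960101000000",    # archived-date range, YYYYMMDDHHMMSS
        "to": "20101231235959",
        "maxItems": 10,
    },
    timeout=60,
)
resp.raise_for_status()
for item in resp.json().get("response_items", []):
    print(item.get("tstamp"), item.get("originalURL"), item.get("linkToArchive"))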

14:20
Navigating the Datafied Web: User requirements and literacy with web archives

ABSTRACT. Introduction

As more and more information is created and shared online, the importance of web archives has grown. After all, online information on the web does not last forever: contrary to popular belief, the average lifespan of a web page is only around 1,132 days (Agata et al., 2014, p. 464). Here, web archives, as a new form of archival material curated through a process of selecting and preserving websites (Cui et al., 2023), come to the rescue. Web archives are digital collections of web pages and other online content that have been preserved over time. They provide a snapshot of the web at a specific moment in time and can be a valuable source of information for different user groups, especially researchers. However, navigating and using web archives can be challenging, as they are often different from other types of online resources. Web-archived content may be incomplete, may not function as it did originally, and may be difficult to locate. In addition, web archives often contain a mix of primary and secondary sources, and it can be difficult to determine the reliability and credibility of the information they contain. Researchers need specific skills and knowledge to effectively use archived web content as research data. As more researchers begin working with this content, understanding their ability to navigate and utilize web archives is becoming increasingly important. This study seeks to explore both users' requirements and their literacy with web archives.

Methodology

This study uses both qualitative and quantitative methods. We held two workshops with researchers, librarians, archivists, and other users who interact with these digital resources to map out their requirements. Next, an online survey was distributed across these different user groups to validate these requirements and to assess their literacy. For assessing literacy with web archives, this survey includes 36 statements grouped into five key categories. The first category gauges users’ familiarity with and understanding of the basic concepts of web archives, such as the distinction between original and archived web content and awareness of the diverse range of materials in web archives. A second group of statements explores users’ ability to recognize the research potential of web archives and the impact of archiving practices on the nature and availability of archived content. Another category focuses on skills to search and navigate web archives. The last category evaluates the users' ability to critically assess the limitations and biases in archived content and the curatorial practices that shape web archives.

Conclusion

Our findings offer guidance on improving the accessibility of web archives by highlighting the diverse requirements of users and suggesting ways to create tailored experiences based on literacy levels. The results indicate that adaptive interfaces and personalized user paths can be developed to make web archives more useful for everyone, from beginners to experienced researchers.

References

Agata, T., Miyata, Y., Ishita, E., Ikeuchi, A., & Ueda, S. (2014, September). Life span of Web pages: A survey of 10 million pages collected in 2001. In IEEE/ACM Joint Conference on Digital Libraries (pp. 463-464).

Cui, C., Pinfield, S., Cox, A., & Hopfgartner, F. (2023, March). Participatory Web Archiving: Multifaceted Challenges. In Information for a Better World: Normality, Virtuality, Physicality, Inclusivity: 18th International Conference, iConference 2023, Virtual Event, March 13–17, 2023, Proceedings, Part I (pp. 79-87). Cham: Springer Nature Switzerland.

14:40
Lessons learnt from preparing collections as data: the UK Web Archive experience

ABSTRACT. This paper proposes an examination of the UK Web Archive's Datasheets for Datasets project as a pioneering initiative that integrates the critical role of librarians and archivists in the evolving landscape of the datafied web. By adopting a method from the machine-learning field to improve the description and documentation of web archive datasets, this project not only enhances access to and understanding of digital collections but also provides a framework for addressing the broader implications of datafication in library and archival practices.

The UK Web Archive collects and preserves websites published in the UK, encompassing a broad spectrum of topics. The entire collection amounts to approximately 1.5 petabytes (PB) of data, which necessitates the use of machine learning approaches to explore the collection effectively, in addition to the detailed examination of individual websites that the UK Web Archive also facilitates. Moreover, the archive includes curated or thematic collections that cover a diverse array of subjects and events, ranging from UK General Elections, blogs, and the UEFA Women’s Euros 2022, to Live Art, the History of the Book, and the French community in London.

Since its inception, the UK Web Archive has collected websites using a number of different methods, with an evolving technological structure and under different legal regulations. As a result, what can be discovered and accessed is complicated and, therefore, not always easy to explain and understand. To ensure wider access to our collection, we plan to publish as data the metadata we have created to describe archived websites.

'Collections as data' has become a very popular term within the GLAM sector in recent years. However, publishing collections as data is still not an easy task, and there is little in the way of guidance on how to do this, as one solution does not fit all. In this presentation, we will reflect on some of the challenges of preparing UK Web Archive collection metadata for publication, how we published these collections, and what additional material was required to ensure reuse of this data. These published datasets will be of use to researchers who want to use them with new data-mining tools.
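
The paper does not specify the exact form the published documentation takes; purely as a hypothetical illustration of the datasheet-style fields that might accompany a collection published as data, a record could look like the following (all field names and values are invented for the example).

# Hypothetical sketch only: field names and values are invented to illustrate the
# kind of datasheet-style documentation that could accompany a published collection.
datasheet = {
    "collection": "UK General Election 2019 (example)",
    "motivation": "Document UK political campaigning on the web",
    "composition": {
        "records": 12345,           # invented count
        "fields": ["url", "title", "crawl_date", "curator_notes"],
    },
    "collection_process": "Curator-nominated seeds crawled under non-print legal deposit",
    "access_conditions": "Metadata openly published; archived content viewable on-site only",
    "known_gaps": "Robots exclusions, crawl failures, and paywalled content are absent",
}
print(datasheet["collection"], "covers", datasheet["composition"]["records"], "archived targets")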

The UK Web Archive Datasheets for Datasets project embodies a forward-thinking approach to the challenges and opportunities datafication presents to libraries and archives. By fostering transparency, education, and ethical engagement, this initiative not only enhances the utility of web archive datasets but also exemplifies the crucial role librarians and archivists play in shaping our collective understanding and use of the datafied web. This paper aims to inspire further dialogue and exploration within the RESAW community and beyond, encouraging a thoughtful and human-centred approach to the evolving relationship between libraries, archives, and large datasets.

14:00-15:30 Session 6C: Panel: Data Loss in Archival Regimes: The Politics of Data Twinning, Preservation and Conversion
14:00
Data Loss in Archival Regimes: The Politics of Data Twinning, Preservation and Conversion

ABSTRACT. Chair: Nanna Bonde Thylstrup (University of Copenhagen)

Panel Description

Web and data archives play an increasingly central role in societies both as crucial sources of 21st century history and as suppliers of datasets for machine learning systems. Yet, little is known about the decision-making processes that intentionally exclude or accidentally fail to capture data, partly due to complex layers of technicity in the data archiving process (Bingham and Byrne, 2021) and partly because of the relative novelty of the field of digital archiving (Dowling, 2019). This panel offers empirical and critical substantiation of how ‘archivers’ (Ketelaar 2023) in formalized and non-formalized data repositories and archives detect, measure, conceptualize, experience and counteract data exclusion and disappearance across processes. Methods and theories are drawn from STS, information studies and media studies and cases include explorations of web archiving processes in the German National Library, GitHub dataset preservation in an Arctic Vault in Norway, and the storage of administrative digital data at the Danish National Archives.

Research presented in this panel is part of the Data Loss project, where our analysis of data politics shifts focus from accumulation and aggregation to disappearance, destruction and dispossession. Rather than conceptualize data loss as the inverse of accumulation, our research approaches loss as an integral part of data collection, storage, and preservation. What is at stake in researching data loss, then, is not only a matter of quantitatively measuring and mapping what is lost in digital information ecologies; it is also to qualitatively understand how data loss is discursively and materially co-constituted within knowledge infrastructures.

Paper 1 Arctic Archives: Making platformed datasets cold-storage-ready

This paper argues that GitHub’s 2020 Arctic Vault project exemplifies the politics of data loss in deep time archival processes, revealing how platform centralization and control reshape the future of open-source software preservation. On February 2, 2020, GitHub curated an algorithmically selected “greatest hits” collection of 17,000 data repositories, migrating them from GitHub servers to a deep time archival format known as Piql Film. The archives were then distributed across four locations: the Bodleian Library in Oxford, the Bibliotheca Alexandrina in Egypt, Stanford Libraries in California, and the Arctic World Archive in Svalbard, Norway. By making these repositories ‘cold-storage-ready,’ GitHub enforced platform-centric rules that excluded datasets with external dependencies, thus sacrificing the diversity and interconnectedness of the open-source ecosystem in favor of centralization.

GitHub has been described as a “software intermediary” (Bounegrou, 2023), where it operates simultaneously as a developer platform, community hub, and storage container. Mackenzie (2018) has argued that GitHub is distinct from other code sharing platforms in that it configures coding as a ‘social networked practice’, where code repositories are adorned with social media-style apparatus of following, watching, liking, and tagging. GitHub has attributed its success as a platform to these affordances, often praising its dedicated user base of developers who continuously monitor, update, and maintain the code. Leading up to the Archive Project deadline in 2020, users were informed that in order for their datasets to be included, they had to meet all of the necessary conditions: any dependencies hosted elsewhere, in other open-source repositories, had to be hosted or mirrored in the default branch on GitHub or else the software in the archive would not be usable. Through these conditions, GitHub’s platform-oriented forms of centralisation and control (Fuller et al., 2017) are extended into deep time through the preservation process of making GitHub ‘cold-storage-ready,’ while also creating conditions in which repositories with external dependencies are lost.

Further, to create the conditions in which the data might be preserved for up to 1,000 years, the open-source software code from GitHub underwent a series of transformations. The repositories were algorithmically selected by internal popularity scores, then automatically crawled and scraped before being sent to Piql. There, 21TB of data was copied onto 186 reels of 35mm polyester film coated with a gelatin emulsion containing microscopically small light-sensitive silver halide crystals. This is commonly known as silver halide film, a popular photography and microfilm medium, but is here referred to as piqlFilm. Piql uses this material to transfer binary data using photons, written in frames along the film by their piqlWriter. A photochemical processor, known as the piqlProcessor, is a machine where the information written on the film is chemically developed and fixed to ensure image presence and permanence. Data is thereafter only read through a piqlReader, which reads frames from the film and converts them back into sampled images which are then decoded back into digital data. The cold storage transformation of GitHub-hosted repositories for the Arctic Vault challenges and builds on archival processes to render datasets fixed in space and time, while making preservation contingent on Piql products and inscribing these machines and formats into the future.

Finally, while the parent project to GitHub’s Arctic Vault, the Arctic World Archive (AWA), provides an interface through which its registered partners can make modifications or pull requests to their data deposits, the Vault provides no such access to its users, meaning data cannot be removed or destroyed once included. In its communications, GitHub states that it will revisit and evaluate the project every five years. This is common with cold-storage archives (Radin and Kowal, 2017), where “cryo-objects are thus available and unavailable at the same time” (Braun, 2024). Through its empirical examination of the GitHub Arctic Vault project and its theoretical engagement with platform studies and archival technologies, this paper expands on emerging work in STS, media studies, and critical data studies that attends to the politics of deep time data preservation. It contributes to these fields by foregrounding how platform-driven archival practices extend centralization into the future, creating conditions in which the preservation of data paradoxically entails its stasis and potential obsolescence.

Paper 2 Web Archives as the Digital Twin: Navigating Tensions Between Preservation and AI

Recent news that the Internet Archive has “backed up the entire cultural heritage” of Aruba in response to climate change exemplifies how discourses on web archives are shaped by imaginaries of completeness and sustainability—suggesting that, despite the ephemerality of the web, everything can be stored and made accessible forever. These visions have gained further momentum with the advent of generative AI, where web archives are seen as valuable datasets for training models and developing new digital methods (see Ogden, Summers, and Walker 2023; Acker and Chaiet 2020).

Drawing on ethnographic research with the German National Library, the Internet Archive, and web archiving experts, I discuss how the focus on completeness and AI-driven use cases creates a persistent sense of falling behind and inadequacy for traditional memory institutions. These institutions face challenges in balancing an emphasis on scale with selective curation, outdated technology, and limited resources. The German web archive is a compelling case: the .de domain is too large to crawl completely (a common strategy of other national libraries), leaving the library to decide what defines ‘the German internet’ and which parts should be preserved. Its strategy of selective web archiving—focused on topic-based and event-specific crawls—demonstrates that data loss is not a failure but an inherent aspect of web archiving.

Questioning the digital fantasy of completeness and recognizing the conditions and challenges of web archiving highlight the need for a nuanced understanding of the political implications of the practices and realities of selection and loss, as well as of the generative endeavors connected to them. In this context, this contribution has two main objectives: first, to empirically examine the German National Library’s selective web archiving strategy, where data loss and curation must be discussed openly rather than hidden behind the overpromise of supposed technical possibilities; second, to critically explore tensions between different archival regimes and the competing strategies of preservation versus generative knowledge production.

To discuss these tensions, I introduce the concept of the digital twin, which represents a shift from traditional cultural heritage preservation to a generative, data-driven model. In this view, web archives are imagined as dynamic, continuously updated knowledge systems. These archival systems are expected to serve dual roles: mirroring the entire online world while simultaneously training and optimizing AI models that inform future scenarios, which then feed back into and refine the archive itself. Visions of digital twinning contrast with the conditions of national libraries, where the archive is governed by a legal mandate to preserve a representative selection, often restricted to on-site access due to regulations such as copyright. Since both web archives and digital twins are infrastructures of knowledge production and are deeply political, a more nuanced understanding of their "infra-politics" is crucial—one that acknowledges what is selected, saved, or lost in these processes (see Thylstrup 2018; Thylstrup et al. 2021).

Paper 3 Ontological overflows of data friction: Investigating mundane enactments of data absence at the Danish National Archives

In recent decades, several national archival institutions have turned to the long-term preservation of administrative digital data of various kinds, prominently including the US and UK national archives. A particularly pertinent case is the Danish National Archives, an institution that regards itself as “Denmark’s Digital Memory” (The Danish National Archives, n.d.) and has been charged with digital preservation since the 1970s (Rostgaard, 2023). As such, the Danish National Archives both reflect national archives’ broader turn to digital data preservation and exhibit a particularly significant institutional expertise in the project of keeping digital data present. In this paper, we take the case of the Danish National Archives to approach the long-term preservation of digital data as not exclusively or primarily productive of digital data presence, but as also shaped by the making of data absence. We ask: How are the practices of digital data preservation at the Danish National Archives always shaped by mundane makings of data absence?

To approach this question, we draw from recent theorizing in science and technology studies (STS) that stresses the value of exploring how actors engender “the absent”. Inspired by a longer lineage of work within STS that attends to excluded and neglected people and things in technoscience (e.g., Star, 1990; Latour, 1992; Mol, 1999; Bowker & Star, 2000; Barad, 2007; Puig de la Bellacasa, 2011), Lee (2023: p. 2) develops the concept of “ontological overflows” to stay with actors but to look “the other way - toward practices of excluding, cutting, removing - the practices of making absences”. We transpose this analytical orientation to our own work, investigating how data disappears in digital knowledge regimes at the Danish National Archives. Additionally, we mobilize the long-standing STS concept of data friction, which describes the “costs in time, energy, and attention required to simply collect, check, store, move, receive, and access data” (Edwards, 2010, p. 84). This conceptual combination is meaningful for studying archival institutions as it highlights the ontological stakes in moments of data friction and helps reinvigorate an STS of archives (Bowker, 2005; Waterton, 2010).

Mobilizing a range of materials related to the Danish National Archives’ data preservation practices, including expert interviews, institutional strategies, and press articles, our methodology draws from both infrastructural (Bowker & Star, 2000) and temporal (Velkova, 2024) inversion. The analysis highlights three ontological overflows and how they enact data absences: first, data format conversions; second, determinations of which data are ‘worthy of preservation’; and third, processes of digital data decay. These three overflows - each in their own way - highlight how the making of data absence is ingrained in the mundane project of keeping data present. Hence, the central contribution of this paper is to both conceptualize and empirically show how digital data preservation at archival institutions is shaped by mundane makings of digital data absence.

References: Acker, A., & Chaiet, M. (2020). The Weaponization of Web Archives: Data Craft and COVID-19 Publics. Good Systems–Published Research. Barad, K. (2007). Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning. Durham, NC: Duke University Press. Bingham, N. J., & Byrne, H. (2021). Archival strategies for contemporary collecting in a world of big data: Challenges and opportunities with curating the UK web archive. Big Data & Society, 8(1). Bounegru, L. (2023). The platformisation of software development: Connective coding and platform vernaculars on GitHub. Convergence, 0(0). Bowker, G. (2005). Memory Practices in the Sciences. Cambridge, MA: The MIT Press. Bowker, G., & Star, S. L. (2000). Sorting Things Out: Classification and Its Consequences. Cambridge, MA: The MIT Press. Braun, V. (2024). The stuff of memories: Planning hindsight in animal cryobanks. Social Studies of Science, 0(0). Dowling, S. (2019). Why there’s so little left of the early internet. BBC. https://www.bbc.com/future/article/20190401-why-theres-so-little-left-of-the-early-internet Edwards, P. N. (2010). A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming. Cambridge, MA: The MIT Press. Fuller, M., Goffey, A., Mackenzie, A., et al. (2017). Big diff, granularity, incoherence, and production in the GitHub software repository. In How to Be a Geek: Essays on the Culture of Software. Cambridge, UK: Polity Press. Ketelaar, E. (2023). The Agency of Archivers. Oxford Twenty-First Century Approaches to Literature, 287. Latour, B. (1992). Where are the missing masses? The sociology of a few mundane artifacts. In W. Bijker & J. Law (Eds.), Shaping Technology/Building Society (pp. 225–258). Cambridge, MA: The MIT Press. Lee, F. (2023). Ontological overflows and the politics of absence: Zika, disease surveillance, and mosquitos. Science as Culture, 33(3), 417–442. Mackenzie, A. (2018). 48 million configurations and counting: platform numbers and their capitalization. Journal of Cultural Economy, 11(1), 36–53. Mol, A. (1999). Ontological politics. A word and some questions. The Sociological Review, 47(1_suppl), 74–89. Ogden, J., Summers, E., & Walker, S. (2023). Know(ing) Infrastructure: The Wayback Machine as Object and Instrument of Digital Research. Convergence. Puig de la Bellacasa, M. (2011). Matters of care in technoscience: Assembling neglected things. Social Studies of Science, 41(1), 85–106. Radin, J., & Kowal, E. (2017). Cryopolitics: Frozen Life in a Melting World. Cambridge, MA: The MIT Press. Rostgaard, M. (2023). Archival paradigms: The past, present, and digitised future of Danish archiving. In G. Bak & M. Rostgaard (Eds.), The Nordic Model of Digital Archiving (pp. 23–41). Routledge. Star, S. L. (1990). Power, technology and the phenomenology of conventions: on being allergic to onions. The Sociological Review, 38(1_suppl), 26–56. The Danish National Archives (n.d.). Strategy 2025: The digital memory of Denmark. Available at: https://en.rigsarkivet.dk/wp-content/uploads/2024/02/The-Danish-National-Archive-Strategy-2025.pdf [Accessed: 01/07/2024]. Thylstrup, N. B. (2018). The Politics of Mass Digitization. Cambridge, MA: MIT Press. Thylstrup, N. B., Agostinho, D., Ring, A., D’Ignazio, C., & Veel, K. (2021). Big Data as Uncertain Archives. In Uncertain Archives: Critical Keywords for Big Data (pp. 1–27). Cambridge, MA: MIT Press. VanDerHorn, E., & Mahadevan, S. (2021). Digital Twin: Generalization, Characterization and Implementation. Decision Support Systems, 145. Velkova, J. (2024). Data Infrastructures and their Temporalities. In T. Venturini, A. Acker & J. Plantin (Eds.), Sage Handbook of Data and Society. Thousand Oaks, CA: SAGE Publications. Waterton, C. (2010). Experimenting with the Archive: STS-ers As Analysts and Co-constructors of Databases and Other Archival Forms. Science, Technology, & Human Values, 35(5), 645–676.

15:30-16:00Coffee Break
16:00-17:30 Session 7A: Web Archives as Data
16:00
Mining Digital Terror: A Case Study in Using September 11 Web Archives as Data

ABSTRACT. The September 11th attacks and their aftermath (the “9/11 attacks”) are among the most documented events in human history. Throughout September 2001 and beyond, the Internet Archive, the Library of Congress, and researchers from various fields came together to capture the unfolding events, creating digital repositories at (then) unprecedented scale. These repositories are held by those collecting institutions, as well as by George Mason University’s September 11 Digital Archive. Yet their potential has been largely unrealized in histories of 9/11, due to the scale of data, fragmented formats, and inconsistent metadata in these early web archives.

I am currently writing a monograph on a digital history of 9/11, which involves marrying several different types of born-digital historical content: web archives found in the Internet Archive’s Wayback Machine; 3,349 blogs that were manually uploaded to the September 11 Digital Archive; thousands of e-mails sent on list-servs (preserved via web archives or manually donated to sites); and pager messages and other digital media obtained through freedom of information requests. This project involves transforming fragmented, inconsistent archives into cohesive datasets for computational analysis.

To write this history, I am transforming this information – pager messages, e-mails, list-serv posts, blogs, and websites – into large databases of CSV files for future analysis. This process demands careful integration of machine-generated and user-generated metadata across formats that vary in structure, timezone, and language. For example, one list-serv was donated as a single, very long text file, while another has been manually scraped from Wayback Machine snapshots of a Yahoo! message group; the two nonetheless need to be integrated into a single dataset.
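A minimal sketch of this kind of integration is given below. It assumes two illustrative inputs (a donated plain-text dump whose messages are separated by "-----", and a CSV scraped from Wayback snapshots); the file names, delimiters, column names, and date formats are hypothetical, not the project's actual data.

    # Minimal sketch: merging two differently structured list-serv sources
    # into one CSV with normalized UTC timestamps. File names, delimiters,
    # and column names are illustrative assumptions.
    import csv
    from datetime import datetime, timezone
    from email.utils import parsedate_to_datetime

    records = []

    # Source 1: one long text file, one message per block separated by "-----".
    with open("listserv_dump.txt", encoding="utf-8", errors="replace") as f:
        for block in f.read().split("-----"):
            lines = [l for l in block.strip().splitlines() if l]
            if len(lines) < 3:
                continue
            sender, date_header, body = lines[0], lines[1], "\n".join(lines[2:])
            sent = parsedate_to_datetime(date_header)  # RFC 2822 date, any timezone
            records.append({"source": "donated_dump", "sender": sender,
                            "sent_utc": sent.astimezone(timezone.utc).isoformat(),
                            "text": body})

    # Source 2: rows scraped from Wayback Machine snapshots of a Yahoo! group,
    # already tabular but with a different date format.
    with open("yahoo_group_scrape.csv", encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            sent = datetime.strptime(row["posted"], "%m/%d/%Y %H:%M").replace(
                tzinfo=timezone.utc)  # assume UTC where the source states no zone
            records.append({"source": "wayback_scrape", "sender": row["author"],
                            "sent_utc": sent.isoformat(), "text": row["message"]})

    # Write the merged, uniformly structured dataset, sorted chronologically.
    with open("messages_merged.csv", "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["source", "sender", "sent_utc", "text"])
        writer.writeheader()
        writer.writerows(sorted(records, key=lambda r: r["sent_utc"]))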

The scale of these messages and sites is beyond human-readable comprehension. This is where computational techniques such as topic modeling, metadata analysis, and keyword searching become essential tools to surface patterns in the data while maintaining a focus on the underlying human stories. For much of my data, the goal is to create human-readable PDFs, contextualized within the larger corpus via distant reading.
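As a hedged illustration of the kind of distant reading meant here, the sketch below runs a basic topic model over the merged corpus with scikit-learn; the topic count and filtering thresholds are arbitrary, and this is not the project's actual pipeline.

    # Minimal topic-modeling sketch: surface recurring themes across the
    # merged message corpus produced above (illustrative parameters only).
    import csv
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    with open("messages_merged.csv", encoding="utf-8", newline="") as f:
        texts = [row["text"] for row in csv.DictReader(f)]

    vectorizer = CountVectorizer(max_df=0.9, min_df=5, stop_words="english")
    doc_term = vectorizer.fit_transform(texts)

    lda = LatentDirichletAllocation(n_components=20, random_state=0)
    lda.fit(doc_term)

    # Print the top words of each topic as a first, human-readable overview.
    vocab = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top = [vocab[j] for j in topic.argsort()[-10:][::-1]]
        print(f"topic {i:02d}: {', '.join(top)}")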

In my talk, I will advance three arguments. First, I will provide preliminary examples from my work, illustrating what a computationally driven history of 9/11 can add to the historiography. Second, I will show how much work a historian must do today to standardize metadata, which points to the need for robust metadata frameworks that can adapt to the evolving nature of digital content. For web archivists, this underscores the importance of establishing clear metadata standards at the outset of any event-based collecting initiative to enhance future accessibility and research potential. Finally, my presentation will explore the ethical considerations that arise when reprocessing and republishing publicly available digital materials. For example, I am combining several discrete born-digital collections into one comprehensive dataset for my own analysis — should I share such information? This dilemma reflects broader concerns about data ownership, privacy, and the ethics of reprocessing web archives as data.

16:20
Establishing which websites constituted a national web in the 1990s

ABSTRACT. When doing historical studies of the web one of the first and most fundamental tasks is to determine which web entities are within the scope of the study, be that web elements, web pages, websites, or web spheres. Depending on the concrete study the result is a list that identifies the web entities to be included, and with this list at hand researchers can try to retrieve the relevant web entities, e.g. in a web archive, and use them in their analyses. However, establishing such a list is not easy, because useful comprehensive sources and overviews are very often lacking.

This presentation investigates how a researcher can establish which websites constituted a national web in the 1990s, based on the ongoing research project 'Histories of the Danish web in the 1990s'. The aim is to develop and test a method to identify Danish websites of the 1990s. The method uses two overall approaches, each of which comes with different sub-approaches.

(1) Finding old ccTLD domain name lists. Obviously, the list of registered domain names under the ccTLD .dk is a strong candidate when trying to establish a list of website domains of the past. The following sub-approaches were used: (a) Contacting the existing ccTLD administrator: Punktum.dk, today's administrator, has not preserved old ccTLD lists; they were discarded in 2018 due to GDPR rules. (b) Contacting previous ccTLD administrators: The .dk ccTLD had several administrators in the 1990s, and just identifying these, let alone finding relevant staff to contact today, is a challenge; this step has not yet been fully explored. (c) Searching the websites of existing and previous ccTLD administrators in the Internet Archive: With a great deal of time (and luck), and with access to the archived Danish web of the 1990s through a SolrWayback interface with full-text search, this approach was a success, and complete ccTLD lists from October 1996 and January 1997 were found.

(2) Reconstructing website domain names based on other sources. ccTLD domain names are important, but they by no means constitute a complete list of existing domain names. In the Danish case, only companies, organisations, and the like were entitled to buy a domain name until early 1997, so all other web actors had their websites hosted on web hotels, with web addresses like 'inet.tele.dk/name'. This is one of the reasons why reconstructing website domain names is relevant, and here the following sub-approaches were used: (a) .dk websites in existing web archives: Based on the above-mentioned SolrWayback access, a list of the .dk domain names present in the web archive was created, including sub-domains of web hotels. (b) Outgoing links from .dk websites to .dk websites: A list of all outgoing links from all websites identified in step (2a) was created and filtered to keep only addresses of websites that had not been archived (called 'known unknowns'), resulting in a list of .dk websites that were linked to (but not archived) and may have existed in the past. (c) Directories/lists/web hotels: Various web directories, lists, and web hotels were identified and their listings of websites were used (due to scripting in the code, they were not in all cases identified in the two previous steps). (d) Other sources: Other sources were consulted, including print media (books, magazines), digital copies of newspapers, and usenet groups, and a number of websites were identified, in particular for the period before October 1996 when the Internet Archive started. Based on these different approaches, annual lists of Danish websites in the 1990s were established, as comprehensively as possible.
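A minimal sketch of the 'known unknowns' filtering in step (2b) follows. It assumes two hypothetical input files (archived .dk hosts from step (2a), and the outgoing link targets extracted from those sites); the project's actual exports will look different.

    # Minimal sketch of step (2b): keep outgoing .dk link targets whose sites
    # were never archived ("known unknowns"). Input file names and formats are
    # illustrative assumptions, not the project's actual data exports.
    from urllib.parse import urlparse

    def host(url: str) -> str:
        return urlparse(url if "://" in url else "http://" + url).netloc.lower()

    # Hosts of all archived .dk websites identified in step (2a), one per line.
    with open("archived_dk_hosts.txt", encoding="utf-8") as f:
        archived = {host(line.strip()) for line in f if line.strip()}

    # All outgoing link targets extracted from those archived sites, one URL per line.
    with open("outgoing_links.txt", encoding="utf-8") as f:
        targets = {host(line.strip()) for line in f if line.strip()}

    known_unknowns = sorted(h for h in targets - archived if h.endswith(".dk"))
    print(f"{len(known_unknowns)} linked-to but unarchived .dk hosts")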

As this brief overview indicates, establishing a complete picture of which websites constituted a national web in the past is not straightforward; consequently, claims to comprehensiveness for analyses based on this material may be weakened.

In the presentation, all of the points above will be explained and evaluated in detail, and their potential use in other cases will be discussed.

16:40
Datafication of Web Archives and the Periodization of Website History: A Case Study of the National Museum of Australia

ABSTRACT. Studying the history of museums on the web faces multiple challenges, including those related to the specificity of the website as an object of study (Brügger, 2009). The problem of the ephemeral character of the website is quite familiar to researchers of the live web and becomes even more complicated in relation to the archived web (van den Heuvel, 2010). Periodisation of websites’ development and the reconstruction of website versions for research are under ongoing discussion by scholars. There are several approaches to the periodisation of website evolution: 1) reference to technological changes in website construction and design (Allen, 2013; Helmond, 2013); 2) shifts in the content published on the websites (Chakraborty & Nanni, 2017); 3) generalisation of web development (Ben-David, 2019). The versioning of websites, the selection of portions of information to be taken into account in research, and the decomposition of preserved data into fragments also relate to periodisation. A version can be considered as a composition of the snapshots from a certain period. A year is often used as the unit for reconstructing a website, or a selection of separate years with gaps in between can be used to trace changes (Svarre & Skov, 2024). Of course, the approach depends on the research purposes and may vary.

The proposed paper suggests basing periodisation on an assessment of the resources preserved in web archives and available for research. Web archives have undergone significant changes from the early years of the Internet to today. It is not surprising that the quality and detail of web preservation have changed over time, directly affecting the amount and quality of data we have today for scholarly consideration. Identifying these periods is essential for reconstructing versions of the website, revealing shifts in content within each period and then addressing the changes, keeping in mind the volume of data.

The website of the National Museum of Australia (NMA) has been selected as a case study. This example is interesting from a comparative perspective on web archives, as the NMA website has been preserved by both initiatives - the Internet Archive and the Web Archive of Australia - since 1996. The Jupyter Notebooks from the GLAM Workbench have been utilised to obtain the data and test the hypothesis. The code from the Notebooks (Web Archives, 2024) has been modified to obtain and visualise data for research purposes. The resulting diagrams reveal the distribution of snapshots preserved in each of the web archives and identify specific periods of the website’s preservation. The gaps in the data have also been mapped. This approach supported the argument for selecting particular periods for studying the website's history and showed differences and similarities between the datasets in the two web archives. The rationale, processes, and results of the research will be outlined at the conference.
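As a rough indication of the kind of data work involved, the sketch below counts Wayback Machine captures of the NMA site per year via the Internet Archive's publicly documented CDX API. The GLAM Workbench notebooks used in the study do this in a more complete way for both web archives, so this is only an approximation of the method, and the target URL is given as an example.

    # Minimal sketch: count Wayback Machine captures of nma.gov.au per year
    # via the public CDX API, as a first view of preservation density and gaps.
    from collections import Counter
    import requests

    resp = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={"url": "nma.gov.au", "output": "json", "fl": "timestamp"},
        timeout=60,
    )
    rows = resp.json()[1:]  # the first row returned is the header

    per_year = Counter(ts[0][:4] for ts in rows)
    for year in sorted(per_year):
        print(year, per_year[year])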

References: 1. Allen, M. (2013). What was Web 2.0? Versions as the dominant mode of Internet history. New Media & Society, 15(2), 260–275. 2. van den Heuvel, C. (2010). Web Archiving in Research and Historical Global Collaboratories. In N. Brügger (Ed.), Web History (Digital Formations, ed. S. Jones) (pp. 279–303). Peter Lang International Academic Publishers. 3. Ben-David, A. (2019). National web histories at the fringe of the Web: Palestine, Kosovo, and the quest for online self-determination. In N. Brügger & D. Laursen (Eds.), The Historical Web and Digital Humanities: The Case of National Web Domains (pp. 89–109). Abingdon: Routledge. 4. Brügger, N. (2009). Website history and the website as an object of study. New Media & Society, 11(1–2), 115–132. 5. Chakraborty, A., & Nanni, F. (2017). The changing digital faces of science museums: diachronic analysis of museum websites. In N. Brügger (Ed.), Web 25: Histories from the First 25 Years of the World Wide Web. New York, NY: Peter Lang. 6. Helmond, A. (2013). The algorithmization of the hyperlink. Computational Culture, 3(3). 7. Svarre, T., & Skov, M. (2024). The online presence of the Danish public sector from 2010 to 2022: Generating an archived web corpus. In S. Gebeil & J.-C. Peyssard (Eds.), Exploring the Archived Web during a Highly Transformative Age: Proceedings of the 5th International RESAW Conference, Marseille, June 2023. Firenze University Press. 8. Web Archives. GLAM Workbench, accessed 19.09.2024, https://glam-workbench.net/web-archives

16:00-17:30 Session 7B: RSNs
16:00
To Monetize or not to Monetize: doubts, resistance and U-turns in early YouTubers communities

ABSTRACT. In this presentation, the aim is to understand the impact of monetization and what it meant for the ideals of participatory amateur culture in the early 21st century. The focus will be on the first ten years of YouTube (2005-2015), which started as a perfectly suitable infrastructure for amateurs who loved the opportunity to distribute self-made videos. Initially, YouTube had a strong base in a peer community-driven platform that believed in a truly democratic opportunity for even and fair distribution of user-generated content. Especially in its early stage, YouTube ‘thrived on enthusiasm of users as they ran and operated their new virtual spaces, which were often regarded as experiments in online citizenship and a reinvention of the rules for democratic governance’ (Burgess and Green 2009, 15). Scholars like Benkler (2006) believed a networked public sphere, characterised by non-market peer production, was possible. The early days of the web saw a level of enthusiasm for the promise of a so-called ‘pro-am’ revolution, famously summarised in the concept of ‘participatory culture’ by Henry Jenkins (2006).

However, this ideal of participation and democratic communities was questioned by many users after Google acquired YouTube in 2006. Although Google had initially promised to keep the community-based identity of the platform intact, the gradual introduction of regulations (e.g. the partnership programme in 2008) that enabled some form of monetisation ultimately changed the character of YouTube. YouTube developed into a platform with a far more complex interwovenness of not just what users do or hope for, or what technologies enable, but also the particular business models and specific rules of governance that underpin the whole system (cf. Van Dijck et al., 2018). Because of this, YouTube became the perfect example of a website that deploys automated technologies and business models to organise data streams as part of measuring economic interactions, while providing social exchanges between users of the Internet.

While the platform quickly commercialized, a discursive struggle ensued regarding what makes YouTube a place for amateurs. As one amateur voiced their discontent around 2015: ‘YouTube died when it stopped being a hobby and started becoming about the money.’ Some users pleaded for the return of the ‘old YouTube’: the once active community of alternative, unruly users generating a specific cultural form. This discussion was repeated in many other conversations elsewhere on YouTube, and it shows users' passionate, often quite divergent, positions about the identity of the ‘real’ YouTuber. With YouTube’s 20th anniversary, it is an excellent moment to revisit historical controversies and debates about the impact of the datafication of this part of the web.

16:20
Tumblr Purge: A Story Told Through Data

ABSTRACT. In November 2018, after being suspended from Apple’s App Store for hosting child pornography, Tumblr announced its decision to ban *all* NSFW (not safe/suitable for work) content with the aid of machine-learning classification. The decision to opt for strict terms of use governing nudity and sexual depiction was as fast as it was drastic, leading to the quick erasure of subcultural networks developed over a decade. My contribution maps out platform critiques of and on Tumblr through a combination of visual and digital methods. By analyzing 15,158 posts made between November 2018 (when Tumblr announced its new content policy) and August 2019 (when Verizon sold Tumblr to Automattic), it explores the key stakes and forms of user resistance to Tumblr's “porn ban”. The presentation reflects on the circulation of user-generated content in response to platform-driven censorship, with particular attention to practices of screenshotting and memeification. It further explores the changing relations of relevance in the controversy surrounding the deplatforming of Tumblr cultures.

16:40
The business of datafied identity: LiveRamp’s evolution in the audience economy

ABSTRACT. Data brokers are companies that collect, aggregate, and sell access to personal information about individuals and organisations within the “audience economy” (Helmond and Van der Vlist, 2023). These companies have become central actors in the digital economy by providing businesses with detailed consumer profiles that can be used for marketing, credit scoring, and other (often automated) decision-making processes. Notable data broker companies include Acxiom, Experian, Equifax, and LiveRamp, whose revenues reflect the growing importance of data as a commodity.

This paper contributes to the growing body of research on the role of data brokers in the digital data economy (Christl and Spiekermann, 2016; Crain, 2018, 2021; Elmer, 2004; McGowan et al., 2024; McGuigan, 2023; Reviglio, 2022; Van der Vlist and Helmond, 2021; Zook and Spangler, 2023) by offering a historical perspective on one of its central players: LiveRamp, which serves as a microcosm for understanding political economy and power in the broader data brokerage landscape (Van der Vlist and Helmond, 2021). Applying a methodological framework that utilises the Internet Archive Wayback Machine for examining platform evolution (Helmond et al., 2019; Helmond and Van der Vlist, 2019), we trace LiveRamp’s platform evolution across three dimensions: its discursive positioning, data and product offerings, and partner ecosystem. Through this analysis, we reveal how LiveRamp has navigated and influenced the shifting dynamics of the digital data economy in its favour.

First, we examine LiveRamp's discursive positioning by analysing changes in its taglines, “about” pages, and other self-representative content. This analysis reveals how the company has rebranded itself over time—from an “onboarder” of data to an “identity provider.” This shift shows its ambition to build critical “identity infrastructure” within the data industry. Furthermore, LiveRamp increasingly markets itself as an “open” and “interoperable platform for data collaboration,” positioning itself as a neutral or independent actor in contrast to the closed data silos of major social media platforms.

Second, we explore the evolution of LiveRamp’s data and product offerings. Over time, the company has expanded its services from basic data “onboarding” to facilitating complex data “connectivity” and “collaboration” across platforms. This includes the aggregation and marketisation of first-party data, an area of growing significance as privacy regulations like the GDPR in Europe and the California Consumer Privacy Act (CCPA) in the United States impose stricter rules on how companies collect and share personal data. The emergence of data “marketplaces”, “clean rooms” and the increasing role of “first-party” and “synthetic” data are key developments we observe in LiveRamp’s trajectory, reflecting a broader industry shift towards privacy-preserving data practices.

Third, we examine LiveRamp’s partner ecosystem, which is critical to its expansion and integration within the broader data economy. By forming strategic partnerships with major technology platforms and with service, media, and data providers, LiveRamp has established itself as a central connector and infrastructural gateway within the audience economy. These partnerships also reveal how the larger data economy has evolved in response to technological innovations, shifting market conditions, and regulatory frameworks.

While many of their operations remain opaque, examining the evolution of a data broker on these three levels helps us to better understand: (1) how discursive self-positioning reveals the power dynamics and ideological underpinnings in LiveRamp’s portrayal of its role within the data economy; (2) how the company’s corporate messaging and product offerings co-evolve in response to regulatory pressures, positioning it as an ethical “leader” or “innovator” in the field; and (3) the growing scale and complexity of the data economy, as evidenced by the company’s partnerships and integrations with other players in the industry. These elements bring together LiveRamp’s evolution as a data broker and highlight the key role of partnerships in creating and expanding its infrastructural (platform) power within the data economy. Furthermore, this analysis underscores the company’s evolving influence in the advertising industry, particularly as privacy regulations shifted in favour of LiveRamp’s position.

16:00-17:30 Session 7C: Social Media and APIs
16:00
Robots.txt and A History of Consent for Web Data Capture

ABSTRACT. Web archives are increasingly positioned as ideal data repositories for building generative artificial intelligence (AI) and training datasets for Machine Learning (ML) (van Strien, 2023; Alam, 2023; Cargnelutti et al. 2024; Deckers & Potthast, 2022). In recent years, the alignments and overlaps between web archives and training data have become more widely discussed and addressed, as web archives like Common Crawl have been used as the basis for model training data like the Colossal Clean Crawled Corpus (Dodge et al., 2021; Baack, 2024). At the same time, pushback against the development of generative AI by users and companies alike has taken the form of restricting access to open data on the web. A recent study by the Data Provenance Initiative, an MIT-led research group, discovered an “emerging crisis in consent,” affecting the free collection of AI training data sets from the open web (Roose, 2024).

In this paper we take a historical, socio-technical, and critical approach to understanding the ongoing relationships between data, web archives, and AI. We frame our discussion by examining the 30-year-old Robots Exclusion Protocol (REP, or robots.txt) as a means for controlling crawler behavior, and its ongoing role as an infrastructural component of the web that inscribes concepts of consent. Past work on the protocol, which was first proposed by Koster (1994), highlights its role in censorship and its impact in excluding materials from web archives (Elmer, 2009; Ogden, 2020).
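As a brief illustration of how the protocol operationalizes crawler 'consent' in practice, the following minimal check uses Python's standard-library robots.txt parser; the target site and user-agent strings are illustrative and not drawn from the paper.

    # Minimal sketch: how a well-behaved crawler consults robots.txt before
    # fetching a page. The site and user-agent names are illustrative only.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.org/robots.txt")
    rp.read()  # fetch and parse the site's Robots Exclusion Protocol file

    for agent in ("ia_archiver", "GPTBot", "*"):
        allowed = rp.can_fetch(agent, "https://example.org/private/page.html")
        print(f"{agent}: {'may fetch' if allowed else 'is asked not to fetch'}")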

In our approach, we trace the use of REP in the parallel genealogies of web archiving and the development of AI and machine learning technologies. Using a historical analysis of early web mailing lists (Hocquet & Wieber, 2018), we demonstrate how both histories exhibit three key moves that allow robots.txt to be reinterpreted and repurposed over time to justify collecting decisions and to represent ethical data decision-making. We argue that 1) the use of robots.txt for indexing and retrieval has been extended to technologies for capture and extraction, 2) its definitions of bot behavior and ‘politeness’ have been deployed as a de facto ethical framework for all web data collection, and 3) prompted by the recognition of data’s value and ownership, these ethical rules are now extended to determine the legal implications of robots.txt.

We focus our discussion on one aspect of data collection: how automated tools conceptualize and operationalize consent. As the generative turn in AI makes clear, web archives should not only be understood as historic collections, but also sites of future-oriented knowledge regimes. As such, understanding web archives’ orientation to consent from data creators or data subjects has long-ranging effects on archived web data’s future uses. We conclude by considering how critical data studies and critical archival studies can contribute perspectives beyond the technical solutions of REP, and address the context-dependent nature of consent and mediating access to information. We reflect upon the impacts for large-scale data collection and analysis and the future of both web archives and AI.

References Alam, S. (2023). IACopilot [Python]. Internet Archive. https://github.com/internetarchive/iacopilot Baack, S. (2024). Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI. https://foundation.mozilla.org/en/research/library/generative-ai-training-data/common-crawl/ Cargnelutti, M., Mukk, K., & Stanton, C. (2024, February 12). WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI. Library Innovation Lab. https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring-web-archives-with-ai/ Deckers, N., & Potthast, M. (2022). WARC-DL: Scalable Web Archive Processing for Deep Learning. https://arxiv.org/abs/2209.12299v1 Dodge, J., Sap, M., Marasović, A., Agnew, W., Ilharco, G., Groeneveld, D., Mitchell, M., & Gardner, M. (2021). Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus (arXiv:2104.08758). arXiv. https://doi.org/10.48550/arXiv.2104.08758 Elmer, G. (2009). Robots.txt: The Politics of Search Engine Exclusion. In J. Parikka & T. D. Sampson, The Spam Book: On viruses, porn, and other anomalies from the dark side of digital culture (pp. 217–227). Hampton Press. Koster, M. (1994). A Standard for Robot Exclusion. The Web Robots Pages. http://www.robotstxt.org/orig.html Ogden, J. R. (2020). Saving the Web: Facets of Web Archiving in Everyday Practice [Phd, University of Southampton]. https://eprints.soton.ac.uk/447624/ Roose, K. (2024, July 19). The Data That Powers A.I. Is Disappearing Fast. https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html van Strien, D. (2023, May 10). Getting Started with Machine Learning and GLAM (Galleries, Libraries, Archives, Museums) Collections | Internet Archive Blogs. https://blog.archive.org/2023/05/10/getting-started-with-machine-learning-and-glam-galleries-libraries-archives-museums-collections/ Hocquet, A., & Wieber, F. (2018). Mailing list archives as useful primary sources for historians: Looking for flame wars. Internet Histories, 2(1–2), 38–54. https://doi.org/10.1080/24701475.2018.1456741

16:20
On Reciprocity - Algorithmic Interweavings between PageRank and Social Media

ABSTRACT. This paper argues that, from a historical point of view, Google's PageRank algorithm plays a crucial role for the datafied infrastructures of contemporary social media. In the first part of the talk I will show that the social principle of reciprocity, which is always to be regarded as two-sided and is essential for the production of sociality, also plays a central role at the technological implementation level of PageRank. To demonstrate this, Moreno's sociometry (Moreno 1934), which is a central reference point in the PageRank patents (Page 2001, Page 2004) and essential for its functioning, is brought into dialogue with the gift theory of Marcel Mauss (1990). Both approaches, Moreno's and Mauss's, assume that the smallest social unit is the dyad and that society only comes into being through a third element (e.g. a gift). It will be argued that PageRank, based on the web architecture with hyperlinks (as third elements), makes use of precisely this central social principle of two-way reciprocity and institutionalizes it technically. This marks at the same time an expansion of ‘intersubjective spacetime’ (Munn 1986), wherein something like reputation can arise in the first place. And it is precisely this principle of ‘networked prestige’ (Halavais 2008) that underlies both Google's PageRank and the datafied infrastructures of today's social media platforms.

Against this background, the second part of the talk focuses on the growing blogosphere at the beginning of the 2000s and its interweaving with PageRank. Of particular importance here are the trackback and pingback functions within blog journals, which are based on the principle of two-sided reciprocity, as they automate the (social) linking practices of bloggers among themselves. The blog search engine Technorati took advantage of these practices from 2002 on, similar to the principle of PageRank, and rated blogs based on the reputation of the links pointing to each blog (using trackbacks and pingbacks), thus assigning them a ‘networked prestige’. In other words, it can be observed in concrete terms how the dyadic principle (as the smallest social unit) is technically institutionalised and an algorithmically organized hierarchy emerges, which also becomes the basis of the feeds of social media platforms.

Finally, it will be shown on a theoretical level that the introduction of PageRank marks a crucial distinction between a direct and a generalised form of reciprocity (Stegbauer 2011) that is inscribed in social media platforms. This manifests itself in the principles of ‘befriending’ (as a direct form of reciprocity) and ‘following’ (as a generalised form of reciprocity) in the user interfaces, as well as in hybrid forms of both reciprocities, which arise for instance from privacy settings (a private Instagram or Twitter/X account). In the context of the datafied web, reciprocity is a precondition for the platformization (Helmond 2015) of social media.
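To make the algorithmic side of this argument concrete, here is a minimal power-iteration sketch of PageRank over a toy link graph; it is illustrative only and not material from the paper.

    # Minimal PageRank sketch (power iteration over a toy link graph).
    # Each hyperlink acts as the "third element" through which reputation,
    # or 'networked prestige', circulates between pages.
    links = {            # page -> pages it links to (toy example)
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }
    pages = list(links)
    d = 0.85                                   # damping factor from the PageRank patents
    rank = {p: 1 / len(pages) for p in pages}

    for _ in range(50):
        new = {p: (1 - d) / len(pages) for p in pages}
        for p, outgoing in links.items():
            share = rank[p] / len(outgoing)    # a page passes its rank to the pages it links to
            for q in outgoing:
                new[q] += d * share
        rank = new

    for p, r in sorted(rank.items(), key=lambda kv: -kv[1]):
        print(p, round(r, 3))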

References

Helmond, A (2015): The Platformization of the Web: Making Web Data Platform Ready. Social Media + Society 1(2). Mauss, M (1990): The Gift: The form and reason for exchange in archaic societies. London: Routledge. Moreno, J. L. (1934) Who shall survive? A New Approach to the Problem of Human Interrelations. Washington: Nervous and Mental Disease Publishing. Munn, N. (1986) The fame of Gawa. A symbolic study of value transformation in a Massim (Papua New Guinea) society. Cambridge: University Press Page, L. (2001) Method for Node Ranking in a Linked Database, US Patent 6285999B1. Page, L. (2004) Method for Scoring Documents in a Linked Database, US Patent 6799176B1. Stegbauer, C. (2011) Reziprozität. Einführung in soziale Formen der Gegenseitigkeit. Wiesbaden: Springer.

16:40
APIs. How their role in the history of computing and their software engineering principles shape the modern datafied web.

ABSTRACT. This paper takes a media and software studies approach to the discussion of APIs (Application Programming Interfaces) and shows how their genealogy and their main software engineering principles (e.g. separation of concerns, information hiding, reuse) resurface as wide-ranging social and political implications in modern-day web APIs, leading to discussions about data-sharing and data-hiding, access, power, platformization, innovation and collaboration, as well as interdependence, the commodification of the web, and legal considerations.

But let's start at the beginning. APIs are as old as the history of digital computing itself. Long before modern web APIs became one of the defining components of the datafied web, certain computer programming routines (not yet known under the term API) powered the rise of computing and software design from as early as the 1940s. At a time when computers consisted mainly of big, wired hardware and machine code, Herman Goldstine and John von Neumann, in their 1949 paper "Planning and Coding of Problems for an Electronic Computing Instrument", already saw the need for shared computing components, that is, for library subroutines. These were meant to be libraries for tasks that must be computed all the time, like mathematical equations or input-output communication.

But it wasn't until the end of the 1960s that the term Application Programming Interface was coined, designating a well-defined interface that allows one software component to programmatically access another component. This definition can still be applied to contemporary web APIs. Web APIs started with e-commerce sites like eBay and Amazon, were followed by social media platforms like Facebook and X, and were built into mobile applications like Google Maps and Instagram. Countless smaller APIs complete the current web landscape.

Within the data-driven web, the core software design principles of APIs are not only a technical necessity but also actors with an extensive social and political scope. For example, accessing one software component through an API means the following: there is a well-defined access point through which you can interact with certain components, while all other components are hidden from outside access. This is called information hiding, and it is one of the core software design principles of APIs. This principle is beneficial for reducing complexity, for decreasing dependency between programs, and for protecting components from misuse. But at the same time, it can hinder access and entrench asymmetric power relations between the creators and the users of an API.
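A schematic illustration of this principle, not modelled on any particular platform's API: a component exposes one well-defined access point while its internals remain hidden from callers.

    # Schematic illustration of information hiding behind an API:
    # callers see only get_profile(); storage layout, logging, and the
    # shape of internal records can change without callers noticing.
    class ProfileAPI:
        def __init__(self):
            self._store = {}        # internal representation, hidden from callers
            self._access_log = []   # internal bookkeeping, also invisible outside

        def get_profile(self, user_id: str) -> dict:
            """The single, well-defined access point."""
            self._access_log.append(user_id)
            record = self._store.get(user_id, {})
            # Only a curated view is exposed; everything else stays inside.
            return {"id": user_id, "name": record.get("name", "unknown")}

    api = ProfileAPI()
    print(api.get_profile("u42"))   # callers interact only through the interface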

APIs are full of inherent ambiguities, and it is exactly at this intersection that this paper is placed: showing their historicity and their powerful innovative implications on the one hand, and their immanent, unstable power negotiations on the other. It is an ever-changing dance around openness and closedness, around creation and hindrance, for everything and everybody interacting with an API – be it users, developers, algorithmic agents, or hardware components.

17:30-18:00Coffee Break
18:00-19:30 Session 8: KEYNOTE by Nanna Bonde Thylstrup (chair: Sebastian Gießmann)
KEYNOTE: Vanishing points: technographies of data loss

ABSTRACT. What happens to data when it vanishes? How do digital remains persist even as information seemingly disappears? What can the politics of disappearance tell us about power in datafied worlds?

Disappearance has become a crucial yet understudied force shaping digital experiences and infrastructures. This keynote develops a technographic approach to examine how data loss and digital remains create complex patterns of presence and absence that defy simple narratives of erasure. Through cases ranging from platform architectures to digital archives, it traces how power operates through sophisticated mechanisms of appearing and vanishing, leaving traces that persist in unexpected ways.

By mapping these dynamics of disappearance, the exploration uncovers how our data landscapes are shaped not by mere accumulation, but through intricate processes of loss, persistence, and transformation. The keynote explores how digital societies negotiate memory and forgetting, positioning disappearance itself as a crucial digital experience.

The talk develops theoretical tools for analyzing these dynamics while remaining grounded in concrete technological practices and their political implications. Through this, it compels us to rethink fundamental assumptions about presence, absence, and the complex temporalities and materialities of digital culture.

19:30-22:00Dinner Reception (Unteres Schloss Foyer, US-C 150)