IC2S2-2021: 7TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SOCIAL SCIENCE
PROGRAM FOR SATURDAY, JULY 31ST

09:00-10:30 Session S1-A: Societal Challenges
Location: Track A
09:00
Countering Filter Bubble with a “Bait-and-Switch” News Recommender System

ABSTRACT. We propose to counter the filter bubble and nudge users to read more diverse news articles by recommending articles that are similar in headlines but dissimilar and diverse in content. The degree of dissimilarity is calibrated for each user through an explore-exploit approach.
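
The abstract does not say which explore-exploit strategy calibrates each user's dissimilarity level. A minimal sketch follows. It assumes an epsilon-greedy bandit whose arms are a few discrete dissimilarity levels and whose reward is a click (1) or no click (0). The class name, the levels, and the click rates are all hypothetical:

```python
import random


class DissimilarityBandit:
    """Epsilon-greedy bandit over discrete (hypothetical) dissimilarity levels."""

    def __init__(self, levels, epsilon=0.1, seed=None):
        self.levels = list(levels)
        self.epsilon = epsilon
        self.counts = {lvl: 0 for lvl in self.levels}
        self.values = {lvl: 0.0 for lvl in self.levels}  # running mean reward per level
        self.rng = random.Random(seed)

    def choose(self):
        # Explore a random level with probability epsilon; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.levels)
        return max(self.levels, key=lambda lvl: self.values[lvl])

    def update(self, level, reward):
        # Incremental mean: v <- v + (r - v) / n
        self.counts[level] += 1
        self.values[level] += (reward - self.values[level]) / self.counts[level]


if __name__ == "__main__":
    # Hypothetical user who clicks most often at moderate dissimilarity.
    click_prob = {0.2: 0.30, 0.4: 0.60, 0.6: 0.25, 0.8: 0.10}
    bandit = DissimilarityBandit(click_prob, epsilon=0.1, seed=0)
    env = random.Random(1)
    for _ in range(5000):
        level = bandit.choose()
        bandit.update(level, 1.0 if env.random() < click_prob[level] else 0.0)
    print("preferred level:", max(bandit.values, key=bandit.values.get))
```

Epsilon-greedy is only one option. UCB or Thompson sampling would fit the same choose/update interface.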

09:15
Finding Coordinated Amplification on Twitter with dSNA

ABSTRACT. Astroturfing and other mis/disinformation campaigns cause real-world harm (e.g., vaccine hesitancy and political violence). These coordinated amplification activities may appear as anomalous levels of coincidental behaviour. We present a temporal window-based detection pipeline and evaluate it on several political Twitter datasets, which yields new insights and validation techniques.
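
The following sketch illustrates temporal window-based detection in general. It is not the dSNA pipeline itself, whose details the abstract does not give. It links two accounts whenever they share the same item within `window` seconds of each other, and it keeps only the pairs that co-occur on at least `min_count` distinct items. The event format and both thresholds are assumptions:

```python
from collections import defaultdict
from itertools import combinations


def coordination_edges(events, window, min_count=2):
    """events: iterable of (user, item, timestamp_seconds) tuples.

    Returns {(u, v): count} for user pairs that posted the same item within
    `window` seconds of each other on at least `min_count` distinct items.
    """
    by_item = defaultdict(list)
    for user, item, ts in events:
        by_item[item].append((ts, user))
    counts = defaultdict(int)
    for posts in by_item.values():
        posts.sort()
        pairs = set()
        for (t1, u1), (t2, u2) in combinations(posts, 2):
            if u1 != u2 and abs(t2 - t1) <= window:
                pairs.add(tuple(sorted((u1, u2))))
        for pair in pairs:  # count each pair at most once per item
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    demo = [("a", "x", 0), ("b", "x", 10), ("c", "x", 500),
            ("a", "y", 100), ("b", "y", 105), ("c", "y", 1000)]
    print(coordination_edges(demo, window=60))
```

The resulting pairs form a coordination network. Its connected components or dense subgraphs are candidate amplification groups that still need validation.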

09:30
Twitter Bot Networks and Activity Patterns during the COVID-19 Infodemic

ABSTRACT. We show the segregated topology of retweet networks during the COVID-19 infodemic. By analyzing bot scores, in-degrees, and commonly retweeted items, we found that although human users may have a larger baseline influence on information diffusion than bots, the effects of bots are non-negligible in an infodemic situation.

09:45
Digital traces reveal the gender gap of flexible workers: women work less in the evenings

ABSTRACT. We examine the gender gap in the workload of flexible workers using data from 6.4 million logs of online English lessons, and show that women work less than men, especially at ages 30-35, and that this difference is mostly explained by women's lower evening workload.

09:00-10:30 Session S1-B: Computational Methods and Applications
Location: Track B
09:00
Tiplines to Combat Misinformation on Encrypted Platforms: A Case Study of the 2019 Indian Election on WhatsApp

ABSTRACT. We compare crowd-sourced ‘tips’ sent to a WhatsApp tipline with large, public groups on WhatsApp. Our analysis shows that tiplines cover the most popular content well, and a majority of such content is often shared to the tipline before appearing in large, public WhatsApp groups.

09:15
How Ideas Spread? Analyzing Growth and Diffusion of Topics in Event-based Social Network

ABSTRACT. We present a new computational methodology for the quantitative analysis of cultural and intellectual globalization and demonstrate its usefulness in a practical study that uses data about 2.6 million Meetup events organized in 146 countries during a sixteen-year period (2003-2019).

09:30
Developing and applying sequence-based similarity methods for dynamic audiovisual stimuli

ABSTRACT. When computing similarity for time-varying data or stimuli, there is sometimes a need for methods that abstract away from a time-locked, continuous representation. Here, we describe an approach combining HMMs, string-based distance metrics, and DBSCAN clustering, which successfully distills 19,000+ stimuli down to 9 that are maximally representative in an abstract sense.
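
The following minimal sketch shows the string-distance-plus-DBSCAN step. It assumes each stimulus has already been decoded into a sequence of HMM states, for example a Viterbi path written as a string. Levenshtein distance is an assumed choice of string metric, and the DBSCAN is a small from-scratch version rather than any library's implementation:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (e.g. HMM state strings)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def dbscan(items, dist, eps, min_pts):
    """Minimal DBSCAN over an arbitrary distance function; -1 marks noise."""
    n = len(items)
    labels = [None] * n
    # Neighbourhoods include the point itself (distance 0).
    neighbours = [[j for j in range(n) if dist(items[i], items[j]) <= eps] for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: expand further
                queue.extend(neighbours[j])
    return labels


if __name__ == "__main__":
    seqs = ["AABBA", "AABBB", "AABBA", "CCCDD", "CCCDC", "XYZXY"]  # hypothetical state strings
    print(dbscan(seqs, levenshtein, eps=1, min_pts=2))
```

One representative per cluster (for instance, the medoid) would then stand in for the whole cluster.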

09:45
The Shape of Design History: Exploring Evolution of Sneakers Design at Scale Using Neural Embedding

ABSTRACT. We crawled sneaker images and metadata online and constructed a deep-learning-based sneaker design embedding model using three attributes: shape, color, and segment. Our work can be linked with cultural analytics to discover the patterns of "latent" features that drive long-term cultural evolution in areas such as fashion.

09:00-10:30 Session S1-C: Social Media (Twitter)
Location: Track C
09:00
Estimating Individual Socioeconomic Status of Twitter Users

ABSTRACT. This paper presents a method to estimate the socioeconomic status of individual Twitter users from the commercial and entertainment brands they are following.

09:15
Evaluating Twitter Data Collection Methodologies for Political Science Research

ABSTRACT. Researchers often collect Twitter datasets through keyword-based search. However, prior literature demonstrates that this approach is potentially biased and its resulting datasets unrepresentative. Scholars have since proposed various alternative approaches, yet few comprehensive comparisons have been done to rank these diverse methodologies. Our paper aims to fill this gap.

09:30
What drives anti-immigrant sentiments online? A novel approach using Twitter

ABSTRACT. We study anti-immigrant attitudes of 28,000 Twitter users over a one-year period. People tweet more negatively about immigrants in periods following high media salience of immigration, but less negatively if they live in areas with more immigrants or follow a more ethnically diverse group of people on Twitter.

09:45
Preaching to Social Media: Effects of Turkey's Friday Khutbas on Twitter

ABSTRACT. Using LDA, I analyse the content of all Friday sermons read to millions in Turkey’s mosques between 2015 and 2021. I link sermons with tweets to analyse whether sermons affect public opinion on six issues: business, family, nationalism, trust, patience, and health. I document strong effects of sermons on tweets.

09:00-10:30 Session S1-D: Causal Inference and Applications
Location: Track D
09:00
Perverse Downstream Consequences of Debunking

ABSTRACT. We investigate the consequences of corrections for users’ subsequent sharing. We identified N=2,000 users who shared false content on Twitter and replied to their tweets with links to fact-checking websites. We find causal evidence that the correction decreases the quality, and increases the slant and language toxicity, of the users’ subsequent retweets.

09:15
Shared partisanship dramatically increases social tie formation in a Twitter field experiment

ABSTRACT. We test the causal effect of shared partisanship on the formation of social ties in a field experiment on Twitter. We find that users were roughly three times more likely to follow back bots whose partisanship matched their own, and that this held regardless of the bot’s strength of identification.

09:30
The Indirect Influence of Bots on Humans through Recommendation Algorithms

ABSTRACT. We show that bots can affect opinions even when all direct connections between the bot and human agents are removed: influence then happens via recommender systems. The presence of the bot is sufficient to shift the average population opinion.

09:45
Formation of Social Ties Influences Food Choice: A Campus-Wide Longitudinal Study

ABSTRACT. Nutrition is a key determinant of long-term health, and social influence has long been theorized to be a key determinant of nutrition. To identify this influence, we leverage a novel source of data: logs of 38 million food purchases made on a major university campus, linked to anonymized individuals’ smartcards.

09:00-10:30 Session S1-E: Statistical Methods and Applications
Location: Track E
09:00
The emergence of hierarchy in spatial diffusion over the life-cycle of innovations

ABSTRACT. Using a model capturing distance-decay, urban scaling, and hierarchical difference, we show that hierarchical diffusion has an increasing role over the life-cycle in the spatial adoption of an online social network.

09:15
Harder, better, faster, stronger cascades -- or simply larger?

ABSTRACT. By studying the structure of online diffusion cascades, one hopes to understand how different content spreads. We show the importance of the joint distribution of statistical properties of such cascades in any analysis, both through an empirical analysis of false/true news cascades and through the analysis of a motivating model.
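
The abstract does not list which cascade properties are analyzed. As an assumed example, the sketch computes size, depth, and structural virality (the mean shortest-path distance over all node pairs of the cascade tree) from a hypothetical parent map with exactly one root. Two cascades of equal size can differ sharply in virality, which is one reason to examine such properties jointly rather than one at a time:

```python
from collections import defaultdict, deque


def cascade_stats(parent):
    """parent: {node: parent_node, or None for the single root}.

    Returns (size, depth, structural_virality).
    """
    adj = defaultdict(list)
    root = None
    for child, par in parent.items():
        if par is None:
            root = child
        else:
            adj[child].append(par)
            adj[par].append(child)

    def bfs(src):
        # Shortest-path distances from src in the (undirected) cascade tree.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    n = len(parent)
    depth = max(bfs(root).values())
    if n < 2:
        return n, depth, 0.0
    total = sum(sum(bfs(u).values()) for u in parent)  # each pair counted twice
    return n, depth, total / (n * (n - 1))


if __name__ == "__main__":
    star = {"r": None, "a": "r", "b": "r", "c": "r"}   # broadcast-like
    chain = {"r": None, "a": "r", "b": "a", "c": "b"}  # person-to-person
    print(cascade_stats(star))   # (4, 1, 1.5)
    print(cascade_stats(chain))  # (4, 3, 1.6666666666666667)
```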

09:30
Data Science of Judicial Decisions for Evidence-Based Housing policies in Spain

ABSTRACT. We present a methodology for studying jurisprudence by scrutinizing data from judicial decisions so as to reveal systemic patterns, pillar decisions, and the consequences of important legislative changes, and thus to evaluate the mechanisms that fail along the judicial process.

09:00-10:30 Session S1-F: Simulations
Location: Track F
09:00
Impact of Herd Behavior and Visual Field of Agents on Crowd Evacuation Decisions

ABSTRACT. Analysis of a video clip captured during the Great East Japan Earthquake showed a unique evacuation behavior, which was reproduced by our evacuation decision model; analysis of the simulations revealed that the agents' visual field narrows to 20 degrees.

09:15
Measuring Selective Exposure: A Systematic Comparison of the Application of Community Detection Algorithms in Theoretical and Empirical Co-exposure Networks

ABSTRACT. This study develops a formal model of audience behavior and compares the performances of 8 community detection algorithms at appraising selective exposure in media consumption patterns. It replicates the qualitative findings on an empirical network, demonstrating how formal models can be used to inform analytical decisions in computational social science.

09:30
Power-law intensity distributions in linear and nonlinear self-excited Hawkes processes

ABSTRACT. Hawkes processes are popular stochastic models describing bursty and self-exciting phenomena in social and financial systems. In this talk, we present our theoretical solutions to linear and nonlinear Hawkes processes. Finally, we find robust power-law scaling laws in the intensity distribution, which will be useful for calibration in data analyses.
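
For readers new to the model, the sketch below simulates a linear Hawkes process with Ogata's thinning algorithm. The exponential kernel is an assumption, since the abstract does not name one, and the sketch does not reproduce the talk's theoretical solutions or its power-law results. The intensity is lambda(t) = mu + sum over past events t_i of alpha * beta * exp(-beta * (t - t_i)). With this normalisation, alpha is the branching ratio:

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def simulate(mu, alpha, beta, horizon, seed=None):
    """Ogata thinning for the exponential-kernel Hawkes process above.

    The kernel integrates to alpha, so alpha < 1 is needed for a stationary
    process, whose mean event rate is then mu / (1 - alpha).
    """
    rng = random.Random(seed)
    events, t = [], 0.0
    excite = 0.0  # current excitation: sum of alpha*beta*exp(-beta*(t - t_i))
    while True:
        # Between events the intensity only decays, so its current value is an upper bound.
        lam_bar = mu + excite
        dt = rng.expovariate(lam_bar)
        t += dt
        if t > horizon:
            return events
        excite *= math.exp(-beta * dt)  # decay the excitation to the candidate time
        if rng.random() * lam_bar <= mu + excite:  # accept with probability lambda(t) / lam_bar
            events.append(t)
            excite += alpha * beta  # jump contributed by the accepted event


if __name__ == "__main__":
    ev = simulate(mu=1.0, alpha=0.5, beta=2.0, horizon=1000.0, seed=42)
    print(len(ev), "events; stationary expectation is about", 1.0 / (1 - 0.5) * 1000)
```

Tracking the excitation recursively keeps each step O(1). The `intensity` helper recomputes the full sum and is handy for checking a trajectory.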

09:45
Evolution of Kinship Structures Driven by Marriage Tie and Competition

ABSTRACT. Kinship structures govern social relationships in indigenous societies. We modeled such societies as comprising families whose traits and mating preferences determine relationships of marriage, cooperation, and competition. In numerical simulations, families formed kinship structures. The environmental dependence of the emergent structures is empirically verified by analyzing the Standard Cross-Cultural Sample.

10:30-12:00 Session S2-A: Societal Challenges (Ethics)
Location: Track A
10:30
Demographic Bias in Named Entity Recognition

ABSTRACT. We assess bias in various Named Entity Recognition (NER) systems across different demographic groups using synthetically generated corpora. Our analysis reveals that models perform better at identifying names from specific demographic groups. We observe that character-based contextualized word representation models result in the least bias across demographics. Paper: http://arxiv.org/abs/2008.03415

10:45
Who Gets What, According to Whom? An Analysis of Fairness Perceptions in Service Allocation

ABSTRACT. Our work presents the results of a multi-factor conjoint analysis study about fairness perceptions, specifically quantifying the effects of the context in which a question is asked, the framing of the question, and who is answering it.

11:00
Subfields, gender, and prestige in computer science faculty hiring

ABSTRACT. Using a comprehensive database of training, employment, and publication records for tenure-track faculty at U.S. PhD-granting computer science departments, we investigate the role of computer science subfields in shaping field-level gender diversity, finding subfields vary widely in gender composition and are unevenly distributed across the prestige hierarchy of academic institutions.

11:15
The corruptive force of AI-generated advice

ABSTRACT. We present a behavioral experiment (N=1,572) testing whether AI-generated advice corrupts and whether transparency about AI presence mitigates its influence. Using the NLP algorithm GPT-2, we generated honesty- and dishonesty-promoting advice that participants read before engaging in a cheating task. Results indicate that AI-generated advice corrupts people, even when they know the source of the advice.

10:30-12:00 Session S2-B: Computational Methods and Applications
Location: Track B
10:30
Claim Matching Beyond English to Scale Global Fact-Checking

ABSTRACT. We define claim matching as identifying pairs of messages that can be served with one fact-check. We construct a novel dataset of pairs of WhatsApp messages and pairs of fact-checks annotated for claim matching, covering English, Hindi, Bengali, Malayalam, and Tamil. We train an embedding model that achieves state-of-the-art performance.

10:45
Calibration of Google Trends Time Series with Google Trends Anchor Bank

ABSTRACT. Google Trends is a Swiss army knife for data scientists, but users have been hampered by its imprecision, stemming from the fact that results are scaled and rounded. We sharpen the knife with a simple and efficient calibration method, and are looking forward to introducing it to the IC2S2 community.

11:00
A Large Scale Study of Reader Interactions with Images on Wikipedia

ABSTRACT. In this work, we quantify and characterize readers’ interest in images when browsing Wikipedia by running a large scale analysis of users' traffic data about interactions with images and pages on the encyclopedia.

11:15
Novelty and Cultural Change in Modern Popular Music

ABSTRACT. Understanding how cultural changes occur provides insight into how creative ideas develop and spread. Using Music Information Retrieval (MIR) data, which provides quantitative information from audio signals, we apply computational analysis to explore how the introduction of novel ideas and attributes fuels ongoing creative evolution in modern popular music.

10:30-12:00 Session S2-C: Social Media (Other)
Location: Track C
10:30
"Imagine All the People": Characterizing Social Music Sharing on Reddit

ABSTRACT. In this paper, we develop a new methodology for understanding the large-scale social contexts of online music sharing, and apply it to analyze all instances of music sharing on Reddit.

10:45
Does Platform Migration Compromise Content Moderation? Evidence from r/The_Donald and r/Incels

ABSTRACT. Two popular communities on Reddit, r/The_Donald and r/Incels, were faced with sanctions from the platform and created their own standalone websites. To assess whether community-level moderation measures were effective in reducing the negative impact of these communities, we study how they progressed following their platform migrations.

11:00
Measuring Cultural Distance between Countries using Facebook Data: The case of food as a cultural marker

ABSTRACT. Focusing on food as a marker of culture, we characterize countries by using Facebook users' interests in popular dishes. We provide measures of cultural similarity across countries, compare our estimates with survey data, and discuss the potentially heterogeneous role of migration in shaping cultural proximity across countries.

11:15
#JusticeForGeorgeFloyd: How Instagram Facilitated the George Floyd Protests

ABSTRACT. We present and analyze a database of 1.13 million Instagram posts during the George Floyd Protests. We identify geographic epicenters using social network analysis. After selecting the top photos from 1.69 million photos, we show how an individual's death coalesced into a nationwide movement around one of America's greatest social issues.

10:30-12:00 Session S2-D: Causal Inference and Applications
Location: Track D
10:30
Bridging Nations: Quantifying the Influence of Multilinguals in the European Twitter Network

ABSTRACT. Multilinguals are likely an essential part of information diffusion on social media. We quantify the structural role and communication influence of multilinguals using causal inference techniques in two studies of Twitter users. Multilinguals play a larger role in information diffusion than their monolingual peers, but effects vary substantially across countries.

10:45
Discovering Heterogeneity of Treatment Effects in Conjoint Experiments: Conjoint Finite Mixture Model

ABSTRACT. The existence of subgroups with extreme preferences and their prevalence in the population can seriously bias the results of a conjoint experiment. We propose a new model, the conjoint finite mixture model, that addresses the challenge of subgroups with heterogeneous treatment effects and allows researchers to easily investigate membership in unobserved classes.

11:00
Towards More Reproducible and Meaningful Computational Social Science

ABSTRACT. We demonstrate key methodological issues faced in computational social science with recent work on the “moral contagion” effect, showing that it performs no better than an implausible “XYZ contagion,” and discuss how many of the currently proposed methodological fixes are insufficient for reproducible and meaningful research.

11:15
Estimating Community Feedback Effect on Topic Choice in Social Media

ABSTRACT. Using an interpretable semistructured model and rich temporal data from Twitter and Reddit, we measure how the amount of community feedback (comments, retweets, etc.) on social media users' posts influences their decision to continue the topic of their previous post, while controlling for the confounding effect of external events.

10:30-12:00 Session S2-E: Statistical Methods and Applications
Location: Track E
10:30
Human Capital and the Upward Occupational Mobility of Chinese Rural Migrant Workers

ABSTRACT. The study adopts mobility tables, Mlogit regression and Logit regression to examine the associations between four human capital factors, including formal education, professional training, certificate and foreign language proficiency, and the upward occupational mobility of Chinese rural migrant workers conditioning on first occupational attainment.

10:45
Technological Complementarity: the Division of Know-how and the Future of Work

ABSTRACT. By applying bundles of technologies, workers put the embedded knowledge of a diversity of human experts to use. Here, we examine the relationship between such bundles of technologies used by human labor. To do so, we measure complementarity between technologies, which can explain part of the wage variation left unexplained by other factors.

11:00
The Choice to Discriminate: How Source of Income Discrimination Constrains Opportunity for Housing Choice Voucher Holders

ABSTRACT. We examine the contexts in which landlords implement a strategy of discrimination against voucher holders in a set of 1,107,110 web-scraped rental listings from Craigslist to better understand the locational outcomes and opportunities afforded to voucher holders.

11:15
Implications of Underspecification on Neural Network Interpretability

ABSTRACT. Many techniques aim to explain or interpret artificial neural networks (ANNs), but recent work has highlighted that, in practice, ANNs are highly underspecified. We show that the inherent variability this causes casts doubt on the robustness and meaning of their explanations and interpretations.

14:30-16:30 Session K4: Keynotes
14:30
How the data generating process shapes dynamic networks

ABSTRACT. Over the past few years, an exciting trend in network science has been higher-order network analysis. The idea of higher-order analyses is to go beyond graphs by focusing on interactions involving more than two individuals. In my talk, I provide a perspective on this development drawing on ideas from community detection. The main novelty brought forth by this discussion (I think) is a heretofore overlooked aspect of temporal network analysis, namely the importance of the constraints imposed by the network generating process. These constraints point towards a unifying perspective for organizing higher-order network models. More specifically, I discuss how the structure of communication events strongly constrains network configurations and shapes network flows in a way that profoundly impacts key areas of network analysis, from community detection and null models to our approach to modeling dynamics.

15:10
Representation learning for computational imagination

ABSTRACT. Computational social science leverages new data sources and computational methods to expand the "sociological imagination", connecting individual milieus to wider sociological conditions. Beyond extending existing approaches, can new computational tools also help expand the scope of our imagination? In this talk, I will discuss how simple representation learning techniques can allow us to think about text and network data in new ways, particularly through an analogy with physical space.

15:50
Combining Data from Multiple Sources: Examples from Economic and Public Health Research Studies

ABSTRACT. Combining data from different sources will be key for social scientists to take full advantage of the data deluge resulting from the increasing digitalization of society. Currently we see many attempts at using single (big) data sources, with mixed results; the most exciting projects rely on a combination of different data, some still collected with traditional modes. This talk will highlight a few approaches and provide a framework with which researchers can think about creating new data products. An important element in this endeavor, however, is respect for people’s privacy. While different cultures have different norms about the collection of specific types of data for specific purposes, the notion of contextual integrity still holds. Learning how to design data collections for new insights in a more holistic way will be the overarching theme of this talk.

In the talk I will use several economic and public health research examples, in particular the IAB-SMART research project, to discuss privacy issues and approaches to creating high-quality combined data sources. In brief: the IAB-SMART study combines data from administrative records, surveys, and digital traces from smartphones. The digital trace data are collected via an app. The purpose of the IAB-SMART study is to measure the effects of long-term unemployment on social integration and social activity, as well as the inhibiting effects of reduced social networks and activities on re-entering the labor market. To create measures of social integration, access to the phone's address book and usage data is required, as well as sensor data from the accelerometer and geoposition. For valid population estimates, statisticians need to account for potential coverage bias and for bias due to nonresponse and measurement error. Using this case study, I will demonstrate how we approached these problems.

The second example comes from the Global CTIS survey, a partnership between Facebook and academic institutions to create a global COVID-19 symptom survey. The survey is available in 56 languages. A representative sample of Facebook users is invited daily to report on symptoms, social distancing behavior, mental health issues, and financial constraints. Facebook provides weights to reduce nonresponse and coverage bias. Both partners implement privacy protection and disclosure avoidance mechanisms to meet global policy and industry requirements. Country- and region-level statistics are published daily via dashboards, and microdata are available to researchers via data use agreements. Over 1 million responses are collected weekly. We will discuss the problems such partnerships face, the skills needed for such large survey data collections, and the need to use social circles as an alternative data source when asking sensitive questions such as vaccine uptake.

17:00-18:30 Session P3: Poster Session
Location: gather.town
Embedding Heterogeneous Hierarchical Structures

ABSTRACT. This work presents an embedding approach for data that lie in multiple hierarchies.

Spatial & temporal disparities in air pollution exposure at Italian public schools

ABSTRACT. Spatial and geographical disparities in air pollution exist between schools. I geocoded the addresses of 22 thousand schools in Italy and linked them with PM2.5 estimates. Air pollution is highly heterogeneous but appears to improve over time. High-SES schools are more polluted but are improving their air quality faster.

Media coverage of inequality and influence on US readers’ views about redistribution

ABSTRACT. We collected 61k news articles about inequality and redistribution in the USA (1983-2019). We used crowdsourcing to label 7,483 of them for their stance on the topic and their persuasiveness. We are now training classifiers to label the rest of the articles. An experimental design is discussed.

Emergence of inequality in a complementary currency system

ABSTRACT. Complementary currency is a medium of exchange independent of national currencies, which is based on the agreement of its users and is usually implemented to serve social goals. The Circles network [1,2] is a worldwide exchange platform based on community and complementary currency systems. Launched in October 2020, Circles is based on the issuance of personal cryptocurrencies in the network of trust created and declared by the users. The official goal of the developers is to offer a Universal Basic Income supported by the trust of the social relationships that really exist in the community. In this work, we report on our analysis of the flows in the transaction network. We find that, despite the egalitarian motivation of Circles, inequality emerges in the whole currency system, and we describe this phenomenon as well as its origin. Furthermore, we analyse the backbone of the economic network by detecting and studying the maximum cycle basis. To do so, we introduce a set of monitoring tools that can be used to better understand and design complementary and community currency systems.
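
The abstract does not say how inequality is measured. One common choice, used here purely as an assumption, is the Gini coefficient of account balances or received flows. The sketch below uses the closed form over sorted values, G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n:

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 = perfect equality, (n-1)/n = one holder."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))  # rank-weighted sum
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    balances = [5, 5, 5, 5]     # hypothetical equal balances
    skewed = [0, 0, 0, 10]      # hypothetical fully concentrated balances
    print(gini(balances), gini(skewed))
```

Tracking such an index over snapshots of the transaction network is one simple way to see whether inequality emerges over time.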

When do Social Learners Affect Collective Performance Negatively? The Predictions of a Dynamical-System Model

ABSTRACT. We propose a dynamical-system model for collective decision-making considering interactions of social and cognitive factors. The model predicts a critical threshold for the proportion of social learners, above which a bi-stable state appears, and the majority can favor either the higher- or lower-merit option.

A little Knowledge is a Dangerous Thing: Excess Confidence Explains Negative Attitudes Towards Science

ABSTRACT. We propose a new, testable theoretical model to understand how knowledge and confidence determine public attitudes towards science, and discuss how this can inform science communication.

Representing Repositories in the Open-Source Ecosystem

ABSTRACT. To accelerate the progress of representation learning in the open-source software ecosystem (a complex and social domain), we present a new dataset, a set of representation baselines, and a benchmark suite for GitHub repositories.

Counting Cars from Space to Monitor Internal Displacement

ABSTRACT. We propose an AI-based car detection system that analyzes longitudinal satellite imagery to automatically capture changes in the active vehicle count, to obtain information about internal displacement.

Differentially Private Propensity Scores for Bias Correction

ABSTRACT. https://drive.google.com/file/d/1vVGti5_Ew6mPjF4chnBJUGFNZZv1XaQw/view?usp=sharing

Improving vehicles' emissions reduction policies by targeting gross polluters

ABSTRACT. Our work uses vehicles' GPS trajectories and a microscopic model to estimate the emissions of four air pollutants produced by private vehicles moving in different cities; we study their distributions across vehicles and roads, and use the framework to simulate emission reduction scenarios.

Measuring Anonymity in Complex Networks

ABSTRACT. To comply with regulations regarding privacy and data protection, it should not be possible to reidentify individuals in sensitive social network data based on their position in the network. This work presents a new method for measuring this so-called disclosure probability of individuals in social networks.
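
The following is not the method presented in this work, which the abstract does not describe. It is a simple, commonly used baseline for the same notion. It assumes an attacker who knows only a node's degree. Such an attacker can narrow a target down to the nodes sharing that degree, so each node's disclosure probability is 1 divided by the size of its degree class. Isolated nodes do not appear in the edge list and are ignored here:

```python
from collections import Counter


def degree_disclosure_probability(edges):
    """Baseline disclosure probability under a degree-only attacker (an assumption).

    edges: iterable of (u, v) pairs of an undirected simple graph.
    Returns {node: 1 / (number of nodes with the same degree)}.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    class_size = Counter(degree.values())  # how many nodes share each degree
    return {node: 1 / class_size[d] for node, d in degree.items()}


if __name__ == "__main__":
    star = [("a", "b"), ("a", "c"), ("a", "d")]  # hypothetical network
    print(degree_disclosure_probability(star))   # the hub "a" is uniquely identifiable
```

Richer attacker knowledge (neighbour degrees, full neighbourhood structure) shrinks the equivalence classes and raises disclosure probabilities.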

Measuring Mobile Media Use

ABSTRACT. Given the mediality and pervasiveness of mobile media, I propose that measuring the time people spend with them is not enough, but that it is necessary to consider mobile media use dimensions that are directly tied to the technological capabilities of such devices.

Cultural Differences in Features of Popular Music in Taiwanese, Japanese, and American Markets

ABSTRACT. Music preferences are often embedded in the cultural traditions of the listener. Using Spotify features of 1,810,210 songs from the US, Taiwanese, or Japanese cultural markets, we trained machine learning models to classify their cultural membership. Overall accuracy was high, and model interpretation showed strong cultural variations in music features.

Unique in what sense? Heterogeneous relationships between uniqueness and popularity in music

ABSTRACT. We examine how different dimensions of a song’s uniqueness contribute to its popularity. Combining computational text analysis with statistical modeling, we found that (1) lyrics uniqueness has a stronger association with popularity than acoustic and chord-progression uniqueness; (2) the relationship between uniqueness and popularity is mediated by the lyrics’ repetitiveness.

YouTube Mukbang Videos Assimilate Eastern and Western Cultures Over Time

ABSTRACT. Since 2014, Mukbang videos have grown in virality on YouTube, yet their content is poorly understood. We collected over 11,000 Mukbang YouTube video titles. Our analysis of the channel-food bipartite network shows that Mukbang videos have connected regional cuisines to global audiences at the cost of promoting unhealthy eating habits.

Predicting the New from the Old: Transfer Learning for Rising Chinese Artists from the Established Global Artists

ABSTRACT. Using transfer learning, we investigate the predictability of rising local artists’ market value from that of established global artists. By transferring values from global contemporary artists to Chinese artists with limited information, we increased predictability and showed the comparative advantage of transfer learning under data scarcity.

We Are What We Listen To: How our Moral and Human values reflect on our Music Genres Preferences

ABSTRACT. https://arxiv.org/pdf/2107.00349.pdf

Is Rap still Tough? The Dynamics of Masculinities Expressions in American Rap.

ABSTRACT. This research hypothesises changes in masculinity expressions in rap. Using rap lyrics and topic modeling, it attempts to trace the dynamics of masculinity expressions from 1984 to 2018. Findings show that the way rappers present themselves as men has not changed significantly over the course of rap's history.

Quantifying Creativity and Innovation of Movies via AI

ABSTRACT. The innovativeness and financial success of creative products (movies, books, computer games, or art) are notoriously hard to predict, leading to misallocated investments that drain resources from more productive activities. Harry Potter was turned down by over 20 publishers, and no major movie studio wanted to invest in Star Wars, yet studios did invest in Waterworld, a gigantic flop. In this work, we investigate whether the ability of deep learning to identify hidden patterns of success can provide new insights into this age-old creativity problem. Using a unique set of empirical materials, we have analyzed a dataset of 10,000 pre-production movie scripts and their 20,000 detailed reviews made by domain experts. Our goal is to predict the reviews given by the experts. First, we construct story-related features such as the number of roles, scenes, and genres, among others. Second, we use semantic embeddings of the scripts, which help identify the semantic similarity of the scripts with existing ones. Third, we consider the interactions between the roles in the script, which provides a unique way of assessing the creativity of the script. From these features, our AI model predicts the quantified expert reviews 25% better than competitive baselines. This study promises a better understanding of the determinants of movie success and thus tools for a more efficient movie production process.

Auto-Complete: Hidden Recommendation Engine

ABSTRACT. We explore how search engine recommendations affect users’ search behavior and motivate researchers to account for these effects when analyzing search behavior. We analyze anonymized search logs from one of the largest search engine platforms to provide descriptive statistics and demonstrative examples showing the influence of these recommendations.

Exploring Topical Filter Bubble: An Audit Study of YouTube Recommendations

ABSTRACT. Recommendation algorithms for customization generate concerns about “filter bubbles”. This audit study aims to examine topical filter bubbles on YouTube. Employing Markov chains, I found that YouTube’s recommendations easily drag news audiences toward entertainment and keep entertainment audiences away from news exposure, which is detrimental to democracy.
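
A minimal sketch of a Markov-chain analysis of this kind follows. It assumes each step of an audited recommendation trail is labelled with a topical category; the labels and the trail are hypothetical. The sketch estimates transition probabilities by counting, then finds the long-run share of steps spent in each category by power iteration:

```python
from collections import defaultdict


def transition_matrix(walks):
    """Maximum-likelihood transition probabilities from observed category sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for walk in walks:
        for a, b in zip(walk, walk[1:]):
            counts[a][b] += 1
    P = {}
    for a, row in counts.items():
        total = sum(row.values())
        P[a] = {b: c / total for b, c in row.items()}
    return P


def stationary(P, iters=1000):
    """Long-run category shares by power iteration from a uniform start.

    States never left in the data are treated as absorbing (self-loop).
    Convergence assumes the chain is aperiodic.
    """
    states = sorted(set(P) | {b for row in P.values() for b in row})
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        nxt = {s: 0.0 for s in states}
        for a, mass in pi.items():
            for b, q in P.get(a, {a: 1.0}).items():
                nxt[b] += mass * q
        pi = nxt
    return pi


if __name__ == "__main__":
    trail = [["news", "ent", "ent", "ent", "news", "ent"]]  # hypothetical audit trail
    P = transition_matrix(trail)
    print(P)
    print(stationary(P))
```

A stationary share far above a category's starting share is one way to express "dragging" audiences toward that category.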

Web Search Behaviour and Individual Demographics: Observational Evidence from Switzerland and Germany

ABSTRACT. https://arxiv.org/abs/2105.04961

The strength of selection and drift in the evolution of online communities

ABSTRACT. Applying the Price Equation to the organizational evolution of multiple traits relating to online communities’ varied approaches to self-governance, we find strong negative selection over administrative rules and weak positive selection over information rules.
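
For reference, the Price Equation splits the change in a population's mean trait into a selection term and a transmission term: w_bar * delta_z_bar = Cov(w, z) + E(w * delta_z). The sketch computes both terms, divided by mean fitness, for hypothetical data. Here z is a parent's trait (e.g. 1 if a community has some rule type), w its fitness (number of "offspring"), and z_offspring the mean trait of its offspring. How the authors define fitness and estimate drift is not stated in the abstract:

```python
def price_decomposition(z, w, z_offspring):
    """Return (selection, transmission), with selection + transmission = change in mean trait.

    Uses population (not sample) moments, as the Price Equation requires.
    """
    n = len(z)
    w_bar = sum(w) / n
    z_bar = sum(z) / n
    cov_wz = sum((wi - w_bar) * (zi - z_bar) for wi, zi in zip(w, z)) / n
    e_w_dz = sum(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_offspring)) / n
    return cov_wz / w_bar, e_w_dz / w_bar


if __name__ == "__main__":
    z = [0, 1, 1, 0]              # hypothetical trait of each parent community
    w = [1, 3, 2, 0]              # hypothetical offspring counts
    z_off = [0, 1, 0.5, 0]        # hypothetical mean trait of each parent's offspring
    sel, trans = price_decomposition(z, w, z_off)
    print("selection", sel, "transmission", trans)
```

In this toy example the trait is favoured by selection (positive covariance with fitness) but partly lost in transmission.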

Information Foraging and the Attention Economy

ABSTRACT. Modern media compete for human attention in what has been termed the attention economy. Humans direct their attention to gather information in a similar way to how animals forage for food, and food foraging models have been used to describe information foraging in humans [1]. Given this model of human behaviour, we ask how media must adapt to compete for our limited attention in a world with increasingly powerful communication systems and high information prevalence. We assume that media compete by providing higher information utility rates, which we can partly measure with the proxy of word entropy. Our model describes three distinct phenomena. 1) Word entropy has risen across all media since 1900. 2) Word entropy of short-form media is higher than that of long-form media across varied corpora. 3) Short-form media such as social media become more viable as information prevalence rises. Our empirical results on linguistic changes over time and across media categories are an important contribution. Our model also offers a hypothesis to explain these changes, and makes predictions.
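
The abstract does not specify its word-entropy estimator (tokenization, unigram vs. conditional entropy, length handling). As an assumption, the sketch computes the plug-in Shannon entropy, in bits, of the unigram word distribution of a text:

```python
import math
from collections import Counter


def word_entropy(text):
    """Plug-in Shannon entropy (bits) of the unigram word-frequency distribution."""
    words = text.lower().split()  # naive whitespace tokenization (an assumption)
    n = len(words)
    counts = Counter(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    print(word_entropy("a b c d"))   # four distinct words, equally likely
    print(word_entropy("a a a a"))   # a single repeated word
```

The plug-in estimate grows with the number of tokens, so comparisons between short-form and long-form media are fairer on equal-sized word samples.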

Measuring Alignment of Online Grassroots Political Communities with Political Campaigns

ABSTRACT. This study applies a behavioural community embedding to Reddit during the 2019-2020 Democratic Presidential primaries, and analyzes candidate-specific communities along eight different cultural dimensions. We find a high degree of alignment between candidate-specific communities on Reddit and the views of candidates' supporters, but little alignment with candidates' policy positions.

Winners, Losers, and Future Achievement

ABSTRACT. Past achievement usually predicts future achievement. Here, we systematically test this principle. We observe that where a milestone exists, athletes who had just missed it went on to outperform athletes just above it in the future. We observe this reversal phenomenon both in sports and in science.

Measuring academic reputation: peer review versus bibliometrics

ABSTRACT. As our research focuses on the use of bibliometric indicators to identify the disciplinary elite, we compared results from traditional indicators drawn from the national bibliometric database (RISC) with those of a survey of active sociologists in Russia, and observed a convergence between the two approaches.