Ebook: Computational Social Science and Complex Systems
For many years, the development of large-scale quantitative social science was hindered by a lack of data. Traditional methods of data collection like surveys were very useful, but were limited. The situation has of course changed with the development of computing and information communication technology, and we now live in a world of data deluge, where the question has become how to extract important information from the plethora of data that can be accessed. Big Data has made it possible to study societal questions which were once impossible to deal with, but new tools and new multidisciplinary approaches are required. Physicists, together with economists, sociologists, computer scientists, etc. have played an important role in their development.
This book presents the 9 lectures delivered at the CCIII Summer Course Computational Social Science and Complex Systems, held as part of the International School of Physics Enrico Fermi in Varenna, Italy, from 16-21 July 2018. The course had the aim of presenting some of the recent developments in the interdisciplinary fields of computational social science and econophysics to PhD students and young researchers, with lectures focused on recent problems investigated in computational social science.
Addressing some of the basic questions and many of the subtleties of the emerging field of computational social science, the book will be of interest to students, researchers and advanced research professionals alike.
This volume contains the lectures presented at the CCIII Course “Computational Social Science and Complex Systems” of the “Enrico Fermi” School held on 16–21 July 2018 in Varenna, Italy.
According to Stephen Hawking, the 21st century will be the century of complexity. Heterogeneous systems consisting of many interacting parts showing emergent phenomena are abundant and, indeed, at the turn of the millennium, the time seemed to be ripe for science to address the most relevant questions about their structure and dynamics. Paradigmatic examples of complex systems are human society and the economy, where, in addition to the already challenging task of studying such systems, the property of adaptation poses an additional difficulty, as the observers and the observed coincide in some cases.
The development of large-scale quantitative social science has been hindered for a long time by the lack of data. Traditional methods of data collection, like surveys, have been very useful, but they have severe limitations. The situation has changed radically in recent years due to the dramatic development of computing and information communication technology. We now live in a world of data deluge, where the question is no longer only how to obtain the data but how to sort out the important information from the plethora of data that can be accessed. Almost all our everyday activities leave digital fingerprints behind, enabling researchers to investigate the interactions of people with high spatio-temporal resolution. Time-stamped mobile call records with positions specified by antenna towers are just one example of such data sources. Another, from the field of finance, is the detailed information about the limit order book of financial markets.
This Big Data has made it possible to study questions that were previously impossible to address. From the large-scale structure of society to the temporal patterns of communication, from multi-scale dynamics of the stock market to the data-based categorization of investment strategies, from social contagion to disease spreading, many aspects of social and economic systems have become accessible. It is important to underline that many of these problems bear directly on applications.
The new situation has required new tools and new approaches. It has turned out that it is not possible to cope with the challenges induced by Big Data within one discipline like sociology or economics: there is a need for a multidisciplinary effort. New fields like econophysics, sociophysics, computational social science, and network science have emerged, providing examples of cooperation across disciplinary borders. Physicists, alongside economists, sociologists, computer scientists, and others, have played an important role in this endeavor from the beginning.
Physics, especially statistical physics, contributes to the study of social and economic complex systems with an arsenal of tools, such as phase transitions and scaling, mean-field theory, random walks, correlation functions, pattern formation, and nonlinear dynamical systems. However, the role of physics goes far beyond discovering possible analogies to physical systems and applying tools developed for investigating them. Perhaps even more important is the physicists' approach to tackling problems, which goes back to Galilei and Newton and is widely acknowledged as the “scientific method”. It consists of a perpetual interplay between empirical observation, modeling, and theory, between induction and deduction, where empirical facts are always the decisive elements for the development or falsification of a theory.
The Summer Course of the International School of Physics “Enrico Fermi” on “Computational Social Science and Complex Systems” had the aim of presenting to PhD students and young researchers some of the recent developments in the interdisciplinary fields of computational social science and econophysics.
In the collected lectures of the School, one group of lectures focuses on recent problems investigated in computational social science. Specifically, Stefan Thurner discusses the study of virtual social systems. He shows that several sociological classics, including the formation of social networks, gender differences, and the growth of wealth inequality, can be successfully investigated using the historical records of a society of computer game players. Markus Strohmaier considers the problems and potential of measuring social and political phenomena on the web. László Barabási and Federico Musciotto summarize Barabási’s lecture introducing the recently emerging “Science of success”. Barabási’s approach primarily focuses on the differences between the concepts of performance and success in modern society. Some recent results in market microstructure are presented in Fabrizio Lillo’s lectures, which examine the interplay between order flow, the intention of trading agents, and the price dynamics of the traded asset. Network science plays a crucial role both in computational social science and in the analysis and modeling of complex systems. Salvatore Miccichè and Rosario Mantegna present a primer on so-called “statistically validated networks”. These are networks where a selection of nodes and links is obtained under the condition that a specific statistical null hypothesis is rejected. With this methodology, one can highlight groups of nodes and links that are over-expressed (or under-expressed) with respect to a null hypothesis that usually takes into account the intrinsic heterogeneity of the system. Another crucial aspect of network science concerns the nature and properties of temporal networks. János Kertész briefly reviews the characteristics of temporal networks with special emphasis on small motifs and on the process of spreading.
Temporal networks are also discussed in Alain Barrat’s lecture with an emphasis on the role of face-to-face interaction between individuals. Specifically, he focuses on the detection of face-to-face interactions recorded using technologies such as Bluetooth, WiFi or RFID, and on their analysis in terms of minimal models at different levels of description incorporating non-trivial longitudinal structures, mesoscopic structures, and correlated activity patterns. The role of the spatial structure of the population in the classic problem of disease spreading is addressed by Vittoria Colizza, whose lecture shows how to introduce the concept of meta-population in the standard modeling framework for the study of epidemic spread. Finally, the properties of spatio-temporal infrastructure networks are reviewed in the contribution of Louis Shekhtman and Shlomo Havlin, who highlight how interdependent networks are characterized by abrupt first-order-like transitions where a cascade leads to a network collapse.
The Course was the first “Enrico Fermi” summer School on the interdisciplinary topic of computational social science with an emphasis on economic and social complex systems. This book records the lectures provided at the School and should be a useful reference for researchers interested in this area. We believe that beginning graduate students, young researchers and advanced research professionals will find this book useful for approaching some basic questions and many subtleties of the emerging field of computational social science.
We are grateful to the Italian Physical Society (SIF) and to Morgan Stanley Magyarország Elemzö KFT for their financial support. We wish to thank Prof. F. Mallamace and S. De Pasquale for their encouragement during the preparation for the School. Finally, we wish to thank Barbara Alzani (SIF), Ramona Brigatti, Elena Salvadore and Monica Bonetti (SIF) for their excellent cooperation before, during, and after the School period.
János Kertész, Rosario Nunzio Mantegna and Salvatore Miccichè
Can we understand social systems quantitatively and predictively, given that we know all actions, interactions, and states of individuals? We interpret human societies as co-evolutionary systems of individuals and their interactions. Based on unique data of a society of computer game players, where all actions and interactions between all players are known, we show that this might indeed be possible. Within this framework we address a number of sociological classics, including formation of social networks, strength of relations, group formation, hierarchical organization, aggression management, gender differences, mobility, and wealth inequality. We discover behavioral and organizational patterns of Homo sapiens and its society that were not visible with traditional methodology from the social sciences.
This short article summarizes results from previously published, co-authored articles of the presenter on the issue of measuring social and political phenomena on the web. In particular, this article briefly introduces computational social science as an emerging discipline and then proceeds to review articles on i) measuring gender inequality on Wikipedia, ii) modeling minorities in social networks and iii) measuring voting power and behavior in Liquid Democracies.
Performance and success are often used as synonyms to express individual accomplishment. However, from a scientific perspective they cover very different concepts: performance is about individual effort, while success is a collective quantity capturing the community’s acknowledgment of effort and performance. In these notes, we investigate the quantitative rules that govern both, trying to model their interdependence within the framework of complex systems. We explore different fields, ranging from online crowdfunding platforms to academia, with the idea of applying scientifically sound methods to uncover the universal laws that determine the allocation of merit in science and society.
In this paper I review some of the recent advancements in the understanding of the market microstructure of financial markets and of the role of heterogeneous investors in explaining the statistical regularities observed in market data. After introducing some of the main problems in microstructure and describing the most common structure of financial markets, the Limit Order Book (LOB), I will focus on the interplay between order flow, describing the intention of the agents, and the price dynamics, describing the outcome of their interactions. I will show that the quantity embedding this subtle interaction, termed market impact, is determined by the large heterogeneity in size of the investors in the financial market. I will also show the relevance of this problem for investors, by introducing the problem of optimal execution and describing the empirical evidence on the associated cost, focusing also on the role of herding behavior. Finally, I will describe some evidence of the role of heterogeneity of time scales in determining some properties of the LOB.
In this contribution we discuss some approaches to network analysis that provide information about single links or single nodes with respect to a null hypothesis taking into account the empirically observed heterogeneity of the system. With this approach, a selection of nodes and links is feasible when the null hypothesis is statistically rejected. We focus our discussion on approaches using i) the so-called disparity filter and ii) statistically validated networks in bipartite networks. For both methods we discuss the importance of using multiple hypothesis test correction. Specific applications of statistically validated networks are discussed. We also discuss how statistically validated networks can be used to i) pre-process large sets of data and ii) detect cores of communities that form the most close-knit and stable subsets of clusters of nodes present in a complex system.
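As a rough illustration of the link-selection idea in this abstract, the sketch below validates links in the projection of a bipartite network. It assumes a hypergeometric null hypothesis for the number of shared neighbours and a Bonferroni correction for multiple testing; both choices, and the names `hypergeom_sf`, `validated_links`, `neighbors` and `n_total`, are illustrative assumptions rather than details taken from the lecture. Only over-expressed links are tested.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, n_total, n_i, n_j):
    """P(X >= k) for the overlap X of two random subsets of sizes n_i and n_j
    drawn from n_total items (assumed hypergeometric null hypothesis)."""
    denom = comb(n_total, n_j)
    upper = min(n_i, n_j)
    numer = sum(comb(n_i, x) * comb(n_total - n_i, n_j - x)
                for x in range(k, upper + 1))
    return numer / denom


def validated_links(neighbors, n_total, alpha=0.01):
    """Return (u, v, p_value) for node pairs whose number of shared neighbours
    is over-expressed at level alpha after a Bonferroni correction.

    neighbors: dict mapping each projected node to the set of items on the
               other side of the bipartite network it is linked to.
    n_total:   total number of items on the other side.
    """
    pairs = list(combinations(sorted(neighbors), 2))
    if not pairs:
        return []
    threshold = alpha / len(pairs)  # Bonferroni: one test per node pair
    links = []
    for u, v in pairs:
        shared = len(neighbors[u] & neighbors[v])
        p = hypergeom_sf(shared, n_total, len(neighbors[u]), len(neighbors[v]))
        if p < threshold:
            links.append((u, v, p))
    return links


# Hypothetical toy data: "a" and "b" share all their items, "c" shares none.
toy = {"a": set(range(10)), "b": set(range(10)), "c": set(range(10, 20))}
links = validated_links(toy, n_total=20)
```

Because node degrees enter the null hypothesis directly, a pair of high-degree nodes needs more shared neighbours to be validated than a pair of low-degree nodes, which is how the test accounts for heterogeneity. A less conservative correction, such as one controlling the false discovery rate, could be substituted for the Bonferroni threshold.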
Networks as scaffolds of complex systems are intrinsically dynamic: they grow and shrink, split and merge, and processes such as spreading phenomena take place on them. As long as the time scale of the change of the network is much slower than that of the processes, a static network picture is adequate. When these scales get closer to each other, a different, dynamic approach is necessary. There is a class of networks in which the connections between the nodes are only temporarily present: these are the temporal networks. Examples are communication networks, networks based on proximity, or the networks of financial transactions. Here we briefly review the characteristics of such temporal networks with special emphasis on the motifs, i.e., small, typical spatio-temporal units. We also discuss the effect of time distributions of events on spreading in temporal networks.
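To make the role of event timing concrete, here is a minimal sketch of deterministic susceptible-infected (SI) spreading over a time-stamped contact list. The event format `(t, u, v)`, the function name `si_infection_times`, and the rule that every contact with an already infected node transmits are assumptions made for illustration, not a model taken from the lecture.

```python
def si_infection_times(events, seed, t_seed=float("-inf")):
    """Deterministic SI spreading along time-respecting paths.

    events: iterable of (t, u, v) undirected contacts at time t.
    seed:   initially infected node, infected at time t_seed.
    Returns a dict mapping each reached node to its infection time.
    """
    infected = {seed: t_seed}
    for t, u, v in sorted(events):
        # Try both directions of the undirected contact.
        for src, dst in ((u, v), (v, u)):
            # A node transmits only at contacts strictly after its own infection.
            if src in infected and infected[src] < t and dst not in infected:
                infected[dst] = t
    return infected


# Same two links, different order in time (hypothetical toy data).
forward = si_infection_times([(1, "a", "b"), (2, "b", "c")], seed="a")
reverse = si_infection_times([(2, "a", "b"), (1, "b", "c")], seed="a")
```

In the aggregated static network both contact lists give the path a-b-c, yet in `reverse` node "c" is never reached because the b-c contact happens before "b" is infected. This is the sense in which a static picture stops being adequate once the network changes on the same time scale as the process.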
Face-to-face interactions of humans play a crucial role in social relationships as well as in the potential transmission of infectious diseases. Here we discuss recent research efforts and advances concerning the measurement, analysis and modelling of such interactions, measured using decentralised infrastructures based on wearable sensors. We present the empirical data, which take the form of temporal networks, and novel techniques aimed at describing the data and at uncovering structures in these data. We describe recent modelling efforts and studies of spreading processes on temporal networks. We finally discuss several ways of practically using the data and how the issue of incomplete data can be tackled.
The spatial structure of populations is a key element in the description and understanding of the spatiotemporal propagation of infectious diseases. Host populations often live in a highly fragmented environment, where they are structured and localized in relatively isolated discrete patches or subpopulations connected by some degree of host movement. Metapopulation models provide the ideal theoretical framework to capture the separation of a host population into local communities, with strong homogeneous mixing within each community and weaker interactions between communities corresponding to the underlying substrate of commuting patterns, mobility networks and/or transportation infrastructures. This paradigm can be applied to model the spatiotemporal propagation of epidemics in structured populations at different scales, by considering for example families, city locations, hospital wards, farms, urban areas or regions as local communities connected by hosts’ mobility processes. Here we present the computational approach to the modeling of epidemic processes in spatially structured systems. We introduce metapopulation models as the standard modeling framework for the study of epidemic spread among localized communities of hosts. Taking into account the coupling provided by the interactions among localized populations, different modeling approaches are described, including mechanistic (i.e. microscopic) simulations and effective approaches, and the possible presence of memory effects. Topics like invasion dynamics and local vs. global containment of an emerging epidemic will be addressed, and the theoretical results will be related to the design of possible intervention policies for epidemic control. These notes represent a theoretical introduction for the development of data-driven realistic metapopulation models for application in public health.
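The metapopulation picture described here (homogeneous mixing inside each patch, weaker coupling through host mobility) can be sketched as follows. This is a deterministic, Euler-discretised SIR model chosen only for illustration; the compartments, the symbols `beta` (transmission rate), `mu` (recovery rate), `w[i][j]` (per-capita rate of moving from patch i to patch j), and the function names are assumptions, not the specific models developed in the notes.

```python
def step(state, beta, mu, w, dt):
    """Advance every patch by one Euler step of size dt.

    state: list of (S, I, R) tuples, one per patch.
    w:     square matrix, w[i][j] = per-capita rate of moving from i to j.
    """
    n = len(state)
    # Reaction: homogeneous mixing inside each patch.
    reacted = []
    for s, i, r in state:
        total = s + i + r
        new_inf = beta * s * i / total * dt if total > 0 else 0.0
        new_rec = mu * i * dt
        reacted.append((s - new_inf, i + new_inf - new_rec, r + new_rec))
    # Mobility: hosts in every compartment move between patches.
    moved = []
    for a in range(n):
        out_rate = sum(w[a][b] for b in range(n) if b != a)
        comps = []
        for c in range(3):
            inflow = sum(w[b][a] * reacted[b][c] for b in range(n) if b != a)
            comps.append(reacted[a][c] + dt * (inflow - out_rate * reacted[a][c]))
        moved.append(tuple(comps))
    return moved


def simulate(state, beta, mu, w, dt, steps):
    """Iterate `step` and return the final list of (S, I, R) tuples."""
    for _ in range(steps):
        state = step(state, beta, mu, w, dt)
    return state


# Hypothetical two-patch example: the epidemic starts in patch 0 only.
initial = [(990.0, 10.0, 0.0), (1000.0, 0.0, 0.0)]
mobility = [[0.0, 0.01], [0.01, 0.0]]
final = simulate(initial, beta=0.5, mu=0.2, w=mobility, dt=0.1, steps=1000)
```

In this toy setting patch 1 has no infected hosts at the start and is invaded only through the mobility term. A stochastic, individual-based version would be closer to the mechanistic simulations the abstract mentions, at the cost of more code.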
Many infrastructure systems can be modeled as networks, where a set of nodes is connected via some edges. Such a formulation allows us to consider the resilience properties of the network, including its ability to maintain connectivity under some level of failures. Here we present a review of prior and recent results about such resilience properties. We consider the general connectivity patterns of basic networks and review applications to specific systems like traffic, climate, and physiology. We also present results on the possibility of repairing a network after some failures. Next, we present results on interdependent networks, where one network depends on another for some resource. A typical example is a communications network depending on a power grid (and vice versa). Interdependent networks are often characterized by abrupt transitions where a cascade leads the network to collapse suddenly. We review possible methods of preventing these cascades, such as reducing the level of interdependence and reinforcing some nodes. Finally, we show how spatially embedded networks have unique properties such as extreme vulnerability in interdependent networks, metastable properties under localized attacks, and cascades under overload failures. Overall, the results here provide possible methods and understanding of how to improve the resilience of modern critical infrastructure.
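The cascade mechanism in interdependent networks can be illustrated with a small sketch. It assumes two networks on the same node labels with a one-to-one mutual dependency (node k in network A works only if node k in network B works, and vice versa), and assumes that a node stays functional only while it belongs to the largest connected component of its own network. These rules and the helper names `build_adj`, `largest_component` and `mutual_giant_component` are illustrative assumptions, not the exact models reviewed in the contribution.

```python
from collections import deque


def build_adj(n, edges):
    """Adjacency sets for an undirected graph on nodes 0..n-1."""
    adj = {k: set() for k in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def largest_component(adj, alive):
    """Largest connected component of the graph restricted to `alive`."""
    best, seen = set(), set()
    for start in alive:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best


def mutual_giant_component(adj_a, adj_b, removed):
    """Iterate failures between the two networks until nothing changes."""
    alive = set(adj_a) - set(removed)
    while True:
        after_a = largest_component(adj_a, alive)    # connectivity in A
        after_b = largest_component(adj_b, after_a)  # dependents fail in B
        if after_b == alive:
            return alive
        alive = after_b


# Hypothetical toy networks on 8 nodes.
net_a = build_adj(8, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)])
net_b = build_adj(8, [(4, 5), (5, 7), (7, 0), (0, 6), (6, 1), (1, 2), (2, 3)])
survivors = mutual_giant_component(net_a, net_b, removed={3})
```

In this toy case the failure of a single node propagates back and forth between the two networks over several rounds before a stable set of survivors is reached. Reducing the number of dependency links or reinforcing selected nodes, as mentioned in the abstract, would correspond to exempting some nodes from the dependency rule.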