Big data is a buzzword for something that many academics as well as students may not be able to grasp. Why? Because big data are not only defined by their volume but also by the variety, velocity, and value they provide. Moreover, big data require analytical skills that are neither integrated in the usual undergraduate Social Science curriculum nor skills that (many) Social Science academics naturally acquire. Let’s rewind.
The attempt of a definition: When thinking about big data people commonly think of volume, i.e., a really large number of pieces of information, millions or billions of observations. Fewer may think of the variety in which big data can be found: Numerical values, such as coordinates, but also textual information, such as speeches or posts on social media, as well as audio, imaging, and video files. They [big data] are also produced by obvious and not-so-obvious (political) actors. Individuals – think about former United States President, Donald Trump, and his tweets. Organizations – such as repositories of speeches in the United Nations. Administrations – think of the large number of records stored by administrative entities, such as Covid infections or vaccination records collected by the United Kingdom’s National Health Service. Some may even contain unique data, such as biomarkers. Big data are further produced in an ever-changing environment, referring to the velocity of the nature of these data. What has been a big data set this minute, may be an even bigger data set in the next one as further information is added rapidly.
Social Scientists have long discovered the value of information contained in big data sets. As such, big data represent an extraordinary resource for research purposes. For example, tracking and analyzing individual politician’s communication on policy issues and their involvement in specific networks covering these issues; understanding how and why citizens take part in politics by tracing respective profiles and how they like, share, or produce political content; analyzing geographical tracking data to see how specific actors or forces in a country are moving and to potentially predict a future move.
The need to understand big-data-opportunities and -challenges for Social Science must be recognized. While big data applications are increasingly popular for research, academics should not forget that there is also need to address big data problems in teaching, as students are increasingly interested in collecting and analyzing big data (see Conversation overheard on the corridor).
The challenge is that big data analysis moves beyond the traditional understanding of using empirical data (qualitative, quantitative, or mixed in nature). First, big data need systematic collection, which requires advanced technical knowledge, as the sheer volume will not allow manual collection.
Second, once collected, big data structures can be complex and messy. Rarely, will we be so lucky and have straight forward and well-organized data. As such, they [big data] require extensive data cleaning and management to be analyzed effectively. Think about the many ways in which the date and time of the day could be captured and how difficult it gets if the format for either, both, or just a component of either, changes for millions or billions of data points...
Third, once cleaned, we need to be able to employ analytical strategies that allow us to detect patterns in our big data, demanding expertise in programming algorithms, essentially artificial intelligence, and understanding how these are actually able to help us understand the data in light of our research questions.
The realization that advanced computing skills are required to be able to bring big data into research, and the need to also bring them [big data] into the classroom can be daunting. While secondary data repositories such as Kaggle or the Harvard Dataverse allow to access big data sets, which potentially takes off the pressure from collecting big data, advance skills in data management and analysis, including machine learning practices that involve training algorithms, remain essential.
Let’s have this insight sink in slowly for a few seconds. Didn’t most of us Social Science academics want to analyze Social Science phenomena rather than become Data scientists? Isn’t this the reason why many of our students decide to take a Social science degree? But, can we [Social Scientist] really allow ourselves to miss out on studying and teaching the social and political world described by big data?
The strategies to account for big data in research and potentially teaching vary on the type of Social Scientist.
(1) Some may talk about big data rather than walking the big data walk: Acknowledging the existence of big data and that they have an impact on how we view the social and political worlds is certainly a good start. Theorizing the challenges and opportunities that come with big data is valuable. However, it may seem like a dead-end given that the big data talker may have little understanding of the technical and analytical challenges.
(2) Others go all-in on data science. An increasing number of universities offer advanced degrees in data science (and statistics) to enable Social Scientists to big data analytics. It may take a certain type of academic to embrace this opportunity, however. Not all Social Scientists may wish, nor should they be made to, walk down the path of becoming a Data Scientist. After all, traditional empirical data remain important, valid, and valuable, too.
(3) Yet again, others become Social Science chameleons finding some kind of a middle ground by moving beyond the surface of talking about big data but withdrawing from the ambition of becoming Data Scientists. A basic understanding of programming, fundamental insights to training algorithms and making use of artificial intelligence could be a fruitful middle way to reconcile the other views. To enable collaboration between Social Science and Data Science both sides need to be able to speak the other discipline’s language. Social Scientist with a better understanding of the technical challenges underlying the desired data collection, maybe better able to voice to the Data Scientist what data they should collect. Similarly, it will be beneficial us Social Scientists to be able to understand and evaluate the Data Scientists strategy in identifying patterns in the big data and finding the best fit for them [the big data].
Wherever scholars stand remains up to the individual researcher. It is important that (1) different individuals exist in Social Science departments; (2) they come together as an academic collective within the same organizational unit to talk about new data and methods; and (3) that these conversations result in developing hands-on strategies to conquer the digital world for their research. Only an informed collective of academics will be able to meet student demands when addressing big data problems and applications in politics and international relations.
* All views voiced in this blog represent the views of the author. This blog has been written after the author completed training in webscraping using python and machine learning taking steps towards becoming a more informed social science chameleon. The training was made possible by the British Academy’s Talent Development Award (TDA21210018).