For many of us in anthropology, the advent of “big data” represents a threat. Why, after all, spend months developing rapport and interviewing 100 people when you can run sentiment analyses on 40 million tweets in a matter of hours? Still, I agree with Tricia Wang, who urges us to engage big data and complement that work with our own “thick data.” In “thick data,” the depth of our insights into meaning and interpretation, “the native’s point of view,” could act as a corrective to billions of data points that may “speak for themselves,” as Chris Anderson claimed, but not, perhaps, for people. Ironically, this move to “thick data” was enabled by the gradual choking off of access to data through social media APIs. Facebook, Instagram, Twitter: one by one, social media platforms began limiting third-party access to their data, under the cover of protecting users from infringements on their privacy. Well, not all third-party access. Corporations and select researchers still manage to maintain access to the “firehose” of user data in social media, while the rest of us have to make do with whatever limited sets of data we can access. For some platforms (e.g., Facebook), access has ceased altogether. You can still gain access to much of this proprietary data through scraping, but that’s not an ethical research practice for anthropology. So, I’ve worked towards my “thick data,” using the limited data I can download from platforms like Twitter to broaden the “deep” data I’ve been getting from more traditional, ethnographic methods.
This has proven useful for community-based ethnographic work, and I’ve applied it to studies of neighborhoods in Baltimore, in Seoul, and elsewhere, resulting in articles and a co-authored monograph (“Networked Anthropology”) explaining the advantages of this mixed-methods approach to community-based, participatory research strategies. I’ve also worked on multiple grants with the National Park Service using the same approach. There, the park itself is the focus of social media investigation, with the ultimate goal of identifying community stakeholders and their connections to the park.
However, in early 2021, after the introduction of a new API, Twitter allowed academics to apply for an academic track with access to 10 million tweets per month. While this is not full access, it certainly moves my possibilities more into the realm of big data. And this raises all sorts of new problems and possibilities. While my work has utilized some basic metrics (centrality measures, word frequencies, descriptive statistics), the scale of data I now have access to requires a different set of empirical tests and, perhaps, a different class of questions. Ultimately, I wonder whether it is even possible to ask similar kinds of questions of these data. Can they tell me, for example, about the meaning of place? About the ways people interpret their worlds? The challenge for me is to bridge “thick” and “big” data.
But the big challenge (and opportunity) here is for anthropology. While no strangers to quantitative methods, we still generally do not work with large data sets. These have been inimical to the “small societies” approach that characterized anthropology in the early twentieth century. So what will anthropology become in this environment?