Showing posts with label Big Data. Show all posts

Wednesday, May 26, 2021

Anthropology and the Twitter Challenge

For many of us in anthropology, the advent of “big data” represents a threat.  Why, after all, spend months developing rapport and interviewing 100 people when you can run sentiment analyses on 40 million tweets in a matter of hours?  Still, I agree with Tricia Wang, who urges us to engage big data and complement that work with our own “thick data.”  In “thick data,” the depths of our insights into meaning and interpretation, “the native’s point of view,” could act as a corrective to billions of data points that may “speak for themselves,” as Chris Anderson claimed, but not, perhaps, for people.  Ironically, this move to “thick data” was enabled by the gradual choking off of access to data through social media APIs.  Facebook, Instagram, Twitter - one by one, social media platforms began limiting third-party access to their data, under the cover of protecting users from infringements on their privacy.  Well, not all third-party access.  Corporations and select researchers still manage to maintain access to the “firehose” of user data in social media, while the rest of us have to make do with whatever limited sets of data we can access.  For some platforms (e.g., Facebook), access has ceased altogether.  You can still gain access to much of this proprietary data through scraping, but that’s not an ethical research practice for anthropology.  So, I’ve worked towards my “thick data,” using the limited data I can download from platforms like Twitter to broaden the “deep” data I’ve been getting from more traditional, ethnographic methods.

This has proven useful for community-based ethnographic work, and I've applied it to studies of neighborhoods in Baltimore, in Seoul, and elsewhere, resulting in articles and a co-authored monograph (“Networked Anthropology”) explaining the advantages of this mixed-methods approach to community-based, participatory research strategies.  I’ve also worked on multiple grants with the National Park Service using the same approach.   There, the park itself is the focus of social media investigation, with the ultimate goal being the identification of community stakeholders and their connections to the park.  

However, in early 2021, after the introduction of a new API, Twitter allowed academics to apply for an academic track with access to 10 million tweets per month.  While this is not full access, it certainly moves my possibilities more into the realm of big data.  And this raises all sorts of new problems and possibilities.  While my work has utilized some basic metrics (centrality measures, word frequencies, descriptive statistics), the scale of data I now have access to requires a different set of empirical tests and, perhaps, a different class of questions.  Ultimately, I wonder if it is even possible to ask similar kinds of questions of these data.  Can they tell me, for example, about the meaning of place?  About the ways people interpret their worlds?  The challenge for me is to bridge “thick” and “big” data.
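For readers unfamiliar with the “basic metrics” mentioned above, here is a minimal sketch in Python of two of them: word frequencies and degree centrality in a mention network.  The three tweets, the user names and the function names are all hypothetical, invented for illustration; this is not the workflow used in the studies described here, only a picture of what such measures look like at small scale.

```python
from collections import Counter
import re

# Hypothetical sample: (author, text) pairs standing in for downloaded tweets.
tweets = [
    ("ana", "Morning walk in the park with @ben #baltimore"),
    ("ben", "The park cleanup is Saturday @ana @cho #baltimore"),
    ("cho", "See you at the park @ben"),
]

def word_frequencies(tweets):
    """Count lowercase word tokens, skipping @mentions and #hashtags."""
    counts = Counter()
    for _, text in tweets:
        for token in text.lower().split():
            if token.startswith(("@", "#")):
                continue
            # Strip punctuation, keeping letters and apostrophes.
            word = re.sub(r"[^a-z']", "", token)
            if word:
                counts[word] += 1
    return counts

def mention_edges(tweets):
    """Build undirected edges between each author and the users they mention."""
    edges = set()
    for author, text in tweets:
        for mentioned in re.findall(r"@(\w+)", text):
            if mentioned != author:
                edges.add(frozenset((author, mentioned)))
    return edges

def degree_centrality(edges):
    """Degree of each node divided by the number of other nodes."""
    nodes = {n for edge in edges for n in edge}
    degree = Counter(n for edge in edges for n in edge)
    denom = max(len(nodes) - 1, 1)
    return {n: degree[n] / denom for n in nodes}

freqs = word_frequencies(tweets)
centrality = degree_centrality(mention_edges(tweets))
```

In this toy network, “ben” is mentioned by or mentions everyone else and so has the highest centrality.  At the scale of millions of tweets the same measures can still be computed, but what they mean for questions of place and interpretation is exactly what remains open.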

But the big challenge (and opportunity) here is to anthropology.  While no stranger to quantitative methods, we still generally do not work with larger data sets.  These have been inimical to the “small societies” approach that characterized anthropology in the early twentieth century.  So what will anthropology become in this environment?  

Wednesday, January 18, 2017

The Partial Truths of Big Data


Last July I was using R to do some social network analysis of Instagram tags.  After lots of package downloads, App Developer’s applications, etc., I couldn’t get it to work, only to discover that Instagram had changed its policy the month before.  Like many social media platforms, Instagram had restricted access to data through its API (Application Programming Interface).  For some, this could be welcome news.  After all, third-party developers having untrammeled access weakens privacy and serves to expose more and more of our lives to commodification.

But this isn’t the whole story.  Just because I (a researcher at a mid-tier state university) was having trouble gaining access doesn’t mean that large corporations were having trouble, or the National Security Agency, or Instagram itself.  Rather, what we’ve seen with the rise of Big Data as a research object is the progressive commodification of social media.  The social network analysis that began as a recondite branch of anthropology, sociology and mathematics has become an indispensable tool in business development.  Social media data are money, and the tightening of restrictions represents another digital divide, this one between the corporations and governments that can gain access to the “firehose” of complete data and the rest of us, who work with a fraction of that under whatever restrictions are placed upon data access through APIs.  In this latest chapter of the digital divide, some people (and entities) get Big Data, and some of us get “partial” data.

This has prompted some scholars to question the involvement of academics in Big Data analysis in the first place: “How much of a difference does it make for academics to gain access to Big Data, after all, when the logics of commercial enclosure of social media data may [have] already begun to run deep?” (Chan 2015: 1080).  It certainly doesn’t look good for cultural anthropologists, whose “n” in a research study rarely exceeds one hundred.  Compare that to the 2016 update of a 2011 study from Facebook that looks at social distance and weak ties among its 1.5 billion users, concluding that the average geodesic distance between any two of them is about 3.57 “degrees of separation” (Bhagat et al. 2016).  It would be hard for anthropologists to compare their work to this.  And yet, as Tricia Wang (2013) has reminded ethnographers, we have little choice but to work with the Big Data science around us: “Otherwise our work will be all too easily shoved into another department, minimized as a small line item in a budget, and relegated to a small data corner” (Wang 2013).  One strategy here is to point out the obvious.  “Big Data” (however construed) does not interpret itself; it needs context, theory, narrative: in other words, the work of anthropology.  In their often-cited 2012 paper, danah boyd and Kate Crawford urge researchers to critically engage the emergent hegemony of Big Data by pointing to the limits of the data these social media platforms aggregate.  “Do numbers speak for themselves?  We believe the answer is ‘no’” (boyd and Crawford 2012: 666).
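To make “geodesic distance” concrete: it is the number of links on the shortest path between two people in a network, and the “degrees of separation” figure is an average of that distance over pairs of users.  The sketch below, in Python, computes it exactly for a tiny, hypothetical friendship graph using breadth-first search.  The graph and function names are invented for illustration, and this is not a description of how Bhagat et al. arrived at their figure; an exhaustive computation like this one would not be practical for a network of more than a billion users.

```python
from collections import deque

# Hypothetical friendship graph: each person maps to the set of their friends.
friends = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

def distances_from(graph, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_geodesic_distance(graph):
    """Mean shortest-path length over all ordered pairs of connected, distinct nodes."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in distances_from(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

average = average_geodesic_distance(friends)
```

In this four-person chain the average works out to 20/12, or about 1.67.  The number summarizes how closely people are linked, but says nothing about what those links mean to them.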

But this means more than stressing the importance of history and political economy to the quanta of data we emit.  We need to ask more subversive questions.  What kinds of numbers are generated in the space of social media?  What, for example, does Facebook know about me?  On the one hand, it undoubtedly knows a great deal.  Not only am I updating Facebook with personal information (photos from family trips, political opinions), but I’m also “liking” groups, causes, music, etc. on Facebook and, furthermore, Facebook harvests cookies from my non-Facebook internet perambulations in order to “better serve me” advertising targeted to my demographic and political leanings.

But none of this, I would suggest, is really “anthropological” data.  Instead, it’s consumer data: information about what I buy, and what I might be tempted to buy.  It’s tempting to leap from this to insights into culture, society and social action, but that’s not really what Facebook is collecting.  The numbers are numbers about consumers: users who click on links, who link to each other, who can be profiled in order to sell more.  When we do other things on Facebook (“like” a group, or respond to efforts to organize for a cause), we do so through a consumption frame.  Not surprisingly, this has led to several critiques of slacktivism: it looks like consumption without a credit card number.  In any case, Facebook data is not, as Boellstorff put it, “raw data.”  Instead, it has already been thoroughly “cooked”: framed as data emanating from an individual consumer (Boellstorff 2013).

As far as Facebook is concerned, though, this is all that’s important.  Facebook thinks it knows the whole truth, and, from the perspective of an enormous, monopolistic corporation, it knows all it needs (or cares) to know about my identity, habits and social relations.  And yet, it does not.  The emergent, the collective, the alternative, the subaltern, becoming-animal, the multitude—Facebook will never start the revolution, because Facebook can only know our social lives through the reified perspective of commodification.  Of course, activists have utilized Facebook (and other social media) for their work, but they do this in spite of the platforms themselves, media frames that will gamely struggle to track shopping and supply advertising to even the most ardent revolutionary’s account.  Big Data, then, is always “partial” data.

In other words, Facebook (and other social media) disclose “partial truths.”  I take this term from Clifford’s often-cited (and often excoriated) introductory essay to “Writing Culture,” a collection of essays that is widely credited with ushering in anthropology’s “postmodern” age.  There, Clifford (1986: 10) focuses attention on the ways ethnographic accounts “construct” culture and, in particular, the ways genre conventions both enable and delimit anthropological truth:
"'Cultures' do not hold still for their portraits.  Attempts to make them do so always involve simplification and exclusion, selection of a temporal focus, the construction of a particular self-other relationship, and the imposition of a power relationship."
In focusing on the constructedness of the ethnographic encounter, Clifford led a generation of anthropologists to experiment with the ethnographic form and to reflect on their dyadic, field encounters.  But by directing our attention to the dyadic encounter, he deflects our attention from other contexts, among them political economy, social activism, postcolonial struggle and the work of the different communities in which anthropologists site their work.  As many critics have since concluded, anthropology is only in the last (and reified) instance, the ethnographic representation of a dyadic encounter.

There is, nevertheless, truth in Clifford, but it is a truth that serves to conceal other truths.  As Taussig writes of magic in general, “The real skill of the practitioner lies not in skilled concealment but in the skilled revelation of skilled concealment” (Taussig 2003:273).  A momentary glimpse into one secret serves to conceal another; for anthropology, the truth of ethnography served to conceal the onslaught of neo-liberalism.  This is where we can re-define Clifford’s titular perspective: not just a “part,” and not just biased, but a truth that obscures other truths.

With Big Data, the magic is the same.  There are truths to Big Data, but the focus upon them obscures other insights that may lead us to critical alternatives.  The same theories and methods that graph connected action and aggregate millions of data points also serve to deflect the eye from local process, or from action that unfolds over a longer timeline, or non-episodic phenomena that continue without defining “events”.    

In his 2011 book, Rob Nixon introduced the concept of “slow violence,” “a violence that occurs gradually and out of sight, a violence of delayed destruction that is dispersed across time and space, an attritional violence that is typically not viewed as violence at all” (2).  Ordinary violence—along with other temporally discrete phenomena—is particularly amenable to social media.  How many examples of police violence, for example, have been rendered visible through their felicitous recording on smartphones, the resulting videos uploaded to Facebook?  But slow violence proceeds without these—incremental tragedy impacting health, education and psychology.  Nixon concentrates his analysis on the slow violence of environmental degradation, and, particularly, on the ways that marginalized communities suffer through policies that enable corporations and governments to concentrate pollution in communities that cannot defend against it.  But slow violence can take many other forms, including processes of structural violence, de-industrialization, de-funding, under-development, infrastructure decay, pathologization.  None of these may spark social media storms, but these “slow” processes have the same, calamitous consequences in neighborhoods in both urban and rural areas.

This is where the data of anthropology and the “Big Data” available through social network analysis seem to diverge the most, but the onus is upon us to attempt to identify the lacunae and, when possible, use our methodological understandings to move in these interstices.  And it can mean using Big Data in ways contrary to the social media platforms that aggregated it in the first place—e.g., researching food deserts through Instagram (Beck 2016).  It is, however, not an easy task to take images that reflect the commodification of daily life and the drive towards the “quantified self” and appropriate them to advance social justice.  And it is here where the ethnography that seemed so beside the point suddenly becomes vital.

References

Beck, Julie (2016).  “The Instagrams of Food Deserts.”  The Atlantic [accessed on November 1, 2016 at www.theatlantic.com].

Chan, Anita (2015).  “Big data interfaces and the problem of inclusion.”  Media, Culture & Society: 1080-1086.

Bhagat, Smriti, Moira Burke, Carlos Diuk, Ismail Filiz and Sergey Edunov  (2016).  “Three and a half degrees of separation.”  Facebook Research [retrieved from research.fb.com on November 10, 2016].

Boellstorff, Tom (2013).  “Making big data, in theory.”  First Monday 18(10).  [Retrieved at firstmonday.org on January 6, 2017].

boyd, danah and Kate Crawford (2012).  “Critical Questions for Big Data.”  Information, Communication & Society 15(5): 662-679.

Clifford, James (1986).  “Partial Truths.”  In Writing Culture, ed. by James Clifford and George Marcus.  Berkeley: University of California Press.

Nixon, Rob (2011).  Slow Violence and the Environmentalism of the Poor.  Cambridge: Harvard University Press.

Taussig, Michael (2003).  “Viscerality, Faith, and Skepticism.”  In Magic and Modernity, ed. by Birgit Meyer and Peter Pels, pp. 272-306.  Stanford: Stanford University Press.

Wang, Tricia (2013).  “Big Data Needs Thick Data.”  Ethnography Matters [retrieved from ethnographymatters.net on September 3, 2013].


Sunday, March 29, 2015

Searching for the Anthropological Alien

An eminently sensible article in today's New York Times from Seth Shostak, the Director of SETI and a tireless advocate for our continuing quest to find intelligent life beyond the Earth.  But not just that: he's also been a leader in the continuing discourse of what each of the terms in the acronym "SETI" should mean: what kind of search?  Where?  And what should constitute "intelligence"?  This time, he's weighing in on a debate over actively courting extraterrestrial neighbors by broadcasting transmissions into space.  What should we say?  And shouldn't we be more careful?  Perhaps extraterrestrial intelligence will be less-than-impressed with the ravages that modernity and capitalism have wrought.  Or perhaps they'll see our various weaknesses, and swoop down to attack!  These arguments, Shostak suggests, have more to tell us about contemporary, Hollywood scripts than about the intentions of aliens, and he counters with another, suitably contemporary, proposal: send the aliens Big Data!

But this Big Data approach to SETI (Big Data SETI?) seems just as implicated in our vision of human futures as any Hollywood evocation of alien invasion.  "Big Data" seem poised to solve all of our problems, and it was just a matter of time before the idea came up in the context of extraterrestrial life.  And this is ok.  Unavoidably, SETI is about communicating with humans--today.  Each SETI proposal, each new Arecibo project, is potentially data about extraterrestrial intelligence, but also data about terrestrial intelligence.  As Kant writes (and as David Clark expertly annotates),
"The highest concept of species may be that of a terrestrial rational being [eines irdischen vernünftigen], but we will not be able to describe its characteristics because we do not know of a nonterrestrial rational being [nicht-irdischen Wesen] which would enable us to refer to its properties and consequently classify that terrestrial being as rational. It seems, therefore, that the problem of giving an account of the character of the human species is quite insoluble [sie schlechterdings unauflöslich], because the problem could only be solved by comparing two species of rational beings on the basis of experience, but experience has not offered us a comparison between two species of rational beings."
To put it another way--we have already given Kant his aliens, and each SETI experiment is simultaneously an encounter with an extraterrestrial rationality with which to measure ourselves.  As we move from SETI@home to what will undoubtedly be fascinating experiments with Big Data, we uncover more and more of our own assumptions about intelligence and communication, and our own concern about the intentions of the humans and nonhumans around us.  In this case, "we" (keeping in mind this is hardly a universal "we") worry about the messages we're sending, the networks we're forming.  The albatross of Big Data around our necks continues to compel us (like the Ancient Mariner) to tell the governments and institutions around us everything about ourselves, all of the time.  Do we really want aliens mining our Big Data?  Do we really want the terrestrial, non-human agents around us (search engines, social network analysis, etc.) to mine our Big Data?



      

Friday, June 20, 2014

Poor Data, Rich Data, Big Data, Chief

Over the past 2 years, Big Data has worked its way into public consciousness, courtesy of widespread news exposure and a series of popular books by Big Data scientists with hyperbolic evocations of the analytic power of their methods.  There seems to be nothing that Big Data cannot do: predict health and wellness, illuminate culture change, stop poverty, foil terrorists.  And, of course, tighten the noose of Foucauldian surveillance from governments and corporations.  But what all of these accounts promise (or threaten) is a transparent window onto truth: our social lives, behaviors, hopes and dreams all rendered transparent through the analysis of vast datasets.
Visualization of all editing activity by user. Image courtesy Fernanda B. Viégas and wikicommons
Many qualitative researchers, anthropologists included, have sounded an alarm over this drive to datafication, where, as Chris Anderson has famously concluded, “numbers speak for themselves.”  If Data Scientists can tell us what everyone is doing and what everyone is thinking, what need is there for 60 in-depth interviews and two years of participant observation?  As Tricia Wang asks, “What are ethnographers to do when our research is seen as insignificant?”  What are we to do, in other words, when community relationships that we painstakingly elucidate over months of field research can be scraped from social media in a few minutes?
For Wang, the answer is to engage Big Data, and to make ethnographic research relevant in a world of hyper-quantification.  danah boyd and Kate Crawford (2012) make some of the same points, additionally going on the offensive by exploring the assumptions underlying the drive to Big Data.  Do numbers really speak for themselves?  And does having all the data mean that you have privileged access to all the facts?
But these questions should be familiar to cultural anthropologists; we are no strangers to Big Data.  While we haven’t generally dealt with millions of data points, the hyperbolic claims of Big Data echo the hubris of anthropology in its contact with small societies.  By looking back on these earlier methodologies, we might reconceptualize Big Data as another chapter in what Walter Mignolo has called the “enduring enchantment” of modernity.
In 1898, Alfred Cort Haddon and his team (which included Charles Seligman and W.H.R. Rivers) set out on an expedition to the Torres Strait islands off the coast of New Guinea.  With broad goals for their field surveys, including salvage anthropology, experimental psychology, linguistics and physical anthropology, the team quickly amassed huge amounts of field data, enough for 6 huge volumes.  Along with these compendia, the team additionally developed novel methodologies, with W.H.R. Rivers’s “genealogical method” being the best remembered (as well as the most excoriated).
In order to compensate for his ignorance of native languages, and for the shallowness of the expedition’s contact, Rivers began asking people (in pidgin English and through interpreters) for the names of their “father,” “mother,” “husband,” “wife,” etc., never mind that these terms were a priori mired in his British, middle-class assumptions about filiation and descent.  Surprised by the impressive, genealogical memories of his informants, he was able to generate vast amounts of “data” using this ham-fisted approach, including “complete” records for some of the islands the Torres Strait team surveyed.  From that data, he was able to generate numerous insights into marriage, naming practices, fertility, “totemistic systems,” and even history and culture change.  In other words, without engaging people in real conversations about their lives, and without actually observing islander life, Rivers believed he could apprehend the “whole” of Torres Strait culture and society through applications of his “concrete” method.
The Genealogical Method of Anthropological Inquiry by  W. H. R. Rivers, 1910. Image courtesy the Sociological Review
Big Data starts from many of the same assumptions.  Without direct windows onto people themselves, Big Data scientists harvest proxy data from the residue of our complex lives in information society.  Do you want to know if people are getting sick?  You could ask people, and observe their behavior, but you could also (as with Google Flu Trends) compile search data on symptoms.  Or do you want to know about the mobility of people in cities?  You could interview people and follow them on their daily rounds, or you could, as Barabási and his team did, analyze the billing records from 100,000 cell phone users in order to generate maps of movements over a 6-month period.
Is it specious to compare huge datasets from Google with Rivers’s collected genealogies?  Both proceed from the same assumptions about the whole.  After all, anthropological research on small populations of people living in putative isolation on islands was premised on the assumption that one could collect and understand everything about a simple society.  Big Data builds a similar edifice upon massive computing power and the integration of networks.  For Google, Flu Trends provides a window onto vectors of illness because it collects the whole of Google search data: an island, as it were, secured by a near-monopoly over Internet traffic.  In addition, the problems of the genealogical method are the problems of proxy data in general.  Massive data can be collected, analyzed and correlated, but what do these data describe?  When Rivers asks the Torres Strait islanders who their “proper” father is, how useful are those data?  And if he’s managed to solicit genealogies out to five generations, what insights might he derive from these facts?
Of course, Big Data scientists debate the suitability of data proxies, but it would be a mistake to assume that we have nothing to add to that argument.  Moreover, anthropologists have a long history of questioning the synecdochic fallacy.  Is kinship the foundation for society?  Can we understand the whole of society by considering key institutions like kinship, subsistence and exchange?  And what does it mean to understand the “whole” to begin with?  These are ultimately the questions to pose to Big Data: if I collect all of the tweets (as the Library of Congress is doing), can I now understand how people live in the city?  Or how they relate to other people?  Or is there always some destabilizing meaning that lies between these hundreds of terabytes?
Most of all, we can utilize our own experiences to reflect on Big Data as a technological imaginary.  Why do we think it’s desirable to collect all of the data?  What do we imagine the truth of the whole to be?


Cybernetics and Anthropology - Past and Present

I continue to wrestle with the legacy of cybernetics in anthropology - and a future premised on an anthropological basis for the digital.  ...