

Showing posts with the label betweenness centrality

Anthropology on the Long Tail

Small Big Data? Of the many hyperbolic predictions in bestselling books devoted to big data, none is more astounding than Mayer-Schönberger and Cukier’s claim that big data will eliminate the need for sampling (why sample when you’ve got all the data?). But here’s the thing: we don’t have all of the data. Let’s look at Twitter. First, people who tweet are not a representative sample of the population. Second, like most commercial platforms, Twitter has moved towards more proprietary policies on the data it has mined from us. Most of us can only access up to 1% of relevant tweets for a given query. That can still be a lot of tweets, and that data is, for the moment, free. But is that big data? In other words, we’ve got sampling bias. If you can detect it, though, you can correct for it: Morstatter et al. recommend bootstrapping the data in order to correct for the biased sample. But it may not be so easy with some of the work we do. For example, t...
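For readers who have not met bootstrapping before, here is a minimal sketch of the general idea. It is a generic percentile bootstrap, not a reproduction of the procedure Morstatter et al. describe. The hashtag counts are made up for illustration, and `bootstrap_share` is a hypothetical helper name. Resampling the tweets you do have, with replacement, gives a range of plausible values for a statistic such as a hashtag's share; a value far outside that range is a hint that the sample may be skewed.

```python
import random

def bootstrap_share(sample, term, n_resamples=1000, seed=0):
    """Percentile bootstrap interval (roughly 95%) for the share of `term` in `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    shares = []
    for _ in range(n_resamples):
        # Draw n items with replacement from the observed sample.
        resample = rng.choices(sample, k=n)
        shares.append(resample.count(term) / n)
    shares.sort()
    low = shares[int(0.025 * n_resamples)]
    high = shares[int(0.975 * n_resamples) - 1]
    return low, high

# Hypothetical data: hashtags pulled from a 1% sample (invented for illustration).
sample = ["#bigdata"] * 30 + ["#anthropology"] * 50 + ["#networks"] * 20

observed = sample.count("#bigdata") / len(sample)
low, high = bootstrap_share(sample, "#bigdata")
print(f"observed share: {observed:.2f}, bootstrap interval: [{low:.2f}, {high:.2f}]")
```

Note that the interval only describes variability within the sample you were given. It cannot tell you about people who never tweet at all.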