Showing posts with label Generative AI.

Sunday, January 5, 2025

Network Ghosts in the Age of Generative AI

 

What are faculty thinking about generative AI? In my role at our faculty center, I often speak with faculty about the problems they face teaching in the era of AI and the workarounds they've come up with. The advent of publicly available generative AI platforms was not something people in my field (anthropology) or other faculty in the social sciences and humanities were clamoring for. And yet here we are. This has led to many responses: anguish, certainly, but also ways of incorporating--or at least channeling--the usage of generative AI in the classroom.

But what about faculty outside of my university? I used NodeXL to download Reddit data from the "/Professors" subreddit using the keyword "AI." This generated records of about 2500 users posting, commenting or replying, for a total of 7000 contributions to the debate. I then grouped the data into clusters of similar postings and abstracted the top words from each group as indicated by "up-votes" (which function more like a "like" on Reddit). As you can see, faculty were not particularly optimistic about AI in 2024. Yes, there were a couple of more computopian posters (and at least one computer scientist) who chided the community for rejecting what they saw as inevitable. But most worried that their efforts to teach writing, critical thinking, methodology and analysis were being thwarted by student reliance on generative AI. Cynically, they predicted their university's tolerance for AI cheating, and speculated about their ability to continue as faculty under these conditions.
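
For readers who want to attempt a similar pull without NodeXL, here is a minimal Python sketch, assuming the PRAW library and Reddit API credentials; the placeholder credentials, the search limit, and the edge definition are my illustrative assumptions, not the exact NodeXL settings used above.

# A rough Python equivalent of the NodeXL Reddit import, using the PRAW library.
# The credentials are placeholders and the search limit is illustrative.
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",          # placeholder
    client_secret="YOUR_CLIENT_SECRET",  # placeholder
    user_agent="professor-ai-network-sketch",
)

edges = []  # (commenter, person being replied to) pairs for the reply network
for submission in reddit.subreddit("Professors").search("AI", limit=200):
    op = submission.author.name if submission.author else "[deleted]"
    submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
    for comment in submission.comments.list():
        commenter = comment.author.name if comment.author else "[deleted]"
        parent = comment.parent()
        if isinstance(parent, praw.models.Comment):
            target = parent.author.name if parent.author else "[deleted]"
        else:
            target = op  # top-level comments reply to the original poster
        edges.append((commenter, target))

print(len(edges), "reply edges collected")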

In 2024, Reddit sold its content to Google to train Google's large language models. This would have been more objectionable, perhaps, if it weren't already abundantly clear that generative AI models had already been trained on Reddit, which maintains a relatively open API at a time when most social media platforms have monetized their social network data. But what happens to that Reddit data when it's re-constituted by generative AI? I decided to prompt Microsoft's Co-Pilot (to which I have enterprise-level access) to generate a spreadsheet of a Reddit conversation on AI between professors. Here's the prompt: "I would like you to generate an excel file similar to a Reddit conversation on a subreddit called "professors." The posts should discuss ChatGPT and student work from the perspective of the professor, and should include comments and replies to those comments. There should be 4 columns in the spreadsheet: A (person commenting or replying); B (person whom A is replying to); C (the text of the comment or reply); and D (the date of the reply or comment). Please populate the spreadsheet with at least 20 comments and 350 replies to those comments."
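
Since columns A through D amount to an edge list (who replied to whom), the Co-Pilot output can be pulled straight into network software. Here is a minimal sketch of that step, assuming pandas and openpyxl; the filename and the presence of a header row are my assumptions about the generated file, not properties Co-Pilot guarantees.

# A minimal sketch of turning the Co-Pilot spreadsheet into an edge list.
# The filename and the header-row assumption are hypothetical.
import pandas as pd

df = pd.read_excel(
    "copilot_professors_thread.xlsx",            # hypothetical filename
    names=["source", "target", "text", "date"],  # columns A-D from the prompt
    header=0,                                    # assumes Co-Pilot wrote a header row
)

# Each row is one contribution: "source" wrote "text" in reply to "target".
edge_list = df[["source", "target"]].dropna()
print(edge_list.head())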

Co-Pilot returned a network with just 10 users, with 350 edges representing multiple re-postings(?) of user posts. Re-posting really isn't a thing on Reddit, so perhaps there's some confusion here with XTwitter. Since this is a much smaller network, I just labeled the 10 nodes with key words from their posts. The comments are a near "upside down" of the actual Reddit discourse over 2024, generally praising the efficiencies of generative AI and, when critical, speculating over the need for faculty at all (hence the precarity). Of course, there's a snarky comment on "Clippy," the irritating Microsoft assistant. The network itself, while smaller, is also structurally different. The actual Reddit network has a density of .001158737. In network measures of density, "1" would represent 100% connection--everyone connected to everyone else. So .0012 may not seem like much, but it's typical of social media networks where, after all, most of us don't feed the trolls and we save our replies for issues (and users) that we really care about. On the other hand, my AI-generated network has a density of 0.966666667--an almost perfectly connected network where everyone has replied to everyone else in the style of a polite and ploddingly inclusive panel discussion.
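
For those who want the arithmetic behind those two figures, here is a minimal sketch, assuming directed graphs, where density is observed ties divided by the N*(N-1) possible ties; the edge counts below are my rough reconstructions from the reported densities, not NodeXL's exact tallies.

# Density = observed ties / possible ties, for a directed graph with no self-loops.
def directed_density(n_nodes: int, n_edges: int) -> float:
    return n_edges / (n_nodes * (n_nodes - 1))

# Actual r/Professors network: ~2,500 users; roughly 7,240 unique reply ties
# recovers the reported 0.00116 -- a typical, sparse social-media network.
print(directed_density(2500, 7240))   # ~0.00116

# Co-Pilot's simulation: 10 users with 87 of the 90 possible ties present
# recovers the reported 0.9667 -- a nearly complete "panel discussion."
print(directed_density(10, 87))       # ~0.9667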




So, I guess that Co-Pilot does a lousy job simulating a subreddit? Yes, but, I think, more than that. It wasn't that long ago (2023) that XTwitter adopted a fee-based model for API access. That decision placed Twitter data beyond the reach of most of us. When social media data disappears behind paywalls, we (ordinary researchers) no longer really have access to the "connected action" of social media. While we can certainly look at social media, this only exposes us to our respective corners of the platforms we inhabit, and the structural dimensions of social media are lost. But what happens when social media content is sold to OpenAI or Google Gemini? When social media disappears into a large language model, both the content and the connections are lost, and the simulated networks produced through generative AI manage to misrepresent social media on both fronts. Since Co-Pilot's inner workings are opaque to us, it is unclear whether these results stem from deliberate choice, unintended bias or something else.

Tuesday, May 28, 2024

Anthropology's Sad AI Archive

 

There are three approaches to generative AI in the classroom: 1) an outright ban on it; 2) a limited-use policy that covers certain assignments or parts of assignments; and 3) an open approach that allows students to do what they will. None of these is fool-proof, whatever the intentions of the professor. Ultimately, generative AI are third-party, black-boxed products–more tempting to students, perhaps, than Wikipedia, but also more treacherous. I feel for my colleagues in the humanities attempting to wrest essays from students on Shakespeare or Aristotle: generative AI is all too good at producing a mediocre essay on these subjects. I also understand my colleagues in the computer and information sciences, who utilize these chatbots to help with their instruction.

 

But with anthropology, there are several caveats. 99.99% of writings on other peoples of the world are drenched in ethnocentrism, colonialism and racism. The internet is awash in complete nonsense about “tribes” and their “traditional culture,” and, in generative AI, all of this is ground up and, like sausage, pumped into prompt-driven content. Yet typically, students don’t know enough to be able to distinguish a “good” from a “bad” response from ChatGPT or Gemini.

 

This is a somewhat longer way of saying that students often tried to utilize generative AI in my introductory assignments and take-home exams, and their grades suffered for it. Not because I was penalizing them for cheating; proving that they’ve used AI is almost impossible, and generative AI detectors are unreliable at best. Instead, the questions I asked were all about the anthropology I had taught in class, and generative AI is, unfortunately, only too willing to spit out all manner of palaver that bears little resemblance to that material. Only someone who already knows what to ask can minimize the racism and colonialism inherent in generative AI engines. The default is ideology. And hallucinations.

 

One thing I want to include next year is some process of education, because I think students genuinely don’t know any better. The least I can do is show them that it’s not so easy and explain why that is–that generative AI is not giving them the “truth.” Or, rather, it is: the truth of colonialism and racism that underlies Western thinking about non-Western peoples. Anthropology’s sad archive. But to someone who’s never taken anthropology before, this material looks correct, and the temptation is too strong, especially in the panicked moments before a deadline.

Cybernetics and Anthropology - Past and Present

I continue to wrestle with the legacy of cybernetics in anthropology - and a future premised on an anthropological basis for the digital.  ...