Tracing the traces in online spaces

Soldering Time Fun!

Each of this week’s recommended links is about getting down and dirty with the technological details of internet communication.

In a new paper, Geiger and Ribes offer a compelling picture of Wikipedia’s “vandal fighting” editors that largely departs from the existing literature. By engaging with the day-to-day practices of the vandal fighters, the researchers learned to make meaning of an overwhelming heap of Wikipedia data in order to reconstruct the scene of a malicious user being banned.

Joel Spolsky usually writes for an audience of computer programmers, and this essay about character encoding is no exception. In and among the technical details, however, Spolsky’s history of the digitized alphabet is a parable about the growing pains of a global computing network. A single byte, written as two hexadecimal digits, can represent 256 unique values: plenty of space for American engineers to store 26 lowercase letters, 26 uppercase letters, 10 Arabic numerals, and a handful of punctuation marks. But what happens when we start to trade files with colleagues overseas? How do today’s software designers account for the thousands upon thousands of characters used around the world?
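To make the byte arithmetic concrete, here is a minimal Python sketch (not taken from Spolsky’s essay). It assumes UTF-8 as the modern encoding, and the helper names `utf8_bytes` and `fits_in_ascii` are made up for illustration. It shows that a plain English letter still fits in one byte, while characters from other writing systems need two or three.

```python
# One byte is two hexadecimal digits, 0x00 through 0xFF.
VALUES_PER_BYTE = 16 ** 2  # 256 distinct values


def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character (hypothetical helper)."""
    return ch.encode("utf-8")


def fits_in_ascii(ch: str) -> bool:
    """Report whether the character can be stored in the original ASCII set."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Sample characters, written as escapes so the source file itself stays ASCII:
# Latin capital A, e with an acute accent, and the CJK character U+65E5.
SAMPLES = ["A", "\u00e9", "\u65e5"]

if __name__ == "__main__":
    print("values per byte:", VALUES_PER_BYTE)
    for ch in SAMPLES:
        encoded = utf8_bytes(ch)
        print(
            f"U+{ord(ch):04X}",
            "ascii" if fits_in_ascii(ch) else "non-ascii",
            encoded.hex(),
            f"{len(encoded)} byte(s)",
        )
```

The variable-length design is what lets old ASCII files remain valid while leaving room for everyone else’s alphabet.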

Finally, Asheesh Laroia runs a fascinating workshop about web scraping at PyCon, the annual gathering of Python programmers. In this play-along-at-home presentation, he walks the audience through a variety of tools and techniques for automating data collection from nearly any resource on the web. Novice programmers should feel comfortable jumping right in: Laroia provides plenty of example code to play with.

  • Laroia, A. (2009) Scrape the Web: Strategies for programming websites that don’t expect it. [Video] PyCon, Chicago, May 8. Retrieved from:

(If you’re not yet a programmer but want to learn, Python is a great language for beginners. If you’re looking for an introductory book, try Think Python.)
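For a taste of what scraping looks like, here is a minimal sketch that uses only Python’s standard library. It is not Laroia’s code, and his workshop covers other tools. The `LinkCollector` class, the `extract_links` function, the sample page, and the example.com address are all invented for illustration. It parses a small HTML string held in memory, so it runs without touching the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> tag in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    """Return all absolute link targets found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# A made-up page standing in for a real download.
SAMPLE_HTML = """
<html><body>
  <a href="/about">About</a>
  <a href="https://example.org/post?id=1">Post</a>
  <a name="anchor-only">No href here</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "https://example.com/index.html"):
        print(link)
```

To scrape a live site, you would fetch the page text first (for example with `urllib.request.urlopen`) and pass it to `extract_links`, after checking that the site’s terms allow automated access.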


  1. Kevin, I skimmed the Geiger and Ribes article. It was especially exciting to see an even wider deployment of Hutchins’ “distributed cognition” (DC) concept. I still have to read it properly and then mull over how they harness the concept, and my a priori critical worry, as always, is: how far can we stretch a theoretical construct to fit a phenomenon without undermining the construct in the first place? Or maybe theoretical constructs are meant to be harnessed (duh!)? I think the key difference they are hinting at between navigation on ships and botty Wikipedians is that DC in the former makes the work of the non-human agents “directly observable” (p. 4 of the article), whereas in the latter the work of the non-human agents forms a “largely invisible infrastructure”. Again, we have to ask: visible for whom, and at what level of analysis? Further, I worry about their stronger claim connecting DC to epistemological standards. I am not sure this claim is necessary to their argument, and it might make the argument vulnerable to criticism from philosophers and cognitive scientists. Finally, I think their section on redistributing moral agency is particularly and uniquely relevant to this area of enquiry, and therefore interesting.

    Again, I still have to read it in depth, so all my claims above are tentative. But thank you for submitting this reading! Hope DC lives on!
