Book review: "New Digital Worlds" by Roopika Risam

Thanks to some swift book-ordering by Glen Worthey, our Digital Humanities Librarian, Stanford became one of the first universities with a copy of Roopika Risam’s book New Digital Worlds last week. It was recalled before I even picked it up, leaving me with only a week to read it. As it turned out, there was no need to be concerned: Risam has accomplished the rare feat of crafting a monograph that is simultaneously scholarly, engaging, and applicable. Last Thursday, I started reading it on my 2-hour morning commute, continued during lunch, resumed on my 2-hour evening commute, and — with a brief hiatus for a rousing bedtime chorus of favorite cartoon theme songs with two small children (of such things are modern folksongs made) — spent the coveted post-bedtime hours finishing it. It was around lunchtime that I realized that I needed my own copy and ordered it on the spot.

The introduction through chapter 3 feel like a contiguous thread, with chapter 4 pivoting towards the applied through its focus on pedagogy, and chapter 5 taking a different tack to provide a humanities response to data science and associated tools and methods. The relationship between digital humanities and data science is an active area of negotiation, and chapter 5 is a thoughtful contribution to the discussion. It’s now at the top of my list of suggested readings on that topic, though it felt a little tangential while reading through the whole book in one day.

Regardless of what variety of DH you practice — what discipline, what medium, what century, what language — New Digital Worlds has food for thought. One thing I took from reading it is the importance of being clear and precise in articulating what you’re talking about. I've started to be more attentive to sweeping generalizations, either stated or implied, as well as unspoken assumptions about corpora (language, author's gender, etc.). The necessity of explicitly stating these things has become a lens through which I read other scholarship. It's important not to forget that there are materials we won't have access to because they weren’t preserved, and that there are gaps in our understanding of the scope and nature of history, literature, art, culture, and language as a result of colonialist choices that were made by those in power. You should acknowledge the “known unknowns”. If you discover something interesting in a corpus of pre-20th-century novels by Americans, published in major American publishing houses, don’t declare it a finding about “literature” — or even “American novels”. It’s worth thinking through and spelling out (to the extent you can) the materials missing from your corpus. In the Stanford LitLab talks I’ve been to on their “typicality in the novel” project, I appreciate the fact that Mark Algee-Hewitt has always made time to talk through the things their (licensed) corpus doesn’t include, and who gets excluded as a result.

I’ll confess that I requested this book with the expectation of using it for just one week of the course I’m teaching next quarter on non-English textual DH. On Tuesday, January 29th, I was planning to include a shout-out to postcolonial DH as part of a discussion of thematic research collections, in the context of Mukurtu (which itself gets a nice write-up in New Digital Worlds). Instead, what I have in New Digital Worlds feels like an important conceptual scaffolding and a unifying thread for a course that is very much a product of my own engagement with DH: always in an alt-ac / librarian role (in spirit or in fact), strongly oriented towards “doing” and strongly inclined towards leaving the “theorizing” to someone else.

The introduction and chapter 1 provide background, context, and an explanation of postcolonial digital humanities. In situating it relative to other approaches, disciplines, movements, and individual projects, Risam has done the larger community a tremendous service in compiling a substantial bibliography. Alas, it’s undermined by Northwestern University Press’s regrettable insistence on Chicago-style footnotes and no bibliography as such (I asked Risam about it on Twitter, after getting befuddled by one of the footnotes), but so it goes. There were multiple points where it felt like Risam had managed to anticipate my own to-do list for planning out this course, and preempted my meandering attempt to explain various facets of DH culture (e.g. “hack vs. yack”, anti-neoliberalist criticism of DH, “disrupt DH”, etc.) with a clear articulation of key issues, arguments, and individuals.

One small note on chapter 1: the point is well taken that there are widely-used tools and approaches that have been trained on English, and are unusable with other languages. Stanford’s named-entity recognizer, which can be called through the nltk package, is primarily trained on English, though you can separately download German, Spanish, or Chinese models. I was concerned about a reference to the MALLET topic modeling package as one with pernicious results as a consequence of being trained on English — that wasn’t my impression about how LDA topic modeling worked. Following the citation trail, though, it turns out to be a corpus issue: for the “Distant Reading of Empire” project, faculty and students at Swarthmore College used MALLET on 3,000 text files requested from HathiTrust. The presentation notes cited in chapter 1 state that they tried to just get English books but "If MALLET comes across a foreign language it will place all of the words in that language into one topic" — which isn’t exactly what's going on. If MALLET is working with a corpus that’s primarily in English and there happen to be a couple of documents in another language, the words that appear in the topic model that aren’t in English will cluster together because they occur with one another, and do not occur with English words. But if you run MALLET on a text that includes code-switching and free interweaving of languages (like Spanish and English in Gloria Anzaldúa’s Borderlands/La Frontera: The New Mestiza), you get bilingual topics because English and Spanish words appear in close proximity to one another. Where one can raise concerns with MALLET on decolonizing DH grounds is the fact that it presupposes that the “words” that make up the topics are separated by spaces. Chinese texts don’t follow that convention, as one example, and the use of spaces in Arabic also doesn't quite align with the expectations of tools like MALLET or Voyant that rely on spaces.
Luckily, Stanford NLP has a word segmenter for both those languages, but if the language you’re working with doesn’t separate words with spaces (and this includes medieval European documents written in older varieties of languages whose modern forms use spaces), and you don’t have a way to impose that convention on your documents, MALLET and similar implementations of topic modeling aren't available to you.
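A toy sketch in Python may make both points concrete. It is not MALLET's code (MALLET is a Java package with its own tokenizer), and the sentences are invented for illustration rather than taken from Anzaldúa or the Swarthmore corpus. The helper name `cooccurrence_counts` is hypothetical. It simply counts which word pairs share a document, which is the kind of signal a topic model picks up on, and then shows what whitespace splitting does to a sentence written without spaces.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs: list[str]) -> Counter:
    """Count how many documents each unordered word pair shares.

    Words are split on whitespace, which is itself the assumption at issue.
    """
    counts: Counter = Counter()
    for doc in docs:
        vocab = sorted(set(doc.lower().split()))
        counts.update(combinations(vocab, 2))
    return counts


# Invented corpus 1: English documents plus one stray Spanish document.
separate = [
    "the border is a wound",
    "the border is a line",
    "la frontera es una herida",
]
# Invented corpus 2: a single code-switched document.
mixed = ["the border la frontera is una herida a wound"]

sep_counts = cooccurrence_counts(separate)
mix_counts = cooccurrence_counts(mixed)

# English and Spanish words never share a document in the first corpus,
# so nothing ties them together.
print(sep_counts[("border", "frontera")])
# In the code-switched text they do share a document.
print(mix_counts[("border", "frontera")])

# An invented Chinese sentence, written (as is conventional) without spaces.
chinese = "我喜欢阅读数字人文的书"
chinese_tokens = chinese.split()
# Whitespace splitting returns the whole sentence as a single "word".
print(len(chinese_tokens))
```

Running a word segmenter first and joining its output with spaces is one way to impose the convention that space-based tools expect.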

Chapter 2, “Colonial Violence and the Postcolonial Digital Archive”, is what I was looking for and expecting from this book. It includes the discussion of Mukurtu, and — to my surprise — a criticism of NINES. Risam notes that the materials that appear there are mostly centered on canonical white writers from the United States, United Kingdom, and Canada, and “These omissions appear, unseen in their absence, in NINES, which does not include nineteenth-century Anglophone writing from Great Britain’s colonies and underrepresented black and indigenous voices… Yet its connections to colonialism have gone unremarked. Colonial violence in NINES appears in its reinscription of colonial legacies in digital form and the rehearsal of the colonial dynamics of knowledge production that have othered large swathes of the human population. The erasures within NINES are examples of colonial violence that persists in digital humanities scholarship.” (p. 51)

As someone who’s run directories before (populated through manual data entry or one-time bulk imports, rather than acting as an aggregator of data stored elsewhere), the question of responsibility for gaps in that data is of real interest to me. It also calls to mind — as a consequence of my own background as a Slavist — those “famous Russian questions”: “Who’s guilty?” and “What to do?” Are there 19th-century postcolonial projects being submitted to NINES that are being rejected? Do the creators of such projects not even submit them to NINES, due to perceived technical barriers or biases in the approval process? Are people who work in postcolonial DH less likely to be familiar with NINES, or less likely to think of it as a possible venue for dissemination? If it’s an outreach problem, are there things that NINES does in its outreach that are a turn-off to postcolonial DH folks, or is it just that NINES isn’t doing much outreach at all these days? The easiest way to build a resource that draws on information from a wide variety of sources is to work within the networks of people you already know, as it’s difficult to get a general-purpose call for participation to gain traction on Twitter or mailing lists. My own network can largely be traced (in one way or another) back to the first phase of the ill-fated digital humanities cyberinfrastructure initiative Project Bamboo (2008-2010), which has led to a set of connections that are hardly representative of digital humanities as it’s practiced today, and I know I need to work on remedying that.
There’s clearly more that NINES could and should do to be clear about the colonial context of some of its materials, and there's an opportunity to engage with postcolonial DH projects (including the Bichitra Online Tagore Variorum frequently mentioned in Risam's book), but from the paucity of news, my sense is that NINES is making do, keeping the site running, but the project doesn't currently have funding to do much more than that. If rethinking, reframing, and reworking how to better cover a wider range of 19th-century scholarship and acknowledge what’s been lost to colonialism is off the table, it’s much easier to add more materials that are compatible with the current structure of NINES. To that end, the list of partner sites for NINES is more diverse (linguistically, at least) than I expected based on Risam's description. There are multiple sub-listings for the Blue Mountain Project ("a digital thematic research collection of art, music and literary periodicals published between 1848, the year of the European Revolutions, and 1923”) that expand the scope of NINES to include materials in languages including French, German, Danish, Italian, and Czech. The National Library of Bulgaria and National Museum in Warsaw (both via Europeana) are partners. For resources that would appear to engage with postcolonial DH, there’s The Digital Library of the Caribbean, Caribbean Literature (featuring poetry and fiction in "English, French, Spanish, Dutch, and various Creole languages"), and North American Indian Thought and Culture — though it’s important to note the caveat that the latter two collections are freely searchable through NINES, but are commercial resources and full text is only available by subscription.
Sustainability is expensive, and even if you can prevent a long-running site from becoming technically outdated, Risam’s critique has left me thinking about how even aggregator-type resources (that aren’t meant to make an argument in and of themselves) should dedicate time to periodically reviewing and updating at least their framing prose to respond to new scholarship and discourse in the field.

In the course I’m teaching next quarter, there are two class sessions that will explicitly deal with national and international DH organizations; the ethos of DH as practiced in the US, and how that’s similar to and different from other DH communities; and the ways that DH is similar to and different from students’ own disciplinary cultures. I knew this would mean I’d have to spend some time putting together a clear distillation of key groups and organizations, and how they relate to one another, and I was surprised and delighted to see that Risam has done a significant amount of that work for me in chapter 3. What’s more, by coming to DH through a different path than I’ve taken, she has insight into the history of some of these organizations that I didn’t know about. I remember the general zeitgeist when GO::DH started, but I only had a vague idea of the context of its creation before reading this book. Likewise, I hadn’t realized that Risam was the first person of color to be elected to the ACH exec board (she predated me on the board and her time partially overlapped with mine). The degree to which ACH has come to be associated internationally with advocating for a diverse vision of DH — along parameters of race, gender, and background, and not just language — is a testament to Risam’s effectiveness on the exec board, and the effectiveness of those she has helped bring into the organization (including Élika Ortega). In addition to a discussion of organizations, chapter 3 contrasts Melissa Terras’s 2012 map of DH centers (which doesn't show much DH activity in the Global South) to other approaches to mapping DH, such as that used by the “Around DH in 80 Days” project. Whether on the single-institutional level or the global level, it’s remarkable how different your results can be depending on how you “count” or “map” DH, as I’ve been discovering locally over the course of my first quarter at Stanford.

Chapter 4 shifts focus to the classroom, with some concrete ideas and suggestions for implementing postcolonial digital pedagogy. The chapter is largely oriented towards teaching undergrads, and while I only have a few of them in my class next quarter, it nonetheless served as support for the approach I’m taking with assignments. My course will have no traditional written essays, but the students taking the course for the maximum number of credits will be developing a tutorial aimed at others in their discipline for how to use a tool or method that we've covered in class, writing conference proposals (one for their major disciplinary conference, and one for a DH conference) to hypothetically present their final projects, and designing a poster for the final poster session, in addition to blogging about their experiences applying the tools we discuss to materials in their non-English language of choice. I’m not sure to what extent I’ll be contextualizing what we do within “the politics of knowledge production laid bare by postcolonial studies” and “the colonial and neocolonial knowledge formations that link both print and digital culture” (p. 94), but it’s something to consider. We’ll definitely be talking about where tool and library development happens, and how people have responded to having their language “left out” — which will be something I’ll also be reading up on in greater depth over the next couple months.

Chapter 5 would make a wonderful article, and I worry that without some “marketing”, it won’t find its way onto the right reading lists: people interested in debates about the intersection of digital humanities with data science may not think to check the last chapter of a book on postcolonial digital humanities. Risam first contextualizes the rhetoric of the “crisis of the humanities” within time (with similar statements dating to the 17th century) and space (through a comparison to European anxiety over the Research Excellence Framework, and South Africa’s crisis-focused report on the humanities in 2011). The challenge of modeling “the human” — as is implicit in many undertakings in data science — connects the chapter to the postcolonial focus of the rest of the book. “In the broader context of these technologies, an area that remains underexplored is the way that the ‘human’ is articulated, produced, and normed in the drive towards emulating ‘human’ processes. At stake is the way that universalist framings of the ‘human’ are produced through natural language-processing software, machine learning, and algorithms … they reinforce the notion that there are normative and singular ways of being human in the twenty-first century.” (p. 125). Risam also raises ethical concerns about the use of crowdsourced coding labor using Amazon’s Mechanical Turk, and calls into question our ability to make universalizing statements using this data without being able to consider the influence of the coders’ identities, cultural backgrounds, and geographic locations. “Failing to identify its own standpoint, the project elides cognitive processes that may be shaped by the particulars of lived and embodied experience.” (p. 131).
Risam similarly warns against technological “black boxes”, as well as approaches to computational literary analysis such as those that are "tasked with decisions about narratological salience that are themselves subtended by universalist notions of the human, rather than situated in the contexts informing the text. Like other algorithms, they are steeped in the cultural and political implications of computation and code, which are themselves overdetermined by the ontological categories and epistemological processes of the Global North. Furthermore, the datasets and databases used in conjunction with algorithms are themselves constructed and subject to political social forces as well.” (p. 132). I do plan to teach NLP methods next quarter, and I’m involved in a couple research projects that use computational text analysis. I think that, done carefully, NLP can give us information about the means by which different aspects of literature — be they authorship signals, suspense, genre, etc. — are implemented, but that context (cultural, author identity, temporal, etc.) is crucial for helping us understand and account for variation in that data. And likewise, I don’t think I’ll be renouncing large datasets, but it’s a good reminder to be both clear with yourself and explicit in your analysis regarding the limitations that their scope and the context of their creation place on the scope of the claims you can make.

The conclusion is framed as a call to action, and is one of the more stirring proposals of a vision for digital humanities as a force for good both within the academy, and within the broader world through its engagement, collaboration, and partnerships with communities whose history, literature, and culture have been objects of study. It emphasizes the value of local practices, pushing back against “universals” that stem from the context of the Global North, and calls for more making that draws on epistemologies with different roots. Intervening “in the channels of capital, knowledge, and power in which the academy is implicated” towards the goal of “a digital cultural record that puts social justice at its center — a record that is postcolonial, feminist, antiracist, intersectional” (p. 144) is an inspiring vision and a noble goal. It positions digital humanities to make a unique and broadly meaningful contribution to the humanities, rather than simply opening up a new set of research questions that will be met by the same uneven interest as any other research agenda in the humanities. That said, not everyone is in an equal position to take up this call to action. I don’t think it would end well for me if I took reshaping the digital cultural record as a major vetting factor in what projects I support. There are things I can and should do, like more engagement with the Department of Iberian and Latin American Cultures (which includes “the literature and cultures of the Iberian Peninsula, Latin America, Brazil, Lusophone Africa, and Latinx communities in the United States”). But for myself, the biggest takeaway point is to get clear about the scope, audience, goals, and limits of any project, and be explicit about those things in whatever form the project is disseminated.

New Digital Worlds has a thoughtful and refreshingly different read on, and criticism of, digital humanities. There’s something to consider for people in a wide variety of roles, from librarians who develop (or purchase) corpora, to museum and archives folks who directly grapple with issues around cultural heritage materials, to developers who build digital humanities tools and projects, to scholars and instructors engaged with research and/or teaching, to students who are trying to choose a focus for their own scholarship. Even if you don’t agree with everything in it, it is very much a worthwhile read. Now that I have my own copy, I’m looking forward to revisiting it with sticky notes and highlighter, and keeping it close at hand next quarter.