DLCL ATS round-up, winter 2021

Winter quarter was a time of consolidation and slow progress, without much to share by way of exciting new developments. I worked on a lot of things that I expect will bear fruit down the road, but I can't point to anything concrete yet. Here's where things stand as of spring break 2021.

Existing Projects

With some advice from Matthew Lincoln at Carnegie Mellon, I'm close to finishing the migration of the Global Medieval Sourcebook into a Jekyll site fashioned after CMU's DH Literacy Guidebook. It's been a multi-step process involving some XSLT, some Python, and some OpenRefine, and I've kept notes along the way. I'm planning to write it all up during the spring, if only as a set of tips for people considering migrating from Drupal to a static website.

Thanks to the tireless nights-and-weekends work of my CIDR colleague Simon Wiles, we launched the reimagined Palladio bricks (embeddable Palladio) as Palladio webcomponents this quarter. There was a lot of enthusiasm from the broader DH community about being able to embed Palladio in other webpages, and we also got the go-ahead to start work on reimagining the Mapping the Republic of Letters site using this new version of Palladio. I'm looking forward to working more on that next quarter.

Migrating the metadata and files for the recent "Entitled Opinions" podcast to the University Archives is on hold until the summer.

Over the course of this quarter I've been involved with ongoing discussions in a number of the DLCL's research units (interdisciplinary working groups) about website infrastructure, which ties into larger departmental discussions about the same. The challenges of balancing aesthetics, information needs, archiving, and findability are still very real, and difficult to negotiate.

This quarter Anouk Lang and I published The Ghost in Anouk's Laptop, a new Data-Sitters Club book on GPT-2, machine learning, and text generation, thanks to some generous feedback from and consultation with Annie Lamar, a grad student in Classics at Stanford, and Jeff Tharsen from UChicago. I'm really proud of this one, particularly the way we came up with some new and different analogies for explaining machine learning (e.g. no references to cars or baseball)! It's hard to believe we've been working on it, on and off, for an entire year. I've also got a new Multilingual Mystery half-completed with Lee Skallerup Bessette and Isabelle Gribomont, looking at the adaptations of food words across the various French translations. (Scanning, OCR-ing, and aligning those corpora was its own adventure!) The Data-Sitters Club gave talks for the Stanford Literary Lab and for UIUC's Center for Children's Books. More than anything we accomplished, though, I'm excited about what we've got in the pipeline for the Data-Sitters Club, especially on the multilingual front. Our Multilingual Mystery team has exploded this quarter, now including numerous DLCL grad students and recent grads, along with colleagues abroad. We're waiting to hear back about an EADH poster proposal that would expand the work on French food translation to include Italian, peninsular Spanish, Hebrew, German, Dutch, Catalan, and/or Portuguese. I'd love to have the excuse to introduce so many people to one another and collaborate on something as fun as adaptations of '90s snack foods like s'mores, Gatorade, and SpaghettiOs.

Thanks to the mentorship of Henry Lowood, I had the chance to go through the whole administrative process of acquiring a new collection for the library this quarter, with Ken Whistler's Unicode archive. I can't wait to explore some of these early materials about the Unicode Consortium, and incorporate them into my teaching in fall 2022 with the next multilingual DH class.

Sadly, in February we got word that the proposal Cécile Alduy and I submitted on behalf of the DLCL, for creating text corpora in the major DLCL languages, didn't receive an internal humanities grant. I've been thinking about ways to move the project forward nonetheless, including trying to gather information about records of French publications and publication reviews, to try to construct -- for at least one of the DLCL languages -- the kind of metadata, and corpora, that colleagues have access to in English.

Other projects that are still on hold include the Harry Potter fanfic project, text mining JSTOR, and cleaning up the OCR for the Ostrov zine (though now we're imagining ways of working with that text without having to break it up into sub-articles). Something to take up in the spring or summer, perhaps?

New Projects

This quarter I've been playing a bit with corpora of middle-school and YA novels in English (with Nichole Nomura and Jennifer Wolf), as well as Star Wars novels in English (with Mark Algee-Hewitt, Nichole Nomura, and Matt Warner). The former corpus came in handy on a new project where I've been able to support Nick Fenech in an investigation of how Stanford gets referred to over time in different kinds of text corpora. I've found it helpful to have a few English-language corpora to experiment with, to help me wrap my head around different computational methods and techniques, and then work on adapting those to the other languages of the DLCL.

I also had the chance to talk with Ben Albritton in Special Collections about workflows for ingesting transcriptions of special collection material created by Transkribus, and adding it to the Library catalog records for those digital objects. I haven't had the chance to try it out much yet, but I'm hoping to make it part of my regular workflow in the spring.

I got to learn something about the TEI Publisher platform by answering some questions from Johannes Ruhland about his digital edition project. Once again, Elisa Beshero-Bondar from Penn State Behrend is a benevolent genie of expertise when it comes to TEI and the quirks of systems that process it.

Writing

This quarter was the due date for revisions to the Debates in DH: Computational Humanities piece I put together with colleagues from a DH + HPC birds-of-a-feather group I started at my previous job at UC Berkeley. Next week is also the deadline for two pieces for a DH edited volume, one on whether coding matters (tl;dr: not as much as other things), and one on multilingual DH with Pedro Fernandez (tl;dr: it's a good thing that mainstream DH should support better). I also got the final revisions in for the Debates in DH 2021 piece with Patrick Burns about multilingual DH, and heard the news that the Debates in DH volume with my obituary for the defunct DiRT (Directory of Research Tools), which I wrote while at Berkeley, will be published later this year. Along with J.D. Porter and Yulia Ilchuk, I helped draft a piece about our translation project from 2019, and it was satisfying to be able to wrap that up in some form.

Talks and Events

I've been enjoying the opportunity to participate in events around the world, but I'll confess the early scheduling of everything is starting to get exhausting. There are only so many times one can get up at 5 or 6 AM to watch (or, worse, give) a talk. This quarter I was up especially early for DHARTI (the Indian DH association), talking about the opportunities for doing computational text analysis on popular culture materials in non-English languages.

Liz Grumbach and I continued our Animal Crossing: New Horizons DH talk series, with two talks: one by Camille Villa from Stanford Library's DLSS group about incorporating IIIF images into Animal Crossing, and one by Allie Alvis from Type Punch Matrix on communicating book materiality in digital spaces. Our talk series also won first runner-up in "Best Use of DH for Fun" in the DH Awards 2020.

I presented at a Critical Digital Pedagogies for Modern Languages event "at" King's College London, following up on the tutorial-writing hackathon from summer 2019 (which turned into a paper on preparing non-English texts for computational text analysis).

As part of the ACH mentorship offerings, I organized a discussion with Julia Flanders and Greg Palermo of Digital Humanities Quarterly and Journal of Interactive Technology & Pedagogy about publishing your first article.

I presented for the CESTA fellows on data modeling, databases, and alternatives to databases-as-such. At the beginning of the quarter, we also organized a few Python co-learning/co-working sessions through CESTA, but it was hard to keep up the momentum as the meeting times kept shifting.

Looking ahead, I put in a proposal for the MSU Global DH conference with a number of students from my non-English DH course last quarter, which we'll be presenting in mid-April. I also gave feedback on multiple students' submissions for the ACH conference this summer. I hope I'll see them there virtually, and also be able to do the collaborative roundtable on tool directories that didn't happen in 2020. Towards the end of the quarter, I had various meetings with potential DLCL grad students. I'm excited to see the level of DH interest among our incoming cohort, as it promises many new and interesting projects down the line. Relatedly, during winter quarter I was involved in the job search for a Digital Scholarship Coordinator (joint CESTA/CIDR position), and I look forward to that wrapping up soon, and having a new colleague to collaborate with before long.