DLCL ATS round-up, fall 2024

Textile Makerspace

This fall was the first chance for the Textile Makerspace and GSE Makery to pilot a joint textile-specialist staffing program. We got off to a bit of a slow start after our initial staffing plans fell apart the week before the quarter began, but by the end of the quarter we had three new student assistants onboarded and holding regular hours, in addition to one of my returning students from last year. With any luck, we'll be on steadier footing with staffing this quarter.

The logistics involved with getting the necessary spinning wheels have been non-trivial, but Anna Jerve (from Stanford Libraries / Doerr School) and I have continued working towards our "fleece to fiber" spinning club idea, which is now likely to debut as an optional track as part of the Data Visualization with Textiles class in the spring.

Classes

This fall I taught my non-English text analysis class on a condensed schedule, helping students make progress on projects ranging from Ukrainian poetry on social media, to Japanese literature, to multilingual corpus building as scaffolding for several future projects. DSC 19: Shelley and the Bad Corpus continues to be one of the most useful Data-Sitters Club books to give to students getting started. This was also the first time I tried teaching some basic Python for literary analysis by way of LLM prompting with Codeium. It was an interesting and useful experience, and good food for thought about how I might teach similar things in the future.

Existing projects

During the fall, the Data-Sitters Club published DSC Super Special #1: The Data-Sitters Debate at Dartmouth, a near-verbatim transcript of an argument we got into when we met in person in March 2023 about the nature of the project and what we're trying to accomplish with it. If you ever wanted to be a fly on the wall for a project meeting with a mix of enthusiasts and skeptics of computational text analysis, here's your chance. That debate also led to a new spin-off series aimed at beginners, Data-Sitters Little TL;DR, starting with DH Curious?

It was great to see The Futurist Archive (Ty Davidian), Flamenco letras (Tania Flores), Network Analysis of Vsesvit (Georgii Korotkov) and Where is the world for Montréal? (Chloé Brault) presented at the ACH conference this fall. As a small point of pride, my own department (DLCL) made up a majority of the Stanford representatives at the conference. It feels like, over the last six years, I've actually gained some traction in increasing the visibility of the multilingual DH work that goes on here.

The Senegalese Countercultural Movements project, Global Medieval Sourcebook, Multilingual Harry Potter Fanfic, and letters to Christine Blasey Ford were all more or less on hold during the fall.

We finally had the Browsertrix Cloud pilot project debrief, where we decided to continue the pilot for another year and start exploring better pipelines for getting the data from the Browsertrix cloud servers into the Stanford Digital Repository. I think this is going to be an invaluable tool for meaningfully preserving a lot of website-based DH work in my department and beyond.

The first phase of the Jewish cookbooks project, a collaboration with Eitan Kensky, Simon Wiles, and Kristen Valenti, is nearly complete. Due to assorted delays, the exhibit wasn't fully installed by the end of fall quarter, but it will be up at the beginning of winter quarter. Our exploration of the data we have so far has definitely motivated Eitan and me to continue building out that data set and interrogating it in different ways. I'm hoping to put together some kind of web-based version of the exhibit to share some of the fun things we've found and solicit ideas for future directions.

During fall quarter, I participated in a strategic planning meeting for SILICON and gave summer interns feedback on their presentations for the Unicode technology workshop. I've also been participating in the Script Encoding Working Group meetings, mostly as a notetaker, but I'm hoping to get more involved later this year. I also helped sort out what Stanford can offer around computing resources for building and running inference on language models, as part of the SILICON Practitioners' Program.

Interest in Transkribus continues, with periodic requests from people across campus for the user seats that unlock the more powerful large models, and I'm happy to oblige.

I did some more work with Adrian Daub and Connor Yankowitz, looking at the history of feminist, gender, and sexuality studies courses at Stanford, helping clean and otherwise wrangle their extensive data set of course descriptions from across the 20th century.

Following the results of the 2024 election, SUCHO returned to a place of prominence in my work life: both in relation to our ongoing work with Ukraine, and as a model for capturing other at-risk data. SUCHO and Webrecorder collaborated on a couple of web archiving workshops at the end of the year, co-sponsored by ACH, and we worked with Amanda Visconti on a zine capturing the highlights.

Finally, the ACH AI working group met at the ACH conference and discussed some concrete things we can do together. One idea that came up, as a way to capture some of the conversations we've been having on our own campuses, was a zine, and we had a couple of co-working sessions on it at the end of the year. I'm hoping to finish it by the end of the month.

New projects

Since spring of last year, I've been exploring a Multilingual DH Co-Op, with a mix of workshops, co-working / debugging, and informally sharing what we've been working on. As usual with starting up something new, figuring out the scheduling was a major challenge. There was also the question of how participants who don't currently have a DH project of their own could get involved. The answer turned out to be the DLCL dissertations project: grad students in my department were very interested in it, as well as in related data sets that are not currently compiled (or necessarily easily compilable), such as course offerings / syllabi and reading lists. At a moment when at least one of my sub-departments (Slavic) is talking with grad students about reshaping things like program requirements and reading lists, the students were very interested in the potential of a data-driven perspective on those issues. Realistically, we won't be picking up this thread until spring, but in the meantime I'm hoping to make some progress on getting some of the course data (or at least getting a sense of what is possible to get).

Writing

As part of the upcoming IMLS National Forum Data Speculations: A National Forum on Library Digital Stewardship for Copyrighted Contemporary Culture, I wrote a position paper on what we lose when in-copyright data is not available to us in a computable (and shareable) format. I'll be discussing it at the forum later this month.

Adrian Daub and I also finished up an article about the "Cancel Culture" articles on Wikipedia.

Talks and events

At ACH 2024, I presented as part of a #DHmakes roundtable, talking about data visualization with textiles, and was one of the speakers on the panel "Emerging Pathways to Supporting Digital Humanities Research on Copyrighted Literature". I also revisited my 2018 "Taxonomy of Failure" talk for the HERMES doctoral network in Germany, at a virtual event on error culture. Finally, I held a community event on data visualization with textiles at UCLA at the end of the year, organized by Cindy Nguyen, an assistant professor at UCLA whom I worked with at UC Berkeley when she was a grad student.

Upcoming

For the first time, I won't be writing an ATS round-up for winter 2025. I'll be taking leave in February and March, the first time I've taken any sizable amount of time off in my entire life. I'm looking forward to picking things up in spring: teaching Data Visualization with Textiles, continuing with the Multilingual DH Co-Op, and carrying on with ongoing project work.