DLCL ATS round-up, spring 2020

Quinn Dombrowski, June 22, 2020

In describing this quarter, it’s hard to avoid the cliches of our time, starting with “unprecedented”. Daycare and public schools in Berkeley shut down at the end of winter quarter, and daycare — for my two youngest children, 4 and just-turned-2 — didn’t reopen for 50 workdays. The 6-year-old has been upgraded to “junior coworker”, and all signs suggest that he’ll be home every day for at least the first half of next school year. Our current “new normal” has no shortage of challenges, but it’s been an interesting space for experimentation.

Teaching

I’ve signed up to teach my non-English DH course online this coming fall, and I’ve started thinking about how to rework it for that new medium. Because the course is now filling in for a medieval DH offering as well, I’m thinking the technical portions of the class will be something of a choose-your-own-adventure: students working on medieval texts might be very interested in Transkribus for handwritten text recognition, but may be more limited in what they can do out-of-the-box with NLP (though I hope at least some of them can give CLTK a try). Putting this course together will be one of the major things I’ll be working on this summer.

Late in the quarter, I held an info session for undergrads potentially interested in DH, and had a chance to talk to a rising junior who had initially considered majoring in comp sci before switching to comp lit. I’m hoping to hold more of these over the next year, and encourage interested undergrads to include a DH component as part of an honors thesis.

As much as teaching support is not usually in scope for my position (and we’re fortunate to have really excellent staff at the Center for Teaching and Learning), unusual times meant stepping in to help with the emergency online shift for spring quarter. For a few weeks, I held office hours for anyone who wanted to talk about online teaching, and fielded inquiries ranging from how to demonstrate writing the Persian alphabet via Zoom, to options for students to record their attempts at chanting, to ways to adapt a 3-hour discussion seminar, to tools for discussion via text annotation. I came up with some instructions for how instructors could use Canvas pages + the Hypothesis browser plugin for text annotation, worked with librarians to find the necessary course readings in a digital form, and did a lot of OCR. This summer, the Canvas team is piloting the Hypothesis Canvas plugin, which streamlines the annotation process. If the pilot continues past the summer, I’ll definitely be using it in the fall.

Existing projects

Global Medieval Sourcebook

After scheduling, re-scheduling, and re-re-scheduling, we finally found a time that worked for the project team and all the library groups involved in the new version of this project. We put together a plan for how we can get all the texts and metadata published in the Stanford Digital Repository and make them publicly findable through MODS records for our text collections, all of which should be ready by January 2021.

We got an extension on our paper for the Germanic Studies journal Seminar, which we finally submitted last week. I’m happy with how it turned out, evolving into an honest reflection on what worked and what didn’t, rather than the laudatory project description so typical among these kinds of papers.

French Revolution Digital Archive

Further progress on the user-facing documentation for the new web interface of this project has been on hold.

Poetic Media Lab projects

The Poetic Media Lab has been very active over the spring. I’ve consulted on the new “Life in Quarantine: Witnessing Global Pandemic” project collecting stories from around the world, and discussed technical platform choices for another new project building on student writing. Earlier in the quarter, I ran some Zoom sessions where I shared my screen as I did some WordPress customization work on the Poetic Thinking sites, both to help debug a set of issues and talk through the thought process involved in doing this kind of work.

On the Florentine Codex project, I trained my first Transkribus model, which was able to do downright magical things, even on low-res page images.

This initial model was trained on only 60-some pages, but it was enough for bootstrapping more pages. I used it to do a first pass on transcribing another 30 pages, which other team members corrected, and we trained a better model on about 90 pages. I let the model loose on the full 700 pages of volume 1 of the codex… only to discover that most of the pages were rotated sideways, rendering them untranscribable. I’m looking forward to trying again once everything has been rotated correctly!

One of the most useful things I was able to do this quarter was to sit in on a meeting with an external web development company as they discussed what they could offer for rebuilding the Lacuna text annotation platform. Meetings between academics and external developers can easily end with each group leaving with a very different understanding of what was discussed or agreed to. When I was at UC Berkeley, I often joined these meetings as an interpreter, clarifying for both groups what the other was trying to say. This time, I worked more in the background, interpreting what the developers were actually saying through a back-channel text thread, and encouraging follow-up questions. It was satisfying to be able to put my background in IT and web development to use in a way that could inform the early stages of a project (including whether or not to go forward with it), instead of being brought in later to clean up something that had gone sideways.

Other web-based projects

Without access to the recording studio, Entitled Opinions has been re-releasing earlier episodes. Getting their content accessioned to the Stanford Digital Repository has been on hold this quarter.

Multilingual Harry Potter Fanfic

Our data for this project needs more cleaning, and writing Python takes more attention and focus (and quiet) than I’ve managed to get this quarter. So it’s been on hold.

The Data-Sitters Club

This project continues to be one I fall back on for comfort and a bit of fun, during nights and weekends when I’m too tired or scattered to do much else. Early in the quarter, Lee Skallerup Bessette and I finished DSC Multilingual Mystery #3: Quinn and Lee Clean Up Ghost Cat Data-Hairballs, which covers web scraping and data cleaning with OpenRefine. Anouk Lang published DSC #4: AntConc Saves the Day, which I’ve already pointed students to as a resource for getting started with text analysis. For a while, I turned BSC book covers into COVID-19 themed memes, which caught the attention of Elizabeth Redden, a reporter for Inside Higher Ed and childhood Baby-Sitters Club fan with an amazing memory for the series. She ended up writing up the Data-Sitters Club for that venue, which had the dubious honor of being the only non-depressing thing on the front page of IHE for a day or two.

A Czur book scanner, courtesy of the DLCL, has meant that the French part of this project can carry on. My pile of books to scan and OCR has grown this quarter, too; as a sort of birthday present, Lee and I bought a bunch of Quebec translations, and I picked up a few more Belgian translations as well. We’ve also started to compare the older translations from France with the newly-re-released versions, with some interesting preliminary results — more on that in a future DSC Multilingual Mystery.

JSTOR Data for Research

Analyzing the data that Masha Gorshkova and I got from JSTOR has been on hold.

Ostrov: Russian radical feminist zine

Most weeks, I’ve been meeting with Margarita Nafpaktitis and Christine Jacobson (from Harvard) for an attempt at having a reliable time to sit down and work on our data cleaning for our respective projects, including Ostrov. Progress has been hit-or-miss (sometimes these calls devolve into collaborative online fabric shopping), but if nothing else, it’s been a reliable point of social connection with coworkers during the week, which has been valuable in its own right.

Palladio Bricks

My CIDR developer colleague Simon Wiles has had some wonderful ideas for reimagining Palladio Bricks, and has made progress on a prototype. Meanwhile, I’ve been mulling over what it might look like integrated with Wax for static exhibits. That’s about as far as this quarter has let us get.

New projects

Other than the new Poetic Media Lab projects, I haven’t had the time for all my existing projects, let alone anything new. Further Russian NLP group developments are on hold. Around DH 2020 is on hold indefinitely.

Writing

This quarter, I was involved in the peer review process for the Debates in DH 2021 collection, and peer reviewed two articles for Digital Studies/Le champ numérique.

I didn’t manage to write a paper for the DHSI project management workshop, but I’m really happy with the video I put together for the workshop, “Rolling the Dice on Project Management”, talking about the course I taught in the winter.

Along with Agnieszka Backman, Sabrina Grimberg, and Melissa Hosek (who all took the project management course), I got an abstract accepted to a future Debates in DH volume on the future of graduate education, as a short 2,000-word piece. The piece will be on how the course, and the associated RPG, provided a space for the students to directly confront and discuss how universities actually work, and their place within those systems and structures.

Yulia Ilchuk, J.D. Porter, and I made some progress on an article write-up of our translation project, following our LitLab presentation in the winter, but there’s still some work to be done on it.

A few colleagues in the library started a zine, OK Zoomer, that I’ve enjoyed contributing to. So far we have two issues; for our most recent issue on “home economics” I wrote a review of a wacky ’80s sweatshirt pattern.

My former DH HPC colleagues and I are wrapping up our chapter for Debates in DH: Computational Humanities, which is due at the end of the month.

There are a handful of other things I’d committed in the before-times to writing this summer, but mercifully, almost all of them have had their deadlines extended. Academic writing has been the second-hardest thing to get done with kids at home, after writing Python.

Talks and Events

Last summer, I’d committed to giving a talk for CMEMS in April, and I ended up being the first in their series of virtual talks. While notionally about the Old Novgorod birchbark letters, the talk took a bigger-picture approach, reflecting on the possibilities for expanding the community of participants in events like CMEMS’s lunch talks to welcome, for instance, medievalists who’d left the academy after getting an advanced degree, but who continue to be interested in these topics.

I worked with CESTA and DH organizations up and down the West Coast to organize a CESTA lightning talk event and a West Coast DH meet-up as part of the global Day of DH 2020 event hosted by centerNet. As a special add-on for parents with kids at home, under the auspices of Stanford’s Textile Makerspace, I ran a coloring contest for Day of DH.

Through a last-minute invite, I ended up giving a 3-minute talk on failure at DARIAH VX in the early hours of the morning.

UC Berkeley’s DH working group invited me to give a talk, and I adapted the talk I gave at the University of Virginia about the Multilingual Harry Potter fanfic work; sadly, the timing didn’t work out for the rest of the project team to join me, but I’m holding out the hope that the day will come when we can all present together.

Over a weekend (which brought no shortage of challenges related to multitasking while child-wrangling), I attended and presented at the DHSI workshop on project management, and attended the DHSI RTL workshop, primarily via pre-recorded videos with discussion on Twitter.

The Canadian DH organization, CSDH-SCHN, held their conference online the same week as DH Benelux. It made for a fascinating, if exhausting, experience watching how two different organizations handled the shift online. I organized an afternoon discussion about Multilingual DH, participated in a discussion around Feminist/Queer/Trans DH, and sat in on some early planning around graduate student mentorship for CSDH-SCHN. And as often happens, the talks about current work underway in Europe (especially around non-English NLP, text analysis, and machine learning) tend to speak more directly to the kinds of work I’ve been supporting and collaborating on.

Working Groups and Organizations

By and large, it hasn’t been feasible for me to continue running most of the working groups and organizations that I’d been involved with in the before-times. We managed one round of CESTA lightning talks for our local DH community at Stanford as part of the Day of DH event. DH-WoGeM has been on hold, along with Danger Noodle Club (the Python co-learning group), the Russian NLP group, the DH reading group for grad students, and the Six Septembers: Mathematics for the Humanist reading group. (The one time we tried having a Six Septembers reading group, no one had finished the readings. A pandemic is awful for the focus necessary to understand the math behind machine learning.)

The Textile Makerspace, finally hitting its stride in the winter, is not likely to return anytime soon. Even if everyone were on campus regularly (which shouldn’t, and won’t, be happening), social distancing would be hard in the small, no-ventilation room we use. But there are some new creative opportunities in the works, including the possibility of “carts” of sewing supplies that students could check out from the library. I’ve also been co-leading a project organized through the library to provide 2-3 washable cloth face coverings to every library staff member. The group, named “The Masquaraders” (the QUAR is for quarantine), has been using the #TextileMakerspace Slack channel to trade tips and questions, and we’ve even had some Zoom-based sewing time together (on mute, so the clacking of the sewing machines doesn’t drive everyone crazy). Over 50 people signed up to participate as sewists, and working with The Masquaraders has been one of the most satisfying — and fun — things I’ve done this quarter.

Along with recent history graduate Rachel Midura, I started up a working group for people who use Transkribus for handwritten text recognition. It’s been great to have a place to share the things I’m figuring out about Transkribus, and help folks in the departments and library at Stanford use this amazing tool. I expect it’ll play an important role in growing the Transkribus user community on campus, particularly as we move ahead with joining the READ Co-op that maintains Transkribus.

When I wrote my winter quarter round-up, the ACH election was ongoing, and I was running for VP/president-elect against fellow Data-Sitter Roopika Risam. Before the election closed, we went to ACH with a radical proposal: becoming co-VPs. They accepted our offer, and we wrote a blog post about it. While we’re not “official” until the ACH exec meeting in July, we were invited to participate in the group’s ongoing work, such as crafting and following up on the statement on Black Lives Matter, structural racism, and the organization.

The Summer Ahead

It’s still hard to know how the summer will go, though I’m trying to make plans as far as it makes sense to. I’ve been encouraging everyone I work with to “attend” DH 2020, since it’s now online and free. I’ll be preparing materials for my fall course, and supporting my division around digital pedagogy as needed. Faculty and grad student research projects have been starting to pick up again, and I’ll be helping to support CESTA’s undergraduate summer intern program. And The Masquaraders will keep sewing until all the library staff have face coverings. The feasibility of any of this is going to depend on the younger kids’ daycare remaining open — not to be taken for granted as COVID-19 case counts in California continue to rise. We’ll see how it all plays out.