It’s hard to believe that fall is already here. A couple weeks of virtual first grade have given me some time to start preparing for teaching my non-English DH course this fall, but as usual, a lot of it will be worked out in the moment, responding to the students’ own languages, projects, and needs. Before getting too far into the work of fall quarter, here’s the summer 2020 round-up.
Over the summer, I held a workshop for CESTA’s summer interns about how to structure and organize data, and consulted with a number of summer projects on various code and data issues. I also had the chance to chat with a number of undergrads, recent grads, and people starting graduate programs in the humanities and library and information science about DH. Most of them had a background in some non-English language. It’s really beautiful to see the interest in DH from people who work in other languages. Seeing DH methods make real inroads in non-English languages has been my dream for the last 15+ years, and now it’s happening. I have a great deal of hope for a future where people who want to use DH methods for non-English languages don’t have to justify that choice (beyond what anyone has to justify when doing DH), or feel like they’re the only person doing that work in their language.
Stanford has extended its pilot of the Hypothesis/Canvas integration through the fall, so I did some outreach work to promote the integration to faculty in my division. I coached a few people through using Hypothesis without that integration last spring, but having it part of Canvas makes a huge difference. (I’m using it as a major part of my own course this coming fall, too, but I may not have bothered if it weren’t for that integration.) I’ve also worked on facilitating access to films and film clips that the library has purchased. More than once this summer, I’ve appreciated the split departmental/library nature of my position, and having a metaphorical seat at both tables.
Understandably, there’s a lot of interest in DH methods among later-stage grad students who are looking at the current (general lack of) academic job market with horror. My division chair, Cécile Alduy, has been taking steps to support professional development for graduate students with an eye towards a broad range of job outcomes, which I’ve personally found inspiring. I’m hoping to get into questions of what “DH” looks like in other, non-academic contexts as part of my non-English DH course in the fall.
Global Medieval Sourcebook
Over the summer, I’ve been working with Danny Smith to add some new texts to the Global Medieval Sourcebook. One of the interesting challenges of this work has involved RTL languages: the Versioning Machine software wasn’t built to handle those. Luckily, I have a generous and brilliant colleague in Simon Wiles at CIDR in the library, and he helped me sort out the CSS necessary for the RTL-languages to be right-justified, and the LTR-languages to be left-justified (for both prose and poetry, which use different encoding).
By the end of the year, we’re hoping to retire the current Drupal site for the Global Medieval Sourcebook and replace it with a static, Wax-based site.
Poetic Media Lab projects
This quarter, I’ve wrapped up my involvement with the Florentine Codex project, after using Transkribus to develop a model that could transcribe a large amount of the text. Compared to having project PI Obed Lira working with undergrads to transcribe the entire manuscript, Transkribus has provided a hugely impactful technical intervention.
I was happy to support Nelson Endebo's Life in Quarantine project officially launching over the summer, as well. This project, along with another focused on the impact of COVID-19 on international students, has benefited from consultation with Josh Schneier, the Library’s new University Archivist, and Natalie Marine-Street, who works with the Stanford Oral History Program. For projects that are trying to capture the present moment, having the input and support of people with experience in these areas is a huge help.
Over the summer, I also explored the options for doing IIIF-compatible annotation without running an annotation server for Fyza Parviz’s project looking at Arabic astronomy manuscript texts.
It’s been a relatively quiet summer for the Data-Sitters in terms of publications (other than “The DSC and the Impossible TEI Quandaries”, our “book” on the Text Encoding Initiative with Elisa Beshero-Bondar), but I and others have been working on multiple new “books” that we should be able to publish throughout the fall, on topics including copyright, text comparison, machine learning models for text generation, stylometry, and others.
Lee Skallerup-Bessette and I got a proposal accepted to the “Flyover Comics Symposium” (about different French translations of the Baby-Sitters Club graphic novels), and this fall we’ll be giving talks about the Baby-Sitters Club at a DH symposium at the University of British Columbia, and at Northeastern University (where Ryan Cordell is teaching most of the things we’ve written so far as part of his Intro to DH course).
Projects on hold
There are many projects I wasn’t able to make progress on this summer, including data cleaning for Multilingual Harry Potter Fanfic, JSTOR for research, Ostrov: a Russian radical feminist zine, and Palladio bricks.
I’ve consulted with professors Marie Hubert and Sarah Prodan about their ideas for new DH projects. As with so much DH work, there’s a long road from gathering the necessary data in a workable format to taking the first step towards analysis.
As it happens, the summer wasn’t terrible for being able to write up some of the work I’ve done so far, in a format that other people can learn from. Along with a group of current and former high-performance computing (HPC) colleagues, I finished an article for “Debates in DH: Computational Humanities” about the intersection of HPC and DH.
In collaboration with three students from my winter course on project management and ethical collaboration, I wrote a chapter for “Debates in DH: The Digital Futures of Graduate Study in the Humanities” and undertook peer review for other chapters in that volume.
I also wrote up a piece on minimal computing, for a special issue of DH Quarterly — mostly critiquing how usable “minimal computing” web development methods actually are, and proposing infrastructural interventions that the DH community could undertake to improve the situation.
The peer review process is wrapping up for the “Debates in DH 2021” piece that Patrick Burns and I co-authored about multilingual DH, and I’ll be happy to have it out in the world (relatively) soon. It also benefited from some feedback from colleagues in the library as part of the recently-restarted library reading group.
Meanwhile, the tutorial on preparing non-English texts for computational text analysis methods that I wrote last fall has been published, just in time for me to use it in my class this fall.
Talks and Events
Right after my spring write-up, I attended “Building Legal Literacies for Text and Data Mining”, an NEH-funded workshop relevant to multiple projects that I support. During the course of the workshop, I had the opportunity to meet Matt Sag, a law professor at Loyola University School of Law and an expert on fair use. Since the institute, we've put together an upcoming Data-Sitters Club piece on copyright that should be published this fall. In addition, I’m hoping to put together a LibGuide on the library website about copyright and fair use.
Along with Kalika Bali (from Microsoft Research), I facilitated a weeklong panel as part of the “Disrupting Digital Monolingualism” workshop through “Language Acts and Worldmaking” at University College London. We brought together a group of scholars and industry folks who work with multilingual NLP, and hope to produce a white paper summarizing their perspectives on the opportunities and needs for multilingual NLP.
The international DH conference is my conference, and I was very sad (though sympathetic) that it didn’t happen this year. I didn’t have the time to organize something for most of the things I’d gotten accepted for the mostly-asynchronous manifestation of the conference. That said, I did manage to put together a simple website with opinion pieces about DH directories (in lieu of the “roundtable” panel accepted for the conference). I also live-tweeted the “Demystifying ADHO” roundtable, and presented at the tool criticism workshop sponsored by the Digital Literary Stylistics ADHO SIG.
But my favorite memory of DH 2020 is something that I didn’t submit as part of the usual process during fall 2019. Along with Artjoms Šela and Shawn Moore, I organized a series of lightning talks in the game “Animal Crossing: New Horizons” on the Friday of the DH conference. It felt amazing to be able to be somewhere “in person” (in some sense) with colleagues during the DH conference. And since there’s reason to expect the COVID-induced world of virtual everything will persist for some time, I’ve recently been talking with Liz Grumbach about imagining a DH reading group that meets via Animal Crossing and streams to Twitch.
Towards the end of the summer, I did a week of workshops on statistics, machine learning, deep learning, and NLP through the ICME summer workshop series on the fundamentals of data science. The workshops went better than I expected, and I learned a lot — without the domestic tension that would result from early/late days at work if I’d tried to attend them in their pre-COVID in-person form. I really hope this kind of online training sticks around after we can go back to campus.
I also took a couple weeks of “vacation”, but it mostly amounted to housework and left me eager to return to work.
Working Groups and Organizations
It feels so very ironic that DH-WoGeM (Women and Gender Minorities in DH) got approved as an ADHO (international DH umbrella organization) special interest group during the summer, but the amount of juggling I’ve needed to do around managing childcare has meant that I haven’t been able to follow up on it or have it formally announced. I look forward to having downtime at some point that isn’t interrupted by childcare closures due to smoke.
Meanwhile, although the Multilingual DH group SIG proposal hasn’t been accepted yet, I’ve had some wonderful conversations with Cecily Raynor at McGill University, the incoming head of ADHO’s Multilingual and Multi-cultural Committee, about what that committee might be able to do within ADHO to better support non-English DH work.
I started my term as co-VP of ACH, where I’ve been working on reimagining the mentoring committee and creating a series of working groups that people can use as a way to get involved with the organization.
Over the summer, the Masquaraders (the group of library sewists that Julie Sweetkind-Singer and I coordinated) finished up their task of sewing three masks for every library staff member. We’ll be depositing photos from the project, along with our instructions, in the Stanford Digital Repository for posterity, and Julie put together a lovely write-up on the Stanford Libraries blog.
Finally, the thing I’m proudest of having accomplished over the summer is administrative: I worked through the process of Stanford Libraries joining the READ Co-op that supports Transkribus and purchasing a pack of 30,000 pages of transcription for when the per-page cost model starts. The process for joining, the rates, and the payment options (and turnaround time) aren’t nearly as well documented as the software, and figuring out issues of licensing review and financial transfers on our end was similarly challenging. But I’m excited to go into this fall knowing that Transkribus will be able to work its magic for scholars throughout the year.
Photo credit Houda Lamqa, from the ADHO ACNH lightning talks