One of the most memorable panels at ACH was the one where DH developers talked about how they got to where they are. Matthew Lincoln, Zoe LeBlanc, Rebecca Sutton Koeser, and Jamie Folsom all came to their careers in different ways, and work under different conditions. Even the largest shop (Scholars' Lab), with enough staff to have "junior" and "senior" developer roles, feels small, and everyone seemed interested in cultivating more of a community that crosses institutional boundaries. Some folks, including Rebecca, have begun to get involved with the emerging Research Software Engineering scene in the US. At the end, they called for more developers to tell their "origin stories", referencing the 2013 "Speaking in Code" workshop.

To DH and ACH with a Skeleton in Tow

July was the month of digital humanities conferences, with DH 2019 in Utrecht, closely followed by ACH 2019 in Pittsburgh. I was fortunate enough to attend both, and the experience has left me reflecting on the different shapes that community takes within digital humanities.
My DH 2019 started with a workshop on doing DH in non-Latin scripts, which left me grateful that — other than the occasional Unicode issue — the script isn't a major challenge for doing work on Russian. The workshop mostly focused on Near Eastern (often right-to-left) writing systems, with some Chinese, Japanese, and Korean as well. While relatively little of it was directly relevant to projects I’m currently working on, I was glad to get a better sense of the current state of the art for optical character recognition, and approaches to text linking, annotation, and display that are geared towards non-Latin alphabets. Despite the linguistic diversity in the room, the workshop attendees had a surprisingly good rapport; many of us went out for drinks together that evening to continue the conversation. We decided to start an ad-hoc working group with a mailing list, a basic home page and a collection of resource lists that we’ll collectively maintain, including the guide to non-English NLP that I initially put together for a talk at UCLA last spring.

DLCL ATS round-up, spring 2019

My first academic year at Stanford has come to a close, ushering in a summer that promises to be surprisingly busy, despite the relatively empty hallways and offices around campus. While working in central IT, I’d forgotten what it’s like to be so directly impacted by the rhythms of the school year, and the way everyone heads overseas as soon as finals wrap up.

Summer is already underway, with CESTA’s undergraduate interns starting yesterday, and deadlines drawing near for finishing things in time for the international DH conference. But before I get too far into things this summer, here’s what I’ve been up to over the course of the spring.

Brandes in translation: multilingual corpora at the Digital Brandes Hackathon

What do you do when you're invited to a hackathon around a text in a language you can't read? In keeping with my tendency to navigate difficulty by means of additional complications, I added more languages!

Prof. Tim Tangherlini (UCLA) organized the Digital Brandes hackathon at the UC Berkeley Scandinavian department on April 25-26. It brought together the Danish team of Brandes scholars and Danish computational linguistics experts behind an upcoming digital edition of Georg Brandes's groundbreaking work on 19th-century literature. While not all the technical folks (who included my CIDR developer colleague Peter Broadwell, Dave Shepherd from UCLA, and, virtually, Peter Leonard from Yale) could read Danish, I was at a double disadvantage, having a background in Slavic (not Scandinavian) linguistics (not literature). The first thing I had to learn was who Georg Brandes was. The second thing I had to learn was what romanticism was, and what followed it, and I'm grateful to Peter Broadwell for his CliffsNotes version thereof.

Translating Language, Culture, & Form at "Workshop on Digital Humanities to Preserve Knowledge and Cultural Heritage"

On April 15th, CESTA hosted the Workshop on Digital Humanities to Preserve Knowledge and Cultural Heritage, bringing together scholars working with a wide range of materials and methods. The workshop was convened by the Rosetta Project (ResOurces for Endangered languages Through TranslAted texts), a collaboration between Stanford English professor and Director of American Studies Shelley Fisher Fishkin and two colleagues from the Université de Lille, Ronald Jenn (Professor of Translation Studies and Digital Humanities) and Amel Fraisse (Associate Professor of Information and Computer Science, Digital Humanities and Language Processing), along with Zheng Zhang (PhD student in Natural Language Processing at Université Paris-Saclay). The project builds on the work of an earlier “Global Huck” project (which aimed to collect and examine all the translations of “Huckleberry Finn”) by using that collection as a large parallel corpus for developing NLP resources.