David Blei: Probabilistic Topic Models of Text and Users

Date/Time: Monday, 24 February 2014 - 1:00pm to 2:30pm
Location: Graduate School of Business McClelland Bldg, Room M104

The Institute for Research in the Social Sciences Data, Society, and Inference Seminar Series is pleased to announce a talk by David Blei, Associate Professor of Computer Science, Princeton University.

Probabilistic Topic Models of Text and Users

Abstract: Probabilistic topic models provide a suite of tools for analyzing large document collections.  Topic modeling algorithms discover the latent themes that underlie the documents and identify how each document exhibits those themes.  Topic modeling can be used to help explore, summarize, and form predictions about documents.

Traditional topic modeling algorithms take a document collection as input and analyze its texts to estimate the collection's latent thematic structure.  However, for many collections, there is an additional type of data:  how people use the documents.  For example, readers click on articles on a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection of bills.  User behavior data about documents is essential for building automatic recommendation systems and, further, gives new ways of understanding how a collection and its users are organized.

In this talk, Professor Blei will review the basics of topic modeling and describe his group's recent research on collaborative topic models, which simultaneously analyze texts and corresponding user behavior data.  His group studied collaborative topic models on a large collection of 80,000 scientists' libraries and the 250,000 abstracts of the corresponding articles.  With this analysis, they can build recommendation systems that point scientists to articles they will like and, further, organize the scientific literature according to the discovered patterns of readership.  For example, they can identify articles that are important within a field and articles that transcend disciplinary boundaries.

More broadly, topic modeling is a case study in the large field of applied probabilistic modeling.  Finally, Professor Blei will survey some recent advances in this field.  He will show how modern probabilistic modeling gives data scientists a rich language for expressing statistical assumptions and scalable algorithms for uncovering hidden patterns in massive data.

Lunch is provided and will be served at 12:40 pm. Please RSVP to kswall@stanford.edu by Friday, Feb 21.