Not every mathematical paper generates international press, but that’s exactly what a research team including statistics professor Brianna Heggeseth experienced a year ago with “Assessing Diversity in Major US Art Museums,” published in PLoS ONE. Last fall Heggeseth built on those findings in a National Gallery of Art datathon, using data science to quantify representation on the museum’s walls and help curators reflect on past choices. “I’m looking to build bridges across disciplines—that’s the reason why I structure my scholarship and teaching the way I do,” she says. We sat down with Heggeseth to learn more.
Why did last year’s paper make waves?
Our analysis reflected people’s anecdotes: generally, there isn’t much diversity at art museums. We got press for that paper not because our findings shocked anyone, but because it was one of the first large-scale studies of artist representation. We looked at 18 art museums, with thousands of objects in each collection, and we wanted to quantify representation by making inferences about gender, race, and ethnicity. But if you tried to tackle that task manually, it would probably take years.
What did you do instead?
We used crowdsourcing through Amazon's Mechanical Turk platform and got our results much faster by dividing the work into very small tasks. Guessing one artist's gender identity and noting when they lived, for example, might take a minute. It's as if we took all of Macalester's students and divided up the tasks: the work could be done in an hour. Mechanical Turk's technology gives us access to many people, whom we pay for their time.
But we also didn’t look at every single artist and object. We used random sampling to make statistical inferences. After our work with the National Gallery’s full collection, we found that our original inferences were right on target, with very similar results. That reassures me that it’s a fairly accurate process to do at scale. And that’s the big takeaway: that someone could do this at scale rather than thinking it’s an unattainable task because of the work hours required. It opens up possibility.
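To give a flavor of the sampling idea, here is a minimal Python sketch, not the team's actual code: estimate a collection-wide proportion (say, the share of women artists) from a random sample of artist records. The file name, column names, and sample size are assumptions for illustration.

```python
# Minimal sketch (not the study's actual code): estimate the share of
# women artists in a collection from a random sample of records.
# The file name, column names, and sample size are assumptions.
import math
import pandas as pd

records = pd.read_csv("collection_metadata.csv")   # hypothetical export
sample = records.drop_duplicates("artist_id").sample(n=500, random_state=1)

# In practice each sampled artist would be labeled (e.g., by crowd workers);
# here we assume a "gender" column has already been filled in.
p_hat = (sample["gender"] == "female").mean()
se = math.sqrt(p_hat * (1 - p_hat) / len(sample))

# Approximate 95% confidence interval for the collection-wide proportion.
print(f"Estimated share of women artists: {p_hat:.1%} "
      f"(95% CI: {p_hat - 1.96*se:.1%} to {p_hat + 1.96*se:.1%})")
```

Because only the sampled artists need to be labeled, the estimate comes at a small fraction of the cost of reviewing every object, which is what made the follow-up check against the National Gallery's full collection so reassuring.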
What did your team’s work at the datathon actually involve?
After two months with the National Gallery’s data set, we spent two days in Washington, D.C., touring the museum and talking with senior curators. But the vast majority of our work involved cleaning up the data set before we could analyze it: the provided data set had artists’ names, titles of objects, and other metadata, but not gender identity, race, or ethnicity. We were able to merge our data set with a Getty Museum database to add gender information for some of the artists, then we made educated guesses for the rest. We used historical databases to make gender inferences about first names, and for about 700 artists, our art historian used his own contextual knowledge.
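For readers curious about that cleaning step, here is a minimal Python sketch, not the datathon's actual pipeline: merge collection metadata with a reference table of artist genders, then fall back to a first-name lookup built from historical records. Every file and column name is a placeholder.

```python
# Minimal sketch (not the datathon's actual pipeline): add gender labels to
# collection metadata by merging with a reference table, then guessing from
# first names. All file and column names are assumptions for illustration.
import pandas as pd

collection = pd.read_csv("national_gallery_objects.csv")   # hypothetical
reference = pd.read_csv("reference_artist_gender.csv")     # hypothetical

# First pass: exact match on artist name against the reference database.
merged = collection.merge(reference[["artist_name", "gender"]],
                          on="artist_name", how="left")

# Second pass: for artists still missing a gender, guess from the first name
# using a lookup table derived from historical name records.
name_lookup = pd.read_csv("historical_first_names.csv")    # hypothetical
first_names = merged["artist_name"].str.split().str[0]
guessed = first_names.map(name_lookup.set_index("first_name")["gender"])
merged["gender"] = merged["gender"].fillna(guessed)

# Any remaining gaps would be resolved by hand, e.g., by an art historian.
print(merged["gender"].isna().sum(), "artists still unlabeled")
```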
What are the implications of these findings?
If we don't feel like there is representation and we want change, then we need to know where we've been and where we want to go. This work provides a baseline to track initiatives aimed at diversifying collections. Most curators and art historians have so much data and knowledge stored in their heads, and they're juggling many parameters and restrictions on the art. This gives them a data-driven approach to see how things have changed. It's validating, because it matches what they think is going on, and it provides leverage for funding.
There are many people thinking about this problem, and the datathon brought us all into one room. It was an opportunity to build collaboration not only across disciplines (art history, statistics, data science, and marketing) but also across institutions. I don't think anyone created the tool, but sharing our ideas and approaches moved us all forward quite a bit.
How did you land on such an interdisciplinary approach in your work?
In an introductory college statistics course, I realized that data has the power to make a tangible difference across many fields. It's through cross-disciplinary collaborations that we can make the most meaningful impact: taking the qualitative work that humanists do and complementing it with data-driven approaches. One is not better than the other, but together you can make more progress and gain more insight.
How does data science fit into academia?
Data science as a field did not exist when I was an undergrad. What we envision as data science has mostly been driven by industry's need to analyze large amounts of data, and academia is generally working to catch up. I think that our department does a great job of supporting data science, but many other institutions are struggling to revise their curricula in a way that prepares students for this type of job, which requires subject knowledge (in this case, art history), critical thinking, and technical skill sets. Liberal arts colleges are in a unique position to produce students who have all these skills, because their students are able to translate between research questions in the field and statistical or data science approaches.
How has your understanding of representation changed through this work?
I talk to my classes about the fact that there’s tension in being a statistician: you need to put people in categories to summarize representation, even as you recognize that individuals don’t fit nicely into boxes for gender, race, and ethnicity. I’ve learned from my students how to better think about race and the gender identity spectrum. It’s made me more reflective in how I define these variables and how I present them.
By Rebecca DeJarlais Ortiz ’06 / Illustration by Jon Benson
April 22, 2020