
What is this?

And who is this for?

The fish and the painting is a book blog for a new handbook on doing data-driven research in the humanities. While there are a bunch of tools and resources out there to learn programming, especially with respect to text analysis, what I haven’t seen is a comprehensive place that integrates learning those tools within a humanities research context.

My experience has been that the biggest obstacle to adopting computational methods in the humanities is not technical but conceptual. How can we frame questions based on the way our objects of study become data? And how can we do so in an accessible way, not just in terms of open access (though that’s important too, which is why I’m doing this open blog), but also in terms of conceptual and methodological access? My sense is that on-ramps to this kind of work are still not smooth enough for most people. This process of on-ramping and framing good research questions is what this book is going to be about.

I say “going” because it’s a work in progress. The aim is to produce a fully open-access resource, from beginning to end. This is still all too rare for textbooks these days. The field of open access textbooks is nevertheless growing and I want my work to be situated within that movement (and please do make suggestions about future publishing options). There’s really no reason to charge excessive amounts for access to basic knowledge anymore.

My greatest hope in doing this book out in the open from beginning to end is to get as much input and dialogue along the way as possible. I’ve turned on the comments sections on all pages. This is the type of book that can really benefit from numerous eyes and interlocutors. To that end, please help me:

  • find bugs, either technical or conceptual
  • draw attention to missing steps (this was confusing to me)
  • highlight overlooked areas that are important to your field.

“Humanities” is a large umbrella, and I have just one point of view. There are undoubtedly going to be different goals for different communities that I could incorporate better into this book. Please do post responses liberally throughout to anything you see here that can help make its audience wider.

And yet one of the things I find most appealing about data-driven research is the way it converges around a series of shared methods. I can talk with someone in library science, political science, sociology, history, or English, and while our documents and questions are always different, much of what we do in between is highly connected. This book takes advantage of that convergence and tries to foreground these common methods to facilitate novel questions in multiple domains.

Who is this book for?

This book is for anyone in the humanities who is interested in studying a larger number of documents than they can read by hand. It is geared towards someone with little or no knowledge of computational methods and thus should be suitable for advanced undergraduates, graduate students, and faculty alike. It is meant to be fun and somewhat lighthearted, or at least not too text-book-y. Textbooks can be intolerably boring, so I thought, really, why not?

As I discuss in the hands-on portion of the book, the assumption is that you have done some “intro to R” type tutorial (whether in class or online) so that the commands are legible to you. For those who are more advanced, the book will give you a host of usable scripts to perform the analysis you want along with discussions about issues to consider as you implement them. When it comes to analyzing documents nothing is ever set in stone (except of course texts written in stone). By that I mean the field is still evolving, and it leaves researchers a considerable degree of flexibility when it comes to applying and interpreting tools. This is very important, and researchers at all levels will benefit from the reflection here on the pros and cons of different approaches.

The book does have one particular focus, however, and that is “text.” If you are interested in studying images or sound, then most of the sections on tools will be less relevant, though modules on machine learning and social network analysis, as well as the larger theoretical stance of distributional modeling, will still be very important for you. The first long section on modeling should still be required reading for anyone wishing to undertake computational work in the humanities. But for the actual nuts and bolts of handling sound or image data, you’ll have to look elsewhere.

Last, this book is for people who are excited to help build a new community of scholar practitioners. What pieces do we need to put in place so more people can do more of this kind of work? And what mindsets do we need to change to orient our scholarship around real-world needs and constituencies? How can data be a useful analytical tool for understanding culture in the past and present? My hope is to facilitate the creation of a community of positive, creative and reflective thinkers.

Ok, dive in.
