What’s your question? That’s the first question I always ask when students or colleagues come to my office to ask for help. Invariably they want to know what tools they can use to study a bunch of documents and how those tools work. What I want to know is what it is they want to know. Some questions that researchers have asked using computational models are:
– how semantically coherent are genres over long periods of time?
– how do bestselling novels differ from prizewinning ones?
– how predictable is the construction of gender in novels and how has this changed?
– is there a relationship between linguistic repetition and a subjective style in East Asian modern literature?
– what qualities are distinctive of fictional writing and how universal are they?
– how many creative periods do poets typically go through and is there something unique about their late style?
These are just some examples to get your wheels turning. The important point is that questions can range from the very specific (a single linguistic feature in a single geographic and historical context) to the very general (what are the features that distinguish fictional storytelling from the truthful kind?). The even more important point is that they are all framed as questions: what is it that you (and we) do not yet know that you would like to know? Perhaps most importantly, why does this question matter?
Developing a good question can be thought of as an engagement with existing theories about how the world works. When defining your question, it is essential that the question be a combination of something that you personally care about (researcher passion is essential) and also something that other researchers care about (you are not working in a vacuum). A good question emerges from an immersion within an existing body of knowledge. By studying the work of others, you gradually come to see what they have missed. Where is a theory incomplete?
The first practice of modeling then is called the questioning phase. It is here that your question emerges from an existing theory about the world. One of my favorite examples to illustrate this point is the short paper by Ian Lancashire and Graeme Hirst.[1] In it, they aim to understand the relationship between cognitive decline and creative writing in the work of Agatha Christie. In the medical literature on human aging, they had encountered theories about the relationship between age-related cognitive decline and the decline of linguistic competence. Would this relationship show up, they decided to ask, in the work of a creative writer who, as they knew from biographical information, suffered extreme cognitive decline in her late age?
You can see how Lancashire and Hirst start with knowledge of a given field and then ask a question that had not been asked before. This brings us to the first rule of computational modeling: domain knowledge is essential. This may sound obvious but in practice it is often skipped. All of the work that scholars have done up until now in your field matters greatly. Engage with it and immerse yourself in it. Then try to figure out what’s wrong with it.
A second aspect to note about Lancashire and Hirst’s example is the way it also addresses the question of why a question matters. There are lots of questions out there. Many of them are meaningless. Why is this a good question? For Lancashire and Hirst, understanding the relationship between a writer’s output and their cognitive decline could potentially contribute to knowledge about early diagnosis. It isn’t just “interesting” to know whether Christie’s decline manifests itself on the page. It is also potentially useful. There is a long way to go between the first insights about Agatha Christie and a general understanding of the relationship between creative writing and cognitive decline. But that’s the nature of research. It’s a pathway or space that can’t be explored all at once. So never stop asking yourself: Why does this question matter? And: Whom am I asking it for? Is it just to impress your teachers or senior colleagues or is there some constituency out there that will benefit from what you know? Thinking about the potential audience or public that your knowledge might benefit is an essential part of the research process. If the answer is “everyone,” then you probably don’t know whom your work is really for.
A final point I want to make is how Lancashire and Hirst’s question engages an existing, more general theory at a lower level of specification. They ask whether the relationship between creative writing and cognitive decline manifests itself within a particular writer, not all writers ever or certain kinds of writers. The important point is that you can scale the generality of your question up or down. You may be interested in Agatha Christie, but it is important to see how your question is part of a more general line of inquiry about the relationship between mental illness and creative writing (or even creativity itself). Even if you apply your method to a single writer, in theory it should be scalable to study larger populations of writers.
This brings us to a second rule of computational modeling: modeling is fundamentally about a process of specification. As we build a model we move increasingly down the ladder of generality towards greater degrees of specificity. You may begin somewhere in the middle, but it is important to know where the top is (roughly speaking) and just how far away you are from it when you’ve finally implemented your model. This will help you modulate your claims when you’re done. How far did you descend and what can you say about the upper portions of your theoretical ladder given how far away you are now? This is what makes much of traditional scholarship in the humanities seem so ridiculous. A tiny piece of evidence is used to make massive claims about the world with no reflection on potential limitations. Try to avoid this.
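To make the idea of descending the ladder concrete, here is a minimal sketch in Python of what one very low rung might look like. It assumes, purely for illustration, that "linguistic competence" is specified as vocabulary richness: the share of distinct words among the first few tokens of a text. This is not Lancashire and Hirst's actual procedure. The function names, the window size, and the two toy sentences are all invented for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_richness(text, window=50):
    """Return distinct words divided by total words over the first `window` tokens.

    A fixed window keeps texts of different lengths comparable, because
    longer texts repeat words more often simply by being longer.
    Returns 0.0 for a text with no tokens.
    """
    tokens = tokenize(text)[:window]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Invented toy sentences standing in for an "early" and a "late" text.
# They are NOT quotations from any real author.
early = "The inspector examined the muddy footprints beneath the library window"
late = "The thing was there and the thing was a thing of some kind"

print("early:", vocabulary_richness(early))
print("late: ", vocabulary_richness(late))
```

Notice how far down the ladder this is. A single ratio computed over a handful of words is a long way from "linguistic competence," and further still from "cognitive decline." Knowing that distance is exactly what should keep your claims modest.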
[1] Lancashire, Ian, and Graeme Hirst. “Vocabulary Changes in Agatha Christie’s Mysteries as an Indication of Dementia: A Case Study.” Nineteenth Annual Rotman Research Institute Conference, 8–10 Mar. 2009, Intercontinental Centre Hotel, Toronto.