Literary Mathematics: Table of Contents | Stanford University Press

Literary Mathematics

Quantitative Theory for Textual Studies

Michael Gavin

Introduction: The Corpus as an Object of Study

Across the humanities and social sciences, scholars increasingly use quantitative methods to study textual data. Considered together, this research represents an extraordinary event in the long history of textuality. More or less all at once, the corpus has emerged as a major genre of cultural and scientific knowledge. This chapter describes how quantitative methods for the study of textual data offer powerful tools for historical inquiry and sometimes unexpected perspectives on theoretical issues of concern to literary studies.

1. Networks and the Study of Bibliographical Metadata

This chapter examines the conceptual form and structure of Early English Books Online's metadata, focusing in particular on how such data organizes and describes relations among authors, booksellers, and printers. Tracing patterns within this network reveals how the digital archive sorts itself into segments that offer a new perspective on historical periodization. A network model of EEBO's metadata represents literary history as an aggregate of basic elements. From that aggregate, the model can generate complex representations of literary history that are true in broad strokes and at more granular scales. The advantage of a mathematical, complex-systems approach to bibliographical metadata is that it makes such perspectives possible.
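As a rough illustration of what a network model of bibliographical metadata can look like, the sketch below links the people named in each record and then groups them into connected segments. The records, the field names (`author`, `printer`, `bookseller`), and the helper functions are all invented for this example; nothing here reproduces EEBO's actual schema or the chapter's own procedure.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical metadata records; names and fields are placeholders, not EEBO data.
records = [
    {"author": "Author A", "printer": "Printer X", "bookseller": "Bookseller M"},
    {"author": "Author B", "printer": "Printer X", "bookseller": "Bookseller M"},
    {"author": "Author C", "printer": "Printer Y", "bookseller": "Bookseller N"},
]

def build_edges(records):
    """Count how often each pair of people appears on the same record."""
    edges = Counter()
    for record in records:
        people = sorted(set(record.values()))
        for a, b in combinations(people, 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum the edge weights attached to each person."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

def components(edges):
    """Group people into connected segments of the network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups

edges = build_edges(records)
degree = weighted_degree(edges)
segments = components(edges)
```

Connected components are only the crudest way a network can fall into segments; they stand in here for whatever partitioning method the chapter actually uses.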

2. The Computation of Meaning

Vector-space models use word collocation to represent the meanings of words. These computational methods were designed for information retrieval and machine translation but can now be marshaled for cultural analytics. Through many examples drawn from Early English Books Online, this chapter argues that quantitative models of meaning offer a powerful set of tools for historical inquiry and a fascinating account of meaning in their own right. They share many assumptions with Julia Kristeva's idea of "intertextuality." Both view language as an algebraic system where every word entails latent that supervene over every particular use. Computational semantics embraces intertextuality's basic premises while grounding them in empirically observable units, thus lending that theory greater precision and analytical power.
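The basic idea of representing a word by its collocates can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the toy corpus, the window size of two tokens, and the function names are all invented here, and raw co-occurrence counts with cosine similarity are only one of several ways such models can be built.

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words found within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus invented for illustration; not drawn from EEBO.
toy_corpus = [
    "the king rules the realm".split(),
    "the queen rules the realm".split(),
    "the ship sails the sea".split(),
]

vectors = cooccurrence_vectors(toy_corpus)
sim_king_queen = cosine(vectors["king"], vectors["queen"])
sim_king_ship = cosine(vectors["king"], vectors["ship"])
```

In this toy corpus "king" and "queen" occur in identical contexts, so their vectors point in the same direction, while "ship" shares only some of those contexts and scores lower.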

3. Conceptual Topography

Building from a small data-curation project that identifies over 9,000 unique geographical places in the Early English Books Online corpus, this chapter tracks the evolution of geographical writing over the period, showing which places English writers referred to most frequently and how their attention changed over time. By correlating EEBO's references to places with its uses of other words, this chapter demonstrates a method of text analysis that charts the topographies of concepts and reveals the geographical biases that inform the corpus. Whereas geographers conventionally distinguish location from meaning, this chapter demonstrates that place is part of the conceptual structure that subsists among words.
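One simple way to picture correlating references to places with uses of other words is to compare their counts across the same set of documents. The sketch below uses Pearson correlation as a stand-in; the per-document counts, the chosen words, and the `pearson` helper are invented for illustration, and the chapter's actual measure may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of counts."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical counts per document (four documents); not real corpus data.
place_counts = {"london": [3, 0, 2, 0]}
word_counts = {
    "trade": [2, 0, 1, 0],
    "prayer": [0, 2, 0, 3],
}

# Correlate the place's counts with each word's counts across documents.
associations = {
    word: pearson(place_counts["london"], counts)
    for word, counts in word_counts.items()
}
```

Words whose counts rise and fall with a place name score positively, and words that appear where the place is absent score negatively, which gives a first approximation of the vocabulary that clusters around a location.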

4. Principles of Literary Mathematics

Literary mathematics names the point of contact between cultural analytics and literary theory, where scholars connect the measurable with the meaningful. This chapter charts a path through corpus-based inquiry from text, to structure, to claim, outlining a set of key assumptions that motivate corpus-based inquiry and surveying mathematical concepts for the analysis of corpus data. To study variation across corpora requires modes of description that cross three broad categories – discrete mathematics, matrix algebra, and statistics – which correspond to general concepts in literary studies – form, difference, and significance. Taken together, these concepts constitute a vast, heterogeneous, and highly sophisticated body of theory that remains almost wholly unknown to literary scholars but will prove crucial to studying the digital collections now available.

Conclusion: Similar Words Tend to Appear in Documents with Similar Metadata

The conclusion returns focus to the book's central theoretical argument: Similar words tend to appear in documents with similar metadata. This strangely worded proposition is offered as an invitation to consider the fundamental principles of cultural analytics and to open new lines of inquiry between literary studies, digital humanities, the social sciences, and the information sciences. Quantitative approaches to textual studies can only proceed effectively if grounded on rigorously conceived principles. Those principles, once articulated, reveal how little we actually know about the deep structures of language and history. But they also suggest a path forward and open new vistas of historical and theoretical inquiry.