
Small-Scale Question Sunday for February 16, 2025

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

Why aren't novels' word counts common knowledge?

Go on IMDb, look up the pilot episode of an obscure American sitcom which was cancelled after one season, and you will find its duration in minutes (a single, objective measure of how long it will take to consume that piece of media). Go on Wikipedia, look up Blade Runner and it will list the various durations of all the various theatrically released cuts, directors' cuts and so on. Any album noteworthy enough to have its own Wikipedia page will have its duration listed in minutes and seconds, broken down by individual track duration (including the duration of various special/bonus editions). If it isn't notable enough to be on Wikipedia, it will be on Rate Your Music.

Meanwhile, if you want to find out how long it will take you to read a book, Wikipedia might tell you the page count, which is next to useless given how many variables contribute to it: font, font size, page size, margin width and height, formatting decisions (a novel which uses numerous paragraph breaks will take up more pages than a novel of the same word count which uses them sparingly; putting a page break before the start of a new chapter can easily add ten pages to a novel's length; because of its bizarre formatting, House of Leaves's word count is probably 25-40% lower than its massive page count would imply). Various editions of the same unabridged novel with the exact same wording can have enormous variation in how many pages they take up (e.g. this edition of Moby-Dick is 768 pages, while this one is 608). Last night I Googled the word count of Finnegans Wake, one of the most widely discussed novels of the twentieth century, and the first result was one of those automated websites which calculate an estimate of the word count based on the page count (under the rule of thumb that 1 page = 250 words).

I'm not asking for anyone to laboriously go through the process of counting each word by hand. Finnegans Wake can be purchased as an ebook, which means its contents have been digitised. If you want to find out the word count, all you have to do is open the text file/EPUB file/AZW file and check. Presumably somewhere in the region of 99% of all novels composed in this century were composed using a word processor, meaning the word counts were known (or at least trivially knowable) to the author, publisher, typesetter etc. well in advance of publication.
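To make the "open the file and check" step concrete, here is a minimal Python sketch. It assumes the book has already been exported to plain text (an EPUB or AZW file would need to be unpacked or converted first), and it treats a "word" as any whitespace-separated token, which is roughly what a word processor reports but not an exact match. The function names and the file path argument are hypothetical, chosen for illustration only.

```python
from pathlib import Path

# Rule of thumb used by the page-based estimator sites mentioned above.
# It is a convention, not a standard.
WORDS_PER_PAGE = 250


def count_words(text: str) -> int:
    """Count whitespace-separated tokens in a string."""
    return len(text.split())


def count_words_in_file(path: str, encoding: str = "utf-8") -> int:
    """Count words in a plain-text file (path is supplied by the caller)."""
    return count_words(Path(path).read_text(encoding=encoding))


def estimate_from_pages(pages: int, words_per_page: int = WORDS_PER_PAGE) -> int:
    """Page-count proxy: multiply pages by an assumed words-per-page figure."""
    return pages * words_per_page
```

Under the 250-words-per-page rule, the two Moby-Dick editions above (768 and 608 pages) would be estimated at 192,000 and 152,000 words respectively, despite containing the same text, whereas counting the words directly gives the same answer for every edition.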

Before you book to see a film in the cinema, you'd want to know how long it is so you can plan your day accordingly, so cinemas always include this information (although not, annoyingly, the duration of ads and trailers prior to the movie - state Congress to the rescue!). No one would accept a vague ambiguous proxy for the duration of a film like "there are 1,300 cuts in this film" or "there are 30 scenes in this film" - how long is a "scene"? By the same token, before you start reading a book, you generally want to have some kind of idea of how long it will take you to read it. The publisher has access to an objective measure of the book's length (its word count) but refuses to make this information public, instead relying on a vague proxy for its length which is prone to error and can prove enormously misleading. Why is this?

Because we are talking about books. It is very easy to judge a book's length by its physical size.

I outlined at length various reasons why the physical size of a book might be misleading.

Yes, different books of similar size can take different levels of time/effort to read. But even so, extreme outliers are rare. Thus the metric is good enough for common use, thus there's no popular support pushing to have a different metric.

Also, even if we did use a different metric there are going to be outliers. You mentioned House Of Leaves, but that book took me longer to read than books with an equivalent word count. The footnotes are slower going, and the parts of the book where the text is in odd directions take longer because you have to physically turn the book. So if moving to a new metric will still have outliers, why bother?

But even so, extreme outliers are rare.

What are you basing this assertion on?

So if moving to a new metric will still have outliers, why bother?

It irritates me that we insist on using a proxy for the real metric when the real metric is so trivially accessible. To return to the example in the original post: the film's duration is an objective metric for how long it lasts. Some ninety-minute films are a chore to sit through, some films are three-and-a-half hours long but subjectively feel like half that; but at all events, the objective length of the film is a trivial metric to determine. But wouldn't we find it weird if cinemas, distributors, Blu-Ray manufacturers etc. refused to use this metric, and instead were fixated on referring to how many "scenes" or "cuts" a movie has? I mean, sure, either of these is a good enough metric if you assume that a typical movie has X many scenes or X many cuts, but both of these have obvious weaknesses that the metric they're proxies for doesn't have (e.g. there's at least one movie which is nearly two hours long and could be said to only have three scenes total; there are many ninety-minute movies which have far more scenes than some two-hour movies; some movies are ninety or even one hundred and forty minutes long and feature zero cuts), and in any case the objective, unambiguous metric that these are serving as proxies for isn't remotely difficult to determine, so why do you insist on using the proxy metrics anyway?

What are you basing this assertion on?

Extensive personal experience.

It irritates me that we insist on using a proxy for the real metric when the real metric is so trivially accessible.

But that's exactly my point: your proposed metric is a proxy too! What you seem to want to measure is "how long will it take to read this book". But even for the same reader, two different books with the same word count can have a different time-to-read. Which brings us right back to: we already have a widely accepted proxy, and it is accurate enough that almost nobody cares about the margin of error. So what advantage do we gain from switching to a different proxy measurement? None that I can see, and we incur all the disadvantages that normally come from switching measurements. Doesn't seem very worth it to me.

What you seem to want to measure is "how long will it take to read this book"

No - what I want is to know how long the book is. Knowing the word count would answer my question exactly, because the length of a book is its word count, in the same way that the duration of a film is how many minutes it takes up (not how many scenes, not how long it feels - just how many minutes). Knowing the word count wouldn't answer the question of how long I can expect it would take me to read it (in the same way that some ninety-minute films can "feel" longer than some films which are two hours long or more), but it would answer the question of how long it is, which is exactly what I want to know. The word count and the page count are both proxy metrics for "how long would the average reader take to read this book"; the page count is an imprecise proxy metric for "how long is this book", which is the word count.

Why word count and not syllable count?

Word count, syllable count and character count would all be equally valid objective metrics for the length of a book, in the sense of how much content it contains. I used word count because it's a standard metric used in numerous contexts (including, obviously, publishing).

*mora count (taking into account differences in syllable length)

In that case, I still don't see your objection. The page count is in fact an exact metric for how long the book is, just as word count is. It doesn't matter what the size of the typeface is, or how it's laid out, a given volume is by definition N pages long. You might prefer the metric of word count, but it seems like most others prefer the metric of number of pages. So we aren't going to be switching any time soon.

The page count is in fact an exact metric for how long the book is...

Why is Hamlet 128x as long as Hamlet?

It doesn't matter what the size of the typeface is, or how it's laid out, a given volume is by definition N pages long.

Me, seeing a conversation piece: That's a great book, I read it back in high school.

SubstantialFrivolity, probably: You must be thinking of something else. This book was printed in 2024. I'm glad you're interested, because it brings up fond memories of my high school English class where we read a book that contains identical text.


In common conversation, "book" refers to the text. Hamlet is 31,873 words regardless of which physical structure the words are in. Pages can only refer to specific editions of books: the Dover Publications reprint edition (Sept. 24, 1992) of Hamlet is 128 pages, while the One Page Book Company edition is one page.

The page count is not an exact metric for how long a book is (i.e. how much content it contains), for the simple reason that the same book can have multiple editions with drastically varying page counts. As outlined in the original post.
