
Small-Scale Question Sunday for June 4, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


How would someone go about using statistics to determine whether the name frequency in a book is too improbable to be due to chance? For instance, suppose there’s a book in which three important characters share the same name, which has a frequency of 25%, two other important characters also share a name, and all these characters are linked thematically. My intuition is that this can't be chance, but how could you argue this statistically?

In a "spherical cow" sense, you could do an exact multinomial test on the observed counts versus some hypothesis for the name distribution - e.g. assume there are n different names, all equally likely, or get a frequency table from official statistics somewhere.
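To make the "spherical cow" version concrete, here is a minimal sketch of an exact multinomial test in plain Python. It enumerates every possible split of the characters across name categories and adds up the probability of all splits that are no more likely than the observed one. The numbers in the usage lines are assumptions for illustration only: the question gives the 25% figure, but the total of 8 important characters, the 5% frequency for the second name, and lumping every other name into one "other" bucket are all made up.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of seeing exactly these counts under the given name frequencies."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact at every step, the running value stays an integer
    return coef * prod(p ** c for p, c in zip(probs, counts))

def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def exact_multinomial_test(observed, probs):
    """p-value: total probability of outcomes no more likely than the observed one."""
    p_obs = multinomial_pmf(observed, probs)
    n, k = sum(observed), len(observed)
    # Small relative tolerance so ties with the observed outcome are counted.
    return sum(
        p
        for p in (multinomial_pmf(c, probs) for c in compositions(n, k))
        if p <= p_obs * (1 + 1e-9)
    )

# Hypothetical inputs: 8 important characters, 3 with name A (25%),
# 2 with name B (assumed 5%), 3 with any other name (the remaining 70%).
observed = (3, 2, 3)
probs = (0.25, 0.05, 0.70)
p_value = exact_multinomial_test(observed, probs)
print(f"exact multinomial p-value: {p_value:.4f}")
```

Note that the "other" bucket hides any repeats among the remaining names, and the enumeration grows quickly with more categories, so this only works for small toy setups like this one.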

In real life, though, all that will be confounded by who the characters are, and what they have in common - e.g. it's much less weird if they're all Muslim, and the name is Muhammad.

For a proper test, you need a model for "how likely is it for n people linked by a narrative structure to have repeated names", which seems harder to build. The simplest approach might be to sample a bunch of other books; then you could at least say "this book repeats names way more than usual".
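A rough sketch of that "sample other books" comparison, again in Python. The score used here (the share of important characters whose name is also carried by another important character) is just one possible choice, and the function names are hypothetical. You would have to collect the character lists for the comparison books yourself; none are included here.

```python
from collections import Counter

def repeat_score(names):
    """Fraction of characters whose name is shared with at least one other character."""
    counts = Counter(names)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(names)

def empirical_p_value(target_score, reference_scores):
    """Share of reference books that repeat names at least as much as the target.

    The +1 in numerator and denominator counts the target book itself,
    so the result is never exactly zero.
    """
    at_least = sum(1 for s in reference_scores if s >= target_score)
    return (1 + at_least) / (1 + len(reference_scores))

# Placeholder name lists, purely to show the call pattern (not real data).
target_book = ["A", "A", "A", "B", "B", "C"]
reference_books = [
    ["D", "E", "F", "G"],
    ["H", "H", "I", "J", "K"],
    ["L", "M", "N"],
]
scores = [repeat_score(book) for book in reference_books]
print(empirical_p_value(repeat_score(target_book), scores))
```

With only a handful of comparison books the result is very coarse, and it only means something if the comparison books are similar in genre, period, and setting.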