Word Usage in SciFi Stories

As a professional programmer I sort text files and analyze their content. Every time I look at my own writing I have the habit of analyzing it the same way. Yesterday I realized the value of running a purely statistical analysis of my word usage. I have been listening to Richard Morgan's trilogy Altered Carbon, Broken Angels, and Woken Furies. I noticed that Morgan really likes the word "shrugged". Kovacs and others use this word 30+ times per book. It's annoying, and probably an indicator that he rushed the books past an overworked editor. It might also indicate that a word can sound overused in an audiobook even though the work reads well on paper.

I assumed that I had a few overused words of my own and I wondered how to find them. Word doesn't offer a word count histogram, so I wrote one in Perl. If you are lucky enough to use a Mac you are only about 3 minutes away from running your own word count on any document you like. If you are on a PC you will probably have to download ActiveState Perl and get it running. This might take a while. I'll look into building an exe file if there is sufficient interest.

On a Mac you have to do 4 things:

1) open the Terminal and go to your target directory

2) save this Perl code in your target directory as a text file named wordcount.pl, then make it executable by typing chmod +x wordcount.pl

3) save your document as a text file in the same directory. Let's say you name it doc.txt. This program will not read Word files or other formats, just text.

4) type ./wordcount.pl doc.txt

The lightning-fast result is a histogram analysis of your word usage. A 53,000-word document takes about 2 seconds on a MacBook Pro. If you want to save the output for future review, type ./wordcount.pl doc.txt > results.txt and then open results.txt in your word processor.

Each line of output starts with the number of times the word is used, followed by the word, like this:

    1176 of
    1255 and
    1268 a
    1441 to
    2502 the

At the end is a summary: Total of 53955 words, 10352 distinct words used. My word count agreed exactly with Microsoft Word! I would have bet money they would not be exactly the same, so the match gives me confidence in my code.
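For readers curious about what a word count histogram involves, here is a minimal sketch in Python. It is not the Perl script described in this post, only an illustration of the same idea. The function names word_histogram and format_report are made up for the example, and the definition of a "word" (a run of letters, with optional internal apostrophes) is an assumption, so totals may differ from what Word or the Perl script would report.

```python
import re
from collections import Counter


def word_histogram(text):
    """Count how often each word appears, ignoring case."""
    # Assumption: a word is a run of letters, optionally joined by apostrophes
    # (so "don't" counts as one word). Digits and punctuation are skipped.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


def format_report(counts):
    """Build the report: one 'count word' line per word, then a summary."""
    # Sort from least used to most used, so the heaviest words come last.
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    lines = [f"{count} {word}" for word, count in ordered]
    total = sum(counts.values())
    lines.append(f"Total of {total} words, {len(counts)} distinct words used.")
    return "\n".join(lines)


# Small demonstration on a made-up sentence.
sample = "The cat shrugged. The dog shrugged. The end."
print(format_report(word_histogram(sample)))
```

To try it on a real document, read the text file into a string and pass that string to word_histogram instead of the sample sentence.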

Your most heavily used words will of course be: the, to, a, and, and so on. You will have to dig through the list to find the first word that is not common.
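One way to skip the digging is to filter out the common words before counting. The sketch below is in Python and is only an illustration, not part of the Perl script. The COMMON_WORDS list is a tiny made-up sample (a useful list would be much longer), and the function names uncommon_counts and top_uncommon are hypothetical.

```python
import re
from collections import Counter

# Assumption: a short illustrative list of common words, not a complete one.
COMMON_WORDS = {"the", "to", "a", "and", "of", "in", "i", "it",
                "was", "he", "she", "that"}


def uncommon_counts(text, common=COMMON_WORDS):
    """Count words, leaving out anything in the common-word list."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(word for word in words if word not in common)


def top_uncommon(text, n=10):
    """Return the n most frequent uncommon words as (word, count) pairs."""
    return uncommon_counts(text).most_common(n)


# Small demonstration on a made-up passage.
sample = ("Phillip shrugged and the dog shrugged. "
          "Phillip went to the door and Phillip left.")
for word, count in top_uncommon(sample, 3):
    print(count, word)
```

Names and pet words rise to the top of this shorter list, which is where the overused ones tend to hide.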

My first word to study is Phillip, the name of my protagonist, which I have used 122 times. This might be an indication I'm overusing his name, although it will take a careful reading of the book to decide when to drop the name.

Suppose you only want to see the dreaded adverb. It's trivial to look for words containing "ly". You can modify my script by deleting the first # sign around line 21 => unless ($word =~ /ly/) {next;} # remove the first # sign if you want to look for adverbs. As written, the pattern matches "ly" anywhere in a word; anchoring it as /ly$/ limits the matches to words that end in "ly".

I learned that my novel contains 261 distinct words ending in ly, and I use them 800 times. I used "only" 98 times, "nearly" 53 times, and "really" 30 times. Awful, just awful. Instead of slowly reading each paragraph I can now target specific words in much faster editing sessions.
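The adverb hunt can be sketched the same way. This is a Python illustration rather than the Perl script, and the function name ly_words is made up for the example. It assumes that "ends in ly" is a good enough test, which means it also catches non-adverbs such as "family" or "fly".

```python
import re
from collections import Counter


def ly_words(text):
    """Count every word that ends in 'ly', ignoring case."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Assumption: the ending alone is the test, so "only" and "family" match too.
    return Counter(word for word in words if word.endswith("ly"))


# Small demonstration on a made-up passage.
sample = "She really, really tried. He nearly fell, only slowly."
counts = ly_words(sample)
print(len(counts), "distinct words ending in ly, used", sum(counts.values()), "times")
for word, count in counts.most_common():
    print(count, word)
```

The two numbers on the summary line correspond to the distinct count and the total uses quoted above for the novel.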

Editors look for overuse of adverbs and of specific words. Statistical analysis of your work is a tool you can use to get past these roadblocks. I look forward to your feedback.