Word Counter for Text Comparisons

Overview

Below is a word counter that compares word frequencies in Pamela; or, Virtue Rewarded by Samuel Richardson with those in The Life and Adventures of Robinson Crusoe by Daniel Defoe. There are two separate word counters: one includes every word, and the other ignores common words (e.g. and, the, to, very). For rendering purposes, only the 100 most frequent words are shown for each novel.

Both texts are in the public domain, and this script is intended for educational and research purposes in accordance with fair use guidelines. Both texts were obtained from Project Gutenberg:

Pamela: https://www.gutenberg.org/cache/epub/6124/pg6124-images.html

Crusoe: https://www.gutenberg.org/cache/epub/521/pg521-images.html

The code for this project is available on GitHub, along with instructions for counting the words in any .txt file of your choosing.

Link to GitHub: https://github.com/fastball-marty/Word-Counter
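The tokenizer from the repository isn't reproduced on this page, so the sketch below is only a guess at one approach that fits the tables: words are lowercased and apostrophes are dropped, which would explain entries such as dont, youll and ladys. The names `tokenize` and `load_words` and both regular expressions are assumptions, not the project's actual code.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, drop apostrophes, keep runs of letters."""
    # Remove straight and curly apostrophes so "don't" -> "dont", "lady's" -> "ladys"
    text = re.sub(r"['\u2019]", "", text.lower())
    # Any run of the letters a-z is a word; digits and punctuation act as separators
    return re.findall(r"[a-z]+", text)


def load_words(path: str) -> list[str]:
    """Read a UTF-8 .txt file (e.g. a Project Gutenberg download) and tokenize it."""
    with open(path, encoding="utf-8") as f:
        return tokenize(f.read())
```

For example, `tokenize("I don't know, Mrs. Jervis!")` gives `['i', 'dont', 'know', 'mrs', 'jervis']`, matching the lowercase, apostrophe-free style of the lists below.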

Count all words

Counts the frequency of all words in Pamela and Robinson Crusoe and displays the 100 most frequent words for each.
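A minimal sketch of the counting step with Python's `collections.Counter`, assuming the text has already been split into a list of lowercase words. `top_words` and `report` are hypothetical names; `report` prints in the same "word: count" layout used in the lists that follow.

```python
from collections import Counter


def top_words(words: list[str], n: int = 100) -> tuple[int, list[tuple[str, int]]]:
    """Return the total number of tokens and the n most frequent words."""
    counts = Counter(words)
    # most_common() sorts by count, descending; ties keep first-encountered order
    return len(words), counts.most_common(n)


def report(title: str, words: list[str], n: int = 100) -> None:
    """Print a block like 'Total words: ...' followed by 'word: count' lines."""
    total, common = top_words(words, n)
    print(title)
    print(f"Total words: {total}")
    print()
    for word, count in common:
        print(f"{word}: {count}")
```

For instance, `top_words("the cat and the hat and the bat".split(), 2)` returns `(8, [('the', 3), ('and', 2)])`.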

Pamela
Total words: 221113

i: 10253
and: 9033
to: 7521
the: 5601
you: 4793
my: 4584
of: 3787
me: 3686
a: 3413
said: 3140
he: 2866
for: 2865
in: 2747
that: 2648
it: 2443
as: 2408
be: 2379
have: 2302
not: 2247
but: 2053
she: 1918
so: 1898
your: 1861
her: 1830
was: 1755
with: 1656
will: 1572
is: 1527
this: 1385
all: 1239
if: 1168
his: 1156
had: 1145
would: 983
at: 963
what: 945
good: 848
by: 812
mrs: 812
am: 809
sir: 795
him: 783
shall: 736
dear: 732
when: 710
on: 708
no: 707
well: 692
should: 676
are: 661
do: 657
master: 617
one: 602
very: 601
see: 593
upon: 591
from: 589
them: 578
they: 578
may: 575
now: 552
has: 548
then: 546
poor: 529
more: 528
or: 523
how: 521
which: 521
could: 516
pamela: 512
know: 510
can: 508
much: 503
think: 499
we: 494
out: 493
any: 490
say: 487
mr: 466
little: 464
been: 459
must: 456
up: 453
too: 450
such: 447
lady: 446
let: 442
make: 423
an: 407
than: 406
myself: 404
jewkes: 387
come: 374
hope: 370
who: 345
though: 344
did: 338
there: 335
thought: 330
made: 328

Robinson Crusoe
Total words: 120799

the: 5912
i: 5102
and: 4758
to: 4257
of: 3513
a: 2267
my: 2125
was: 1968
in: 1944
that: 1854
it: 1805
had: 1553
as: 1524
for: 1297
me: 1220
but: 1089
with: 1078
not: 969
which: 883
he: 877
them: 860
so: 836
this: 821
they: 770
all: 754
or: 746
at: 700
him: 660
be: 655
on: 604
were: 587
we: 573
by: 568
could: 566
upon: 541
would: 482
have: 482
his: 462
very: 452
from: 445
no: 443
when: 439
out: 422
one: 421
if: 419
up: 382
some: 382
what: 376
two: 358
made: 350
more: 345
into: 340
great: 340
there: 339
their: 336
been: 317
any: 312
might: 301
now: 296
being: 295
myself: 294
found: 289
about: 289
came: 283
should: 282
time: 277
little: 271
much: 265
shore: 260
did: 259
first: 258
than: 253
before: 244
other: 240
our: 239
after: 237
boat: 237
go: 228
then: 226
where: 225
make: 223
such: 221
how: 219
ship: 218
these: 216
again: 210
went: 206
us: 205
well: 204
way: 203
an: 201
three: 201
down: 200
you: 200
place: 200
come: 198
though: 195
is: 191
began: 185
island: 182

Ignore common words

Counts the frequency of words in Pamela and Robinson Crusoe, excluding those in the list of common words below, and displays the 100 most frequent remaining words for each.

List of common words: the, and, a, to, of, in, is, you, that, it, he, was, for, on, are, as, with, his, they, at, be, this, have, from, or, one, had, by, but, not, what, all, were, we, when, your, can, said, there, an, each, which, she, do, how, their, if, will, up, other, about, out, many, then, them, these, so, some, her, would, make, like, him, into, time, has, look, two, more, go, see, no, way, could, people, than, first, been, who, its, now, find, long, down, day, did, get, come, made, may, part, me, am, shall, should, very, upon, might, much, such, though, yet, too, any.
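Filtering can be sketched as a set-membership check before counting. The set below copies the list above word for word; `COMMON_WORDS` and `top_uncommon` are hypothetical names, and this is not necessarily how the project implements the filter.

```python
from collections import Counter

# The common-word list above, copied verbatim (note: "must" is not in it)
COMMON_WORDS = frozenset("""
the and a to of in is you that it he was for on are as with his they at be
this have from or one had by but not what all were we when your can said
there an each which she do how their if will up other about out many then
them these so some her would make like him into time has look two more go
see no way could people than first been who its now find long down day did
get come made may part me am shall should very upon might much such though
yet too any
""".split())


def top_uncommon(words, common=COMMON_WORDS, n=100):
    """Count only words outside the common-word set and return the top n."""
    counts = Counter(w for w in words if w not in common)
    return counts.most_common(n)
```

Because "i" and "my" are not in the list, they still head both filtered rankings below.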

Pamela
i: 10253
my: 4584
good: 848
mrs: 812
sir: 795
dear: 732
well: 692
master: 617
poor: 529
pamela: 512
know: 510
think: 499
say: 487
mr: 466
little: 464
must: 456
lady: 446
let: 442
myself: 404
jewkes: 387
hope: 370
thought: 330
own: 327
sure: 313
o: 310
thing: 310
before: 309
great: 296
jervis: 288
father: 287
tell: 287
heart: 279
take: 278
give: 277
came: 276
dont: 272
never: 271
every: 262
why: 262
letter: 259
here: 251
god: 249
indeed: 249
us: 245
ill: 241
again: 240
nothing: 236
most: 235
went: 235
happy: 231
after: 229
till: 228
only: 225
mother: 219
pray: 219
man: 218
love: 213
put: 211
cannot: 208
away: 207
hand: 206
honour: 205
kind: 204
both: 203
took: 202
thou: 202
our: 198
because: 197
ever: 194
mind: 193
williams: 193
madam: 186
always: 181
house: 178
better: 171
goodness: 170
quite: 167
told: 167
where: 165
last: 163
woman: 161
done: 159
girl: 158
saw: 158
another: 157
being: 156
believe: 156
wish: 153
without: 151
things: 151
nor: 149
soon: 149
pleased: 147
leave: 146
just: 145
gave: 144
got: 144
over: 142
ladies: 141
stay: 141

Robinson Crusoe
i: 5102
my: 2125
great: 340
being: 295
myself: 294
found: 289
came: 283
little: 271
shore: 260
before: 244
our: 239
after: 237
boat: 237
where: 225
ship: 218
again: 210
went: 206
us: 205
well: 204
three: 201
place: 200
began: 185
island: 182
good: 179
friday: 175
sea: 168
thought: 165
over: 165
things: 164
having: 156
saw: 154
indeed: 147
men: 146
away: 146
took: 146
nothing: 145
life: 143
man: 143
god: 141
water: 141
never: 140
knew: 138
off: 135
say: 129
however: 128
without: 126
till: 125
only: 125
side: 123
lay: 121
told: 119
brought: 118
got: 116
another: 116
own: 116
take: 115
thoughts: 114
near: 110
next: 109
also: 108
while: 105
set: 104
captain: 104
gave: 100
mind: 100
just: 99
same: 99
resolved: 98
work: 98
most: 97
enough: 97
done: 97
here: 97
left: 97
those: 96
must: 96
every: 94
nor: 90
last: 90
put: 89
ground: 88
back: 87
hands: 84
killed: 83
still: 83
soon: 82
country: 81
night: 81
called: 80
let: 80
head: 79
condition: 79
world: 78
pieces: 78
going: 77
home: 77
even: 77
days: 76
least: 76
think: 75

Words exclusive to each novel

Displays the 100 most frequent words in Pamela that do not appear in Robinson Crusoe, and vice versa.
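This sketch assumes that "not in Robinson Crusoe" means the word never occurs anywhere in the other novel, rather than merely falling outside its top 100. Under that assumption the comparison is a set difference followed by a frequency ranking; `exclusive_words` is a hypothetical name.

```python
from collections import Counter


def exclusive_words(words_a, words_b, n=100):
    """Return the n most frequent words of words_a that never occur in words_b."""
    vocab_b = set(words_b)  # every distinct word in the other text
    counts = Counter(w for w in words_a if w not in vocab_b)
    return [word for word, _ in counts.most_common(n)]
```

Calling it twice with the arguments swapped produces the two lists.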

Pamela
mrs
pamela
lady
jewkes
jervis
williams
madam
girl
ladies
ladyship
sister
youll
shew
chariot
john
herself
davers
dearest
closet
longman
darnford
ladys
thats
andrews
sex
thursday
peters
makes
dare
unworthy
guineas
deserve
colbrand
jones
proud
permit
simon
window
merit
vile
marry
stairs
mistress
nan
wench
parlour
dutiful
im
dine
shewed
naughty
etc
monday
marriage
disgrace
virtuous
yesterday
honours
shant
b
ladyships
vexed
pamelas
anger
surely
parson
jonathan
wednesday
saturday
thomas
chapel
saucy
intentions
plot
loves
maids
wept
faults
tuesday
beck
displeasure
noble
bedfordshire
kinsman
favours
folks
lincolnshire
wit
shewn
trials
arthur
kiss
lovely
methinks
ah
reputation
ungrateful
airing
maiden
overcome

Robinson Crusoe
island
killed
gun
savages
shot
corn
cave
tree
powder
goats
savage
tent
wreck
current
habitation
fired
brazils
plantation
rain
weather
xury
prisoners
sand
canoe
canoes
cargo
creek
raft
castle
devoured
loaded
rocks
tame
spaniard
tobacco
showed
signs
chests
wolves
bade
negroes
blew
anchor
mate
rice
stakes
grapes
heat
mountains
beasts
lisbon
barley
cabin
guns
east
fence
governor
north
kid
june
spaniards
vessel
seed
snow
daily
seamen
sails
leagues
ammunition
muskets
crop
moidores
hollow
islands
hatchet
portuguese
goat
bower
pistol
stayed
league
knocked
raisins
pots
print
earthen
boards
fowls
enclosure
fortification
rainy
eddy
flock
society
africa
latitude
sallee
bullets
south
brazil

Sentiment Analysis (Beta)

Using the NLTK and VADER sentiment analysis tools, each sentence in Pamela was scored for positive, neutral, and negative sentiment. Below are the 15 sentences with the highest negative scores and the 15 sentences with the highest positive scores.
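A sketch of the scoring loop, assuming NLTK's `SentimentIntensityAnalyzer` from `nltk.sentiment.vader`, whose `polarity_scores` method returns dictionaries with the neg/neu/pos/compound keys shown below. The sentence splitter is a guess: the fragments in the tables end in ; and : as well as . and !, and one stops at "Mr.", so the sketch splits after any of . ! ? ; : followed by whitespace. `split_sentences`, `score_sentences` and `top_by` are hypothetical names.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Assumed rule: split after . ! ? ; or : when followed by whitespace."""
    parts = re.split(r"(?<=[.!?;:])\s+", text.strip())
    return [p for p in parts if p]


def score_sentences(text: str) -> list[tuple[str, dict]]:
    """Score each sentence with NLTK's VADER analyzer."""
    # Imported here so split_sentences works without NLTK installed.
    # Needs a one-time nltk.download("vader_lexicon").
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    return [(s, sia.polarity_scores(s)) for s in split_sentences(text)]


def top_by(scored, key: str, n: int = 15):
    """Return the n (sentence, scores) pairs with the highest value for key."""
    # sorted() is stable, so ties keep their order of appearance in the text
    return sorted(scored, key=lambda item: item[1][key], reverse=True)[:n]
```

With the lexicon downloaded, `top_by(score_sentences(text), "neg")` yields a list in the shape of the first table, and `"pos"` the second.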

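  • compound is an overall score normalized to range from -1 (most negative) to +1 (most positive). It is derived from the sentiment ratings of the individual words, while neg, neu, and pos give the proportion of the sentence that is negative, neutral, and positive.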

  • VADER was designed for social media posts, product reviews, and other short texts, so it is not an ideal model for 18th-century prose. A custom model trained on 18th-century literature would likely improve accuracy.
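The compound values can be checked by hand. In VADER's reference implementation, the word ratings in a sentence are summed, each exclamation mark (up to four) pushes that sum 0.292 further from zero, and the result x is normalized as x / sqrt(x^2 + alpha) with alpha = 15, then rounded to four decimals. The sketch below reimplements only those final steps, not the full analyzer. The example assumes a lexicon rating of -1.1 for "alas", a value inferred from the table rather than looked up.

```python
import math

ALPHA = 15        # VADER's default normalization constant
EP_BOOST = 0.292  # added per "!" (at most 4) in VADER's reference code


def normalize(score: float, alpha: float = ALPHA) -> float:
    """Map an unbounded valence sum into [-1, 1], as VADER does for compound."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def compound(valence_sum: float, exclamations: int = 0) -> float:
    """Push the sum away from zero for each '!', then normalize and round."""
    boost = min(exclamations, 4) * EP_BOOST
    if valence_sum > 0:
        valence_sum += boost
    elif valence_sum < 0:
        valence_sum -= boost
    return round(normalize(valence_sum), 4)
```

Under the assumed rating, `compound(-1.1, exclamations=1)` returns -0.3382, the value listed for "Alas!" in the table below.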

Highest Negative Sentiments: 
Alas!: {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.3382}
O frightful!: {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.5562}
Ruin!: {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.6239}
shame!: {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.5255}
disgrace!: {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.5411}
Foolish!: {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.3382}
No, no!: {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.5707}
I cried sadly for vexation;: {'neg': 0.892, 'neu': 0.108, 'pos': 0.0, 'compound': -0.8074}
Wicked, wicked man!: {'neg': 0.876, 'neu': 0.124, 'pos': 0.0, 'compound': -0.7959}
All sadly vile:: {'neg': 0.873, 'neu': 0.127, 'pos': 0.0, 'compound': -0.7845}
sad poor stuff!: {'neg': 0.867, 'neu': 0.133, 'pos': 0.0, 'compound': -0.7574}
Poor, poor man!: {'neg': 0.867, 'neu': 0.133, 'pos': 0.0, 'compound': -0.7574}
I wept bitterly, however;: {'neg': 0.857, 'neu': 0.143, 'pos': 0.0, 'compound': -0.7184}
I meant no harm;: {'neg': 0.851, 'neu': 0.149, 'pos': 0.0, 'compound': -0.6908}
I meant no harm.: {'neg': 0.851, 'neu': 0.149, 'pos': 0.0, 'compound': -0.6908}

Highest Positive Sentiments: 
Innocent!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.4003}
Well!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.3382}
Sweet excellence!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.8122}
I welcome!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.5093}
yes, surely!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.7088}
O good God!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.6476}
O help!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.4574}
happy.: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.5719}
O God!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.3382}
Kind, lovely charmer!: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.8858}
contented;: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.34}
Great and good God!: {'neg': 0.0, 'neu': 0.096, 'pos': 0.904, 'compound': 0.8553}
God bless your honour!: {'neg': 0.0, 'neu': 0.101, 'pos': 0.899, 'compound': 0.8356}
I kissed his dear hand:: {'neg': 0.0, 'neu': 0.106, 'pos': 0.894, 'compound': 0.8126}
happy, happy Mr.: {'neg': 0.0, 'neu': 0.119, 'pos': 0.881, 'compound': 0.8126}