
Word Analysis of 2016 U.S. Presidential Debates

Tim Kaine vs Mike Pence

4 October 2016



Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique words used by each candidate. Counts are expressed both as absolute values and as percentages.

Table 1a
all words
Number of all words and unique words used by each speaker.
speaker           words (a)   % (b)   unique (c)   % (d)
Tim Kaine             7,502   48.8%        1,423   19.0%
Mike Pence            7,884   51.2%        1,436   18.2%
total                15,386  100.0%        2,175   14.1%

Table 1b
exclusive and shared words
Words exclusive to a speaker (i.e. used by speaker A but not speaker B) and shared by both speakers (used by A and B).
set               words (a)   % (b)   unique (c)   % (d)
Tim Kaine             1,149   15.3%          739   64.3%
Mike Pence            1,251   15.9%          752   60.1%
both candidates      12,986   84.4%          684    5.3%

Table 1
legend
a c
b d

a :: word count

b :: (a) as a fraction of all words in the debate (exclusive rows in Table 1b are relative to the candidate's own words)

c :: unique words in (a)

d :: (c) as a fraction of (a)

bar :: proportion of (a-c):c

Table 1
commentary

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
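A minimal Python sketch of this partition follows. The stop-word set is a tiny hypothetical stand-in for the full list, and tokens come from a plain whitespace split with lowercasing; both are assumptions for illustration only. For each category it returns the total and unique counts, i.e. the quantities in columns (a) and (c) of Table 2a, from which the unique fraction (d) follows.

```python
from collections import Counter

# Hypothetical stop-word set for illustration; the analysis uses its own full list.
STOP_WORDS = {"the", "a", "and", "we", "to", "of", "in", "is", "that"}


def word_stats(tokens):
    """Return {category: (total words, unique words)} for all, stop and non-stop words."""
    counts = Counter(t.lower() for t in tokens)
    categories = {
        "all": lambda w: True,
        "stop": lambda w: w in STOP_WORDS,
        "non-stop": lambda w: w not in STOP_WORDS,
    }
    stats = {}
    for name, keep in categories.items():
        # Keep only the words belonging to this category, with their counts.
        selected = {w: n for w, n in counts.items() if keep(w)}
        stats[name] = (sum(selected.values()), len(selected))
    return stats


tokens = "We need to invest in the economy and we need to create jobs".split()
for name, (total, unique) in word_stats(tokens).items():
    print(f"{name:9s} total={total:2d} unique={unique:2d} unique/total={unique / total:.1%}")
```

On this toy sentence the stop words account for 7 of 13 words but only 5 of 10 distinct words, the same pattern (many occurrences, few distinct words) that the debate counts show at a much larger scale.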

Table 2a
non-stop words
Counts of stop and non-stop words.
speaker      category    words (a)   % (b)   unique (c)   % (d)
Tim Kaine    all             7,502  100.0%        1,423   19.0%
             stop            3,984   53.1%          149    3.7%
             non-stop        3,518   46.9%        1,274   36.2%
Mike Pence   all             7,884  100.0%        1,436   18.2%
             stop            4,152   52.7%          136    3.3%
             non-stop        3,732   47.3%        1,300   34.8%
total        all            15,386  100.0%        2,175   14.1%
             stop            8,136   52.9%          154    1.9%
             non-stop        7,250   47.1%        2,021   27.9%

Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to a speaker (i.e. used by speaker A but not speaker B) and shared by both speakers (used by A and B).
set               words (a)   % (b)   unique (c)   % (d)
Tim Kaine             1,112   31.6%          721   64.8%
Mike Pence            1,237   33.1%          747   60.4%
both candidates       4,901   67.6%          553   11.3%

Table 2
legend
a c
b d

a :: total number of words, for a given category (all, stop, non-stop)

b :: (a) relative to all words by the candidate (so category=all is 100%); in Table 2b, relative to the candidate's non-stop words

c :: number of unique words within set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 2
commentary

Word Frequency

The word frequency table summarizes how often words were used. I show the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used; it is the number of words divided by the number of unique words (e.g. 7,502 / 1,423 = 5.3 for Tim Kaine). For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
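One way to compute both statistics is sketched below in Python. The weighted cumulative frequency follows an assumed reading of the definition above (sort words from rare to common, accumulate their occurrences, and report the frequency reached when the requested share of the delivery is covered); it is an illustration, not necessarily the exact method behind Table 3.

```python
from collections import Counter


def average_frequency(counts):
    """Average number of uses per word: total words / unique words."""
    return sum(counts.values()) / len(counts)


def weighted_cumulative_frequency(counts, fraction):
    """Largest word frequency within the given fraction of the delivery.

    Assumed reading of the definition: sort words from least to most frequent,
    add up their occurrences (each word weighted by its frequency), and return
    the frequency of the word at which the running total reaches `fraction`
    of all words spoken.
    """
    freqs = sorted(counts.values())
    target = fraction * sum(freqs)
    running = 0
    for f in freqs:
        running += f
        if running >= target:
            return f
    return freqs[-1]


counts = Counter({"jobs": 1, "tax": 1, "plan": 2, "economy": 2, "america": 4})
print(average_frequency(counts))                   # 2.0
print(weighted_cumulative_frequency(counts, 0.5))  # 2
print(weighted_cumulative_frequency(counts, 0.9))  # 4
```

Because each word is weighted by how often it is used, the 90% value is pulled toward the most frequent words, which is why it sits far above the average in Table 3. Feeding sentence or noun phrase lengths instead of word counts is one plausible route to the 50%/90% columns of Tables 4, 8b and 9b, though that is also an assumption.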

Table 3a
word use frequency
Average and 50%/90% percentile word frequencies.
speaker      category    average (a)   50% (b)   90% (c)
Tim Kaine    all                 5.3        19       239
             stop               26.7        80       283
             non-stop            2.8         5        26
Mike Pence   all                 5.5        23       260
             stop               30.5        93       431
             non-stop            2.9         5        35
total        all                 7.1        40       499
             stop               52.8       162       548
             non-stop            3.6         7        48

Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies for non-stop words exclusive to a speaker (i.e. used by speaker A but not speaker B). The last row covers all non-stop words.
set               average (a)   50% (b)   90% (c)
Tim Kaine                1.54         2         4
Mike Pence               1.66         2         6
total                    3.59         7        48

Table 3
legend
a b c

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c

Table 3
commentary

Sentence Size

Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
speaker      sentences   category    average (a)   50% (b)   90% (c)
Tim Kaine          581   all                13.0        17        36
                         stop                7.2        10        19
                         non-stop            6.3         9        17
Mike Pence         540   all                14.7        22        55
                         stop                8.1        11        28
                         non-stop            7.2        11        27
total            1,121   all                15.8        20        48
                         stop                9.6        11        25
                         non-stop            8.7        11        23

Table 4
legend
a b c

a :: average sentence size

b :: largest sentence size for 50% of content

c :: largest sentence size for 90% of content

bar :: proportion of a:b:c

Table 4
commentary

All further word use statistics represent content that has been filtered for stop words, unless explicitly indicated.

Part of Speech Analysis

In this section, words are broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs; conjunctions and prepositions are not considered. The first category (n+v+adj+adv) combines all four POS groups.

Part of Speech Count

Table 5
part of speech count
Count of words categorized by part of speech (POS).
speaker      POS            words (a)   % (b)   unique (c)   % (d)
Tim Kaine    n+v+adj+adv        3,282   43.7%        1,219   37.1%
             nouns              1,810   55.1%          683   37.7%
             verbs                824   25.1%          374   45.4%
             adjectives           516   15.7%          277   53.7%
             adverbs              132    4.0%           50   37.9%
Mike Pence   n+v+adj+adv        3,555   45.1%        1,247   35.1%
             nouns              1,944   54.7%          663   34.1%
             verbs                854   24.0%          399   46.7%
             adjectives           568   16.0%          266   46.8%
             adverbs              189    5.3%           60   31.7%
total        n+v+adj+adv        6,837   44.4%        1,946   28.5%
             nouns              3,754   54.9%        1,068   28.4%
             verbs              1,678   24.5%          642   38.3%
             adjectives         1,084   15.9%          445   41.1%
             adverbs              321    4.7%           80   24.9%

Table 5
legend
a c
b d

a :: total number of words for a given POS (n+v+adj+adv, noun, verb, adjective, adverb)

b :: (a) relative to all words by the candidate (n+v+adj+adv), or relative to the candidate's n+v+adj+adv words (individual POS)

c :: unique words in (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 5
commentary

Part of Speech Frequency

Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS).
speaker      POS            average (a)   50% (b)   90% (c)
Tim Kaine    n+v+adj+adv           2.69         4        28
             nouns                 2.65         4        44
             verbs                 2.20         3        18
             adjectives            1.86         2        11
             adverbs               2.64         4        28
Mike Pence   n+v+adj+adv           2.85         5        35
             nouns                 2.93         6        52
             verbs                 2.14         3        14
             adjectives            2.13         3        15
             adverbs               3.15         7        34
total        n+v+adj+adv           3.51         7        48
             nouns                 3.52         7        95
             verbs                 2.61         4        36
             adjectives            2.44         4        23
             adverbs               4.01         9        62

Table 5
legend
a b c

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c

Table 5
commentary

Part of Speech Pairing

Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
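The exact pairing rule is not spelled out here, so the Python sketch below assumes a simple one: every unordered pair of tagged words within the same sentence counts as one pair, keyed by the two parts of speech in alphabetical order (matching names such as "adjective/noun" used for the word pair clouds). The function name and input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pos_pairs(tagged_sentences):
    """Count total and unique word pairs per POS category, e.g. 'adjective/noun'.

    Hypothetical pairing rule: every unordered pair of tagged words in the same
    sentence counts as one pair.
    """
    total = Counter()
    unique = {}
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            # Sort by POS (then word) so "verb/noun" and "noun/verb" share a key.
            (pa, wa), (pb, wb) = sorted([(p1, w1), (p2, w2)])
            key = f"{pa}/{pb}"
            total[key] += 1
            unique.setdefault(key, set()).add((wa, wb))
    return {key: (total[key], len(unique[key])) for key in total}


sentences = [
    [("jobs", "noun"), ("create", "verb"), ("economy", "noun")],
    [("create", "verb"), ("jobs", "noun")],
]
print(pos_pairs(sentences))  # {'noun/verb': (3, 2), 'noun/noun': (1, 1)}
```

Under this rule a sentence with n tagged words yields n(n-1)/2 pairs, which is one way the pair count grows with sentence length. The unique share per category corresponds to column (d) of Tables 6a and 6b.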

Table 6a
part of speech pairing — Tim Kaine
Word pairs (total and unique) categorized by part of speech (POS)
pair                    pairs (a)   unique (c)   % (d)
noun/noun                   4,599        3,705   80.6%
noun/verb                   3,862        3,358   86.9%
verb/verb                     781          677   86.7%
adjective/noun              2,288        1,974   86.3%
adjective/verb                981          862   87.9%
adjective/adjective           298          263   88.3%
adverb/noun                   594          538   90.6%
adverb/verb                   288          262   91.0%
adjective/adverb              166          150   90.4%
adverb/adverb                  29           27   93.1%

Table 6b
part of speech pairing — Mike Pence
Word pairs (total and unique) categorized by part of speech (POS)
pair                    pairs (a)   unique (c)   % (d)
noun/noun                   7,270        5,625   77.4%
noun/verb                   5,883        5,002   85.0%
verb/verb                   1,160          999   86.1%
adjective/noun              3,468        2,844   82.0%
adjective/verb              1,392        1,211   87.0%
adjective/adjective           378          319   84.4%
adverb/noun                 1,245        1,019   81.8%
adverb/verb                   498          420   84.3%
adjective/adverb              292          256   87.7%
adverb/adverb                  63           53   84.1%

Table 6c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
pair                    Kaine (a)   Pence (c)   Pence/Kaine (d)
noun/noun                   3,705       5,625            151.8%
noun/verb                   3,358       5,002            149.0%
verb/verb                     677         999            147.6%
adjective/noun              1,974       2,844            144.1%
adjective/verb                862       1,211            140.5%
adjective/adjective           263         319            121.3%
adverb/noun                   538       1,019            189.4%
adverb/verb                   262         420            160.3%
adjective/adverb              150         256            170.7%
adverb/adverb                  27          53            196.3%

Table 6 a,b
legend
a c
  d

a :: total number of pairs, for a given category (e.g. verb/noun)

c :: number of unique pairs within set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 6c
legend
a c
  d

a :: unique pairs for Tim Kaine

c :: unique pairs for Mike Pence

d :: (c) relative to (a) (i.e. Mike Pence relative to Tim Kaine)

bars :: (a) and (c)

Table 6
commentary

Exclusive and Shared Usage

This section enumerates words that were exclusive to a candidate (i.e. used by one candidate but not the other). This content offers insight into the candidates' priorities and reveals differences in perspective on similar topics.

For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or both candidates (intersection). The last row includes words spoken by either candidate (union).
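The exclusive, shared and union sets map directly onto Python set operations, as in the sketch below (function name and toy inputs are hypothetical). Occurrences of a shared word are counted for both speakers, which is consistent with how the rows of Table 1b add up to the debate total: 1,149 + 1,251 + 12,986 = 15,386 words.

```python
from collections import Counter


def exclusive_shared(words_a, words_b):
    """Split two speakers' words into A-only, B-only, shared and union sets.

    Each set is summarized as (total occurrences, unique words). Occurrences
    of shared words are counted for both speakers.
    """
    count_a, count_b = Counter(words_a), Counter(words_b)
    vocab_a, vocab_b = set(count_a), set(count_b)
    combined = count_a + count_b

    def summarize(vocab, counts):
        return (sum(counts[w] for w in vocab), len(vocab))

    return {
        "A only": summarize(vocab_a - vocab_b, count_a),  # set difference A - B
        "B only": summarize(vocab_b - vocab_a, count_b),  # set difference B - A
        "both": summarize(vocab_a & vocab_b, combined),   # intersection
        "total": summarize(vocab_a | vocab_b, combined),  # union
    }


speaker_a = "invest jobs jobs economy".split()
speaker_b = "taxes jobs economy economy".split()
print(exclusive_shared(speaker_a, speaker_b))
# {'A only': (1, 1), 'B only': (1, 1), 'both': (6, 2), 'total': (8, 4)}
```

The union row always equals the sum of the three other rows, both for total words and for unique words, which is a quick consistency check on tables like Table 7.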

Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
set              POS            words (a)   % (b)   % (c)   unique (d)   % (e)   % (f)
Tim Kaine        n+v+adj+adv        1,056  100.0%   15.4%          699   66.2%   35.9%
                 nouns                545   51.6%   14.5%          361   66.2%   33.8%
                 verbs                290   27.5%   17.3%          202   69.7%   31.5%
                 adjectives           198   18.8%   18.3%          149   75.3%   33.5%
                 adverbs               23    2.2%    7.2%           19   82.6%   23.8%
Mike Pence       n+v+adj+adv        1,198  100.0%   17.5%          727   60.7%   37.4%
                 nouns                595   49.7%   15.8%          361   60.7%   33.8%
                 verbs                381   31.8%   22.7%          244   64.0%   38.0%
                 adjectives           181   15.1%   16.7%          133   73.5%   29.9%
                 adverbs               41    3.4%   12.8%           29   70.7%   36.2%
both candidates  n+v+adj+adv        4,583  100.0%   67.0%          520   11.3%   26.7%
                 nouns              2,516   54.9%   67.0%          278   11.0%   26.0%
                 verbs                914   19.9%   54.5%          131   14.3%   20.4%
                 adjectives           589   12.9%   54.3%           98   16.6%   22.0%
                 adverbs              254    5.5%   79.1%           30   11.8%   37.5%
total            n+v+adj+adv        6,837  100.0%  100.0%        1,946   28.5%  100.0%
                 nouns              3,754   54.9%  100.0%        1,068   28.4%  100.0%
                 verbs              1,678   24.5%  100.0%          642   38.3%  100.0%
                 adjectives         1,084   15.9%  100.0%          445   41.1%  100.0%
                 adverbs              321    4.7%  100.0%           80   24.9%  100.0%

Table 7c
legend
a d
b e
c f

a :: total number of words in a set (e.g. Kaine \ Pence, Kaine ∩ Pence, Kaine ∪ Pence) for a given part of speech

b :: (a) relative to all exclusive words in n+v+adj+adv

c :: (a) relative to all words in n+v+adj+adv

d :: unique words in (a)

e :: (d) relative to (a)

f :: (d) relative to all unique words in n+v+adj+adv

bar1 :: normalized ratio of (a-d):d

bar2 :: absolute ratio of (a-d):d for all POS groups (first column) or POS group (other columns)

Table 7
commentary

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness. Single-word phrases were not counted.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
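The text does not define "similar" precisely, so the Python sketch below assumes one concrete rule: a phrase has a parent if some longer phrase contains its words as a contiguous run. The function name and example phrases are hypothetical; the sketch only illustrates how a top-level/derived split can be computed once a parent rule is fixed.

```python
def split_noun_phrases(phrases):
    """Split unique multi-word noun phrases into top-level and derived lists.

    Hypothetical parent rule: a phrase has a parent if a longer phrase in the
    set contains its words as a contiguous run.
    """
    # Deduplicate and drop single-word phrases, which are not counted.
    tokenized = {p: p.split() for p in set(phrases) if len(p.split()) > 1}

    def contains(longer, shorter):
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    top_level, derived = [], []
    for phrase, words in tokenized.items():
        has_parent = any(
            len(other) > len(words) and contains(other, words)
            for other in tokenized.values()
        )
        (derived if has_parent else top_level).append(phrase)
    return sorted(top_level), sorted(derived)


phrases = ["middle class", "middle class families", "tax cuts",
           "tax cuts for the wealthy", "economy", "tax cuts"]
top, derived = split_noun_phrases(phrases)
print(top)      # ['middle class families', 'tax cuts for the wealthy']
print(derived)  # ['middle class', 'tax cuts']
```

Here "tax cuts" reads as a variant of the independent concept "tax cuts for the wealthy", matching the interpretation above.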

Noun Phrase Count and Length

These tables report the number of noun phrases, both absolute and per noun, and their length.

Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
speaker      phrases        count (a)   % (b)   per noun (c)   unique (d)   % (e)   per unique noun (f)
Tim Kaine    all                  693  100.0%           0.38          310   44.7%                  0.45
             top-level            529   76.3%           0.29          303   57.3%                  0.44
Mike Pence   all                  718  100.0%           0.37          293   40.8%                  0.44
             top-level            557   77.6%           0.29          288   51.7%                  0.43

Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
speaker      phrases        average (a)   50% (b)   90% (c)
Tim Kaine    all                   2.29         2         3
             top-level             2.38         2         4
Mike Pence   all                   2.27         2         3
             top-level             2.34         2         4

Table 8a
legend
a d
b e
c f

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of noun phrases per noun

d :: number of unique phrases

e :: (d) relative to (a)

f :: number of unique noun phrases per unique noun

bar :: normalized ratio of (a-d):d

Table 8b
legend
a b c

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c


Table 8
commentary

Exclusive and Shared Noun Phrase Count and Length

Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
set              phrases        count (a)   % (b)   unique (c)   % (d)
Tim Kaine        all                  507   35.9%          298   58.8%
                 top-level            482   95.1%          293   60.8%
Mike Pence       all                  547   38.8%          279   51.0%
                 top-level            513   93.8%          278   54.2%
both candidates  all                  357   25.3%           31    8.7%
                 top-level             91   25.5%           19   20.9%

Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
set              phrases        average (a)   50% (b)   90% (c)
Tim Kaine        all                   2.39         2         4
                 top-level             2.41         2         4
Mike Pence       all                   2.34         2         4
                 top-level             2.36         2         4
both candidates  all                   2.03         2         2
                 top-level             2.09         2         3

Table 9a
legend
a c
b d

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of unique phrases

d :: (c) relative to (a)

bar :: normalized ratio of (a-c):c

Table 9b
legend
a b c

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c


Table 9
commentary

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index indicates succinct speech with a low degree of repetition and a large number of independent concepts.

Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
speaker      value        index     t1     t2     t3     t4     t5     t6     t7     t8
Tim Kaine    absolute       386  0.469  0.362  0.377  0.454  0.537  0.379  0.447  0.977
             vs Pence    -39.4%  -0.9%  +4.0% +10.6%  -2.9% +14.6% +19.3%  +9.6%  -0.6%
Mike Pence   absolute       638  0.473  0.348  0.341  0.467  0.468  0.317  0.408  0.983
             vs Kaine    +65.0%  +0.9%  -3.8%  -9.6%  +2.9% -12.8% -16.2%  -8.8%  +0.6%

Table 10
legend
The Windbag Index is 1/(t1*t2*...*t8), where t1, t2, ..., t8 are

t1 :: fraction of words that are non-stop

t2 :: fraction of non-stop words that are unique

t3 :: fraction of nouns that are unique

t4 :: fraction of verbs that are unique

t5 :: fraction of adjectives that are unique

t6 :: fraction of adverbs that are unique

t7 :: fraction of noun phrases that are unique

t8 :: fraction of unique noun phrases that are top-level

Larger individual terms t1...t8 yield a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(a-b)/b where a is the value for one speaker and b for the other).
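The index can be recomputed from counts reported earlier on this page. The Python sketch below takes its inputs from Tables 2a, 5 and 8a; the dictionary keys and function names are hypothetical. Reproducing the published t8 values (0.977 and 0.983) requires taking t8 over unique noun phrases (303/310 and 288/293), which is how the sketch computes it.

```python
from math import prod


def windbag_index(c):
    """Return (index, terms) with index = 1 / (t1 * t2 * ... * t8)."""
    terms = [
        c["non_stop"] / c["words"],                   # t1: non-stop share of words
        c["non_stop_unique"] / c["non_stop"],         # t2: unique share of non-stop words
        c["nouns_unique"] / c["nouns"],               # t3
        c["verbs_unique"] / c["verbs"],               # t4
        c["adjectives_unique"] / c["adjectives"],     # t5
        c["adverbs_unique"] / c["adverbs"],           # t6
        c["phrases_unique"] / c["phrases"],           # t7: unique share of noun phrases
        c["top_level_unique"] / c["phrases_unique"],  # t8: top-level share of unique phrases
    ]
    return 1 / prod(terms), terms


def relative_difference(a, b):
    """Percent difference of a relative to b: 100 * (a - b) / b."""
    return 100 * (a - b) / b


# Counts taken from Tables 2a, 5 and 8a.
kaine = dict(words=7502, non_stop=3518, non_stop_unique=1274, nouns=1810, nouns_unique=683,
             verbs=824, verbs_unique=374, adjectives=516, adjectives_unique=277,
             adverbs=132, adverbs_unique=50, phrases=693, phrases_unique=310,
             top_level_unique=303)
pence = dict(words=7884, non_stop=3732, non_stop_unique=1300, nouns=1944, nouns_unique=663,
             verbs=854, verbs_unique=399, adjectives=568, adjectives_unique=266,
             adverbs=189, adverbs_unique=60, phrases=718, phrases_unique=293,
             top_level_unique=288)

k_index, k_terms = windbag_index(kaine)
p_index, p_terms = windbag_index(pence)
print(f"Tim Kaine  {k_index:.1f} ({relative_difference(k_index, p_index):+.1f}%)")  # 386.7 (-39.4%)
print(f"Mike Pence {p_index:.1f} ({relative_difference(p_index, k_index):+.1f}%)")  # 638.2 (+65.0%)
```

Because the index is a reciprocal product, a 10% drop in any single term raises the index by about 11%, so no one term dominates; the gap between the two speakers here comes mostly from t3, t5, t6 and t7.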
Table 10
commentary

Word Clouds

In the word clouds below, the size of the word is proportional to the number of times it was used by a candidate (method details).

Not all words in a group fit in the cloud image: for large word groups, less frequently used words may fall outside it.

All Words for Each Candidate

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category.

The distribution of sizes within a tag cloud follows the frequency distribution of words. However, word size cannot be compared between clouds, since the minimum and maximum size of the words is fixed.
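To see why sizes are not comparable between clouds, consider the Python sketch below. It assumes a linear mapping from count to font size between fixed, hypothetical bounds of 10 and 60 points; the actual cloud method may scale differently. The same count lands at different sizes because each cloud is scaled by its own minimum and maximum counts.

```python
def font_sizes(counts, min_size=10.0, max_size=60.0):
    """Map word counts to font sizes between fixed bounds (linear scaling assumed)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # all counts equal: avoid dividing by zero
    return {
        word: min_size + (max_size - min_size) * (n - lo) / span
        for word, n in counts.items()
    }


# Toy counts: "jobs" is used 10 times in both hypothetical clouds.
cloud_a = font_sizes({"jobs": 10, "economy": 5, "tax": 1})
cloud_b = font_sizes({"jobs": 10, "taxes": 20, "economy": 1})
print(round(cloud_a["jobs"], 1), round(cloud_b["jobs"], 1))  # 60.0 33.7
```

"jobs" is the largest word in the first cloud but mid-sized in the second, even though its count is identical.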

Debate Word Cloud for Tim Kaine - all words

Debate Word Cloud for Mike Pence - all words

commentary

Exclusive Words for Each Candidate

The clouds below show words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times) but candidate B did not, the word appears in candidate A's exclusive word cloud.

Words exclusive to Tim Kaine

Words exclusive to Mike Pence

commentary

Part of Speech Word Clouds

In these clouds, words from each major part of speech were colored based on whether they were exclusive to a candidate or shared by the candidates.

Word size reflects a word's frequency relative to that candidate's other words, so word sizes should not be compared between candidates to judge differences in absolute frequency.

Cloud of noun words, by speaker

commentary

Cloud of verb words, by speaker

commentary

Cloud of adjective words, by speaker

commentary

Cloud of adverb words, by speaker

commentary

Cloud of all words, by speaker

commentary

Word Pair Clouds for Each Candidate

word pairs for Tim Kaine

adjective/adjective by Tim Kaine
adjective/adverb by Tim Kaine
adjective/noun by Tim Kaine
adjective/verb by Tim Kaine
adverb/adverb by Tim Kaine
adverb/noun by Tim Kaine
adverb/verb by Tim Kaine
noun/noun by Tim Kaine
noun/verb by Tim Kaine
verb/verb by Tim Kaine

word pairs for Mike Pence

adjective/adjective by Mike Pence
adjective/adverb by Mike Pence
adjective/noun by Mike Pence
adjective/verb by Mike Pence
adverb/adverb by Mike Pence
adverb/noun by Mike Pence
adverb/verb by Mike Pence
noun/noun by Mike Pence
noun/verb by Mike Pence
verb/verb by Mike Pence
commentary

Downloads

Debate transcript

Parsed word lists and word clouds (word lists, part of speech lists, noun phrases, sentences) (word clouds)

Raw data structure

Please see the methods section for details about these files.