
Word Analysis of 2020 U.S. Presidential Debates

Mike Pence vs. Kamala Harris

7 October 2020



Introduction

After the first Trump vs Biden debate, this one was delightfully boring, measured and evasive. I'll take it.

Speaking Turns and Interruptions

Here, I look at the length of each turn of uninterrupted speech.

Table 1
length of sections in words
The number of uninterrupted deliveries (sections), mode/median/mean length of sections in words, and the shortest section length in words that composed 10%, 50% and 90% of the debate.
speaker | sections | section length (mode / median / mean) | debate contiguity (L10 / L50 / L90)
Mike Pence | 113 | 11.0 / 19.0 / 53.6 | 25 / 126 / 184
Kamala Harris | 94 | 6.0 / 40.5 / 61.8 | 45 / 136 / 182

Table 1
legend

a — section length (mode), shortest section length in 10% of debate

b — section length (median), shortest section length in 50% of debate

c — section length (mean), shortest section length in 90% of debate

Table 1
commentary

Although Pence talked over both Harris and the moderator, both candidates took roughly the same number of speaking turns. That is about a third of what we saw in the first Trump vs Biden debate, where Trump's transcript had +177.0% (313 vs 113) more sections than Pence's and Biden's transcript had +164.9% (249 vs 94) more sections than Harris'.

The median length of Pence's sections was –53.1% (19 vs 40.5) shorter in words than Harris', though their means are closer. This difference between median and mean indicates that the distribution of section lengths is quite asymmetric — a few very long sections skew the mean. Half of Pence's delivery was made in sections shorter than 126 words and this is slightly higher at 136 for Harris.
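Table 1 can be reproduced from a list of per-section word counts. The sketch below reflects my reading of the definitions, not the page's own code. It assumes L10/L50/L90 are weighted cumulative values: sections are sorted from shortest to longest, each one is weighted by its word count, and the result is the section length at which the running total first reaches the given fraction of all words. That reading is consistent with the commentary's statement that half of Pence's words came in sections of up to about 126 words. The function names, the tie rule for the mode, and the sample lengths are all hypothetical.

```python
from statistics import mean, median, multimode


def weighted_cumulative(lengths, fraction):
    """Section length at which the running word total, taking sections from
    shortest to longest, first reaches `fraction` of all words."""
    ordered = sorted(lengths)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length  # each section contributes its own word count
        if running >= target:
            return length
    return ordered[-1]


def section_stats(lengths):
    """Table 1-style summary for one speaker's section lengths (in words)."""
    return {
        "sections": len(lengths),
        "mode": min(multimode(lengths)),  # assumption: ties go to the shortest
        "median": median(lengths),
        "mean": mean(lengths),
        "L10": weighted_cumulative(lengths, 0.10),
        "L50": weighted_cumulative(lengths, 0.50),
        "L90": weighted_cumulative(lengths, 0.90),
    }


# Hypothetical section lengths, not taken from the debate transcript.
print(section_stats([3, 5, 5, 12, 40, 95]))
```

The median and mean use Python's `statistics` module. Note that a single long section can push L50 and L90 to the same value, as it does in this toy list.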

Flesch-Kincaid Reading Ease and Grade Level

The Flesch-Kincaid reading ease and grade level metrics are designed to indicate how difficult a passage in English is to understand.

This metric does not take repetition into account. A grade level 10 sentence that is repeated 100 times still generates the same metrics because the words per sentence and syllables per word remain constant. To measure how many times a speaker repeats themselves, I use my Windbag Index, below.

Reading ease ranges from 100 (easiest) down to 0 (hardest) and can be interpreted as follows:

reading ease | school level | notes
100 – 90 | 5th grade | Very easy to read. Easily understood by an average 11-year-old student.
90 – 80 | 6th grade | Easy to read. Conversational English for consumers.
80 – 70 | 7th grade | Fairly easy to read.
70 – 60 | 8th & 9th grade | Plain English. Easily understood by 13- to 15-year-old students.
60 – 50 | 10th to 12th grade | Fairly difficult to read.
50 – 30 | college | Difficult to read.
30 – 10 | college graduate | Very difficult to read. Best understood by college/university graduates.
10 – 0 | professional | Extremely difficult to read. Best understood by college/university graduates.

The grade level corresponds roughly to a U.S. grade level. It has a minimum value of –3.4 and no upper bound.

Two sets of readability scores are calculated: one for the entire debate and one that only considers sections with at least 9 words.

Table 2a
readability — entire debate
Flesch-Kincaid reading ease and grade level.
speaker | grade level | reading ease | sections | sentences | words | syllables
Mike Pence | 8.61 | 61.79 | 113 | 379 | 6,053 | 9,218
Kamala Harris | 7.69 | 68.74 | 94 | 359 | 5,806 | 8,351

Table 2b
readability — excluding short sections
Flesch-Kincaid reading ease and grade level for sections with at least 9 words.
speaker | grade level | reading ease | sections | sentences | words | syllables
Mike Pence | 8.94 | 60.85 | 91 | 356 | 5,971 | 9,102
Kamala Harris | 8.10 | 67.51 | 67 | 332 | 5,691 | 8,202

Table 2
commentary

After Trump's 3.60 and Biden's 4.27 grade level deliveries in the first debate, it was good to see the level of discourse elevated. Pence's grade level was +12.0% (8.61 vs 7.69) higher than Harris'. When short sections are excluded (I added this option to keep interruptions out of the grade level calculation), grade level increases by 0.3 to 0.4 for both candidates, but the difference of +10.4% (8.94 vs 8.10) remains roughly the same.

It's interesting to see why the grade level differs between Pence and Harris. Pence delivered slightly more sentences (+5.6% (379 vs 359)) and slightly more words (+4.3% (6,053 vs 5,806)) but quite a few more syllables (+10.4% (9,218 vs 8,351)). So while Pence's words per sentence were actually –1.2% (15.97 vs 16.17) lower than Harris', his syllables per word were +5.9% (1.52 vs 1.44) higher.
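The page does not spell out the formulas. Still, the standard Flesch-Kincaid coefficients reproduce both Table 2a and Table 2b, to two decimals, from their sentence, word and syllable counts. That makes the point above concrete: the syllables-per-word term (weight 11.8 in the grade level) outweighs the small words-per-sentence difference (weight 0.39). The function name below is my own.

```python
def flesch_kincaid(sentences, words, syllables):
    """Return (grade level, reading ease) using the standard Flesch-Kincaid
    coefficients."""
    wps = words / sentences      # words per sentence
    spw = syllables / words      # syllables per word
    grade = 0.39 * wps + 11.8 * spw - 15.59
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    return grade, ease


# (sentences, words, syllables) taken from Table 2a (entire debate).
for name, counts in {
    "Mike Pence": (379, 6053, 9218),
    "Kamala Harris": (359, 5806, 8351),
}.items():
    grade, ease = flesch_kincaid(*counts)
    print(f"{name}: grade {grade:.2f}, ease {ease:.2f}")
```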

Sentence Size

Table 3
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
speaker | number of sentences | all words (a / b / c) | stop words (a / b / c) | non-stop words (a / b / c)
Mike Pence | 380 | 16.0 / 21 / 43 | 7.8 / 11 / 20 | 8.2 / 11 / 23
Kamala Harris | 359 | 16.3 / 22 / 45 | 9.1 / 13 / 26 | 7.2 / 10 / 21
total | 739 | 18.2 / 23 / 45 | 10.4 / 13 / 25 | 9.7 / 12 / 23

Table 3
legend

a — average sentence size

b — largest sentence size for 50% of content

c — largest sentence size for 90% of content

Table 3
commentary

Sentence size was roughly equal, with Harris using +16.7% (9.1 vs 7.8) more stop words per sentence, on average.

Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique, non-stop words used by each candidate. Word counts are expressed as both absolute and relative values.

Table 4a
all words
Number of all words and unique words used by each speaker.
set (a / b / c / d) | word count
Mike Pence | 6,090 / 51.0% / 1,265 / 20.8%
Kamala Harris | 5,855 / 49.0% / 1,134 / 19.4%
total | 11,945 / 100.0% / 1,858 / 15.6%

Table 4b
exclusive and shared words
Words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set (a / b / c / d) | word count
Mike Pence | 1,147 / 18.8% / 724 / 63.1%
Kamala Harris | 900 / 15.4% / 593 / 65.9%
both candidates | 9,898 / 82.9% / 541 / 5.5%

Table 4
legend

a — word count

b — word count, as fraction of all words in the debate (Table 4a) or of the speaker's own words (Table 4b)

c — unique words in (a)

d — unique words in (a), as fraction of (a)

Table 4
commentary

Pence used +4.0% (6,090 vs 5,855) more words and had +27.4% (1,147 vs 900) more exclusive words.

Among her exclusive words, Harris had proportionately more unique words: Δrel=+4.4% (Δabs=+2.8%, 65.9% vs 63.1%).
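The commentary uses two comparison formats throughout. "+x% (a vs b)" is the relative difference a/b − 1. For proportions, Δrel is the relative difference and Δabs is the plain difference in percentage points. I inferred these definitions from the quoted numbers themselves (for example, 65.9/63.1 − 1 ≈ +4.4% and 65.9 − 63.1 = 2.8). The helper names are hypothetical.

```python
def pct_diff(a, b):
    """'+x% (a vs b)': how much larger a is than b, relative to b, in percent."""
    return 100.0 * (a / b - 1.0)


def delta_rel_abs(p, q):
    """'Δrel=... (Δabs=..., p vs q)' for two percentages p and q.

    Δrel is the relative difference; Δabs is the plain difference in
    percentage points.
    """
    return pct_diff(p, q), p - q


print(f"{pct_diff(6090, 5855):+.1f}%")        # Table 4a word counts: +4.0%
rel, ab = delta_rel_abs(65.9, 63.1)            # Table 4b unique fractions
print(f"Δrel={rel:+.1f}% (Δabs={ab:+.1f}%)")   # Δrel=+4.4% (Δabs=+2.8%)
```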

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
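Table 5 partitions each candidate's words into stop and non-stop words and counts both total and unique words in each part. Below is a minimal sketch of that partition, assuming case-insensitive matching against a stop list. The ten-word `STOP_WORDS` set and the sample sentence are stand-ins, not the page's list or the transcript.

```python
# Stand-in stop list for illustration only; the page links its own, much
# longer list, which is not reproduced here.
STOP_WORDS = {"the", "a", "and", "of", "to", "we", "i", "you", "is", "that"}


def stop_split(tokens):
    """Partition tokens into stop / non-stop words (case-insensitive) and
    report (total, unique) counts for each part plus the stop-word fraction."""
    lowered = [t.lower() for t in tokens]
    stop = [t for t in lowered if t in STOP_WORDS]
    non_stop = [t for t in lowered if t not in STOP_WORDS]
    return {
        "stop": (len(stop), len(set(stop))),
        "non_stop": (len(non_stop), len(set(non_stop))),
        "stop_fraction": len(stop) / len(lowered) if lowered else 0.0,
    }


words = "We need to protect the jobs and the health of the American people".split()
print(stop_split(words))
```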

Table 5a
non-stop words
Counts of stop and non-stop words.
speaker (a / b / c / d) | all words | stop words | non-stop words
Mike Pence | 6,090 / 100.0% / 1,265 / 20.8% | 2,980 / 48.9% / 129 / 4.3% | 3,110 / 51.1% / 1,136 / 36.5%
Kamala Harris | 5,855 / 100.0% / 1,134 / 19.4% | 3,254 / 55.6% / 129 / 4.0% | 2,601 / 44.4% / 1,005 / 38.6%
total | 11,945 / 100.0% / 1,858 / 15.6% | 6,234 / 52.2% / 141 / 2.3% | 5,711 / 47.8% / 1,717 / 30.1%

Table 5b
exclusive and shared non-stop words
Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set (a / b / c / d) | word count
Mike Pence | 1,123 / 36.1% / 712 / 63.4%
Kamala Harris | 881 / 33.9% / 581 / 65.9%
both candidates | 3,707 / 64.9% / 424 / 11.4%

Table 5
legend

a — total number of words, for a given category (all, stop, non-stop)

b — (a) relative to all words by the candidate, or in the debate for the total row (Table 5a); relative to non-stop words by the candidate, or in the debate for the shared row (Table 5b)

c — number of unique words within set (a)

d — (c) relative to (a)

Table 5
commentary

Harris used proportionately Δrel=+13.7% (Δabs=+6.7%, 55.6% vs 48.9%) more stop words than Pence.

Word frequency

The word frequency table summarizes how often words were used. I show the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
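The average word frequency in Table 6a equals total words divided by unique words. For example, 6,090 / 1,265 ≈ 4.814 for Pence and 5,855 / 1,134 ≈ 5.163 for Harris, using the counts in Table 5a. The weighted cumulative frequency below follows my reading of the description, not the page's code: distinct words are sorted from least to most frequent, each word contributes its frequency as weight, and the result is the frequency at which the running total first reaches the requested fraction of all tokens. The function names and sample tokens are hypothetical.

```python
from collections import Counter


def average_frequency(tokens):
    """Average uses per distinct word: total tokens divided by unique words."""
    return len(tokens) / len(set(tokens))


def weighted_cumulative_frequency(tokens, fraction):
    """Frequency at which the running token count, taking distinct words from
    least to most frequent, first reaches `fraction` of all tokens."""
    freqs = sorted(Counter(tokens).values())
    target = fraction * sum(freqs)
    running = 0
    for f in freqs:
        running += f  # a word used f times accounts for f tokens
        if running >= target:
            return f
    return freqs[-1]


# Hypothetical tokens (already stop-word filtered), not from the transcript.
tokens = ["jobs"] * 5 + ["tax"] * 3 + ["health", "china"]
print(average_frequency(tokens))  # 2.5
print(weighted_cumulative_frequency(tokens, 0.5), weighted_cumulative_frequency(tokens, 0.9))
```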

Table 6a
word use frequency
Average and 50%/90% percentile word frequencies.
speaker (a / b / c) | all words | stop words | non-stop words
Mike Pence | 4.8 / 19 / 188 | 23.1 / 91 / 367 | 2.7 / 5 / 50
Kamala Harris | 5.2 / 25 / 191 | 25.2 / 67 / 211 | 2.6 / 4 / 38
total | 6.4 / 42 / 355 | 44.2 / 145 / 667 | 3.3 / 6 / 101

Table 6b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set (a / b / c) | word frequency
Mike Pence | 1.58 / 2 / 8
Kamala Harris | 1.52 / 2 / 4
total | 3.33 / 6 / 101

Table 6
legend

a — average word frequency

b — largest word frequency in 50% of content

c — largest word frequency in 90% of content

Table 6
commentary

Harris repeated her words +8.3% (5.2 vs 4.8) more often. If we consider only non-stop words, the candidates' average word frequencies were nearly identical. Pence's 90th percentile word frequency was quite a bit higher than Harris', at +31.6% (50 vs 38).

All further word use statistics represent content that has been filtered for stop words, unless explicitly indicated.

Part of Speech Analysis

In this section, word frequency is broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.
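Suppose words arrive already tagged with Penn Treebank tags, the tag set used in Table 10. One plausible way to form the four groups is by tag prefix, so that NN* maps to nouns, VB* to verbs, JJ* to adjectives and RB* to adverbs. The page does not state its exact mapping, so the prefix table and function names below are assumptions. The sketch also skips the stop-word filtering the page applies before these counts.

```python
from collections import Counter

# Assumed mapping from Penn Treebank tag prefixes to the four POS groups.
GROUP_PREFIXES = {"NN": "n", "VB": "v", "JJ": "adj", "RB": "adv"}


def pos_group(tag):
    """Return 'n', 'v', 'adj' or 'adv' for a Penn tag, else None."""
    for prefix, group in GROUP_PREFIXES.items():
        if tag.startswith(prefix):  # e.g. NNS, VBD, JJR, RBS
            return group
    return None


def pos_counts(tagged):
    """(total, unique) word counts per POS group for (word, tag) pairs."""
    totals, uniques = Counter(), {}
    for word, tag in tagged:
        group = pos_group(tag)
        if group is None:
            continue  # pronouns, conjunctions, prepositions, etc. are skipped
        totals[group] += 1
        uniques.setdefault(group, set()).add(word.lower())
    return {g: (totals[g], len(uniques[g])) for g in totals}


tagged = [("We", "PRP"), ("created", "VBD"), ("jobs", "NNS"), ("and", "CC"),
          ("created", "VBD"), ("strong", "JJ"), ("growth", "NN"), ("quickly", "RB")]
print(pos_counts(tagged))
```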

Part of Speech Count

Table 7
part of speech count
Count of words categorized by part of speech (POS).
speaker (a / b / c / d) | n+v+adj+adv | nouns (n) | verbs (v) | adjectives (adj) | adverbs (adv)
Mike Pence | 2,951 / 48.5% / 1,083 / 36.7% | 1,711 / 58.0% / 596 / 34.8% | 744 / 25.2% / 314 / 42.2% | 340 / 11.5% / 164 / 48.2% | 156 / 5.3% / 59 / 37.8%
Kamala Harris | 2,437 / 41.6% / 955 / 39.2% | 1,381 / 56.7% / 520 / 37.7% | 691 / 28.4% / 301 / 43.6% | 277 / 11.4% / 138 / 49.8% | 88 / 3.6% / 51 / 58.0%
total | 5,388 / 45.1% / 1,633 / 30.3% | 3,092 / 57.4% / 901 / 29.1% | 1,435 / 26.6% / 499 / 34.8% | 617 / 11.5% / 257 / 41.7% | 244 / 4.5% / 86 / 35.2%

Table 7
legend

a — total number of words for a given POS (all, noun, verb, adjective, adverb)

b — (a) relative to all words by candidate (first column) or to the candidate's n+v+adj+adv words (other columns)

c — unique words in (a)

d — (c) relative to (a)

Table 7
commentary

Both candidates used proportionately roughly the same fraction of nouns and adjectives. Proportionately, Harris used Δrel=+12.7% (Δabs=+3.2%, 28.4% vs 25.2%) more verbs and Pence had Δrel=+47.2% (Δabs=+1.7%, 5.3% vs 3.6%) more adverbs.

In terms of the proportion of unique words within each part of speech, both candidates had roughly the same fraction of unique verbs and adjectives. However, Harris repeated her nouns less often and had Δrel=+8.3% (Δabs=+2.9%, 37.7% vs 34.8%) proportionately more unique nouns. She also had proportionately more unique adverbs than Pence (Δrel=+53.4% (Δabs=+20.2%, 58% vs 37.8%)). Thus, while Pence did use more nouns and adverbs in total, he tended to repeat them more than Harris.

Part of Speech Frequency

Table 8
part of speech frequency
Frequency of words categorized by part of speech (POS).
speaker (a / b / c) | n+v+adj+adv | nouns (n) | verbs (v) | adjectives (adj) | adverbs (adv) | pronouns (pro)
Mike Pence | 2.73 / 5 / 48 | 2.87 / 7 / 50 | 2.37 / 3 / 25 | 2.07 / 2 / 67 | 2.64 / 6 / 17 | 17.66 / 62 / 119
Kamala Harris | 2.55 / 4 / 33 | 2.66 / 4 / 42 | 2.30 / 3 / 20 | 2.01 / 2 / 31 | 1.73 / 2 / 5 | 19.52 / 57 / 126
total | 3.30 / 6 / 90 | 3.43 / 8 / 101 | 2.88 / 5 / 46 | 2.40 / 4 / 98 | 2.84 / 6 / 15 | 31.63 / 145 / 245

Table 8
legend

a — average word frequency

b — largest word frequency in 50% of content

c — largest word frequency in 90% of content

Table 8
commentary

We see Pence's repetition of nouns and adverbs here quite starkly. His 50th percentile noun and adverb frequencies were +75.0% (7 vs 4) and +200.0% (6 vs 2) higher, respectively, than Harris'.

Part of Speech Pairing

Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
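The page does not describe exactly how pairs are formed. As one hedged interpretation, the sketch treats a pair as two adjacent content words within a sentence, after other words have been removed. Pairs are grouped by the unordered combination of their POS groups, matching the lower-triangular layout of Table 9, and each group reports total pairs, unique pairs and the unique fraction. The adjacency rule, the function names and the sample words are all assumptions.

```python
from collections import defaultdict


def pos_pairs(tagged_sentence):
    """Collect adjacent word pairs keyed by an unordered POS-group pair.

    `tagged_sentence` is a list of (word, group) tuples, where group is one
    of 'noun', 'verb', 'adjective' or 'adverb'; other words are assumed to
    have been dropped already.
    """
    pairs = defaultdict(list)
    for (w1, g1), (w2, g2) in zip(tagged_sentence, tagged_sentence[1:]):
        key = tuple(sorted((g1, g2)))  # verb/noun and noun/verb share a cell
        pairs[key].append((w1.lower(), w2.lower()))
    return pairs


def summarize(pairs):
    """(total pairs, unique pairs, unique fraction) for each POS pairing."""
    return {
        key: (len(found), len(set(found)), len(set(found)) / len(found))
        for key, found in pairs.items()
    }


sentence = [("strong", "adjective"), ("economy", "noun"),
            ("create", "verb"), ("jobs", "noun")]
print(summarize(pos_pairs(sentence)))
```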

Table 9a
part of speech pairing — Mike Pence
Word pairs (total and unique) categorized by part of speech (POS)
POS (a / c / d) | noun | verb | adjective | adverb
noun | 412 / 162 / 39.3% | - | - | -
verb | 79 / 73 / 92.4% | 5 / 5 / 100.0% | - | -
adjective | 217 / 150 / 69.1% | 3 / 3 / 100.0% | 19 / 17 / 89.5% | -
adverb | 11 / 10 / 90.9% | 38 / 38 / 100.0% | 5 / 5 / 100.0% | 4 / 4 / 100.0%

Table 9b
part of speech pairing — Kamala Harris
Word pairs (total and unique) categorized by part of speech (POS)
POS (a / c / d) | noun | verb | adjective | adverb
noun | 269 / 129 / 48.0% | - | - | -
verb | 62 / 54 / 87.1% | 7 / 7 / 100.0% | - | -
adjective | 157 / 117 / 74.5% | 2 / 2 / 100.0% | 13 / 13 / 100.0% | -
adverb | 3 / 3 / 100.0% | 15 / 15 / 100.0% | 6 / 5 / 83.3% | 2 / 2 / 100.0%

Table 9c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
POS (a / c / d) | noun (n) | verb (v) | adjective (adj) | adverb (adv)
noun | 162 / 129 / 79.6% | - | - | -
verb | 73 / 54 / 74.0% | 5 / 7 / 140.0% | - | -
adjective | 150 / 117 / 78.0% | 3 / 2 / 66.7% | 17 / 13 / 76.5% | -
adverb | 10 / 3 / 30.0% | 38 / 15 / 39.5% | 5 / 5 / 100.0% | 4 / 2 / 50.0%

Table 9a, 9b
legend

a — total number of pairs, for a given category (e.g. verb/noun)

c — number of unique pairs within set (a)

d — (c) relative to (a)

Table 9c
legend

a — unique pairs for Mike Pence

c — unique pairs for Kamala Harris

d — (c) relative to (a) (i.e. Kamala Harris relative to Mike Pence)

Table 9
commentary

Pence had more unique pairs than Harris in every category except verb/verb, where Harris had more (7 vs 5), and adverb/adjective, where they were tied (5 vs 5).

Detailed Part of Speech Tags

You can really get into the weeds here. Parts of speech are counted more granularly in these tables — nouns and verbs are split into classes and many other word types are shown, such as conjunctions and prepositions.

Table 10a
detailed POS tags — nouns and verbs
Count by part of speech tag: NN (noun, singular), NNP (proper noun, singular), NNPS (proper noun, plural), NNS (noun, plural), VB (verb, base form), VBD (verb, past tense), VBG (verb, gerund/present participle), VBN (verb, past participle), VBP (verb, sing. present, non-3rd), VBZ (verb, 3rd person sing. present)
Penn Treebank part of speech tag
speaker (a (b)) | NN | NNP | NNPS | NNS | VB | VBD | VBG | VBN | VBP | VBZ
Mike Pence | 708 (11.63%) | 683 (11.22%) | 38 (0.62%) | 296 (4.86%) | 249 (4.09%) | 239 (3.92%) | 107 (1.76%) | 114 (1.87%) | 234 (3.84%) | 156 (2.56%)
Kamala Harris | 714 (12.19%) | 433 (7.40%) | 24 (0.41%) | 236 (4.03%) | 285 (4.87%) | 184 (3.14%) | 130 (2.22%) | 127 (2.17%) | 229 (3.91%) | 209 (3.57%)

Table 10b
detailed POS tags — adjectives, pronouns, adverbs and wh-words
Count by part of speech tag: JJ (adjective), JJR (adjective, comparative), JJS (adjective, superlative), PRP (personal pronoun), PRP$ (possessive pronoun), RB (adverb), RBR (adverb, comparative), RBS (adverb, superlative), WDT (wh-determiner), WP (wh-pronoun), WP$ (possessive wh-pronoun), WRB (wh-adverb)
Penn Treebank part of speech tag
speaker (a (b)) | JJ | JJR | JJS | PRP | PRP$ | RB | RBR | RBS | WDT | WP | WP$ | WRB
Mike Pence | 337 (5.53%) | 22 (0.36%) | 11 (0.18%) | 454 (7.45%) | 102 (1.67%) | 289 (4.75%) | 7 (0.11%) | 2 (0.03%) | 26 (0.43%) | 35 (0.57%) | - | 43 (0.71%)
Kamala Harris | 281 (4.80%) | 13 (0.22%) | 10 (0.17%) | 488 (8.33%) | 107 (1.83%) | 244 (4.17%) | 7 (0.12%) | 3 (0.05%) | 39 (0.67%) | 95 (1.62%) | 1 (0.02%) | 50 (0.85%)

Table 10c
detailed POS tags — prepositions, conjunctions, determiners and others
Count by part of speech tag: CC (coordinating conjunction), CD (cardinal digit), DT (determiner), EX (existential there), FW (foreign word), IN (preposition/subordinating conjunction), MD (modal), PDT (predeterminer), POS (possessive ending), RP (particle), TO (to), UH (interjection)
Penn Treebank part of speech tag
speaker (a (b)) | CC | CD | DT | EX | FW | IN | MD | PDT | POS | RP | TO | UH
Mike Pence | 283 (4.65%) | 91 (1.49%) | 605 (9.93%) | 10 (0.16%) | - | 656 (10.77%) | 71 (1.17%) | 2 (0.03%) | 15 (0.25%) | 16 (0.26%) | 188 (3.09%) | 1 (0.02%)
Kamala Harris | 240 (4.10%) | 81 (1.38%) | 529 (9.04%) | 12 (0.20%) | - | 755 (12.89%) | 105 (1.79%) | - | 46 (0.79%) | 18 (0.31%) | 156 (2.66%) | 4 (0.07%)

Table 10
legend

a — total number of words with a given tag

b — (a) relative to all tagged words

Table 10
commentary

Proportionately, Pence used more proper nouns (Δrel=+51.6% (Δabs=+3.8%, 11.22% vs 7.4%)) and Harris used Δrel=+19.7% (Δabs=+2.1%, 12.89% vs 10.77%) more prepositions and subordinating conjunctions.

Exclusive and Shared Usage

This section enumerates words that were exclusive to a candidate (i.e. used by one candidate but not the other). This content provides insight into the candidates' priorities and reveals differences in perspective on similar topics.

For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or both candidates (intersection). The last row includes words spoken by either candidate (union).
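Tables 4b, 5b, 11 and 12b all split words by set membership. The counts in Table 4b add up in a way that suggests the "both candidates" row pools the shared-word tokens from both speakers: 1,147 + 900 + 9,898 = 11,945 total words, and 724 + 593 + 541 = 1,858 unique words. Below is a minimal sketch of that bookkeeping. The function name, the dictionary keys and the sample tokens are hypothetical.

```python
from collections import Counter


def exclusive_shared(tokens_a, tokens_b):
    """(token count, unique word count) for words only A used, words only B
    used, words both used (tokens from both speakers pooled), and the union."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    only_a = set(ca) - set(cb)   # set difference: A but not B
    only_b = set(cb) - set(ca)   # set difference: B but not A
    both = set(ca) & set(cb)     # intersection
    return {
        "a_only": (sum(ca[w] for w in only_a), len(only_a)),
        "b_only": (sum(cb[w] for w in only_b), len(only_b)),
        "both": (sum(ca[w] + cb[w] for w in both), len(both)),
        "union": (sum(ca.values()) + sum(cb.values()), len(set(ca) | set(cb))),
    }


a = "jobs jobs china taxes".split()
b = "jobs health health care".split()
print(exclusive_shared(a, b))
```

Because every word falls into exactly one of the three disjoint sets, both the token counts and the unique counts of the first three entries sum to the union. That is the same check the Table 4b totals pass.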

Table 11
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
speaker (a / b / c / d / e / f) | n+v+adj+adv | nouns (n) | verbs (v) | adjectives (adj) | adverbs (adv)
Mike Pence | 1,070 / 100.0% / 19.9% / 678 / 63.4% / 41.5% | 612 / 57.2% / 19.8% / 366 / 59.8% / 40.6% | 262 / 24.5% / 18.3% / 190 / 72.5% / 38.1% | 152 / 14.2% / 24.6% / 108 / 71.1% / 42.0% | 44 / 4.1% / 18.0% / 32 / 72.7% / 37.2%
Kamala Harris | 825 / 100.0% / 15.3% / 550 / 66.7% / 33.7% | 431 / 52.2% / 13.9% / 285 / 66.1% / 31.6% | 246 / 29.8% / 17.1% / 168 / 68.3% / 33.7% | 119 / 14.4% / 19.3% / 86 / 72.3% / 33.5% | 29 / 3.5% / 11.9% / 25 / 86.2% / 29.1%
both candidates | 3,493 / 100.0% / 64.8% / 405 / 11.6% / 24.8% | 1,997 / 57.2% / 64.6% / 215 / 10.8% / 23.9% | 891 / 25.5% / 62.1% / 116 / 13.0% / 23.2% | 319 / 9.1% / 51.7% / 45 / 14.1% / 17.5% | 164 / 4.7% / 67.2% / 24 / 14.6% / 27.9%
total | 5,388 / 100.0% / 100.0% / 1,633 / 30.3% / 100.0% | 3,092 / 57.4% / 100.0% / 901 / 29.1% / 100.0% | 1,435 / 26.6% / 100.0% / 499 / 34.8% / 100.0% | 617 / 11.5% / 100.0% / 257 / 41.7% / 100.0% | 244 / 4.5% / 100.0% / 86 / 35.2% / 100.0%

Table 11
legend

a — total number of words in set (e.g. Pence \ Harris, Pence ∩ Harris, Pence ∪ Harris), for a given part of speech

b — (a) relative to the same row's n+v+adj+adv count

c — (a) relative to all words in the debate with the same part of speech (n+v+adj+adv for the first column)

d — unique words in (a)

e — (d) relative to (a)

f — (d) relative to all unique words in the debate with the same part of speech

Table 11
commentary

Pence had more words exclusive to him for each of the noun, verb, adjective and adverb categories. He had a Δrel=+30.1% (Δabs=+4.6%, 19.9% vs 15.3%) higher fraction of exclusive words than Harris.

Pronoun Usage

This section explores pronoun use in detail. Refer to the methods section for details.

Pronoun Count

Fraction of all words that were pronouns.

Table 12a
pronoun fraction
Fraction of words that were pronouns.
speaker | all words (count / share / unique / unique fraction) | pronouns (count / share / unique / unique fraction)
Mike Pence | 6,090 / 100.0% / 1,265 / 20.8% | 883 / 14.5% / 50 / 5.7%
Kamala Harris | 5,855 / 100.0% / 1,134 / 19.4% | 1,015 / 17.3% / 52 / 5.1%
total | 11,945 / 100.0% / 1,858 / 15.6% | 1,898 / 15.9% / 60 / 3.2%

Table 12b
exclusive and shared pronouns
Pronouns exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set | pronouns (count / share of all pronouns / unique / unique fraction)
Mike Pence | 12 / 0.6% / 8 / 66.7%
Kamala Harris | 17 / 0.9% / 10 / 58.8%
both candidates | 1,869 / 98.5% / 42 / 2.2%

Pronoun by Person, Gender and Count

Pronoun usage by person (1st, 2nd, 3rd), gender (masculine, feminine, neuter) and count (singular, plural).
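Tables 13 and 14 tally pronouns by person, gender and number. The page's full pronoun lists live on its methods page, so the lookup below is a small hypothetical subset meant only to show the mechanics. Gender and number are `None` where English does not mark them, which is why gender totals can be smaller than person totals. The table contents and function name are assumptions.

```python
from collections import Counter

# Hypothetical partial lookup: word -> (person, gender, number).
PRONOUNS = {
    "i": (1, None, "singular"), "me": (1, None, "singular"),
    "my": (1, None, "singular"), "myself": (1, None, "singular"),
    "we": (1, None, "plural"), "our": (1, None, "plural"), "us": (1, None, "plural"),
    "you": (2, None, None), "your": (2, None, None),
    "he": (3, "masculine", "singular"), "him": (3, "masculine", "singular"),
    "his": (3, "masculine", "singular"),
    "she": (3, "feminine", "singular"), "her": (3, "feminine", "singular"),
    "it": (3, "neuter", "singular"),
    "they": (3, None, "plural"), "them": (3, None, "plural"),
}


def pronoun_profile(tokens):
    """Tally pronoun tokens by person, gender and number."""
    person, gender, number = Counter(), Counter(), Counter()
    for t in tokens:
        info = PRONOUNS.get(t.lower())
        if info is None:
            continue  # not a pronoun in this (partial) lookup
        p, g, n = info
        person[p] += 1
        if g:
            gender[g] += 1
        if n:
            number[n] += 1
    return person, gender, number


sample = "I think we know what he did and she said it to us".split()
print(pronoun_profile(sample))
```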

Table 13a
Pronoun by person
Count of pronouns by first, second or third person.
speaker (a / b / c / d) | all | first | second | third
Mike Pence | 556 / 19 / 100.0% / 3.4% | 275 / 6 / 49.5% / 2.2% | 122 / 3 / 21.9% / 2.5% | 159 / 10 / 28.6% / 6.3%
Kamala Harris | 595 / 19 / 100.0% / 3.2% | 266 / 6 / 44.7% / 2.3% | 113 / 2 / 19.0% / 1.8% | 216 / 11 / 36.3% / 5.1%

Table 13b
Pronoun by gender
Count of pronouns by masculine, feminine or neuter gender.
speaker (a / b / c / d) | all | masculine | feminine | neuter
Mike Pence | 109 / 7 / 100.0% / 6.4% | 27 / 3 / 24.8% / 11.1% | 19 / 2 / 17.4% / 10.5% | 63 / 2 / 57.8% / 3.2%
Kamala Harris | 152 / 7 / 100.0% / 4.6% | 54 / 3 / 35.5% / 5.6% | 14 / 2 / 9.2% / 14.3% | 84 / 2 / 55.3% / 2.4%

Table 13c
Pronoun by number
Count of pronouns by singular or plural.
speaker (a / b / c / d) | all | singular | plural
Mike Pence | 677 / 38 / 100.0% / 5.6% | 422 / 22 / 62.3% / 5.2% | 255 / 16 / 37.7% / 6.3%
Kamala Harris | 733 / 37 / 100.0% / 5.0% | 484 / 21 / 66.0% / 4.3% | 249 / 16 / 34.0% / 6.4%

Table 13
legend

a — total number of pronouns, by type

b — unique pronouns in (a)

c — (a) as fraction of all pronouns

d — (b) as fraction of (a)

Table 13
commentary

In this debate, pronoun use was quite a lot lower than in the first Trump vs Biden debate. There, Trump used 1,681 pronouns, which is +90.4% (1,681 vs 883) more than Pence (nearly twice as many) and Biden used +35.6% (1,376 vs 1,015) more pronouns than Harris.

Harris used Δrel=+19.3% (Δabs=+2.8%, 17.3% vs 14.5%) proportionately more pronouns than Pence.

Pronoun usage by person was Δrel=+10.7% (Δabs=+4.8%, 49.5% vs 44.7%) higher for Pence for 1st person pronouns but Δrel=+26.9% (Δabs=+7.7%, 36.3% vs 28.6%) higher for Harris for 3rd person pronouns.

Looking at gender, Harris used Δrel=+43.1% (Δabs=+10.7%, 35.5% vs 24.8%) more masculine pronouns, but Pence used Δrel=+89.1% (Δabs=+8.2%, 17.4% vs 9.2%) more feminine pronouns. This makes sense, since each candidate was often referring to the other. The differences aren't symmetric because, presumably, there were more reasons for Harris to use a masculine pronoun without referring to Pence than for Pence to use a feminine pronoun without referring to Harris. Basically, there are fewer women to refer to.

Both candidates used singular and plural pronouns in roughly the same proportion.

First and third person pronouns — a closer look

These tables break down pronoun use by interesting contrasts. For example, the ratio of singular to plural 1st person pronouns reveals the use of "I/my/myself" vs. "we/our/ours".

Table 14a
1st person pronouns, by count
Count of singular and plural first person pronouns. This table contrasts use of I/my/myself vs. we/our/ours.
speaker (a / b / c / d) | first | first singular | first plural
Mike Pence | 275 / 6 / 100.0% / 2.2% | 118 / 3 / 42.9% / 2.5% | 157 / 3 / 57.1% / 1.9%
Kamala Harris | 266 / 6 / 100.0% / 2.3% | 119 / 3 / 44.7% / 2.5% | 147 / 3 / 55.3% / 2.0%

Table 14b
3rd person pronouns, by count
Count of singular and plural third person pronouns. This table contrasts he/she/his/her/it vs. they/them/theirs.
speaker (a / b / c / d) | third | third singular | third plural
Mike Pence | 159 / 10 / 100.0% / 6.3% | 109 / 7 / 68.6% / 6.4% | 50 / 3 / 31.4% / 6.0%
Kamala Harris | 216 / 11 / 100.0% / 5.1% | 152 / 7 / 70.4% / 4.6% | 64 / 4 / 29.6% / 6.2%

Table 14c
Me and you — 1st person singular and second person pronouns
Count of 1st person singular and second person pronouns. This table contrasts me/my/myself vs you/yours/yourself.
speaker (a / b / c / d) | all | 1st singular | 2nd
Mike Pence | 240 / 6 / 100.0% / 2.5% | 118 / 3 / 49.2% / 2.5% | 122 / 3 / 50.8% / 2.5%
Kamala Harris | 232 / 5 / 100.0% / 2.2% | 119 / 3 / 51.3% / 2.5% | 113 / 2 / 48.7% / 1.8%

Table 14d
I, me, myself and my — closer look at 1st person singular pronouns
Count of specific 1st person singular pronouns: I, me, myself and my.
speaker (a / c) | all | I | me | myself | my
Mike Pence | 118 / 100.0% | 102 / 86.4% | 7 / 5.9% | 0 / 0.0% | 9 / 7.6%
Kamala Harris | 119 / 100.0% | 103 / 86.6% | 10 / 8.4% | 0 / 0.0% | 6 / 5.0%

Table 14
legend

a — total number of pronouns, by type

b — unique pronouns in (a) (if more than one)

c — (a) as fraction of all pronouns

d — (b) as fraction of (a) (if less than 100%)

Table 14
commentary

Despite their different word counts, Pence and Harris used nearly the same number of 1st person pronouns (275 vs 266), and their proportions of singular and plural forms were nearly identical. Both candidates used roughly a quarter to a third more 1st person plural than singular pronouns (157 vs 118 for Pence, 147 vs 119 for Harris).

Harris did use +35.8% (216 vs 159) more 3rd person pronouns. Here, the use of plural is about half of the singular.

Both had mostly the same fraction of me vs. you and the same fraction of various forms of 1st person singular pronouns.

The results of this section are quite balanced and nothing like the first Trump vs. Biden debate, where the differences were stark.

Pronouns by Category

This table tallies the use of pronoun by category. The categories are personal, demonstrative, indefinite, object, possessive, interrogative, others, relative, reflexive. Note that some pronouns that belong to multiple categories are counted in only one. For a list of pronouns for each category, see the pronoun methods section.

Table 15
Pronouns by category
Count of pronouns by category.
speaker (a / b) | all | personal | demonstrative | indefinite | object | possessive | interrogative | others | relative | reflexive
Mike Pence | 883 / 100.0% | 427 / 48.4% | 163 / 18.5% | 81 / 9.2% | 33 / 3.7% | 93 / 10.5% | 43 / 4.9% | 21 / 2.4% | 22 / 2.5% | 3 / 0.3%
Kamala Harris | 1,015 / 100.0% | 451 / 44.4% | 176 / 17.3% | 78 / 7.7% | 44 / 4.3% | 98 / 9.7% | 114 / 11.2% | 32 / 3.2% | 20 / 2.0% | 2 / 0.2%

Table 15
legend

a — total number of pronouns, by category

b — (a) as fraction of all pronouns

Table 15
commentary

The biggest difference here is Harris' more frequent use of interrogative pronouns, which she used +165.1% (114 vs 43) more often than Pence. In fact, proportionately, Harris had Δrel=+128.6% (Δabs=+6.3%, 11.2% vs 4.9%) more interrogative pronouns. She was asking more questions.

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness. Single-word phrases were not counted.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
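One way to read the top-level/derived split is sketched below. It assumes a "parent" is any longer phrase that contains the shorter one as a contiguous run of words; the page's own rule, described on its methods page, may differ. Single-word phrases are skipped, as in the tables. The function names and sample phrases are hypothetical.

```python
def contains(longer, shorter):
    """True if word list `shorter` appears contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def split_phrases(phrases):
    """Split multi-word noun phrases into (top-level, derived) sets."""
    words = {p: p.lower().split() for p in set(phrases)}
    words = {p: w for p, w in words.items() if len(w) > 1}  # skip single words
    top, derived = set(), set()
    for p, w in words.items():
        has_parent = any(
            len(other) > len(w) and contains(other, w)
            for q, other in words.items() if q != p
        )
        (derived if has_parent else top).add(p)
    return top, derived


phrases = ["American people", "the American people", "health care",
           "affordable health care", "trade war"]
print(split_phrases(phrases))
```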

Noun Phrase Count and length

This table reports the absolute number of noun phrases, which is related to the number of nouns, and their length.

Table 16a
noun phrase count
Counts of noun phrases in words and per noun.
speaker (a / b / c / d / e / f) | all | top-level
Mike Pence | 547 / 100.0% / 0.32 / 203 / 37.1% / 0.34 | 383 / 70.0% / 0.22 / 197 / 51.4% / 0.33
Kamala Harris | 398 / 100.0% / 0.29 / 185 / 46.5% / 0.36 | 306 / 76.9% / 0.22 / 183 / 59.8% / 0.35

Table 16b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
speaker (a / b / c) | all | top-level
Mike Pence | 2.24 / 2 / 3 | 2.30 / 2 / 3
Kamala Harris | 2.18 / 2 / 3 | 2.24 / 2 / 3

Table 16a
legend

a — number of noun phrases

b — (a) relative to number of all noun phrases

c — number of noun phrases per noun

d — number of unique phrases

e — (d) relative to (a)

f — number of unique noun phrases per unique noun

Table 16b
legend

a — average noun phrase size, in words

b — largest noun phrase size in 50% of content

c — largest noun phrase size in 90% of content


Table 16
commentary

Pence delivered +37.4% (547 vs 398) more noun phrases than Harris, and he repeated them more often (37.1% of his noun phrases were unique, vs. 46.5% of Harris').

Exclusive and Shared Noun Phrase Count and length

Table 17a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
speaker noun phrase count
all top-level
Mike Pence
369 179
39.0% 48.5%
190179
309 177
83.7% 57.3%
132177
Kamala Harris
285 166
30.2% 58.2%
119166
254 167
89.1% 65.7%
87167
both candidates
291 23
30.8% 7.9%
26823
126 19
43.3% 15.1%
10719

Hover over fields with (e.g. 155) to download the corresponding data file.

Table 17b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
speaker noun phrase length
all | top-level
Mike Pence
2.34 2 3 | 2.35 2 3
Kamala Harris
2.23 2 3 | 2.26 2 3
both candidates
2.04 2 2 | 2.09 2 3


Table 17a
legend
a c
b d

a — number of noun phrases

b — (a) relative to number of all noun phrases

c — number of unique phrases

d — (c) relative to (a)

bar — normalized ratio of (a–c):c
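The three groups in Table 17a partition the candidates' noun phrases. A quick check confirms this: 369 + 285 + 291 = 945 = 547 + 398, the combined noun-phrase total from Table 16a, and the all-phrase percentages (39.0%, 30.2%, 30.8%) are shares of those 945. The Python sketch below shows that split. It assumes a phrase counts as shared when both speakers used it at least once, and that shared occurrences pool both speakers' uses. The function name and the toy counts are hypothetical.

```python
from collections import Counter


def exclusive_and_shared(a: Counter, b: Counter) -> dict[str, tuple[int, int]]:
    """Split two speakers' noun-phrase counts into (occurrences, unique phrases)
    for phrases exclusive to A, exclusive to B, and shared by both."""
    only_a = set(a) - set(b)
    only_b = set(b) - set(a)
    shared = set(a) & set(b)
    return {
        "A": (sum(a[p] for p in only_a), len(only_a)),
        "B": (sum(b[p] for p in only_b), len(only_b)),
        # Shared occurrences pool both speakers' uses of the common phrases.
        "both": (sum(a[p] + b[p] for p in shared), len(shared)),
    }


# Toy phrase counts, invented for illustration.
a = Counter({"tax cuts": 2, "the economy": 1, "the american people": 3})
b = Counter({"health care": 1, "the pandemic": 2, "the american people": 2})
groups = exclusive_and_shared(a, b)
total = sum(a.values()) + sum(b.values())
for name, (occurrences, unique) in groups.items():
    print(f"{name}: {occurrences} ({100 * occurrences / total:.1f}%), {unique} unique")
```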

Table 17b
legend
a b c

a — average noun phrase size, in words

b — largest noun phrase size in 50% of content

c — largest noun phrase size in 90% of content

bar — proportion of a:b:c


Table 17
commentary

Pence used +29.5% (369 vs 285) more exclusive noun phrases than Harris. Only 23 distinct noun phrases were used by both candidates.

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index indicates succinct speech with a low degree of repetition and a large number of independent concepts.

Unlike the Flesch-Kincaid readability metrics, the Windbag Index does not take into account the length of sentences or complexity (e.g. number of syllables) of individual words.

Table 18
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
speaker Windbag Index
index value | index terms t1 to t8
Mike Pence
555 | 0.511 0.365 0.348 0.422 0.482 0.378 0.371 0.970
+107.4% | +15.0% -5.5% -7.5% -3.1% -3.2% -34.7% -20.2% -1.9%
Kamala Harris
267 | 0.444 0.386 0.377 0.436 0.498 0.580 0.465 0.989
-51.8% | -13.0% +5.8% +8.1% +3.2% +3.3% +53.2% +25.3% +1.9%

Table 18
legend
The Windbag Index is 1/(t1*t2*...*t8) where t1,t2,...,t8 are:

t1 — fraction of words that are non-stop

t2 — fraction of non-stop words that are unique

t3 — fraction of nouns that are unique

t4 — fraction of verbs that are unique

t5 — fraction of adjectives that are unique

t6 — fraction of adverbs that are unique

t7 — fraction of noun phrases that are unique

t8 — fraction of unique noun phrases that are top-level


Larger individual terms t1...t8 contribute to a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(a-b)/b where a is the value for one speaker and b for the other).
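As a check on the formula, a short Python sketch can recompute the index from the eight terms of Table 18 (using the full-precision term values from the page's data) together with the relative difference 100*(a-b)/b. It gives about 555.03 for Pence and 267.55 for Harris, a difference of +107.4%. The function names are illustrative, not the author's code.

```python
import math


def windbag_index(terms: list[float]) -> float:
    """Windbag Index: 1 / (t1 * t2 * ... * t8)."""
    if len(terms) != 8:
        raise ValueError("expected eight terms, t1..t8")
    return 1 / math.prod(terms)


def relative_difference(a: float, b: float) -> float:
    """Relative difference in percent, 100 * (a - b) / b."""
    return 100 * (a - b) / b


# Terms t1..t8 for each speaker, as listed in Table 18.
pence_terms = [0.510673234811166, 0.365273311897106, 0.34833430742256,
               0.422043010752688, 0.482352941176471, 0.378205128205128,
               0.371115173674589, 0.970443349753695]
harris_terms = [0.444235695986336, 0.38638985005767, 0.376538740043447,
                0.435600578871201, 0.498194945848375, 0.579545454545455,
                0.464824120603015, 0.989189189189189]

pence = windbag_index(pence_terms)
harris = windbag_index(harris_terms)
print(f"Pence {pence:.2f}, Harris {harris:.2f}, "
      f"difference {relative_difference(pence, harris):+.1f}%")
```

Because the index is a reciprocal product, a term that is 34.7% smaller (Pence's t6) raises the index by about half on its own, which is why a few terms dominate the comparison.
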
Table 18
commentary

Pence's Windbag Index is +107.4% (555 vs 267) larger than Harris'. To put things in perspective, Trump's Windbag Index from the first debate was 1,200 and Biden's was 539.

Pence's index is much higher mainly because of his much smaller fractions of unique adverbs (-34.7%) and unique noun phrases (-20.2%), which outweigh his larger fraction of non-stop words (+15.0%).

Word Clouds

In the word clouds below, the size of the word is proportional to the number of times it was used by a candidate (method details).

Not all words in a group fit in the cloud image: for large word groups, less frequently used words may fall outside it.

All Words for Each Candidate

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category.

The distribution of sizes within a tag cloud follows the frequency distribution of words. However, word size cannot be compared between clouds, since the minimum and maximum size of the words is fixed.
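The exact size mapping is not stated here. The Python sketch below assumes that frequencies are scaled linearly onto a fixed [min_size, max_size] range. Under that assumption, sizes cannot be compared between clouds: two toy clouds whose counts differ by a factor of five end up with identical font sizes. The function name and size bounds are hypothetical.

```python
def font_sizes(freqs: dict[str, int],
               min_size: float = 10.0, max_size: float = 60.0) -> dict[str, float]:
    """Map word frequencies to font sizes by linear scaling between fixed bounds."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {word: min_size + (max_size - min_size) * (f - lo) / span
            for word, f in freqs.items()}


# Toy counts: cloud B uses each word five times less often than cloud A.
cloud_a = {"american": 40, "jobs": 25, "energy": 10}
cloud_b = {"american": 8, "jobs": 5, "energy": 2}
print(font_sizes(cloud_a))
print(font_sizes(cloud_b))
```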

Debate Word Cloud for Mike Pence - all words

Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb

Debate Word Cloud for Kamala Harris - all words

Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb
commentary

Both candidates used "American" very frequently. Recall that Trump in the first debate never said that word.

Pence also said "president" proportionately more often than Harris.

Exclusive Words for Each Candidate

The clouds below show words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times) but candidate B did not, then the word will appear in the exclusive word tag cloud for candidate A.
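This rule amounts to a set difference on the two candidates' word counts that keeps the first candidate's frequencies for sizing. The minimal Python sketch below assumes words are compared as plain strings, without regard to part of speech. The function name and counts are hypothetical.

```python
from collections import Counter


def exclusive_words(mine: Counter, theirs: Counter) -> Counter:
    """Words this candidate used at least once that the other never used,
    with this candidate's own counts (used for word size in the cloud)."""
    # A Counter returns 0 for a missing key (without inserting it), so this
    # also treats explicit zero counts as "never used".
    return Counter({w: n for w, n in mine.items() if n > 0 and theirs[w] == 0})


a = Counter({"invest": 3, "jobs": 2})
b = Counter({"jobs": 5, "care": 1})
print(exclusive_words(a, b))  # only "invest" remains for candidate A
```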

Words exclusive to Mike Pence

Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb

Words exclusive to Kamala Harris

Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb
commentary

Pence's exclusive words included many adjectives at the center of the word cloud, such as "liberal", "new", "average" and "strong". Pence tended to repeat his exclusive adjectives.

Harris, on the other hand, most often repeated her exclusive adverbs "certainly", "almost", "clearly" and "incredibly", followed by nouns.

The difference in these distributions suggests more substantive concepts in Harris' unique words.

Pronouns for Each Candidate

Word clouds based on only pronouns.

Pronouns for Mike Pence

Size proportional to word frequency. Color encodes pronoun type: masculine feminine neuter 1st person 2nd person singular plural other

Pronouns for Kamala Harris

Size proportional to word frequency. Color encodes pronoun type: masculine feminine neuter 1st person 2nd person singular plural other
commentary

Pronoun usage was very similar.

Part of Speech Word Clouds

In these clouds, words from each major part of speech were colored based on whether they were exclusive to a candidate or shared by the candidates.

Word size is relative to each candidate's own word frequencies, so sizes should not be compared between candidates to infer differences in absolute frequency.

Cloud of noun words, by speaker

Words unique to each candidate (Pence, Harris) and those spoken by both.
commentary

Just as in the first Trump vs. Biden debate, nouns dominate the exclusive word list for the Democratic candidate. This suggests more of a focus on concepts and ideas.

Cloud of verb words, by speaker

Words unique to each candidate (Pence, Harris) and those spoken by both.
commentary

Unique verb use was more balanced. Pence wants to "continue" and we all remember when Kamala said "speaking".

Cloud of adjective words, by speaker

Words unique to each candidate (Pence, Harris) and those spoken by both.
commentary

Unique adjectives by Pence were repeated a lot more than those by Harris.

Cloud of adverb words, by speaker

Words unique to each candidate (Pence, Harris) and those spoken by both.
commentary

Unique adverb use is the flip side of unique adjective use: Harris repeated her exclusive adverbs more than Pence did.

Cloud of all words, by speaker

Words unique to each candidate (Pence, Harris) and those spoken by both.
commentary

The distribution of color in this cloud is nearly the same as in the first Trump vs. Biden debate. Most of the words in the center were from the Democratic candidate (these are exclusive and often repeated).

Word Pair Clouds for Each Candidate

Pairs used only once during the debate are not shown.
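How the pairs are formed is not described here. One plausible reading, sketched in Python below, works as follows. It counts adjacent content words after stop-word removal. It files each pair under an unordered tag category, using the JJ, RB, N, V order of the panel names, so that adjective-noun and noun-adjective pairs both land in JJ/N. Finally, it drops pairs used only once. The tagged tokens, the simplified tags and the function name are assumptions for illustration.

```python
from collections import Counter

# Assumed category order, matching panel names such as JJ/N and RB/V.
TAG_ORDER = ["JJ", "RB", "N", "V"]


def word_pairs(tagged: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count adjacent word pairs per tag category, keeping pairs seen more than once."""
    by_category: dict[str, Counter] = {}
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 not in TAG_ORDER or t2 not in TAG_ORDER:
            continue  # skip pairs involving other parts of speech
        first, second = sorted((t1, t2), key=TAG_ORDER.index)
        key = f"{first}/{second}"
        by_category.setdefault(key, Counter())[f"{w1} {w2}"] += 1
    # Drop pairs used only once, then drop categories left empty.
    result = {}
    for key, counts in by_category.items():
        repeated = Counter({pair: n for pair, n in counts.items() if n > 1})
        if repeated:
            result[key] = repeated
    return result


# Hypothetical (word, tag) tokens after stop-word removal.
tokens = [("american", "JJ"), ("people", "N"), ("deserve", "V"),
          ("american", "JJ"), ("people", "N"), ("know", "V")]
print(word_pairs(tokens))
```

Here only "american people" survives, because every other pair occurs once.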

word pairs for Mike Pence

JJ/JJ by Mike Pence
JJ/RB by Mike Pence
JJ/N by Mike Pence
JJ/V by Mike Pence
RB/RB by Mike Pence
RB/N by Mike Pence
RB/V by Mike Pence
N/N by Mike Pence
N/V by Mike Pence
V/V by Mike Pence

word pairs for Kamala Harris

JJ/JJ by Kamala Harris
JJ/RB by Kamala Harris
JJ/N by Kamala Harris
JJ/V by Kamala Harris
RB/RB by Kamala Harris
RB/N by Kamala Harris
RB/V by Kamala Harris
N/N by Kamala Harris
N/V by Kamala Harris
V/V by Kamala Harris
commentary

Pence repeatedly used "unleashing american", "american people", "back regulation" and "people deserve".

Meanwhile, Harris also repeated "american people", as well as "president said" and "people know".

Downloads

Debate transcript

Parsed word lists and word clouds (word lists, part of speech lists, noun phrases, sentences) (word clouds)

Raw data structure

Please see the methods section for details about these files.