▲ Things are getting dumber. The Flesch-Kincaid grade level for each of the Presidential debates in 1960, 2008, 2012, 2016 and 2020.

Word Analysis of 2020 U.S. Presidential Debates

Trump vs. Biden / Harris vs. Pence

Introduction

Other Political Debate Analyses

He counts your words (even those pronouns), an NYT article about Pennebaker's approach to analyzing debates and Al Qaeda communications

Lexical Analysis of Obama's and McCain's Speeches by Jacques Savoy

Presidential word use in State of the Union addresses by Jonathan Corum.

Naming Names, an NYT article about candidates' references to each other during debates (uses Circos).

Randomly Generated Trump Transcripts

If you want more, get more. The debate continues endlessly with Tripsum: Trump Lorem Ipsum, randomly generated transcripts based on the text of the 2016 Clinton vs. Trump debates.

On these pages, I explore word usage in the 2020 U.S. Presidential debates between Donald Trump and Joe Biden and in the Vice-Presidential debate between Mike Pence and Kamala Harris.

Impatient? Skip to the full word analysis of the 1st U.S. Presidential Debate between Donald Trump and Joe Biden.

Formal debates present a unique opportunity to compare the speech patterns of candidates. The format of a debate is controlled (though the debates have thus far been unruly): in principle, each speaker is asked the same questions and given the same amount of time to respond.

That being said, one candidate can greatly affect the dynamics of a debate by hijacking the conversation and using interruptions to disrupt their opponent's natural style. Thus, the results of this analysis cannot be taken out of the context of the debate.

It's important to stress that this analysis is structural and not semantic: I look in detail at how things are said rather than at what is said. However, there is a strong connection between the use of specific words (e.g. pronouns) and the speaker's inner dialogue (Your Use of Pronouns Reveals Your Personality).

I use transcripts from rev.com and explore themes such as grade level, readability, sentence size, part-of-speech usage, pronoun usage, unique and shared words and use of concepts. And I cannot help but draw some word clouds.

The analysis is fully automated and uses the Natural Language Toolkit (NLTK) for tokenizing, tagging and chunking. All data and word lists (tagged and chunked) are available for download in plain-text format; you are welcome to use these files in any manner.
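NLTK can do the chunking step with its regular-expression chunker (`nltk.RegexpParser`). To illustrate the idea without the NLTK dependency, here is a minimal sketch in plain Python that finds noun phrases in an already-tagged sentence. The chunk grammar (optional determiner, any adjectives, one or more nouns) and the function name `noun_phrases` are assumptions for illustration, not necessarily what was used for these pages.

```python
import re
from typing import List, Tuple

# Map Penn Treebank tags to one-letter classes so that a chunk grammar can be
# written as an ordinary regular expression over the tag sequence.
TAG_CLASS = {"DT": "D",                                      # determiner
             "JJ": "A", "JJR": "A", "JJS": "A",              # adjectives
             "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N"}  # nouns

# Assumed noun phrase grammar: optional determiner, any adjectives, 1+ nouns.
NP_PATTERN = re.compile(r"D?A*N+")


def noun_phrases(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Return the word sequences in `tagged` that match NP_PATTERN."""
    # One character per token; tags outside the grammar become "x".
    codes = "".join(TAG_CLASS.get(tag, "x") for _, tag in tagged)
    return [[word for word, _ in tagged[m.start():m.end()]]
            for m in NP_PATTERN.finditer(codes)]


tagged = [("the", "DT"), ("new", "JJ"), ("health", "NN"), ("care", "NN"),
          ("plan", "NN"), ("helps", "VBZ"), ("working", "VBG"),
          ("families", "NNS")]
print(noun_phrases(tagged))
# [['the', 'new', 'health', 'care', 'plan'], ['families']]
```

The number of words in each phrase is one simple, assumed way to gauge noun phrase complexity.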

past years

Results from past years are available: 2008 debate analysis, 2012 debate analysis and 2016 debate analysis. Each year's analysis is a collection of stand-alone pages. For a given year, the results for each of the three Presidential debates and the Vice-Presidential debate are structured identically.

The analysis of the 2020 debates uses a different part-of-speech tagger (NLTK) from the one used in previous years (the Brill tagger). Keep this in mind if you're comparing the 2020 results with those of past years.

Methods

Transcripts by the Washington Post for each debate were parsed to extract each speaker's sections, split the text into sentences and words (tokenizing), tag each word with its part of speech (tagging), and identify noun phrases (chunking).
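The first steps, splitting a transcript into per-speaker sections and then into sentences and words, can be sketched with the standard library alone. This assumes a simple, hypothetical transcript format with one turn per line written as `Speaker Name: text`; the real transcripts may be laid out differently, and this regular-expression splitter is much cruder than NLTK's tokenizers. The function names are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical transcript format: one turn per line, "Speaker Name: text".
TURN = re.compile(r"^([A-Z][\w.]*(?: [A-Z][\w.]*)*):\s*(.*)$")
# Naive sentence boundary: whitespace preceded by ".", "!" or "?".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Words are letter runs, optionally with one apostrophe part (it's, don't).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def split_by_speaker(transcript: str) -> Dict[str, List[str]]:
    """Collect each speaker's sections (turns) in order of appearance."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for line in transcript.splitlines():
        match = TURN.match(line.strip())
        if match:
            sections[match.group(1)].append(match.group(2))
    return dict(sections)


def sentences(section: str) -> List[str]:
    """Split a section into sentences on terminal punctuation."""
    return [s for s in SENTENCE_END.split(section.strip()) if s]


def words(sentence: str) -> List[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return [w.lower() for w in WORD.findall(sentence)]


transcript = """Moderator: Good evening. Welcome.
Speaker A: Thank you.
Speaker B: Thank you. It's great.
Speaker A: Here's the deal!"""
sections = split_by_speaker(transcript)
print(sections["Speaker A"])  # ['Thank you.', "Here's the deal!"]
print(sentences(sections["Speaker B"][0]))  # ['Thank you.', "It's great."]
```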

The tagged and chunked transcripts are analyzed to determine:

• reading ease and grade level using the Flesch-Kincaid metric
• word frequency distribution for each candidate
• sentence size and proportion of unique words
• words exclusive to a candidate and those shared by both candidates
• pronoun use by person, gender and count
• frequency of concepts, as defined by part of speech pairings (e.g. noun/verb)
• complexity of noun phrases
• word clouds for a variety of word lists extracted from the transcripts (e.g. all nouns unique to Trump)
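The exact procedure for extracting concepts is not spelled out here, so the following is only a minimal sketch under one plausible reading: a concept is a pair of adjacent words whose parts of speech match a requested pairing (e.g. adjective followed by noun). The adjacency rule, the coarse tag groups and the function names are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def pos_class(tag: str) -> str:
    """Collapse a Penn Treebank tag to 'n', 'v', 'adj', 'adv' or '' (other)."""
    for prefix, name in (("NN", "n"), ("VB", "v"), ("JJ", "adj"), ("RB", "adv")):
        if tag.startswith(prefix):
            return name
    return ""


def concepts(tagged: List[Tuple[str, str]], first: str, second: str) -> Counter:
    """Count adjacent word pairs whose classes are (first, second)."""
    pairs: Counter = Counter()
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if pos_class(t1) == first and pos_class(t2) == second:
            pairs[(w1.lower(), w2.lower())] += 1
    return pairs


tagged = [("strong", "JJ"), ("economy", "NN"), ("and", "CC"),
          ("strong", "JJ"), ("economy", "NN"), ("grows", "VBZ")]
print(concepts(tagged, "adj", "n"))  # Counter({('strong', 'economy'): 2})
```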

I attempt to quantify the overall complexity and repetition of speech with a metric I call the Windbag Index, which is computed from the product of 8 terms, each measuring uniqueness in a different aspect of speech (more about the Windbag Index).

A full description of each of the steps in the analysis is available in the detailed methods section.

The analysis has some limitations.

Results and Commentary

Each debate analysis report contains a lot of data, but every report uses exactly the same format, which should make comparisons between debates easier. To start, you may find these elements the most interesting:

• readability and grade level
• pronoun usage
• accounting and word clouds of words used exclusively by a candidate

Results are shown in a tabular format. From each table you can download the word list used to generate it. This makes it easy to, for example, grab all the adjectives used by Biden or all the verbs that Trump used that Biden did not use.
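Grabbing the verbs that one candidate used and the other did not is a set difference. Here is a minimal sketch, assuming the word lists have already been filtered to one part of speech and that words are compared case-insensitively; the function name `ownership` and the toy word lists are hypothetical.

```python
from typing import Dict, Iterable, Set


def ownership(words_a: Iterable[str], words_b: Iterable[str]) -> Dict[str, Set[str]]:
    """Split the vocabulary of two speakers into exclusive and shared words."""
    a = {w.lower() for w in words_a}
    b = {w.lower() for w in words_b}
    return {"only_a": a - b,   # used by speaker A but never by B
            "only_b": b - a,   # used by speaker B but never by A
            "shared": a & b}   # used by both speakers


verbs_a = ["build", "win", "Win", "say"]
verbs_b = ["build", "vote", "say"]
groups = ownership(verbs_a, verbs_b)
print(sorted(groups["only_a"]), sorted(groups["only_b"]), sorted(groups["shared"]))
# ['win'] ['vote'] ['build', 'say']
```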

detailed results – tables, word clouds and commentary

Analysis of Donald Trump vs. Joe Biden (1st debate)

Analysis of Donald Trump vs. Joe Biden (town halls)

Analysis of Donald Trump vs. Joe Biden (3rd debate)

Analysis of Donald Trump vs. Joe Biden (combined debates)

Analysis of Mike Pence vs. Kamala Harris

Visualizing the Debates

Each debate is visualized using tables and word clouds, though obviously a lot more could be done. The word clouds show the words and their frequencies, and the tables provide detailed statistics. You can download each word list directly from the tables.

tables & basic word clouds

▲ Word usage tables describe the structural characteristics of speech by frequency of words, sentence size, proportion of unique and exclusive words and breakdown of words by part-of-speech • see example

▲ Word clouds for each candidate, categorized by parts of speech.

▲ Word clouds, categorized by ownership.

▲ Word clouds for concepts based on part-of-speech pairs.

Candidates' Word Usage Profiles

Below are a few of the tables available in the full results section.

Readability and Grade Level

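The Flesch–Kincaid readability tests are designed to indicate how difficult a passage in English is to understand. There are two tests: the Flesch Reading Ease, where higher scores indicate easier text, and the Flesch–Kincaid Grade Level, which approximates the U.S. school grade needed to understand the text.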
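Both scores are simple functions of the average sentence length (words per sentence) and the average word length (syllables per word). With the standard Flesch coefficients, the word, sentence and syllable counts in Table 2a reproduce the reported grade level and reading ease. Syllable counting itself is not shown here; the counts are taken as given.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# (words, sentences, syllables) per speaker, copied from Table 2a.
counts = {"Donald Trump": (7612, 815, 10030), "Joe Biden": (6813, 664, 9155)}
for speaker, (w, s, syl) in counts.items():
    print(f"{speaker}: grade {flesch_kincaid_grade(w, s, syl):.2f}, "
          f"ease {flesch_reading_ease(w, s, syl):.2f}")
# Donald Trump: grade 3.60, ease 85.88
# Joe Biden: grade 4.27, ease 82.74
```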

Table 2a

readability

Flesch-Kincaid reading ease and grade level.

| speaker | grade level | reading ease | sections | sentences | words | syllables |
|---|---|---|---|---|---|---|
| Donald Trump | 3.60 (100.0%) | 85.88 (100.0%) | 313 (100.0%) | 815 (100.0%) | 7,612 (100.0%) | 10,030 (100.0%) |
| Joe Biden | 4.27 (118.6%) | 82.74 (96.3%) | 249 (79.6%) | 664 (81.5%) | 6,813 (89.5%) | 9,155 (91.3%) |


▲ The Flesch-Kincaid grade level is actually a periodic quantity.

Sentence Size

Sentence size with and without stop words.
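A minimal sketch of per-sentence word counts for all, stop and non-stop words. The stop word list is a tiny hypothetical one, and the 50% and 90% values are computed here as nearest-rank percentiles of sentence size; that definition is an assumption, and the cumulative values in Table 3 may be defined differently, so this sketch is not expected to reproduce the table.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "i", "we", "you", "that"}


def nearest_rank(values: Sequence[int], pct: float) -> int:
    """Smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def sentence_size_stats(sentences: List[List[str]]) -> Dict[str, Tuple[float, int, int]]:
    """(average, 50%, 90%) sentence size for all, stop and non-stop words."""
    sizes: Dict[str, List[int]] = {"all": [], "stop": [], "non-stop": []}
    for words in sentences:
        n_stop = sum(1 for w in words if w.lower() in STOP_WORDS)
        sizes["all"].append(len(words))
        sizes["stop"].append(n_stop)
        sizes["non-stop"].append(len(words) - n_stop)
    return {kind: (mean(v), nearest_rank(v, 50), nearest_rank(v, 90))
            for kind, v in sizes.items()}


tokens = [["we", "will", "win"], ["the", "economy", "is", "strong"], ["thank", "you"]]
print(sentence_size_stats(tokens)["all"])  # (3, 3, 4)
```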

Table 3

sentence size

Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.

| speaker | sentences | all words: avg | all words: 50% | all words: 90% | stop words: avg | stop words: 50% | stop words: 90% | non-stop words: avg | non-stop words: 50% | non-stop words: 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Donald Trump | 815 | 9.5 | 12 | 29 | 5.4 | 7 | 18 | 4.1 | 5 | 12 |
| Joe Biden | 664 | 10.4 | 15 | 34 | 6.0 | 9 | 20 | 4.3 | 6 | 16 |
| total | 1,479 | 11.9 | 14 | 33 | 7.7 | 9 | 20 | 6.2 | 7 | 14 |


Part of Speech

Total and unique nouns, verbs, adjectives and adverbs. The parts of speech are identified by their Penn Treebank tags.
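Grouping by part of speech only needs the first two characters of each Penn Treebank tag (NN, NNS, NNP and NNPS are all nouns, and so on). A minimal sketch, assuming (word, tag) pairs as input and case-insensitive uniqueness; whether the tables treat proper nouns or wh-adverbs (WRB) this way is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Penn Treebank tag prefixes for the four groups in Table 7:
# NN* nouns, VB* verbs, JJ* adjectives, RB* adverbs.
GROUPS = {"NN": "nouns", "VB": "verbs", "JJ": "adjectives", "RB": "adverbs"}


def pos_counts(tagged: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {group: (total words, unique lowercased words)}."""
    words: Dict[str, List[str]] = defaultdict(list)
    for word, tag in tagged:
        group = GROUPS.get(tag[:2])
        if group:  # other tags (DT, PRP, WRB, ...) are ignored
            words[group].append(word.lower())
    return {group: (len(ws), len(set(ws))) for group, ws in words.items()}


tagged = [("Jobs", "NNS"), ("are", "VBP"), ("coming", "VBG"), ("back", "RB"),
          ("jobs", "NNS"), ("are", "VBP"), ("great", "JJ")]
print(pos_counts(tagged))
# {'nouns': (2, 1), 'verbs': (3, 2), 'adverbs': (1, 1), 'adjectives': (1, 1)}
```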

Table 7

part of speech count

Count of words categorized by part of speech (POS).

| speaker | n+v+adj+adv | unique | nouns (n) | unique | verbs (v) | unique | adjectives (adj) | unique | adverbs (adv) | unique |
|---|---|---|---|---|---|---|---|---|---|---|
| Donald Trump | 3,042 (39.5%) | 934 (30.7%) | 1,363 (44.8%) | 512 (37.6%) | 1,115 (36.7%) | 275 (24.7%) | 346 (11.4%) | 144 (41.6%) | 218 (7.2%) | 66 (30.3%) |
| Joe Biden | 2,695 (39.2%) | 991 (36.8%) | 1,302 (48.3%) | 530 (40.7%) | 938 (34.8%) | 320 (34.1%) | 306 (11.4%) | 145 (47.4%) | 149 (5.5%) | 49 (32.9%) |
| total | 5,737 (39.3%) | 1,509 (26.3%) | 2,665 (46.5%) | 845 (31.7%) | 2,053 (35.8%) | 474 (23.1%) | 652 (11.4%) | 227 (34.8%) | 367 (6.4%) | 86 (23.4%) |


Pronoun usage

English has many pronouns. Here is an accounting of pronoun use by 1st (e.g. I, we, our), 2nd (e.g. you, yours) or 3rd (e.g. he, she, his, them) person.
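A minimal sketch of counting pronouns by person and number. The pronoun lists and function names are hypothetical (the exact lists behind Tables 13a and 14a-14d are not given), and second person pronouns are not split by number because "you" is ambiguous.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical pronoun lists keyed by (person, number).
PRONOUNS = {
    ("first", "singular"): {"i", "me", "my", "mine", "myself"},
    ("first", "plural"): {"we", "us", "our", "ours", "ourselves"},
    ("second", "any"): {"you", "your", "yours", "yourself", "yourselves"},
    ("third", "singular"): {"he", "him", "his", "himself", "she", "her",
                            "hers", "herself", "it", "its", "itself"},
    ("third", "plural"): {"they", "them", "their", "theirs", "themselves"},
}
# Reverse index: pronoun -> (person, number).
LOOKUP = {word: key for key, words in PRONOUNS.items() for word in words}


def pronoun_counts(words: Iterable[str]) -> Counter:
    """Count pronouns by (person, number), ignoring case."""
    return Counter(LOOKUP[w.lower()] for w in words if w.lower() in LOOKUP)


def by_person(counts: Counter) -> Dict[str, int]:
    """Collapse (person, number) counts to totals per person."""
    totals: Dict[str, int] = {}
    for (person, _number), n in counts.items():
        totals[person] = totals.get(person, 0) + n
    return totals


words = "I think we will win and you know they know it and I know it".split()
counts = pronoun_counts(words)
first_sing, first_plur = counts[("first", "singular")], counts[("first", "plural")]
print(by_person(counts), f"{first_sing / (first_sing + first_plur):.1%}")
# {'first': 3, 'second': 1, 'third': 3} 66.7%
```

The last line is the singular-vs-plural contrast used in Table 14a; with Trump's counts there, the same ratio is 241 / 398 = 60.6%.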

Table 13a

Pronoun by person

Count of pronouns by first, second or third person.

| speaker | all | unique | first | unique | second | unique | third | unique |
|---|---|---|---|---|---|---|---|---|
| Donald Trump | 1,188 (100.0%) | 19 (1.6%) | 398 (33.5%) | 7 (1.8%) | 311 (26.2%) | 2 (0.6%) | 479 (40.3%) | 10 (2.1%) |
| Joe Biden | 812 (100.0%) | 19 (2.3%) | 235 (28.9%) | 6 (2.6%) | 128 (15.8%) | 2 (1.6%) | 449 (55.3%) | 11 (2.4%) |


Pronoun contrasts

These tables break down pronoun use by a few interesting contrasts. For example, the ratio of singular to plural 1st person pronouns contrasts the use of "I/my/myself" with "we/our/ours".

Table 14a

1st person pronouns, by count

Count of singular and plural first person pronouns. This table contrasts use of I/my/myself vs. we/our/ours.

| speaker | first | unique | first singular | unique | first plural | unique |
|---|---|---|---|---|---|---|
| Donald Trump | 398 (100.0%) | 7 (1.8%) | 241 (60.6%) | 3 (1.2%) | 157 (39.4%) | 4 (2.5%) |
| Joe Biden | 235 (100.0%) | 6 (2.6%) | 128 (54.5%) | 3 (2.3%) | 107 (45.5%) | 3 (2.8%) |

Table 14b

3rd person pronouns, by count

Count of singular and plural third person pronouns. This table contrasts he/she/his/her/it vs. they/them/theirs.

| speaker | third | unique | third singular | unique | third plural | unique |
|---|---|---|---|---|---|---|
| Donald Trump | 479 (100.0%) | 10 (2.1%) | 300 (62.6%) | 7 (2.3%) | 179 (37.4%) | 3 (1.7%) |
| Joe Biden | 449 (100.0%) | 11 (2.4%) | 355 (79.1%) | 7 (2.0%) | 94 (20.9%) | 4 (4.3%) |

Table 14c

Me and you: 1st person singular and second person pronouns

Count of 1st person singular and second person pronouns. This table contrasts me/my/myself vs. you/yours/yourself.

| speaker | all | unique | 1st singular | unique | 2nd | unique |
|---|---|---|---|---|---|---|
| Donald Trump | 552 (100.0%) | 5 (0.9%) | 241 (43.7%) | 3 (1.2%) | 311 (56.3%) | 2 (0.6%) |
| Joe Biden | 256 (100.0%) | 5 (2.0%) | 128 (50.0%) | 3 (2.3%) | 128 (50.0%) | 2 (1.6%) |

Table 14d

I, me, myself and my: a closer look at 1st person singular pronouns

Count of specific 1st person singular pronouns: I, me, myself and my.

| speaker | all | I | me | myself | my |
|---|---|---|---|---|---|
| Donald Trump | 241 (100.0%) | 188 (78.0%) | 42 (17.4%) | 0 (0.0%) | 11 (4.6%) |
| Joe Biden | 128 (100.0%) | 98 (76.6%) | 12 (9.4%) | 0 (0.0%) | 18 (14.1%) |

Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index indicates succinct speech with a low degree of repetition and a large number of independent concepts. A high index suggests a stream of repeating words.
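The methods describe the index as built from a product of 8 uniqueness terms. The index values in Table 18 are consistent with taking the reciprocal of that product, so that more repetition (smaller terms) gives a larger index. The sketch below assumes that form; it is a reconstruction from the published numbers, not the original implementation.

```python
import math
from typing import Sequence


def windbag_index(terms: Sequence[float]) -> float:
    """Assumed form: reciprocal of the product of the uniqueness terms."""
    return 1.0 / math.prod(terms)


# The eight index terms per speaker, copied from Table 18.
trump = [0.430, 0.301, 0.376, 0.247, 0.416, 0.303, 0.561, 0.984]
biden = [0.417, 0.365, 0.407, 0.341, 0.474, 0.329, 0.576, 0.977]
print(round(windbag_index(trump)), round(windbag_index(biden)))  # 1196 539
```

Table 18 reports 1,200 and 539; the small gap for Trump is consistent with the published terms being rounded to three decimals.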

Table 18

windbag index

Windbag Index for each speaker. The higher the value, the more repetitive the speech.

| speaker | index value | term 1 | term 2 | term 3 | term 4 | term 5 | term 6 | term 7 | term 8 |
|---|---|---|---|---|---|---|---|---|---|
| Donald Trump | 1,200 (+122.6%) | 0.430 (+2.9%) | 0.301 (-17.4%) | 0.376 (-7.7%) | 0.247 (-27.7%) | 0.416 (-12.2%) | 0.303 (-7.9%) | 0.561 (-2.7%) | 0.984 (+0.7%) |
| Joe Biden | 539 (-55.1%) | 0.417 (-2.9%) | 0.365 (+21.1%) | 0.407 (+8.4%) | 0.341 (+38.3%) | 0.474 (+13.9%) | 0.329 (+8.6%) | 0.576 (+2.8%) | 0.977 (-0.7%) |

Word Clouds

Word clouds below are colored by part of speech (noun, verb, adjective, adverb).

▲ Words exclusive to Joe Biden (not spoken by Donald Trump) in the first debate, colored by part of speech.

▲ Words exclusive to Donald Trump (not spoken by Joe Biden) in the first debate, colored by part of speech.

Word clouds below are colored by speaker (Trump, Biden, both).

▲ All nouns in debates, colored by contributing speaker (Trump: red, Biden: blue, spoken by both: grey).

▲ All verbs in debates, colored by contributing speaker (Trump: red, Biden: blue, spoken by both: grey).
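Coloring a word by speaker uses the same exclusive/shared split, and word size follows frequency. A minimal sketch that produces a (color, font size) pair for each word from toy input; the linear size scaling, the 10 to 60 point range and the function name `cloud_entries` are assumptions, and the actual cloud layout would be left to a word cloud tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Colors as described in the captions: Trump red, Biden blue, both grey.
COLORS = {"trump": "red", "biden": "blue", "both": "grey"}


def cloud_entries(trump_words: Iterable[str], biden_words: Iterable[str],
                  min_size: float = 10, max_size: float = 60) -> Dict[str, Tuple[str, int]]:
    """Map each word to (color by speaker, font size scaled by frequency)."""
    t = Counter(w.lower() for w in trump_words)
    b = Counter(w.lower() for w in biden_words)
    total = t + b
    if not total:
        return {}
    top = max(total.values())
    entries = {}
    for word, n in total.items():
        if word in t and word in b:
            owner = "both"
        else:
            owner = "trump" if word in t else "biden"
        # Linear scaling (assumed): the most frequent word gets max_size.
        size = min_size + (max_size - min_size) * n / top
        entries[word] = (COLORS[owner], round(size))
    return entries


print(cloud_entries(["economy", "jobs", "jobs"], ["jobs", "vote"]))
# {'economy': ('red', 27), 'jobs': ('grey', 60), 'vote': ('blue', 27)}
```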

Discussion

Let's hope that after things get worse, they get better.

Downloads

The contents of the word list archives and the data structure syntax are described in the methods section.

Donald Trump vs. Joe Biden (1st debate) transcript word lists and tag clouds data structure

Donald Trump vs. Joe Biden (town halls) transcript word lists and tag clouds data structure

Donald Trump vs. Joe Biden (3rd debate) transcript word lists and tag clouds data structure

Donald Trump vs. Joe Biden (combined debates) transcript word lists and tag clouds data structure

Mike Pence vs. Kamala Harris transcript word lists and tag clouds data structure

available analyses

Word Analysis of 2020 U.S. Presidential Debates

Trump vs. Biden / Harris vs. Pence

Introduction

2020 Debate Analysis

2016 Debate Analysis

2012 Debate Analysis

2008 Debate Analysis

1960 Debate Analysis — the gold standard

Other Political Debate Analyses

Randomly Generated Trump Transcripts

past years

Methods

Results and Commentary

detailed results – tables, word clouds and commentary

Visualizing the Debates

tables & basic word clouds

Candidates' Word Usage Profiles

Readability and Grade Level

Sentence Size

Part of Speech

Pronoun usage

Pronoun contrasts

Windbag Index

Word Clouds

Discussion

Downloads