Word Analysis of 2020 U.S. Presidential Debates
Donald Trump vs. Joe Biden (1st debate)
29 September 2020
Introduction
This was a very chaotic debate. Let's get into it.
Speaking Turns and Interruptions
Here, I look at the length of each turn of uninterrupted speech.
Table 1
length of sections in words
The number of uninterrupted deliveries (sections), mode/median/mean length of sections in words, and the shortest section length in words that composed 10%, 50% and 90% of the debate.
Trump spoke more often than Biden, by +25.7% (313 vs 249), and his deliveries were slightly shorter, both by mode, –20.0% (4 vs 5), and by median, –11.3% (24.3 vs 27.4), section length. The contiguity of the debate was low: 50% of Trump's entire delivery was in sections shorter than 83 words. For Biden, 50% of his delivery was in sections shorter than 92 words.
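As a rough illustration of how the figures in Table 1 could be derived, here is a minimal sketch. It assumes each section is represented only by its length in words, and it reads the "10%, 50% and 90%" columns as a word-weighted cumulative distribution over sections sorted from shortest to longest. The function names and the toy data are hypothetical and are not taken from the original analysis.

```python
from statistics import mean, median, mode

def section_summary(lengths):
    """Summarise uninterrupted sections given their lengths in words."""
    return {
        "sections": len(lengths),
        "mode": mode(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
    }

def weighted_cumulative_length(lengths, fraction):
    """Smallest section length L such that sections no longer than L
    account for at least `fraction` of all words spoken.
    Assumption: sections are accumulated from shortest to longest."""
    if not lengths:
        raise ValueError("need at least one section")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths):
        running += length
        if running >= fraction * total:
            return length
    return max(lengths)  # guard against floating-point rounding

# Toy data (hypothetical, not from the debate transcript).
toy = [2, 4, 4, 10, 30, 50]
print(section_summary(toy))
print(weighted_cumulative_length(toy, 0.5))
```

With the toy data, half of the 100 words are reached only once the 30-word section is included, so the 50% value is 30 even though the median section length is 7. A low 50% value therefore indicates that much of a speaker's delivery came in short bursts.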
Flesch-Kincaid Reading Ease and Grade Level
The Flesch-Kincaid reading ease and grade level metrics are designed to indicate how difficult a passage in English is to understand.
This metric does not take repetition into account. A grade level 10 sentence that is repeated 100 times still generates the same metrics because the words per sentence and syllables per word remain constant. To measure how many times a speaker repeats themselves, I use my Windbag Index, below.
Reading ease ranges from 100 (easiest) down to 0 (hardest) and can be interpreted as follows:
100–90 | Very easy to read. Easily understood by an average 11-year-old student.
90–80 | Easy to read. Conversational English for consumers.
80–70 | Fairly easy to read.
70–60 | Plain English. Easily understood by 13- to 15-year-old students.
60–50 | Fairly difficult to read.
50–30 | Difficult to read.
30–10 | Very difficult to read. Best understood by college/university graduates.
10–0 | Extremely difficult to read. Best understood by college/university graduates.
The grade level corresponds roughly to a U.S. grade level. It has a minimum value of –3.4 and no upper bound.
Two sets of readability scores are calculated: one for the entire debate and one that only considers sections with at least 9 words.
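For reference, both metrics can be computed from three counts: words, sentences and syllables. The sketch below uses the standard published Flesch reading ease and Flesch-Kincaid grade level formulas. How the original analysis counted syllables and split sentences is not stated, so the counts are taken as inputs here and the function names are illustrative.

```python
def reading_ease(words, sentences, syllables):
    """Flesch reading ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def grade_level(words, sentences, syllables):
    """Flesch-Kincaid grade level: roughly a U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# The lowest possible grade level: one-word sentences of one syllable.
print(round(grade_level(words=1, sentences=1, syllables=1), 2))

# Repeating a passage 100 times scales every count equally,
# so the ratios, and therefore the scores, do not change.
once = grade_level(words=20, sentences=2, syllables=30)
repeated = grade_level(words=2000, sentences=200, syllables=3000)
print(once == repeated)
```

The first call reproduces the –3.4 minimum mentioned above, and the second shows why repetition leaves these metrics unchanged.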
Table 2a
readability — entire debate
Flesch-Kincaid reading ease and grade level.
Table 2b
readability — excluding short sections
Flesch-Kincaid reading ease and grade level for sections with at least 9 words.
Trump's grade level was –15.7% (3.6 vs 4.27) lower than Biden's if we consider the entire debate. For sections of at least 9 words (an attempt to exclude short interruptions), his grade level was –18.1% (4.02 vs 4.91) lower. The spread in reading ease was smaller: both speakers scored at least 80.
Sentence Size
Table 3
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
Biden had marginally longer sentences, on average, but in as messy a debate as this was, this metric is very sensitive to how interruptions were transcribed.
Word Statistics
Debate Word Count
Summary Word Count
The summary word count reports the total number of words and the
number of unique, non-stop words
used by each candidate. Word counts are expressed as both absolute and
relative values.
Table 4a
all words
Number of all words and unique words used by each speaker.
Table 4b
exclusive and shared words
Words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Trump delivered +12.1% (7,708 vs 6,873) more words but had –4.2% (1,131 vs 1,181) fewer unique words. Trump spoke 1,004 words that Biden did not use but in this set there were –8.2% (562 vs 612) fewer unique words than in Biden's set of 930 words that Trump did not use.
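The total, unique, exclusive and shared counts above amount to simple multiset and set operations. The following is a minimal sketch under the assumption that each speaker's delivery is already a list of normalized (e.g. lower-cased) words. The function name, dictionary keys and toy sentences are hypothetical.

```python
from collections import Counter

def word_summary(words_a, words_b):
    """Total, unique, exclusive and shared word counts for two speakers."""
    ca, cb = Counter(words_a), Counter(words_b)
    only_a = set(ca) - set(cb)   # words A used that B never used
    only_b = set(cb) - set(ca)   # words B used that A never used
    shared = set(ca) & set(cb)   # words both speakers used
    return {
        "total": (sum(ca.values()), sum(cb.values())),
        "unique": (len(ca), len(cb)),
        "exclusive_total": (sum(ca[w] for w in only_a),
                            sum(cb[w] for w in only_b)),
        "exclusive_unique": (len(only_a), len(only_b)),
        "shared_unique": len(shared),
    }

# Toy input (hypothetical, not from the transcript).
a = "we will win we will".split()
b = "we can build it together".split()
print(word_summary(a, b))
```

Note the distinction between "exclusive_total" (every occurrence of an exclusive word) and "exclusive_unique" (distinct exclusive words), which corresponds to the 1,004 vs 562 figures quoted for Trump.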
Stop Word Contribution
In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
Table 5a
non-stop words
Counts of stop and non-stop words.
Table 5b
exclusive and shared non-stop words
Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Both candidates used roughly the same fraction of stop words, 57.0% for Trump and 58.3% for Biden.
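Partitioning a delivery into stop and non-stop words only needs a membership test against a stop list. The sketch below uses a tiny, made-up stop list for illustration; the actual list used in this analysis is the one linked above and is much longer.

```python
# Hypothetical, heavily abbreviated stop list (not the one used here).
STOP_WORDS = {"the", "a", "and", "i", "you", "it", "to", "of", "we", "that"}

def stop_word_fraction(words, stop_words=STOP_WORDS):
    """Fraction of words that appear in the stop list (0.0 for no words)."""
    if not words:
        return 0.0
    n_stop = sum(1 for w in words if w.lower() in stop_words)
    return n_stop / len(words)

# Toy input (hypothetical).
sample = "I want to thank the people".split()
print(stop_word_fraction(sample))
```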
Word frequency
The word frequency table summarizes the frequency with which words
were used. I show the average word frequency and the weighted
cumulative frequencies at the 50th and 90th percentiles. The average word
frequency indicates how many times, on average, a word is used. For a
given fraction of the entire delivery, the weighted cumulative
frequency indicates the largest word frequency within this fraction
(details about weighted
cumulative distribution).
Table 6a
word use frequency
Average and 50%/90% percentile word frequencies.
Table 6b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Trump used the same words more often. His average word frequency was +17.2% (6.8 vs 5.8) higher than Biden's. Considering only words exclusive to each candidate, Trump repeated his +17.9% (1.78 vs 1.51) more often than Biden repeated the words that Trump didn't use.
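To make the two frequency measures concrete, here is a minimal sketch. The average frequency is total words divided by unique words, which appears consistent with the numbers reported above (7,708 / 1,131 ≈ 6.8 for Trump). For the weighted cumulative frequency, the sketch assumes words are accumulated from least to most frequent; that ordering, like the function names, is an assumption rather than the documented method.

```python
from collections import Counter

def average_frequency(words):
    """Average number of times each distinct word is used."""
    return len(words) / len(Counter(words))

def weighted_cumulative_frequency(words, fraction):
    """Largest word frequency within the first `fraction` of the delivery,
    accumulating words from least to most frequent (assumed ordering)."""
    counts = Counter(words)
    total = len(words)
    running = 0
    for freq in sorted(counts.values()):
        running += freq
        if running >= fraction * total:
            return freq
    return max(counts.values())  # guard against floating-point rounding

# Toy input (hypothetical): "a" six times, "b" three times, "c" once.
toy_words = ["a"] * 6 + ["b"] * 3 + ["c"]
print(average_frequency(toy_words))
print(weighted_cumulative_frequency(toy_words, 0.5))
```

In the toy input, half of the delivery is only reached once the word used six times is included, so the 50% value is 6 even though the average frequency is about 3.3.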
All further word use statistics represent content that has been filtered for stop words, unless explicitly indicated.
Part of Speech Analysis
In this section, word frequency is broken down by part of
speech (POS). The four POS groups examined are nouns, verbs,
adjectives and adverbs. Conjunctions and prepositions are not
considered. The first category (n+v+adj+adv) is composed of all four
POS groups.
Part of Speech Count
Table 7
part of speech count
Count of words categorized by part of speech (POS).
Fewer of Trump's words were nouns than Biden's, by –7.2% (44.8 vs 48.3), but proportionately he used more adverbs, by +30.9% (7.2 vs 5.5).
Part of Speech Frequency
Table 8
part of speech frequency
Frequency of words categorized by part of speech (POS).
We've already seen that Trump repeated his words more often. Here, we see that the largest increase in repetition over Biden was in his verbs, which he repeated +38.2% (4.05 vs 2.93) more times, on average. He also repeated his pronouns +26.7% (31.13 vs 24.57) more often.
Part of Speech Pairing
Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
Table 9a
part of speech pairing — Donald Trump
Word pairs (total and unique) categorized by part of speech (POS)
Table 9b
part of speech pairing — Joe Biden
Word pairs (total and unique) categorized by part of speech (POS)
Table 9c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
Trump had more pairs of adjectives and nouns, by +17.1% (178 vs 152), and more pairs of adverbs and verbs, by +22.6% (38 vs 31), than Biden. The word pairings unique to a candidate are quite interesting to explore. For example, Trump used the noun-noun pair "word smart" (I'm sure you remember it) whereas Biden had noun-noun pairs like "kitchen table" and "COVID crisis".
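The exact pairing procedure is described in the methods section rather than here, so the following is only a sketch of one plausible approach. It assumes the input is a sentence already tagged with Penn Treebank tags and that a pair is formed from two adjacent words that both belong to one of the four POS groups. The adjacency rule, the function names and the toy sentence are all assumptions.

```python
from collections import Counter

def coarse(tag):
    """Map a Penn Treebank tag to a coarse group, or None if not of interest."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("VB"):
        return "V"
    if tag.startswith("JJ"):
        return "JJ"
    if tag.startswith("RB"):
        return "RB"
    return None

def pos_pairs(tagged_sentence):
    """Count adjacent word pairs whose members are both N, V, JJ or RB."""
    pairs = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_sentence, tagged_sentence[1:]):
        g1, g2 = coarse(t1), coarse(t2)
        if g1 and g2:
            pairs[(f"{g1}/{g2}", w1.lower(), w2.lower())] += 1
    return pairs

# Toy, hand-tagged sentence (hypothetical).
tagged = [("the", "DT"), ("kitchen", "NN"), ("table", "NN"), ("matters", "VBZ")]
print(pos_pairs(tagged))
```

Under this adjacency assumption, longer sentences yield more pairs, which fits the remark that the number of unique pairs depends on sentence length.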
You can really get into the weeds here. Parts of speech are counted more granularly in these tables — nouns and verbs are split into classes and many other word types are shown, such as conjunctions and prepositions.
Table 10a
detailed POS tags — nouns and verbs
Count by part of speech tag:
NN (noun, singular),
NNP (proper noun, singular),
NNPS (proper noun, plural),
NNS (noun plural),
VB (verb, base form),
VBD (verb, past tense),
VBG (verb, gerund/present participle),
VBN (verb, past participle),
VBP (verb, sing. present, non-3rd person),
VBZ (verb, 3rd person sing. present)
Table 10b
detailed POS tags — adjectives, pronouns, adverbs and wh-words
Count by part of speech tag:
JJ (adjective),
JJR (adjective, comparative),
JJS (adjective, superlative),
PRP (personal pronoun),
PRP$ (possessive pronoun),
RB (adverb),
RBR (adverb, comparative),
RBS (adverb, superlative),
WDT (wh-determiner),
WP (wh-pronoun),
WP$ (possessive wh-pronoun),
WRB (wh-adverb)
Table 10c
detailed POS tags — prepositions, conjunctions, determiners and others
Count by part of speech tag:
CC (coordinating conjunction),
CD (cardinal digit),
DT (determiner),
EX (existential there),
FW (foreign word),
IN (preposition/subordinating conjunction),
MD (modal),
PDT (predeterminer),
POS (possessive ending),
RP (particle),
TO (to),
UH (interjection)
Trump used +44.7% (330 vs 228) more proper nouns and +52.0% (1,099 vs 723) more prepositions than Biden. His use of singular present non-3rd person verbs (VBP) was +66.6% (488 vs 293) higher but his use of 3rd person singular present verbs (VBZ) was –35.8% (230 vs 358) lower. Trump also spoke in the past tense +69.2% (340 vs 201) more than Biden.
Exclusive and Shared Usage
This section enumerates words that were exclusive to a
candidate (e.g. used by one candidate but not the other). This content
provides insight into what the candidates' priorities are and
reveals differences in perspective on similar topics.
For a given part of speech, the table breaks down the number of
words that were spoken by only one of the candidates or both
candidates (intersection). The last row includes words spoken by
either candidate (union).
Table 11
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
Biden used 24 adverbs that Trump didn't use, whereas Trump used 48 adverbs that Biden didn't use, +100.0% (48 vs 24) more. These lists are fun to look at. Trump used adverbs like "certainly", "definitely", "powerfully" and "phenomenally", whereas Biden used "constantly", "fairly", "honorably" and "socially".
Pronoun Usage
This section explores pronoun use in detail. Refer to the methods section for details.
Pronoun Count
Fraction of all words that were pronouns.
Table 12a
pronoun fraction
Fraction of words that were pronouns.
Table 12b
exclusive and shared pronouns
Pronouns exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Pronoun by Person, Gender and Count
Pronoun usage by person (1st, 2nd, 3rd), gender (masculine, feminine, neuter) and count (singular, plural).
Table 13a
Pronoun by person
Count of pronouns by first, second or third person.
Table 13b
Pronoun by gender
Count of pronouns by masculine, feminine or neuter gender.
Table 13c
Pronoun by number
Count of pronouns by singular or plural.
It gets messy but interesting here. Looking at pronouns by person, Trump used +69.4% (398 vs 235) more 1st person pronouns and +143.0% (311 vs 128) more 2nd person pronouns than Biden. In other words, he referred to himself and to Biden much more often than Biden referred to himself and to Trump. Looking at pronouns by gender, Trump used –54.1% (100 vs 218) fewer masculine pronouns and +48.1% (191 vs 129) more neuter pronouns than Biden.
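Counting pronouns by person, as in Table 13a, comes down to a lookup table. The sketch below uses a lookup I put together for illustration; the pronoun lists actually used are given in the methods section and may differ. Gender and number can be tallied the same way with their own lookup tables.

```python
from collections import Counter

# Hypothetical lookup: pronoun -> grammatical person.
PERSON = {
    "i": 1, "me": 1, "my": 1, "mine": 1, "myself": 1,
    "we": 1, "us": 1, "our": 1, "ours": 1, "ourselves": 1,
    "you": 2, "your": 2, "yours": 2, "yourself": 2, "yourselves": 2,
    "he": 3, "him": 3, "his": 3, "himself": 3,
    "she": 3, "her": 3, "hers": 3, "herself": 3,
    "it": 3, "its": 3, "itself": 3,
    "they": 3, "them": 3, "their": 3, "theirs": 3, "themselves": 3,
}

def count_by_person(words):
    """Tally pronouns by person, ignoring words not in the lookup."""
    return Counter(PERSON[w.lower()] for w in words if w.lower() in PERSON)

# Toy input (hypothetical).
sentence = "I told you that they know we will".split()
print(count_by_person(sentence))
```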
First and third person pronouns — a closer look
These tables break pronouns down by interesting contrasts. For example, the ratio of singular to plural 1st person pronouns reveals the use of "I/my/myself" vs. "we/our/ours".
Table 14a
1st person pronouns, by count
Count of singular and plural first person pronouns. This table contrasts use of I/my/myself vs. we/our/ours.
Table 14b
3rd person pronouns, by count
Count of singular and plural third person pronouns. This table contrasts he/she/his/her/it vs. they/them/theirs.
Table 14c
Me and you — 1st person singular and second person pronouns
Count of 1st person singular and second person pronouns. This table contrasts me/my/myself vs you/yours/yourself.
Table 14d
I, me, myself and my — closer look at 1st person singular pronouns
Count of specific 1st person singular pronouns: I, me, myself and my.
We've already seen that Trump used +22.2% (1,681 vs 1,376) more pronouns than Biden. Breaking usage down by categories, Trump used +69.4% (398 vs 235) more 1st person pronouns and his delivery had Δrel=+11.2% (Δabs=+6.1%, 60.6% vs 54.5%) proportionately more 1st person singular pronouns (e.g. me, my, myself). Biden, on the other hand, had Δrel=+15.5% (Δabs=+6.1%, 45.5% vs 39.4%) proportionately more 1st person plural pronouns (e.g. we, us, our, ours).
When looking at 3rd person pronouns, Trump had Δrel=+78.9% (Δabs=+16.5%, 37.4% vs 20.9%) more use of the 3rd person plural (e.g. they, them) than Biden.
Considering only the 1st person singular (e.g. me, my, myself) and 2nd person pronouns (e.g. you, yours), Biden actually used the 1st person singular Δrel=+14.4% (Δabs=+6.3%, 50% vs 43.7%) proportionately more.
Focusing only on the subset of pronouns (I, me, myself and my), Trump used +88.3% (241 vs 128) more of these words than Biden. His use of "me" was proportionately higher by Δrel=+85.1% (Δabs=+8.0%, 17.4% vs 9.4%), but his use of "my" was lower.
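The comparisons in this report quote a relative difference (the signed percentages, and Δrel where proportions are compared) and, for proportions, also an absolute difference in percentage points (Δabs). A minimal sketch of both, with Biden's value as the baseline, is shown below. The function names are mine, but the results reproduce figures quoted in the text.

```python
def rel_diff(a, b):
    """Relative difference of a over baseline b, in percent."""
    return (a / b - 1) * 100

def abs_diff(a, b):
    """Absolute difference; percentage points when a and b are percentages."""
    return a - b

# Share of 1st person pronouns that were singular: Trump 60.6%, Biden 54.5%.
print(round(rel_diff(60.6, 54.5), 1), round(abs_diff(60.6, 54.5), 1))

# Number of sections: Trump 313, Biden 249.
print(round(rel_diff(313, 249), 1))
```

Keeping the two apart matters for small proportions: the 2.6 point gap in interrogative pronouns (7.5% vs 4.9%) is a relative difference of more than 50%.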
Pronouns by Category
This table tallies the use of pronouns by category. The categories are personal, demonstrative, indefinite, object, possessive, interrogative, others, relative and reflexive. Note that some pronouns that belong to multiple categories are counted in only one. For a list of pronouns for each category, see the pronoun methods section.
Table 15
Pronouns by category
Count of pronouns by category.
Most of the pronouns used by the speakers were of the personal class (he, i, it, she, they, we, you). Proportionately, Trump used Δrel=+24.1% (Δabs=+11.6%, 59.8% vs 48.2%) more pronouns from this class.
The next two common categories were demonstrative (that, these, this, those) and indefinite (all, another, any, anybody, anything, both, either, everybody, everything, few, many, most, no, nobody, none, nothing, one, other, others, some, somebody, something). Biden had Δrel=+60.4% (Δabs=+6.4%, 17% vs 10.6%) proportionately more demonstrative pronouns.
Biden also used Δrel=+53.1% (Δabs=+2.6%, 7.5% vs 4.9%) proportionately more interrogative pronouns (what, whatever, which, who).
Noun Phrase Usage
Noun phrases were extracted from the text and analyzed for
frequency, word count, unique word count and richness. Single-word phrases were not counted.
Top-level noun phrases are those without a parent noun phrase (a
parent phrase is one that is a similar, longer phrase). Derived noun
phrases are those with a parent (more
details about noun phrase analysis).
The top-level noun phrases can be interpreted as independent
concepts. Derived noun phrases can be interpreted as variants on
concepts embodied by the top-level phrases.
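The split into top-level and derived phrases can be sketched as follows. What counts as a "similar, longer phrase" is defined in the noun phrase methods, not here, so this sketch assumes that a parent is any longer phrase that contains the shorter phrase as a contiguous run of words. That rule, the function names and the toy phrases are hypothetical.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    n, m = len(longer), len(shorter)
    return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def split_phrases(phrases):
    """Split noun phrases into top-level (no parent) and derived (has a parent)."""
    tokenised = {tuple(p.lower().split()) for p in phrases}
    top, derived = [], []
    for p in tokenised:
        if any(contains(q, p) for q in tokenised):
            derived.append(" ".join(p))
        else:
            top.append(" ".join(p))
    return sorted(top), sorted(derived)

# Toy input (hypothetical).
phrases = ["health care", "affordable health care", "foreign policy"]
print(split_phrases(phrases))
```

Here "health care" is derived because "affordable health care" contains it, while the other two phrases have no longer relative and are top-level.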
Noun Phrase Count and length
This table reports the absolute number of noun phrases, which is
related to the number of nouns, and their length.
Table 16a
noun phrase count
Counts of noun phrases in words and per noun.
Table 16b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Trump delivered slightly more noun phrases, by +9.3% (330 vs 302).
Exclusive and Shared Noun Phrase Count and length
Table 17a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
Table 17b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Trump delivered slightly more noun phrases that were exclusive to him, by +10.5% (294 vs 266), such as "big pharma", "big problem" and "big stuff". Meanwhile, Biden used phrases like "better shape", "american people" and "foreign policy".
Windbag Index
The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.
Unlike the Flesch-Kincaid readability metrics, the Windbag Index does not take into account the length of sentences or complexity (e.g. number of syllables) of individual words.
Table 18
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
The Windbag Index is a fun metric. There isn't a lot of surprise here: Trump is +122.6% (1,200 vs 539) more of a windbag. This value becomes interesting when eventually compared across all three debates.
Word Clouds
In the word clouds below, the size of the word is proportional to
the number of times it was used by a candidate (method details).
Not all words from a group used to draw the cloud fit in the image
— less frequently used words for large word groups may fall
outside the image.
All Words for Each Candidate
Each candidate's debate portion was extracted and frequencies were
compiled for each part of speech (noun, verb, adjective, adverb), with
words colored by their part of speech category.
The distribution of sizes within a tag cloud follows the frequency
distribution of words. However, word size cannot be compared between
clouds, since the minimum and maximum size of the words is fixed.
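To see why word sizes cannot be compared between clouds, consider a sketch in which each cloud maps its own frequency range onto the same fixed range of font sizes. The linear mapping, the size limits and the function name are assumptions for illustration; the actual layout method is described in the method details.

```python
def font_sizes(freqs, min_size=10, max_size=60):
    """Linearly map word frequencies onto a fixed font size range."""
    lo, hi = min(freqs.values()), max(freqs.values())
    if hi == lo:
        return {w: max_size for w in freqs}
    scale = (max_size - min_size) / (hi - lo)
    return {w: min_size + (f - lo) * scale for w, f in freqs.items()}

# Two toy clouds (hypothetical counts) with very different frequencies.
cloud_a = font_sizes({"people": 50, "want": 30, "ahead": 10})
cloud_b = font_sizes({"true": 5, "deal": 1})
print(cloud_a)
print(cloud_b)
```

The most frequent word in each toy cloud is drawn at the same maximum size, even though one was used 50 times and the other only 5 times.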
Debate Word Cloud for Donald Trump - all words
Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb
Debate Word Cloud for Joe Biden - all words
Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb
Trump's main words were "good", "people", "want" and "know". Biden used "true" a lot.
Exclusive Words for Each Candidate
The clouds below show words used exclusively by a candidate. For
example, if candidate A used the word "invest" (any number of times),
but candidate B did not, then the word will appear in the exclusive word
tag cloud for candidate A.
Words exclusive to Donald Trump
Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb
Words exclusive to Joe Biden
Size proportional to word frequency. Color encodes part of speech: noun verb adjective adverb
Words that Trump used that Biden did not use include "closed", "individual", "joe" (as expected) and "phenomenal". Biden's set was centered on "america", "american", "together", "create" and "discredited".
Pronouns for Each Candidate
Word clouds based on only pronouns.
Pronouns for Donald Trump
Size proportional to word frequency. Color encodes pronoun type: masculine feminine neuter 1st person 2nd person singular plural other
Pronouns for Joe Biden
Size proportional to word frequency. Color encodes pronoun type: masculine feminine neuter 1st person 2nd person singular plural other
The pronoun clouds for both candidates look very similar. Biden's use of "we" was more frequent.
Part of Speech Word Clouds
In these clouds, words from each major part of speech were colored
based on whether they were exclusive to a candidate or shared by the
candidates.
The size of the word is relative to the frequency for the candidate
— word sizes between candidates should not be used to indicate
difference in absolute frequency.
Cloud of noun words, by speaker
Words unique to each candidate (Trump, Biden) and those spoken by both.
The unique noun cloud (nouns spoken by only one candidate but not the other) is nearly entirely blue (Biden's color). Essentially, Biden repeated nouns exclusive to him far more than Trump repeated his set. It's surprising that Trump never said "America", "affordable", "violence", "recession" and "outcome".
Cloud of verb words, by speaker
Words unique to each candidate (Trump, Biden) and those spoken by both.
The unique verb cloud is more mixed in color, indicating that the candidates used their unique verbs in a more balanced way. Trump's main unique verb was "closed" whereas Biden's were "create" and "discredited".
Cloud of adjective words, by speaker
Words unique to each candidate (Trump, Biden) and those spoken by both.
Biden said "American". Trump did not.
Cloud of adverb words, by speaker
Words unique to each candidate (Trump, Biden) and those spoken by both.
Biden said "together" whereas Trump used "ahead".
Cloud of all words, by speaker
Words unique to each candidate (Trump, Biden) and those spoken by both.
The center of this cloud is nearly all blue except for "Joe", used by Trump. Around the center there is a ring of red words that were used only by Trump, such as "predators", "couple", "drugs", "water" and "supporters".
Word Pair Clouds for Each Candidate
Pairs used only once during the debate are not shown.
word pairs for Donald Trump
JJ/JJ by Donald Trump
JJ/RB by Donald Trump
JJ/N by Donald Trump
JJ/V by Donald Trump
RB/RB by Donald Trump
RB/N by Donald Trump
RB/V by Donald Trump
N/N by Donald Trump
N/V by Donald Trump
V/V by Donald Trump
word pairs for Joe Biden
JJ/JJ by Joe Biden
JJ/RB by Joe Biden
JJ/N by Joe Biden
JJ/V by Joe Biden
RB/RB by Joe Biden
RB/N by Joe Biden
RB/V by Joe Biden
N/N by Joe Biden
N/V by Joe Biden
V/V by Joe Biden
There are naturally more of certain pairings (noun-noun) than others (verb-verb). Here you can see where the tagger made a mistake, for example classifying "left" as a verb in Trump's "radical left", which shows up among his adjective-verb pairs.
Downloads
Debate transcript
Parsed word lists and word clouds (word lists, part of speech lists, noun phrases, sentences) (word clouds)
Raw data structure
Please see the methods section for details about these files.