Word Analysis of 2016 U.S. Presidential Debates
Hillary Clinton vs Donald Trump (1st debate)
26 September 2016
Word Statistics
Debate Word Count
Summary Word Count
The summary word count reports the total number of words and the
number of unique, non-stop words
used by each candidate. Word counts are expressed as both absolute and
relative values.
Table 1a
all words
Number of all words and unique words used by each speaker.
Table 1b
exclusive and shared words
Words exclusive to a speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Trump dominated the debate in terms of time and total word count. 56% of the debate's words were delivered by Trump.
His unique word count, however, was lower than Clinton's both in absolute and relative terms. He delivered 1,176 unique words (14.8% of his total word count) to Clinton's 1,328 (21.3% of her total word count).
Clinton used more distinct words that Trump did not. She delivered a total of 1,048 words that her opponent did not say, 70.5% of these being unique. Trump said 1,130 words that Clinton did not use, but only 51.9% of these were unique. Using the number of unique exclusive words as a measure, Clinton distinguished herself better from her opponent.
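The counts above can be illustrated with a minimal sketch. It assumes a simple lowercase tokenizer and two short made-up utterances; the function names (tokenize, summarize, exclusive_words) are hypothetical and this is not necessarily how the tables were produced.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(words):
    """Total word count, unique word count and unique words as a percentage of total."""
    counts = Counter(words)
    total = sum(counts.values())
    unique = len(counts)
    return {"total": total, "unique": unique, "unique_pct": 100 * unique / total}

def exclusive_words(words_a, words_b):
    """Counts of words used by speaker A but never by speaker B."""
    seen_b = set(words_b)
    return Counter(w for w in words_a if w not in seen_b)

# Made-up utterances, for illustration only.
speaker_a = tokenize("We need new jobs and we need good jobs")
speaker_b = tokenize("We will bring back jobs")

summary_a = summarize(speaker_a)
exclusive_a = exclusive_words(speaker_a, speaker_b)
print(summary_a["total"], summary_a["unique"])        # 9 6
print(sum(exclusive_a.values()), len(exclusive_a))    # 5 4
```

Here speaker A says 5 words that speaker B never uses, 4 of which are unique ("need" is repeated). This is the same distinction as between the 1,048 exclusive words and the 70.5% of them that are unique.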
Stop Word Contribution
In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
Table 2a
non-stop words
Counts of stop and non-stop words.
Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to a speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Stop words are words like "and", "of" and "in" (full list). The fraction of stop words for both Clinton and Trump was similar—about 56-59%.
Clinton delivered more non-stop content. 43.5% of her words were non-stop, compared to Trump's 41.2%. Furthermore, unique words made up 44% of these, as opposed to Trump's 31.6%.
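A small sketch of the stop and non-stop partition follows. The stop list here is a tiny illustrative subset, not the full list linked above, and the helper names are hypothetical.

```python
# Illustrative subset only; the analysis uses a much longer stop list.
STOP_WORDS = {"and", "of", "in", "the", "a", "to", "we", "i", "it"}

def partition(words, stop_words=STOP_WORDS):
    """Split a word list into (stop words, non-stop words), keeping repeats."""
    stop = [w for w in words if w in stop_words]
    non_stop = [w for w in words if w not in stop_words]
    return stop, non_stop

def stop_fraction(words, stop_words=STOP_WORDS):
    """Fraction of all delivered words that are stop words."""
    if not words:
        return 0.0
    stop, _ = partition(words, stop_words)
    return len(stop) / len(words)

words = "we need to invest in the future of the country".split()
stop, non_stop = partition(words)
print(non_stop)              # ['need', 'invest', 'future', 'country']
print(stop_fraction(words))  # 0.6
```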
Word Frequency
The word frequency table summarizes the frequency with which words
were used. I show the average word frequency and the weighted
cumulative frequencies at the 50th and 90th percentiles. The average word
frequency indicates how many times, on average, a word is used. For a
given fraction of the entire delivery, the weighted cumulative
frequency indicates the largest word frequency within this fraction
(details about weighted
cumulative distribution).
Table 3a
word use frequency
Average and 50%/90% percentile word frequencies.
Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to a speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
This table hints at how repetitive each debater's delivery was. If we exclude stop words, Clinton used each word 2.3 times on average. Trump, on the other hand, used each word 3.2 times on average.
When we look at the content exclusive to each candidate, Clinton used the words that Trump did not use only 1.4 times on average. Trump, on the other hand, used the words that Clinton did not use about 2 times on average.
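The two frequency measures can be sketched as follows. The weighted cumulative frequency is implemented under one reading of the definition above: words are sorted from least to most frequent, each word contributes its own count to a running total, and the value reported is the frequency of the word at which the running total first reaches the requested fraction of the delivery. That reading is an assumption, and the function names are hypothetical.

```python
from collections import Counter

def average_frequency(words):
    """Average number of times each distinct word is used."""
    counts = Counter(words)
    if not counts:
        return 0.0
    return len(words) / len(counts)

def weighted_cumulative_frequency(words, fraction):
    """Largest word frequency within the given fraction of the delivery.

    Assumed interpretation: accumulate word counts from the least to the
    most frequent word and return the frequency at which the running total
    first reaches `fraction` of all words.
    """
    counts = Counter(words)
    total = len(words)
    running = 0
    for freq in sorted(counts.values()):
        running += freq
        if running >= fraction * total:
            return freq
    return 0

# Made-up word list: 10 words, 4 distinct.
words = ["jobs"] * 5 + ["country"] * 3 + ["trade", "taxes"]
print(average_frequency(words))                   # 2.5
print(weighted_cumulative_frequency(words, 0.5))  # 3
print(weighted_cumulative_frequency(words, 0.9))  # 5
```

In this toy example half of the delivery is made up of words used 3 times or fewer, even though the average frequency is 2.5.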
Sentence Size
Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
Clinton's average sentence length was about 15 words. About 6.5 of those words were non-stop words. She delivered 429 sentences.
Trump delivered +65.0% (708 vs 429) more sentences and their non-stop content was –27.7% (4.7 vs 6.5) shorter than Clinton's. This was apparent during the debate—Trump would start a sentence without finishing the previous one.
Here the analysis heavily depends on the punctuation in the transcript.
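To show how much the sentence statistics depend on punctuation, here is a minimal sketch that splits on sentence-ending punctuation with a regular expression. The splitting rule and function names are assumptions for illustration, not the parser used for Table 4.

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def sentence_lengths(text):
    """Number of words in each sentence."""
    return [len(re.findall(r"[A-Za-z']+", s)) for s in split_sentences(text)]

# The same 13 words, transcribed with different punctuation.
with_periods = "We have to do better. We will. Believe me, we will do it!"
with_commas = "We have to do better, we will, believe me, we will do it!"

print(sentence_lengths(with_periods))  # [5, 2, 6]
print(sentence_lengths(with_commas))   # [13]
```

The same spoken words give three short sentences or one long one, depending only on how the transcriber punctuated them.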
All further word use statistics represent content that has been filtered for stop words, unless explicitly indicated.
Part of Speech Analysis
In this section, word frequency is broken down by part of
speech (POS). The four POS groups examined are nouns, verbs,
adjectives and adverbs. Conjunctions and prepositions are not
considered. The first category (n+v+adj+adv) is composed of all four
POS groups.
Part of Speech Count
Table 5
part of speech count
Count of words categorized by part of speech (POS).
Trump delivered –13.1% (986 vs 1,135) fewer total unique nouns, verbs, adjectives and adverbs. Both candidates had a similar mix of parts of speech, with a nouns:verbs:adjectives:adverbs ratio of about 44:30:20:6.
However, Clinton delivered +10.3% (558 vs 506) more unique nouns and +22.3% (389 vs 318) more unique verbs. This suggests that her content had more ideas and more actions.
Trump slightly edged Clinton by +6.2% (275 vs 259) on the use of adjectives.
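The relative differences quoted throughout are the difference between the two values divided by the second (reference) value. A short sketch that reproduces the figures above (the function names are hypothetical, and it prints an ASCII minus sign):

```python
def relative_difference(value, reference):
    """Percentage by which value differs from reference."""
    return 100.0 * (value - reference) / reference

def format_comparison(value, reference):
    """Format a comparison in the style used in the text."""
    pct = relative_difference(value, reference)
    return f"{pct:+.1f}% ({value:,} vs {reference:,})"

print(format_comparison(558, 506))   # +10.3% (558 vs 506)
print(format_comparison(389, 318))   # +22.3% (389 vs 318)
print(format_comparison(986, 1135))  # -13.1% (986 vs 1,135)
```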
Part of Speech Frequency
Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS).
We've already seen that Trump repeated his words more. This table looks at how this repetition breaks down by part of speech.
For example, Clinton repeated her nouns –24.0% (2 vs 2.63) less and her verbs –25.6% (2.01 vs 2.7) less than Trump.
Trump had a very high repetition rate of adverbs, +56.3% (3.94 vs 2.52) higher than Clinton.
Part of Speech Pairing
Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
Table 6a
part of speech pairing — Hillary Clinton
Word pairs (total and unique) categorized by part of speech (POS)
Table 6b
part of speech pairing — Donald Trump
Word pairs (total and unique) categorized by part of speech (POS)
Table 6c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
Clinton had +21.1% (2,685 vs 2,218) more unique noun-verb pairings than Trump and +54.7% (857 vs 554) more unique verb-verb pairings. This suggests that she spoke more in terms of action concepts and compound actions.
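One way word pairing might work is sketched below, assuming that every two non-stop words in the same sentence form a pair and that the words have already been tagged with their part of speech. Both are assumptions for illustration; the input format and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pos_pairs(tagged_sentences):
    """Count word pairs within each sentence, keyed by (POS pair, word, word)."""
    pairs = Counter()
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            # Sort by POS label so noun/verb and verb/noun share one category.
            (pa, wa), (pb, wb) = sorted([(p1, w1), (p2, w2)])
            pairs[(f"{pa}/{pb}", wa, wb)] += 1
    return pairs

# Made-up tagged sentences: lists of (word, part of speech).
sentences = [
    [("bring", "verb"), ("jobs", "noun"), ("back", "adverb")],
    [("new", "adjective"), ("jobs", "noun")],
]

pairs = pos_pairs(sentences)
unique_by_category = Counter(category for category, _, _ in pairs)
print(sorted(unique_by_category))
# ['adjective/noun', 'adverb/noun', 'adverb/verb', 'noun/verb']
```

Under this assumption a sentence of n words yields n(n-1)/2 pairs, which is why the number of pairs depends on sentence length.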
Exclusive and Shared Usage
This section enumerates words that were exclusive to a
candidate (i.e. used by one candidate but not the other). This content
provides insight into what the candidates' priorities are and
reveals differences in perspective on similar topics.
For a given part of speech, the table breaks down the number of
words that were spoken by only one of the candidates or both
candidates (intersection). The last row includes words spoken by
either candidate (union).
Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
The words that Clinton used that Trump did not use were more diverse in every part of speech.
For example, she used +23.0% (348 vs 283) more nouns that Trump did not use, +41.8% (224 vs 158) more verbs, +5.4% (157 vs 149) more adjectives and +42.9% (30 vs 21) more adverbs.
Noun Phrase Usage
Noun phrases were extracted from the text and analyzed for
frequency, word count, unique word count and richness. Single-word phrases were not counted.
Top-level noun phrases are those without a parent noun phrase (a
parent phrase is a similar, longer phrase). Derived noun
phrases are those with a parent (more
details about noun phrase analysis).
The top-level noun phrases can be interpreted as independent
concepts. Derived noun phrases can be interpreted as variants on
concepts embodied by the top-level phrases.
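The top-level and derived split can be sketched under one assumption about what "similar, longer phrase" means: a phrase is treated as derived when its words appear, in order and contiguously, inside a longer phrase. This is an illustrative reading, not necessarily the rule used in the analysis; the function names and the phrase "tax cuts" are hypothetical.

```python
def is_subphrase(short, long):
    """True if the words of `short` appear contiguously inside the longer `long`."""
    s, l = short.split(), long.split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))

def classify_phrases(phrases):
    """Return (top_level, derived) sets of noun phrases."""
    phrases = set(phrases)
    derived = {p for p in phrases if any(is_subphrase(p, q) for q in phrases)}
    return phrases - derived, derived

top_level, derived = classify_phrases(
    ["biggest tax cuts", "tax cuts", "fair share", "good deal"]
)
print(sorted(top_level))  # ['biggest tax cuts', 'fair share', 'good deal']
print(sorted(derived))    # ['tax cuts']
```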
Noun Phrase Count and Length
This table reports the absolute number of noun phrases, which is
related to the number of nouns, and their length.
Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
The total number of concepts, as measured by noun phrases, was similar for both candidates, with Clinton having slightly more: +5.8% (237 vs 224). The length of the noun phrases was also very similar between the candidates.
Exclusive and Shared Noun Phrase Count and Length
Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
The number of noun phrases exclusive to each candidate was also similar, about 220.
Interestingly, the candidates only referenced 17 concepts that the other mentioned as well. Among these were Barack Obama, biggest tax cuts, fair share, first place, good deal, wealthy people, white house and worst things.
Windbag Index
The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.
Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
Trump's Windbag Index is off the chart, +490.0% (1,003 vs 170) larger than Clinton's. This is a dubious accomplishment: it is about a third higher than anything I've seen in previous debates, where the highest value was 754, delivered by Romney in his 2nd debate vs Obama.
Clinton's Index is surprisingly low—perhaps even too low?
Word Clouds
In the word clouds below, the size of the word is proportional to
the number of times it was used by a candidate (method details).
Not all words from a group used to draw the cloud fit in the image
— less frequently used words for large word groups may fall
outside the image.
All Words for Each Candidate
Each candidate's debate portion was extracted and frequencies were
compiled for each part of speech (noun, verb, adjective, adverb), with
words colored by their part of speech category.
The distribution of sizes within a tag cloud follows the frequency
distribution of words. However, word size cannot be compared between
clouds, since the minimum and maximum size of the words is fixed.
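The reason sizes cannot be compared between clouds can be seen in a small sketch. It assumes a linear mapping from word frequency onto a fixed range of font sizes; the actual scaling is described in the method details and may differ, and the counts, sizes and function name are hypothetical.

```python
def font_sizes(counts, min_size=10, max_size=60):
    """Map word counts linearly onto a fixed [min_size, max_size] range."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # With a single frequency value, draw every word at the maximum size.
        scale = (n - lo) / span if span else 1.0
        sizes[word] = min_size + scale * (max_size - min_size)
    return sizes

# Made-up counts for two clouds.
cloud_a = font_sizes({"country": 40, "jobs": 20, "trade": 10})
cloud_b = font_sizes({"people": 8, "think": 4, "nuclear": 2})

print(cloud_a["country"], cloud_b["people"])  # 60.0 60.0
```

The most frequent word in each cloud is drawn at the same maximum size, even though one was used 40 times and the other only 8 times.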
Debate Word Cloud for Hillary Clinton - all words
Debate Word Cloud for Donald Trump - all words
Word clouds are fun, even if they're very 1990s.
What did Clinton talk about? "Think", "people", "good" and "well" are all common words, as are "nuclear", "American" and "important".
Trump repeated "country", "many", "just" and had a balance of "good" and "bad".
Exclusive Words for Each Candidate
The clouds below show words used exclusively by a candidate. For
example, if candidate A used the word "invest" (any number of times),
but candidate B did not, then the word will appear in the exclusive word
tag cloud for candidate A.
Words exclusive to Hillary Clinton
Words exclusive to Donald Trump
These clouds are fun because they show what the other candidate didn't say.
Shockingly, Trump never said "American" on its own, only as part of "African-American" (which Clinton said many times too). He also didn't say "justice", "working", "everyone" or "information".
Trump, on the other hand, said "politicians", "agree" (surprise here), "tremendous" (no surprise here) and "totally".
Part of Speech Word Clouds
In these clouds, words from each major part of speech were colored
based on whether they were exclusive to a candidate or shared by the
candidates.
The size of the word is relative to the frequency for the candidate
— word sizes between candidates should not be used to indicate
difference in absolute frequency.
Cloud of noun words, by speaker
If we look at nouns unique to each speaker, Clinton's "justice", "information" and "everyone" stand out. Trump focused on "politicians", "nobody" and "chicago".
Cloud of verb words, by speaker
If we look at verbs unique to each speaker, Clinton's "working", "provide" and "invest" counter Trump's "leaving", "agree" and "losing".
Cloud of adjective words, by speaker
I still can't believe that Trump never said "American" or "foreign". Clinton didn't say "tremendous" or "greatest" (thankfully).
Cloud of adverb words, by speaker
Clinton had "deeply" and "finally" and "basically" while Trump said "totally", "soon" and "extremely". There's a sentence right there for you: "totally extremely soon".
Cloud of all words, by speaker
The tag clouds for each part of speech above are combined in this cloud.
Clinton has "working", "american" and "information" and Trump has "tremendous", "countries" and "leaving".
Word Pair Clouds for Each Candidate
word pairs for Hillary Clinton
adjective/adjective by Hillary Clinton
adjective/adverb by Hillary Clinton
adjective/noun by Hillary Clinton
adjective/verb by Hillary Clinton
adverb/adverb by Hillary Clinton
adverb/noun by Hillary Clinton
adverb/verb by Hillary Clinton
noun/noun by Hillary Clinton
noun/verb by Hillary Clinton
verb/verb by Hillary Clinton
word pairs for Donald Trump
adjective/adjective by Donald Trump
adjective/adverb by Donald Trump
adjective/noun by Donald Trump
adjective/verb by Donald Trump
adverb/adverb by Donald Trump
adverb/noun by Donald Trump
adverb/verb by Donald Trump
noun/noun by Donald Trump
noun/verb by Donald Trump
verb/verb by Donald Trump
These are all the unique part of speech pairs for each candidate. Keep in mind that some words may be misclassified by the tagger, if their role in the sentence is highly contextual.
Pairs unique to Clinton were "jobs rising", "need communities" and "new jobs".
Trump had "bring money", "bring back", "billions bring" and "inner cities".
Downloads
Debate transcript
Parsed word lists and word clouds (word lists, part of speech lists, noun phrases, sentences) (word clouds)
Raw data structure
Please see the methods section for details about these files.