Word Analysis of 2016 U.S. Presidential Debates
Hillary Clinton vs Donald Trump (combined debates)
Word Statistics
Debate Word Count
Summary Word Count
The summary word count reports the total number of words and the number of unique, non-stop words used by each candidate. Word counts are expressed as both absolute and relative values.
Table 1a
all words
Number of all words and unique words used by each speaker.
Table 1b
exclusive and shared words
Words exclusive to a speaker (e.g. used by speaker A but not speaker B) and shared by speakers (used by both A and B).
The combined debate represents all the words delivered by both candidates across all three debates. As expected, the total volume of words across all debates is roughly three times the average for a single debate.
The fraction of unique words is much lower than in any single debate, because a word used in several debates counts as unique only once in the combined transcript. For example, of Clinton's 18,874 words across all three debates, 12.7% were unique. This is –40.4% (12.7 vs 21.3) lower than in the first debate. For Trump, 9.2% of words were unique, which is –37.8% (9.2 vs 14.8) lower than in his first debate.
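Comparisons throughout this article are written in the form –40.4% (12.7 vs 21.3). The arithmetic appears to be (a − b) / b, where b is the second (reference) value in the parentheses; it is a relative change, not a difference in percentage points. A small sketch of that calculation (the function name `relative_difference` is hypothetical, and Python prints a plain hyphen where the article uses an en dash):

```python
def relative_difference(a: float, b: float) -> str:
    """Format the change of a relative to the reference value b, e.g. '+38.0% (10.9 vs 7.9)'."""
    change = 100.0 * (a - b) / b  # relative change in percent, not percentage points
    return f"{change:+.1f}% ({a:g} vs {b:g})"

print(relative_difference(12.7, 21.3))  # -40.4% (12.7 vs 21.3)
print(relative_difference(10.9, 7.9))   # +38.0% (10.9 vs 7.9)
```

Because b is the denominator, swapping the two values changes the result: Trump's 10.9 vs Clinton's 7.9 is +38.0%, but Clinton's 7.9 vs Trump's 10.9 would be about –27.5%.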
The number of words exclusive to a candidate across the three debates roughly doubled when compared to the first debate. Clinton delivered 2,227 exclusive words (words not spoken by Trump) across the three debates and Trump delivered 1,859. These represent a small fraction of the delivery: the words used by both candidates totalled 36,295, or 89.9% of the 40,381 words spoken.
The fraction of exclusive words is mostly in keeping with the combined debates of Obama vs Romney. There, the candidates shared 42,331 words and the fractions of exclusive words were 8.1% for Obama and 8.0% for Romney. What is interesting this year is that Clinton's fraction of exclusive words was 11.8%, which is +45.7% (11.8 vs 8.1) higher than Obama's. This speaks to the larger rift between the candidates' approaches this year.
Stop Word Contribution
In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
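A minimal sketch of the stop/non-stop partition. The tiny `STOP_WORDS` set below is a hypothetical stand-in for the full stop-word list linked above; the real list is much longer, so fractions computed with it will differ from the tables.

```python
# Hypothetical, heavily abbreviated stop-word list (the article links its full list).
STOP_WORDS = {"that", "is", "why", "the", "of", "my", "a", "and", "we", "to"}

def partition_stop_words(words, stop_words=STOP_WORDS):
    """Split a token list into (stop words, non-stop words), preserving order."""
    stop = [w for w in words if w.lower() in stop_words]
    non_stop = [w for w in words if w.lower() not in stop_words]
    return stop, non_stop

words = "that is why the slogan of my campaign is stronger together".split()
stop, non_stop = partition_stop_words(words)
print(non_stop)                                          # ['slogan', 'campaign', 'stronger', 'together']
print(f"{100 * len(non_stop) / len(words):.1f}% non-stop")  # 36.4% non-stop
```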
Table 2a
non-stop words
Counts of stop and non-stop words.
Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to a speaker (e.g. used by speaker A but not speaker B) and shared by speakers (used by both A and B).
Clinton delivered relatively more non-stop words than Trump, +4.6% (43.4 vs 41.5). These proportions, and the difference between them, are very similar to those of the first debate.
Word frequency
The word frequency table summarizes how often words were used. I show the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
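A sketch of both measures under stated assumptions. The average word frequency is taken as total words divided by unique words, which agrees with the tables (Clinton's 12.7% unique words corresponds to about 1 / 0.127 ≈ 7.9 uses per word). For the weighted cumulative frequency, this sketch assumes distinct words are sorted from least to most frequent and weighted by their number of uses; the linked details page may define the cutoff slightly differently.

```python
from collections import Counter

def average_frequency(words):
    """Mean number of uses per distinct word: total words / unique words."""
    return len(words) / len(set(words))

def weighted_cumulative_frequency(words, fraction):
    """Largest word frequency needed to cover `fraction` of all word uses.

    Distinct words are sorted from least to most frequent and each is weighted
    by its number of uses; the function returns the frequency of the word at
    which the running total first reaches `fraction` of the delivery.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    counts = Counter(words)
    target = fraction * len(words)
    running = 0
    for freq in sorted(counts.values()):
        running += freq
        if running >= target:
            return freq

words = ["people"] * 7 + ["look"] * 2 + ["tremendous"]
print(average_frequency(words))                   # 10 words / 3 unique words
print(weighted_cumulative_frequency(words, 0.5))  # 7
print(weighted_cumulative_frequency(words, 0.2))  # 2
```

In this toy example, half of the delivery is only covered once the 7-times-repeated word is included, so a speaker who leans on a few heavily repeated words pushes the 50% and 90% values up.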
Table 3a
word use frequency
Average and 50%/90% percentile word frequencies.
Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to a speaker (e.g. used by speaker A but not speaker B) and shared by speakers (used by both A and B).
These word frequency tables highlight the extent of Trump's tendency to repeat words.
On average, Trump repeated his words 10.9 times, which is +38.0% (10.9 vs 7.9) higher than Clinton. The frequency of non-stop words is what is interesting here, though. For this subset, Trump repeated his non-stop words 4.9 times on average, which is +36.1% (4.9 vs 3.6) higher than Clinton.
The top 5 repeated non-stop words by Clinton were:
well (89), will (89), Donald (93), think (107) and people (114). For Trump, these were
know (95), look (100), will (100), country (116) and people (133). Trump loves using "look", as part of the phrase "look here".
Trump also repeated the words that were exclusive to him more often: 2.15 times on average, which is +25.0% (2.15 vs 1.72) higher than Clinton.
Clinton's most frequently used exclusive words were
information (11), stand (12), try (13), hope (14), clear (19) and families (19). Trump's most frequently used exclusive words were
endorsed (14), excuse (14), cities (18), inner (18), tremendous (29) and hillary (51).
What is fascinating here is that "Donald" doesn't appear in Clinton's exclusive word list but "Hillary" appears in Trump's. Why? Because Trump said "Donald" 3 times across the debates, so the word is not exclusive to Clinton, while Clinton never said "Hillary". The three sentences in which he referred to himself were:
"But you will learn more about Donald Trump by going down to the federal elections, where I filed a 104-page essentially financial statement of sorts, the forms that they have."
"She complains that Donald Trump took advantage of the tax code."
"But you wouldn't change it, because all of these people gave you the money so you can take negative ads on Donald Trump."
Sentence Size
Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
Trump delivered 1,970 sentences across all the debates, which is +63.3% (1,970 vs 1,206) more than Clinton. Given that he only delivered +14.0% (21,507 vs 18,874) more words than Clinton, this suggests that his sentences were much shorter.
Indeed, Trump's sentences averaged only 10.9 words, which is –30.6% (10.9 vs 15.7) shorter than Clinton's. If you consider only the non-stop words in a sentence, his averaged 4.6, which is –34.3% (4.6 vs 7) lower than Clinton's.
Trump's median sentence had only 6 non-stop words, while Clinton's had 9.
Clinton's longest sentence, delivered in the second debate (the town hall), had 39 non-stop words:
That's why the slogan of my campaign is "Stronger Together," because I think if we work together, if we overcome the divisiveness that sometimes sets Americans against one another, and instead we make some big goals -- and I've set forth some big goals, getting the economy to work for everyone, not just those at the top, making sure that we have the best education system from preschool through college and making it affordable, and so much else.
Trump's longest sentence, also delivered in the second debate, had 45 non-stop words:
I watch the deals being made, when I watch what's happening with some horrible things like Obamacare, where your health insurance and health care is going up by numbers that are astronomical, 68 percent, 59 percent, 71 percent, when I look at the Iran deal and how bad a deal it is for us, it's a one-sided transaction where we're giving back $150 billion to a terrorist state, really, the number one terror state, we've made them a strong country from really a very weak country just three years ago.
Still undecided?
All further word use statistics represent content that has been filtered for stop words, unless explicitly indicated.
Part of Speech Analysis
In this section, word frequency is broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.
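A sketch of collapsing tagged words into the four POS groups. It assumes the words have already been tagged with Penn Treebank tags (NN*, VB*, JJ*, RB*), which many taggers produce; the article does not say which tagger or tag set was used, so the `POS_GROUPS` mapping is an assumption. Words with any other tag (conjunctions, prepositions, etc.) are skipped, as described above.

```python
from collections import Counter, defaultdict

# Assumed Penn Treebank tag prefixes for the four POS groups.
POS_GROUPS = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}

def count_by_pos(tagged_words):
    """Return {group: (total uses, unique words)} for the four POS groups
    plus the combined n+v+adj+adv category."""
    totals = Counter()
    vocab = defaultdict(set)
    for word, tag in tagged_words:
        group = POS_GROUPS.get(tag[:2])  # e.g. "NNS" -> "NN" -> "noun"
        if group is None:
            continue  # conjunctions, prepositions, etc. are not considered
        for key in (group, "n+v+adj+adv"):
            totals[key] += 1
            vocab[key].add(word.lower())
    return {key: (totals[key], len(vocab[key])) for key in totals}

tagged = [("people", "NNS"), ("people", "NNS"), ("try", "VB"),
          ("tremendous", "JJ"), ("totally", "RB"), ("and", "CC")]
print(count_by_pos(tagged))
```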
Part of Speech Count
Table 5
part of speech count
Count of words categorized by part of speech (POS).
Both candidates used roughly the same fraction of nouns and verbs in their deliveries. About 45% of words were nouns and another 30% were verbs. Clinton delivered 799 unique verbs, which is +35.9% (799 vs 588) more than Trump. Given that verbs are action words, this is interesting to contrast against Trump's claim that Clinton is "all words and no action". Her words speak much more to "action" than Trump's.
Trump used proportionately more adjectives and adverbs than Clinton. For example, 19.9% and 6.4% of Trump's words were adjectives and adverbs, respectively, which is +16.4% (19.9 vs 17.1) and +14.3% (6.4 vs 5.6) higher than Clinton.
The total number of unique adjectives was very similar (543 for Clinton and 550 for Trump). Clinton edged Trump in unique adverbs. She used 97 and Trump used 83.
Part of Speech Frequency
Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS).
This table tells you how each part of speech contributes to the overall repetition in the candidates' delivery.
We already know that Trump repeats himself more than Clinton. But for which parts of speech is this most pronounced? (Pun intended.)
Trump repeats himself +19.0% (3.82 vs 3.21) more than Clinton for nouns, +38.8% (4.04 vs 2.91) for verbs, +22.4% (2.95 vs 2.41) for adjectives and +43.2% (6.3 vs 4.4) for adverbs.
Clinton's most frequently used noun, verb, adjective and adverb were
people (105), think (107), good (27) and just (44).
Trump's most frequently used noun, verb, adjective and adverb were
people (113), will (98), great (51) and just (88).
A huge difference in style and substance can be seen by looking at the parts of speech that are most commonly used among the words exclusive to a candidate.
Clinton's most frequently used exclusive (those not said by Trump) noun, verb, adjective and adverb were
families (14), try (13), clear (17) and forth (5).
Trump's most frequently used exclusive (those not said by Clinton) noun, verb, adjective and adverb were
hillary (51), endorsed (14), tremendous (28) and totally (10).
Again, still "totally" undecided?
Part of Speech Pairing
Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
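The article does not spell out how pairs are formed. This sketch assumes a pair is two adjacent words after stop-word removal, which is consistent with pairs such as "law order" and "take look" quoted later, and it names each category by its two POS groups in alphabetical order (adjective/noun, noun/verb, ...), matching the table labels. The input format and the function name `pos_pairs` are hypothetical.

```python
from collections import Counter

def pos_pairs(tagged_sentence):
    """Count adjacent word pairs in one sentence, keyed by (category, word1, word2).

    `tagged_sentence` is a list of (word, group) tuples with stop words removed,
    where group is one of "noun", "verb", "adjective", "adverb".
    """
    pairs = Counter()
    for (w1, g1), (w2, g2) in zip(tagged_sentence, tagged_sentence[1:]):
        category = "/".join(sorted((g1, g2)))  # e.g. ("verb", "adverb") -> "adverb/verb"
        pairs[(category, w1, w2)] += 1
    return pairs

sentence = [("take", "verb"), ("look", "verb"), ("just", "adverb")]
for (category, w1, w2), n in pos_pairs(sentence).items():
    print(category, w1, w2, n)
# verb/verb take look 1
# adverb/verb look just 1
```

Under this assumption, a sentence of n non-stop words yields n − 1 pairs, so longer sentences produce more pairs and, usually, more unique ones.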
Table 6a
part of speech pairing — Hillary Clinton
Word pairs (total and unique) categorized by part of speech (POS)
Table 6b
part of speech pairing — Donald Trump
Word pairs (total and unique) categorized by part of speech (POS)
Table 6c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
Because Clinton's sentences were longer, her numbers for word pairings are larger.
She delivered 9,165 unique verb/noun combinations, which is +59.7% (9,165 vs 5,738) more than Trump. But the pairing for which she had the largest difference from Trump was the verb/verb pairing. She delivered 2,557 unique verb/verb pairs, which was +69.8% (2,557 vs 1,506) more than Trump.
Exclusive and Shared Usage
This section enumerates words that were exclusive to a candidate (i.e. used by one candidate but not the other). This content provides insight into the candidates' priorities and reveals differences in perspective on similar topics.
For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or by both candidates (intersection). The last row includes words spoken by either candidate (union).
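A sketch of this breakdown using set operations, assuming each candidate's words for one part of speech are given as a list of tokens (the function name `exclusive_usage` is hypothetical). It reports both the number of distinct words and the total number of uses by either speaker. Summing both speakers' uses for shared words is a choice made here, although it agrees with the 36,295 shared-word total quoted earlier (18,874 + 21,507 words, minus the 2,227 + 1,859 exclusive ones).

```python
from collections import Counter

def exclusive_usage(words_a, words_b):
    """Unique and total counts for words used only by A, only by B, by both, or by either."""
    count_a, count_b = Counter(words_a), Counter(words_b)
    vocab_a, vocab_b = set(count_a), set(count_b)
    groups = {
        "A only": vocab_a - vocab_b,  # exclusive to A
        "B only": vocab_b - vocab_a,  # exclusive to B
        "both": vocab_a & vocab_b,    # intersection
        "either": vocab_a | vocab_b,  # union
    }
    # Counter returns 0 for missing words, so each total covers both speakers' uses.
    return {
        name: {"unique": len(vocab), "total": sum(count_a[w] + count_b[w] for w in vocab)}
        for name, vocab in groups.items()
    }

print(exclusive_usage(["tax", "jobs", "jobs", "families"], ["jobs", "tremendous", "tremendous"]))
```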
Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
This is a fun table. It breaks down the part of speech categorization for the words exclusive to a candidate. In other words, nouns, verbs, adjectives and adverbs spoken by Clinton and not Trump, and so on.
Clinton used 611 nouns that Trump didn't use, 437 verbs, 284 adjectives and 36 adverbs. In contrast, Trump's numbers for his exclusive parts of speech were uniformly lower across all categories: –28.8% (435 vs 611) lower for nouns, –43.9% (245 vs 437) lower for verbs, –16.9% (236 vs 284) lower for adjectives and –33.3% (24 vs 36) lower for adverbs.
The greatest difference here was for verbs. This is ironic, since it is Trump's assertion that Clinton lacks initiative and action.
Noun Phrase Usage
Noun phrases were extracted from the text and analyzed for
frequency, word count, unique word count and richness. Single-word phrases were not counted.
Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase that contains it). Derived noun phrases are those with a parent (more details about noun phrase analysis).
The top-level noun phrases can be interpreted as independent
concepts. Derived noun phrases can be interpreted as variants on
concepts embodied by the top-level phrases.
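A sketch of the top-level/derived split under one assumption: a phrase's parent is any longer phrase that contains it as a contiguous run of words. The linked details page may use a different containment rule, and `classify_noun_phrases` is a hypothetical name. Single-word phrases are dropped, as stated above.

```python
def classify_noun_phrases(phrases):
    """Split multi-word noun phrases into top-level and derived phrases.

    Assumption: a phrase's parent is any longer phrase that contains it
    as a contiguous run of words. Single-word phrases are ignored.
    """
    tokens = {p: tuple(p.split()) for p in set(phrases) if len(p.split()) > 1}

    def contains(longer, shorter):
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    top_level, derived = [], []
    for phrase, words in sorted(tokens.items()):
        has_parent = any(
            len(other) > len(words) and contains(other, words)
            for other in tokens.values()
        )
        (derived if has_parent else top_level).append(phrase)
    return top_level, derived

top, derived = classify_noun_phrases(
    ["new roads", "new roads new tunnels", "inner cities", "cities"]
)
print(top)      # ['inner cities', 'new roads new tunnels']
print(derived)  # ['new roads']
```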
Noun Phrase Count and Length
This table reports the absolute number of noun phrases, which is
related to the number of nouns, and their length.
Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Noun phrases can be used to identify concepts longer than a word. In total, both candidates delivered roughly the same number of noun phrases, which were of similar length.
Exclusive and Shared Noun Phrase Count and Length
Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Clinton's longest three noun phrases were
military civilian intelligence professionals
mexican immigrants rapists criminals drug dealers
dishwashers painters architects glass installers marble installers drapery installers
For Trump, these were
great general four-star general today
104-page essentially financial statement
new roads new tunnels new bridges new airports new schools new hospitals
Windbag Index
The Windbag Index is a compound measure that characterizes the complexity of speech. A low index indicates succinct speech with a low degree of repetition and a large number of independent concepts.
Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
Trump's index is insanely high, +391.8% (10,461 vs 2,127) higher than Clinton.
For perspective, in the combined debates of Obama vs Romney, the Windbag Index was 3,844 for Obama and 5,170 for Romney.
It's important to realize that the index is a function of the total number of words delivered. Comparisons between values are therefore meaningful only if the length and format of the delivery are the same for each candidate.
Word Clouds
In the word clouds below, the size of a word is proportional to the number of times it was used by a candidate (method details). Not all words used to draw a cloud fit in the image; less frequently used words from large word groups may fall outside it.
All Words for Each Candidate
Each candidate's debate portion was extracted and frequencies were
compiled for each part of speech (noun, verb, adjective, adverb), with
words colored by their part of speech category.
The distribution of sizes within a tag cloud follows the frequency
distribution of words. However, word size cannot be compared between
clouds, since the minimum and maximum size of the words is fixed.
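A minimal sketch of why sizes cannot be compared across clouds, assuming font size scales linearly with frequency between a fixed minimum and maximum (the actual rule is in the linked method details; `font_sizes` and the sizes 10 and 60 are hypothetical). The most frequent word in every cloud gets the maximum size, whatever its absolute count.

```python
def font_sizes(counts, min_size=10.0, max_size=60.0):
    """Map each word's frequency linearly onto [min_size, max_size] within one cloud."""
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return {word: max_size for word in counts}  # all words equally frequent
    return {
        word: min_size + (c - lo) / (hi - lo) * (max_size - min_size)
        for word, c in counts.items()
    }

print(font_sizes({"great": 51, "good": 27, "clear": 3}))  # great=60.0, good=35.0, clear=10.0
print(font_sizes({"families": 14, "try": 13}))            # families=60.0, try=10.0
```

Here "great" (51 uses) and "families" (14 uses) both end up at 60, so equal sizes in two different clouds say nothing about equal counts.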
Debate Word Cloud for Hillary Clinton - all words
Debate Word Cloud for Donald Trump - all words
Clinton's word cloud has a larger proportion of larger text because she repeats herself less, as we've seen from the tables above.
It's interesting to see Clinton's use of "good" compared with Trump's use of "great". Both words sit at the center of their respective clouds and, although both are positive, they have quite a different feel.
Trump's use of "great" is quite vernacular. It's an emotional word, much stronger and more persuasive than "good". Do you want a "good" meal or a "great" meal? The word is part of Trump's "Make America Great Again" slogan and is more of a sales pitch than a statement of quality.
Exclusive Words for Each Candidate
The clouds below show words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times) but candidate B did not, the word will appear in the exclusive word tag cloud for candidate A.
Words exclusive to Hillary Clinton
Words exclusive to Donald Trump
The exclusive word clouds are really fun. Trump's use of "tremendous", "endorsed", "cities" and "totally" occupies center stage. Clinton's exclusive words are more what you would expect from traditional political discourse, in which issues are discussed dispassionately: "families", "clear", "forth" and "hope".
Note that Trump's use of "endorsed" is always for the purpose of elevating his authority. He's using the word to tell us how great he is and, for this reason, that we should agree with him because, after all, all these other people have agreed with him. His mechanism of persuasion is building the momentum of a mob.
Part of Speech Word Clouds
In these clouds, words from each major part of speech were colored based on whether they were exclusive to a candidate or shared by the candidates.
The size of a word is relative to its frequency for that candidate; word sizes should not be compared between candidates as an indication of a difference in absolute frequency.
Cloud of noun words, by speaker
If we relate the use of words back to the candidates' slogans, Clinton's use of "families" is in keeping with her "Stronger Together" slogan.
Trump attempts to terrify the electorate with "Chicago" and "cities", which he mentioned in the context of violence and guns. These words suggest that violence is one of the reasons he doesn't think America is "great" right now.
Cloud of verb words, by speaker
The verb cloud is a little scary because Trump's primary exclusive verb is the self-aggrandizing "endorsed". Clinton wants us to "try", "hope" and "invest".
Cloud of adjective words, by speaker
Tremendous. Just tremendous.
But, I think I'll go with "clear".
Cloud of adverb words, by speaker
America will "essentially" be "totally" great again "soon".
"horribly" said.
Cloud of all words, by speaker
When we combine the exclusive words for each candidate across all parts of speech, Clinton's (blue) words dominate, except for Trump's "tremendous".
Word Pair Clouds for Each Candidate
word pairs for Hillary Clinton
adjective/adjective by Hillary Clinton
adjective/adverb by Hillary Clinton
adjective/noun by Hillary Clinton
adjective/verb by Hillary Clinton
adverb/adverb by Hillary Clinton
adverb/noun by Hillary Clinton
adverb/verb by Hillary Clinton
noun/noun by Hillary Clinton
noun/verb by Hillary Clinton
verb/verb by Hillary Clinton
word pairs for Donald Trump
adjective/adjective by Donald Trump
adjective/adverb by Donald Trump
adjective/noun by Donald Trump
adjective/verb by Donald Trump
adverb/adverb by Donald Trump
adverb/noun by Donald Trump
adverb/verb by Donald Trump
noun/noun by Donald Trump
noun/verb by Donald Trump
verb/verb by Donald Trump
Trump insists that we "take look" and "look just". His adjective/noun combinations like "inner cities", "bad people" and "law order" pander to fears.
And so it's interesting that Clinton, whose job policy is criticized by Trump, should be the one who has "new jobs" as the top adjective/noun pair.
Downloads
Debate transcript
Parsed word lists and word clouds (word lists, part of speech lists, noun phrases, sentences) (word clouds)
Raw data structure
Please see the methods section for details about these files.