Word Analysis of 2012 U.S. Presidential Debates
Barack Obama vs Mitt Romney (1st debate)
3 October 2012
Introduction
This debate has been generally seen as a victory for Romney, who eclipsed Obama with his energy and charisma. The analysis here does not attempt to capture the tone and energy level of the speakers; to do so in an automated way would be extraordinarily difficult from the transcript alone.
The analysis describes in detail the structure of each candidate's speech, words and phrases that they used (exclusively or shared), and the degree to which they repeated themselves.
Word Statistics
Debate Word Count
Summary Word Count
The summary word count reports the total number of words and the
number of unique, non-stop words
used by each candidate. Word number is expressed as both absolute and
relative values.
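A minimal sketch of how such a summary count could be computed is shown here. The tokenizer and the function names (tokenize, summary_word_count) are assumptions made for illustration and are not the method used to produce the tables.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Hypothetical tokenizer: keeps internal apostrophes and hyphens,
    so "middle-income" and "we're" each count as one word.
    """
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def summary_word_count(text):
    """Return total words, unique words and the unique fraction for one speaker."""
    words = tokenize(text)
    total = len(words)
    unique = len(set(words))
    fraction = unique / total if total else 0.0
    return {"total": total, "unique": unique, "unique_fraction": fraction}

sample = "We want jobs. We want growth, and we want middle-income families to thrive."
stats = summary_word_count(sample)
print(stats["total"], stats["unique"], round(stats["unique_fraction"], 3))
```

The absolute value is the raw count; the relative value is the count divided by the speaker's total.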
Table 1a
all words
Number of all words and unique words used by each speaker.
Table 1b
exclusive and shared words
Words exclusive to speaker (e.g. speaker A
but not speaker B) and shared by speakers (speaker A
and B).
Table 1
commentary
Obama's delivery slower, but has more unique and exclusive words. Democrats more articulate than in 2008, Republicans less.
Obama spoke for longer (42:50 min = 2,570 seconds) than Romney (38:32 min = 2,312 seconds) (debate timing by CNN). Romney used +7.0% (7,791 vs 7,280) more words than Obama and spoke faster, at a rate of 3.37 words/second, +19.1% (3.37 vs 2.83) faster than Obama's 2.83 words/second. Despite the fact that Obama delivered -6.6% (7,280 vs 7,791) fewer words, he used +3.4% (1,255 vs 1,214) more unique words. Either Romney comes across as rushed, repetitive and aggressive, or Obama as slow, strained and laborious.
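The arithmetic behind these figures can be reproduced with a short sketch. The helper names are hypothetical; the word counts and timings are those quoted above. The +19.1% figure is recovered when the rates are first rounded to two decimals, as they appear in the text.

```python
def words_per_second(words, seconds):
    """Speaking rate in words per second."""
    return words / seconds

def relative_difference(a, b):
    """Percent by which a differs from b, relative to b."""
    return (a - b) / b * 100.0

obama_words, obama_seconds = 7280, 2570    # 42:50 min
romney_words, romney_seconds = 7791, 2312  # 38:32 min

# Rates rounded to two decimals, as quoted in the commentary.
obama_rate = round(words_per_second(obama_words, obama_seconds), 2)
romney_rate = round(words_per_second(romney_words, romney_seconds), 2)

more_words = relative_difference(romney_words, obama_words)
fewer_words = relative_difference(obama_words, romney_words)
faster = relative_difference(romney_rate, obama_rate)

print(obama_rate, romney_rate)
print(round(more_words, 1), round(fewer_words, 1), round(faster, 1))
```

Note that the two directions of comparison give different magnitudes (+7.0% and -6.6%) because each is taken relative to a different base.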
In the first debate of 2008, Obama delivered +6.9% (7,529 vs 7,043) more words than McCain. This year, we find Obama speaking -3.4% (7,270 vs 7,529) fewer words than in 2008, while the Republican candidate spoke +10.6% (7,791 vs 7,043) more. In the 2008 debate, the number of unique words was nearly identical for both candidates (1,376 for Obama and 1,380 for McCain). This time we see Obama use +3.4% (1,255 vs 1,214) more unique words than Romney. Obama appears more articulate and measured, having more unique words and slower speech.
Obama's unique word fraction increased from 2008, from 16.5% to 17.2%. The Republican candidate's fraction dropped, from McCain's 17.6% to Romney's 15.6%. The Democrats have become more articulate, the Republicans less.
Of all the words used in the debate, 13,205 were spoken by both candidates. Of Obama's 7,280 words, 934 (12.8%) were exclusive to him (i.e. spoken by Obama, but not Romney), higher than Romney's 12.0%. Within these exclusive words, Obama had a higher fraction of unique ones (65.4% vs 61.2%). Obama tended to repeat exclusive words less than Romney.
Stop Word Contribution
In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
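As an illustration, the partition can be sketched as follows. The stop list here is a tiny hypothetical sample, not the full list used for the analysis, and the function name is an assumption.

```python
# Hypothetical, deliberately tiny stop list for illustration only.
STOP_WORDS = {"a", "and", "i", "in", "is", "it", "of", "that", "the", "to", "we", "you"}

def partition_stop_words(words, stop_words=STOP_WORDS):
    """Split a word list into (stop words, non-stop words), preserving order."""
    stop = [w for w in words if w in stop_words]
    non_stop = [w for w in words if w not in stop_words]
    return stop, non_stop

words = "we need to make sure that the middle class is growing".split()
stop, non_stop = partition_stop_words(words)
stop_fraction = len(stop) / len(words)

print(stop)
print(non_stop)
print(round(stop_fraction, 3))
```

The stop fraction computed this way depends entirely on the chosen list, so values from different stop lists are not comparable.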
Table 2a
non-stop words
Counts of stop and non-stop words.
Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to speaker (e.g. speaker A
but not speaker B) and shared by speakers (speaker A
and B).
Table 2
commentary
Obama's speech more varied. Less than half of a candidate's unique words are spoken by the other.
Obama's relative stop word count is slightly higher than Romney's. 56.2% of Obama's words are stop, compared to 55.6% of Romney's. When we look at unique non-stop words, Obama's fraction is significantly higher, 34.8% vs 30.9%, indicating that his speech is more varied.
When exclusive non-stop words are examined, we see there were more unique words that were exclusive than shared. Obama had +6.8% (597 vs 559) more unique exclusive words. The two candidates shared 511 unique words.
Stop word frequency counts do not reveal a significant difference between the two
candidates, although much can be gleaned from which stop words were actually used. For example, pronoun profiles speak volumes.
All further word use statistics represent content that has been filtered for stop words.
Word Frequency
The word frequency table summarizes the frequency with which words
were used. I show the average word frequency and the weighted
cumulative frequencies at 50 and 90 percentile. The average word
frequency indicates how many times, on average, a word is used. For a
given fraction of the entire delivery, the weighted cumulative
frequency indicates the largest word frequency within this fraction
(details about weighted
cumulative distribution).
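One possible reading of these two statistics is sketched below. It assumes that words are ordered from least to most frequent and that each word contributes its own frequency to the running total; this interpretation and the function names are assumptions, and the linked details may define the distribution differently.

```python
from collections import Counter

def average_word_frequency(words):
    """Average number of times each distinct word is used."""
    counts = Counter(words)
    return len(words) / len(counts) if counts else 0.0

def weighted_cumulative_frequency(words, fraction):
    """Largest word frequency found within the given fraction of the delivery.

    Assumed interpretation: sort distinct words by ascending frequency,
    accumulate their frequencies, and report the frequency of the word
    at which the running total first reaches fraction * total words.
    """
    counts = sorted(Counter(words).values())
    total = sum(counts)
    running = 0
    for count in counts:
        running += count
        if running >= fraction * total:
            return count
    return 0  # only reached for an empty word list

words = ["tax"] * 4 + ["plan"] * 2 + ["jobs", "math", "folks", "law"]
print(round(average_word_frequency(words), 2))
print(weighted_cumulative_frequency(words, 0.5))
print(weighted_cumulative_frequency(words, 0.9))
```

Under this reading, a large gap between the 50% and 90% values signals a skewed distribution in which a few words are repeated heavily.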
Table 3a
word use frequency
Average and 50%/90% percentile word frequencies.
Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A
but not speaker B) and shared by speakers (speaker A
and B).
Table 3
commentary
Obama repeats words less and suggests we can "now" "make" things with what we "know". Romney acts as saviour with "one" and "plan" for the "people".
We've already seen that Obama delivered proportionately more unique words (17.2%) than Romney (15.6%). This is reflected in his -9.4% (5.8 vs 6.4) (all words) and -9.4% (2.9 vs 3.2) (non-stop words) lower average word frequency. Obama also repeated stop words less often, -7.3% (27.9 vs 30.1). Finally, if we look at exclusive words, these are repeated -6.2% (1.52 vs 1.62) less by Obama as well.
The top 5 non-stop words (and frequencies) spoken by Obama were
know (28), now (37), make (44), romney (44) and governor (48). Romney used
plan (35), one (37), tax (39), president (41) and people (73).
Looking only at exclusive words, the top 5 non-stop words for Obama were
investments (7), difference (8), loopholes (9), folks (11) and opportunity (11) and for Romney,
retirees (6), current (7), half (7), mr (7) and middle-income (8).
For those words used by both candidates, the top 5 were
plan (54), now (56), tax (60), make (63) and people (100). Three of these appear in Romney's top 5 list (plan, tax and people) but only two in Obama's (now and make).
Sentence Size
Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
Table 4
commentary
Obama's delivery more complex, oratorical.
The number and size of sentences suggest that Obama's
delivery is grammatically more complex than Romney's. Obama delivered -32.5% (391 vs 579) fewer sentences than Romney but only -6.6% (7,280 vs 7,791) fewer words. He packed more into his
sentences.
Obama's sentences were +37.8% (18.6 vs 13.5) longer
when all words are considered and +40.0% (8.4 vs 6) longer
when only non-stop words are counted. This is quite a large
difference, especially given that Obama spoke for longer and his
unique word usage is higher than Romney's.
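Sentence size statistics of this kind can be approximated with a simple sketch. The sentence splitter below is a naive assumption (it breaks on ".", "!" and "?" followed by whitespace) and is not the parser used for Table 4.

```python
import re
from statistics import mean

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def sentence_lengths(text):
    """Number of word tokens in each sentence; stray dashes are not counted as words."""
    word = r"[A-Za-z0-9']+(?:-[A-Za-z0-9']+)*"
    return [len(re.findall(word, s)) for s in split_sentences(text)]

text = "I have a plan. It will create jobs! Will it - it work?"
lengths = sentence_lengths(text)
print(lengths, round(mean(lengths), 2))
```

Repeated words from stammering ("it - it") still count toward sentence length, which is worth keeping in mind when reading the longest sentences quoted for each candidate.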
Both candidates delivered some whopping sentences. Obama's largest sentence is quite oratorical, with 112 words (536 characters).
Obama, 112 words — And everything that I've tried to do and everything that I'm now proposing for the next four years in terms of improving our education system, or developing American energy, or making sure that we're closing loopholes for companies that are shipping jobs overseas and focusing on small businesses and companies that are creating jobs here in the United States, or - or closing our deficit in a responsible, balanced way that allows us to invest in our future - all those things are designed to make sure that the American people, their genius, their grit, their determination is - is channeled, and - and - and they have an opportunity to succeed.
Romney's longest sentence is stammering gibberish of 71 words and 318 characters.
Romney, 71 words — Look, the right course for - for America's government - we were talking about the role of government - is not to become the economic player picking winners and losers, telling people what kind of health treatment they can receive, taking over the health care system that - that has existed in this country for - for a long, long time and has produced the best health records in the world.
Part of Speech Analysis
In this section, word frequency is broken down by part of
speech (POS). The four POS groups examined are nouns, verbs,
adjectives and adverbs. Conjunctions and prepositions are not
considered. The first category (n+v+adj+adv) is composed of all four
POS groups.
Part of Speech Count
Table 5
part of speech count
Count of words categorized by part of speech (POS).
Table 5
commentary
Obama's use of verbs and adverbs higher. Romney prefers nouns and adjectives.
The relative total of nouns, verbs, adjectives and adverbs in Obama's and Romney's delivery was roughly the same, with Obama using slightly more, Δrel=+1.5% (Δabs=+0.6%, 40.5% vs 39.9%). Within this set of words, Obama had +3.5% (1,061 vs 1,025) more unique words.
Romney used Δrel=+3.5% (Δabs=+1.7%, 50.8% vs 49.1%) more nouns and Δrel=+6.4% (Δabs=+1.1%, 18.4% vs 17.3%) more adjectives, but Obama used Δrel=+8.8% (Δabs=+2.3%, 28.4% vs 26.1%) more verbs and Δrel=+10.6% (Δabs=+0.5%, 5.2% vs 4.7%) more adverbs.
Across all parts of speech, Obama's use of unique words is 3-8% richer than Romney's. Relative to Romney, Obama's use of unique nouns, verbs, adjectives and adverbs was Δrel=+7.8% (Δabs=+2.7%, 37.5% vs 34.8%), Δrel=+4.7% (Δabs=+1.9%, 42.5% vs 40.6%), Δrel=+3.4% (Δabs=+1.6%, 49.3% vs 47.7%), and Δrel=+7.8% (Δabs=+2.9%, 40.3% vs 37.4%).
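The figures quoted above are consistent with reading Δabs as the difference in percentage points and Δrel as that difference relative to the second value. A minimal sketch under that assumption (the function names are hypothetical):

```python
def delta_abs(a_pct, b_pct):
    """Absolute difference in percentage points."""
    return a_pct - b_pct

def delta_rel(a_pct, b_pct):
    """Difference relative to the second value, in percent."""
    return (a_pct - b_pct) / b_pct * 100.0

# Verb share: Obama 28.4% vs Romney 26.1%
print(round(delta_abs(28.4, 26.1), 1), round(delta_rel(28.4, 26.1), 1))

# Adverb share: Obama 5.2% vs Romney 4.7%
print(round(delta_abs(5.2, 4.7), 1), round(delta_rel(5.2, 4.7), 1))
```

A small absolute gap can correspond to a large relative one when the base is small, as with adverbs.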
Part of Speech Frequency
Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS).
Table 5
commentary
Obama repeats his top verbs more than Romney: know, think and make.
Obama is consistently less repetitive with his words than Romney. For nouns, verbs, adjectives and adverbs his average word frequency is lower by -7.0% (2.67 vs 2.87), -4.5% (2.35 vs 2.46), -3.3% (2.03 vs 2.1), and -7.1% (2.48 vs 2.67), respectively. He particularly repeats nouns and adverbs less than Romney.
For verbs, the 90% cumulative distribution value is higher for Obama, indicating that his frequency distribution is skewed and that his most repeated verbs were repeated more often. Obama's top 3 verbs were
know (24), think (26) and make (39). Romney's were
make (18), want (26) and will (31).
Romney hammers away at his nouns. His top 4 by frequency were
medicare (31), tax (37), president (39) and people (64). Obama's were
people (21), tax (21), romney (44) and governor (48). It is interesting to see that Romney refers to people 64 times but to Obama 39 times. Obama mentions people 21 times, preferring the use of folks, which he mentions 10 times.
Part of Speech Pairing
Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
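The dependence on sentence length can be seen in a small sketch. It assumes a pair is any unordered combination of two distinct words that occur in the same sentence, so a sentence with n distinct words yields n(n-1)/2 pairs; this definition and the function names are assumptions, and the sketch ignores the part-of-speech tagging used in the tables.

```python
from collections import Counter
from itertools import combinations

def sentence_pairs(words):
    """All unordered pairs of distinct words within one sentence."""
    return list(combinations(sorted(set(words)), 2))

def count_pairs(sentences):
    """Count how often each pair co-occurs across a list of tokenized sentences."""
    counts = Counter()
    for words in sentences:
        counts.update(sentence_pairs(words))
    return counts

sentences = [
    ["small", "businesses", "create", "jobs"],  # 4 words -> 6 pairs
    ["small", "businesses", "grow"],            # 3 words -> 3 pairs
]
pair_counts = count_pairs(sentences)
total_pairs = sum(pair_counts.values())
unique_pairs = len(pair_counts)
print(total_pairs, unique_pairs, pair_counts[("businesses", "small")])
```

Because the pair count grows roughly with the square of sentence length, a speaker with longer sentences will accumulate pairs much faster than the difference in word count alone would suggest.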
Table 6a
part of speech pairing — Barack Obama
Word pairs (total and unique) categorized by part of speech (POS)
Table 6b
part of speech pairing — Mitt Romney
Word pairs (total and unique) categorized by part of speech (POS)
Table 6c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
Table 6
commentary
Obama has more word pairings than Romney, with nearly twice as many unique verb-verb pairs.
Because Obama's sentences were +40.0% (8.4 vs 6) longer than Romney's, he had consistently more total and unique pairings.
Obama had nearly twice as many, +82.3% (1,318 vs 723), unique verb-verb pairings as Romney. Where the candidates were most similar was in the number of unique noun-noun and adjective-adjective pairings. It is interesting to see the contrast between noun-noun and verb-verb statistics — Obama's sentences had relatively more verbs in them, suggesting that he spoke in terms of compound actions.
Exclusive and Shared Usage
This section enumerates words that were exclusive to a
candidate (i.e. used by one candidate but not the other). This content
provides insight into what the candidates' priorities are and
reveals differences in perspective on similar topics.
For a given part of speech, the table breaks down the number of
words that were spoken by only one of the candidates or both
candidates (intersection). The last row includes words spoken by
either candidate (union).
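The breakdown maps directly onto set operations. A minimal sketch, with a hypothetical function name and toy word lists drawn from words mentioned elsewhere in this analysis:

```python
def exclusive_and_shared(words_a, words_b):
    """Split two speakers' vocabularies into exclusive, shared and combined sets."""
    a, b = set(words_a), set(words_b)
    return {
        "only_a": a - b,   # spoken by A but not B
        "only_b": b - a,   # spoken by B but not A
        "shared": a & b,   # intersection
        "union": a | b,    # spoken by either
    }

obama = ["folks", "opportunity", "tax", "people", "loopholes", "tax"]
romney = ["retirees", "tax", "people", "middle-income"]
groups = exclusive_and_shared(obama, romney)

for name in ("only_a", "only_b", "shared", "union"):
    print(name, sorted(groups[name]))
```

The three groups are disjoint, so their sizes add up to the size of the union, which matches the unique-word totals reported in the Table 7 commentary (575 + 539 + 486 = 1,600).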
Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
Table 7
commentary
Candidates use many exclusive words, share fewer.
This table shows the number of words that were spoken exclusively by either candidate, and those that were shared. In total, 1,600 unique nouns, verbs, adjectives and adverbs were used between the two candidates. The shared fraction (486, 30.4%) was smallest, with Obama having a larger number of exclusive words (575, 35.9%) than Romney (539, 33.7%).
Of all unique nouns (845), Obama and Romney each had more that were exclusive to them (266, 31.5% and 276, 32.7%, respectively) than they shared (248, 29.3%). Within this set of nouns, Romney had a slightly richer set with 276 unique nouns, compared to Obama's 266.
An even greater difference is found in the set of unique verbs (555), of which only 131 (23.6%) were shared. The largest fraction was used exclusively by Obama (203, 36.6%) with 168 (30.3%) by Romney. Obama's high fraction of exclusive verbs suggests his emphasis on various forms of action.
As with nouns, Romney had the largest fraction of adjectives (29.9%).
Adverbs were the only part of speech that was more extensively shared (31, 36.0%), indicating a lower variety in this part of speech within the debate. Obama had a greater exclusive fraction (32.6%) than Romney (25.6%).
Noun Phrase Usage
Noun phrases were extracted from the text and analyzed for
frequency, word count, unique word count and richness. Single-word phrases were not counted.
Top-level noun phrases are those without a parent noun phrase (a
parent phrase is a similar, longer phrase). Derived noun
phrases are those with a parent (more
details about noun phrase analysis).
The top-level noun phrases can be interpreted as independent
concepts. Derived noun phrases can be interpreted as variants on
concepts embodied by the top-level phrases.
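The distinction can be illustrated with a rough sketch. It assumes that a phrase has a parent when its words appear, in order and contiguously, inside a longer phrase; the actual similarity rule is described in the linked details and may differ, and the function names here are hypothetical.

```python
def is_subphrase(short, long):
    """True if the words of `short` appear contiguously inside the longer `long`."""
    s, l = short.split(), long.split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))

def classify_phrases(phrases):
    """Split distinct phrases into (top-level, derived) lists, each sorted."""
    unique = set(phrases)
    top_level, derived = [], []
    for phrase in sorted(unique):
        if any(is_subphrase(phrase, other) for other in unique):
            derived.append(phrase)   # has a parent phrase
        else:
            top_level.append(phrase)  # no parent phrase
    return top_level, derived

phrases = [
    "health care", "health care system",
    "tax cuts", "middle-class tax cuts",
    "small businesses",
]
top_level, derived = classify_phrases(phrases)
print(top_level)
print(derived)
```

In this toy example "health care" is derived because "health care system" contains it, while "small businesses" stands alone as a top-level phrase.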
Noun Phrase Count and Length
This table reports the absolute number of noun phrases, which is
related to the number of nouns, and their length.
Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Table 8
commentary
Candidates deliver similar number of concepts.
Obama has -3.6% (529 vs 549) fewer noun phrases than Romney. But, given that he delivered fewer words and a higher fraction of unique words, it is not surprising that he had a higher fraction of unique phrases by Δrel=+4.6% (Δabs=+2.0%, 45.4% vs 43.4%).
Obama was less repetitive, which can be seen in the number of phrases per noun. He delivered +5.7% (0.37 vs 0.35) more noun phrases per noun than Romney.
Obama's phrases were only slightly longer by +0.9% (2.31 vs 2.29), for all phrases, and by +0.4% (2.36 vs 2.35), for top-level phrases.
Exclusive and Shared Noun Phrase Count and Length
Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Table 9
commentary
Relatively few phrases are spoken by both speakers.
The majority of phrases are not shared by the speakers. Only 16.0% of all phrases are shared, with Obama and Romney delivering roughly the same number of exclusive phrases (448 vs 458). The number of total and unique top-level phrases is almost identical for the two speakers.
Exclusive phrases tended to be longer for the candidates (2.34, 2.35) than shared (2.08).
Windbag Index
The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with low degree of repetition and large number of independent concepts.
Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
Table 10
commentary
Obama thoroughly less repetitive than Romney, with a Windbag Index -31.7% lower.
Obama had a -31.7% lower Windbag Index. This is a reflection of the fact that his speech is less repetitive across 7 of the 8 metrics. The only factor that was lower for Obama was the fraction of words that were non-stop (0.438 vs 0.444).
Note that the 2012 Windbag Index cannot be compared directly to the 2008 version because of different methodology in computing noun phrases.
Word Clouds
In the word clouds below, the size of the word is proportional to
the number of times it was used by a candidate (method details).
Not all words from a group used to draw the cloud fit in the image
— less frequently used words for large word groups may fall
outside the image.
All Words for Each Candidate
Each candidate's debate portion was extracted and frequencies were
compiled for each part of speech (noun, verb, adjective, adverb), with
words colored by their part of speech category.
The distribution of sizes within a tag cloud follows the frequency
distribution of words. However, word size cannot be compared between
clouds, since the minimum and maximum size of the words is fixed.
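Why sizes cannot be compared between clouds is easiest to see in a sketch. It assumes a linear mapping from frequency to font size between a fixed minimum and maximum; the mapping, the size limits and the function name are assumptions, while the frequencies are taken from the top non-stop words quoted in the Table 3 commentary.

```python
def font_sizes(frequencies, min_size=10.0, max_size=60.0):
    """Map word frequencies linearly onto a fixed font-size range."""
    if not frequencies:
        return {}
    lo, hi = min(frequencies.values()), max(frequencies.values())
    if hi == lo:
        return {word: max_size for word in frequencies}
    scale = (max_size - min_size) / (hi - lo)
    return {word: min_size + (freq - lo) * scale for word, freq in frequencies.items()}

obama = {"governor": 48, "make": 44, "know": 28}
romney = {"people": 73, "president": 41, "plan": 35}

obama_sizes = font_sizes(obama)
romney_sizes = font_sizes(romney)
print({w: round(s, 1) for w, s in obama_sizes.items()})
print({w: round(s, 1) for w, s in romney_sizes.items()})
```

Under this mapping "governor" (48 uses) and "people" (73 uses) are drawn at the same maximum size, each in its own cloud, even though their frequencies differ considerably.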
Debate Word Cloud for Barack Obama - all words
Debate Word Cloud for Mitt Romney - all words
Romney's cloud appears more bloated than Obama's. This is due to his higher repetition: recall that Romney had a non-stop average word frequency +10.3% (3.2 vs 2.9) higher than Obama.
Romney's most commonly used verbs were "want", suggesting a
sense of desire and entitlement on the part of his constituents, and
"will", suggesting that he'll give it to them. From his top words you
can construct the sentence "people just want lower tax".
Obama mentioned Romney quite a bit (Romney did not return the
favour) and had a predilection for "make", "think", "know", as well as "want".
Exclusive Words for Each Candidate
The clouds below show words used exclusively by a candidate. For
example, if candidate A used the word "invest" (any number of times),
but candidate B did not, then the word will appear in the exclusive word
tag cloud for candidate A.
Words exclusive to Barack Obama
Words exclusive to Mitt Romney
Romney engages in some fearmongering, with words like "lose", "kill", "debt" and "always". Remember, these are words unique to Romney, not used by Obama. Obama's major exclusive words were "difference", "folks", "opportunity" and "loopholes".
Notice Romney's approach to describing the middle class: he uses "middle-income", whereas Obama uses "average" and "folks".
Part of Speech Word Clouds
In these clouds, words from each major part of speech were colored
based on whether they were exclusive to a candidate or shared by the
candidates.
The size of the word is relative to the frequency for the candidate
— word sizes between candidates should not be used to indicate
difference in absolute frequency.
Cloud of noun words, by speaker
Obama's contribution of "folks" and "opportunity" dominates here. This means that within the set of words unique to Obama, he repeated them more than the corresponding top-use words of Romney, which were "half" (referring to the deficit) and "retirees".
The rest of the cloud, however, is filled more with Romney's words than Obama's, suggesting repetition now on Romney's part. Romney is paranoid about "food", "stamps" and "debt", which are all words unmentioned by Obama.
Some interesting nouns for Obama are "choices", "math", "law", and "war". Romney has "principle", "Spain", "China", "burden" and "mortgage".
Cloud of verb words, by speaker
The top verbs unique to Romney were "lose" and "hurt", followed by
words like "concerned", "taxed" and "kill". For Obama, the top unique
verb was the general "went", and also "buy", "created" and "called".
If the verbs are an indication of action planned for and supported
by the candidates, then Romney is someone who wishes to "stop" and "simplify" and worries about being "kill"(ed) and "crushed". Obama is ever more hopeful and positive with "initiate", "invest", "created", and "reflect".
Cloud of adjective words, by speaker
Obama and Romney reserve unique adjectives to refer to the middle class. Romney uses their income, "middle-income", whereas Obama uses the more neutral "average". Romney did not use "fair", mentioned by Obama 4 times.
Obama contributes more large words to this cloud, indicating greater relative repetition in adjectives exclusive to him.
Cloud of adverb words, by speaker
Adverbs yield the smallest word cloud because they are the least frequent of the four parts of speech I examined. Obama seems to want to energize the audience with "aggressively", "vitally" and "essentially". Romney is categorical with "always" and "rather". These words suggest that Romney is guided by immutable rules that apply "always" and "correctly" (to him, also "obviously").
Note Romney's use of technical words like "mathematically", against Obama's "structurally", and "extraordinarily". I'd vote for Obama merely for his exclusive use of "ironically".
Cloud of all words, by speaker
When all parts of speech are combined into one cloud, Romney's unique words swamp out those of Obama, suggesting that Romney repeated terms exclusive to him more frequently.
Obama wants to remind "folks" of "opportunity", sticking to his message of hope. Romney, on the other hand, is reminding the audience that he hasn't forgotten the "middle-income" segment.
Word Pair Clouds for Each Candidate
word pairs for Barack Obama
adjective/adjective by Barack Obama
adjective/adverb by Barack Obama
adjective/noun by Barack Obama
adjective/verb by Barack Obama
adverb/adverb by Barack Obama
adverb/noun by Barack Obama
adverb/verb by Barack Obama
noun/noun by Barack Obama
noun/verb by Barack Obama
verb/verb by Barack Obama
word pairs for Mitt Romney
adjective/adjective by Mitt Romney
adjective/adverb by Mitt Romney
adjective/noun by Mitt Romney
adjective/verb by Mitt Romney
adverb/adverb by Mitt Romney
adverb/noun by Mitt Romney
adverb/verb by Mitt Romney
noun/noun by Mitt Romney
noun/verb by Mitt Romney
verb/verb by Mitt Romney
The major contributors to Obama's word pair tag clouds are "right now", "small businesses", "middle-class families", and "well think". Romney knows what is "always best" and focuses on "efficient private" and "free ever", with heavy reference to finances.
Obama's noun/noun combination tag cloud is uniform in size, suggesting that he did not put undue emphasis on any one specific concept. He mentions "health care", "tax cuts", and "health insurance". Romney has some very large words in the noun/noun cloud, indicating heavy repetition of "health care" and "income tax".
Downloads
Debate transcript
Parsed word lists (word lists, part of speech lists, noun phrases, sentences)
Word clouds
Raw data structure
Please see the methods section for details about these files.