Word Analysis of 2012 U.S. Presidential Debates
Barack Obama (2008 vs 2012)
Introduction
In this analysis, the content of Obama's first 2008 debate (vs McCain) is compared to that of his first 2012 debate (vs Romney). The purpose of this comparison is to highlight the differences in Obama's debate delivery as a first-time candidate and as the incumbent.
The method of analysis is the same as for other debates, except that in this case the transcript is constructed by treating Obama in 2008 as one speaker and Obama in 2012 as another.
Word Statistics
Debate Word Count
Summary Word Count
The summary word count reports the total number of words and the
number of unique, non-stop words
used by each candidate. Word number is expressed as both absolute and
relative values.
Table 1a
all words
Number of all words and unique words used by each speaker.
Table 1b
exclusive and shared words
Words exclusive to speaker (e.g. speaker A
but not speaker B) and shared by speakers (speaker A
and B).
Obama delivered +3.3% (7,517 vs 7,280) more words in 2008 than in 2012. He was also less repetitive in 2008, delivering Δrel=+5.8% (Δabs=+1.0%, 18.2% vs 17.2%) more unique words.
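The relative (Δrel) and absolute (Δabs) differences quoted throughout can be reproduced with simple arithmetic. The sketch below is illustrative only; the function names are assumptions, and the inputs are the figures quoted in the paragraph above.

```python
def relative_change(new, old):
    """Relative difference of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


def absolute_change(new_pct, old_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


# Total words: 2008 (7,517) relative to 2012 (7,280).
total_words_change = relative_change(7517, 7280)

# Unique-word fraction: 18.2% in 2008 vs 17.2% in 2012.
unique_abs = absolute_change(18.2, 17.2)
unique_rel = relative_change(18.2, 17.2)

print(f"total words: {total_words_change:+.1f}%")
print(f"unique words: abs={unique_abs:+.1f}%, rel={unique_rel:+.1f}%")
```

The relative value divides the difference by the 2012 figure, which is why a one-point gap in the unique-word fraction corresponds to a relative difference of nearly six percent.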
Stop Word Contribution
In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
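As a rough illustration of this partition, the sketch below splits a sentence into stop and non-stop words. The stop list here is a short, hypothetical stand-in for the full list used in the analysis, and the tokenizer and sample sentence are assumptions made for the example.

```python
import re

# Hypothetical, abbreviated stop list; the analysis uses a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "i", "we", "you", "it",
              "to", "of", "in", "that", "is", "are", "on", "for"}


def tokenize(text):
    """Lowercase the text and extract words, keeping internal hyphens and apostrophes."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def partition(words, stop_words=STOP_WORDS):
    """Split a word list into (stop, non_stop) lists, preserving order."""
    stop = [w for w in words if w in stop_words]
    non_stop = [w for w in words if w not in stop_words]
    return stop, non_stop


text = "We need to invest in the middle-class and we need to cut the deficit."
words = tokenize(text)
stop, non_stop = partition(words)
stop_fraction = len(stop) / len(words)

print(non_stop)
print(f"stop word fraction: {stop_fraction:.0%}")
```

With a different stop list the fraction changes, so values are only comparable between speakers when the same list is applied to both.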
Table 2a
non-stop words
Counts of stop and non-stop words.
Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to speaker (e.g. speaker A
but not speaker B) and shared by speakers (speaker A
and B).
Stop word content was very similar in 2008 and 2012. The most popular non-stop words used exclusively in 2008 were "troops", "nuclear", "iran", and "al Qaeda", whereas in 2012 they were "insurance", "small", "deficit", "opportunity" and "middle-class" (not counting references to Obama's opponents, e.g. "John", "McCain", "governor" and "Romney").
Considerably more words were exclusive to the 2008 debate than to the 2012 debate: +20.5% (747 vs 620).
All further word use statistics represent content that has been filtered for stop words.
Word Frequency
The word frequency table summarizes the frequency with which words
were used. I show the average word frequency and the weighted
cumulative frequencies at 50 and 90 percentile. The average word
frequency indicates how many times, on average, a word is used. For a
given fraction of the entire delivery, the weighted cumulative
frequency indicates the largest word frequency within this fraction
(details about weighted
cumulative distribution).
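One possible reading of these two quantities is sketched below. It assumes that words are ordered from least to most frequent and that each word is weighted by its own frequency; the precise definition used in the analysis may differ, and the sample counts are made up.

```python
from collections import Counter


def average_frequency(counts):
    """Average number of times each unique word is used."""
    return sum(counts.values()) / len(counts)


def weighted_cumulative_frequency(counts, fraction):
    """Largest word frequency within the given fraction of the delivery.

    Words are ordered from least to most frequent and each word is weighted
    by its own frequency, so the running total counts spoken words rather
    than unique words.
    """
    total = sum(counts.values())
    running = 0
    for freq in sorted(counts.values()):
        running += freq
        if running >= fraction * total:
            return freq
    return max(counts.values())


# Made-up delivery of 10 words, 4 of them unique.
words = ["jobs"] * 5 + ["tax"] * 3 + ["energy"] + ["troops"]
counts = Counter(words)

print(average_frequency(counts))
print(weighted_cumulative_frequency(counts, 0.5))
print(weighted_cumulative_frequency(counts, 0.9))
```

Under this reading, a value of 3 at the 50th percentile means that half of the delivery is made up of words used three times or fewer.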
Table 3a
word use frequency
Average and 50%/90% percentile word frequencies.
Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A
but not speaker B) and shared by speakers (speaker A
and B).
The word frequency profile indicates that Obama's repetition increased in 2012 by +11.5% (2.9 vs 2.6).
Sentence Size
Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
Obama's sentences were +10.5% (8.4 vs 7.6) longer in 2012. Given that repetition was also higher, the overall delivery was less dense.
Part of Speech Analysis
In this section, word frequency is broken down by part of
speech (POS). The four POS groups examined are nouns, verbs,
adjectives and adverbs. Conjunctions and prepositions are not
considered. The first category (n+v+adj+adv) is composed of all four
POS groups.
Part of Speech Count
Table 5
part of speech count
Count of words categorized by part of speech (POS).
Part of speech analysis shows little change from 2008 to 2012. The proportions of nouns, verbs, adjectives and adverbs are similar. Verb use is slightly up in 2012 by Δrel=+5.2% (Δabs=+1.4%, 28.4% vs 27%).
In 2012 Obama used a greater fraction of unique adverbs, Δrel=+13.5% (Δabs=+4.8%, 40.3% vs 35.5%). Unique fraction of all other parts of speech was lower in 2012.
In fact, adverbs were the only part of speech for which Obama's delivery was richer in 2012, by +5.1% (62 vs 59). In 2012 he delivered -10.5% (543 vs 607) fewer nouns, -4.8% (356 vs 374) fewer verbs and -14.0% (252 vs 293) fewer adjectives.
Part of Speech Frequency
Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS).
Repetition was higher in 2012 for all parts of speech, except adverbs: nouns by -100.0% (0 vs 47), verbs by +9.3% (2.35 vs 2.15), and adjectives by +14.7% (2.03 vs 1.77). Adverb repetition fell by -11.7% (2.48 vs 2.81).
Part of Speech Pairing
Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
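To illustrate why the number of pairs depends on sentence length, here is a minimal sketch that assumes a pair is any two distinct words occurring in the same sentence. The pairing rule used in the analysis may be more selective, and the sample sentences are invented.

```python
from itertools import combinations


def sentence_pairs(words):
    """All unordered pairs of distinct words within one sentence."""
    unique = sorted(set(words))
    return list(combinations(unique, 2))


def pair_counts(sentences):
    """Return (total, unique) pair counts across a list of tokenized sentences."""
    all_pairs = []
    for words in sentences:
        all_pairs.extend(sentence_pairs(words))
    return len(all_pairs), len(set(all_pairs))


# Invented, already tokenized sentences with stop words removed.
sentences = [
    ["troops", "iran", "sanctions"],
    ["troops", "iran"],
    ["deficit", "insurance", "jobs", "taxes"],
]
total_pairs, unique_pairs = pair_counts(sentences)

print(total_pairs, unique_pairs)
```

Under this assumption a sentence with n distinct words yields n(n-1)/2 pairs, so longer sentences produce disproportionately more pairs, while a pair repeated across sentences adds to the total but not to the unique count.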
Table 6a
part of speech pairing — Barack Obama (2008)
Word pairs (total and unique) categorized by part of speech (POS)
Table 6b
part of speech pairing — Barack Obama (2012)
Word pairs (total and unique) categorized by part of speech (POS)
Table 6c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
The 2012 debate saw longer sentences, which tend to produce more word pairs, but fewer unique words. On balance, the number of noun/noun combinations was +15.7% (3,993 vs 3,451) higher in 2008, indicating that the 2008 debate was richer in concepts.
Exclusive and Shared Usage
This section enumerates words that were exclusive to a
candidate (e.g. used by one candidate but not the other). This content
provides insight into what the candidates' priorities are and
reveals differences in perspective on similar topics.
For a given part of speech, the table breaks down the number of
words that were spoken by only one of the candidates or both
candidates (intersection). The last row includes words spoken by
either candidate (union).
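The exclusive, shared (intersection) and union groups map directly onto set operations. The sketch below uses short, made-up word lists for the two speakers; only the grouping logic is meant to be illustrative.

```python
def exclusive_and_shared(words_a, words_b):
    """Group unique words into exclusive, shared and union sets."""
    a, b = set(words_a), set(words_b)
    return {
        "exclusive_a": a - b,  # used by A but not B
        "exclusive_b": b - a,  # used by B but not A
        "shared": a & b,       # intersection: used by both
        "union": a | b,        # used by either
    }


# Made-up word lists standing in for the two debates.
obama_2008 = ["troops", "iran", "energy", "tax", "troops"]
obama_2012 = ["insurance", "deficit", "energy", "tax", "tax"]
usage = exclusive_and_shared(obama_2008, obama_2012)

for group, members in usage.items():
    print(group, sorted(members))
```

Because sets discard repeats, these are counts of unique words; total counts would need the original word lists filtered by membership in each set.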
Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
The 2008 debate had +19.3% (717 vs 601) more exclusive words than 2012. The number of nouns and adjectives unique to the debate was +26.0% (368 vs 292) and +25.4% (168 vs 134) higher in 2008 than 2012, respectively. This is a good indication of just how much more vigorous and varied the 2008 debate was than 2012.
Noun Phrase Usage
Noun phrases were extracted from the text and analyzed for
frequency, word count, unique word count and richness. Single-word phrases were not counted.
Top-level noun phrases are those without a parent noun phrase (a
parent phrase is a similar, longer phrase). Derived noun
phrases are those with a parent (more
details about noun phrase analysis).
The top-level noun phrases can be interpreted as independent
concepts. Derived noun phrases can be interpreted as variants on
concepts embodied by the top-level phrases.
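The distinction between top-level and derived phrases can be sketched as follows. This example assumes that "similar" means the shorter phrase appears as a contiguous run of words inside the longer one; the analysis may use a different similarity rule, and the phrases are invented.

```python
def is_parent(parent, child):
    """True if `child` appears as a contiguous word run inside the longer `parent`."""
    p, c = parent.split(), child.split()
    if len(p) <= len(c):
        return False
    return any(p[i:i + len(c)] == c for i in range(len(p) - len(c) + 1))


def classify(phrases):
    """Split multi-word noun phrases into (top_level, derived) lists."""
    # Drop duplicates (keeping order) and single-word phrases, which are not counted.
    phrases = [p for p in dict.fromkeys(phrases) if len(p.split()) > 1]
    top_level, derived = [], []
    for phrase in phrases:
        if any(is_parent(other, phrase) for other in phrases):
            derived.append(phrase)
        else:
            top_level.append(phrase)
    return top_level, derived


# Invented noun phrases for illustration.
phrases = ["health care", "health care costs", "small business",
           "tax", "small business owners"]
top_level, derived = classify(phrases)

print(top_level)
print(derived)
```

Here "health care" is derived because "health care costs" contains it, while the two longer phrases have no parent and count as top-level.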
Noun Phrase Count and Length
This table reports the absolute number of noun phrases, which is
related to the number of nouns, and their length.
Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
As hinted by the analysis above, the drop in the number of concepts is reflected in the reduced number of noun phrases. Although in 2012 noun phrases were slightly longer, +1.3% (2.36 vs 2.33) for top-level phrases, there were -8.9% (234 vs 257) fewer of them.
Exclusive and Shared Noun Phrase Count and Length
Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
The 2008 debate had a greater variety of exclusive noun phrases, by +9.6% (252 vs 230).
Windbag Index
The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.
Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
The 2008 debate delivery was denser and more vigorous. Repetition plagued Obama in 2012, resulting in a +41.8% (468 vs 330) larger Windbag Index. The only terms in 2012 that contributed to a lower index were the fraction of non-stop words and the fraction of unique adverbs. All other quantities indicated greater repetition across all other parts of the delivery.
Word Clouds
In the word clouds below, the size of the word is proportional to
the number of times it was used by a candidate (method details).
Not all words from a group used to draw the cloud fit in the image
— less frequently used words for large word groups may fall
outside the image.
All Words for Each Candidate
Each candidate's debate portion was extracted and frequencies were
compiled for each part of speech (noun, verb, adjective, adverb), with
words colored by their part of speech category.
The distribution of sizes within a tag cloud follows the frequency
distribution of words. However, word size cannot be compared between
clouds, since the minimum and maximum size of the words is fixed.
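The reason sizes cannot be compared between clouds can be seen in a small sketch. It assumes a linear mapping from word count to font size between fixed limits; the actual scaling used to draw the clouds may differ, and the counts and size limits below are made up.

```python
def font_sizes(counts, min_size=10, max_size=60):
    """Map word counts linearly onto a fixed [min_size, max_size] range."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # If all counts are equal, draw every word at the maximum size.
        scale = (n - lo) / span if span else 1.0
        sizes[word] = min_size + scale * (max_size - min_size)
    return sizes


# Made-up counts for two clouds with very different absolute frequencies.
cloud_2008 = font_sizes({"troops": 20, "iran": 10, "energy": 5})
cloud_2012 = font_sizes({"insurance": 8, "deficit": 4, "jobs": 2})

print(cloud_2008["troops"], cloud_2012["insurance"])
```

In this example the most frequent word in each cloud is drawn at the same maximum size even though one was used 20 times and the other 8 times.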
Debate Word Cloud for Barack Obama (2008) - all words
Debate Word Cloud for Barack Obama (2012) - all words
Exclusive Words for Each Candidate
The clouds below show words used exclusively by a candidate. For
example, if candidate A used the word "invest" (any number of times),
but candidate B did not, then the word will appear in the exclusive word
tag cloud for candidate A.
Words exclusive to Barack Obama (2008)
Words exclusive to Barack Obama (2012)
Part of Speech Word Clouds
In these clouds, words from each major part of speech were colored
based on whether they were exclusive to a candidate or shared by the
candidates.
The size of the word is relative to the frequency for the candidate
— word sizes between candidates should not be used to indicate
difference in absolute frequency.
Cloud of noun words, by speaker
Cloud of verb words, by speaker
Cloud of adjective words, by speaker
Cloud of adverb words, by speaker
Cloud of all words, by speaker
Word Pair Clouds for Each Candidate
word pairs for Barack Obama (2008)
adjective/adjective by Barack Obama (2008)
adjective/adverb by Barack Obama (2008)
adjective/noun by Barack Obama (2008)
adjective/verb by Barack Obama (2008)
adverb/adverb by Barack Obama (2008)
adverb/noun by Barack Obama (2008)
adverb/verb by Barack Obama (2008)
noun/noun by Barack Obama (2008)
noun/verb by Barack Obama (2008)
verb/verb by Barack Obama (2008)
word pairs for Barack Obama (2012)
adjective/adjective by Barack Obama (2012)
adjective/adverb by Barack Obama (2012)
adjective/noun by Barack Obama (2012)
adjective/verb by Barack Obama (2012)
adverb/adverb by Barack Obama (2012)
adverb/noun by Barack Obama (2012)
adverb/verb by Barack Obama (2012)
noun/noun by Barack Obama (2012)
noun/verb by Barack Obama (2012)
verb/verb by Barack Obama (2012)
Downloads
Debate transcript
Parsed word lists (word lists, part of speech lists, noun phrases, sentences)
Word clouds
Raw data structure
Please see the methods section for details about these files.