Word Analysis of 2008 U.S. Presidential Debates
Barack Obama vs. John McCain (1st debate)
26 September 2008
Word Statistics
Debate Word Count
Summary Word Count
The summary word count reports the total number of words and the
number of unique, non-stop words
used by each candidate. Word counts are expressed as both absolute and relative values.
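As a rough illustration of how such counts can be produced, here is a minimal Python sketch. The tokenizer, the tiny `STOP_WORDS` set and the function name `summary_word_count` are assumptions made for this example; the stop-word list and tokenization actually used for the tables are not given here.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "we", "i", "that", "is", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def summary_word_count(text):
    """Return the total word count and the number of unique non-stop words."""
    words = tokenize(text)
    non_stop = [w for w in words if w not in STOP_WORDS]
    return {
        "total": len(words),
        "unique_non_stop": len(set(non_stop)),
    }

# Toy sentence, not taken from the transcript.
sample = "We need to invest in energy and we need to invest in the economy"
counts = summary_word_count(sample)
print(counts)
```

A real stop-word list is much longer, so the absolute numbers depend heavily on which list is chosen.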
Table 1. Number of all words and unique words used by each speaker.
Table 1 Analysis
The candidates' time allowance was equal and, given that
both candidates used approximately the same number of words, it
can be concluded that their overall cadence of speech is similar.
Although I am not surprised that the total number of words used
is similar (Obama delivered 7,529 words, 7% more than
McCain's 7,043), the fact that the number of unique words was nearly identical for both candidates (1,376 vs 1,380) was a
shock. Though both Obama and McCain can be considered
articulate, Obama presents as verbally sharper than McCain and his
delivery has a greater nimbleness to it, which is reflected in his
slightly higher volume of word delivery. During his unscripted
deliveries, Obama's manner hints at a significant command of the
English language and suggests that his verbal abilities are not
stretched. For this reason, I was expecting his unique word count to
be higher.
The fact that the unique word count is nearly identical suggests a high
degree of rehearsal and preparation. It may well be that both
candidates spent a significant amount of time being coached to effect
the delivery that would reach the greatest number of people. It may
also be that through the process of political selection, both
candidates epitomize an archetype of spoken word delivery.
It also came as a surprise that the total number of unique words
used by the two candidates combined was only 2,115. Initially, I felt this to be
low: surely matters of state require more than two thousand
words. For both candidates, I suspect a significant amount of coaching
towards conformity to the average American's comprehension.
Table 1 Legend
Stop Word Contribution
In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words are frequently-used bridging words (e.g. pronouns and conjunctions) and do not carry inherent meaning. The fraction of words that are stop words is one measure of the complexity of speech.
Table 2. Expanded analysis of total, stop and non-stop word count.
Table 2 Analysis
Obama's absolute stop word count is higher than McCain's
(4,263 vs 3,922), but Obama's total word count is also
higher. Relative to the total number of words, Obama's and McCain's
stop word delivery is similar, at 56.6% and 55.7%, respectively.
Stop word counts do not reveal a significant difference between the two
candidates.
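The percentages follow directly from the counts quoted above, as this short Python check shows. The variable and function names are illustrative only.

```python
# Counts quoted in the analysis above: (stop words, total words).
reported = {
    "Obama": (4263, 7529),
    "McCain": (3922, 7043),
}

def stop_word_percentage(stop_count, total_count):
    """Stop words as a percentage of all words, rounded to one decimal."""
    return round(100 * stop_count / total_count, 1)

percentages = {name: stop_word_percentage(*c) for name, c in reported.items()}

# Non-stop word counts implied by simple subtraction.
non_stop = {name: total - stop for name, (stop, total) in reported.items()}

print(percentages)
print(non_stop)
```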
Table 2 Legend
All further analysis uses debate content that has been filtered for stop words.
Word frequency
The word frequency table summarizes the frequency with which words were used: specifically, the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
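One way to read these definitions is sketched below in Python. This is an interpretation, not necessarily the exact procedure behind the table: words are sorted by frequency, their occurrences are accumulated, and the reported value is the frequency at which the running total first reaches the requested fraction of the whole delivery. The function names and the toy word list are assumptions.

```python
from collections import Counter

def average_frequency(freqs):
    """Mean number of times each distinct word is used (freqs must be non-empty)."""
    return sum(freqs.values()) / len(freqs)

def weighted_cumulative_frequency(freqs, fraction):
    """Smallest frequency f such that words used f times or fewer
    account for at least `fraction` of all word occurrences."""
    total = sum(freqs.values())
    running = 0
    for f in sorted(freqs.values()):
        running += f  # each word contributes its own number of occurrences
        if running >= fraction * total:
            return f
    return max(freqs.values())

# Toy delivery of 14 words, not taken from the transcript.
words = ["tax"] * 6 + ["energy"] * 3 + ["oil"] * 2 + ["wind", "solar", "coal"]
freqs = Counter(words)

print(average_frequency(freqs))
print(weighted_cumulative_frequency(freqs, 0.5))
print(weighted_cumulative_frequency(freqs, 0.9))
```

In the toy data, half of the delivery is made up of words used 3 times or fewer, which is the same kind of statement the table makes at the 50% and 90% levels.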
Table 3. Average, 50%, and 90% weighted cumulative word frequencies (content filtered for stop words).
Table 3 Analysis
Obama's and McCain's average word frequencies are similar, at 2.63
and 2.51, respectively. 50% of Obama's speech is composed of
words he used 4 times or fewer, identical to McCain. 90% of
Obama's (McCain's) speech was composed of words used 24
(21) times or fewer, a difference that is not significant.
Table 3 Legend
Sentence Size
Table 4. Number of words in a sentence, as measured by average number of words, 50% and 90% weighted cumulative values for three word groups (all words, stop words and non-stop words).
Table 4 Analysis
Obama's sentences are 9% longer (17.4 vs 15.9) when all words
are considered and 8% longer (7.7 vs 7.1) when stop words are
removed. This sentence size difference is commensurate with the
difference in total word delivery, suggesting that the total number of
sentences by the candidates was similar. In fact, Obama delivered 449
sentences and McCain 460 (not shown).
Table 4 Legend
Part of Speech Analysis
In this section, words are broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.
Part of Speech Count
Table 5. Count of words (total and unique) categorized by part of speech (POS).
Table 5 Analysis
The composition of the candidates' speech by part of speech is remarkably similar. The relative breakdown of nouns, verbs, adjectives and adverbs for Obama is 53:25:15:7 and 54:26:14:5 for McCain. I am more than mildly surprised at such an incredible uniformity in the speech of the candidates. The ratio of noun:verb:adjective:adverb reduces to about 8:4:2:1.
Within each POS category, the number of unique words is nearly identical for both candidates, with Obama (McCain) having 39% (41%), 46% (45%), 45% (45%) and 34% (39%) of their nouns, verbs, adjectives and adverbs unique. The largest difference is in the use of adverbs, with McCain having 39.2% of all his adverbs unique, whereas Obama's adverbs have a unique component of 34.3%.
Note that Obama uses adverbs more than McCain (6.8% vs 5.5%) — his speech included 213 adverbs (73 unique) whereas McCain used 166 adverbs (65 unique).
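The total, unique and unique-fraction values per part of speech can be computed from a list of tagged words, as in the Python sketch below. The `(word, pos)` input format, the function name `pos_counts` and the toy data are assumptions; the last lines simply recompute the adverb percentages from the counts quoted above.

```python
from collections import defaultdict

def pos_counts(tagged_words):
    """Return {pos: (total, unique, unique_fraction)} for (word, pos) pairs."""
    totals = defaultdict(int)
    uniques = defaultdict(set)
    for word, pos in tagged_words:
        totals[pos] += 1
        uniques[pos].add(word.lower())
    return {
        pos: (totals[pos], len(uniques[pos]), len(uniques[pos]) / totals[pos])
        for pos in totals
    }

# Toy tagged words, not taken from the transcript.
tagged = [
    ("really", "adverb"), ("really", "adverb"), ("absolutely", "adverb"),
    ("energy", "noun"), ("tax", "noun"), ("energy", "noun"),
    ("invest", "verb"),
]
summary = pos_counts(tagged)
print(summary)

# Adverb counts quoted above: 213 total (73 unique) and 166 total (65 unique).
obama_adverb_unique_pct = round(100 * 73 / 213, 1)
mccain_adverb_unique_pct = round(100 * 65 / 166, 1)
print(obama_adverb_unique_pct, mccain_adverb_unique_pct)
```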
Table 5 Legend
Part of Speech Frequency
Table 5. Frequency of words by part of speech (POS).
Table 5 Analysis
This table hints at a significant difference in verb and adverb use.
As indicated in the previous table, McCain used fewer total adverbs
than Obama (166 vs 213), but his unique adverb fraction was higher
(39.2% vs 34.3%). It looks like Obama really likes adverbs, and really
likes repeating them too. Obama's average adverb frequency was
2.92, compared to McCain's 2.55. Moreover, 90% of Obama's
adverbs were used with a frequency of 36 times or less, whereas 90% of
McCain's adverbs were used with a frequency of 13 or less.
Obama, however, is significantly less repetitive with verbs, with 90% of his verbs
used 16 times or less, compared to 90% of McCain's verbs, which
were used 25 times or less. Thus, although the candidates' total
and unique verb counts were similar (see previous table), Obama's
verb frequency distribution was skewed towards less repetition.
Table 5 Legend
Part of Speech Pairing
Through word pairing, I attempt to capture the contextual use of parts of speech within a sentence and extract concepts from the text. Specifically, unique pairs of words indicate complexity and inter-relatedness between concepts in a sentence.
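The exact pairing rule is not spelled out here, so the following Python sketch rests on an assumption: every unordered pair of non-stop words within a sentence forms a pair, with the two words written in alphabetical order (which matches the look of pairs such as "care health" quoted later). The function names and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def pair_counts(tagged_sentences):
    """Count word pairs per POS-pair category.

    Each sentence is a list of (word, pos) tuples with stop words removed.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            category = "/".join(sorted((p1, p2)))             # e.g. "noun/verb"
            pair = " ".join(sorted((w1.lower(), w2.lower())))  # e.g. "care health"
            counts[category][pair] += 1
    return counts

def total_and_unique(counter):
    """Total number of pairs and number of distinct pairs in one category."""
    return sum(counter.values()), len(counter)

# Toy tagged sentences, not taken from the transcript.
sentences = [
    [("health", "noun"), ("care", "noun"), ("costs", "noun"), ("rising", "verb")],
    [("health", "noun"), ("care", "noun"), ("matters", "verb")],
]
counts = pair_counts(sentences)
for category, counter in sorted(counts.items()):
    print(category, total_and_unique(counter))
```

Under this rule the number of pairs grows roughly with the square of sentence length, so longer sentences produce noticeably more pairs.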
Table 6a (Barack Obama). Word pairs (total and unique) categorized by part of speech (POS) for Barack Obama.
Table 6b (John McCain). Word pairs (total and unique) categorized by part of speech (POS) for John McCain.
Table 6c (Barack Obama vs John McCain). Word Pairs (total and unique) categorized by part of speech (POS) for both candidates.
Table 6 Analysis
Obama had consistently more total and unique pairings than
McCain. This is largely due to the fact that Obama had longer
sentences (7.7 non-stop words) than McCain (7.1 non-stop
words).
The largest pairing difference was seen in the adverb/adverb
category, where Obama had nearly twice as many pairings (84 vs 49)
as McCain. In other POS pairing categories, McCain's numbers
were consistently 70-85% of Obama's.
When unique pairings are compared, the candidates fare more
equally, both having 80-90% of pairings unique. The only exception was
adverb/adverb pairs, with Obama's unique component being 78.6%
compared to McCain's 91.8% (Obama repeats adverbs).
Table 6a,b Legend
Table 6c Legend
Word usage
This section enumerates words that were unique to a candidate
(i.e. used by one candidate but not the other). For a given part of
speech, the table breaks down the number of words that were spoken by
only one of the candidates or both candidates (intersection). The last
row includes all words (union).
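The exclusive, shared and union groups map directly onto set operations. A minimal Python sketch, with generic speaker names and toy word lists that are assumptions for illustration:

```python
def usage_breakdown(words_a, words_b):
    """Split the vocabulary of two speakers into exclusive and shared sets."""
    a, b = set(words_a), set(words_b)
    return {
        "only_a": a - b,   # used by speaker A but not B
        "only_b": b - a,   # used by speaker B but not A
        "both": a & b,     # intersection
        "union": a | b,    # all words
    }

# Toy word lists, not taken from the transcript.
speaker_a = ["invest", "energy", "tax", "nuclear", "invest"]
speaker_b = ["nuclear", "tax", "threat", "control"]

groups = usage_breakdown(speaker_a, speaker_b)
for name, words in groups.items():
    print(name, sorted(words))
```

The three disjoint groups always add up to the union, which is a handy consistency check on any row of the table.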
Table 7. Total and unique words used exclusively by a candidate or by both candidates.
Table 7 Analysis
Previous tables indicated that speech delivery for the candidates
is incredibly uniform. This table shows each candidate's
contribution to unique words in the debate.
There were a total of 1,034 unique nouns used in the debate, with
371 (35.9%) used only by Obama, 389 (37.6%) by McCain and 274 (26.5%)
by both. In fact, for all parts of speech the candidates had more
words that they used exclusively than those they shared with each
other. Even for adverbs, which is the least populated group of words,
the candidates shared only 30 adverbs, and had 43 (Obama) and 35
(McCain) that they used exclusively.
It is not surprising that the proportion of unique words was larger
for the set of exclusive words than for the set of shared
words. Typically, the unique proportion within exclusive words was
around 60-70%, but much lower, at 12-15%, for shared words. This
indicates that the words spoken by both candidates were repeated much
more frequently.
Table 7c Legend
Noun Phrase Usage
Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness.
Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).
The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
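To make the top-level/derived distinction concrete, here is a small Python sketch. It assumes that "similar, longer phrase" means a longer phrase that contains the shorter one as a contiguous run of words; the actual criterion may differ. The function names and the toy phrases are hypothetical.

```python
def contains(longer, shorter):
    """True if the token list `shorter` appears contiguously inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def split_phrases(phrases):
    """Split distinct noun phrases into (top_level, derived) lists."""
    tokenized = {p: p.split() for p in set(phrases)}
    top_level, derived = [], []
    for phrase, tokens in tokenized.items():
        has_parent = any(
            contains(other_tokens, tokens)
            for other, other_tokens in tokenized.items()
            if other != phrase
        )
        (derived if has_parent else top_level).append(phrase)
    return sorted(top_level), sorted(derived)

# Toy noun phrases, not taken from the transcript.
phrases = ["nuclear weapons", "weapons", "iranian nuclear weapons",
           "health care", "tax cut"]
top_level, derived = split_phrases(phrases)
print(top_level)
print(derived)
```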
Noun Phrase Count
This table reports the absolute number of noun phrases, which is related to the number of total words (specifically, nouns) delivered. The next table presents the number of phrases relative to the number of nouns.
Table 8. Number of noun phrases.
Table 8 Analysis
Obama has 4.0% more noun phrases than McCain (855 vs 851). The difference in the fraction of unique noun phrases, however, is smaller: Obama's and McCain's noun phrase uniqueness is 84.3% and 83.3%, respectively. Relative to the number of noun phrases, the number of top-level phrases is similar between them, as is the top-level uniqueness ratio.
Table 8c Legend
Noun Phrase Richness
The previous table presented the total number of noun phrases, which can be equated to individual concepts. In this table, this value is shown relative to the number of nouns used. This ratio can be interpreted as richness: how many noun phrases were constructed per noun.
Table 9. Number of noun phrases relative to the number of nouns.
Table 9 Analysis
The ratios here are very similar. Extremely similar, in fact, with the exception of the ratio of unique noun phrases to unique nouns, which is 1.16 for Obama and 1.07 for McCain. The interpretation is that Obama constructed a greater diversity of distinct concepts with his nouns.
Table 9c Legend
Noun Phrase Frequency and Size
Table 10. Noun phrase frequency, word count and unique word count.
Table 10 Analysis
Values are nearly identical for both candidates. Both repeat noun phrases an average of 1.19-1.20 times, and have 2.73-2.76 words per noun phrase.
Table 10c Legend
Windbag Index
The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.
Table 11. Windbag Index for each speaker. The higher the value, the greater the degree of repetition in the speech.
Table 11 Analysis
Obama's Windbag Index is 14.7% higher than McCain's (422 vs 368).
The index is a compound score, with contributions from nine terms. Individually, Obama does better at the verb, adjective and noun phrase components. McCain, on the other hand, has superior contributions from word counts, nouns and adverbs.
Table 11c Legend
Tag Clouds
In the tag clouds below, the size of the word is proportional to
the number of times it was used by a candidate (tag cloud details).
Not all words from a group used to draw the cloud fit in the
image. Specifically, less frequently used words for large word groups
fall outside the image.
Debate Tag Clouds for Each Candidate — All Words
Each candidate's debate portion was extracted and frequencies were
compiled for each part of speech (noun, verb, adjective, adverb), with
words colored by their part of speech category. The words in these
tag clouds include words unique to one candidate as well as words used by
both candidates. For other tag clouds below, only words unique to a
candidate are used.
Keep in mind that the word sizes between tag clouds cannot be
directly compared, since the minimum and maximum size of the words in
each tag cloud is the same. However, the distribution of sizes within
a tag cloud reflects the frequency distribution of words (tag cloud details).
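The scaling can be pictured with the Python sketch below, which assumes a simple linear mapping from frequency to font size between a fixed minimum and maximum. The function name, the size limits and the toy frequencies are assumptions; the actual tag cloud software may use a different mapping.

```python
def font_sizes(freqs, min_size=10, max_size=48):
    """Map word frequencies linearly onto [min_size, max_size] within one cloud."""
    lo, hi = min(freqs.values()), max(freqs.values())
    if hi == lo:
        # All words equally frequent: draw them all at the same size.
        return {word: max_size for word in freqs}
    return {
        word: min_size + (f - lo) / (hi - lo) * (max_size - min_size)
        for word, f in freqs.items()
    }

# Two toy clouds with very different absolute frequencies.
sizes_a = font_sizes({"tax": 20, "energy": 10, "oil": 5})
sizes_b = font_sizes({"nuclear": 4, "threat": 1})
print(sizes_a)
print(sizes_b)
```

In this toy case a word used 20 times in one cloud and a word used 4 times in the other are both drawn at the maximum size, which is why sizes should only be compared within a cloud.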
Debate Tag Cloud for Barack Obama — all words
Debate Tag Cloud for John McCain — all words
Debate Tag Cloud Analysis
The tag clouds for all words used by each candidate powerfully show
the difference in word frequency distribution between Obama and
McCain. In a few tables, I indicated the average and 50%/90% weighted
cumulative values for frequencies, but did not explicitly show a
distribution. Well, these tag clouds show that.
McCain's cloud has significantly more large
words than Obama's, indicating that McCain repeated
a larger subset of words throughout the debate. For example,
McCain's use of the word "nuclear" was nearly as frequent as his
use of the word "Obama". On the other hand, Obama's use of
"nuclear" was less frequent than his use of the word "McCain".
It is also interesting to see that Obama very frequently used
"John", calling his opponent by his first name, whereas McCain
never used Obama's first name, Barack.
Debate Tag Clouds for Each Candidate — Unique Words
The tag clouds below show only words used exclusively by a candidate. For
example, if candidate A used the word "invest" (any number of times),
but the other candidate B did not, then the word will appear in the
unique word tag cloud for candidate A.
Debate Tag Cloud for Barack Obama — words unique to Barack Obama
Debate Tag Cloud for John McCain — words unique to John McCain
Unique Word Tag Cloud Analysis
The tag cloud composed of words used exclusively by McCain
indicates a high degree of relative repetition of a small subset of
the words. The center of McCain's tag cloud is bloated with large
text, indicating high relative usage of words like "afraid",
"serious", "fragile", and "badly". Remember, these are words unique to McCain — Obama did not use these words.
Obama's tag cloud shows relatively less repetition among the
words used only by Obama. In general, the words used by Obama that
were not used by McCain are more uniformly distributed in
frequency. It is surprising that words like "recognize", "strategic",
"solve", "invest", and "agree" are unique to Obama (they
were not used by McCain).
Part of Speech Tag Clouds
In these tag clouds, words by both candidates were categorized on the
basis of exclusivity to a candidate. Words unique to each candidate
are drawn with a different color. Words used by both candidates are
shown in grey.
The size of the word is relative to the frequency for the candidate
— word sizes between candidates should not be used to indicate
difference in absolute frequency.
Words were further categorized by part of speech (noun, verb,
adjective, adverb) and individual tag clouds were prepared for each
category.
The last tag cloud in this section uses all four parts of speech
(noun + verb + adjective + adverb).
Tag Cloud of noun words, by speaker
Noun Tag Cloud Analysis
Not surprisingly, the candidates' most frequent noun was
Obama (for McCain) and John (for Obama). As I mentioned previously, it
is curious to find that McCain never referred to Obama by his first
name, Barack.
The cloud of green words around the central core of the tag cloud
indicates that nouns unique to Obama appeared at a higher relative
frequency than those unique to McCain.
Some interesting nouns for Obama are "alternative", "fundamentals",
"medicare", and "diplomacy". On the other hand, words like
"restraint", "failure", "corruption" and "maverick" are unique to
McCain.
Tag Cloud of verb words, by speaker
Verb Tag Cloud Analysis
The top verb unique to McCain was "control", closely followed by
"fought" and "succeed", followed by verbs like "defeat", "win", and
"legitimize". For Obama, the top unique verb was "getting", followed
by "invest", "funded", "recognize", "agree" and "rebuild", but those
were of relatively lower frequencies than for words at the same rank
in McCain's list. McCain repeats strong verbs.
If the verbs are an indication of action planned for and supported
by the candidates, then McCain is someone who wishes to "legitimize"
and "succeed [at] control", whereas Obama is more conciliatory and
positive with "invest", "focused", "solve" and "recognize".
Tag Cloud of adjective words, by speaker
Adjective Tag Cloud Analysis
McCain's exclusive use of "afraid", "serious" and "fragile" is interesting and hints at fear mongering.
Tag Cloud of adverb words, by speaker
Adverb Tag Cloud Analysis
Adverbs are the least frequent of the four parts of speech, so the
tag cloud here is less complex. Both candidates use strong and certain
action modifiers like "completely" (McCain) and "absolutely"
(Obama). As for other parts of speech, McCain had a high relative
frequency of terms unique to him, which is evident from the larger number of large
blue words in this tag cloud.
It is interesting to see Obama exclusively use words like "responsibly" and "structurally".
Tag Cloud of all words, by speaker
All Tag Cloud Analysis
When all parts of speech are combined into one tag cloud,
Obama's unique words swamp out those of McCain,
suggesting that when parts of speech are combined, Obama repeated
terms exclusive to him more frequently.
Word Pair Vignette Tag Clouds for Each Candidate
Tag Cloud of word pairs by Barack Obama
adjective/adjective by Barack Obama
adjective/adverb by Barack Obama
adjective/noun by Barack Obama
adjective/verb by Barack Obama
adverb/adverb by Barack Obama
adverb/noun by Barack Obama
adverb/verb by Barack Obama
noun/noun by Barack Obama
noun/verb by Barack Obama
verb/verb by Barack Obama
Word Pair Tag Cloud Analysis for Barack Obama.
The major contributors to Obama's word pair tag clouds are
open-ended word pairs such as "last several" (adjective/adjective),
"correct just" (adjective/adverb) and "mccain senator" (noun/noun). A
couple of concepts such as "al qaeda" (qaeda was tagged as a verb by
the Brill tagger) and "north korea" were prominent, but these are proper
nouns and reflect the topic under discussion.
Obama touched on many concepts, as indicated by the relatively flat
distribution of sizes in the noun/noun tag cloud. Some of these were
"care health", "biodiesel energy", "oil world", "energy wind". Some
curious ones were "john spending", "crisis day", "afghanistan iraq",
"deal russia". These should be contrasted to noun/noun pairs for McCain (below), which focused on threats and the military.
Tag Cloud of word pairs by John McCain
adjective/adjective by John McCain
adjective/adverb by John McCain
adjective/noun by John McCain
adjective/verb by John McCain
adverb/adverb by John McCain
adverb/noun by John McCain
adverb/verb by John McCain
noun/noun by John McCain
noun/verb by John McCain
verb/verb by John McCain
Word Pair Tag Cloud Analysis for John McCain.
McCain's pair tag clouds have a significantly different
morphology than those of Obama. Primarily, due to McCain's
repetitive use of certain words, the tag clouds are overwhelmed with
these frequent (therefore large) pairs.
His adjective/noun tag cloud has an apocalyptic theme: "nuclear
threat", "important thing", "long way", and (this is fascinating) "old
russian" and "next states".
The noun/noun tag cloud size distribution is relatively flat, like
that of Obama, and indicates topics such as "threat weapons",
"business tax", "ahmadinejad extermination" and "aggression
georgia". The majority of McCain's noun/noun concepts were
threat- and military-related (contrast this to Obama, who was focused
more on energy and economy). Environment? What environment?
Downloads
debate transcript (courtesy of CNN).
parsed word lists (analyzed transcript, including words by speaker, by POS, and all POS pairings).
tag cloud images
data structure
Please see the methods section for details about these files.