Word Analysis of 2012 U.S. Presidential Debates
Joe Biden vs Paul Ryan
11 October 2012
Introduction
While the presidential candidates get three tries, there is only one vice-presidential debate. Let's see how Biden and Ryan fared.
Word Statistics
Debate Word Count
Summary Word Count
The summary word count reports the total number of words and the number of unique, non-stop words used by each candidate. Word counts are expressed as both absolute and relative values.
Table 1a
all words
Number of all words and unique words used by each speaker.
Table 1b
exclusive and shared words
Words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
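The exclusive and shared groups can be thought of as set operations on each speaker's vocabulary. The following is a minimal sketch under that assumption; the function name and the toy word lists are hypothetical and are not taken from the transcript or from the original analysis code.

```python
def exclusive_and_shared(words_a, words_b):
    """Split two speakers' vocabularies into exclusive and shared sets."""
    vocab_a, vocab_b = set(words_a), set(words_b)
    return {
        "only_a": vocab_a - vocab_b,   # used by A but not by B
        "only_b": vocab_b - vocab_a,   # used by B but not by A
        "shared": vocab_a & vocab_b,   # used by both speakers
    }

# Toy word lists for illustration only.
speaker_a = ["fact", "friend", "tax", "people", "tax"]
speaker_b = ["nuclear", "tax", "people", "attack"]

groups = exclusive_and_shared(speaker_a, speaker_b)
for name, members in groups.items():
    print(name, sorted(members))
```

A word counts as exclusive no matter how often it was used, so repeated words collapse to a single entry in each set.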
Table 1
commentary
Biden slowest speaker.
Biden spoke for longer (41:32 = 2,492 seconds) than Ryan (40:12 = 2,412 seconds) (debate timing by CNN).
Ryan used +16.2% (7,708 vs 6,631) more words than Biden — he spoke faster with a rate of 3.20 words/second, +20.3% (3.2 vs 2.66) faster than Biden's 2.66 words/second. Biden and Ryan delivered nearly the same relative number of unique words, with Biden edging Ryan slightly by Δrel=+1.1% (Δabs=+0.2%, 18.6% vs 18.4%).
Biden delivered -8.9% (6,631 vs 7,280) fewer words than Obama and spoke -6.0% more slowly (2.66 vs 2.83 words/second). Ryan's speed was also slower than that of his running mate, Romney, by -5.0% (3.20 vs 3.37), and he delivered slightly fewer words, -1.1% (7,708 vs 7,791).
The debate was more dynamic than the presidential debate, focusing on a greater variety of issues. Correspondingly, the vice-presidential candidates' unique word fractions were higher than those of their presidential counterparts in the first debate. Recall that Obama had a unique word fraction of 17.2% while Romney had 15.6%. This debate saw Biden with 18.6% and Ryan with 18.4% unique words.
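The speaking rates and relative differences quoted in the Table 1 commentary follow from simple arithmetic on the word counts and CNN timings. The sketch below reproduces them; the helper names are my own, and the percent difference is assumed to be taken relative to the second value in each "A vs B" pair.

```python
def words_per_second(words, seconds):
    """Average speaking rate over the whole debate."""
    return words / seconds

def relative_change(value, reference):
    """Percent difference of value relative to reference."""
    return 100.0 * (value - reference) / reference

# Word counts and speaking times as quoted in the commentary.
biden_rate = words_per_second(6631, 2492)
ryan_rate = words_per_second(7708, 2412)

print(round(biden_rate, 2), round(ryan_rate, 2))
print(round(relative_change(7708, 6631), 1))   # Ryan vs Biden, total words
print(round(relative_change(3.20, 2.66), 1))   # Ryan vs Biden, rounded rates
print(round(relative_change(18.6, 18.4), 1))   # unique word fraction
```

The +20.3% rate difference is obtained from the rounded rates (3.20 vs 2.66); with unrounded rates the difference comes out slightly smaller, at roughly +20.1%.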
Stop Word Contribution
In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
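As a rough illustration of the partition, the sketch below splits a sentence into stop and non-stop words and computes the non-stop fraction. The stop list here is a tiny hypothetical sample, not the full list used in the analysis, and the regular-expression tokenizer is an assumption as well.

```python
import re

# A tiny illustrative stop list; the analysis uses its own, much longer list.
STOP_WORDS = {"a", "an", "and", "but", "he", "i", "if", "it", "of",
              "that", "the", "they", "to", "we", "you"}

def partition_stop_words(text, stop_words=STOP_WORDS):
    """Return (stop words, non-stop words) in order of appearance."""
    words = re.findall(r"[a-z']+", text.lower())
    stop = [w for w in words if w in stop_words]
    non_stop = [w for w in words if w not in stop_words]
    return stop, non_stop

stop, non_stop = partition_stop_words(
    "We need more, but they never missed a mortgage payment."
)
non_stop_fraction = len(non_stop) / (len(stop) + len(non_stop))
print(stop)
print(non_stop)
print(non_stop_fraction)
```

The result depends heavily on the stop list chosen, so fractions computed with a different list are not comparable to the ones in the tables.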
Table 2a
non-stop words
Counts of stop and non-stop words.
Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Table 2
commentary
Ryan saying more, using speech more distinctive than Biden.
Ryan's fraction of non-stop words is the larger of the two at 45.2%, +5.6% (45.2 vs 42.8) higher than Biden's.
Biden had significantly fewer exclusive non-stop words than Ryan, -24.1% (578 vs 762). Compare this to Obama, who had +6.8% (597 vs 559) more exclusive words than Romney in the first debate. The spread of exclusive words between Biden/Ryan and Obama/Romney is quite different. Whereas Obama and Romney distinguished themselves with roughly the same number of words (597 vs 559), Ryan had a great deal more than Biden (762 vs 578).
All further word use statistics represent content that has been filtered for stop words.
Word frequency
The word frequency table summarizes the frequency with which words
were used. I show the average word frequency and the weighted
cumulative frequencies at 50 and 90 percentile. The average word
frequency indicates how many times, on average, a word is used. For a
given fraction of the entire delivery, the weighted cumulative
frequency indicates the largest word frequency within this fraction
(details about weighted
cumulative distribution).
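One plausible reading of the weighted cumulative frequency is sketched below: words are ordered from least to most frequent, each word contributes its own count to a running total, and the value reported is the frequency of the word at which the running total first covers the requested fraction of the delivery. This is my interpretation of the description above, not the author's code, and the word list is made up.

```python
from collections import Counter

def weighted_cumulative_frequency(words, fraction):
    """Largest word frequency within the given fraction of all words spoken."""
    counts = Counter(words)
    total = sum(counts.values())
    running = 0
    # Walk through words from least to most frequently used.
    for freq in sorted(counts.values()):
        running += freq  # each word is weighted by how often it was used
        if running >= fraction * total:
            return freq
    return 0  # only reached for an empty word list

# Toy delivery: 10 words spoken, 4 of them unique.
words = ["tax"] * 5 + ["jobs"] * 3 + ["plan"] + ["medicare"]

average = len(words) / len(set(words))
print(average)
print(weighted_cumulative_frequency(words, 0.5))
print(weighted_cumulative_frequency(words, 0.9))
```

In this toy case half of the delivery is made up of words used three times or fewer, while reaching 90% requires including the word that was used five times.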
Table 3a
word use frequency
Average and 50%/90% percentile word frequencies.
Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
Table 3
commentary
Biden low on repetition. Ryan hammers on exclusive words.
Neither Biden, who was least likely to reuse words with a non-stop average frequency of 2.6, nor Ryan approaches Romney's average word frequency of 3.2.
Ryan liked to repeat words that were exclusive to him, with an average frequency +12.6% (1.7 vs 1.51) higher than Biden's.
Sentence Size
Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words.
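A minimal sketch of how per-sentence word counts might be gathered is shown below. The sentence splitter is deliberately naive and is an assumption: it breaks on any period, so a number such as "5.2 million" would be split in two, and a real transcript would need a more careful sentence tokenizer.

```python
import re

def sentence_word_counts(text):
    """Split text on ., ! or ? and count the words in each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]

counts = sentence_word_counts("Not true. It is a fact. Look at the numbers!")
average = sum(counts) / len(counts)
print(counts, round(average, 2))
```

Counting only stop or non-stop words per sentence would follow the same pattern, with the word list filtered before taking its length.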
Table 4
commentary
Both candidates' sentences shorter than those of presidential candidates. Ryan's speech terse.
Both candidates' sentences were short (Biden 5.5, Ryan 5.6 non-stop words/sentence). Compare these numbers to those of Obama, whose sentences had more structure with an average length of 8.4 non-stop words.
Biden's longest sentence was 91 words. Not quite Obama's 112-word whopper, but close.
Biden, 91 words — We need more, but 5.2 million - if they'd get out of the way, if they'd get out of the way and let us pass the tax cut for the middle class, make it permanent, if they get out of the way and pass the - pass the jobs bill, if they get out of the way and let us allow 14 million people who are struggling to stay in their homes because their mortgages are upside down, but they never missed a mortgage payment, just get out of the way.
Ok, that was not a fair sentence. He was obviously blabbering. Let's look at his next longest, at 90 words.
Biden, 90 words — And instead of signing pledges to Grover Norquist not to ask the wealthiest among us to contribute to bring back the middle class, they should be signing a pledge saying to the middle class we're going to level the playing field; we're going to give you a fair shot again; we are going to not repeat the mistakes we made in the past by having a different set of rules for Wall Street and Main Street, making sure that we continue to hemorrhage these tax cuts for the super wealthy.
Ryan on the other hand had a relatively terse longest sentence.
Ryan, 44 words — But we want to see the 2014 transition be successful, and that means we want to make sure our commanders have what they need to make sure that it is successful so that this does not once again become a launching pad for terrorists.
Part of Speech Analysis
In this section, word frequency is broken down by part of
speech (POS). The four POS groups examined are nouns, verbs,
adjectives and adverbs. Conjunctions and prepositions are not
considered. The first category (n+v+adj+adv) is composed of all four
POS groups.
Part of Speech Count
Table 5
part of speech count
Count of words categorized by part of speech (POS).
Table 5
commentary
Large spread in relative unique adjectives - Biden more diverse.
Ryan pounced on Biden with a greater total number of unique nouns, verbs and adjectives. In relative terms, Biden had a higher fraction of unique words across all parts of speech. He particularly distinguished himself with more diverse use of adjectives.
If you compare the relative unique word percent spreads to the first presidential debate, the difference is stark. Whereas Obama and Romney had their relative unique word fractions within 2-3% for each part of speech, the difference between Biden and Ryan grows to as much as 7.5% for adjectives.
Part of Speech Frequency
Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS).
Table 5
commentary
Adverbs repeated less often than verbs.
Noun repetition was similar for both candidates. Ryan repeated verbs, adjectives and adverbs at levels consistently higher than Biden. Both candidates repeated adverbs less frequently than their verbs, which does not hold true for Obama or Romney.
Part of Speech Pairing
Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
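The pairing procedure is not spelled out here, so the sketch below assumes the simplest version: every unordered pair of tagged words within the same sentence is counted and filed under its part-of-speech combination. Under this assumption a sentence with n tagged words yields n(n-1)/2 pairs, which is why the pair count grows with sentence length. The tags and sentences are hand-made toy data.

```python
from collections import Counter
from itertools import combinations

def pos_pairs(tagged_sentences):
    """Count unordered word pairs, grouped by part-of-speech combination."""
    pairs = Counter()
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in combinations(sentence, 2):
            # Sort by POS so noun/verb and verb/noun share one bucket.
            (pa, wa), (pb, wb) = sorted([(p1, w1), (p2, w2)])
            pairs[(f"{pa}/{pb}", f"{wa} {wb}")] += 1
    return pairs

# Toy sentences as (word, part of speech) tuples.
tagged = [
    [("terrorist", "adjective"), ("attack", "noun")],
    [("terrorist", "adjective"), ("attack", "noun"), ("failed", "verb")],
]

pairs = pos_pairs(tagged)
total_pairs = sum(pairs.values())
unique_pairs = len(pairs)
print(total_pairs, unique_pairs)
for (pos, words), count in sorted(pairs.items()):
    print(pos, words, count)
```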
Table 6a
part of speech pairing — Joe Biden
Word pairs (total and unique) categorized by part of speech (POS)
Table 6b
part of speech pairing — Paul Ryan
Word pairs (total and unique) categorized by part of speech (POS)
Table 6c
unique part of speech pairing — candidate comparison
Unique word pairs categorized by part of speech (POS)
Table 6
commentary
Ryan more likely to combine adjectives. Biden forms more complex sentences with adverbs.
Ryan had a consistently higher number of adjective/noun, adjective/verb and adjective/adjective pairings than Biden, suggesting that he inserted adjectives into more sentences than Biden. Biden had more adjective/adverb combinations, which indicate sentences in which both concept and action are modified.
Exclusive and Shared Usage
This section enumerates words that were exclusive to a
candidate (e.g. used by one candidate but not the other). This content
provides insight into what the candidates' priorities are and
reveals differences in perspective on similar topics.
For a given part of speech, the table breaks down the number of
words that were spoken by only one of the candidates or both
candidates (intersection). The last row includes words spoken by
either candidate (union).
Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
Table 7
commentary
Candidates share fewer nouns and adjectives than presidential counterparts.
With the exception of adverbs, Ryan had significantly larger total exclusive words for each part of speech. The candidates shared 25.8% of nouns, lower than the 29.3% that we saw with Romney and Obama in their first debate. Shared verb and adverb fractions were similar. Much more exclusivity was seen in adjectives: only 19.5% of adjectives were shared, compared to 27.4% in the first presidential debate.
Noun Phrase Usage
Noun phrases were extracted from the text and analyzed for
frequency, word count, unique word count and richness. Single-word phrases were not counted.
Top-level noun phrases are those without a parent noun phrase (a
parent phrase is one that is a similar, longer phrase). Derived noun
phrases are those with a parent (more
details about noun phrase analysis).
The top-level noun phrases can be interpreted as independent
concepts. Derived noun phrases can be interpreted as variants on
concepts embodied by the top-level phrases.
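To make the top-level/derived distinction concrete, the sketch below treats a phrase as derived when its words appear as a contiguous run inside some longer phrase. The exact notion of a "similar" phrase is not defined here, so this containment test is only an assumption, and the phrases are toy examples.

```python
def is_contained_in(short, long):
    """True if the words of short appear contiguously inside the longer phrase."""
    s, l = short.split(), long.split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))

def classify_noun_phrases(phrases):
    """Return (top-level, derived) phrases, ignoring single-word phrases."""
    unique = {p for p in phrases if len(p.split()) > 1}
    top_level, derived = [], []
    for phrase in sorted(unique):
        if any(is_contained_in(phrase, other) for other in unique):
            derived.append(phrase)    # has a longer parent phrase
        else:
            top_level.append(phrase)  # no parent, an independent concept
    return top_level, derived

phrases = ["middle class tax cut", "tax cut", "middle class", "jobs bill", "tax"]
top_level, derived = classify_noun_phrases(phrases)
print(top_level)
print(derived)
```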
Noun Phrase Count and length
This table reports the absolute number of noun phrases, which is
related to the number of nouns, and their length.
Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Table 8
commentary
Ryan has much greater share of exclusive noun phrases.
Biden delivered almost the same number of top-level noun phrases as Obama (233 vs 234). Ryan's speech was terse with 0.47 unique noun phrases per unique noun (compare Romney at 0.42 and Obama at 0.43).
Exclusive and Shared Noun Phrase Count and length
Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words.
Whereas Obama and Romney had a similar number of exclusive top-level noun phrases in their first debate (226 vs 224), and Biden's level was similar (222), Ryan had +27.9% (284 vs 222) more than Biden.
Windbag Index
The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with a low degree of repetition and a large number of independent concepts.
Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
Table 10
commentary
Biden consistently less repetitive than Ryan.
Biden's Windbag Index is extremely low, -33.2% (199 vs 298) lower than Ryan's. Obama and Romney had values of 468 and 685 in their first debate. The two factors for which Biden's component was higher than Ryan's were the fraction of non-stop words (incidentally, this was also the fraction for which Obama's value was lower) and the fraction of noun phrases that were top-level.
Word Clouds
In the word clouds below, the size of the word is proportional to
the number of times it was used by a candidate (method details).
Not all words from a group used to draw the cloud fit in the image
— less frequently used words for large word groups may fall
outside the image.
All Words for Each Candidate
Each candidate's debate portion was extracted and frequencies were
compiled for each part of speech (noun, verb, adjective, adverb), with
words colored by their part of speech category.
The distribution of sizes within a tag cloud follows the frequency
distribution of words. However, word size cannot be compared between
clouds, since the minimum and maximum size of the words is fixed.
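The reason sizes cannot be compared between clouds can be seen in a small sketch. It assumes a linear mapping from word count to font size between a fixed minimum and maximum; the actual scaling used to draw the clouds may differ, and the function name, size limits and counts are hypothetical.

```python
def cloud_sizes(counts, min_size=10.0, max_size=60.0):
    """Map word counts linearly onto a fixed font size range."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        # All words equally frequent: draw them all at the same size.
        return {word: max_size for word in counts}
    scale = (max_size - min_size) / (hi - lo)
    return {word: min_size + (count - lo) * scale
            for word, count in counts.items()}

# Two toy clouds whose absolute counts differ by a factor of ten.
frequent = cloud_sizes({"people": 30, "tax": 20, "fact": 10})
rare = cloud_sizes({"people": 3, "tax": 2, "fact": 1})
print(frequent)
print(rare)
```

Both clouds come out with identical sizes even though one speaker used each word ten times as often, because every cloud is stretched to fill the same size range.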
Debate Word Cloud for Joe Biden - all words
Debate Word Cloud for Paul Ryan - all words
Biden's focus is "american" and "middle-class". For Ryan, it's the generic "people".
Exclusive Words for Each Candidate
The clouds below show words used exclusively by a candidate. For
example, if candidate A used the word "invest" (any number of times),
but candidate B did not, then the word will appear in the exclusive word
tag cloud for candidate A.
Words exclusive to Joe Biden
Words exclusive to Paul Ryan
Biden has "fact", "friend" and modifies with "particularly" and "mostly". Ryan fulfils the Republican fearmongering agenda with "nuclear", "lose", "failed" and "attack". Recall that Romney had similarly devastating words ("lose", "hurt" and "crushed").
Part of Speech Word Clouds
In these clouds, words from each major part of speech were colored
based on whether they were exclusive to a candidate or shared by the
candidates.
The size of the word is relative to the frequency for the candidate
— word sizes between candidates should not be used to indicate
difference in absolute frequency.
Cloud of noun words, by speaker
Biden thinks reality is a buddy, with "fact" and "friend" being prominent. Ryan's language is combative with "weapons", "marine" and "terrorists". Ryan does say "youtube", which means he has the Google.
Cloud of verb words, by speaker
Biden uses "love", which is wonderful, and Ryan has "failed", "try", "lose" and "run".
Cloud of adjective words, by speaker
When Ryan mentions "lower", it's not about expectations. Tax, surely.
Cloud of adverb words, by speaker
Ryan has "pretty" and "safely", slightly unusual words given the threatening nature of his frequently used verbs and nouns. His "faster" is matched by Romney's "always".
Cloud of all words, by speaker
Relatively few exclusive words in the center are surrounded by a cloud of shared (grey) words. Quite a different view than the first presidential debate, for which this cloud was full of large words with fewer grey contributions.
Word Pair Clouds for Each Candidate
word pairs for Joe Biden
adjective/adjective by Joe Biden
adjective/adverb by Joe Biden
adjective/noun by Joe Biden
adjective/verb by Joe Biden
adverb/adverb by Joe Biden
adverb/noun by Joe Biden
adverb/verb by Joe Biden
noun/noun by Joe Biden
noun/verb by Joe Biden
verb/verb by Joe Biden
word pairs for Paul Ryan
adjective/adjective by Paul Ryan
adjective/adverb by Paul Ryan
adjective/noun by Paul Ryan
adjective/verb by Paul Ryan
adverb/adverb by Paul Ryan
adverb/noun by Paul Ryan
adverb/verb by Paul Ryan
noun/noun by Paul Ryan
noun/verb by Paul Ryan
verb/verb by Paul Ryan
Biden mixes "believe" a lot, which shows up in the verb pairs as "believe free", "aid believe" and "making believe". The last pair sounds vaguely more like Romney than Biden.
Ryan fulfils the mandate to spread fear with "terrorist attack". "Infringing catholic" comes from his sentence
They're infringing upon our first freedom, the freedom of religion, by infringing on Catholic charities, Catholic churches, Catholic hospitals.
Downloads
Debate transcript
Parsed word lists (word lists, part of speech lists, noun phrases, sentences)
Word clouds
Raw data structure
Please see the methods section for details about these files.