Metrics of the candidates' speech structure fall within narrow tolerances, suggesting a high degree of wordsmithing and rehearsal. For example, the spread in the noun/verb/adjective/adverb ratio is very small, with candidates' values within 2% of each other. Relatively small differences are seen in unique word count and noun phrase profile.
The Obama/McCain debates began with balanced performances from both candidates but ended with Obama verbally overpowering McCain, delivering speech with more concepts and higher complexity.
When words exclusive to a candidate are considered, Obama uses verbs more frequently, and adjectives and adverbs much more frequently, than McCain. This suggests that he is more of a fluid and contextual thinker who does not seek to fit issues into pre-existing categories, unlike McCain, whose language metrics suggest a categorical approach. Obama's greater use of modifiers suggests an outlook that is more open to nuance and to the inter-relatedness of events and issues.
Analysis of the Biden/Palin debate suggests that speech of Vice-Presidential candidates is less complex and more repetitive than that of their Presidential counterparts, with Biden being the most repetitive speaker and Palin having the longest sentences, of all four debates.
McCain vs. Obama (1st debate) 26 Sep 2008
McCain vs. Obama (2nd debate) 7 Oct 2008
McCain vs. Obama (3rd debate) 15 Oct 2008
McCain vs. Obama (combined debate)
2020 Trump vs. Biden Debate word analysis
2016 Clinton vs. Trump Debate word analysis
2012 Obama vs. Romney Debate word analysis
1960 Nixon vs. Kennedy Debate word analysis
What Romney's and Obama's Body Language Says to Voters. Watch them cut, point and tilt-and-nod.
He counts your words (even those pronouns), an article in the NYT about Pennebaker's approach to analysis of debates and Al Qaeda communication
Lexical Analysis of Obama's and McCain's Speeches by Jacques Savoy
Presidential word use in State of the Union addresses by Jonathan Corum.
Naming Names, a NYT article about candidates' reference to each other during debates (uses Circos).
If you want more, get more. The debate continues endlessly with Tripsum: Trump Lorem Ipsum, randomly generated text based on transcripts from the 2016 Clinton vs Trump debates.
The analysis presented here explores word usage in the 2008 US Presidential and Vice-Presidential debates. The purpose is to explore the structure of speech, as characterized by the use of nouns, verbs, adjectives and adverbs, and noun phrases. The speech patterns of opposing candidates are compared in an effort to identify characteristic value and personality traits.
Specifically, I examine the debate for the following
The analysis reveals extremely surprising results, or at least what I believe to be surprising results. With three debates behind them, the verbal contest between Obama and McCain, while starting out relatively even, can be seen to tip very strongly in Obama's favour with respect to speech complexity and articulation.
A formal debate — and in this case three for a pair of candidates — serves as a great text for this kind of analysis. The debate format is highly controlled: each speaker is subjected to the same stimulus (question) and is given the same amount of time to respond. Debates therefore eliminate some of the variation that would appear in analysis of interviews and other unscripted speech, in which questions and topics may vary across samples.
For some cases, where the analysis focuses on proportions of parts of speech, collecting a variety of inputs from a speaker is helpful. However, when debates are analysed, we can extend the investigation to include examining words used exclusively by one speaker and not the other — these are extremely informative because they reflect how a speaker chose to address the issue.
The transcript for each debate is parsed to (a) identify the speaker, (b) remove stop words (words such as "do", "and" and "it"), (c) tag non-stop words with their part of speech (this is called tagging), and (d) identify noun phrases (this is called chunking).
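Steps (a) and (b) can be illustrated with a minimal sketch. It is not the code used for this analysis: the `SPEAKER: text` line format, the tiny stop word list and the function names are all assumptions made for illustration. Tagging and chunking (steps c and d) need a trained tagger and are left out.

```python
import re

# Tiny illustrative stop word list (an assumption; the list used in the
# analysis is not reproduced here).
STOP_WORDS = {"do", "and", "it", "the", "a", "to", "of", "is", "we", "i", "that", "in"}

# Assumed transcript convention: a speaker turn starts with "NAME: ".
SPEAKER_LINE = re.compile(r"^([A-Z]+):\s*(.*)$")

def split_by_speaker(transcript):
    """Group transcript text by speaker; unlabeled lines continue the current turn."""
    turns = {}
    current = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            current, text = match.group(1), match.group(2)
        else:
            text = line
        if current is not None:
            turns.setdefault(current, []).append(text)
    return {speaker: " ".join(parts) for speaker, parts in turns.items()}

def non_stop_words(text):
    """Lowercase the text, split it into words and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

# Hypothetical three-line transcript, used only to exercise the functions.
sample = "OBAMA: We need to invest in science.\nMCCAIN: I oppose it.\nOBAMA: And education."
by_speaker = split_by_speaker(sample)
obama_words = non_stop_words(by_speaker["OBAMA"])
```

The resulting per-speaker word lists would then be passed to a part-of-speech tagger and a noun phrase chunker.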
The tagged and chunked transcripts are analyzed to determine
I attempt to quantify the overall complexity of speech by a novel metric called the Windbag Index. This value is the product of 9 factors, each measuring uniqueness in a different aspect of speech (more about Windbag Index).
A full description of each of the steps in the analysis is available in the detailed methods section. I encourage you to read this section - it's not very technical - to become familiar with the approach and to gain greater versatility in interpreting the results. This work is also not without its share of limitations.
Detailed results and comments are available for each debate. The first Obama vs McCain debate has more in-depth analysis, since it is the first debate that I analyzed.
Results for Barack Obama vs. John McCain (1st debate)
Results for Barack Obama vs. John McCain (2nd debate)
Results for Barack Obama vs. John McCain (3rd debate)
Results for Barack Obama vs. John McCain (combined debates)
Results for Joe Biden vs. Sarah Palin
Each debate analysis report contains a great deal of data. To start, you may find these elements the most interesting
And, yes, Biden is a windbag. His speech has a Windbag Index of 606, highest of all candidates!
Nouns unique to a speaker are those mentioned by one speaker but not the other. Nouns, and other parts of speech, were identified in the transcripts using the Brill tagger (example tagged text - Obama portion, 1st debate). For example, Obama said biodiesel, children, education, medicare, perspective and science; McCain did not. McCain, on the other hand, said aggression, greed, failure, invasion, maverick and stubbornness; Obama did not.
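The idea of speaker-exclusive words can be sketched as a set difference on word counts. The function name and the toy word lists below are hypothetical and serve only to show the mechanics. They are not the debate data.

```python
from collections import Counter

def exclusive_words(words_a, words_b):
    """Return two Counters: words used only by speaker A, and only by speaker B."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    only_a = Counter({w: n for w, n in counts_a.items() if w not in counts_b})
    only_b = Counter({w: n for w, n in counts_b.items() if w not in counts_a})
    return only_a, only_b

# Toy noun lists (hypothetical); "economy" is shared and therefore dropped.
obama_nouns = ["education", "science", "economy", "education"]
mccain_nouns = ["greed", "economy", "maverick"]

only_obama, only_mccain = exclusive_words(obama_nouns, mccain_nouns)
top_obama = only_obama.most_common(1)
```

Keeping the counts, rather than plain sets, makes it possible to rank each speaker's exclusive words by frequency, which is what the word clouds display.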
Download high-resolution wordles: Obama noun/adjective pairs unique nouns McCain noun/adjective pairs unique nouns
speaker | vocabulary size | non-stop word fraction | unique word fraction | avg word frequency | avg sentence size |
---|---|---|---|---|---|
Barack Obama | | | | | |
John McCain | | | | | |
Joe Biden | | | | | |
Sarah Palin | | | | | |
speaker | POS ratio | POS unique fraction | average POS frequency |
---|---|---|---|
Barack Obama | | | |
John McCain | | | |
Joe Biden | | | |
Sarah Palin | | | |
speaker | Windbag Index |
---|---|
Barack Obama | 422, 405, 457 (avg 428) |
John McCain | 368, 352, 505 (avg 408) |
Joe Biden | 606 |
Sarah Palin | 535 |
Words in the tag clouds below are colored by part of speech: noun verb adjective adverb
The analysis presents a great deal of data, but from it two central themes arise.
First, the speech patterns of the candidates (especially those paired in a debate) are extremely similar.
Second, the speech of the vice-presidential candidates is less complex than that of the presidential candidates: uniqueness is lower and repetition is higher.
The first theme quickly became evident after analyzing the first debate. The speech pattern of Obama and McCain conformed to nearly identical word usage patterns. For example, vocabulary size for Obama and McCain (number of unique non-stop words used) is identical at 1,243. Their non-stop word fraction is also nearly identical at 43.4% and 44.3% for Obama and McCain, respectively. Likewise, the difference in their unique word fraction and average word frequency is only +4.3% and +4.8%, respectively.
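A rough sketch of how some of these statistics could be computed is shown below. Only the definition of vocabulary size (number of unique non-stop words) is stated in the text. The formulas for non-stop word fraction, average word frequency and average sentence size are my assumptions based on the metric names, and the unique word fraction is omitted because its exact definition is not given here.

```python
def speech_metrics(sentences, stop_words):
    """Compute basic word statistics.

    sentences  -- non-empty list of sentences, each a list of lowercase words
    stop_words -- set of words to ignore
    """
    all_words = [w for sentence in sentences for w in sentence]
    content = [w for w in all_words if w not in stop_words]
    vocabulary = set(content)
    return {
        # Number of unique non-stop words (definition from the text).
        "vocabulary_size": len(vocabulary),
        # Assumed: non-stop words as a fraction of all words.
        "non_stop_fraction": len(content) / len(all_words),
        # Assumed: average number of times each vocabulary word is used.
        "avg_word_frequency": len(content) / len(vocabulary),
        # Assumed: average number of non-stop words per sentence.
        "avg_sentence_size": len(content) / len(sentences),
    }

# Hypothetical two-sentence input.
toy_sentences = [["we", "need", "to", "invest"], ["we", "invest", "in", "science"]]
metrics = speech_metrics(toy_sentences, {"we", "to", "in"})
```

Under these assumed definitions, two speakers with the same vocabulary size but different average word frequencies simply differ in how many non-stop words they spoke in total.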
The reason for such conformity is anyone's guess, but several factors come to mind. First, the word usage profile could be a direct product of political selection. The fact that these debaters were drawn from the political arena, and have therefore had to function within a verbally demanding environment where nimbleness is perhaps rewarded over precision, may explain the similarity in their delivery. Their political contemporaries and the public may both have a finely tuned ear, though perhaps not the same one, for what is considered effective speech by a successful candidate.
Another factor, and one that I suspect is in play at all times, is the degree of premeditated wordsmithing in the preparation for these debates. I do not doubt that each of the candidates' preparation went well beyond casual enumeration of talking points. Certainly, to consistently win the hearts and minds (and ears) of their audience, each debater must have given significant consideration not just to what was said, but to how. I would not be surprised if candidates memorized what they considered to be particularly effective phrases for delivering content and trenchant retorts for contrasting their opponents. It is also likely that somewhere in the bowels of the political arena are linguistic specialists who have profiled precise (or as precise as can be measured) comprehension and literacy levels of the population, broken down by region and demographic.
Frequency analysis of the speech of Biden and Palin indicates a lower overall complexity - smaller vocabulary size and higher degree of repetition - than in the speech of Obama and McCain. Presumably these Vice-Presidential hopefuls want to come across as sufficiently articulate and effective to be compelling, but not so much as to steal the limelight from their running mates.
The largest difference in complexity is between Biden and McCain, whose average word frequency was 2.96 and 2.51, respectively. This, and other metrics that measure Biden's speech, are in line with the nicknames that suggest he is verbose but not articulate. And, although McCain's complexity drops significantly in the third debate, in which his verb/noun pairings suggest that he spends more time attacking than expounding his own plans, McCain is the least repetitive of all candidates.
Palin had the longest sentences, with an average length of 8.46 non-stop words. With nearly 1.5 words more than McCain, her sentences were the only ones that broke the 8 non-stop word barrier. This is a notable finding, especially in light of the fact that she had a significantly smaller vocabulary than McCain and Obama.
The relative ratio of each part of speech is extremely similar across all candidates: nouns compose 53–57% of speech, verbs 25–26%, adjectives 12–15% and adverbs 5–7%. The greatest fluctuation, both in usage and in the unique component, was in adverbs.
Given that adverbs are the least used of the four parts of speech examined, they serve as a natural unit. Adjectives are consistently only 2.2–2.6 times as frequent as adverbs, which suggests that the speakers want to qualify things much more than actions. Verbs are about 4.8 times as frequent as adverbs, which suggests that only 1 verb in 5 gets a modifier. This brings to mind the notion that politicians make promises by saying what they will do, but fail to deliver the clarity that would explain how it will be achieved. Obama, however, has a lower verb-to-adverb ratio, 3.7, suggesting that he might be one to characterize actions more frequently, by either defining limits or strengthening the verb. Obama also had the lowest noun-to-adverb ratio, 7.7, compared to 9.7 for McCain, 9.8 for Palin and 10.9 for Biden. This suggests that Obama's delivery was focused more on action and movement than on static concepts.
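Expressing part-of-speech usage in adverb units amounts to dividing each count by the adverb count. The sketch below uses hypothetical counts chosen only to give round numbers. They are not taken from the debate transcripts, and the function name is mine.

```python
def ratios_in_adverb_units(pos_counts):
    """Express each part-of-speech count as a multiple of the adverb count."""
    adverbs = pos_counts["adverb"]
    if adverbs == 0:
        raise ValueError("adverb count must be non-zero")
    return {pos: count / adverbs for pos, count in pos_counts.items()}

# Hypothetical counts, not from the transcripts.
example_counts = {"noun": 500, "verb": 240, "adjective": 120, "adverb": 50}
units = ratios_in_adverb_units(example_counts)

# If every adverb modified a different verb (a simplifying assumption),
# this is the share of verbs that receive a modifier.
modified_verb_share = 1 / units["verb"]
```

With a verb-to-adverb ratio of 4.8 the share comes out at roughly one verb in five. Adverbs can also modify adjectives and other adverbs, so this is better read as an upper bound.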
Adverbs are not Obama's only strength. Verbs, adjectives and adverbs consistently make up a greater part of his delivery than of McCain's (see table). If we look at words that are specific to a candidate, Obama's ratio of nouns:verbs:adjectives:adverbs in this word group is 48:30:21:7, whereas the values for McCain are 59:27:16:3.
Previous work by Pennebaker drew firm conclusions that Obama is a contextual thinker — one who uses more verbs and modifiers — who sees the world as having loosely defined boundaries between concepts. Contextual thinkers like to use adjectives and adverbs to loosen otherwise narrowly defined words (or those perceived as narrow) in an effort to express exceptions and nuance. McCain, on the other hand, has been characterized as a categorical thinker — one who heavily uses nouns.
Even when all of a candidate's words are considered, not just the ones attributable only to them, the discrepancy in part-of-speech proportions is strong. Obama's ratio is 51:27:16:7 and McCain's is 56:26:13:5. Adjectives and adverbs make up 22.4% of Obama's parts of speech whereas for McCain this fraction is only 18.3%. The difference in verb use is relatively small, but adjective and adverb usage is significantly different. Presumably when McCain uses a verb he sees a narrow definition. Obama, on the other hand, is more likely to add dimension to its meaning with an adverb.
The analysis of all three debates by Obama and McCain reveals distinguishing elements of their speech. It is not a large stretch to infer that these relate directly to their personalities, outlooks and policies.
Extremely informative are the word clouds of nouns and verbs, by speaker.
When frequencies of nouns unique to McCain are tallied, the top word is "Obama", which he used 111 times, with other top nouns being Iranians (8), greed (7), marines (6), institutions (6) and aggression (6). Since McCain actually used both "John" and "McCain", these words do not overwhelm nouns unique to Obama, who used words like science (5), consequence (6), notion (9), approach (6), and focus (7). I am more partial to Obama's nouns than to McCain's - they indicate openness to nuance (e.g. "notion") and recognition of the complexity in issues (e.g. "approach" and "consequence").
Relative use of McCain's unique verbs is greater than for Obama's set, and thus McCain's verbs outweigh Obama's in the unique verb word cloud. McCain uses oppose/opposes (8), secure (4), legitimize (4), realize (5) and watch (5). Obama has agree (15), invest (14), recognize (8), focused (6) and thinking (6). McCain stands out with strong and aggressive verbs, and repeats them to the same extent. The frequencies of Obama's unique verbs are more strongly skewed, with verbs like "agree" and "invest" being used nearly twice as frequently as his other unique verbs.
Unique word frequency statistics are extremely revealing about the manner in which the candidates chose to distinguish themselves. McCain's distinguishing nouns and verbs are more balanced, with a focus on threats, military and unilateral action. He chooses to spread his unique contribution evenly across these topics. Obama, on the other hand, concentrates his use on his top verb contributions.
The vocabulary size for each part of speech is remarkably similar for every candidate. The number of unique nouns, verbs, adjectives and adverbs ranged within 640-663, 359-377, 163-213 and 61-73, respectively. The largest difference was for adjectives, with Obama having the largest adjective vocabulary (213) and Biden the smallest (163).
In an effort to provide a single number that quantifies repetition in speech, I created the Windbag Index (details), which is a composite of measures of repetition in various aspects of speech.
The Windbag Index successfully captures the essence of the large number of individual metrics presented by this analysis. It can be seen that Obama and McCain cluster together, with a Windbag Index difference of 4.9%. Likewise Biden and Palin cluster together, with the difference between them being 13.3%. More importantly, the vice-presidential candidates are separated from the presidential candidates by a very large relative margin.
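The percentages quoted above can be reproduced from the Windbag Index table. The sketch below assumes that per-candidate averages are rounded to whole numbers, as in the table, and that each difference is taken relative to the smaller of the two values. Both are my inferences from the numbers rather than documented choices.

```python
# Windbag Index values per debate, copied from the table.
windbag = {
    "Barack Obama": [422, 405, 457],
    "John McCain": [368, 352, 505],
    "Joe Biden": [606],
    "Sarah Palin": [535],
}

def rounded_average(values):
    """Average rounded to a whole number, as shown in the table."""
    return round(sum(values) / len(values))

def relative_difference(a, b):
    """Percent difference relative to the smaller value (assumed convention)."""
    low, high = sorted((a, b))
    return 100 * (high - low) / low

averages = {name: rounded_average(values) for name, values in windbag.items()}
presidential_gap = relative_difference(averages["Barack Obama"], averages["John McCain"])
vice_presidential_gap = relative_difference(averages["Joe Biden"], averages["Sarah Palin"])
```

Rounded to one decimal place, the two gaps match the 4.9% and 13.3% figures in the text.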
McCain's loss of verbal agility in the last debate stands out clearly in this graphic - his Windbag Index rose from a previous high of 368 to 505. Obama's index was highest in the third debate as well but his values had much lower variation suggesting better pacing and more consistent performance.
The content of the word list archive and the data structure syntax are described in the methods section.
Barack Obama vs. John McCain (1st debate) transcript word lists tag clouds data structure
Barack Obama vs. John McCain (2nd debate) transcript word lists tag clouds data structure
Barack Obama vs. John McCain (3rd debate) transcript word lists tag clouds data structure
Barack Obama vs. John McCain (combined debates) transcript word lists tag clouds data structure
Joe Biden vs. Sarah Palin transcript word lists tag clouds data structure