
Word Analysis of 2012 U.S. Presidential Debates

Barack Obama vs Mitt Romney (combined debates)



Introduction

Word Statistics

Debate Word Count

Summary Word Count

The summary word count reports the total number of words and the number of unique words used by each candidate. Word counts are given as both absolute and relative values.

Table 1a
all words
Number of all words and unique words used by each speaker.
set | word count (a) | share of debate (b) | unique words (c) | unique share (d)
Barack Obama | 22,029 | 47.8% | 2,372 | 10.8%
Mitt Romney | 24,024 | 52.2% | 2,349 | 9.8%
total | 46,053 | 100.0% | 3,429 | 7.4%


Table 1b
exclusive and shared words
Words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set | word count (a) | share (b) | unique words (c) | unique share (d)
Barack Obama | 1,790 | 8.1% | 1,080 | 60.3%
Mitt Romney | 1,932 | 8.0% | 1,057 | 54.7%
both candidates | 42,331 | 91.9% | 1,292 | 3.1%


Table 1
legend
a c
b d

a :: word count

b :: word count, as fraction in total in debate

c :: unique words in (a)

d :: unique words in (a), as fraction in (a)

bar :: proportion of (a-c):c

Stop Word Contribution

In the table below, the candidates' delivery is partitioned into stop and non-stop words. Stop words (full list) are frequently-used bridging words (e.g. pronouns and conjunctions) whose meaning depends entirely on context. The fraction of words that are stop words is one measure of the complexity of speech.
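The partition described above can be sketched in a few lines of Python. This is only an illustration: the `STOP_WORDS` set below is a tiny hypothetical list, not the full list used for the analysis, and the function name `stop_word_split` is made up for this example.

```python
# Hypothetical, very short stop-word list for illustration only;
# the analysis on this page uses its own, much longer list.
STOP_WORDS = {"the", "a", "and", "i", "we", "to", "of", "that", "is", "in"}


def stop_word_split(words):
    """Partition words into stop and non-stop sets and summarize each.

    Returns a dict keyed by "stop" and "non-stop"; each value holds the
    total count, the unique count, and the fraction of all words.
    """
    words = [w.lower() for w in words]
    total = len(words)
    result = {}
    for label, want_stop in (("stop", True), ("non-stop", False)):
        subset = [w for w in words if (w in STOP_WORDS) == want_stop]
        result[label] = {
            "total": len(subset),
            "unique": len(set(subset)),
            "fraction": len(subset) / total if total else 0.0,
        }
    return result


sample = "we need to invest in the jobs of the future and we will".split()
stats = stop_word_split(sample)
for label, row in stats.items():
    print(label, row["total"], row["unique"], round(row["fraction"], 3))
```

In the sample sentence, 8 of the 13 words are stop words under this toy list, which corresponds to the kind of stop-word fraction reported in column (b) of Table 2a.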

Table 2a
non-stop words
Counts of stop and non-stop words.
speaker | category | word count (a) | share (b) | unique words (c) | unique share (d)
Barack Obama | all | 22,029 | 100.0% | 2,372 | 10.8%
Barack Obama | stop | 12,414 | 56.4% | 160 | 1.3%
Barack Obama | non-stop | 9,615 | 43.6% | 2,212 | 23.0%
Mitt Romney | all | 24,024 | 100.0% | 2,349 | 9.8%
Mitt Romney | stop | 13,530 | 56.3% | 161 | 1.2%
Mitt Romney | non-stop | 10,494 | 43.7% | 2,188 | 20.9%
total | all | 46,053 | 100.0% | 3,429 | 7.4%
total | stop | 25,944 | 56.3% | 166 | 0.6%
total | non-stop | 20,109 | 43.7% | 3,263 | 16.2%


Table 2b
exclusive and shared non-stop words
Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set | word count (a) | share (b) | unique words (c) | unique share (d)
Barack Obama | 1,779 | 18.5% | 1,075 | 60.4%
Mitt Romney | 1,908 | 18.2% | 1,051 | 55.1%
both candidates | 16,422 | 81.7% | 1,137 | 6.9%


Table 2
legend
a c
b d

a :: total number of words, for a given category (all, stop, non-stop)

b :: (a) relative to words in the debate if category=all, otherwise relative to words by the candidate

c :: number of unique words within set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

All further word use statistics represent content that has been filtered for stop words.

Word frequency

The word frequency table summarizes how often words were used. I show the average word frequency and the weighted cumulative frequencies at the 50th and 90th percentiles. The average word frequency indicates how many times, on average, a word is used. For a given fraction of the entire delivery, the weighted cumulative frequency indicates the largest word frequency within this fraction (details about weighted cumulative distribution).
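One way to read the weighted cumulative frequency is sketched below. It assumes that words are ranked from least to most frequent and that each word contributes its usage count to the running total; the value at a given fraction is the frequency of the word at which the running total first reaches that fraction of all words spoken. This is an interpretation of the description above, not the page's actual implementation, and the function name `frequency_summary` is hypothetical.

```python
from collections import Counter


def frequency_summary(words, fractions=(0.5, 0.9)):
    """Return (average frequency, {fraction: cutoff frequency}).

    The cutoff for fraction p is the largest word frequency found within
    the first p of the content, with words ordered by ascending frequency
    and weighted by how often they occur.
    """
    freqs = sorted(Counter(words).values())
    if not freqs:
        return 0.0, {}
    total = sum(freqs)
    average = total / len(freqs)
    cutoffs = {}
    for p in fractions:
        running = 0
        for f in freqs:
            running += f  # each word adds its own usage count
            if running >= p * total:
                cutoffs[p] = f
                break
    return average, cutoffs


# Toy delivery: one word used 6 times, one used twice, two used once.
words = ["jobs"] * 6 + ["tax"] * 2 + ["plan", "china"]
average, cutoffs = frequency_summary(words)
low_average, low_cutoffs = frequency_summary(words, fractions=(0.3,))
print(average, cutoffs, low_cutoffs)
```

In the toy delivery the average frequency is 2.5, yet half of everything said is only reached once the word used 6 times is included, which is why the 50% and 90% values in Table 3 are much larger than the averages.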

Table 3a
word use frequency
Average and 50%/90% percentile word frequencies. Each cell lists average (a) / 50% (b) / 90% (c).
speaker | all | stop | non-stop
Barack Obama | 9.3 / 63 / 820 | 77.6 / 231 / 846 | 4.3 / 11 / 61
Mitt Romney | 10.2 / 69 / 808 | 84.0 / 294 / 1,002 | 4.8 / 13 / 67
total | 13.4 / 131 / 1,530 | 156.3 / 487 / 1,822 | 6.2 / 20 / 129


Table 3b
exclusive and shared non-stop word use frequency
Average and 50%/90% cumulative percentile word frequencies. Non-stop words exclusive to speaker (e.g. speaker A but not speaker B) and shared by speakers (speaker A and B).
set | average (a) | 50% (b) | 90% (c)
Barack Obama | 1.66 | 2 | 7
Mitt Romney | 1.81 | 2 | 7
total | 6.16 | 20 | 129


Table 3
legend
a b c

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c

Sentence Size

Table 4
sentence size
Number of sentences spoken by each speaker and sentence word count statistics. Number of words in a sentence is shown by average and 50%/90% cumulative values for all, stop and non-stop words. Each sentence size cell lists average (a) / 50% (b) / 90% (c).
speaker | number of sentences | all | stop | non-stop
Barack Obama | 1,284 | 17.2 / 24 / 49 | 9.9 / 14 / 28 | 7.6 / 11 / 22
Mitt Romney | 1,720 | 14.0 / 19 / 39 | 8.1 / 11 / 24 | 6.2 / 8 / 18
total | 3,004 | 17.3 / 22 / 45 | 10.9 / 13 / 27 | 8.8 / 10 / 21


Table 4
legend
a b c

a :: average sentence size

b :: largest sentence size for 50% of content

c :: largest sentence size for 90% of content

bar :: proportion of a:b:c

Part of Speech Analysis

In this section, word frequency is broken down by part of speech (POS). The four POS groups examined are nouns, verbs, adjectives and adverbs. Conjunctions and prepositions are not considered. The first category (n+v+adj+adv) is composed of all four POS groups.

Part of Speech Count

Table 5
part of speech count
Count of words categorized by part of speech (POS).
speaker | part of speech | word count (a) | share (b) | unique words (c) | unique share (d)
Barack Obama | n+v+adj+adv | 8,869 | 40.3% | 2,117 | 23.9%
Barack Obama | nouns (n) | 4,310 | 48.6% | 1,135 | 26.3%
Barack Obama | verbs (v) | 2,509 | 28.3% | 744 | 29.7%
Barack Obama | adjectives (adj) | 1,600 | 18.0% | 551 | 34.4%
Barack Obama | adverbs (adv) | 450 | 5.1% | 98 | 21.8%
Mitt Romney | n+v+adj+adv | 9,499 | 39.5% | 2,110 | 22.2%
Mitt Romney | nouns (n) | 4,684 | 49.3% | 1,161 | 24.8%
Mitt Romney | verbs (v) | 2,601 | 27.4% | 705 | 27.1%
Mitt Romney | adjectives (adj) | 1,786 | 18.8% | 592 | 33.1%
Mitt Romney | adverbs (adv) | 428 | 4.5% | 97 | 22.7%
total | n+v+adj+adv | 18,368 | 39.9% | 3,149 | 17.1%
total | nouns (n) | 8,994 | 49.0% | 1,738 | 19.3%
total | verbs (v) | 5,110 | 27.8% | 1,116 | 21.8%
total | adjectives (adj) | 3,386 | 18.4% | 887 | 26.2%
total | adverbs (adv) | 878 | 4.8% | 146 | 16.6%


Table 5
legend
a c
b d

a :: total number of words for a given POS (all, noun, verb, adjective, adverb)

b :: (a) relative to all words by candidate

c :: unique words in (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Part of Speech Frequency

Table 5
part of speech frequency
Frequency of words categorized by part of speech (POS). Each cell lists average (a) / 50% (b) / 90% (c).
speaker | n+v+adj+adv | nouns (n) | verbs (v) | adjectives (adj) | adverbs (adv)
Barack Obama | 4.19 / 10 / 60 | 3.80 / 9 / 43 | 3.37 / 7 / 60 | 2.90 / 5 / 33 | 4.59 / 13 / 78
Mitt Romney | 4.50 / 11 / 58 | 4.03 / 10 / 73 | 3.69 / 8 / 75 | 3.02 / 5 / 30 | 4.41 / 9 / 67
total | 5.83 / 18 / 110 | 5.17 / 15 / 91 | 4.58 / 13 / 127 | 3.82 / 9 / 64 | 6.01 / 17 / 145


Table 5
legend
a b c

a :: average word frequency

b :: largest word frequency in 50% of content

c :: largest word frequency in 90% of content

bar :: proportion of a:b:c

Part of Speech Pairing

Through word pairing, I extract concepts from the text. The number of unique word pairs is a function of sentence length and is one of the measures of complexity.
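The link between sentence length and pair count can be illustrated with a short sketch. It assumes that a pair is any two distinct words that occur in the same sentence, regardless of order; the page does not spell out its exact pairing rule, so this is an assumption, and `count_pairs` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def count_pairs(sentences):
    """Return (total pairs, unique pairs) over a list of tokenized sentences.

    Assumption: every unordered pair of distinct words within one
    sentence counts as a pair.
    """
    pairs = Counter()
    for sentence in sentences:
        # Sort the distinct words so that (a, b) and (b, a) are the same key.
        words = sorted(set(sentence))
        pairs.update(combinations(words, 2))
    return sum(pairs.values()), len(pairs)


sentences = [
    ["jobs", "create", "new"],
    ["jobs", "create"],
]
total_pairs, unique_pairs = count_pairs(sentences)
print(total_pairs, unique_pairs)
```

Under this assumption a sentence with n distinct words yields n(n-1)/2 pairs, so longer sentences contribute disproportionately many pairs.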

Table 6a
part of speech pairing - Barack Obama
Word pairs (total and unique) categorized by part of speech (POS)
pairing | total pairs (a) | unique pairs (c) | unique share (d)
noun/noun | 11,267 | 9,014 | 80.0%
verb/noun | 13,671 | 11,481 | 84.0%
verb/verb | 3,641 | 3,166 | 87.0%
adjective/noun | 7,652 | 6,346 | 82.9%
adjective/verb | 4,482 | 3,760 | 83.9%
adjective/adjective | 1,234 | 1,035 | 83.9%
adverb/noun | 2,336 | 1,991 | 85.2%
adverb/verb | 1,358 | 1,189 | 87.6%
adverb/adjective | 730 | 646 | 88.5%
adverb/adverb | 123 | 105 | 85.4%


Table 6b
part of speech pairing - Mitt Romney
Word pairs (total and unique) categorized by part of speech (POS)
pairing | total pairs (a) | unique pairs (c) | unique share (d)
noun/noun | 10,401 | 8,053 | 77.4%
verb/noun | 11,086 | 9,155 | 82.6%
verb/verb | 2,796 | 2,315 | 82.8%
adjective/noun | 6,636 | 5,413 | 81.6%
adjective/verb | 3,488 | 2,951 | 84.6%
adjective/adjective | 1,043 | 890 | 85.3%
adverb/noun | 1,710 | 1,464 | 85.6%
adverb/verb | 972 | 830 | 85.4%
adverb/adjective | 547 | 495 | 90.5%
adverb/adverb | 99 | 82 | 82.8%


Table 6c
unique part of speech pairing - candidate comparison
Unique word pairs categorized by part of speech (POS)
pairing | Barack Obama (a) | Mitt Romney (c) | Romney relative to Obama (d)
noun/noun | 9,014 | 8,053 | 89.3%
verb/noun | 11,481 | 9,155 | 79.7%
verb/verb | 3,166 | 2,315 | 73.1%
adjective/noun | 6,346 | 5,413 | 85.3%
adjective/verb | 3,760 | 2,951 | 78.5%
adjective/adjective | 1,035 | 890 | 86.0%
adverb/noun | 1,991 | 1,464 | 73.5%
adverb/verb | 1,189 | 830 | 69.8%
adverb/adjective | 646 | 495 | 76.6%
adverb/adverb | 105 | 82 | 78.1%


Table 6 a,b
legend
a c
  d

a :: total number of pairs, for a given category (e.g. verb/noun)

c :: number of unique pairs within set (a)

d :: (c) relative to (a)

bar :: proportion of (a-c):c

Table 6c
legend
a c
  d

a :: unique pairs for Barack Obama

c :: unique pairs for Mitt Romney

d :: (c) relative to (a) (i.e. Mitt Romney relative to Barack Obama)

bars :: (a) and (c)

Exclusive and Shared Usage

This section enumerates words that were exclusive to a candidate (i.e. used by one candidate but not the other). This content provides insight into the candidates' priorities and reveals differences in perspective on similar topics.

For a given part of speech, the table breaks down the number of words that were spoken by only one of the candidates or both candidates (intersection). The last row includes words spoken by either candidate (union).
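The breakdown into exclusive, shared and union sets maps directly onto set operations. The sketch below assumes each candidate's words are available as a plain list; the function name `exclusive_and_shared` and the result keys are chosen for this illustration only.

```python
from collections import Counter


def exclusive_and_shared(words_a, words_b):
    """Return {set name: (total words, unique words)} for two speakers.

    "a_only" and "b_only" hold words used by just one speaker,
    "shared" the intersection and "union" everything spoken.
    """
    a, b = Counter(words_a), Counter(words_b)
    combined = a + b  # usage counts summed over both speakers
    vocab = {
        "a_only": set(a) - set(b),
        "b_only": set(b) - set(a),
        "shared": set(a) & set(b),
        "union": set(a) | set(b),
    }
    return {
        name: (sum(combined[w] for w in words), len(words))
        for name, words in vocab.items()
    }


result = exclusive_and_shared(
    "invest jobs jobs tax".split(),
    "tax tax jobs cut".split(),
)
print(result)
```

The totals of the two exclusive sets and the shared set add up to the union total, which is the same bookkeeping that lets the rows of Table 7 sum to its last row.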

Table 7
exclusive word usage
Total and unique words used exclusively by a candidate, or by both.
set | part of speech | a | b | c | d | e | f
Barack Obama | n+v+adj+adv | 1,761 | 100.0% | 9.6% | 1,039 | 59.0% | 33.0%
Barack Obama | nouns (n) | 866 | 49.2% | 9.6% | 515 | 59.5% | 29.6%
Barack Obama | verbs (v) | 542 | 30.8% | 10.6% | 347 | 64.0% | 31.1%
Barack Obama | adjectives (adj) | 287 | 16.3% | 8.5% | 218 | 76.0% | 24.6%
Barack Obama | adverbs (adv) | 66 | 3.7% | 7.5% | 44 | 66.7% | 30.1%
Mitt Romney | n+v+adj+adv | 1,851 | 100.0% | 10.1% | 1,032 | 55.8% | 32.8%
Mitt Romney | nouns (n) | 871 | 47.1% | 9.7% | 535 | 61.4% | 30.8%
Mitt Romney | verbs (v) | 510 | 27.6% | 10.0% | 309 | 60.6% | 27.7%
Mitt Romney | adjectives (adj) | 397 | 21.4% | 11.7% | 249 | 62.7% | 28.1%
Mitt Romney | adverbs (adv) | 73 | 3.9% | 8.3% | 44 | 60.3% | 30.1%
both candidates | n+v+adj+adv | 14,756 | 100.0% | 80.3% | 1,078 | 7.3% | 34.2%
both candidates | nouns (n) | 7,041 | 47.7% | 78.3% | 558 | 7.9% | 32.1%
both candidates | verbs (v) | 3,879 | 26.3% | 75.9% | 333 | 8.6% | 29.8%
both candidates | adjectives (adj) | 2,435 | 16.5% | 71.9% | 256 | 10.5% | 28.9%
both candidates | adverbs (adv) | 724 | 4.9% | 82.5% | 49 | 6.8% | 33.6%
total | n+v+adj+adv | 18,368 | 100.0% | 100.0% | 3,149 | 17.1% | 100.0%
total | nouns (n) | 8,994 | 49.0% | 100.0% | 1,738 | 19.3% | 100.0%
total | verbs (v) | 5,110 | 27.8% | 100.0% | 1,116 | 21.8% | 100.0%
total | adjectives (adj) | 3,386 | 18.4% | 100.0% | 887 | 26.2% | 100.0%
total | adverbs (adv) | 878 | 4.8% | 100.0% | 146 | 16.6% | 100.0%


Table 7
legend
a d
b e
c f

a :: total number of words in set (e.g. obama \ romney, obama ∩ romney, obama ∪ romney), for a given part of speech

b :: (a) relative to all exclusive words in n+v+adj+adv

c :: (a) relative to all words in n+v+adj+adv

d :: unique words in (a)

e :: (d) relative to (a)

f :: (d) relative to all unique words in n+v+adj+adv

bar1 :: normalized ratio of (a-d):d

bar2 :: absolute ratio of (a-d):d for all POS groups (first column) or POS group (other columns)

Noun Phrase Usage

Noun phrases were extracted from the text and analyzed for frequency, word count, unique word count and richness. Single-word phrases were not counted.

Top-level noun phrases are those without a parent noun phrase (a parent phrase is a similar, longer phrase). Derived noun phrases are those with a parent (more details about noun phrase analysis).

The top-level noun phrases can be interpreted as independent concepts. Derived noun phrases can be interpreted as variants on concepts embodied by the top-level phrases.
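The distinction between top-level and derived phrases can be sketched as follows. The sketch assumes that a phrase has a parent when its words appear as a contiguous run inside a longer phrase; the page only says the parent is "similar" and longer, so this containment rule is an assumption, and both function names are hypothetical.

```python
def is_subphrase(short, long):
    """True if the words of `short` occur contiguously inside the longer `long`."""
    s, l = short.split(), long.split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def classify_phrases(phrases):
    """Split unique noun phrases into (top_level, derived), each sorted.

    Assumption: a phrase is derived when some longer phrase contains it.
    """
    unique = set(phrases)
    top_level, derived = [], []
    for phrase in sorted(unique):
        if any(is_subphrase(phrase, other) for other in unique):
            derived.append(phrase)
        else:
            top_level.append(phrase)
    return top_level, derived


top_level, derived = classify_phrases([
    "middle class",
    "middle class families",
    "tax cut",
    "small business",
    "middle class",
])
print(top_level, derived)
```

Here "middle class" is derived because "middle class families" contains it, while the other three phrases are top-level.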

Noun Phrase Count and length

This table reports the absolute number of noun phrases, which is related to the number of nouns, and their length.

Table 8a
noun phrase count
Counts of noun phrases in words and per noun.
speaker | set | a | b | c | d | e | f
Barack Obama | all | 529 | 100.0% | 0.12 | 240 | 45.4% | 0.21
Barack Obama | top-level | 433 | 81.9% | 0.10 | 234 | 54.0% | 0.21
Mitt Romney | all | 549 | 100.0% | 0.12 | 241 | 43.9% | 0.21
Mitt Romney | top-level | 439 | 80.0% | 0.09 | 231 | 52.6% | 0.20


Table 8b
noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words. Each cell lists average (a) / 50% (b) / 90% (c).
speaker | all | top-level
Barack Obama | 2.31 / 2 / 3 | 2.36 / 2 / 4
Mitt Romney | 2.29 / 2 / 3 | 2.35 / 2 / 4


Table 8a
legend
a d
b e
c f

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of noun phrases per noun

d :: number of unique phrases

e :: (d) relative to (a)

f :: number of unique noun phrases per unique noun

bar :: normalized ratio of (a-d):d

Table 8b
legend
a b c

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c


Exclusive and Shared Noun Phrase Count and length

Table 9a
exclusive and shared noun phrase count
Counts of exclusive and shared noun phrases in words and per noun.
speaker | set | noun phrases (a) | share (b) | unique phrases (c) | unique share (d)
Barack Obama | all | 448 | 41.6% | 225 | 50.2%
Barack Obama | top-level | 399 | 89.1% | 226 | 56.6%
Mitt Romney | all | 458 | 42.5% | 231 | 50.4%
Mitt Romney | top-level | 399 | 87.1% | 224 | 56.1%
both candidates | all | 172 | 16.0% | 31 | 18.0%
both candidates | top-level | 74 | 43.0% | 18 | 24.3%
5618

Fields with (e.g. 155) link to data files. Hover over the field to show these links.

Table 9b
exclusive and shared noun phrase length
Average and 50%/90% cumulative length of noun phrases, in words. Each cell lists average (a) / 50% (b) / 90% (c).
speaker | all | top-level
Barack Obama | 2.35 / 2 / 4 | 2.38 / 2 / 4
Mitt Romney | 2.34 / 2 / 4 | 2.36 / 2 / 4
both candidates | 2.08 / 2 / 3 | 2.18 / 2 / 3


Table 9a
legend
a c
b d

a :: number of noun phrases

b :: (a) relative to number of all noun phrases

c :: number of unique phrases

d :: (c) relative to (a)

bar :: normalized ratio of (a-c):c

Table 9b
legend
a b c

a :: average noun phrase size, in words

b :: largest noun phrase size in 50% of content

c :: largest noun phrase size in 90% of content

bar :: proportion of a:b:c


Windbag Index

The Windbag Index is a compound measure that characterizes the complexity of speech. A low index is indicative of succinct speech with low degree of repetition and large number of independent concepts.

Table 10
windbag index
Windbag Index for each speaker. The higher the value, the more repetitive the speech.
speaker | index value | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8
Barack Obama | 3,844 | 0.436 | 0.230 | 0.263 | 0.297 | 0.344 | 0.218 | 0.454 | 0.975
relative to Mitt Romney | -25.6% | -0.1% | +10.3% | +6.2% | +9.4% | +3.9% | -3.9% | +3.4% | +1.7%
Mitt Romney | 5,170 | 0.437 | 0.209 | 0.248 | 0.271 | 0.331 | 0.227 | 0.439 | 0.959
relative to Barack Obama | +34.5% | +0.1% | -9.4% | -5.9% | -8.6% | -3.7% | +4.1% | -3.2% | -1.7%
Table 10
legend
The Windbag Index is 1/(t1*t2*...*t8) where t1,t2,...,t8 are

t1 :: fraction of words which are non-stop

t2 :: fraction of non-stop words which are unique

t3 :: fraction of nouns which are unique

t4 :: fraction of verbs which are unique

t5 :: fraction of adjectives which are unique

t6 :: fraction of adverbs which are unique

t7 :: fraction of noun phrases which are unique

t8 :: fraction of noun phrases which are top-level


Large individual terms t1...t8 contribute to a smaller index.

The percentage values below the index and each term are relative differences to the other speaker's corresponding term (i.e. 100*(a-b)/b where a is the value for one speaker and b for the other).
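As a check on the definition, the index can be recomputed from the eight terms. The sketch below uses the unrounded term values behind Table 10 and the relative-difference formula given above; the function and variable names are chosen for this example.

```python
from math import prod  # available in Python 3.8+

# Unrounded terms t1..t8 for each speaker, as given in the Table 10 data.
OBAMA_TERMS = [
    0.436470107585456, 0.230057202288092, 0.263341067285383,
    0.29653248306098, 0.344375, 0.217777777777778,
    0.453686200378072, 0.975,
]
ROMNEY_TERMS = [
    0.436813186813187, 0.208500095292548, 0.247865072587532,
    0.271049596309112, 0.331466965285554, 0.226635514018692,
    0.438979963570128, 0.95850622406639,
]


def windbag_index(terms):
    """Reciprocal of the product of the terms: larger terms, smaller index."""
    return 1.0 / prod(terms)


def relative_difference(a, b):
    """Relative difference of a with respect to b, in percent: 100*(a-b)/b."""
    return 100.0 * (a - b) / b


obama = windbag_index(OBAMA_TERMS)
romney = windbag_index(ROMNEY_TERMS)
print(round(obama), round(romney))
print(round(relative_difference(obama, romney), 1))
print(round(relative_difference(romney, obama), 1))
```

Because the index is a reciprocal of a product, modest differences in individual terms (at most about 10% here) compound into a difference of roughly a third between the two index values.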

Word Clouds

In the word clouds below, the size of the word is proportional to the number of times it was used by a candidate (method details).

Not all words from a group used to draw the cloud fit in the image; less frequently used words in large word groups may fall outside the image.

All Words for Each Candidate

Each candidate's debate portion was extracted and frequencies were compiled for each part of speech (noun, verb, adjective, adverb), with words colored by their part of speech category.

The distribution of sizes within a tag cloud follows the frequency distribution of words. However, word size cannot be compared between clouds, since the minimum and maximum size of the words is fixed.
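The reason sizes cannot be compared between clouds follows from the fixed size range. The sketch below assumes a linear mapping from usage count to font size between a fixed minimum and maximum; the actual scaling used for the clouds is described in the method details and may differ, and the names and size limits here are hypothetical.

```python
def cloud_sizes(counts, min_size=10.0, max_size=50.0):
    """Map word counts to font sizes within a fixed [min_size, max_size] range.

    Assumption: sizes scale linearly with count inside one cloud.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        # All words equally frequent: draw them all at the largest size.
        return {word: max_size for word in counts}
    scale = (max_size - min_size) / (hi - lo)
    return {word: min_size + (n - lo) * scale for word, n in counts.items()}


cloud_a = cloud_sizes({"jobs": 60, "tax": 30, "plan": 10})
cloud_b = cloud_sizes({"jobs": 6, "tax": 3, "plan": 1})
print(cloud_a)
print(cloud_b)
```

Both clouds draw "jobs" at the maximum size even though the underlying counts differ by a factor of ten, so a word's size is only meaningful relative to other words in the same cloud.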

Debate Word Cloud for Barack Obama - all words

Debate tag cloud for Barack Obama

Debate Word Cloud for Mitt Romney - all words

Debate tag cloud for Mitt Romney

Exclusive Words for Each Candidate

The clouds below show words used exclusively by a candidate. For example, if candidate A used the word "invest" (any number of times), but candidate B did not, then the word will appear in the exclusive word tag cloud for candidate A.

Words exclusive to Barack Obama

Debate tag cloud for Barack Obama

Words exclusive to Mitt Romney

Debate tag cloud for Mitt Romney

Part of Speech Word Clouds

In these clouds, words from each major part of speech were colored based on whether they were exclusive to a candidate or shared by the candidates.

The size of the word is relative to the frequency for the candidate; word sizes should not be compared between candidates to infer differences in absolute frequency.

Cloud of noun words, by speaker

Cloud of verb words, by speaker

Cloud of adjective words, by speaker

Cloud of adverb words, by speaker

Cloud of all words, by speaker

Word Pair Clouds for Each Candidate

word pairs for Barack Obama

adjective/adjective by Barack Obama
adjective/adverb by Barack Obama
adjective/noun by Barack Obama
adjective/verb by Barack Obama
adverb/adverb by Barack Obama
adverb/noun by Barack Obama
adverb/verb by Barack Obama
noun/noun by Barack Obama
noun/verb by Barack Obama
verb/verb by Barack Obama

word pairs for Mitt Romney

adjective/adjective by Mitt Romney
adjective/adverb by Mitt Romney
adjective/noun by Mitt Romney
adjective/verb by Mitt Romney
adverb/adverb by Mitt Romney
adverb/noun by Mitt Romney
adverb/verb by Mitt Romney
noun/noun by Mitt Romney
noun/verb by Mitt Romney
verb/verb by Mitt Romney

Downloads

Debate transcript

Parsed word lists (word lists, part of speech lists, noun phrases, sentences)

Word clouds

Raw data structure

Please see the methods section for details about these files.