Donald Trump (@realDonaldTrump) is a prolific Twitter user who breaks every convention that I’ve seen in my study of politicians and government officials using the platform. You can read all his tweets on his Twitter profile, but they are also handily archived and searchable at http://www.trumptwitterarchive.com/
I grabbed a copy of everything he’s posted since he joined in May 2009 — a total of 30,123 tweets. This includes his retweets in addition to tweets he authored. This article is the first in a series that will analyze that data.
It’s Just Words
There are lots of fancy ways to analyze the words people use, but to start, I opted for a simple frequency count. I stripped out stop words — the really common words like “a”, “the”, and “of” that appear very frequently without a lot of meaning (I used the default and MySQL stopword lists at http://www.ranks.nl/stopwords). I also converted everything to lower case so “trump” and “Trump” would be counted as the same word. Then, I just counted how many times he used each word.
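For readers who want to try this kind of count themselves, here is a minimal Python sketch of the steps just described. It is not the code used for this article: the tiny STOP_WORDS set, the tokenizing pattern, and the function name word_frequencies are all assumptions made for illustration, and a real run would load the full ranks.nl lists instead.

```python
import re
from collections import Counter

# Tiny illustrative stop word set (an assumption; the article used the
# default and MySQL lists from ranks.nl, which are much longer).
STOP_WORDS = {"a", "the", "of", "and", "to", "is", "in"}

# Assumed tokenizer: runs of letters, digits, and a few symbols, so that
# "@realdonaldtrump" and "#maga" survive as single tokens.
TOKEN_RE = re.compile(r"[a-z0-9@#_']+")

def word_frequencies(tweets, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all tweets."""
    counts = Counter()
    for text in tweets:
        # Lowercase first so "Trump" and "trump" count as the same word.
        words = TOKEN_RE.findall(text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

sample = ["The Trump brand is the best", "Thank you to Trump supporters"]
for word, n in word_frequencies(sample).most_common(3):
    print(word, n)
```

Counter.most_common returns the words in descending order of frequency, which is the ordering used for the list in this section.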
The most common word Trump tweets is his own username, followed by “Trump”. “Donald” is number 5. As I’ll discuss below, the vast majority of the uses of his username come from others’ tweets that Trump retweets. Here are the frequency counts from his most popular words in descending order.
I Bet You Think This Tweet Is About You
Trump mentions lots of people in his tweets. Similar to what I did above for words, I counted how often other accounts were @ mentioned by Trump. As I hinted at above, his own username was by far the most popular. Of the more than 8,400 mentions, only about 40 were made by Trump actually referring to himself. The rest come from others’ tweets that use Trump’s Twitter handle and that Trump then retweets (sometimes with his comments tacked on the end).
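Counting mentions works much the same way as counting words. The sketch below is one way to do it in Python, under a few assumptions of mine: a mention is anything matching "@" followed by word characters, handles are compared case-insensitively, and every occurrence counts (including repeats inside one tweet). The name mention_counts and the sample tweets are made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern for an @ mention. It will also match the tail of an
# email address, so real data may need a stricter rule.
MENTION_RE = re.compile(r"@(\w+)")

def mention_counts(tweets):
    """Count every @ mention across the tweets, ignoring case."""
    counts = Counter()
    for text in tweets:
        counts.update(name.lower() for name in MENTION_RE.findall(text))
    return counts

sample = [
    '"@fan1: @realDonaldTrump should run!" Thanks',
    "Watching @FoxNews and @foxnews tonight",
    "@realDonaldTrump",
]
for account, n in mention_counts(sample).most_common():
    print(account, n)
```

Separating the self-references from the retweeted ones, as described above, would need an extra rule for spotting quoted retweets, which this sketch does not attempt.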
The most popular mentions are a mix of friends and enemies, with Obama followed by Fox News and Trump’s reality show, The Apprentice.
The Co-Mention Network
I analyze social networks, and I was especially interested in what accounts Trump mentions together in the same tweet. I went through all the tweets and created links between all accounts mentioned together in the same tweet. I also kept count of how many times those users were mentioned together. I left @realDonaldTrump’s account out of this since it appears so often and would mangle what we can see.
From there, I created a social network. For the network geeks: the nodes (circles in the figure below) represent accounts. The edges (lines between circles) indicate that the accounts were mentioned in the same tweet. The thickness of the edges reflects how frequently the accounts were mentioned together, and the size of the node reflects its degree (i.e. how many other accounts are connected to the node).
If Accounts A and B are mentioned together in one tweet, and B and C appear together in another, we start to get a network where A links to B and B links to C.
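The A, B, C example can be sketched in a few lines of Python. This is an illustration of the idea rather than the code behind the figure: the function names co_mention_edges and degrees are hypothetical, mentions are matched with an assumed "@" plus word characters pattern, and each pair is counted at most once per tweet.

```python
import re
from collections import Counter
from itertools import combinations

MENTION_RE = re.compile(r"@(\w+)")   # assumed mention pattern
EXCLUDED = {"realdonaldtrump"}       # left out, as described in the text

def co_mention_edges(tweets, excluded=EXCLUDED):
    """Map each unordered account pair to the number of tweets naming both."""
    edges = Counter()
    for text in tweets:
        # Use a set so a handle repeated in one tweet adds no extra weight.
        accounts = {m.lower() for m in MENTION_RE.findall(text)} - excluded
        # Sorting gives every pair one canonical (a, b) order.
        for a, b in combinations(sorted(accounts), 2):
            edges[(a, b)] += 1
    return edges

def degrees(edges):
    """Number of distinct accounts each account is linked to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(adj) for node, adj in neighbors.items()}

sample = ["@A and @B", "@B with @C", "@a again with @b", "@realDonaldTrump and @C"]
edges = co_mention_edges(sample)
print(edges)           # edge weights: how often each pair appears together
print(degrees(edges))  # node sizes: how many accounts each one links to
```

Here A links to B twice and B links to C once, so B ends up with degree 2 while A and C each have degree 1.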
The image below shows this structure for Trump’s tweets. Overall, there were 6,278 accounts with 18,528 links among them. I’ve filtered out any accounts not connected to this main group (say, Trump mentions you and me together but never again; we would be absent from this visualization since we’re just a lame pair of people floating outside the main group). That left 4,356 accounts and 15,944 edges. Then, I filtered out any pairs that were co-mentioned only once. That means in this picture, we are seeing connections between pairs of accounts that Trump mentions together more frequently. There are 633 accounts with 2,148 edges.
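The two filtering steps can be sketched as follows. This is my own rough Python approximation, not the actual workflow (the figure was made in Gephi): it assumes the edges are stored as a dict from account pairs to co-mention counts, finds the largest connected group with a breadth-first search, and then keeps only pairs seen at least twice. All function names here are hypothetical.

```python
from collections import deque

def largest_component(edges):
    """Keep only the edges inside the largest connected group of accounts."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    # Both ends of an edge share a component, so checking one end is enough.
    return {pair: w for pair, w in edges.items() if pair[0] in best}

def drop_weak_edges(edges, min_weight=2):
    """Drop pairs that were co-mentioned fewer than min_weight times."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}

def accounts_in(edges):
    """Accounts that still have at least one edge."""
    return {node for pair in edges for node in pair}

sample = {("a", "b"): 3, ("b", "c"): 2, ("c", "d"): 1, ("you", "me"): 1}
main = largest_component(sample)   # the "you"/"me" pair floats away
strong = drop_weak_edges(main)     # the one-time "c"/"d" link goes too
print(len(accounts_in(strong)), "accounts,", len(strong), "edges")
```

Accounts whose only links were one-time co-mentions disappear along with those edges, which is why the account total falls as well as the edge total.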
You can get a big, clear version of the picture here. The colors represent different “communities” that are automatically detected (geeks: done with Gephi’s modularity statistic option, which uses the algorithm from Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008).
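For the geeks who want a feel for what that algorithm is optimizing: it searches for a grouping of nodes with high modularity, a score that rewards edge weight falling inside groups beyond what the nodes' degrees alone would suggest. The Python sketch below only computes that score for a grouping you hand it. It does not perform the search itself, and the function name and data layout are my own assumptions, not Gephi's.

```python
def modularity(edges, community_of):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: {(a, b): weight}; community_of: {node: community label}.
    Uses Q = sum over communities of  inside/m - (degree / (2m))**2,
    where m is the total edge weight.
    """
    total = sum(edges.values())
    inside = {}   # weight of edges with both ends in the community
    degree = {}   # summed weighted degree of the community's nodes
    for (a, b), w in edges.items():
        ca, cb = community_of[a], community_of[b]
        degree[ca] = degree.get(ca, 0) + w
        degree[cb] = degree.get(cb, 0) + w
        if ca == cb:
            inside[ca] = inside.get(ca, 0) + w
    return sum(inside.get(c, 0) / total - (d / (2 * total)) ** 2
               for c, d in degree.items())

edges = {("a", "b"): 1, ("c", "d"): 1}
split = {"a": 0, "b": 0, "c": 1, "d": 1}    # each pair is its own group
merged = {"a": 0, "b": 0, "c": 0, "d": 0}   # everyone lumped together
print(modularity(edges, split), modularity(edges, merged))
```

Splitting the two unconnected pairs into separate groups scores higher than lumping them together, which matches the intuition that they form two distinct communities.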
The pinkish purple group to the top and right is related to The Apprentice, including contestants and related accounts. The orange group to the left of that is mostly media accounts and journalists. To the left of that, the teal group is accounts related to Trump properties. The large green group at the bottom connects Fox News and Megyn Kelly, along with various guests and commentators.
One other feature that’s a bit harder to see is an edge that’s so thick that it looks like a rectangle. It’s pinkish-grey and appears just above the big green @foxnews node. That connects Mitt Romney’s and Barack Obama’s accounts. Trump mentioned them together in 76 tweets, the most of any pair. The most common pairs after that are Fox News mentioned with Sean Hannity (42 times), NBC and @apprenticeNBC (38 times), and a triad connecting Ivanka Trump, Eric Trump, and Donald Trump Jr. (29 times). You can see this family triad as a strong bright pink triangle in the center of the network to the lower left of The Apprentice cluster.
There’s a lot to find in this network. This post is just the first of a series I’ll do analyzing these tweets. What we can see from these simple counts is that Trump is really into himself and that he has some distinct circles of interest in his tweets.
Future analyses will include comparisons with the Twitter accounts of others (including HRC and Obama), analysis of Trump’s personality traits using automated methods (including those I developed in my research), and a deeper linguistic analysis of his tweets. Stay tuned!
-Jen Golbeck (@jengolbeck)