How FIREy is Your City?
As I’ve mentioned several times in previous posts, I’m currently on a mini-retirement. While the temptation to be idle is there, I’ve instead opted to use my time to learn new skills. I’ve taught myself to code with the Python, R, and SQL languages. All that and I still can’t get my site to look exactly how I want it. Oh well.
Playing around with these new skills is fun and expands the range of things you can figure out. I asked myself, “What do America’s biggest cities think about money?” Yes, yes, I know, America is technically North and South America, but I’m specifically referring to ‘MURICA.
Is your home city all about personal finance and early retirement? Read on to find out!
If you want to get to the juicy bits, skip down to the colorful graphs.
Back in the good old days, you could only get thousands of answers to a specific question by conducting surveys. Now, it’s much easier. Today’s folks voluntarily put their opinions onto social media 24/7. So, instead of using a survey, I used Twitter. I scraped tweets from areas within a 150-mile radius of these cities.
The Top 10 Cities in the United States by Population
The top 10 cities in the United States by population are as follows:
- New York City
- Los Angeles
- San Antonio
- San Diego
- San Jose
Did you know that you can scrape thousands of tweets from Twitter? And that you can specify which city they come from and what they talk about? Specifically, I used R to scrape 5,210 tweets from Twitter.
I did this by opening a developer account on Twitter and then connecting to it with R. This has some limitations. For example, I can only pull 2,000 tweets per query. Additionally, Twitter only lets me scrape tweets as far back as 14 days ago. So, if a single tweet gets retweeted 10,000 times in a day in one city, I’ll only be able to pull that one tweet, because its retweets alone fill the 2,000-tweet limit. If few people tweeted about a subject in a town, I might just get 85 tweets for that whole 14-day period.
After pulling the data from Twitter, it needs to be cleaned and analyzed before it makes sense.
Cleaning the Data
After scraping all the tweets, you then have to “clean” them, so they are easier to analyze in the following steps. First, you extract the text. Then, you make sure that the tweets are written in the English alphabet (not Cyrillic, for example). After all that, you have to chuck out the “#” signs and delete any links.
Once you’ve done all this, your data should be relatively easy to analyze.
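For the curious, here is a minimal sketch of those cleaning steps in Python. I did the real work in R, so treat this as an illustration and not my actual script. The function names are made up for this example, and the alphabet check is deliberately strict: it throws out any tweet containing a letter outside plain A to Z, which would also reject accented words like “café.”

```python
import re
from typing import Optional

# Matches http(s) links and bare www. links (an assumption about what counts as a "link").
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def is_english_alphabet(text: str) -> bool:
    """True if every letter in the text is a plain ASCII letter (A-Z, a-z)."""
    return all(ch.isascii() for ch in text if ch.isalpha())


def clean_tweet(text: str) -> Optional[str]:
    """Return the cleaned tweet text, or None if it uses another alphabet."""
    no_links = URL_RE.sub("", text)       # delete any links
    if not is_english_alphabet(no_links):
        return None                       # e.g. Cyrillic tweets get dropped
    no_hashes = no_links.replace("#", "")  # chuck out the "#" signs
    return " ".join(no_hashes.split())    # tidy up leftover whitespace


print(clean_tweet("Loving the #FIRE life https://t.co/abc123"))
print(clean_tweet("Привет #finance"))
```

The first call keeps the words but loses the hashtag symbol and the link. The second returns None because the tweet is written in Cyrillic.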
Sentiment analysis is a fascinating topic that I can’t delve into in great detail in this blog post. Wrong format for that.
Essentially, sentiment analysis takes dictionaries of “positive” or “negative” words and then scores each tweet according to how many of these words it has. Each positive word is a +1, and each negative word is a -1. Most words are neutral, or 0.
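The scoring rule above can be sketched in a few lines of Python. The two tiny word lists here are hypothetical stand-ins that I made up for the example. Real sentiment dictionaries contain thousands of words, and this is not the exact tool I used.

```python
import re

# Hypothetical mini-dictionaries, for illustration only.
POSITIVE_WORDS = {"love", "great", "happy", "win"}
NEGATIVE_WORDS = {"hate", "broke", "sad", "debt"}


def score_tweet(text: str) -> int:
    """Add 1 for each positive word, subtract 1 for each negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for word in words:
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
        # every other word is neutral and adds 0
    return score


print(score_tweet("I love early retirement, great plan"))
print(score_tweet("Broke and sad about debt"))
print(score_tweet("Reading about personal finance"))
```

A tweet scoring above zero counts as positive, below zero as negative, and exactly zero as neutral.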
Sentiment Analysis Weaknesses
Sentiment analysis is not the be-all and end-all. It cannot pick up on sarcasm, for example. It also can’t accurately score grammatical errors like double negatives. That said, with a large enough sample size, you can be reasonably confident about the general trends in sentiment.
Topics and Sample Size
I specifically searched within a 150-mile radius of America’s top 10 cities for the issues “personal finance” and “early retirement.”
Remember how I said I could pull 2,000 tweets per query? Since I searched ten cities for two topics, I should have an n of 40,000 (2 * 2,000 * 10). If I were analyzing common topics of discussion like “car” and “Donald Trump,” I would likely have a sample size close to 40k. However, dear reader…we’re still kinda fringe. We’re not quite mainstream yet. So, my n of 5,210 will have to do.
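If you want to check my math, here is the sample size arithmetic as a quick Python sketch. The variable names are just for illustration. The numbers all come from this post.

```python
TOPICS = 2                    # "personal finance" and "early retirement"
CITIES = 10                   # top 10 US cities by population
MAX_TWEETS_PER_QUERY = 2_000  # the per-query limit mentioned earlier
ACTUAL_TWEETS = 5_210         # what I actually got back

# One query per topic per city, each capped at 2,000 tweets.
max_sample = TOPICS * CITIES * MAX_TWEETS_PER_QUERY
share = ACTUAL_TWEETS / max_sample

print(f"Maximum possible n: {max_sample:,}")
print(f"Actual n: {ACTUAL_TWEETS:,} ({share:.1%} of the maximum)")
```

In other words, my scrape filled roughly 13% of the theoretical maximum.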
Data is meaningless without analysis. How do you make sense of 5,210 inputs?
The program Tableau is amazingly helpful for analyzing large data sets. I started tinkering around with Tableau two years ago because Excel sucks for making advanced graphs and charts.
You can buy Tableau or use a free online version. Since I’m frugal(ish), I went with the free version.
How can Tableau help you analyze data? Let’s unpack that next.
American Attitudes About Personal Finance and Early Retirement
Below, you can see the proportion of original tweets vs. duplicate tweets (copy pasted) vs. tweets that returned an error (null, in red). Unfortunately, null results are relatively common, so I took them out going forward. Click on the pictures to enlarge them.
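I did this split in Tableau, but the logic is simple enough to sketch in Python. This version assumes a “duplicate” is any tweet whose text exactly matches one seen earlier and a “null” is a missing or empty text. Those definitions and the function name are my simplifications for the example.

```python
from collections import Counter


def classify_tweets(texts):
    """Count original, duplicate, and null tweets in a list of tweet texts."""
    seen = set()
    counts = Counter()
    for text in texts:
        if text is None or not text.strip():
            counts["null"] += 1        # nothing usable came back
        elif text in seen:
            counts["duplicate"] += 1   # same text as an earlier tweet
        else:
            seen.add(text)
            counts["original"] += 1    # first time we see this text
    return counts


# Made-up sample tweets, not real data.
sample = ["Max out your 401k", "Max out your 401k", None, "Retire early!", ""]
print(classify_tweets(sample))
```

Dropping the nulls afterwards is just a matter of ignoring that bucket.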
Next, we have the ratio of pulled tweets that were retweeted vs. not retweeted. As you can see, the majority of personal finance and early retirement tweets were not retweeted.
Below we have the breakdown of retweets by city and by subject (personal finance or early retirement). What stands out here is how crazy Philadelphia is with its personal finance retweets! What’s going on here?
I plotted the frequency of tweets by city and by topic below. Yes, I know, it’s a jumble.
With an embedded Tableau graph, you could easily see which city each line belongs to and the number of tweets, but my new super speedy site clashes with embedded content, so I have to paste in screenshots.
The tweets date back to May 12th and run through May 22nd. The two grey lines are the simple moving averages of all personal finance tweets (top) and early retirement tweets (bottom) in the top 10 cities during this time. Personal finance is apparently more popular than early retirement, which makes sense.
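A simple moving average just averages the last few days of counts to smooth out the jumps. Here is a minimal Python sketch. Tableau drew the grey lines for me, so the window size, the function name, and the daily counts below are all made up for illustration. This version averages whatever days are available at the start of the series, which is one common convention among several.

```python
def simple_moving_average(values, window):
    """Average each value with up to (window - 1) values before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)  # shorter window at the very start
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Made-up daily tweet counts, not my real data.
daily_counts = [10, 20, 30, 40]
print(simple_moving_average(daily_counts, window=2))
```

A bigger window gives a smoother line but reacts more slowly to a real change in tweet volume.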
See those spikes? They could either be unrelated data (noise) or relevant data (signal). I looked into what caused those spikes and found the tweets responsible for them.
Time Series Graph
The orange spike in Philadelphia tweets is caused by this tweet below. It includes the words “personal” and “finance,” but is political and not related to the “personal finance” we care about. This is noise.
Who is G Herbo? I don’t know either. But he’s a nice dude because he let his girlfriend’s mother retire early. That’s a hard Mother’s Day gift to top! Is this relevant to early retirement? Yes, it’s signal.
We saw earlier that personal finance is more popular than early retirement. Early retirement is a fringe group within a broader swathe called personal finance. The graphic below illustrates this more elegantly than we saw previously.
Now we get into sentiment analysis. I excluded neutral tweets, which were most of the tweets. They were too vanilla to pick up any sentiment from. Those tweets that did register show that generally people are slightly positive about personal finance and early retirement.
Note that the prevalence of personal finance skews the data. You’d expect tweets about early retirement to be more than a little bit positive. There are also lots of negative tweets. I can tell you that most of those involved some variation of “take that [famous athlete], early retirement bitch!” Charming.
Now we have a breakdown of the sentiment polarity, by topic, per city. Blue is positive, orange is neutral, and red is negative. Chicago and San Diego appear to be more negative than normal about the topic of personal finance.
This, to me, most elegantly answers the question “how FIREy is your city?” The shading of the state areas is a sum of total retweets. Pennsylvania and California are retweet champs. The pie charts show the ratio of mentions of early retirement vs. personal finance.
The most FIREy cities are, in order:
- San Jose
- San Antonio
- New York
- Los Angeles (N/A)
- San Diego (N/A)
Bear in mind, the sample size in Dallas and Phoenix is tiny. To get a statistically significant level of accuracy, I’d need to run all 20 scrapes (2 categories times ten cities) and the sentiment analysis every day.
I don’t have that kind of time, but I am curious, so I’m going to set up a script on IFTTT (If This, Then That) to do precisely this, automatically, every day.
With a significantly higher sample size spread out over a month or two, we should see even more meaningful results.
What was the point of all this? While it’s interesting to see this information, is it worth the many hours I put into it? By itself, no.
However, this ability is invaluable for marketing. Imagine being able to see the impact of an advertising campaign in real time in any city that has Twitter or other significant social media platforms. You can see which ad campaigns are useful and which aren’t, which advertising team is outpacing the competition, and whether an expensive ad campaign has the impact you desired.
I’ve taught myself a new skill, and now I’m hustling online as a freelancer. While it took many, many hours to do this exercise, I now have a framework that I can drop new queries into and get results in an hour or two. This is something that I can scale up. This is going to be one of my new side hustles. I’ll let you know how it goes!
Since I’ve been on mini-retirement, I’ve had more time than usual. Instead of wasting time watching Netflix from dawn to dusk, I’ve learned a new skill that will make me money. Vital money that I can keep coming in once I go back to work. Money that I can earn remotely when I take advantage of geoarbitrage to retire early. So, next time you’re curious about something and have some free time (hopefully you do), ask yourself if you can monetize what you learn.
Do you live in the most FIREy city? Is your city not listed here? Did anything in this data surprise you?