Analysis of tweet-speak reveals regional dialects
Could a 140-character tweet reveal something about a tweeter’s geographic origin? That’s exactly what researchers at Carnegie Mellon University have shown in an analysis of 9,500 geo-tagged tweeters and the 380,000 messages they posted on Twitter over one week.
There are, after all, local dialects spoken all across the country. Why would communication on Twitter be any different?
In fact, it’s not so different, according to these researchers. For example, there are region-specific twitter-words. Take the word “cool.” In northern California, a tweeter is most likely to write “koo,” while in southern California, it’s spelled “coo.” The word “something” is tweeted as “sumthin” in many cities, but in New York City, the term “suttin” is preferred. If you’re tweeting about being “‘very’ tired,” folks in northern California may prefer to say “‘hella’ tired,” New Yorkers would complain about being “‘deadass’ tired,” and people in Los Angeles might prefer to use a profanity not suitable for publication on a family-oriented website.
How did they get these results? The Carnegie Mellon team analyzed their database of 380,000 tweets, using computer programs to identify regional variation in word use and topics. In the end, they were able to determine the location of a tweeter in the continental United States to within about 300 miles.
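To get a feel for what “identifying regional variation in word use” can look like in practice, here is a minimal Python sketch. It is not the researchers’ method, which this article does not describe in detail. It simply counts words per region and ranks them by a smoothed log-ratio of how often they appear inside a region versus everywhere else. The function name `distinctive_words`, the region labels, and the toy tweets are all invented for illustration; only the example words (“koo,” “hella,” “suttin,” “deadass”) come from the findings described above.

```python
import math
from collections import Counter, defaultdict


def distinctive_words(tweets, top_n=3, smoothing=1.0):
    """Rank words by how over-represented they are in each region.

    tweets: iterable of (region, text) pairs.
    Returns {region: [(word, score), ...]}, highest score first.
    The score is a smoothed log-ratio of the word's relative frequency
    inside the region to its relative frequency in all other regions.
    """
    # Count words separately for every region.
    region_counts = defaultdict(Counter)
    for region, text in tweets:
        region_counts[region].update(text.lower().split())

    # Overall counts across all regions.
    total_counts = Counter()
    for counts in region_counts.values():
        total_counts.update(counts)
    vocab_size = len(total_counts)
    grand_total = sum(total_counts.values())

    result = {}
    for region, counts in region_counts.items():
        region_total = sum(counts.values())
        rest_total = grand_total - region_total
        scores = []
        for word, in_region in counts.items():
            in_rest = total_counts[word] - in_region
            # Add-one style smoothing avoids dividing by zero for words
            # that never occur outside the region.
            p_region = (in_region + smoothing) / (region_total + smoothing * vocab_size)
            p_rest = (in_rest + smoothing) / (rest_total + smoothing * vocab_size)
            scores.append((word, math.log(p_region / p_rest)))
        # Sort by descending score, breaking ties alphabetically.
        scores.sort(key=lambda item: (-item[1], item[0]))
        result[region] = scores[:top_n]
    return result


# Invented toy data, NOT taken from the study's 380,000 tweets.
TOY_TWEETS = [
    ("northern_california", "that show was koo"),
    ("northern_california", "hella tired today"),
    ("northern_california", "koo see you later"),
    ("new_york", "deadass tired today"),
    ("new_york", "i need suttin to eat"),
    ("new_york", "suttin is wrong with my phone"),
]

if __name__ == "__main__":
    for region, words in distinctive_words(TOY_TWEETS).items():
        print(region, [word for word, _ in words])
```

On this toy data, “koo” ranks first for the northern California tweets and “suttin” ranks first for the New York tweets, because each appears twice in its own region and never in the other. Going from word lists like these to a location estimate for an individual tweeter is a much harder problem and would need a proper statistical model.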
Tweets are informal conversations that often use their own jargon of abbreviated words and symbols – twitter-speak – to fit as much as possible into a single tweet’s 140-character limit. (If you’re new to Twitter, a visit to the Twitter dictionary might clear up some of the mysteries of tweeting.)
However, some twitter-speak vocabulary and spelling are regional, very much like regional dialects. The most obvious examples are “y’all” in the South and the regional choice among “pop,” “soda,” and “coke” for a soft drink.
These researchers say that tweets among friends within local social networks provide an opportunity to continue learning about Twitter dialects and, over time, to track the evolution of these regional variants of twitter-speak.
Jacob Eisenstein, one of the study authors, explained to The Tartan, Carnegie Mellon’s student newspaper:
“Here is a word that only occurs in New York or only in Pennsylvania, and you have to ask yourself, is this a stable variation…? When we look at this a year from now, is it still going to be the case that we only see it in this part of the country, or will it spread to the whole country, or will it completely disappear off the map?”
And there you have it. The Carnegie Mellon researchers show that, yes, a 140-character tweet might indeed reveal a tweeter’s geographic origin. The issue now – assuming these scientists continue this line of research – might be whether science can keep up with the dynamic and changeable forces at work in social media.