Friday 12 September 2014

Twitter mines: at the digital coalface

Introduction


In 1849, in a desperate attempt to stop local miners rushing to California to join the new gold rush, Dr M.F. Stephenson, head of the local Mint, pointed to the surrounding peaks in the town of Dahlonega, Georgia, and is famously misquoted as saying "There's gold in them thar hills".


In actual fact, he said "There's millions in it", and the real quote is as true today as it was over 150 years ago: if you find the right mine, there are millions, even billions to be made from it.


But in the 21st century, the mines aren't dug in the ground, and the miners are more likely to have PhDs than a pickaxe and dirty fingernails. Today, we are mining data.


Big data, big mines


You would have to have been living in a literal hole in the ground not to have heard phrases like "big data" and "big analytics" being bandied around. It is a brave new frontier of technology where huge volumes of data are sifted to try to analyse every element of human behaviour, and if possible, to predict how people will react. And if you can predict the future, there is money to be made in it.


In fact, this idea of data mining is not as new as the big data companies would have you believe. For years, retailers have been trying to draw parallels between how different people buy goods to attempt to transfer the knowledge they have gained from one consumer, and apply it to another.


How long have you had a supermarket loyalty card? I've had one for almost 20 years. And as Tesco's chairman said after the first trial of the Clubcard loyalty scheme: "What scares me about this is that you know more about my customers after three months than I know after 30 years."


Beer and nappies


There is a similar story told in the US regarding data mining done by Wal-Mart using Teradata back in the early 90s. The system spotted a trend that between 5pm and 7pm on a Friday night, people were more likely to buy beer and "diapers" in one transaction. The story is often retold suggesting that the system was able to spot that these purchases were made by young men on their way home from work, which is unlikely, as the system only had access to sales transactions, not the demographics of the purchaser. However, using that trend and moving the two products closer together produced more sales: or at least that's how the story goes.


The point is, though, that sifting data for that elusive nugget of gold is not new. What is new, however, are two things. First, the volume of data has increased massively. Second, and in a sense the more exciting point, much of it is now publicly available.


Love or hate affair


Twitter provokes some pretty polarised opinions. Most users love it, being able to keep themselves up to date with trends in their own micro world, as well as in the real world. They can quickly communicate with others, which leads to rapid information sharing through retweeting. If you see a pall of smoke from your bedroom window, you have a lot more chance of finding out where it is coming from by looking on Twitter than you will have if you look on mainstream news sites.


Others, though, cannot see the point. That is partly generational, but also partly to do with demographics. You might draw an arbitrary age line through people who have grown up with computers and video consoles, and those who have not. But that changes depending on the type of area you work in. Journalists, novelists, comedians, musicians and technologists of all ages take to Twitter to tell the world what they are up to.


Interestingly, research done by the Pew Research Center in 2014 shows that 19% of all online adults in the US use Twitter, but that usage drops off a cliff in the 50+ age range, whereas for social media sites as a whole (e.g. Facebook), that age range is much better represented as a proportion of the whole.


World view


What is perhaps indisputable, though, is that it is possible to get a snapshot of the world by looking at Twitter, even if that world might be a little skewed in terms of age and language (over 50% of Tweets are in English).


That said, I'm not sure that just looking at the front page of Twitter to tell me what is trending is actually going to give me as much insight as I might want. Here's a selection of the trending topics worldwide as I write this:



  • Watching into the Storm

  • Carlos III

  • Amber Alert

  • Wild Life

  • Jeremy Kyle


There's no mention of Ebola, the conflict raging in Iraq, or other slightly more important topics than Mr Kyle. Does this tell us that Twitter is not very good at sifting its own information? Or that people have more on their minds day-to-day than discussing global politics in 140 characters?


In fact, it is a bit of both.


Hosed with data


If you are a data analyst, and are looking for a stream of data to work with, Twitter is difficult to beat. Twitter provides a stream of its tweets to subscribers that can be tailored for various needs. You can, for example, get tweets that are made by particular people, or you can track particular words, or you can get the "firehose" – every tweet that is ever made by anyone.
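The two kinds of tailoring described above (following particular people, tracking particular words) can be sketched as a simple filter. This is a minimal illustration only: the `user` and `text` fields, the function names and the sample records are all hypothetical, and it works on an in-memory list rather than connecting to Twitter.

```python
# Hypothetical tweet records; a real stream would deliver far richer objects
# than the two fields assumed here.
def matches(tweet, users=(), words=()):
    """Return True if the tweet is by a followed user or mentions a tracked word."""
    text = tweet["text"].lower()
    by_user = tweet["user"].lower() in {u.lower() for u in users}
    has_word = any(w.lower() in text for w in words)
    return by_user or has_word


def filter_stream(stream, users=(), words=()):
    """Lazily yield only the tweets that match the follow/track criteria."""
    for tweet in stream:
        if matches(tweet, users, words):
            yield tweet


# Made-up sample data for illustration.
sample = [
    {"user": "alice", "text": "Smoke over the city centre this morning"},
    {"user": "bob", "text": "Loving my new phone"},
    {"user": "carol", "text": "Coca Cola advert was great"},
]

tracked = list(filter_stream(sample, users=["bob"], words=["coca cola"]))
print([t["user"] for t in tracked])  # ['bob', 'carol']
```

Because the filter is a generator, it would cope equally well with an endless stream, handing on each matching tweet as it arrives.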


Actually, you used to be able to get the firehose, but it has been scaled back for most users to the "sample" stream, a random selection of tweets from the overall whole. But in volume terms, it is much more than the firehose was in, say, 2007/8, and is plenty to be able to start to perform trend analysis.


At a basic level, what it means is that for the average business or user, it has never been easier to find out whether a marketing message is having any impact, and also whether your customers love you or hate you.


Try it. Go to http://www.twitter.com and, in the right-hand corner, type something meaningful into the search bar: your name, say, or the company you work for, or a brand that you are associated with.


Straight away, you will see a selection of tweets that relate to that search term (I now know more about people with my surname than I ever expected or wanted to know). These give a quick insight into what the world is saying and thinking. But wait for a few minutes longer. Assuming the search term is not incredibly esoteric, more tweets will start to appear quickly. (To get the idea immediately, try searching "Coca Cola", which generates new results every five or so seconds.)


That is a level of market research that even 10 years ago would have cost thousands, and you can do it for free. And of course, if you are really serious about it, a quick Google search for "Twitter Tracking Tools" will reveal dozens of companies who will help you assess your market impact on Twitter, and will advise you on how to use the medium to boost your business.


Deeper insights


However, let's go back to the question of how Twitter mines its data, and what deeper insights can be gained from that data. Back in 2008, I worked with one member of the Twitter development team and produced the first public trending tool, which went live as @secretbear on Twitter (and gained a total of exactly 15 followers!). It ran for a couple of years before I stopped updating it.


The biggest challenge is sorting the wheat from the chaff.


Picking the same time range as I did when I was looking at Twitter's worldwide trends above, my newly reactivated Twitter trending tool shows (amongst a few others):



  • Mompati Merafhe

  • Scotland Yard

  • Commonwealth Games

  • Australian Ethical Investment

  • Christina Perry


I feel a little more enlightened about the state of the world, but it goes to show that with gigabytes of data pouring through second by second, it can still be a little more like sticking a sewing needle in the ocean and hoping to spear a shark.
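One simple way to sort wheat from chaff is to ask not "which words are common?" but "which words are more common now than usual?". The sketch below is an assumption about how such a tool might work, not a description of Twitter's method or of the tool mentioned above; the function name, the smoothing and the sample tweets are all made up for illustration.

```python
from collections import Counter


def trending(current, baseline, top=3, min_count=2):
    """Rank terms by how much more often they appear now than in the baseline."""
    now = Counter(word for tweet in current for word in tweet.lower().split())
    before = Counter(word for tweet in baseline for word in tweet.lower().split())
    scores = {
        # Add-one smoothing so brand-new terms do not divide by zero.
        word: count / (before[word] + 1)
        for word, count in now.items()
        if count >= min_count  # ignore one-off words
    }
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]


# Made-up tweets: an earlier window and the current window.
baseline = ["the weather is fine", "the match is on", "the bus is late"]
current = ["the games are on", "the games start today", "games tickets sold out"]

print(trending(current, baseline, top=1))  # ['games']
```

Everyday words such as "the" score poorly because they are just as frequent in the baseline, which is the point: volume alone does not make a trend.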


But the sheer volume of data is useful if you are looking for longer term trends.


Emotional analysis


One continuing area of research is around emotional analysis of the Twittersphere, i.e. can you draw conclusions about how the world is "feeling", and if so, can that be applied to other sets of data? And of course, can you make money from it?


In 2010, researchers at Indiana University studied almost 10 months of tweets, and subjected them to mood and sentiment analysis using a number of different tools. They then looked to see whether this could be correlated against changes in the Dow Jones on a daily basis. And in 87% of cases, the changes in mood over the day could be used to predict whether the market rose or fell at the end of the day.
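To make the idea of a "percentage of cases" concrete, the sketch below counts the share of days on which a mood score and the market moved in the same direction. It is a simplified stand-in, not the researchers' method, and the numbers are invented purely for illustration.

```python
def direction_hit_rate(mood_changes, market_changes):
    """Fraction of days on which mood and market moved in the same direction."""
    if len(mood_changes) != len(market_changes):
        raise ValueError("series must be the same length")
    # Skip days on which either series did not move at all.
    pairs = [(m, p) for m, p in zip(mood_changes, market_changes) if m != 0 and p != 0]
    if not pairs:
        return 0.0
    hits = sum(1 for m, p in pairs if (m > 0) == (p > 0))
    return hits / len(pairs)


# Made-up daily changes; NOT data from the study.
mood = [0.4, -0.2, 0.1, -0.5, 0.3]
market = [12.0, -8.0, -3.0, -20.0, 5.0]

print(direction_hit_rate(mood, market))  # 0.8
```

Here four of the five days agree in direction, giving 0.8, or 80%.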


On paper, this looks like the ultimate path to riches. Track Twitter, see if people are happy or sad, and then bet on whether the market will end up or down at the end of the day. Except of course, all of this analysis is done after the event. The researchers were able to place themselves in the past and say that had they known at the beginning of the day how people would be feeling over the course of the day, they could have predicted how the day would end. What they were not suggesting is that how people felt yesterday would affect how the stock market would perform the next day.


But there is light at the end of this particular tunnel, and it comes from an unusual place: Gamers.


Parallel power


Usually associated with darkness and acne cream, gamers have demanded one thing from their computers: speed. They need ultra-realistic graphics rendered smoothly for a fully immersive gaming experience. Traditional CPU technology is not fast enough to cope with this, and has one other major drawback – it processes each instruction one at a time (i.e. it is a serial process). When you need all 24 million+ pixels to appear on your screen at roughly the same time, plotting their movement one instruction at a time gives you lag. And lag gets you killed (virtually, at least).


To get over this, manufacturers such as Nvidia produced graphics cards that support only a reduced range of instruction types (in the spirit of RISC, or reduced instruction set computing), but apply them to many pieces of data at the same time, which is known as parallel processing. This type of power and parallel processing was previously the realm of supercomputers costing millions of pounds. Now for £700 (or dollars) you can get a supercomputer on your desktop.
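The difference between the serial and parallel approaches can be sketched in a few lines. This is only an analogy: the `brighten` operation, the function names and the tiny "frame" are hypothetical, and Python threads on a CPU do not deliver anything like the speed of a graphics card. What the sketch does show is the shape of the idea: the same simple operation is applied independently to every pixel, so the work can be split up and handed to several workers at once.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, amount=10):
    """Apply one simple operation to one pixel value, clamped to 0-255."""
    return min(255, pixel + amount)


def render_serial(pixels):
    """Process pixels one after another, as a single serial processor would."""
    return [brighten(p) for p in pixels]


def render_parallel(pixels, workers=4):
    """Split the frame into chunks and hand each chunk to a separate worker."""
    size = max(1, -(-len(pixels) // workers))  # ceiling division
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the chunks went in.
        results = list(pool.map(render_serial, chunks))
    return [p for chunk in results for p in chunk]


frame = [0, 100, 250, 255, 17, 42, 200, 128]  # made-up pixel values
print(render_parallel(frame))  # [10, 110, 255, 255, 27, 52, 210, 138]
```

Both functions give the same answer; the only difference is whether the pixels are handled one at a time or in several batches side by side.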


This type of speed and power has led to the resurrection of one of the brave new technologies of the 80s, the neural network. A neural network is designed to emulate the learning patterns of the brain, by getting a computer to "learn" from data that is given to it. It fell out of fashion because while the mathematics was well understood, the processing power was just not available.
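What "learning from data" means can be shown with the smallest possible case: a single artificial neuron (a perceptron), the building block from which neural networks are assembled. The sketch below is illustrative only; the function names, learning rate and training data are assumptions, and a real network would have many such neurons arranged in layers. Here the neuron is never told the rule for logical OR, only shown examples of it.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else stay quiet (0)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, rate=0.1):
    """Learn weights for a single artificial neuron from labelled examples."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Nudge the weights in proportion to how wrong the guess was.
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Training examples for logical OR: ((input_a, input_b), expected_output).
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(samples)

print([predict(weights, bias, x) for x, _ in samples])  # [0, 1, 1, 1]
```

The arithmetic is trivial for four examples and two weights; it is repeating it over millions of examples and weights that needs the parallel hardware described above.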


Stock answer


Today, by feeding millions of emotional scenarios into our neural network running on hardware that costs less than a second-hand car, the computer can learn the likely effects of different emotional states on the other patterns it is monitoring, such as the stock market. This way, rather than waiting until the end of the day to see if the stock market is going to go up or down, we can tell by the end of the next tweet that comes in. And this technology is available now (and the people in the know are keeping it to themselves, of course).


Twitter has changed communication for a whole generation, but it has also provided an incredibly rich seam of data for analysts to work on. I have described some in this article, but prepare to be amazed at the insights into human behaviour that 140 characters can give in the next few years.
















