All about Technology: Why big data is crude oil – while rich data is refined, and the ultimate in BI

Wednesday, 1 April 2015

Introduction and benefits of rich data

It's talked up as being an epoch-defining change, but big data on its own is useless. Created by combining open, freely available data with data owned by both citizens and businesses, 'rich data' is a far more valuable commodity.

What's the difference between big data and rich data?

"It's like the difference between crude and refined oil," says Dr. Rado Kotorov, Chief Innovation Officer at Information Builders. "Combining data provides new context and new use cases for the data. For example, combining social media data with transactional data can provide insight into purchases and thus lead to product innovation."

Rich data is created by combining data from different systems. Rich data has context, and is therefore useful in practical terms to both businesses and individuals. "Credit card processing companies sell benchmarking data to merchants," says Kotorov. "These merchants can see general market trends and compare those with their own observations of the market to make better decisions, or understand gaps in their own operations."

Context is everything: a recent study in Harvard Business Review shows that location-based offers to shoppers increase the odds of purchasing by 76%. Rich data could help healthcare providers and even fight crime, too.

Why do we need rich data?

Rich data is nothing short of cutting-edge business intelligence. "Rich data can be used to answer different kinds of questions that would previously have been difficult," says Southard Jones at cloud business intelligence and analytics company Birst. "Linking up multiple sources of information can help see things in new ways or across the whole process, rather than just one team's responsibility."

For example, imagine a sales team analysing which products to sell to which customers. Instead of looking at sales data in isolation to see who bought what last year, a rich data approach looks beyond it to the effect of marketing campaigns and to finance data, such as how quickly each customer pays.
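Here is a minimal Python sketch of that kind of enrichment. The customer IDs, field names and figures are all hypothetical, and a real pipeline would more likely do this as a database join. On last year's sales alone the two customers look almost identical. Once marketing and finance data are merged in, one responds to campaigns and pays within a fortnight, while the other ignores them and pays late.

```python
# Hypothetical extracts from three separate systems, keyed by customer ID.
sales = {
    "C001": {"units_bought_last_year": 120},
    "C002": {"units_bought_last_year": 115},
}
marketing = {
    "C001": {"campaigns_responded": 3},
    "C002": {"campaigns_responded": 0},
}
finance = {
    "C001": {"avg_days_to_pay": 14},
    "C002": {"avg_days_to_pay": 75},
}


def enrich(customer_ids, *sources):
    """Merge each source's fields into one record per customer.

    Acts like a left join on customer_ids: a customer missing from a
    source just gets no fields from that source.
    """
    rich = {}
    for cid in customer_ids:
        record = {"customer_id": cid}
        for source in sources:
            record.update(source.get(cid, {}))
        rich[cid] = record
    return rich


rich = enrich(sales.keys(), sales, marketing, finance)
for record in rich.values():
    print(record)
```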

Rich data is about predicting behaviour. Selling the right product to the right person at the right price is what sales is all about, yet much of it still doesn't rely on data. "Often sales rely on gut feel and experience," says Jones. "Replacing that with a system where a customer's propensity to buy is clearly indicated allows sales to prioritise their efforts and improve productivity and accuracy."
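One common way to express a "propensity to buy" is a logistic score over customer features. The sketch below is not Birst's method. Its features and hand-picked weights are hypothetical and only for illustration. In practice the weights would be fitted to past outcomes, for example with logistic regression.

```python
import math

# Hypothetical weights for illustration only; a real model would learn
# these from historical purchase outcomes.
WEIGHTS = {
    "units_bought_last_year": 0.01,
    "campaigns_responded": 0.5,
    "avg_days_to_pay": -0.02,  # slow payers score lower
}
BIAS = -1.0


def propensity_to_buy(record):
    """Return a score between 0 and 1 from a logistic function of weighted features."""
    z = BIAS + sum(w * record.get(name, 0) for name, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))


customers = [
    {"customer_id": "C001", "units_bought_last_year": 120,
     "campaigns_responded": 3, "avg_days_to_pay": 14},
    {"customer_id": "C002", "units_bought_last_year": 115,
     "campaigns_responded": 0, "avg_days_to_pay": 75},
]

# Rank customers so the sales team can prioritise the likeliest buyers.
ranked = sorted(customers, key=propensity_to_buy, reverse=True)
for c in ranked:
    print(c["customer_id"], round(propensity_to_buy(c), 2))
```

The logistic function keeps every score between 0 and 1, so scores from different customers can be compared and ranked directly.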

What's wrong with big data?

On its own, it's too shallow to be useful. "The computing power of the cloud has enabled us to collect, store and process vast levels of data, but with big data it is inevitable that we will also collect lots of duplications, deviations and duds," says Nigel Beighton, VP of Technology, Rackspace, who adds that without big data we can't have rich data. "Rich data is the diamond in the rough."
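Beighton's "duplications, deviations and duds" correspond to three familiar data-cleaning steps. This minimal sketch assumes readings arrive as (record ID, value) pairs. For the deviation step it uses a median-based "modified z-score" with the commonly used 3.5 cut-off. That measure is less distorted by the outliers it is trying to catch than a plain mean and standard deviation.

```python
import math
import statistics


def clean(readings, threshold=3.5):
    """Remove duds, duplicates and outliers from (record_id, value) pairs."""
    # Duds: drop records whose value is missing, non-numeric or NaN.
    valid = [
        (rid, v) for rid, v in readings
        if isinstance(v, (int, float)) and not isinstance(v, bool) and not math.isnan(v)
    ]

    # Duplications: keep only the first record seen for each ID.
    seen, unique = set(), []
    for rid, v in valid:
        if rid not in seen:
            seen.add(rid)
            unique.append((rid, v))

    # Deviations: drop values whose modified z-score exceeds the threshold.
    values = [v for _, v in unique]
    if len(values) < 3:
        return unique
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return unique
    return [(rid, v) for rid, v in unique if 0.6745 * abs(v - median) / mad <= threshold]


raw = [("a", 10.0), ("b", 11.0), ("a", 10.0), ("c", None),
       ("d", 9.5), ("e", 10.5), ("f", 500.0)]
print(clean(raw))
```

Here the repeated "a", the empty "c" and the wildly deviating "f" are all dropped.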

"We often say to our clients that while their own customer data is highly accurate because it reflects their actual transactions, it can be pretty limited in terms of the overall depth it provides," says Jon Cano-Lopez, CEO of REaD Group. "A telecoms company will know how people use their phones and data, their geographic location, and even who their friends are via their telephone numbers," says Cano-Lopez. "However, they don't know who the people really are, what job they do, what their interests are, how much they earn, their family make-up, and what makes them tick."

Unstructured data might tell you about two seemingly identical heavy users, but look closer and one person could be a high volume business user, while the other is a socialite with many friends. An additional, 'rich' layer of data will add the depth, helping to identify the true value of a customer. "Combining transactional data that is based on a customer's activity, with their lifestyle information, provides a much fuller picture," says Cano-Lopez.

What benefits can rich data bring?

Enriching data and using it in real time can bring huge benefits. "Medical device data and unstructured data in the form of clinical staff notes are being mined to support earlier diagnosis of conditions like sepsis (blood poisoning)," says Matt Pfeil, Chief Customer Officer at DataStax. "This involves real-time comparison of patient data against a centralised, anonymous set of data."
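The idea of comparing a live reading with a centralised, anonymised reference set can be sketched as a simple z-score check. This is a hypothetical illustration, not DataStax's system and not a clinical sepsis screen. The reference values are invented, identifiers are assumed to have been stripped before the data reached the reference set, and the 3-standard-deviation limit is arbitrary.

```python
import statistics

# Hypothetical anonymised reference set: readings with no patient
# identifiers attached. Values are illustrative only.
REFERENCE = {
    "heart_rate": [72, 75, 68, 80, 77, 70, 74, 79],
    "temperature_c": [36.8, 37.0, 36.6, 36.9, 37.1, 36.7, 36.8, 37.0],
    "resp_rate": [14, 16, 15, 13, 17, 15, 14, 16],
}

# Precompute each reference distribution once so live comparisons are cheap.
STATS = {name: (statistics.mean(v), statistics.stdev(v))
         for name, v in REFERENCE.items()}


def flag_reading(reading, z_limit=3.0):
    """Return the measurements in a live reading that sit more than
    z_limit standard deviations from the reference mean."""
    flags = []
    for name, value in reading.items():
        if name in STATS:
            mean, sd = STATS[name]
            if abs(value - mean) / sd > z_limit:
                flags.append(name)
    return flags


live = {"heart_rate": 118, "temperature_c": 38.9, "resp_rate": 24}
print(flag_reading(live))
```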

Rich data of this kind allows life-threatening conditions to be spotted and treated early. It depends on patient consent, though: only with that consent can a patient's anonymised data be used in future to help treat others.

Blowing smoke and privacy worries

Is rich data just a smokescreen?

Some believe that rich data is no more valuable than big data. "The problem will be that the majority [of rich data] is in those hard-to-get seams, and that requires some serious work and effort to extract," says Jamie Turner, CTO of Postcode Anywhere, which handles over a billion queries a year. "But its sheer volume makes it valuable and important to not overlook."

Turner doesn't think that attribution is the answer, saying: "The greatest volume will be unstructured and hard to understand but way more valuable. It's also worth remembering that attribution done badly is even worse because you start relying on an indirect measure of things rather than the raw data."

What about the Internet of Things?

Another reason why we need rich data is that big data is about to explode. The Internet of Things will mean a plethora of devices coming online, from thermostats and scales to TVs and smart energy meters, all constantly creating 'time-series' data. The end result will be a huge pool of data that needs to be sorted, managed and used.

"Connecting these 'things' to the internet and using the data from them can provide a better service back to the customer – whether it is helping them reduce their energy spend, or stick to a diet and exercise plan," says Pfeil, who insists that the use of that data has to be clear. "Customers want to feel that their data is being used in their best interests, and that it is kept secure," he says.

It's likely that NoSQL databases – developed by the likes of Google and Facebook – will need to be used to manage the huge amounts of time-series data that IoT devices will create.
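Wide-column NoSQL stores such as Cassandra often handle time-series data by partitioning readings by device and time bucket, so that no single partition grows without limit. The sketch below imitates that pattern with an in-memory dictionary. The one-day bucket size, the class and the device names are assumptions for illustration, not any particular database's API.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_key(device_id, ts):
    """Partition key: one bucket per device per UTC day.

    ts must be timezone-aware so the UTC conversion is unambiguous.
    """
    return (device_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


class TimeSeriesStore:
    """Toy in-memory stand-in for a wide-row NoSQL table."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def write(self, device_id, ts, value):
        # Each write touches just one partition.
        self.partitions[bucket_key(device_id, ts)].append((ts, value))

    def read_day(self, device_id, day):
        """Return one device's readings for one UTC day, oldest first."""
        return sorted(self.partitions.get((device_id, day), []))


store = TimeSeriesStore()
store.write("meter-42", datetime(2015, 4, 1, 9, 0, tzinfo=timezone.utc), 1.2)
store.write("meter-42", datetime(2015, 4, 1, 8, 0, tzinfo=timezone.utc), 0.9)
store.write("meter-42", datetime(2015, 4, 2, 8, 0, tzinfo=timezone.utc), 1.1)
print(store.read_day("meter-42", "2015-04-01"))
```

A query for one meter's day of readings touches a single bucket, however long that meter has been reporting.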

What about open data from governments?

Open data is just unstructured big data. Government departments produce immense amounts of raw data to inform policy decisions – on everything from live traffic information and residential property sales to obesity and deprivation levels – and much of it is now being made public for anyone to analyse and use, perhaps to develop apps. But there's a problem.

"Data is being published by government departments and agencies, but not generally in a format that is easily discoverable or linkable," says Adam Fowler, Principal Sales Engineer at Enterprise NoSQL database platform vendor MarkLogic. "What's needed is a system that supports security and privacy requirements, and web publishing, incorporates semantic technologies for better discoverability and querying, and uses recognised standards for linked open data so that new data can be easily linked to existing data sources."

Should there be one centralised repository? Fowler thinks that the existing Data.gov.uk website should allow interactive querying of the underlying data. "For example, an open data report published on the use of homeless shelters, by borough, would need to exclude individuals' names," says Fowler. "This could help central government or charities to better allocate funds, reflecting up to date usage across the country."
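Fowler's homeless-shelter example comes down to publishing aggregates rather than individual rows. In this minimal sketch the records are hypothetical (resident, borough) pairs, and only borough counts leave the function. Counts below a small threshold are also masked, a common extra safeguard in published statistics. The threshold of 3 is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical raw records: (resident identifier, borough) for each stay.
stays = [
    ("resident-01", "Camden"), ("resident-02", "Camden"), ("resident-03", "Camden"),
    ("resident-04", "Hackney"), ("resident-05", "Hackney"),
    ("resident-06", "Hackney"), ("resident-07", "Hackney"),
    ("resident-08", "Islington"),
]


def publishable_report(records, min_count=3):
    """Count stays per borough; identifiers are never copied into the output.

    Boroughs with fewer than min_count stays are masked, because very small
    counts can make individuals easier to re-identify.
    """
    counts = Counter(borough for _, borough in records)
    return {
        borough: n if n >= min_count else f"<{min_count}"
        for borough, n in sorted(counts.items())
    }


print(publishable_report(stays))
```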

If open data published by government departments and agencies were easily linkable and discoverable by individuals and businesses, it would be far more valuable; if it isn't in a format that is easily discoverable and useful, it may as well be closed data. Common, open standards have been proposed by the ODI (Open Data Institute) and the W3C, and Tim Berners-Lee, the initiator of the Linked Data project, has suggested a five-star deployment scheme.
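Berners-Lee's scheme is cumulative: each star builds on the ones before it. The checklist below follows his five criteria. The dictionary keys and the function are only an illustrative way of encoding them, not an official tool.

```python
# Berners-Lee's five criteria, in order. Keys are illustrative names.
CRITERIA = [
    ("open_licence", "on the web under an open licence, in any format"),
    ("structured", "structured data, e.g. a spreadsheet rather than a scanned table"),
    ("open_format", "a non-proprietary format, e.g. CSV rather than Excel"),
    ("uses_uris", "URIs identify things, so others can point at them"),
    ("linked", "links to other data to provide context"),
]


def star_rating(dataset):
    """Return 0-5 stars: count criteria met in order, stopping at the first miss."""
    stars = 0
    for key, _description in CRITERIA:
        if not dataset.get(key, False):
            break
        stars += 1
    return stars


traffic_csv = {"open_licence": True, "structured": True, "open_format": True}
print(star_rating(traffic_csv))  # prints 3
```

A CSV published under an open licence therefore reaches three stars. Adding URIs and links to other datasets would take it to five.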

Is rich data reliable?

Rich data is only as good as the personal data it uses. A recent report from Symantec called State of Privacy looked into attitudes towards data privacy across Europe, including the UK, and found a growing mistrust in how businesses and governments treat personal data. A third of people in the UK provide false data to protect themselves, and over half of those surveyed (57%) are now avoiding posting personal details online altogether.

"You may be putting your faith in user data at the expense of truth," Sian John, Chief Security Strategist EMEA at Symantec, warns organisations that rely on user data. "Data does not always acknowledge the human side of your customer. Too much reliance may deliver an advertising or marketing campaign with little relevance."

Is rich data a threat to privacy?

Worries about a Big Brother society are causing a breakdown in trust between individuals and companies. "We are entering a world where consent will be king, and the more that companies have to ask customers for this, the more they may be rejected," thinks Cano-Lopez.

Some think that rich data can, in time, be used as leverage. "In the future, people may choose to control information that they are creating and then monetise this back to companies – this may be in the form of lowering their bills or getting better service quality from one provider," says Pfeil.

People may want more control over their personal data, but the system is not set up for this.

"You would need a central portal where this data would be stored allowing businesses to upload consumer data and consumers access to provide cross-brand permissions," says Jason Lark, Co-Founder and MD of Celerity, adding: "Think how tough it is for many small businesses to record their data, while many businesses are still working on bringing their own customer data together."

Merging all of this data into one portal would be, says Lark, a Herculean task. Beyond logistics, there are social implications, too. "Does it involve empowering individuals or nationalising data?" asks Lark. "Are we depriving individuals of their data, or companies of their property? We need to really think about these issues."