Social science is making increasing use of Big Data. However, working with Big Data creates many challenges of complexity, security and dissemination. Professor Vania Sena outlines how the work of the ESRC Business and Local Government Data Research Centre can unlock the potential of Big Data for the benefit of organisations, both private and public, and for policy makers.
Big Data are the talk of the day. As the joke goes, everybody likes to talk about them, but in reality not everybody is entirely sure what they are, or whether and how they can be of any use.
The label definitely does not help either: what exactly does "Big" mean when we talk about Big Data? As any social scientist knows far too well, data can easily become "big" as soon as they are matched and merged with other datasets, with the result that data storage may become problematic and traditional methodologies for analysis may no longer be useful.
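The way matching and merging makes data "big" can be sketched in a few lines of plain Python. All field names and records below are hypothetical, chosen only to illustrate a one-to-many merge between a structured and an unstructured source:

```python
# Toy illustration: matching datasets multiplies size and complexity.
# All field names and records are hypothetical.

surveys = [  # structured survey responses, keyed by postcode
    {"postcode": "CO4", "respondents": 120},
    {"postcode": "CM1", "respondents": 95},
]

tweets = [  # unstructured social-media posts, geotagged by postcode
    {"postcode": "CO4", "text": "Traffic is terrible again"},
    {"postcode": "CO4", "text": "Lovely market this morning"},
    {"postcode": "CM1", "text": "New bus route opened today"},
]

# A one-to-many merge: each survey row is matched with every post from
# the same postcode, so the merged dataset grows with every new source.
merged = [
    {**s, "text": t["text"]}
    for s in surveys
    for t in tweets
    if s["postcode"] == t["postcode"]
]

print(len(merged))  # more rows than either input on its own
```

Even in this toy case the merged table is larger and messier than either input; with real administrative and social-media sources the growth is far faster, which is exactly why storage and traditional analysis methods come under strain.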
So, does "Big" simply refer to the size of the data, or is it a reference to their complexity? More importantly, given that data (in whatever shape and size) have always been available, can we really be sure that they offer a real opportunity to organisations and, if so, how?
These are very complex questions and, in an attempt to answer them, I will start from the standard definition of Big Data based on the so-called 3V model. In 2012, Gartner defined Big Data as "high-volume, high velocity and/or high variety information assets that require new forms of processing to enable enhanced decision-making".
Although the 3V definition is widely used, I like to add to it some additional key features of Big Data which make them so interesting. First, they may be both structured and unstructured, mostly because they come from different sources (think of Twitter vs. the Census data).
Second, they are routinely collected by organisations, often for purposes other than analysis. This last feature makes them quite relevant to organisations' performance: as a by-product of organisations' activities, they may by definition contain information about processes, suppliers and customers which can help those organisations improve their performance.
For instance, retailers have detailed information about their customers (and their basic demographic characteristics), their preferences and their daily (or weekly) spending. These data are produced as customers pay for their shopping, and in this sense the data collection is just a by-product of the retailer's main activity.
At the same time, these data may contain quite useful information about customers: for instance, they can help the retailer understand how to use promotion schemes to nudge consumers into buying specific items.
How big are these new types of data? Again, this is a relative concept that varies as processing power increases. A few years back, Big Data sizes ranged from a few dozen terabytes to many petabytes in a single dataset, and they are still growing today. However, because of the relative nature of the notion of size, it is preferable to use the notion of complexity when discussing Big Data. Complexity here refers to the fact that these data may have been created by merging different types of data (structured and unstructured), and therefore their management and curation can be complicated.
This leads to another important issue: even if it is true that data have always been around, it is only now that we have the computational power which allows us to store, curate and analyse these complex and large datasets.
Consider the above example of the retailer and its shopping data. By definition, a dataset which lists all the items bought by each customer every day in a store is large. In the past, the retailer would have discarded most of the information and tried to store and make sense of only a small fraction of the data, very likely aggregated by store. Today, however, the retailer can store all the data produced and has access to techniques that allow it to extract, from this very detailed dataset, insights that can help its profitability.
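The contrast between the old aggregated view and the detailed one can be sketched as follows. The transaction records, store names and customer labels are all hypothetical; the point is only that the per-store aggregate cannot answer questions the full detail can:

```python
from collections import defaultdict

# Hypothetical basket-level transaction records, as a retailer might log them.
transactions = [
    {"store": "Colchester", "customer": "C1", "item": "milk",  "spend": 1.20},
    {"store": "Colchester", "customer": "C1", "item": "bread", "spend": 0.90},
    {"store": "Colchester", "customer": "C2", "item": "milk",  "spend": 1.20},
    {"store": "Ipswich",    "customer": "C3", "item": "eggs",  "spend": 2.10},
]

# The "old" approach: discard detail and keep only per-store totals.
store_totals = defaultdict(float)
for t in transactions:
    store_totals[t["store"]] += t["spend"]

# With the full detail retained, finer questions become answerable,
# e.g. which customers buy a given item (useful for targeted promotions).
milk_buyers = {t["customer"] for t in transactions if t["item"] == "milk"}

print(dict(store_totals))  # per-store aggregates only
print(milk_buyers)         # customer-level insight the aggregate loses
```

Once every transaction is kept rather than summarised away, the same records support both the traditional store-level reporting and the customer-level targeting described above.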
Data research centres
In reality, most organisations do not seem able to fully exploit these data, and equally they do not seem aware of the techniques that would help them. The ESRC Business and Local Government Data Research Centre is one of three data research centres funded by the ESRC precisely to allow local governments and businesses to benefit from the Big Data revolution. The Centre's strategy for delivery is based on a multi-pronged approach articulated in the following elements:
a) the creation of a physical repository based at the University of Essex, where the Big Data provided by both businesses and local authorities will be securely stored, with secure access points for users;
b) a research programme providing methodological advances for the analysis of Big Data, as well as an innovative, substantive and inter-disciplinary research programme on smart, sustainable and inclusive regional economic growth;
c) a Training and Knowledge Exchange programme which will also offer users a suite of Big Data solutions;
d) an Engagement and Communication programme aiming to raise citizens' awareness of the benefits and costs associated with the use of Big Data, as well as of the need to use them responsibly.
Understanding Big Data is a big deal for everyone. By exploiting and understanding data, we can help businesses grow and help inform public policy for a better society.