Big data and data analytics offer the opportunity to gain insight into consumer shopping habits, preferences and sleep patterns. But some companies have trouble figuring out what all the information means. This primer can help ease the process.
The terms “big data” and “data analytics” have become popular buzzwords in recent years. While the terms conjure up notions of market omnipotence built off the incomprehensible amounts of data generated every instant in the digital age, many who use these terms don’t truly comprehend what they mean, what their capabilities and limitations are, or how to use them.
In this article, we’ll provide a basic introduction to the concept of big data, as well as the more practical concept of data analytics. We’ll also make the argument that big data, while a powerful tool, is not necessary for every data analysis project. We’ll then discuss some of the common pitfalls people encounter in data analytics. Finally, we’ll walk through how to structure a data analytics project that avoids those pitfalls while getting the most out of the available data resources.
What is big data?
Before getting into the best practices of data analytics, it’s important to get a firm understanding of some of the terms. Big data is the type of term that can spawn a different definition depending on whom you ask, but a widely accepted definition comes from the Gaithersburg, Maryland-based National Institute of Standards and Technology: “Big data consists of extensive data sets — primarily in the characteristics of volume, variety, velocity and/or variability — that require a scalable architecture for efficient storage, manipulation and analysis.”
The NIST notes that the definition above contains an inherent interplay between the characteristics of the data and the need to be able to process it with sufficient levels of performance (speed) and cost efficiency, i.e., the “architecture” element.
The architecture element is not fundamental to developing a process for data analytics, but the massive amounts of data potentially involved require either a single, very powerful computer to manage the collection, storage and analysis of those data (vertical scaling) or distribution of that collection, storage and processing among many integrated individual computers (horizontal scaling).
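To make the idea of horizontal scaling a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any real big data platform: threads on one machine stand in for separate computers, and the function names (`partition`, `partial_stats`, `distributed_mean`) are hypothetical. Each "worker" summarizes only its own slice of the data, and the small partial results are then combined into one answer.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split data into roughly equal slices, one per hypothetical worker."""
    if not data:
        raise ValueError("data must not be empty")
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """Work each worker does independently: a partial sum and a count."""
    return sum(chunk), len(chunk)


def distributed_mean(data, workers=4):
    """Combine per-worker partial results into one overall average."""
    chunks = partition(data, workers)
    # Threads stand in for separate machines (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


values = list(range(1, 1001))
print(distributed_mean(values))
```

The design point is that no single worker ever needs to hold the whole data set. Only the partial sums and counts travel back to be combined.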
The role of statistics
Merriam-Webster defines statistics as “a branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data.” Statistics is a key concept at the heart of big data and data analytics. The idea is to use a subset of the overall universe of data to draw conclusions about that universe of data. For example, you might collect measurements of the weight of some people in an attempt to estimate the average weight of all people. As we will see, this seemingly simple example becomes much more complex depending on the characteristics of the data.
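The weight example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the "universe" is 100,000 simulated weights, not real measurements, and `estimate_mean` is a hypothetical helper name. The sketch draws a simple random sample and uses its average as an estimate of the average for the whole group.

```python
import random
import statistics

# Hypothetical "universe": simulated weights (kg) for 100,000 people.
rng = random.Random(42)
population = [rng.gauss(75, 15) for _ in range(100_000)]


def estimate_mean(data, sample_size, rng):
    """Estimate the mean of data from a simple random sample."""
    sample = rng.sample(data, sample_size)
    return statistics.mean(sample)


true_mean = statistics.mean(population)
estimate = estimate_mean(population, 1_000, rng)
print(f"population mean: {true_mean:.1f} kg, sample estimate: {estimate:.1f} kg")
```

In practice the full population is rarely available for comparison, and a sample that is not drawn at random can give a misleading estimate.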
The other element of the big data definition focuses on the characteristics of the data. These characteristics will determine whether and how you can use certain data in a data analytics project. NIST lists four primary characteristics — volume, velocity, variety and variability — to which Cary, North Carolina-based data analytics company SAS adds a fifth — veracity.
- Volume: This refers to the number of data points, and it can be both a blessing and a curse. Access to seemingly infinite data points can mean the ability to identify consumer characteristics and likely behaviors with an incredible degree of accuracy, but the architecture costs of processing those data increase with volume.
- Velocity: Closely related to volume, velocity considers the rate at which data are or can be collected.
- Variety: Data come in a variety of forms. For example, consumer preferences can be measured by reviewing consumer surveys, tracking online searches or tracking purchase decisions.
- Variability: Variability refers to changes in data over time. These changes can include the rate at which data are flowing, the format of the data or the data themselves.
- Veracity: Data from different sources can be more or less reliable. For example, consumers might report preferences that don’t match their purchase behavior in practice. Sales data from some sources might be incorrect or even intentionally misstated. Veracity refers to the quality of the data.
What is data analytics?
Data analytics literally means analyzing the available data. It can take several forms, depending on the questions being addressed:
- Descriptive: What happened?
- Diagnostic: Why did it happen?
- Predictive: What will happen?
- Prescriptive: What should be done?
These types of data analysis increase in complexity from descriptive to prescriptive, with prescriptive being the ultimate goal of businesses. A descriptive data analytics question might be, “How much money does the average California consumer spend per year on automobile maintenance?” The answer can be obtained relatively easily given sufficient data on spending habits, and that answer is useful to the extent that it helps determine the size of the available market, but it isn’t necessarily enough by itself to aid in decision-making.
The related diagnostic question might be, “Why are consumers spending this amount of money on automobile maintenance? Why not more or less?” This requires looking more closely at the data and making some connections, such as the spending habits of those in certain demographic groups, income brackets or geographic regions; changes in spending habits given broader economic or seasonal conditions; or spending habits on certain types of automobile maintenance or spending with certain companies versus others. The predictive follow-up would be, “What are the spending habits of California consumers with respect to automobile maintenance likely to look like over the next five years?”
Finally, the prescriptive question: “Given what we believe to be the current and future state of the market, what should we do?” In our hypothetical, a company in the automobile maintenance business in California faced with this question might decide to invest more heavily in a specific area of the industry, more directly target certain market segments, benchmark and follow the best practices of specific competitors, or even leave the industry altogether.
Does data analytics require big data?
Armed with a firm understanding of big data and the goals of data analytics, we can ask the question, “Does data analytics require big data?” Let’s look again at the NIST definition of big data with a couple of key components of the definition emphasized this time: “Big data consists of extensive data sets — primarily in the characteristics of volume, variety, velocity and/or variability — that require a scalable architecture for efficient storage, manipulation and analysis.”
What distinguishes big data from a more generalized concept of a data sample is the size and the complexity of managing all that data. Big data, by definition, consists of massive amounts of data that require massive computing capacity to use effectively.
Does a business require big data to effectively conduct data analysis? Not necessarily, but it depends on the analysis. Statistical estimates generally become more precise as the size of the data set increases, and complex questions become easier to answer accurately as the underlying statistical conclusions become more accurate. So, a company looking at the defect rate of its mattress manufacturing process over the past three years probably doesn’t need to leverage big data; however, a company that wants to conduct a prescriptive analysis of a business plan for entering the Southeast Asian market for high-end mattresses over a five-year period might need big data to ensure its estimates and predictions are sufficiently accurate to warrant its planned course of action.
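A rough way to see how precision depends on the size of the data set is the textbook margin of error for an observed rate, such as a defect rate. The sketch below is an approximation under stated assumptions: it uses the normal approximation, assumes the observations are independent and randomly sampled, and the 2% defect rate is a hypothetical figure, not one from any real manufacturer.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed rate p based on
    n independent observations (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical 2% defect rate measured on data sets of increasing size.
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}: 2% plus or minus {margin_of_error(0.02, n) * 100:.3f} points")
```

Under these assumptions, the margin of error shrinks with the square root of the number of observations, so 100 times more data gives an estimate only 10 times tighter. That is one reason a narrowly scoped question can often be answered well without big data.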
Common data analytics pitfalls
Companies looking to leverage data analytics face a number of potential pitfalls. There are many ways a company can spend huge amounts of time, money and other resources on a data analytics project with little or no benefit or, even worse, end up making a bad decision. Here are a few common mistakes.
- No clear objective: Companies often see terms like big data and data analytics as a silver bullet or miracle cure and think, “If we just have the data, the answer will become clear.” Pulling in billions of data points and putting them through some black box that spits out valuable insights isn’t how data analytics works. Companies that go into the process without a clear objective are likely to spend significant resources with no clear benefit.
- Taking too big a bite: Almost as bad as not having an objective at all is having an overly ambitious one. Companies need to consider their resources and capabilities before defining their data analytics goals. A company with a $5,000 budget and a single full-time employee allocated to a data analytics pilot project realistically can’t expect to predict the trends in consumer preferences in the United States next year. But it could possibly succeed in identifying the most and least productive work hours in one of its factories.
- Basing decisions on faulty data: As discussed above, the ultimate goal of data analytics is to determine a proper course of action given the current and expected future state of affairs. If decisions on the proper course of action are based on an incorrect assessment of the current and future state of affairs, those decisions could end up being disastrous. This is why prescriptive analytics requires investing in the necessary resources to ensure accurate descriptive, diagnostic and predictive analyses.
Structuring an effective data analytics project
Knowing what can go wrong, let’s think about how to make a data analytics project succeed and consider some of the steps to structure an effective data analytics project.
- Set clear objectives. We discussed this in the pitfall section. Companies looking to leverage data analytics need to have a clear idea of what they hope to get out of their initiative. This includes whether the analysis is descriptive, diagnostic, predictive or prescriptive.
- Formulate clear questions. Whether estimating the current state of the market or determining where to position the company over the next year, it’s important to develop clear foundational questions. For example, to know why consumers favor a certain mattress brand, companies first need to find out which brands they prefer, what the characteristics of those brands are, what qualities consumers value over others, etc.
- Develop a strategy to answer those questions. How do you answer a question about consumer preferences? It could include conducting surveys, analyzing online search data and purchase decisions, speaking with consumer experts, or a combination of these and other strategies.
- Collect data. The data collection step is essentially the execution of the strategies identified in the previous step: conducting surveys, collecting data on purchases or online searches, etc.
- Analyze. Depending on the type of data analysis being performed, this can be an extremely complex step. If the analysis is simply descriptive, it may be as simple as doing a count or calculating an average. An analysis that draws conclusions about relationships in the data or makes predictions about the future is considerably more involved.
- Iterate. As with anything, it’s unlikely you’ll nail your data analysis on the first attempt. Data sources might lack veracity or sufficient volume, conclusions might be based on inaccurate connections between data, etc. But by repeating the process, learning from mistakes and making the necessary adjustments, companies can gain significant competence in data analytics over time.
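For the simplest case in the steps above, a descriptive analysis that amounts to a count and an average, the analyze step might look something like the following Python sketch. The survey records, region names and the `describe_by_group` helper are all hypothetical, invented purely to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records: (region, annual maintenance spend in dollars).
records = [
    ("North", 820.0),
    ("North", 640.0),
    ("South", 910.0),
    ("South", 1150.0),
    ("South", 700.0),
]


def describe_by_group(rows):
    """Return the count and average of the values recorded for each group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        group: {"count": len(values), "average": mean(values)}
        for group, values in groups.items()
    }


summary = describe_by_group(records)
for region, stats in summary.items():
    print(region, stats["count"], round(stats["average"], 2))
```

Diagnostic, predictive and prescriptive analyses build on summaries like this one, but they need considerably more data and more careful modeling than a sketch this size can show.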
Businesses have various resources at their disposal, which they leverage to generate profit. These resources can include tangible assets like employees, raw materials, buildings and equipment. They also can include intangibles like intellectual property and data. Data can be a valuable asset that many companies are just starting to understand.
Big data and data analytics might sound like complex concepts that are out of reach for all but the most sophisticated companies, but, in practice, they are only as complex as you choose to make them. If you understand these concepts, it’s possible to narrowly and precisely define a data analytics goal and achieve meaningful and powerful results. The key is to not get overwhelmed by the amount of data or be overly ambitious with perceived capabilities and expected results, and to clearly map out the data analytics project plan.
Legal and Regulatory Concerns Over Data Collection
Data analytics can be complex enough in its own right, but companies also need to be aware of various legal and regulatory issues that pertain to data collection and use. We’ll briefly mention a few and encourage companies to speak with legal counsel about these and any other legal and regulatory concerns.
- Data privacy laws: Governments around the globe, including half of U.S. states, are increasingly concerned with when and how data on people is collected and what is done with it. Two major examples are the European Union’s General Data Protection Regulation and the California Consumer Privacy Act. Both laws have teeth beyond the borders of the EU and California, respectively: The GDPR can apply to organizations outside the EU that process the personal data of people located in the EU, and the CCPA is enforceable against companies doing business in California that meet its thresholds, wherever those companies are based.
- Data security: Even if companies can collect and use consumer data, there are requirements in many states intended to ensure that data are safe from theft and unintentional disclosure. This can include not only credit cards, but also home and email addresses, and other personally identifiable information.
- Intellectual property: Companies also need to be aware of laws concerning who owns the data they are using in a data analytics project. Is it the subject of the data? The collector and compiler? Who can sell it and use it? These questions are the subject of intellectual property rights in data management.
Not all data collection and analysis will raise legal and regulatory questions, but companies need to be aware of the types of issues that can arise and expose them to potential liability. Again, companies should seek legal advice to address any concerns in this area.