Big Data the Fuel of Progress
By Alan C. Brawn CTS, ISF-C, DSCE, DSDE, DCME, DSSP
Big data literally envelops us. It permeates every element of our personal and professional lives. We are surrounded by it, some might even say drowning in it, and some may ask to what purpose. We will attempt to clear that up: we will define, explore, and characterize big data and discuss how we reformat it into information we can use. Of course, it all begins with understanding exactly what this phenomenon called big data is.
There are as many definitions of “big data” as there are businesses that want to collect it and benefit from it. As a term, big data describes the sheer magnitude and availability of a seemingly endless variety of data throughout our world. Ultimately, it refers to extremely large data sets. IBM maintains that businesses around the world generate nearly 2.5 quintillion bytes of data daily! Almost 90% of this global data has been produced in the last 2 years alone. A more formal definition comes from the National Institute of Standards and Technology (NIST), which defines big data as consisting of “extensive datasets—primarily in the characteristics of volume, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.” Now that we know that “big” accurately describes all this data, we need to look at where it comes from.
The reality is that it comes from everywhere. It is generated by everything we interact with and are connected to. It can even be created when we think we are not connected. Can we all say cameras and sensors? It comes from business transactions, loyalty programs, customer databases, medical and government records, internet transactions and clicks, mobile applications, social networks, research repositories, and machine-generated data and real-time data sensors used in Internet of Things (IoT) connectivity to name a few. The data may be left in its raw form or preprocessed using data mining tools or data preparation software, so it’s ready for analytics to make it usable.
Big data can be categorized in three basic types:
- Structured data is quantitative in nature and refers to a fixed format like a database or Excel spreadsheet. In this form it is easy to use, analyze, distribute, and repurpose as needed.
- Semi-structured data does not conform to the fixed rows and columns of a spreadsheet or a relational (SQL) database but contains some level of organization through semantic elements like tags. For instance, consider HTML, which does not restrict the amount of information you can collect in a document, but enforces a certain hierarchy or structure.
- Unstructured data is qualitative and lacks any specific form or structure. Email, social media, word processing, and video files are examples. The lack of structure makes it very difficult and time-consuming to process and analyze. Over 80% (and growing) of big data falls into this category.
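To make the three categories concrete, here is a minimal Python sketch. It is only an illustration, not a prescribed method: the sample records, the `TagCollector` class, and the variable names are all hypothetical, and it uses only the Python standard library. It shows why structured data is easy to total by field name, how semi-structured HTML exposes a tag hierarchy without a fixed schema, and how little a program can say about unstructured text without further analysis.

```python
import csv
import io
from html.parser import HTMLParser

# Structured: fixed columns, so every row can be read by field name.
# (Hypothetical sample data.)
structured = io.StringIO("customer_id,purchase_total\n101,25.50\n102,14.00\n")
rows = list(csv.DictReader(structured))
total_sales = sum(float(row["purchase_total"]) for row in rows)


# Semi-structured: tags provide a hierarchy, but the content is open-ended.
class TagCollector(HTMLParser):
    """Records each opening tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed("<html><body><h1>Sale</h1><p>Today only</p></body></html>")
semi_structured_tags = collector.tags

# Unstructured: free text with no schema. A naive word count is about all
# we can compute without more sophisticated analysis.
unstructured = "Loved the store, but the checkout line was far too long."
word_count = len(unstructured.split())

print(total_sales, semi_structured_tags, word_count)
```

The structured total takes one line of code, while judging whether the free-text comment is praise or a complaint would require language-processing tools well beyond this sketch.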
Let’s explore the characteristics of big data. In 2001, industry analyst Doug Laney defined the characteristics of big data, labeling them the “Three Vs”: volume, velocity, and variety.
- Volume: The unprecedented explosion of data gathering means that the digital universe will reach 180 zettabytes (180 followed by 21 zeroes, counted in bytes) by 2025. Every day, humans produce 2.5 quintillion bytes of data. The challenge is not so much the amount but what to do with it.
- Velocity: Data is generated at an ever-accelerating pace. Every day, Google receives over 3.5 billion search queries. Globally, as of 2019, a staggering 293.6 billion emails were sent each day, and there are now over 4 billion email users worldwide. The challenge for data scientists is to find efficient ways to collect and process all this data for specific uses.
- Variety: Data comes in different forms, as noted previously. It might be structured, semi-structured, or increasingly unstructured.
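The scale of the volume and velocity figures above is easier to grasp with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: it uses decimal (SI) units, where a zettabyte is 10^21 bytes and an exabyte is 10^18 bytes, and it reads "quintillion" in the short-scale sense of 10^18. The variable names are hypothetical.

```python
# Assumption: decimal (SI) units and short-scale "quintillion" (10**18).
ZETTABYTE = 10**21  # bytes
EXABYTE = 10**18    # bytes
SECONDS_PER_DAY = 24 * 60 * 60

# Volume: 180 zettabytes written out in bytes.
projected_bytes = 180 * ZETTABYTE

# Volume: 2.5 quintillion bytes per day, expressed in exabytes.
daily_bytes = 25 * 10**17
daily_exabytes = daily_bytes / EXABYTE

# Velocity: the daily totals quoted above, converted to per-second averages.
searches_per_second = 3_500_000_000 / SECONDS_PER_DAY
emails_per_second = 293_600_000_000 / SECONDS_PER_DAY

print(f"{projected_bytes:,} bytes projected")
print(f"{daily_exabytes} exabytes produced per day")
print(f"about {searches_per_second:,.0f} searches per second")
print(f"about {emails_per_second:,.0f} emails per second")
```

Averaged over a day, the quoted figures work out to roughly 40,000 searches and about 3.4 million emails every second, which is the pace that collection and processing systems have to keep up with.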
As you can imagine, a lot has happened since 2001 and the original Three Vs of big data. Data scientists have since added several more characteristics.
- Veracity: This refers to the quality of the collected data. Think garbage in and garbage out. One expert noted, “As the world moves toward automated decision-making, where computers make choices instead of humans, it becomes imperative that organizations be able to trust the quality of the data”.
- Variability: Data’s meaning is constantly changing. One example is language processing by computers. It is complicated because words often have several meanings. Data scientists must account for this variability by creating sophisticated programs that understand context and meaning in all the variations possible.
- Visualization: Data must be understandable to nontechnical consumers of data. Visualization is the creation of complex graphs that tell the tale, “transforming the data into information, information into insight, insight into knowledge, and knowledge into advantage”.
- Vulnerability: This is all about security: protecting big data while keeping it accessible to the appropriate people.
- Volatility: How long does big data need to be kept?
- Value: While considering the cost to collect and assess your big data, after addressing all the other characteristics, you want to be sure your organization is getting value from it.
Companies can use the accumulation of big data in several ways. It can be used to improve operations, lower operating costs, and assess weaknesses and strengths in various areas. In a sales environment, it can improve customer service or create personalized marketing campaigns based on specific customer preferences. No matter the application, it can lead to increased profitability.
Businesses that effectively gather big data and put it to profitable use have a competitive advantage over those that do not. They will be able to make faster and more informed business decisions. One McKinsey analyst noted, “Buried deep within this data are immense opportunities for organizations that have the talent and technology to transform their vast stores of data into actionable insight, improved decision making, and competitive advantage”.
One thing we know for sure: big data is here to stay, and it’s getting exponentially bigger minute by minute. Organizations need to understand and assess what big data means to them and what it can do for them. In digital signage, we need to facilitate those discussions with our clients to see and show where we fit. The possibilities and opportunities are truly endless.