Data Jargon Buster
Soon, data science will become just another skill that we’re all at least somewhat literate in. One major barrier, however, is the jargon used by data experts, which makes the topic inaccessible to the average person. To break that barrier down, we need to start getting rid of jargon – or at least master it.
Only 33% of full-time employees in the U.S. are confident in their data literacy – and it’s not hard to see why. Understanding data involves learning a whole new language of data-related jargon, which creates a real barrier to entry. However, according to Jeff Catlin, CEO of Lexalytics, “data science will become just another skill we’re all at least somewhat literate in.”
As AI becomes more ubiquitous, and the tools become more user-friendly, Catlin predicts that barriers to entry will go down: “Just as computers have become more widely available and easier to use, so too will AI.”
Exposure to data has been shown to improve data literacy. Organisations that succeed in establishing a data-driven culture begin to see less-technical employees work with and benefit from data – and there’s a real incentive to encourage this. For a typical Fortune 1000 company, improving data accessibility by just 10% could result in more than $65 million of additional net income.
Simply put, it’s a no-brainer: we need to empower everyone to get comfortable with data. And that includes getting rid of jargon.
Below, we’ve collated some examples of the most commonly used data jargon.
Our Top Data Jargon Busters
Data science: Data science is an interdisciplinary field that uses data to understand and influence the behaviour of people, systems and environments. By applying scientific methods, processes, algorithms and systems, data scientists extract knowledge and insights from large amounts of complex data. In other words, it’s a whole realm of science geared specifically towards providing meaningful information and interpreting data for the purpose of decision making. Simple, right?
Artificial Intelligence (AI): You may already have heard of AI, which is a popular subject for science fiction films. In a nutshell, AI is a specialist form of computer science, focused on giving computers capabilities that imitate aspects of human intelligence. This can include everything from pattern recognition to computer vision and reasoning.
Machine learning: This is a type of statistical method that enables a machine to learn on its own, without being explicitly programmed for every task. It relies on a machine’s ability to adapt: using algorithms, models actively learn and improve each time they process new data. They do this by classifying different inputs in order to make predictions about future behaviour, based on having been trained with similar previous data. This can be applied to anything from spam detection to sales forecasting, as well as product recommendations.
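The idea of “learning from labelled examples” can be sketched in a few lines of Python. This is a toy illustration, not a real spam filter: the messages, labels and scoring rule are all invented for the example. The “training” step simply counts which words appeared in messages a human previously labelled as spam, and new messages are flagged when their words match the spam examples more closely.

```python
from collections import Counter

# Toy "training data": messages previously labelled by hand (invented examples)
training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to friday", "ham"),
    ("lunch with the team today", "ham"),
]

# "Training": count how often each word appears in spam vs. non-spam messages
spam_words, ham_words = Counter(), Counter()
for message, label in training:
    counter = spam_words if label == "spam" else ham_words
    counter.update(message.split())

def predict(message):
    """Label a new message by which class its words were seen in more often."""
    spam_score = sum(spam_words[w] for w in message.split())
    ham_score = sum(ham_words[w] for w in message.split())
    return "spam" if spam_score > ham_score else "ham"

print(predict("free prize inside"))   # "free" and "prize" were seen in spam
print(predict("team meeting today"))  # these words were seen in normal mail
```

The key point is that nobody wrote a rule saying “free prize” is spam: the model inferred it from previously labelled data, which is what distinguishes machine learning from hand-coded logic.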
Deep learning: This is a specialised machine learning technique based on “artificial neural networks”, which enables machines to train themselves. To do this, they need access to sufficiently large amounts of example data. Deep learning techniques vary, but applications include image and speech recognition, language translation, and autonomous vehicles.
Analytics: Simply put, analytics is a form of storytelling. It’s the process of drawing conclusions based on raw information. Through analysis, data and numbers can be transformed into something useful.
There are three main types of analytics in data:
Descriptive analytics: This involves condensing big numbers into smaller pieces of information. It’s sort of like a summary: rather than listing every single number and detail, descriptive analytics provide a general narrative.
Predictive analytics: Using data mining, machine learning and statistics, predictive analytics allow analysts to make predictions about the future by studying recent and historical data. Of course, this process is not 100% accurate, but it can provide insight into what will most likely happen next.
Prescriptive analytics: Finally, having a solid prediction for the future, analysts can prescribe a course of action. This turns data into action and leads to real-world decisions.
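The three flavours of analytics above can be illustrated with a toy sales series in Python, using only the standard library. The figures and the reorder rule are made up for the example, and the “prediction” is deliberately naive (it just extends the most recent month-on-month change):

```python
import statistics

# Invented monthly sales figures (units sold)
sales = [100, 110, 125, 130, 145]

# Descriptive: summarise what happened
average = statistics.mean(sales)       # condense the numbers into one summary
print(f"Average monthly sales: {average}")

# Predictive: naively extrapolate the recent trend to next month
recent_growth = sales[-1] - sales[-2]
forecast = sales[-1] + recent_growth
print(f"Forecast for next month: {forecast}")

# Prescriptive: turn the prediction into a recommended action
action = "increase stock" if forecast > max(sales) else "hold stock"
print(f"Recommended action: {action}")
```

The progression mirrors the definitions: first summarise the past, then estimate the future, then recommend what to do about it.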
Autonomous things: “Autonomous things” use AI to perform tasks traditionally done by humans. Whether it’s robotics, cars, drones, or appliances, all autonomous things use AI to interact with their environments. The sophistication of these systems varies: they span everything from a drone flown with human assistance to a farming robot operating completely autonomously in a field. Companies like Microsoft and Uber are already using AI-driven robots to patrol parking lots and large outdoor areas to predict and prevent crime.
Big Data: Succinctly put, “Big Data” describes data sets so voluminous and complex that traditional data processing software is inadequate to deal with them. Because the data comes from lots of different sources, and isn’t always consistent or structured, it can be very hard to work with – which is why data scientists are so valuable.
Most large web applications have data in the tens of gigabytes. Big Data, however, ranges from hundreds of gigabytes to terabytes or even petabytes. For reference, one petabyte is equal to 1,000,000,000,000,000 bytes. To visualise the scale, Gizmodo described one petabyte as 20 million four-drawer filing cabinets filled with text – and 20 petabytes as all the written works of humankind, from the beginning of recorded history, in every language.
Database: A database is an organized collection of data. It may include charts, schemas or tables. It may also be integrated into a Database Management System (DBMS), a software that allows data to be explored and analyzed.
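You can see a database and a DBMS in action with Python’s built-in sqlite3 module, which bundles a small database engine. This is a minimal sketch – the table and values are invented for the example – that defines a schema, stores some rows, and queries them back:

```python
import sqlite3

# An in-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")

# The schema describes how the data is organised
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", "Marketing"), ("Ben", "Data"), ("Chloe", "Data")],
)

# The DBMS lets us explore the data with queries
rows = conn.execute(
    "SELECT name FROM employees WHERE department = ?", ("Data",)
).fetchall()
print(rows)   # [('Ben',), ('Chloe',)]
conn.close()
```

The query language (here, SQL) is what turns a pile of stored records into something that can be explored and analysed.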
Data mining: Rather than mining for natural resources, data mining explores large sets of data to find patterns and insight. This is a highly analytical process that emphasizes making use of large datasets, usually involving artificial intelligence, machine learning or statistics. Importantly, the data mined can be used to predict future trends.
IoT: The “Internet of Things” generally describes the way products are able to “talk” to each other. It is a network of objects (for example, your phone, smartwatch or car) embedded with network connectivity. Driverless cars are a perfect example: they are constantly pulling information from the cloud, while their sensors relay information back. The IoT generates huge amounts of data, making it both important and popular in data science. There is also the IoE (the “Internet of Everything”), which combines products, people and processes to generate even more connectivity.
Parties: Data is classed as either 1st party, 2nd party, or 3rd party data.
1st party data is the most valuable: it’s the information collected offline (e.g. surveys) or online (e.g. cookie-based, web analytics, client CRM). It belongs to whoever collects it – often, this will be advertisers, who can collect data (with the consumer’s notification and consent) through Data Management Platforms.
2nd party data is, essentially, someone else’s 1st party data. It’s information collected offline or online by a company that gathers data about the consumer (with their consent), and with whom you can have a partnership agreement (e.g. Facebook data).
3rd party data is the broadest type of data. It encompasses data collected by data providers other than the website owner (e.g. eXelate or BlueKai).
Personal Data: The EU’s data privacy law – the General Data Protection Regulation (GDPR) – defines personal data as any information relating to an identified or identifiable person. An identifiable person is one who can be identified, directly or indirectly, by reference to an identifier such as a name, an identification number, location data, or an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person. In practice, this means any data that could identify a specific individual, e.g. names, email addresses, device identifiers, Twitter handles, location data and photographs – but it could also be cookie data, IP addresses, and other unique reference numbers.
Ready to improve data literacy in your organisation?
At AVADO’s Data Academy, we work to develop community programmes, so data literacy becomes top-of-mind and seeps into daily consciousness.
Interested in learning more about how the Data Academy can help your organisation and about our free levy-funded programmes? Give us a call on +44 (0)20 3893 5401 or email us at email@example.com.