Data science deals with the flood of new information that has only recently become available, largely due to advances in technology, and that cannot be processed using traditional techniques. Some of it is “big” and so requires new tools.
Examples include location-tracking technology in phones and other smart devices, which can provide insight into the movements of large groups of people. Other examples are remotely sensed images that, when paired with machine learning, can give detailed information about natural resources, our environment and infrastructure.
Netflix has access to the viewing activity of over 180 million subscribers around the world, while Amazon knows the online browsing and purchasing habits of the nearly 200 million people who visit the site every month.
Data science, then, means using statistics and computer science, among other skills, to collect, clean, organize, store and analyze that information in order to make evidence-based business decisions.
Netflix uses its store of information not just to recommend what you should watch next, but also to decide which films and shows it should invest in. And you already know how your activity on Amazon follows you around the web via personalized ads.
“Data science is at the intersection of computer science and statistics, with a strong focus on problem-solving, and data science skills open doors to almost any field, such as applications in the physical processes and the environment, additive manufacturing, health care, economics and e-commerce,” said Applied Mathematics and Statistics Professor Doug Nychka, co-director of the Data Science graduate program at Colorado School of Mines.
Data Science vs. Data Analytics, Data Engineering, Machine Learning
As with any relatively new field, there’s some confusion about the various terms being thrown around. Perhaps the most common question has to do with the difference between data science and data analytics.
In a nutshell, data science has a much larger scope, encompassing every step of the process of turning unstructured information into actionable intelligence.
Data analytics, on the other hand, is the subset of data science that involves processing and studying information that has usually already been organized, in order to solve a problem that is already known. This discipline requires database and programming skills and mostly uses statistical tools to arrive at conclusions.
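To make that concrete, here is a minimal sketch of an analytics-style task in Python: the data is already organized, the question is already known (what is the average order value in each region?), and a basic statistical tool answers it. The `orders` records and the `average_by_group` helper are hypothetical, invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, already-organized records: (region, order value in dollars).
orders = [
    ("west", 20.0),
    ("west", 30.0),
    ("east", 10.0),
    ("east", 50.0),
    ("east", 30.0),
]

def average_by_group(rows):
    """Return the mean value for each group label."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}

averages = average_by_group(orders)
print(averages)  # {'west': 25.0, 'east': 30.0}
```

In practice the records would likely come from a database query rather than a hard-coded list, which is where the database skills mentioned above come in.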
Data engineering, which might also be known as data infrastructure or data architecture, is the creation of the processes for collecting, generating, storing and manipulating information. This requires a strong background in programming and information technology.
Machine learning is just one of the techniques data scientists use in their work. Machine learning algorithms comb through vast amounts of information and find patterns in the data using statistical techniques.
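As a rough illustration of what “finding patterns” can mean, the sketch below uses k-means clustering, one simple machine learning technique among many, to split a list of numbers into groups without being told where the groups are. The viewing-hours figures, the starting centers and the `kmeans_1d` function are all made up for this example, and real systems work with far larger and messier data.

```python
from statistics import mean

def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers (basic k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # Centers stopped moving, so the grouping is stable.
        centers = new_centers
    return centers, clusters

# Hypothetical weekly viewing hours for six subscribers.
hours = [1.0, 2.0, 3.0, 20.0, 21.0, 22.0]
centers, clusters = kmeans_1d(hours, centers=[0.0, 10.0])
print(centers)   # [2.0, 21.0]
print(clusters)  # [[1.0, 2.0, 3.0], [20.0, 21.0, 22.0]]
```

Here the algorithm separates light viewers from heavy viewers on its own, which is the kind of pattern a data scientist might then investigate further.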