In the October 2012 edition of the Harvard Business Review, Thomas H Davenport and DJ Patil co-authored an article titled "Data Scientist: The Sexiest Job of the 21st Century". The article is just one example of the hype surrounding data science in recent times. However, data science is still a topic of much speculation and conjecture around the world, let alone in Bangladesh.
'Big data', 'machine learning', and 'analytics' are terms that are thrown around very often in conversations about data science. This article aims to shed some light on these topics and to provide a basic idea of the data science value chain and the various roles within the field.
In its most basic sense, data science is the analysis of diverse data to obtain valuable insights and actionable steps. In today's digital age, vast volumes of data are generated every second, most of which is unstructured. This is known as big data. Data scientists must make sense of this enormous amount of data before it can be used for analysis. By its very nature, data science can be thought of as an intersection of programming skills, mathematical and statistical skills, and domain knowledge. Looking at the value chain will clarify why and where each skill is required.
The first step is the planning phase. When a new data science project is initiated, clear goals must be defined, i.e. what questions need to be answered or what needs to be accomplished. The second step is data preparation. This stage includes deciding on possible sources of data and collecting the data. The data needs to be cleaned so that it can be used in the software intended for the analysis. Once cleaned, the data must be further refined, i.e. by choosing which variables to work with. The third step is modelling. The data is used to create models, which must be validated, or checked to see whether they accurately measure what they are supposed to. Once that is done, the models must be evaluated for accuracy. The final step is the execution of the models, which involves visualising the data and models and deploying them where they will be used. Programming skills are needed for data preparation and execution. Mathematical and statistical skills are necessary for modelling the data, and domain knowledge is required for the planning phase. Of course, there are overlaps, but on a fundamental level, that is how the skills can be divided.
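For readers who are curious what these steps look like in practice, the following is a minimal sketch in Python. It is only an illustration under stated assumptions: the figures are invented, the variable names (advertising spend and sales) are hypothetical, and a simple straight-line fit stands in for whatever model a real project would use.

```python
import statistics

# Planning: the (hypothetical) question is "how do sales change with advertising spend?"
# Invented records of (advertising spend, sales); None marks a missing value.
raw = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, 8.1),
       (5.0, 9.8), (6.0, None), (7.0, 14.2), (8.0, 15.9)]


def prepare(records):
    """Data preparation: drop rows that have a missing value."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(rows):
    """Modelling: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Evaluation: average absolute gap between prediction and actual value."""
    slope, intercept = model
    errors = [abs(y - (slope * x + intercept)) for x, y in rows]
    return statistics.fmean(errors)


clean = prepare(raw)
train, held_out = clean[:5], clean[5:]   # keep some rows aside for validation
model = fit_line(train)
error = mean_abs_error(model, held_out)

# Execution: report the result (a real project would visualise and deploy it).
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, held-out error={error:.2f}")
```

Even in this toy version the division of skills is visible: framing the question takes domain knowledge, the cleaning and reporting are programming tasks, and the fitting and error measurement rest on statistics.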
Considering the diversity of skills required for data science, it is no surprise that a single person cannot do it alone. As such, there are many roles when it comes to data science. People often have the misconception that data science is only for computer science graduates or statisticians; however, that is not at all true. People of diverse experience and skills may work together on data science teams. The various roles are as follows:
Engineers/developers: These people are responsible for collecting, cleaning and maintaining the data and data warehouses.
Big data specialists: These are professionals with in-depth knowledge of computer science and statistics. One of their primary responsibilities is to ensure that the models created can learn from experience and improve by themselves without human intervention, an approach known as machine learning.
Researchers: Researchers have expertise in the statistical elements of data science and are required to have domain knowledge as well. This helps them to build robust models that will help achieve the objective or answer the specific question being explored.
Analysts: Analysts use the data on a day-to-day basis. They often engage in activities such as data visualisation, or use database software to run queries that answer specific questions they may have. They draw inferences based on the patterns and trends found after analysing the data, a practice known as analytics.
Why consider data science as a career option? In the US alone, a shortfall of 140,000 people with deep analytical skills has been projected. As far as salaries are concerned, data scientists have the third highest median salaries in the US. However, what does that mean for people here in Bangladesh? Currently, demand for data science professionals is limited, as very few companies have business intelligence or analytics teams.
Data scientists are proving to be very valuable in tech companies in Bangladesh, especially the telecommunication companies, as they generate vast volumes of data that need to be analysed to make informed decisions and take actionable steps. It is not just tech companies that can benefit, though. Any organisation that generates huge amounts of data can benefit from analysing it. As time goes by, more and more companies are going to see the need for proper management and analysis of data to identify trends and patterns that they otherwise could not.
The demand in Bangladesh for data scientists may be limited right now, but worldwide, there is a shortage of skilled data science professionals.
The writer is a strategic assistant partner at Banglalink Digital Communications Limited.
He can be reached at shahriar127@gmail.com