
Discussing our Data Science Master's Part 01: Overall structure

by Erik Štrumbelj on Oct 19 2018

What is a Data Scientist anyway?

One would have a very difficult time trying to find a generally accepted definition of Data Science. My personal view is that data science is just a new name for a skill-set that has probably been around for as long as we have had science. Today, it might be way more cool to be called a Data Scientist than an applied statistician, analyst, quant or just researcher in a particular domain. Tomorrow, Data Science will probably be replaced with another term. But the core skills remain the same: analytical thinking, attention to detail, communication and domain-specific knowledge. Mathematics and computation. Science.

Without a doubt, this skill-set has been evolving and will continue to evolve. For example, 100 years ago it was very important to understand mathematics to a level where one could do one's own derivations in very complicated cases. Not to mention looking up things such as logarithms and quantiles in tables. Today, most of the heavy lifting is done with computers. And, luckily for me, we don’t have to write reports and draw plots by hand anymore.

On the other hand, progress also brings new challenges. A hundred years ago, big data was something that you couldn’t fit on a piece of paper. Today, there is so much data that we need at least a basic understanding of data management. Better tools also bring higher standards in software development, reproducible research, visualization, etc.

Like the skill-set itself, the demand for it has also always been present. And growing. To the point where we now live in an increasingly data-driven world: finance, marketing, sports training, politics, HR and medicine, to name just a few, rely on data. Today, the practical reality is that there is a big shortage of people with such skills (or even just some of them). So, educational institutions are now faced with the question of how to best prepare students for a career in this data-driven world.

The Master’s program we’ve prepared for you is our attempt at an answer. In this post, I’ll try to explain some of the choices we made in designing this program and why we made them.

Not everyone can be a professional basketball player

Or a professional Data Scientist.

Most Master’s programs in Data Science that are available today are either 1- or 2-year programs aimed at working professionals or 2-year programs aimed at a broad spectrum of student backgrounds (mathematics, engineering, economics, life sciences, social sciences, medicine). The shortage of Data Scientists (and popularity of Data Science) in today’s market is such that both of these make perfect sense.

Our program, however, belongs to the minority of programs aimed at producing more complete and versatile Data Scientists who will hopefully someday lead Data Science departments or replace us as better researchers and teachers. These are professionals who should not only be able to use tools to do data science, but also have a deeper understanding of the underlying concepts and the ability to generate new ideas and create new methods.

This choice comes at a cost. In order to achieve that level, students need both a better starting point and to cover more ground in the 2 years of our Master’s program. In practice, this means being more selective in terms of their undergraduate background and enrolling a relatively small number of students.

The core knowledge and skills

Most of the core courses focus on mathematical and analytical aspects of data science. Why? Because we firmly believe that these are key to training analytical thinking and developing a deeper understanding. And these are also the areas where one can benefit the most from an academic treatment of the material.

Two core courses are purely mathematical - their goal is to familiarize the student with graduate-level calculus, matrix/tensor algebra, and optimization. One core course is dedicated to probability and statistics, again at a graduate level - students learn how to use probability to talk about uncertainty and how to use statistical approaches to reason with uncertainty. The fourth core course introduces the students to machine learning, building on the solid mathematical/statistical foundation from the first semester and adding modern computational approaches to data analysis and optimization.

Concepts are learned but skills need to be practiced and internalized. R/Python programming, databases, version control, critical thinking, communication, reporting, visualization, and other skills are like breathing to the modern Data Scientist. The Introduction to data science course provides an overview, but there is no specific course dedicated to any of them. Instead, the core courses are synchronized and all of them implicitly require the student to practice and demonstrate these skills.

Specializing in a particular area of data science

There are many things one can do as a Data Scientist. Therefore, we believe that once the core skills have been acquired, it is very important for students to have some flexibility in choosing the areas they wish to specialize in. On the other hand, it would also be a terrible waste if we did not build on the core knowledge that the students acquire in the first year.

The Elective core courses are a compromise between the two. These are more demanding courses that rely on prerequisites and provide specialized knowledge in one of the areas that are integral to modern Data Science. Every student must take at least 2 of these courses, but they may take more if they want and are willing to put in more work. Currently, the program has 4 such courses (the second course in machine learning, a course in modern applied statistics, a course on working with large quantities of data and a course dedicated to deep learning).

The remainder of the curriculum is elective courses that are also accessible to other Computer science Master’s students. Due to the long tradition of AI & machine learning at the Faculty of computer and information science, we are able to offer a wide range of courses in the broad area of data science. For example, Artificial intelligence, Bioinformatics, Biometrics, Computer vision, Natural language processing, and Network analysis, to name a few. The list of courses will be updated every year. Students who want to learn more in a particular field of Computer Science can also pick a 4-course module from the list of non-data science related Computer science modules.

This allows for many different profiles, so students can adapt their coursework to better prepare themselves for a particular field or problem domain. We will revisit this topic in one of our future posts.

Putting things into practice

A Data Scientist’s education would not be complete without learning what Data Science looks like in practice and having the opportunity to apply newly acquired knowledge and skills to real-world problems. Each semester, there is at least one course dedicated to this. The Introduction to data science course in the first semester will feature guest lecturers from industry. In the second semester, students will work in teams on solving a real-world problem from industry or academia. And in the second year, each student will work on their Master’s thesis - a 4-course equivalent project in which they apply all that they have learned to a problem in the broader area of Data Science.

Closing words

Hopefully, this gives you a broad overview of our Master’s program and some of the reasons behind its design. In future posts, we’ll discuss individual components and courses in more detail.

Until next time.