Target group
Data Science Master's programme
Data Science Methods module
The course is available to students from other degree programmes 
Timing
First semester (Autumn)
Typically 2nd period 
Learning outcomes
Machine learning is the core technology behind recent developments in artificial intelligence (AI), and it is applied widely in many domains. This course provides the theoretical background you need to understand the fundamental machine learning concepts and to use the basic methods of supervised and unsupervised learning properly to solve real-life problems. The course prepares you for further studies in machine learning and introduces you to the methods and tools used to solve such problems in practice.
More specifically:
 You will have the necessary theoretical background to understand and explain the fundamental machine learning principles and concepts (e.g., training data, feature, model selection, loss function, training error, test error, overfitting). You recognise the various ingredients of a machine learning task (task, computational problems, models, algorithm etc.).
 You are able to map a practical data analysis problem into a machine learning task, take the correct steps to solve the task, and know how to interpret and evaluate the outcomes. You understand the underlying assumptions and limitations of the machine learning solution.
 You are familiar with the basic tools and programming environments suitable for solving machine learning problems, and you are able to carry out basic data analysis tasks independently in such programming environments.
 You understand the concept of generalisation, can use validation set methods, and you are able to evaluate the performance of machine learning methods and to do model selection.
 You know the principles of the following techniques and are able to apply them to real-world problems:
 supervised learning: basic regression methods (linear etc.), classification methods (at least one example of: linear, distance-based, generative, discriminative, and algorithmic).
 unsupervised learning: the most important clustering formalisms (k-means, hierarchical clustering) and the most important dimensionality reduction approaches (PCA, at least one distance-based, at least one manifold method).
 You can read machine learning literature (textbooks, scientific articles etc.) and you are prepared for further studies in machine learning or in other disciplines which need machine learning methods.
 You can explain and report your machine learning approaches and solutions to your peers and to your future colleagues in an understandable and coherent manner.
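As an illustration of the validation set approach named in the outcomes above, a minimal sketch follows. It is not course material: the synthetic one-dimensional data, the k-nearest-neighbour classifier, the split sizes, and the candidate values of k are all assumptions chosen for the example. The model is fitted on the training set, k is selected on the validation set, and the test set is used only once to estimate generalisation error.

```python
import random

def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > k else 0

def error_rate(train, data, k):
    """Fraction of points in data misclassified by k-NN fitted on train."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)

def make_data(n, rng):
    """Synthetic data (an assumption): label is 1 when x > 0, flipped with probability 0.2."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
data = make_data(300, rng)
# Hold-out split: training, validation, and test sets.
train, valid, test = data[:150], data[150:225], data[225:]

# Model selection: choose k by validation error (odd values avoid tied votes).
candidates = [1, 3, 5, 9, 15, 25]
valid_errors = {k: error_rate(train, valid, k) for k in candidates}
best_k = min(candidates, key=lambda k: valid_errors[k])

# With k = 1 every training point is its own nearest neighbour, so the
# training error is zero even though the labels are noisy: overfitting.
train_error_k1 = error_rate(train, train, 1)
test_error = error_rate(train, test, best_k)

print("validation errors:", valid_errors)
print("selected k:", best_k)
print("training error for k=1:", train_error_k1)
print("test error for selected k:", test_error)
```

The zero training error for k = 1 shows why training error alone cannot guide model selection.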

Implementation

Prior studies or prior knowledge
The students should have the following prerequisite knowledge, with examples of courses providing the necessary skills:
 Generic skills learned during BSc studies, including writing of academic reports.
 High school mathematics and university mathematics, including basics of optimization with differentiation. Courses: MAT11001 or FYS1010.
 Linear algebra, including basic matrix and vector operations, eigenvalues, and eigenvectors. Courses: MAT11002 or MAT11009 or FYS1012.
 Probability and statistics, including random variables, expectation, and rules of probability. Courses: MAT12003 or MAT11015 or FYS1014.
 Programming skills: some programming experience and the ability to quickly acquire the basics of a new environment such as R or Python. Courses: TKT10002 or FYS1013. Additionally, it is useful to know the basic ideas of pseudocode and the analysis of time and space complexity with big O notation.
The course has a short prerequisite knowledge test, available on the course web site, which contains a more detailed description of the required prerequisites and pointers to self-study materials. The courses Introduction to Data Science and Introduction to Artificial Intelligence are recommended but not required.
Recommended optional studies
Courses in the Machine Learning module. Courses in other degree programmes in which machine learning methods are applied. 
Content
The course includes the following content:
 Ingredients of machine learning: components (tasks, computational problems, algorithms etc.) and necessary tools.
 Introduction to statistical learning and probabilistic modelling.
 Supervised learning: basic definition, basic regression and classification algorithms (linear, probabilistic, distance-based models).
 Statistics and evaluation: estimating parameters and resampling methods (including validation set methods).
 Unsupervised learning: clustering methods (k-means, agglomerative clustering) and basics of dimensionality reduction (PCA and variants).
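To give a flavour of the clustering content listed above, here is a minimal sketch of k-means using Lloyd's algorithm in plain Python. It is an illustration only, not the course's reference implementation: the two synthetic Gaussian blobs, the random initialisation from data points, and the helper names are assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = rng.sample(points, k)  # initial centroids: k random data points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_point(members)
    return centroids, assignment

def within_cluster_ss(points, centroids, assignment):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))

rng = random.Random(1)
# Synthetic data (an assumption): two Gaussian blobs in the plane.
blob_a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
blob_b = [(rng.gauss(10.0, 1.0), rng.gauss(10.0, 1.0)) for _ in range(50)]
points = blob_a + blob_b

centroids, assignment = kmeans(points, 2, rng)
wcss = within_cluster_ss(points, centroids, assignment)
total_ss = within_cluster_ss(points, [mean_point(points)], [0] * len(points))

print("centroids:", centroids)
print("within-cluster sum of squares:", wcss)
print("total sum of squares:", total_ss)
```

The within-cluster sum of squares is the objective that k-means decreases at every step. Comparing it with the total sum of squares around the global mean indicates how much of the variation the clustering explains.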

Study materials and literature
 Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer, 2017.
 Additional readings are announced during the course.
The required parts of the textbook are specified on the course web page.
Activities and teaching methods in support of learning
The course consists of lectures, exercises, and a term project.

Assessment practices and criteria
Assessment and grading are based on the completed exercises and the term project. Any other criteria will be specified on the course web page.

Responsible person
Kai Puolamäki 
Keywords
Suitable for exchange students 
