Unsupervised Learning
Description
Unsupervised learning methods support insight into data by revealing structure that is relevant to a given goal. To do this efficiently when the data contain many variables, we focus on methods that cluster the data into subsets that share common features or whose observations are ‘close’ in terms of a well-chosen distance measure. Once the instructions are formalized, this can happen semi-automatically, without much prior knowledge of the detailed meaning of the individual variables that make up the data set. The term unsupervised learning covers clustering methods, dimensionality reduction algorithms and association methods, none of which aim to predict or estimate a jointly observed outcome variable.
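As a small illustration of why the choice of distance measure matters, the sketch below compares Euclidean and Manhattan distance on three made-up points. The points and the helper names (`euclidean`, `manhattan`, `nearest`) are assumptions chosen for this example and are not taken from the course material.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(p, q))


def nearest(origin, candidates, distance):
    """Return the name of the candidate closest to origin under the given distance."""
    return min(candidates, key=lambda name: distance(origin, candidates[name]))


# Illustrative points only.
a = (0.0, 0.0)
candidates = {"b": (3.0, 3.0), "c": (0.0, 5.0)}

nearest_euclidean = nearest(a, candidates, euclidean)
nearest_manhattan = nearest(a, candidates, manhattan)

print("closest to a (Euclidean):", nearest_euclidean)
print("closest to a (Manhattan):", nearest_manhattan)
```

The two measures disagree on which point is closest to `a`, so a clustering built on one measure can group observations differently from a clustering built on the other.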
In this course we will focus on clustering and dimensionality reduction methods. We start by reviewing different types of applications where unsupervised learning methods are called upon. We show how dimensionality reduction methods are used to simplify data sets in order to enhance data visualization, reduce noise and speed up computation, and we illustrate how clustering methods can reduce the complexity of large data sets to interpretable categories and segments.
The different approaches are then situated in the wide field of AI, machine learning and data analysis. We continue with the introduction of Principal Component Analysis, or PCA. PCA is often used as a pre-processing step before the application of other methods, including clustering methods. It is a linear dimensionality reduction technique that transforms existing, possibly highly correlated variables into fewer, uncorrelated variables that retain most of the information contained in the dataset. Afterwards we move on to the clustering methods. In this course, we will discuss three methods: K-Means, agglomerative hierarchical clustering and density-based clustering, more specifically DBSCAN. In addition to these methods, we present different metrics to evaluate the performance of the applied method. Finally, we discuss the applicability of these algorithms and their differences. Depending on the distributions of the variables and the research question or goal at hand, choices have to be made. Are the data ‘well-behaved’ (e.g. Gaussian distributions), or do they exhibit complex patterns or noise? Are the data strongly correlated? Do we want to explore a single level of grouping, or do we want to keep granularity in our findings? We will aim to properly address these questions and formulate a suitable framework based on the presented algorithms.
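The workflow described above can be sketched in Python as follows. This is a minimal illustration that assumes scikit-learn is installed; the simulated data, the number of components, the number of clusters and the DBSCAN settings (`eps`, `min_samples`) are arbitrary choices made for this example and are not taken from the course notebooks. The silhouette score stands in here for the evaluation metrics, which the description does not name.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Simulated data: 300 observations, 6 variables, 3 groups (illustrative values).
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)

# Pre-processing: standardize, then keep the first two principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)

# The three clustering methods named above, with assumed settings.
models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
}

labels_by_model = {}
for name, model in models.items():
    labels = model.fit_predict(X_reduced)
    labels_by_model[name] = labels

    # DBSCAN marks noise points with the label -1; exclude them when counting
    # clusters and when computing the silhouette score.
    mask = labels != -1
    n_clusters = len(set(labels[mask]))
    if n_clusters >= 2:
        score = silhouette_score(X_reduced[mask], labels[mask])
        print(f"{name}: {n_clusters} clusters, silhouette = {score:.2f}")
    else:
        print(f"{name}: {n_clusters} cluster(s), silhouette not defined")
```

K-Means and agglomerative clustering require the number of clusters up front, whereas DBSCAN derives it from the density settings and may label some observations as noise.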
We alternate theoretical sessions with practical sessions so that participants can apply and practice the new algorithms. Working in interaction with the teacher, this deepens insight into the methods and strengthens programming skills. Participants can choose to work in either R or Python.
The course material includes slides, datasets and notebooks of examples. Solutions to additional exercises will also be given.
The objective of the course is that you can apply the new algorithms to simulated data sets and in an applied context.
Target Audience
This course targets professionals and investigators from diverse areas with basic Python or R experience who wish to start using automated clustering algorithms for their data preparation or data exploration.
Course Prerequisites
The course is open to all interested persons, but basic knowledge of Python (at a level equivalent to Module 4, Getting started with Python, of this year’s program) or R (at a level equivalent to Module 2, Getting started with R software, of this year’s program) is expected.
Exam / Certificate
There is no exam connected to this module. If you attend both classes, you will receive a certificate of attendance via e-mail at the end of the course.
Course material
You will get access to slides, data files and code (notebooks).
Type of course
This is an on-campus course. We offer blended learning options if, exceptionally, you cannot attend a session on campus.
Schedule
February 24th and March 3rd, from 5.30 pm to 9 pm
Venue
Faculty of Science, Campus Sterre, Krijgslaan 281, 9000 Ghent
Fees
The participation fee is 455 EUR for participants from the private sector. Reduced prices apply to students and to staff from non-profit, social profit, and government organizations:
- Industry, private sector, profession*: € 455
- Non-profit, government, higher education staff, (doctoral) students, unemployed: € 210
*If two or more employees from the same company enrol simultaneously for this course, a 20% reduction on the course fee applies from the second enrolment onwards.
Registration
To register, add the course below to your shopping cart and proceed to checkout.
Is this your first registration for a Beta Academy course? In that case, you will need to create an account first. Afterward, you will receive a confirmation email to activate your account on the academy platform. You do not have to click on the activation link but can immediately return to your shopping cart to complete your course registration. If you do not receive a confirmation email for your course order, please contact our Science Academy at science-academy@ugent.be.
Are you currently on the Nova-academy website? To proceed with the registration, simply click on the "More information" box located on the left side.
UGent PhD Students
The Doctoral School pays for your course on the condition that you sign the attendance list for each lesson. If you are absent, please notify our academy in advance by email and provide the necessary documents.
By registering for a course or event organized by the Science Academy, you agree to the cancellation procedure that you can find on our website.
KMO-portefeuille
Information on KMO-portefeuille: https://www.ugent.be/nl/opleidingen/levenslang-leren/kmo
Organisation
Science Academy
Faculty of Science
science-academy@ugent.be