A new semester is beginning, and a new round of training workshops to support research activities will be offered to students in September 2021. The workshops cover the GitLab service, the Pilot HPC Platform and machine learning with Python.
GitLab Service for Research
The 'GitLab Service for Research' is a web-based DevOps (development and operations) lifecycle tool for software development. It provides PolyU researchers with an on-premises Git repository as an alternative place to store code bases and manage software projects. This workshop introduces the 'GitLab Service for Research' and shares tips on using it.
Date: 10 Sep (Fri)
Time: 14:30 – 17:00
Venue: Online and On-site
Pre-requisite: Basic OS (Linux & Windows) and programming knowledge
Target Audience: Rpg, Tpg, Ug Students
Medium of Instruction: English
Course Outline:
- Introduction to GitLab Service for Research
- Basic Operations of Git (clone, push and pull)
- Branching in Git
- Managing Users and Groups under GitLab
- Project Permissions
- Examples of Integrating Pilot HPC Platform and GitLab
Registration: click here
Machine Learning with Python (1)
This workshop, consisting of six sessions, introduces participants to the general workflow of building machine learning models using the Python library 'scikit-learn', with practical examples.
Basic categories of machine learning, supervised learning algorithms, unsupervised learning algorithms, model validation methods, and over-sampling and under-sampling techniques will also be covered. The workshop provides participants with the basic knowledge and skills to construct machine learning models.
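As a rough illustration of the kind of workflow described above (not taken from the workshop materials), the sketch below loads a dataset, splits it, fits a model and evaluates it with scikit-learn. The choice of the built-in breast cancer dataset, the Gaussian Naïve Bayes classifier, the 75/25 split and the random seed are all assumptions made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load a small dataset bundled with scikit-learn (no download needed).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples for testing; the split ratio and seed
# are arbitrary choices for this illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple supervised classifier on the training portion only.
model = GaussianNB()
model.fit(X_train, y_train)

# Evaluate on the held-out portion.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

The same load, split, fit and evaluate steps apply to other scikit-learn estimators, since they share the `fit` and `predict` methods.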
Date: 14 Sep (Tue), 16 Sep (Thu), 23 Sep (Thu), 28 Sep (Tue), 30 Sep (Thu), 5 Oct (Tue)
Time: 14:30 – 17:00
Venue: Online and On-site
Pre-requisite: Basic programming concepts
Target Audience: Rpg, Tpg, Ug Students
Medium of Instruction: English
Certificate: Awarded to participants who attend at least 5 of the 6 lessons
Course Outline:
Lesson 1
- Introduction to Machine Learning
  - Supervised learning, unsupervised learning and reinforcement learning
  - Feature engineering
    - Numerical data
    - Categorical data
    - Text features
    - Image features
  - Model pipeline
- Naïve Bayes Classifier
  - Conditional probability and Bayes' theorem
  - Gaussian Naïve Bayes
  - Multinomial Naïve Bayes
Lesson 2
- Linear Regression
  - Formulation and Gradient Descent
  - Regression variations
    - Simple linear regression
    - Multiple linear regression
    - Basis function regression
  - Regularization
    - Ridge, lasso and elastic net
- Logistic Regression
  - Formulation and Cost function (log loss)
  - Example on breast cancer dataset
Lesson 3
- Support Vector Machine
  - Basic Linear Algebra
  - SVM optimization problem
    - Linear and nonlinear boundary
    - Soft margins
  - SVM on face recognition
- Decision Tree and Random Forest
  - Decision Tree
    - Terminology and mathematical expression
    - Decision boundary
  - Random Forest
    - Classification and regression
  - Visualizing tree models
  - Problems with tree-based algorithms
    - Overfitting
    - Bias on imbalanced datasets
Lesson 4
- Principal Component Analysis
  - Linear algebra prerequisite
    - Orthogonal basis, eigenvectors and eigenvalues, covariance matrix
  - Applications of PCA
    - Dimensionality reduction
    - Visualization of high-dimensional data
    - Noise filtering
  - Example: combining PCA with SVM to improve performance on face recognition
- K-Means Clustering
  - Lloyd’s algorithm
  - Challenges of using K-means
    - Non-linear boundary problems
    - Spectral clustering
  - Use cases
    - Data clustering and color compression
Lesson 5
- Model Validation I
  - Evaluation metrics for classification
    - Accuracy, precision, recall, F1 score
  - Evaluation metrics for regression
    - MAE, MSE and coefficient of determination
  - Training and testing
    - Splitting data
    - K-fold cross validation
    - Leave-one-out cross validation
- Model Validation II
  - Bias-variance trade-off
  - Validation curve
  - Learning curve
  - Hyperparameter search
    - Grid search and random search
Lesson 6
- Handling Imbalanced Datasets
  - Choosing the right metrics
  - Resampling
    - Random sampling
    - Undersampling
      (1) Tomek Links
      (2) Near-Miss
    - Oversampling
      (1) SMOTE
      (2) ADASYN
- Putting it all together – survival analysis
  - Introduction to the problem
  - Exploratory Data Analysis (EDA)
  - Filling missing data
  - Feature engineering
  - Classification with random forest
  - Hyperparameter tuning
Registration: click here
Pilot HPC Platform
The Pilot High Performance Computing (HPC) Platform allows PolyU research staff and students to test or develop their applications for research purposes. This workshop aims to familiarize users with the platform and its features, and to show how to leverage the Pilot HPC Platform for research activities.
Date: 17 Sep (Fri)
Time: 14:30 – 17:00
Venue: Online and On-site
Pre-requisite: Basic Linux and programming knowledge
Target Audience: Research staff & Rpg, Tpg, Ug Students
Medium of Instruction: English
Course Outline:
- Introduction to Pilot HPC Platform
- Account application
- Resources available
- Core operations on Pilot HPC Platform
- Application examples running on Pilot HPC Platform
- Jupyter Notebook on Pilot HPC Platform
Registration: click here