Fabio POSTIGLIONE | DATA SCIENCE
cod. 0622700123
DEPARTMENT OF INFORMATION AND ELECTRICAL ENGINEERING AND APPLIED MATHEMATICS | |
EQF7 | |
COMPUTER ENGINEERING | |
2024/2025 |
COMPULSORY | |
YEAR OF COURSE 1 | |
YEAR OF DIDACTIC SYSTEM 2022 | |
AUTUMN SEMESTER |
SSD | CFU | HOURS | ACTIVITY |
---|---|---|---|
DATA SCIENCE | | | |
ING-INF/03 | 4 | 32 | LESSONS |
ING-INF/03 | 2 | 16 | EXERCISES |
DATA SCIENCE | | | |
SECS-S/02 | 1 | 8 | LESSONS |
SECS-S/02 | 2 | 16 | LAB |
Objectives | |
---|---|
THE COURSE HAS A TWOFOLD PURPOSE: I) ILLUSTRATING THE MAIN METHODOLOGIES OF INTEREST FOR STATISTICAL DATA ANALYSIS; II) APPLYING THESE METHODOLOGIES TO RELEVANT PRACTICAL PROBLEMS, USING TOOLS COMMONLY EMPLOYED FOR STATISTICAL ANALYSIS, DATA VISUALIZATION AND PROCESSING.
KNOWLEDGE AND UNDERSTANDING.
• ACQUISITION OF THE MAIN METHODOLOGIES FOR STATISTICAL INFERENCE AND DATA ANALYSIS.
• PARAMETRIC VS. NON-PARAMETRIC APPROACHES. SUPERVISED VS. UNSUPERVISED APPROACHES.
• ACQUISITION OF THE MAIN TECHNIQUES AND TOOLS FOR BIG DATA ANALYSIS.
APPLYING KNOWLEDGE AND UNDERSTANDING.
• ABILITY TO APPLY THE MAIN TECHNIQUES FOR STATISTICAL INFERENCE AND DATA ANALYSIS TO PRACTICAL PROBLEMS (E.G., SOCIAL OR BIOMEDICAL DATA).
• ABILITY TO EXAMINE BIG DATA ARRANGED IN RATHER COMPLEX AND/OR HETEROGENEOUS STRUCTURES.
• ABILITY TO USE SOFTWARE (E.G., R, PYTHON, MATLAB) FOR STATISTICAL DATA ANALYSIS, DATA VISUALIZATION AND PROCESSING.
• ABILITY TO USE TOOLS OF PRACTICAL INTEREST FOR DATA ANALYTICS (E.G., APACHE SPARK). |
Prerequisites | |
---|---|
ADEQUATE KNOWLEDGE OF MATHEMATICS AND OF THE FUNDAMENTALS OF PROBABILITY AND STATISTICS. |
Contents | |
---|---|
Didactic unit 1: Introduction and parametric methods (LECTURE/PRACTICE/LABORATORY HOURS 6/0/2)
- 1 (2 Hours Lecture): Introduction to data analysis. Prediction vs. inference. Regression vs. classification. Parametric methods and Maximum Likelihood (ML) estimation.
- 2 (2 Hours Lecture): Bayesian approach and Minimum-Mean-Squared-Error (MMSE). Cost functions for estimation and regression problems.
- 3 (2 Hours Lecture): ML and MMSE estimators for classic Gaussian problems.
- 4 (2 Hours Laboratory): Computer-aided implementation and performance evaluation of the estimators discussed in the previous lectures.
KNOWLEDGE AND UNDERSTANDING: Parametric estimation methods for statistical learning problems.
APPLYING KNOWLEDGE AND UNDERSTANDING: Designing and implementing parametric estimation algorithms.

Didactic unit 2: Supervised learning methods for regression (LECTURE/PRACTICE/LABORATORY HOURS 20/0/8)
- 5 (2 Hours Lecture): Regression function and supervised parametric models.
- 6 (2 Hours Lecture): Simple linear regression.
- 7 (2 Hours Lecture): Multiple linear regression.
- 8 (2 Hours Lecture): Statistical inference. Hypothesis tests and p-value.
- 9 (2 Hours Lecture): Variable selection. Stepwise procedures.
- 10 (2 Hours Lecture): Data normalization.
- 11 (2 Hours Lecture): Regularization/shrinkage strategies. Multicollinearity and high dimensionality. Ridge regression.
- 12 (2 Hours Lecture): LASSO method. Dimensionality reduction.
- 13 (2 Hours Laboratory): Computer-aided implementation of simple and multiple linear regression algorithms.
- 14 (2 Hours Laboratory): Computer-aided implementation of inferential strategies.
- 15 (2 Hours Laboratory): Computer-aided implementation of Ridge and LASSO regularization.
- 16 (2 Hours Lecture): Cross-validation. Bootstrap.
- 17 (2 Hours Lecture): Supervised non-parametric methods. Local methods. Naïve-kernel. K-NN method.
- 18 (2 Hours Laboratory): Computer-aided implementation of naïve-kernel and K-NN methods.
KNOWLEDGE AND UNDERSTANDING: Regression models. Estimation of model parameters, variable selection, and significance tests to identify the influential factors and support model interpretation. Regularization techniques in high-dimensional problems.
APPLYING KNOWLEDGE AND UNDERSTANDING: Designing and implementing regression and inference algorithms for prediction, data interpretation and evaluation of the statistical significance of the results.

Didactic unit 3: Classification (LECTURE/PRACTICE/LABORATORY HOURS 12/0/12)
- 19 (2 Hours Lecture): Parametric decision methods. Neyman-Pearson criterion and Bayesian approach.
- 20 (2 Hours Lecture): Parametric detectors for a classic Gaussian problem.
- 21 (2 Hours Laboratory): Computer-aided implementation of the parametric detectors for the Gaussian problem illustrated in the previous lecture.
- 22 (2 Hours Lecture): Supervised methods and Naïve-Bayes.
- 23 (2 Hours Lecture): Logistic regression.
- 24 (2 Hours Lecture): Gradient descent algorithms for regression and classification problems.
- 25 (2 Hours Laboratory): Computer-aided implementation of gradient descent algorithms for logistic regression.
- 26 (2 Hours Laboratory): Computer-aided implementation of the Naïve-Bayes classifier.
- 27 (2 Hours Laboratory): Computer-aided implementation of the logistic-regression classifier.
- 28 (2 Hours Laboratory): Frameworks for a distributed implementation of data analysis algorithms.
- 29 (2 Hours Lecture): Linear Discriminant Analysis (LDA).
- 30 (2 Hours Laboratory): Implementation of LDA algorithms.
KNOWLEDGE AND UNDERSTANDING: Strategies for classification problems. Optimization algorithms for statistical learning (e.g., gradient-descent and stochastic-gradient-descent algorithms).
APPLYING KNOWLEDGE AND UNDERSTANDING: Designing and implementing classification algorithms. Implementing distributed data-analysis algorithms by means of suitable frameworks.
Didactic unit 4: Unsupervised learning (LECTURE/PRACTICE/LABORATORY HOURS 8/0/4)
- 31 (2 Hours Lecture): Principal Component Analysis (PCA): Methodology.
- 32 (2 Hours Lecture): Principal Component Analysis: Interpretation and applications.
- 33 (2 Hours Laboratory): Computer-aided implementation of PCA.
- 34 (2 Hours Lecture): Clustering. K-means algorithm. Hierarchical clustering.
- 35 (2 Hours Lecture): Expectation-Maximization algorithm. DBSCAN algorithm.
- 36 (2 Hours Laboratory): Computer-aided implementation of clustering algorithms.
KNOWLEDGE AND UNDERSTANDING: Unsupervised statistical learning strategies. PCA and clustering.
APPLYING KNOWLEDGE AND UNDERSTANDING: Designing and implementing algorithms for unsupervised statistical learning problems.

TOTAL LECTURE/PRACTICE/LABORATORY HOURS 46/0/26 |
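As an illustration of the kind of exercise listed in didactic unit 1 (ML and MMSE estimators for a classic Gaussian problem, followed by a computer-aided performance evaluation), the following is a minimal sketch and not course material. It assumes i.i.d. observations of an unknown mean in Gaussian noise of known variance, with a Gaussian prior on the mean; the function names, parameter values and the Monte Carlo set-up are all hypothetical choices made for this example.

```python
import random
import statistics


def ml_estimate(samples):
    """ML estimate of a Gaussian mean with known variance: the sample mean."""
    return statistics.fmean(samples)


def mmse_estimate(samples, noise_var, prior_mean, prior_var):
    """MMSE (posterior-mean) estimate under a Gaussian prior on the mean."""
    n = len(samples)
    # Weight given to the data; it tends to 1 as n grows or the prior widens.
    weight = n * prior_var / (n * prior_var + noise_var)
    return weight * statistics.fmean(samples) + (1.0 - weight) * prior_mean


def monte_carlo_mse(n_samples, noise_var, prior_mean, prior_var,
                    trials=2000, seed=0):
    """Estimate the mean squared error of both estimators by simulation."""
    rng = random.Random(seed)
    se_ml = se_mmse = 0.0
    for _ in range(trials):
        # Draw the unknown mean from the (assumed) prior, then the data.
        theta = rng.gauss(prior_mean, prior_var ** 0.5)
        samples = [rng.gauss(theta, noise_var ** 0.5) for _ in range(n_samples)]
        se_ml += (ml_estimate(samples) - theta) ** 2
        se_mmse += (mmse_estimate(samples, noise_var, prior_mean, prior_var)
                    - theta) ** 2
    return se_ml / trials, se_mmse / trials


if __name__ == "__main__":
    mse_ml, mse_mmse = monte_carlo_mse(
        n_samples=5, noise_var=4.0, prior_mean=1.0, prior_var=1.0
    )
    print(f"ML MSE   ~ {mse_ml:.3f}")
    print(f"MMSE MSE ~ {mse_mmse:.3f}")
```

Under these assumptions the theoretical errors are noise_var / n for the ML estimator and prior_var * noise_var / (n * prior_var + noise_var) for the MMSE estimator, so the simulated MMSE error should come out smaller whenever the prior is informative.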
Teaching Methods | |
---|---|
THE COURSE CONSISTS OF THEORETICAL LECTURES AND CLASSROOM EXERCISES, INCLUDING COMPUTER-BASED ONES. |
Verification of learning | |
---|---|
THE FINAL EXAM CONSISTS OF THE DISCUSSION OF A PROJECT, AIMED AT EVALUATING: I) THE KNOWLEDGE AND UNDERSTANDING OF THE CONCEPTS PRESENTED DURING THE COURSE; II) THE ABILITY TO SOLVE STATISTICAL DATA ANALYSIS PROBLEMS BY APPLYING THE METHODS AND TOOLS ILLUSTRATED DURING THE COURSE. PERSONAL JUDGEMENT, COMMUNICATION SKILLS AND LEARNING ABILITY ARE ALSO EVALUATED. |
Texts | |
---|---|
AN INTRODUCTION TO STATISTICAL LEARNING, G. JAMES, D. WITTEN, T. HASTIE, R. TIBSHIRANI, SPRINGER, 2013.
AN ELEMENTARY INTRODUCTION TO STATISTICAL LEARNING THEORY, S. KULKARNI, G. HARMAN, WILEY, 2010.
SUPPLEMENTARY TEACHING MATERIAL WILL BE AVAILABLE ON THE UNIVERSITY E-LEARNING PLATFORM (HTTP://ELEARNING.UNISA.IT), ACCESSIBLE TO STUDENTS WITH THEIR UNIVERSITY CREDENTIALS. |
More Information | |
---|---|
THE COURSE IS HELD IN ITALIAN. |
BETA VERSION. Data source: ESSE3 [Last synchronization: 2024-11-18]