STATISTICAL DATA ANALYSIS

Fabio POSTIGLIONE

0622900026
DEPARTMENT OF INFORMATION AND ELECTRICAL ENGINEERING AND APPLIED MATHEMATICS
EQF7
DIGITAL HEALTH AND BIOINFORMATIC ENGINEERING
2020/2021



YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2018
SECOND SEMESTER
CFU   HOURS   ACTIVITY
1 - STATISTICAL DATA ANALYSIS (MODULE 1)
3     24      LESSONS
3     24      EXERCISES
2 - STATISTICAL DATA ANALYSIS (MODULE 2)
2     18      LESSONS
1     8       EXERCISES


Objectives
THE COURSE HAS THE TWOFOLD PURPOSE OF: I) ILLUSTRATING THE MAIN METHODOLOGIES OF INTEREST FOR STATISTICAL DATA ANALYSIS; II) APPLYING SUCH METHODOLOGIES TO RELEVANT PRACTICAL PROBLEMS, USING TOOLS COMMONLY EMPLOYED FOR STATISTICAL ANALYSIS, DATA VISUALIZATION AND PROCESSING.


KNOWLEDGE AND UNDERSTANDING.
•ACQUISITION OF THE MAIN TECHNIQUES FOR STATISTICAL INFERENCE AND DATA ANALYSIS.
•PARAMETRIC VS. NONPARAMETRIC APPROACHES. SUPERVISED VS. UNSUPERVISED APPROACHES.
•ACQUISITION OF THE MAIN TECHNIQUES AND TOOLS FOR BIG DATA ANALYSIS.

APPLYING KNOWLEDGE AND UNDERSTANDING.
•ABILITY TO APPLY THE MAIN TECHNIQUES FOR STATISTICAL INFERENCE AND DATA ANALYSIS TO PRACTICAL PROBLEMS (E.G., SOCIAL OR BIOMEDICAL DATA).
•ABILITY TO EXAMINE BIG DATA ARRANGED IN COMPLEX AND/OR HETEROGENEOUS STRUCTURES.
•ABILITY TO USE SOFTWARE (E.G., R, MATLAB) FOR STATISTICAL DATA ANALYSIS, DATA VISUALIZATION AND PROCESSING.
•ABILITY TO USE TOOLS OF PRACTICAL INTEREST FOR DATA ANALYTICS (E.G., APACHE SPARK).
Prerequisites
SUITABLE KNOWLEDGE OF MATHEMATICS AND OF THE FUNDAMENTALS OF PROBABILITY AND STATISTICS.

Contents
- FUNDAMENTALS OF STATISTICS (HOURS FOR LECTURE/EXERCISES: 7/3)
STATISTICAL INFERENCE, PARAMETRIC METHODS, MAXIMUM LIKELIHOOD. DECISION THEORY. BAYESIAN APPROACH.

- DATA NORMALIZATION. WHITENING (1/1)

- INTRODUCTION TO SUPERVISED LEARNING AND LINEAR MODELS (6/3)
MULTIPLE LINEAR REGRESSION. GENERALIZED LINEAR MODELS.

- CLASSIFICATION (11/5)
LOGISTIC REGRESSION. LINEAR DISCRIMINANT ANALYSIS. BAYESIAN FORMULATION OF REGRESSION/CLASSIFICATION. BIAS AND VARIANCE. NAÏVE-BAYES. NONPARAMETRIC SUPERVISED APPROACHES. EXAMPLES: NAÏVE-KERNEL, NEAREST-NEIGHBOR AND K-NEAREST-NEIGHBOR.

- RESAMPLING (2/1)
CROSS-VALIDATION (LOO, K-FOLD). BOOTSTRAP.

- LINEAR MODEL SELECTION AND REGULARIZATION (9/3)
STEPWISE SELECTION. RIDGE REGRESSION. LASSO. DIMENSIONALITY REDUCTION. PRINCIPAL COMPONENT REGRESSION. EXTENSION TO HIGH-DIMENSIONAL DATA. SPARSITY-AWARE METHODS FOR BIG DATA ANALYTICS.

- GENERALIZED ADDITIVE MODELS AND TREE-BASED METHODS (HOURS: LESSONS/EXERCISES/LABORATORY 1/0/0)

- SUPPORT VECTOR MACHINES (1/1)

- UNSUPERVISED LEARNING (11/5)
PRINCIPAL COMPONENTS ANALYSIS. CENTROID-BASED CLUSTERING: K-MEANS. HIERARCHICAL CLUSTERING. OTHER EXAMPLES OF CLUSTERING. GAUSSIAN MIXTURES AND THE EXPECTATION-MAXIMIZATION ALGORITHM. DENSITY-BASED CLUSTERING: DBSCAN. NONPARAMETRIC STATISTICS AND

- INTRODUCTION TO FUNCTIONAL DATA ANALYSIS (2/0)

- SOFTWARE AND TOOLS:
R
MATLAB
APACHE SPARK
Teaching Methods
THE COURSE INCLUDES THEORETICAL LECTURES AND CLASSROOM EXERCISES, SOME OF WHICH ARE CARRIED OUT ON COMPUTERS.
Verification of learning
THE FINAL EXAM CONSISTS OF THE DISCUSSION OF A PROJECT WORK, AIMED AT EVALUATING: THE KNOWLEDGE AND UNDERSTANDING OF THE CONCEPTS PRESENTED DURING THE COURSE; THE ABILITY TO SOLVE STATISTICAL DATA ANALYSIS PROBLEMS BY APPLYING THE METHODS AND TOOLS ILLUSTRATED DURING THE COURSE. PERSONAL JUDGEMENT, COMMUNICATION SKILLS AND LEARNING ABILITIES ARE ALSO EVALUATED.
Texts
AN INTRODUCTION TO STATISTICAL LEARNING,
G. JAMES, D. WITTEN, T. HASTIE, R. TIBSHIRANI,
SPRINGER, 2013.

AN ELEMENTARY INTRODUCTION TO STATISTICAL LEARNING,
S. KULKARNI, G. HARMAN,
WILEY, 2010.
More Information
THE COURSE LANGUAGE IS ENGLISH.