STATISTICAL DATA ANALYSIS

Fabio POSTIGLIONE

0622700059
DEPARTMENT OF INFORMATION AND ELECTRICAL ENGINEERING AND APPLIED MATHEMATICS
EQF7
COMPUTER ENGINEERING
2019/2020



COMPULSORY
YEAR OF COURSE: 1
DEGREE REGULATIONS YEAR: 2017
SECOND SEMESTER
CFU   HOURS   ACTIVITY
1. STATISTICAL DATA ANALYSIS
3     24      LESSONS
3     24      EXERCISES
2. STATISTICAL DATA ANALYSIS
1     8       EXERCISES
2     16      LESSONS
Objectives
THE COURSE HAS THE TWOFOLD PURPOSE OF: I) ILLUSTRATING THE MAIN METHODOLOGIES OF INTEREST FOR STATISTICAL DATA ANALYSIS; II) APPLYING SUCH METHODOLOGIES TO RELEVANT PRACTICAL PROBLEMS, USING TOOLS COMMONLY EMPLOYED FOR STATISTICAL ANALYSIS, DATA VISUALIZATION AND PROCESSING.


KNOWLEDGE AND UNDERSTANDING.
•ACQUISITION OF THE MAIN TECHNIQUES FOR STATISTICAL INFERENCE AND DATA ANALYSIS.
•PARAMETRIC VS. NONPARAMETRIC APPROACHES. SUPERVISED VS. UNSUPERVISED APPROACHES.
•ACQUISITION OF THE MAIN TECHNIQUES AND TOOLS FOR BIG DATA ANALYSIS.

APPLYING KNOWLEDGE AND UNDERSTANDING.
•ABILITY TO APPLY THE MAIN TECHNIQUES FOR STATISTICAL INFERENCE AND DATA ANALYSIS TO PRACTICAL PROBLEMS (E.G., SOCIAL OR BIOMEDICAL DATA).
•ABILITY TO EXAMINE BIG DATA ARRANGED IN COMPLEX AND/OR HETEROGENEOUS STRUCTURES.
•ABILITY TO USE SOFTWARE (E.G., R, MATLAB) FOR STATISTICAL DATA ANALYSIS, DATA VISUALIZATION AND PROCESSING.
•ABILITY TO USE TOOLS OF PRACTICAL INTEREST FOR DATA ANALYTICS (E.G., APACHE SPARK).
Prerequisites
PREREQUISITES: SUITABLE KNOWLEDGE OF MATHEMATICS AND FUNDAMENTALS OF PROBABILITY AND STATISTICS.

Contents
- FUNDAMENTALS OF STATISTICS (HOURS FOR LECTURES/EXERCISES: 7/3)
STATISTICAL INFERENCE, PARAMETRIC METHODS, MAXIMUM LIKELIHOOD. DECISION THEORY. BAYESIAN APPROACH.

- DATA NORMALIZATION. WHITENING (1/1)
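AS A MINIMAL ILLUSTRATIVE SKETCH (NOT TAKEN FROM THE COURSE MATERIAL), WHITENING CAN BE WRITTEN IN PLAIN PYTHON: THE DATA ARE CENTERED AND MULTIPLIED BY THE INVERSE OF THE CHOLESKY FACTOR L OF THEIR SAMPLE COVARIANCE (Σ = L Lᵀ), SO THAT THE TRANSFORMED DATA HAVE ZERO MEAN AND IDENTITY COVARIANCE. THE CHOLESKY VARIANT (RATHER THAN PCA OR ZCA WHITENING) AND ALL FUNCTION NAMES ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
import math
from statistics import fmean

# Illustrative sketch; all function names are hypothetical.


def mean_vector(data):
    """Column-wise sample mean of a list of equal-length rows."""
    return [fmean(col) for col in zip(*data)]


def covariance(data):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, d = len(data), len(data[0])
    mu = mean_vector(data)
    cov = [[0.0] * d for _ in range(d)]
    for row in data:
        c = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return [[v / (n - 1) for v in r] for r in cov]


def cholesky(a):
    """Lower-triangular L such that a = L L^T (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def whiten(data):
    """Return z = L^{-1}(x - mu) for every row x: zero mean, identity covariance."""
    mu = mean_vector(data)
    L = cholesky(covariance(data))
    d = len(mu)
    out = []
    for row in data:
        c = [x - m for x, m in zip(row, mu)]
        z = [0.0] * d
        for i in range(d):  # forward substitution solves L z = c
            z[i] = (c[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        out.append(z)
    return out
```

FOR EXAMPLE, whiten([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]) RETURNS POINTS WHOSE SAMPLE COVARIANCE IS THE 2×2 IDENTITY UP TO ROUNDING, WHEREAS PLAIN PER-FEATURE NORMALIZATION WOULD RESCALE THE TWO COLUMNS BUT LEAVE THEM STRONGLY CORRELATED.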

- INTRODUCTION TO SUPERVISED LEARNING AND LINEAR MODELS (6/3)
MULTIPLE LINEAR REGRESSION. GENERALIZED LINEAR MODELS.

- CLASSIFICATION (11/5)
LOGISTIC REGRESSION. LINEAR DISCRIMINANT ANALYSIS. BAYESIAN FORMULATION OF REGRESSION/CLASSIFICATION. BIAS AND VARIANCE. NAÏVE-BAYES. NONPARAMETRIC SUPERVISED APPROACHES. EXAMPLES: NAÏVE-KERNEL, NEAREST-NEIGHBOR AND K-NEAREST-NEIGHBOR.
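THE NEAREST-NEIGHBOR AND K-NEAREST-NEIGHBOR RULES LISTED ABOVE CAN BE SKETCHED IN A FEW LINES OF PYTHON, ASSUMING EUCLIDEAN DISTANCE AND MAJORITY VOTING; THE FUNCTION NAME, ITS ARGUMENTS AND THE TOY DATA ARE HYPOTHETICAL. WITH K = 1 THE SAME FUNCTION REDUCES TO THE NEAREST-NEIGHBOR RULE.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    # Illustrative sketch; the function name and arguments are hypothetical.
    # Distance from the query to every training point, nearest first
    # (ties in distance are broken by training index, never by label).
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    nearest_labels = [train_y[i] for _, i in dists[:k]]
    # Majority vote; Counter.most_common lists equal counts in first-seen
    # order, so a tie between labels goes to the one with the closer neighbour.
    return Counter(nearest_labels).most_common(1)[0][0]


train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (0.5, 0.5)))       # three nearest are all "a"
print(knn_predict(train_x, train_y, (4.0, 4.0), k=1))  # nearest point is (5, 5)
```

THE RULE IS NONPARAMETRIC IN THE SENSE USED ABOVE: NO MODEL IS FITTED, AND THE WHOLE TRAINING SET IS KEPT AND SCANNED AT PREDICTION TIME.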

- RESAMPLING (2/1)
CROSS-VALIDATION (LOO, K-FOLD). BOOTSTRAP.
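A MINIMAL SKETCH OF THE RESAMPLING IDEAS ABOVE, ASSUMING A DELIBERATELY TRIVIAL "MODEL" THAT PREDICTS EVERY VALUE BY THE TRAINING MEAN: K-FOLD CROSS-VALIDATION HOLDS OUT EACH FOLD IN TURN AND AVERAGES THE SQUARED ERRORS (LOO IS THE CASE K = N), WHILE THE BOOTSTRAP RESAMPLES THE DATA WITH REPLACEMENT TO ESTIMATE THE STANDARD ERROR OF A STATISTIC. FUNCTION NAMES, SEEDS AND THE NUMBER OF BOOTSTRAP REPLICATES ARE ILLUSTRATIVE ASSUMPTIONS.

```python
import random
from statistics import fmean, stdev

# Illustrative sketch; all function names are hypothetical.


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds of
    near-equal size; k = n gives leave-one-out (LOO)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse_of_mean(y, k, seed=0):
    """K-fold CV estimate of the test MSE of a trivial model that predicts
    every held-out value by the mean of the training folds.
    Squared errors are pooled over all held-out points."""
    errors = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = fmean(train)
        errors.extend((y[i] - prediction) ** 2 for i in fold)
    return fmean(errors)


def bootstrap_se_of_mean(y, b=1000, seed=0):
    """Bootstrap standard error of the sample mean: resample y with
    replacement b times and take the standard deviation of the b means."""
    rng = random.Random(seed)
    means = [fmean(rng.choices(y, k=len(y))) for _ in range(b)]
    return stdev(means)
```

FOR y = [1, 2, 3, 4, 5], cv_mse_of_mean(y, 5) IS THE LOO ESTIMATE AND EQUALS 3.125 REGARDLESS OF THE SHUFFLE, SINCE EVERY FOLD CONTAINS A SINGLE POINT.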

- LINEAR MODEL SELECTION AND REGULARIZATION (9/3)
STEPWISE SELECTION. RIDGE REGRESSION. LASSO. DIMENSIONALITY REDUCTION. PRINCIPAL COMPONENT REGRESSION. EXTENSION TO HIGH-DIMENSIONAL DATA. SPARSITY-AWARE METHODS FOR BIG DATA ANALYTICS.
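TO ILLUSTRATE THE DIFFERENCE BETWEEN RIDGE REGRESSION AND THE LASSO, THE SKETCH BELOW ASSUMES THE SIMPLEST POSSIBLE SETTING: ONE PREDICTOR, CENTERED DATA (SO NO INTERCEPT) AND PENALTIES λβ² AND λ|β| ADDED TO THE RESIDUAL SUM OF SQUARES. IN THIS CASE BOTH ESTIMATES HAVE CLOSED FORMS: RIDGE SHRINKS THE LEAST-SQUARES SLOPE TOWARD ZERO, WHILE THE LASSO SOFT-THRESHOLDS IT AND CAN SET IT EXACTLY TO ZERO, WHICH IS THE SOURCE OF ITS SPARSITY. FUNCTION NAMES ARE HYPOTHETICAL.

```python
from statistics import fmean

# Illustrative sketch for a single centered predictor; names are hypothetical.


def centered_sums(x, y):
    """Return S_xx and S_xy after centering x and y (so no intercept is needed)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, sxy


def ridge_slope(x, y, lam):
    """Minimizer of sum (y_i - b x_i)^2 + lam * b^2: S_xy / (S_xx + lam)."""
    sxx, sxy = centered_sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimizer of sum (y_i - b x_i)^2 + lam * |b|:
    soft-threshold S_xy at lam / 2, then divide by S_xx."""
    sxx, sxy = centered_sums(x, y)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx
```

FOR x = [1, 2, 3, 4, 5] AND y = 2x, THE LEAST-SQUARES SLOPE IS 2; WITH λ = 10 RIDGE GIVES 1.0, WHILE WITH λ = 40 THE LASSO GIVES EXACTLY 0.0.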

- GENERALIZED ADDITIVE MODELS AND TREE-BASED METHODS (1/0)

- SUPPORT VECTOR MACHINES (1/1)

- UNSUPERVISED LEARNING (11/5)
PRINCIPAL COMPONENTS ANALYSIS. CENTROID-BASED CLUSTERING: K-MEANS. HIERARCHICAL CLUSTERING. OTHER EXAMPLES OF CLUSTERING. GAUSSIAN MIXTURES AND THE EXPECTATION-MAXIMIZATION ALGORITHM. DENSITY-BASED CLUSTERING: DBSCAN. NONPARAMETRIC STATISTICS AND
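CENTROID-BASED CLUSTERING WITH K-MEANS CAN BE SKETCHED WITH LLOYD'S ALGORITHM, WHICH ALTERNATES AN ASSIGNMENT STEP (EACH POINT GOES TO ITS NEAREST CENTROID) AND AN UPDATE STEP (EACH CENTROID MOVES TO THE MEAN OF ITS POINTS) UNTIL THE CENTROIDS STOP MOVING. THE RANDOM INITIALIZATION FROM DATA POINTS, THE ITERATION CAP AND THE FUNCTION NAMES ARE ASSUMPTIONS OF THIS SKETCH, NOT A PRESCRIBED COURSE IMPLEMENTATION.

```python
import random

# Illustrative sketch of Lloyd's algorithm; function names are hypothetical.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init: k random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new.append(centroids[j])
        if new == centroids:  # converged: centroids no longer move
            break
        centroids = new
    return centroids, labels
```

ON TWO WELL-SEPARATED GROUPS OF FOUR POINTS EACH, THE SKETCH RECOVERS THE TWO GROUPS; ON LESS FAVORABLE DATA THE RESULT DEPENDS ON THE INITIALIZATION, WHICH IS WHY K-MEANS IS USUALLY RUN FROM SEVERAL STARTING POINTS.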

- INTRODUCTION TO FUNCTIONAL DATA ANALYSIS (2/0)

- SOFTWARE AND TOOLS:
R
MATLAB
APACHE SPARK
Teaching Methods
THE COURSE INCLUDES THEORETICAL LECTURES AND CLASSROOM EXERCISES, SOME OF WHICH ARE CARRIED OUT ON COMPUTERS.
Verification of learning
THE FINAL EXAM CONSISTS OF THE DISCUSSION OF A PROJECT, AIMED AT EVALUATING: THE KNOWLEDGE AND UNDERSTANDING OF THE CONCEPTS PRESENTED DURING THE COURSE; THE ABILITY TO SOLVE STATISTICAL DATA ANALYSIS PROBLEMS BY APPLYING THE METHODS AND TOOLS ILLUSTRATED DURING THE COURSE. PERSONAL JUDGEMENT, COMMUNICATION SKILLS AND LEARNING ABILITIES ARE ALSO EVALUATED.
Texts
AN INTRODUCTION TO STATISTICAL LEARNING,
G. JAMES, D. WITTEN, T. HASTIE, R. TIBSHIRANI,
SPRINGER, 2013.

AN ELEMENTARY INTRODUCTION TO STATISTICAL LEARNING THEORY,
S. KULKARNI, G. HARMAN,
WILEY, 2010.
More Information
THE COURSE LANGUAGE IS ENGLISH.
DATA SOURCE: ESSE3 [LAST SYNCHRONIZATION: 2021-02-19]