APPLIED STATISTICS

CLAUDIA ANGELINI APPLIED STATISTICS

0522100042
DEPARTMENT OF CHEMISTRY AND BIOLOGY "ADOLFO ZAMBELLI"
EQF7
BIOLOGY
2024/2025

YEAR OF COURSE 1
YEAR OF ACADEMIC REGULATIONS 2022
SPRING SEMESTER
CFU   HOURS   ACTIVITY
3     24      LESSONS
3     36      LAB
Objectives
The objective of the course is to provide students with i) the basics of statistical reasoning, ii) the tools to organize experimental data and produce descriptive and exploratory graphs, iii) the use of statistical models for the analysis of experimental data, iv) the skills to communicate and discuss the results of biological data analysis, and v) the skills to apply these concepts independently through the analysis of real data during laboratory activities. Overall, the course aims to deepen both methodological and applied aspects using the R statistical software.
Knowledge and understanding:
-Knowledge of the theoretical principles of inferential statistics.
-Understand the scientific information conveyed by the various graphical representations and summary tables.
-Understand the methods for hypothesis testing.
-Understand how to fit linear and logistic regression models and evaluate their goodness of fit.
-Understand cluster analysis methods.
-Understand principal component analysis (PCA).
-Understand regularization and feature selection methods in a regression context.
-Understand the fundamentals of the R statistical programming environment.
-Understand the elements and tools for reproducible computational research.

Ability to apply knowledge and understanding:
-Ability to visualize and summarize biological datasets using exploratory techniques.
-Ability to perform pre-processing, filtering, and table integration to combine and extract information from large datasets in tabular form.
-Ability to perform statistical hypothesis tests.
-Ability to analyze biological datasets using linear or logistic regression models (simple and multiple).
-Ability to analyze biological datasets using clustering techniques.
-Ability to analyze biological datasets using principal component analysis or other dimensionality reduction techniques.
-Ability to use the R statistical language.
-Ability to use computational tools for reproducible research.
Autonomy of Judgment:
Students will be able to independently evaluate:
-The quality of organization and the information content of experimental data, with reference to data collected in the laboratories of the degree program.
-The expected reliability of the algorithms and statistical methods used.
-Potentially critical elements in the data, such as outliers or other anomalies.
Communication Skills:
Students will be able to:
-Explain and illustrate the results of a statistical analysis concisely and clearly using graphs and tables.
-Produce written reports on the data analyses performed, using appropriate scientific terminology, that describe the statistical procedures adopted in sufficient detail and critically analyze the results obtained, organized in a structure like that of a scientific article.
-Produce concise presentations of these results, using common tools such as PowerPoint or similar software, organized in a structure like that of a talk at a scientific conference.
-Lead a discussion based on these presentations, describing the methodologies used and answering relevant questions.
Prerequisites
Elements of linear algebra, such as matrices, vectors, and related operations (including the computation of eigenvalues and eigenvectors of a matrix), as covered in a general mathematics course at the bachelor's level. Ability to use a computer to perform practical exercises.
Contents
Theoretical lectures (24 hours)
-From probability to statistics: probability, estimators, correlation measures, association measures, and elements of descriptive statistics (6 hours).
-Statistical hypothesis testing (2 hours).
-Fundamentals of linear algebra, distance, similarity, and dissimilarity measures, and transformations of random variables (2 hours).
-Simple and multiple linear regression (4 hours).
-Hierarchical and/or partitional clustering algorithms (2 hours).
-Principal component analysis (PCA) (2 hours).
-Generalized linear models: logistic regression (2 hours).
-Advanced statistical methods (regularization, performance evaluation, multiple testing).
Laboratory activities (36 hours)
-Introduction to the R programming environment (2 hours).
-Data and data structures, reading and writing files, using graphical functions, and control structures in R (6 hours).
-Descriptive statistics functions and graphs in R (2 hours).
-Statistical tests in R (3 hours).
-Distance, similarity, and dissimilarity functions in R (3 hours).
-Simple linear regression and multiple linear regression (6 hours).
-Hierarchical clustering and partitional clustering in R (3 hours).
-Principal component analysis in R (3 hours).
-Generalized linear models: logistic regression (3 hours).
-Model selection and regularization (3 hours).
-How to create professional reports using R and R Markdown (2 hours).
Teaching Methods
The course comprises 60 hours of teaching, divided between theoretical lectures and computer-based classroom exercises. During the exercises, students will analyze datasets available in the literature. The approach consists of 1) formulating the statistical problem, 2) planning the steps of the analysis, 3) carrying out the analysis, and 4) discussing the results. This last phase develops the ability to evaluate whether the results are reasonable and consistent with the methodologies used.
Verification of learning
Achievement of the objectives will be assessed through:
1) A written test consisting of a short report (the project) presenting the results of the analysis, carried out with the statistical software R, of one or more datasets using the methods in the course program (statistical tests, regression, clustering, PCA, model selection with regularization, or others), prepared in accordance with the principles of reproducible computational research.
2) An oral test consisting of a 5-minute presentation of the project, followed by at least three questions on the theoretical and methodological content listed in the program. The oral interview assesses the ability to present using appropriate scientific terminology and to organize the presentation of these topics independently.
The final grade is expressed out of 30: up to 10 points for the written report (project), up to 5 points for the oral presentation of the project, and up to 5 points for each of the 3 questions.
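As a quick check of the grading scheme, the maximum points for each component add up exactly to the 30-point scale. The symbol G_max (maximum final grade) is notation introduced here only for illustration:

```latex
G_{\max} = \underbrace{10}_{\text{report}} + \underbrace{5}_{\text{presentation}} + \underbrace{3 \times 5}_{\text{questions}} = 30
```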
Texts
Teaching material provided by the instructor, plus at least one of the following texts (limited to the chapters relevant to the course program):
-Everitt, B. and Hothorn, T. An Introduction to Applied Multivariate Analysis with R. Springer (2011).
-Zelterman, D. Applied Multivariate Statistics with R. Springer (2015).
-Härdle, W.K. and Simar, L. Applied Multivariate Statistical Analysis, 4th edition. Springer (2015).
-Dalgaard, P. Introductory Statistics with R, 2nd edition. Springer (2008).
-Zuur, A.F., Ieno, E.N., and Meesters, E.H.W.G. A Beginner's Guide to R. Springer (2009).
More Information
Regular attendance at both theoretical and practical lessons is strongly recommended.

Data source: ESSE3 [Last synchronization: 2024-11-18]