STATISTICS AND DATA ANALYSIS

Luigi Di Biasi

0522500094
COMPUTER SCIENCE
EQF7
COMPUTER SCIENCE
2024/2025



COMPULSORY
YEAR OF COURSE 1
YEAR OF DIDACTIC SYSTEM 2016
AUTUMN SEMESTER
CFU    HOURS    ACTIVITY
9      72       LESSONS
Exam                           Date
EXAM SESSION PROF. CIRILLO     05/02/2025 - 09:00
EXAM SESSION PROF. CIRILLO     19/02/2025 - 09:00
Objectives
THE COURSE AIMS TO PROVIDE STUDENTS WITH THE THEORETICAL KNOWLEDGE AND PRACTICAL SKILLS RELATED TO STATISTICAL DATA ANALYSIS, WITH A PARTICULAR FOCUS ON THE USE OF THE R PROGRAMMING LANGUAGE.
SPECIFICALLY, THE COURSE AIMS TO:
•DEVELOP STUDENTS' MASTERY OF METHODS AND TECHNIQUES FOR PROCESSING AND ANALYZING COMPLEX DATA USING THE R LANGUAGE, A WIDELY USED TOOL IN THE FIELD OF COMPUTER SCIENCE AND DATA SCIENCE.
•PROVIDE STUDENTS WITH A SOLID FOUNDATION IN DESCRIPTIVE AND INFERENTIAL STATISTICS, WITH PARTICULAR ATTENTION TO THE MOST WIDELY USED DATA ANALYSIS METHODOLOGIES AND THEIR APPLICATIONS IN COMPUTER SCIENCE CONTEXTS.
•DEVELOP STUDENTS' ABILITY TO APPLY THE KNOWLEDGE ACQUIRED TO REAL-WORLD PROBLEMS RELATED TO THE MANAGEMENT, MANIPULATION, AND ANALYSIS OF STATISTICAL DATA.
•GUIDE STUDENTS IN DEVELOPING APPLICATIONS FOR THE MANAGEMENT, ANALYSIS, AND VISUALIZATION OF STATISTICAL DATA, TAKING ADVANTAGE OF THE CAPABILITIES OF THE R LANGUAGE.
KNOWLEDGE AND UNDERSTANDING
•DEVELOPMENT OF METHODS AND TECHNIQUES FOR DATA PROCESSING AND ANALYSIS USING ONE OF THE MOST POWERFUL AND FLEXIBLE STATISTICAL SOFTWARE TOOLS, THE R PROGRAMMING LANGUAGE
•DESCRIPTIVE AND INFERENTIAL STATISTICS WITH R
ABILITY TO APPLY KNOWLEDGE AND UNDERSTANDING
•APPLIED PROBLEMS RELATED TO DATA PROCESSING AND ANALYSIS
•DEVELOPMENT OF COMPUTER APPLICATIONS FOR THE MANAGEMENT, MANIPULATION AND ANALYSIS OF STATISTICAL DATA
THE STUDENT WILL BE ABLE TO:
•UNDERSTAND THE FUNDAMENTAL PRINCIPLES OF DESCRIPTIVE AND INFERENTIAL STATISTICS.
•FORMULATE STATISTICAL HYPOTHESES AND TEST THEM USING APPROPRIATE METHODS.
•APPLY METHODS AND TECHNIQUES OF INFERENTIAL STATISTICS FOR THE ANALYSIS OF COMPLEX DATA.
•APPLY THE ACQUIRED KNOWLEDGE AND SKILLS IN REAL-WORLD CONTEXTS.
•USE THE R LANGUAGE AS A TOOL FOR STATISTICAL ANALYSIS OF COMPLEX DATA.
•IMPLEMENT LINEAR AND NON-LINEAR STATISTICAL MODELS USING THE R LANGUAGE.
•ADDRESS APPLICATION PROBLEMS RELATED TO DATA PROCESSING AND ANALYSIS IN COMPUTER SCIENCE CONTEXTS.
THE STUDENT WILL BE ABLE TO:
•EVALUATE SUITABLE DATA SOURCES TO ADDRESS REAL-WORLD PROBLEMS USING INFERENTIAL STATISTICAL METHODS.
•DEVELOP THE ABILITY TO CRITICALLY ANALYZE COMPLEX PROBLEMS, IDENTIFYING KEY VARIABLES AND FORMULATING HYPOTHESES FOR SOLUTIONS.
•ASSESS THE GOODNESS OF STATISTICAL METHODOLOGIES AS WELL AS PREDICTIVE AND REGRESSION MODELS WHEN APPLIED TO COMPLEX DATA SOURCES.
•CRITICALLY EVALUATE THE VALIDITY AND RELIABILITY OF RESEARCH BASED ON STATISTICAL ANALYSIS.
THE ACQUISITION OF THESE TRANSVERSAL SKILLS, IN ADDITION TO THE SPECIFIC SKILLS IN THE FIELD OF DATA ANALYSIS AND STATISTICS, REPRESENTS A SIGNIFICANT ADDED VALUE FOR STUDENTS, ALLOWING THEM TO SUCCESSFULLY ENTER THE WORKFORCE AND MEET THE CHALLENGES POSED BY TODAY'S SOCIETY, CHARACTERIZED BY INCREASING COMPLEXITY AND RAPID CHANGE.
IN ADDITION TO THE TECHNICAL AND TRANSVERSAL SKILLS ALREADY LISTED, THE COURSE IN STATISTICS AND DATA ANALYSIS AIMS TO DEVELOP VARIOUS COMMUNICATION SKILLS IN STUDENTS. THESE SKILLS WILL BE ESSENTIAL TO ENABLE STUDENTS TO:
•COMMUNICATE CLEARLY, CONCISELY, AND EFFECTIVELY ABOUT STATISTICAL DATA, EVEN WITH NON-EXPERTS IN STATISTICS.
•EFFECTIVELY REPRESENT THE RESULTS OF STATISTICAL ANALYSIS THROUGH VISUALIZATION PARADIGMS AND STATISTICAL DATA GRAPHS.
•WRITE CLEAR, COMPLETE, AND WELL-STRUCTURED STATISTICAL ANALYSIS REPORTS.
•CONDUCT A CONSTRUCTIVE DEBATE WITH OTHER INTERLOCUTORS BASED ON SOLID STATISTICAL EVIDENCE.
•EFFECTIVELY CONSULT AND UTILIZE SCIENTIFIC AND TECHNICAL LITERATURE.
•CONTINUOUSLY UPDATE ONE'S KNOWLEDGE USING TECHNICAL AND SCIENTIFIC LITERATURE.
•APPROACH CLASSIFICATION AND PREDICTION PROBLEMS WITH A PRELIMINARY STATISTICAL UNDERSTANDING OF THE DATA.
•FORMULATE WELL-DEFINED ANALYTICAL AND RESEARCH QUESTIONS THAT ARE BASED ON SOUND THEORETICAL PREMISES.
Prerequisites
BASIC KNOWLEDGE OF PROBABILITY AND STATISTICS.
Contents
THE COURSE WILL FOCUS ON THE FOLLOWING TOPICS:
•THE R INTEGRATED ENVIRONMENT: INTRODUCTION AND HISTORICAL NOTES. (LESSONS, 2H)
•VECTORS, ARRAYS, AND MATRICES. LISTS. DATA FRAMES. FACTORS. DEFINITION OF NEW FUNCTIONS. (LESSONS, 4H)
•TABLES AND GRAPHS: SIMPLE FREQUENCY DISTRIBUTIONS. DOUBLE FREQUENCY DISTRIBUTIONS. CONDITIONAL FREQUENCY DISTRIBUTIONS. THE MAIN GRAPHICAL REPRESENTATIONS. HIGH-LEVEL, LOW-LEVEL, AND INTERACTIVE GRAPHICS FUNCTIONS. BAR CHARTS, PIE CHARTS, AND STICK CHARTS. HISTOGRAMS. BOXPLOTS. PARETO CHARTS. GRAPHICAL REPRESENTATIONS OF TABLES. GRAPHICAL REPRESENTATIONS TO COMPARE VARIABLES. SCATTERPLOTS. GRAPHS OF FUNCTIONS. (LESSONS, 6H)
•UNIVARIATE DESCRIPTIVE STATISTICS WITH R: INTRODUCTION TO DESCRIPTIVE STATISTICS. DISCRETE AND CONTINUOUS EMPIRICAL DISTRIBUTION FUNCTIONS. LOCATION AND DISPERSION INDICES. SAMPLE MEAN, SAMPLE MEDIAN, AND SAMPLE MODE. PERCENTILES AND QUARTILES. SAMPLE VARIANCE, SAMPLE STANDARD DEVIATION, AND COEFFICIENT OF VARIATION. THE SHAPE OF A FREQUENCY DISTRIBUTION. SKEWNESS AND KURTOSIS. WEIGHTED MEAN. (LESSONS, 8H)
•BIVARIATE DESCRIPTIVE STATISTICS WITH R: CORRELATION, COVARIANCE, AND CORRELATION COEFFICIENT. LINEAR AND NONLINEAR REGRESSION MODELS. RESIDUALS AND COEFFICIENT OF DETERMINATION. (LESSONS, 8H)
•TECHNIQUES OF MULTIVARIATE STATISTICAL ANALYSIS WITH R: INTRODUCTION TO CLUSTER ANALYSIS. BASICS AND DEFINITIONS. DISTANCE FUNCTIONS AND SIMILARITY MEASURES. OPTIMIZATION METHODS. HIERARCHICAL METHODS. ANALYSIS OF THE DENDROGRAM. NON-HIERARCHICAL METHODS. SUMMARY MEASURES ASSOCIATED WITH CLUSTERS. (LESSONS, 8H)
•INTRODUCTION TO STATISTICAL INFERENCE. (LESSONS, 2H)
•DISCRETE RANDOM VARIABLES WITH R: DISCRETE PROBABILITY DISTRIBUTIONS AND THEIR SIMULATION (BERNOULLI, BINOMIAL, GEOMETRIC, MODIFIED GEOMETRIC, NEGATIVE BINOMIAL, MODIFIED NEGATIVE BINOMIAL, HYPERGEOMETRIC, POISSON). SOME IMPORTANT RESULTS ON DISCRETE RANDOM VARIABLES, ANALYZED THROUGH SIMULATION IN R. (LESSONS, 5H)
•CONTINUOUS RANDOM VARIABLES WITH R: CONTINUOUS PROBABILITY DISTRIBUTIONS AND THEIR SIMULATION (UNIFORM, EXPONENTIAL, NORMAL, CHI-SQUARE, STUDENT'S T). SOME IMPORTANT RESULTS ON CONTINUOUS RANDOM VARIABLES, ANALYZED THROUGH SIMULATION IN R. (LESSONS, 5H)
•STATISTICAL INFERENCE WITH R: POINT ESTIMATION. PROPERTIES OF ESTIMATORS. METHODS FOR FINDING ESTIMATORS. METHOD OF MOMENTS AND METHOD OF MAXIMUM LIKELIHOOD. (LESSONS, 4H)
•INTERVAL ESTIMATION WITH R: CONFIDENCE INTERVALS. CONFIDENCE INTERVALS FOR THE MEAN AND THE VARIANCE OF A NORMAL POPULATION. (LESSONS, 6H)
•INTERVAL ESTIMATION FOR LARGE SAMPLES. CONFIDENCE INTERVALS FOR THE PARAMETER OF BERNOULLI, POISSON, AND EXPONENTIAL POPULATIONS. DIFFERENCES BETWEEN MEANS IN NORMAL POPULATIONS. DIFFERENCES BETWEEN MEANS IN BERNOULLI POPULATIONS. (LESSONS, 6H)
•HYPOTHESIS TESTING WITH R: TESTS CONCERNING MEANS. TESTS CONCERNING DIFFERENCES BETWEEN MEANS. TESTS CONCERNING VARIANCES. TESTS CONCERNING PROPORTIONS. (LESSONS, 4H)
•GOODNESS OF FIT. HYPOTHESIS TESTING IN LINEAR AND NONLINEAR REGRESSION MODELS. (LESSONS, 4H)
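AS AN ILLUSTRATION OF THE TOPIC "INTERVAL ESTIMATION FOR LARGE SAMPLES" LISTED ABOVE, THE FOLLOWING IS A MINIMAL SKETCH OF THE STANDARD LARGE-SAMPLE (NORMAL APPROXIMATION) CONFIDENCE INTERVAL FOR THE PARAMETER OF A BERNOULLI POPULATION. IT IS NOT TAKEN FROM THE COURSE MATERIALS: THE COURSE ITSELF WORKS IN R, WHILE THIS SKETCH USES PYTHON'S STANDARD LIBRARY, AND THE FUNCTION NAME, PARAMETER NAMES, AND SAMPLE COUNTS ARE HYPOTHETICAL.

```python
from math import sqrt
from statistics import NormalDist


def bernoulli_wald_interval(successes, n, confidence=0.95):
    """Approximate large-sample confidence interval for a Bernoulli parameter p.

    Hypothetical helper for illustration only; it relies on the normal
    approximation, so it is only reasonable when n is large.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")

    p_hat = successes / n  # sample proportion (point estimate of p)
    # Standard normal quantile leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)

    # Clip to [0, 1], since p is a probability.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Illustrative (made-up) data: 56 successes in 100 trials.
low, high = bernoulli_wald_interval(successes=56, n=100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

THE INTERVAL IS CENTERED ON THE SAMPLE PROPORTION AND NARROWS AS N GROWS, AT A RATE PROPORTIONAL TO 1/SQRT(N).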
Teaching Methods
THE TEACHING METHOD CONSISTS OF THEORETICAL LESSONS SUPPLEMENTED BY EXERCISES AND PROBLEMS, ALL RELATED TO THE METHODOLOGIES FOR THE ANALYSIS OF UNIVARIATE AND MULTIVARIATE DATA (CFU: 9, HOURS: 72). CLASS ATTENDANCE IS STRONGLY RECOMMENDED. STUDENTS ARE GUIDED TO LEARN, IN A CRITICAL AND RESPONSIBLE WAY, EVERYTHING THE TEACHER PRESENTS DURING THE LECTURES. THEY ARE ENCOURAGED TO SHARE THEIR DEVELOPMENT AND PROBLEM-SOLVING IDEAS WITH THE ENTIRE CLASS, AND TO ACQUIRE SKILLS AND EXPERTISE IN MANAGING THE COMPLEXITY OF NEW PROBLEMS CONCERNING DATA ANALYSIS.
Verification of learning
THE COURSE ASSESSMENT CONSISTS OF AN EXAM GRADED ON A THIRTY-POINT SCALE, COMPRISING A PROJECT AND AN ORAL TEST. THE PROJECT, WHICH CAN BE UNDERTAKEN INDIVIDUALLY OR IN GROUPS OF UP TO TWO PEOPLE, AIMS TO ASSESS THE APPLICATION OF ACQUIRED KNOWLEDGE. AFTER SUBMITTING THE PROJECT, STUDENTS TAKE AN INDIVIDUAL ORAL EXAMINATION WITH QUESTIONS ON THE THEORETICAL CONTENT COVERED, AIMED AT ASSESSING COMPREHENSION AND THE ABILITY TO ARTICULATE CONCEPTS. THE FINAL GRADE DEPENDS ON BOTH THE ACQUIRED KNOWLEDGE AND THE ABILITY TO APPLY METHODOLOGIES TO SOLVE REAL-WORLD PROBLEMS.
Texts
•MICHAEL J. CRAWLEY (2017) THE R BOOK, WILEY
•JANE M. HORGAN (2019) PROBABILITY WITH R. AN INTRODUCTION WITH COMPUTER SCIENCE APPLICATIONS. WILEY
•ALVIN C. RENCHER (2012) METHODS OF MULTIVARIATE ANALYSIS. WILEY SERIES IN PROBABILITY AND STATISTICS
•LECTURE NOTES OF THE TEACHER (IN ITALIAN)
More Information
ATTENDANCE AT THE COURSE IS HIGHLY RECOMMENDED. TO ASSIST STUDENTS IN INDIVIDUAL STUDY, THE INSTRUCTOR WILL PROVIDE LECTURE NOTES COVERING VARIOUS TOPICS AND ISSUES ADDRESSED. STUDENTS WHO HAVE ATTENDED REGULARLY HAVE AN ADVANTAGE IN THE ORAL DISCUSSION AS THEY HAVE BEEN GUIDED DURING THE LECTURES TO SYSTEMATICALLY AND CRITICALLY LEARN, PROCESS, AND CONNECT VARIOUS TOPICS, AS WELL AS MANAGE THE COMPLEXITY OF NEW PROBLEMS. LECTURE MATERIALS WILL BE AVAILABLE ON THE DEPARTMENTAL E-LEARNING PLATFORM AT HTTP://ELEARNING.INFORMATICA.UNISA.IT/EL-PLATFORM/.
Data source: ESSE3 [Last synchronization: 2025-01-16]