DATA ANALYSIS AND VISUALIZATION

Pietro CORETTO

0212800004
DEPARTMENT OF ECONOMICS AND STATISTICS
EQF6
STATISTICS FOR BIG DATA
2021/2022

COMPULSORY
YEAR OF COURSE 1
YEAR OF DIDACTIC SYSTEM 2018
SPRING SEMESTER
CFU   HOURS   ACTIVITY
10    60      LESSONS
Objectives
THE OBJECTIVE OF THE COURSE IS TO INTRODUCE THE MAIN STATISTICAL TOOLS FOR EXPLORATORY DATA ANALYSIS. THESE TOOLS INCLUDE BOTH NUMERICAL AND GRAPHICAL METHODS DESIGNED TO SUMMARIZE THE MAIN STRUCTURE OF A MODERN LARGE DATA SET PRIOR TO DATA MODELING.

KNOWLEDGE AND UNDERSTANDING
THE COURSE IS DESIGNED TO TRANSFER KNOWLEDGE OF KEY NUMERICAL AND GRAPHICAL METHODS OF EXPLORATORY DATA ANALYSIS. THE INTRODUCTION TO THE METHODOLOGY IS ALWAYS ACCOMPANIED BY A PRACTICAL GUIDE TO THE USE OF RELEVANT STATISTICAL SOFTWARE. THE ACQUISITION OF THIS KNOWLEDGE ADDRESSES THE FOLLOWING OBJECTIVES: (I) GIVEN THE SAMPLING PROCESS THAT PRODUCED THE DATA, THE STUDENT WILL BE ABLE TO DETECT THE MAIN FEATURES OF THE DATA SET WITHOUT ANY ADDITIONAL PRIOR KNOWLEDGE ABOUT THE VARIABLES INVOLVED; (II) THE STUDENT WILL BE ABLE TO DECIDE WHICH METHOD IS THE MOST APPROPRIATE FOR THE SPECIFIC CASE UNDER STUDY; (III) THE STUDENT WILL BE ABLE TO DEVELOP A CRITICAL EVALUATION OF THE IMPLICATIONS OF THE RESULTS.

APPLYING KNOWLEDGE AND UNDERSTANDING
BASED ON THE ACQUIRED KNOWLEDGE, STUDENTS WILL BE ABLE TO: (I) FULLY UNDERSTAND THE TECHNICALITIES OF STATISTICAL METHODS OF EXPLORATORY DATA ANALYSIS; (II) USE THE METHODOLOGIES TAUGHT IN DIVERSE FIELDS OF APPLICATION (ECONOMICS AND SOCIAL SCIENCES, BIOSCIENCES, INDUSTRY, NETWORK DATA, ETC.); (III) USE RELEVANT STATISTICAL SOFTWARE FOR THE ANALYSIS OF REAL-WORLD CASE STUDIES.
Prerequisites
BASIC KNOWLEDGE OF CALCULUS, MATRIX ALGEBRA AND PROGRAMMING
Contents
THE R LANGUAGE, THE RSTUDIO IDE, MARKDOWN AND R MARKDOWN. VECTORS, MATRICES, ARRAYS, LISTS, DATA FRAMES. CONTROL STRUCTURES. MAPPING AND FUNCTIONS IN R. DATA COLLECTION AND MANIPULATION. EXPLORATORY DATA ANALYSIS. DATA VISUALIZATION (COLORING, CONDITIONING AND FACETING). INTERACTIVE AND DYNAMIC GRAPHICS. CLUSTERING, PCA, MDS. REPORTING. LAB CLASSES WITH R AND CASE STUDIES.
Teaching Methods
LECTURES, LAB CLASSES AND CASE STUDIES
Verification of learning
THE FINAL EXAM CONSISTS OF A WRITTEN AND AN ORAL EXAM.
BOTH PARTS WILL BE EVALUATED ON A NUMERICAL SCALE FROM 1 TO 30. A MINIMUM OF 18/30 ON THE WRITTEN PART IS REQUIRED FOR ADMISSION TO THE ORAL PART.
THE WRITTEN PART FOCUSES ON THE ABILITY TO USE THE STATISTICAL LANGUAGE R, TO CARRY OUT A CORRECT DATA ANALYSIS AND VISUALIZATION OF A GIVEN, EVEN COMPLEX, DATASET, AND TO EFFECTIVELY COMMUNICATE THE RESULTS IN A STATISTICAL REPORT. DURING THE WRITTEN TEST THE STUDENT WILL RECEIVE AN EXAM PAPER AND WILL BE ASKED TO ANSWER 10 QUESTIONS (EACH WITH A MAXIMUM SCORE OF 3 POINTS) COVERING THE ENTIRE COURSE PROGRAM, USING A DATASET PROVIDED DURING THE EXAM. THE ORAL EXAM (LASTING ABOUT 30 MINUTES) FOCUSES ON GENERAL KNOWLEDGE OF THE TOPICS COVERED DURING THE COURSE, THE ABILITY TO PRODUCE A CORRECT STATISTICAL ANALYSIS, AND THE ABILITY TO CORRECTLY COMMUNICATE THE RESULTS.

THE FINAL MARK WILL REFLECT THE EFFECTIVENESS OF THE TOOLS EMPLOYED AND THE THOROUGHNESS AND CLARITY OF THE ANSWERS.

THE FINAL MARK, ON A SCALE FROM 1 TO 30 CUM LAUDE, WILL CONSIDER PERFORMANCE ON BOTH THE WRITTEN AND ORAL PARTS.
Texts
LECTURE NOTES BY THE INSTRUCTOR AND ONLINE REFERENCES SUGGESTED BY THE INSTRUCTOR

EXPLORATORY DATA ANALYSIS WITH R, ROGER D. PENG (AVAILABLE ONLINE AT HTTPS://BOOKDOWN.ORG/RDPENG/EXDATA/)

LABORATORIO DI STATISTICA CON R, 2ND ED., STEFANO M. IACUS, GUIDO MASAROTTO, MCGRAW-HILL
More Information
ADDITIONAL INFORMATION WILL BE AVAILABLE ON THE WEB PAGE OF THE INSTRUCTOR. ATTENDANCE, ALTHOUGH NOT COMPULSORY, IS STRONGLY ENCOURAGED.
Data source: ESSE3 [Last synchronization: 2022-11-21]