Domenico PARENTE | MASSIVE DATA MINING
cod. 0222700008
DEPARTMENT OF MANAGEMENT & INNOVATION SYSTEMS | |
EQF7 | |
DATA SCIENCE AND INNOVATION MANAGEMENT | |
2022/2023 |
COMPULSORY | |
YEAR OF COURSE 2 | |
YEAR OF DIDACTIC SYSTEM 2020 | |
AUTUMN SEMESTER |
SSD | CFU | HOURS | ACTIVITY | |
---|---|---|---|---|
INF/01 | 6 | 42 | LESSONS | |
INF/01 | 3 | 21 | LAB |
Objectives | |
---|---|
THE COURSE (63 HOURS, 9 ECTS) AIMS TO PROVIDE STUDENTS WITH KNOWLEDGE OF DATA ANALYSIS THAT ENABLES SCALABLE MANAGEMENT OF COMPLEX SYSTEMS. IT ALSO AIMS TO DEVELOP THE ANALYTICAL CAPABILITIES NEEDED TO SOLVE COMPLEX PROBLEMS WHOSE SOLUTIONS COMBINE DATA MINING ALGORITHMS, ADVANCED COMPUTATIONAL PARADIGMS, AND DISTRIBUTED SYSTEMS FOR DATA MANAGEMENT, TARGETED AT DATA-DRIVEN DISCOVERY AND PREDICTIVE ANALYSIS. BY THE END OF THE COURSE, THE STUDENT WILL HAVE ACQUIRED THEORETICAL KNOWLEDGE AND PRACTICAL SKILLS IN DATA ANALYSIS AND ANALYTICS (FOR SOLVING PROBLEMS RELATED TO THE ACQUISITION AND MANAGEMENT OF BIG DATA) AND THE ABILITY TO USE THE MAIN TECHNIQUES AND TOOLS TO SOLVE SPECIFIC PROBLEMS. THE STUDENT WILL BE ENCOURAGED TO DEVELOP ANALYTICAL SKILLS FOR EXTRACTING INTRINSIC DATA FEATURES AND THE ABILITY TO BUILD ABSTRACTIONS THAT CAPTURE THE NATURE OF THE PROCESSED DATA. THE COURSE ALSO FOSTERS SKILLS IN DATA COLLECTION AND DATA ANALYSIS THROUGH HYBRID APPROACHES THAT COMBINE COMPLEX STRATEGIES TO EXTRACT USEFUL INFORMATION FROM RAW DATA. |
Prerequisites | |
---|---|
BASIC NOTIONS OF DATABASES AND ALGORITHMIC THINKING FOR PROBLEM SOLVING. |
Contents | |
---|---|
THE GOAL IS TO PROVIDE A SOLID AND MODERN ACADEMIC PREPARATION FOR UNDERSTANDING AND MANAGING THE VARIOUS PERSPECTIVES AND NUANCES INVOLVED IN DATA ANALYSIS. THE COURSE IS STRUCTURED AS A SINGLE MODULE OF 63 HOURS, COVERING: - INTRODUCTION TO DATA SCIENCE AND ITS APPLICATIONS (3 HOURS) - BRIEF NOTES ON DATA VISUALIZATION (3 HOURS) - BACKGROUND ON PYTHON LIBRARIES FOR DATA MANIPULATION (3 HOURS) - SIMILARITY AND DISSIMILARITY MEASURES (3 HOURS) - SIMILAR ITEMS (LOCALITY-SENSITIVE HASHING) (12 HOURS) - PREPROCESSING AND DATA REDUCTION (3 HOURS) - FREQUENT ITEMSETS (9 HOURS) - DIMENSIONALITY REDUCTION (3 HOURS) - CLUSTERING (9 HOURS) - ADVANCED CLUSTERING (3 HOURS) - CLASSIFICATION (9 HOURS) - ADVANCED CLASSIFICATION (3 HOURS) |
Teaching Methods | |
---|---|
THE COURSE INCLUDES CLASSROOM LECTURES (42 HOURS) AND PRACTICAL EXERCISES ON THE TOPICS COVERED (21 HOURS). BY THE END OF THE COURSE, STUDENTS WILL BE ABLE TO: 1. ASSESS AND ARTICULATE THE RELEVANCE OF DATA FOR A PARTICULAR BUSINESS OR SOCIETAL PROBLEM. 2. COLLECT, STORE, AND RETRIEVE DATA ORIGINATING FROM MULTIPLE SOURCES. 3. PREPROCESS DIVERSE DATA INTO STANDARDIZED FORMATS. 4. UNDERTAKE EXPLORATORY DATA ANALYSIS TO GENERATE INSIGHTS FROM THE DATA. 5. VISUALIZE DATA AS CHARTS AND OTHER VISUAL REPRESENTATIONS TO GENERATE INSIGHTS AND SUPPORT DECISION MAKING. |
Verification of learning | |
---|---|
THE EXAM INCLUDES A PROJECT AND AN ORAL EXAMINATION. |
Texts | |
---|---|
1) J. LESKOVEC, A. RAJARAMAN, J.D. ULLMAN, "MINING OF MASSIVE DATASETS", 2ND ED., CAMBRIDGE UNIVERSITY PRESS. 2) J. PEI, M. KAMBER, J. HAN, "DATA MINING: CONCEPTS AND TECHNIQUES", 3RD ED., MORGAN KAUFMANN. |
More Information | |
---|---|
SLIDES AND OTHER MATERIAL PROVIDED BY THE TEACHER |
BETA VERSION Data source ESSE3 [Last synchronization: 2024-08-21]