MASSIVE DATA MINING

Domenico PARENTE

0222800008
DEPARTMENT OF MANAGEMENT & INNOVATION SYSTEMS
EQF7
DATA SCIENCE E GESTIONE DELL'INNOVAZIONE
2025/2026



COMPULSORY
YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2022
AUTUMN SEMESTER
CFU    HOURS    ACTIVITY
6      42       LESSONS
3      21       LAB
Objectives
THE COURSE (63 HOURS) AIMS TO GIVE STUDENTS A FOUNDATION IN THE ANALYSIS OF DATA, INCLUDING DATA FROM HETEROGENEOUS SOURCES, AND IN ITS SCALABLE MANAGEMENT ON COMPLEX SYSTEMS. IT AIMS TO DEVELOP ANALYTICAL SKILLS FOR SOLVING COMPLEX, MULTIFACETED PROBLEMS THAT REQUIRE HYBRID DATA MANAGEMENT SOLUTIONS COMBINING DATA MINING APPROACHES, DISTRIBUTED TECHNIQUES, AND ADVANCED COMPUTATION PARADIGMS, AIMED AT DATA-DRIVEN DISCOVERY AND PREDICTION.
AT THE END OF THE TRAINING COURSE, THE STUDENT WILL HAVE ACQUIRED THEORETICAL KNOWLEDGE AND PRACTICAL SKILLS IN DATA ANALYTICS (FOR SOLVING PROBLEMS ARISING FROM THE ACQUISITION AND MANAGEMENT OF LARGE AMOUNTS OF DATA).
THE COURSE AIMS TO DEVELOP: I) SKILLS IN DATA COLLECTION AND CRITICAL ANALYSIS, THROUGH A HYBRID APPROACH THAT DEFINES AN OVERALL STRATEGY FOR TRANSFORMING DATA INTO USEFUL INFORMATION; II) THE ABILITY TO USE THE MAIN TECHNIQUES AND TOOLS FOR SOLVING SPECIFIC PROBLEMS. THE STUDENT WILL BE ENCOURAGED TO DEVELOP SKILLS IN ANALYZING, DESCRIBING, AND EXTRACTING THE CHARACTERISTICS INHERENT IN THE DATA, AND THE ABILITY TO BUILD AN ABSTRACT MODEL THAT HIGHLIGHTS THE DISTINCTIVE FEATURES REVEALED BY PROCESSING THE DATA.
AT THE END OF THE TRAINING COURSE, THE STUDENT WILL BE ABLE TO:

· CRITICALLY EVALUATE AND INDEPENDENTLY IMPLEMENT APPROPRIATE DATA SCIENCE SOLUTIONS IN DIFFERENT CONTEXTS;

· EVALUATE THE POTENTIAL AND LIMITS OF THE TECHNIQUES AND MODELS LEARNED;

· CHOOSE THE DECISION-MAKING CRITERIA, METHODOLOGIES, TECHNIQUES AND TECHNOLOGIES BEST SUITED TO SOLVING SPECIFIC PROBLEMS AND CLASSES OF PROBLEMS.
FURTHERMORE, THE STUDENT WILL BE ABLE TO PRODUCE APPROPRIATE SUMMARIES THAT EFFECTIVELY COMMUNICATE THE RESULTS OF DATA ANALYSIS (INCLUDING BIG DATA) AND HIGHLIGHT THE ESSENTIAL ASPECTS USEFUL FOR IDENTIFYING SOLUTIONS.
FINALLY, THE STUDENT WILL DEVELOP THE ABILITY TO:

· STUDY INDEPENDENTLY, EFFECTIVELY INTEGRATING THE KNOWLEDGE ACQUIRED;

· KEEP THEIR SKILLS UPDATED IN A CONSTANTLY EVOLVING SECTOR SUCH AS COMPUTER SCIENCE;

· EFFECTIVELY UNDERTAKE HIGHER LEVEL TRAINING COURSES.
Prerequisites
BASIC NOTIONS OF DATABASES AND ALGORITHMIC THINKING FOR PROBLEM SOLVING.
Contents
THE GOAL IS TO PROVIDE A SOLID AND MODERN ACADEMIC PREPARATION FOR UNDERSTANDING AND MANAGING THE VARIOUS PERSPECTIVES AND NUANCES INVOLVED IN DATA ANALYSIS.
THE COURSE IS STRUCTURED AS A SINGLE MODULE OF 63 HOURS INCLUDING:
- INTRODUCTION TO DATA SCIENCE AND ITS APPLICATIONS (3 HOURS)
- BRIEF NOTES ON DATA VISUALIZATION (3 HOURS)
- BACKGROUND ON PYTHON LIBRARIES FOR DATA MANIPULATION (3 HOURS)
- SIMILARITY AND DISSIMILARITY MEASURES (3 HOURS)
- SIMILAR ITEMS (LOCALITY-SENSITIVE HASHING) (12 HOURS)
- PREPROCESSING, DATA REDUCTION (3 HOURS)
- FREQUENT ITEMSETS (9 HOURS)
- DIMENSIONALITY REDUCTION (3 HOURS)
- CLUSTERING (9 HOURS)
- ADVANCED CLUSTERING (3 HOURS)
- CLASSIFICATION (9 HOURS)
- ADVANCED CLASSIFICATION (3 HOURS)
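AS AN ILLUSTRATION OF THE SIMILARITY MEASURES AND SIMILAR ITEMS TOPICS LISTED ABOVE, THE FOLLOWING IS A MINIMAL PYTHON SKETCH, NOT COURSE MATERIAL. IT COMPUTES THE EXACT JACCARD SIMILARITY OF TWO SHINGLE SETS AND ESTIMATES IT FROM MINHASH SIGNATURES, THE BUILDING BLOCK THAT LOCALITY-SENSITIVE HASHING RELIES ON. ALL FUNCTION NAMES, THE SHINGLE LENGTH, THE NUMBER OF HASH FUNCTIONS AND THE CHOICE OF HASH FAMILY ARE ASSUMPTIONS MADE FOR THIS EXAMPLE.

```python
import random
import zlib

# Assumption: a prime just above 2**32, used as the modulus of the hash family.
PRIME = 4294967311


def shingles(text, k=3):
    """Return the set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_params(n, seed=0):
    """Draw n (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(items, params):
    """MinHash signature of a non-empty set of strings: one minimum per hash function."""
    # crc32 maps each string to a stable 32-bit integer id.
    ids = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates the Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = shingles("mining of massive datasets")
    doc_b = shingles("mining massive data sets")
    params = make_hash_params(200)
    sig_a = minhash_signature(doc_a, params)
    sig_b = minhash_signature(doc_b, params)
    print("exact Jaccard:   ", jaccard(doc_a, doc_b))
    print("MinHash estimate:", estimated_similarity(sig_a, sig_b))
```

THE ESTIMATE APPROACHES THE EXACT VALUE AS THE NUMBER OF HASH FUNCTIONS GROWS. IN A FULL LOCALITY-SENSITIVE HASHING PIPELINE THE SIGNATURES WOULD THEN BE SPLIT INTO BANDS TO SELECT CANDIDATE PAIRS, A STEP NOT SHOWN HERE.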
Teaching Methods
THE COURSE INCLUDES CLASSROOM LECTURES (42 HOURS) AND PRACTICAL EXERCISES ON THE TOPICS COVERED (21 HOURS OF LABORATORY).
BY THE END OF THE COURSE, STUDENTS WILL BE ABLE TO:
1. ASSESS AND ARTICULATE THE RELEVANCE OF DATA FOR A PARTICULAR BUSINESS OR SOCIETAL PROBLEM.
2. COLLECT, STORE, AND RETRIEVE DATA ORIGINATING FROM MULTIPLE SOURCES.
3. PREPROCESS DIVERSE DATA INTO STANDARDIZED FORMATS.
4. UNDERTAKE EXPLORATORY DATA ANALYSIS TO GENERATE INSIGHTS FROM THE DATA.
5. VISUALIZE DATA AS CHARTS AND OTHER VISUAL REPRESENTATIONS FOR GENERATING INSIGHTS AND SUPPORTING DECISION MAKING.
ATTENDANCE AT LESSONS IS NOT MANDATORY BUT STRONGLY RECOMMENDED.

Verification of learning
THE EXAM INCLUDES A WRITTEN TEST IN THE FORM OF A PROJECT AND AN ORAL EXAMINATION. BOTH TESTS COVER ALL THE COURSE TOPICS. THE GRADE (OUT OF THIRTY) IS CALCULATED AS THE AVERAGE OF THE SCORE OBTAINED IN THE PROJECT AND THE SCORE OBTAINED IN THE ORAL EXAMINATION. WHEN THE EXAM IS PASSED, THE FINAL GRADE RANGES FROM 18/30 (LIMITED KNOWLEDGE OF THE TOPICS) TO 30/30 CUM LAUDE (THE CANDIDATE DEMONSTRATES SIGNIFICANT MASTERY OF THE CONTENTS).
Texts
1) J. LESKOVEC, A. RAJARAMAN, J.D. ULLMAN, "MINING OF MASSIVE DATASETS", 2ND ED., CAMBRIDGE UNIVERSITY PRESS.
2) J. HAN, M. KAMBER, J. PEI, "DATA MINING: CONCEPTS AND TECHNIQUES", 3RD ED., MORGAN KAUFMANN.
More Information
SLIDES AND OTHER MATERIAL PROVIDED BY THE TEACHER

BETA VERSION Data source ESSE3 [Last synchronization: 2025-09-25]