COMPUTER SCIENCE LABORATORY

Luigi TROIANO

0212800021
DEPARTMENT OF ECONOMICS AND STATISTICS
EQF6
STATISTICS FOR BIG DATA
2024/2025

YEAR OF COURSE 3
YEAR OF DIDACTIC SYSTEM 2018
SPRING SEMESTER
CFU   HOURS   ACTIVITY
5     30      LAB
Exam      Date
TROIANO   03/02/2025 - 11:00
TROIANO   03/02/2025 - 11:00
TROIANO   07/04/2025 - 11:00
TROIANO   09/06/2025 - 11:00
TROIANO   30/06/2025 - 11:00
TROIANO   18/07/2025 - 11:00
TROIANO   12/09/2025 - 11:00
Objectives
THE OBJECTIVE OF THE COURSE IS TO INTRODUCE THE STUDENT TO THE DEVELOPMENT OF ALGORITHMS FOR BIG DATA PROCESSING. PROBLEMS IN VARIOUS AREAS OF BIG DATA APPLICATIONS WILL BE INTRODUCED, AND STRATEGIES AND SOLUTION MODELS WILL BE STUDIED.

KNOWLEDGE AND COMPREHENSION
DURING THE COURSE, THE STUDENT WILL ACQUIRE BOTH THEORETICAL AND PRACTICAL KNOWLEDGE OF PROBLEMS RELATED TO BIG DATA PROCESSING IN VARIOUS APPLICATION SECTORS. THIS WILL ENABLE THEM TO ANALYZE THE COMPUTATIONAL ASPECTS AND FIND SUITABLE ALGORITHMIC SOLUTIONS. THE OBJECTIVE IS TO LEARN HOW TO MAKE THE BEST USE OF THE VARIETY OF AVAILABLE SOLUTIONS, GUIDED BY AN UNDERSTANDING OF THEIR CHARACTERISTICS, INCLUDING PRACTICAL EXAMPLES OF APPLICATIONS.

ABILITY TO APPLY KNOWLEDGE AND COMPREHENSION
THE COURSE AIMS TO DEVELOP THE STUDENT'S ABILITY TO DESIGN AND IMPLEMENT COMPUTATIONAL SOLUTIONS FOR BIG DATA, THROUGH THEORETICAL STUDY AND PRACTICAL EXERCISES ON MANAGING THE COMPLEXITY OF DIFFERENT APPROACHES AND THE SPECIFIC CHARACTERISTICS OF DIFFERENT APPLICATION DOMAINS.
Prerequisites
THE COURSE ASSUMES KNOWLEDGE OF PROGRAMMING; ALGORITHMS AND DATA STRUCTURES; DATA ANALYSIS AND VISUALIZATION; ARCHITECTURES FOR BIG DATA; AND PROBABILISTIC MODELS FOR DATA ANALYSIS.
Contents
PYTHON RECAP:

1. VARIABLES, DATA TYPES, AND OPERATORS.
2. CONTROL STRUCTURES: CONDITIONS, LOOPS, AND BREAK/CONTINUE STATEMENTS.
3. FUNCTIONS: DEFINITION, INVOCATION, AND PARAMETERS.
4. EXCEPTION HANDLING WITH TRY-EXCEPT.
5. DATA STRUCTURES: LISTS, DICTIONARIES, AND TUPLES.
6. ADVANCED TOPICS LIKE FILE HANDLING, OBJECT-ORIENTED PROGRAMMING, AND SPECIFIC MODULES.
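
AS AN ILLUSTRATION OF THE RECAP TOPICS ABOVE, THE FOLLOWING MINIMAL SKETCH COMBINES A FUNCTION, A LOOP WITH CONTINUE, TRY-EXCEPT, AND THE LIST, DICTIONARY, AND TUPLE TYPES. THE FUNCTION safe_mean AND THE SENSOR READINGS ARE HYPOTHETICAL AND ARE NOT TAKEN FROM THE COURSE MATERIAL.

```python
def safe_mean(values):
    """Return the mean of the numeric items in values, or None if there are none."""
    total = 0.0
    count = 0
    for item in values:                  # loop over a list
        try:
            total += float(item)         # may raise ValueError or TypeError
        except (ValueError, TypeError):
            continue                     # skip items that cannot be converted
        count += 1
    return total / count if count else None


# Hypothetical data: a dictionary mapping sensor names to lists of raw readings.
readings = {
    "sensor_a": [1, "2", 3.0],
    "sensor_b": ["n/a", None],
}

# Build a list of (name, mean) tuples with a comprehension.
summary = [(name, safe_mean(vals)) for name, vals in readings.items()]
print(summary)
```

RETURNING None FOR AN EMPTY RESULT, RATHER THAN RAISING AN EXCEPTION, IS ONE POSSIBLE DESIGN CHOICE; THE CALLER MUST THEN CHECK FOR IT EXPLICITLY.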

PANDAS:

1. INTRODUCTION TO PANDAS.
2. DATA LOADING AND MANIPULATION.
3. DATA CLEANING AND TRANSFORMATION.
4. DATA ANALYSIS.
5. DATA VISUALIZATION.
6. DATA EXPORTING.
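
THE LOADING, CLEANING, TRANSFORMATION, ANALYSIS, AND EXPORTING STEPS LISTED ABOVE CAN BE SKETCHED AS FOLLOWS. THE CSV CONTENT AND THE COLUMN NAMES city, temp_c, AND temp_f ARE ASSUMPTIONS MADE FOR ILLUSTRATION ONLY; VISUALIZATION IS OMITTED TO KEEP THE EXAMPLE FREE OF PLOTTING DEPENDENCIES.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice read_csv would receive a file path.
raw_csv = io.StringIO(
    "city,temp_c\n"
    "Salerno,21.5\n"
    "Salerno,\n"
    "Napoli,19.0\n"
    "Napoli,23.0\n"
)

df = pd.read_csv(raw_csv)                  # loading
df = df.dropna(subset=["temp_c"])          # cleaning: drop rows with a missing temperature
df["temp_f"] = df["temp_c"] * 9 / 5 + 32   # transformation: derived column

# Analysis: mean temperature per city.
mean_by_city = df.groupby("city")["temp_c"].mean()

# Exporting: with no path argument, to_csv returns the CSV text as a string.
csv_text = mean_by_city.to_csv()
print(csv_text)
```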

SCIKIT-LEARN:

1. INTRODUCTION TO SCIKIT-LEARN.
2. DATA PREPARATION.
3. MACHINE LEARNING MODELS.
4. MODEL EVALUATION.
5. MODEL OPTIMIZATION.
6. INTEGRATING SCIKIT-LEARN WITH OTHER TOOLS.
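
A MINIMAL SKETCH OF THE DATA PREPARATION, MODEL, EVALUATION, AND OPTIMIZATION STEPS ABOVE IS GIVEN BELOW. THE CHOICE OF THE IRIS DATASET, OF LOGISTIC REGRESSION, AND OF THE GRID OF C VALUES IS AN ASSUMPTION MADE FOR ILLUSTRATION AND DOES NOT REFLECT THE SPECIFIC MODELS COVERED IN THE COURSE.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data preparation: load a small built-in dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model: feature scaling followed by a classifier, chained in one pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Optimization: cross-validated search over the regularization strength C.
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluation: accuracy of the best estimator on the held-out test set.
accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(accuracy, 3))
```

PLACING THE SCALER INSIDE THE PIPELINE MEANS IT IS REFIT ON EACH CROSS-VALIDATION TRAINING FOLD, SO NO INFORMATION FROM THE VALIDATION FOLDS LEAKS INTO THE PREPROCESSING.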

OTHER LIBRARIES:

1. DATA SOURCES.
2. DATA VISUALIZATION.
3. STATISTICS.
Teaching Methods
THE COURSE IS ORIENTED TOWARDS THE PRACTICAL APPLICATION OF TECHNIQUES FOR CODING ALGORITHMIC SOLUTIONS.
Verification of learning
THE EXAMINATION CONSISTS OF A PROJECT, A LABORATORY TEST, AND A WRITTEN TEST.
THE PROJECT, CARRIED OUT INDIVIDUALLY OR IN A GROUP, IS A SMALL TEACHING PROJECT IN WHICH THE STUDENT APPLIES THE TECHNOLOGIES LEARNED DURING THE COURSE AND PRESENTS THE SOLUTION AT THE EXAM. THE WRITTEN TEST FOLLOWS THE LABORATORY TEST AND CONSISTS OF 5 MULTIPLE-CHOICE QUESTIONS. IT LASTS 15 MINUTES AND IS DESIGNED TO CHECK THE LEARNING OF THE TECHNICAL AND METHODOLOGICAL NOTIONS EXPLAINED DURING THE COURSE.
Texts
LECTURE NOTES
More Information
REGULAR ATTENDANCE IS REQUIRED FOR THE COURSE ACCORDING TO THE CRITERIA DEFINED BY THE DIDACTIC AREA.
BETA VERSION. Data source: ESSE3 [Last synchronization: 2025-01-31]