PROGRAMMING FOR DATA SCIENCE

Francesco ORCIUOLI

0222800027
DEPARTMENT OF MANAGEMENT & INNOVATION SYSTEMS
EQF7
DATA SCIENCE E GESTIONE DELL'INNOVAZIONE
2024/2025



COMPULSORY
YEAR OF COURSE 1
YEAR OF DIDACTIC SYSTEM 2022
AUTUMN SEMESTER
CFU    HOURS    ACTIVITY
6      42       LESSONS
Exam        Date
ORCIUOLI    22/01/2025 - 14:30
ORCIUOLI    05/02/2025 - 14:30
Objectives
THE COURSE FOCUSES ON THE STUDY OF PROGRAMMING TECHNIQUES USEFUL FOR THE DATA ANALYST/SCIENTIST TO DESIGN, IMPLEMENT, TEST AND OPTIMIZE SIMPLE AND COMPLEX DATA PIPELINES. IN PARTICULAR, THE STUDENT WILL ACQUIRE KNOWLEDGE RELATED TO THE PARADIGMS OF FUNCTIONAL PROGRAMMING AND OBJECT-ORIENTED PROGRAMMING, TOOLS AND TECHNIQUES FOR DATA MANIPULATION WITH PARTICULAR REFERENCE TO ARRAY-ORIENTED PROGRAMMING AND TABULAR ORGANIZATION OF DATA AND, LASTLY, TO APPROACHES AND TOOLS FOR DATA ENGINEERING. AT THE END OF THE COURSE, THE STUDENT WILL BE ABLE TO APPLY THE ACQUIRED KNOWLEDGE TO DESIGN AND DEVELOP SIMPLE AND COMPLEX DATA PIPELINES USING THE PYTHON PROGRAMMING LANGUAGE AND IDE AND NOTEBOOK-BASED PROGRAMMING ENVIRONMENTS.
At the end of the course, the student will be able to:
- critically evaluate and independently implement appropriate Data Science solutions in different contexts;
- choose decision criteria, methodologies, techniques and technologies best suited to solving specific problems and classes of problems.
In addition, the student will be able to produce appropriate summaries that effectively communicate the results of data analysis (including Big Data) and highlight the aspects that are essential for identifying solutions.
Lastly, the student will develop the skills to:
- study independently, effectively integrating the knowledge acquired;
- maintain up-to-date skills in an ever-changing field such as computer science;
- effectively undertake higher-level educational paths.
Prerequisites
TO FOLLOW THE COURSE CONTENT ADEQUATELY, STUDENTS SHOULD KNOW AT LEAST ONE COMPUTER PROGRAMMING LANGUAGE (PREFERABLY PYTHON).
Contents
WITH RESPECT TO THE CONTENT, THE COURSE IS DIVIDED INTO FOUR PARTS:

I – PART 1 (8 HOURS)
- BUILT-IN DATA STRUCTURES IN PYTHON, COMPREHENSIONS, GENERATORS (LAZY EVALUATION) (3 HOURS OF LECTURES)
- FUNCTIONAL PROGRAMMING AND MAP-REDUCE IN PYTHON (3 HOURS OF LECTURES)
- TIME AND MEMORY PROFILING (1 HOUR OF LECTURES)
- TYPE HINTS (1 HOUR OF LECTURES)
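
As a minimal sketch of the Part 1 topics (not taken from the course material), the following snippet combines a comprehension, a lazy generator, a map-reduce style computation and type hints. The function name `squares` and the sample values are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import Iterable, Iterator


def squares(values: Iterable[int]) -> Iterator[int]:
    """Lazily yield the square of each value (nothing runs until iteration)."""
    for v in values:
        yield v * v


# List comprehension: eager, the whole list is built in memory at once.
evens: list[int] = [n for n in range(10) if n % 2 == 0]

# Generator: lazy, values are produced one at a time on demand.
lazy_squares = squares(evens)

# Map-reduce style: map each square to square + 1, then fold into a sum.
total: int = reduce(lambda acc, x: acc + x, map(lambda x: x + 1, lazy_squares), 0)

print(evens)  # [0, 2, 4, 6, 8]
print(total)  # 125
```

Because `lazy_squares` is a generator, it is consumed by the `reduce` call and cannot be iterated a second time.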

II – PART 2 (14 HOURS)
- DATA MANIPULATION BASICS (2 HOURS OF LECTURES)
- DATA SOURCES (2 HOURS OF LECTURES)
- ARRAY-ORIENTED PROGRAMMING: NUMPY (4 HOURS OF LECTURES)
- TABULAR DATA MANIPULATION (6 HOURS OF LECTURES)
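
To illustrate what array-oriented programming with NumPy looks like in practice, here is a short sketch under stated assumptions: the temperature values are invented for the example and are not part of the course material.

```python
import numpy as np

# Hypothetical measurements in degrees Celsius, invented for illustration.
temperatures_c = np.array([21.5, 19.0, 23.2, 18.4])

# Vectorized arithmetic: the expression is applied element-wise, no Python loop.
temperatures_f = temperatures_c * 9 / 5 + 32

# Boolean mask: keep only the elements that satisfy a condition.
warm = temperatures_c[temperatures_c > 20.0]

# Broadcasting: the scalar mean is subtracted from every element.
centered = temperatures_c - temperatures_c.mean()

print(warm)  # [21.5 23.2]
```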

III – PART 3 (14 HOURS)
- DATA LOADING (2 HOURS OF LECTURES)
- DATA CLEANING (2 HOURS OF LECTURES)
- DATA PREPARATION (2 HOURS OF LECTURES)
- DATA WRANGLING (4 HOURS OF LECTURES)
- DATA AGGREGATION (2 HOURS OF LECTURES)
- TIME SERIES (2 HOURS OF LECTURES)
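
A rough sketch of how some of the Part 3 steps (preparation, cleaning, aggregation) might look with pandas is given below. The column names and records are hypothetical, invented purely for illustration, and the snippet covers only a small subset of the listed topics.

```python
import pandas as pd

# Hypothetical records, invented for illustration; None marks a missing value.
df = pd.DataFrame(
    {
        "city": ["Salerno", "Salerno", "Roma", "Roma"],
        "date": ["2025-01-01", "2025-01-02", "2025-01-01", "2025-01-02"],
        "temperature_c": [13.0, None, 10.0, 14.0],
    }
)

# Preparation: parse the text dates into datetime values.
df["date"] = pd.to_datetime(df["date"])

# Cleaning: drop the rows whose temperature is missing.
clean = df.dropna(subset=["temperature_c"])

# Aggregation: mean temperature per city.
mean_by_city = clean.groupby("city")["temperature_c"].mean()

print(mean_by_city)
```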

IV – PART 4 (6 HOURS)
- MACHINE LEARNING TOOLS: INTRODUCTION (2 HOURS OF LECTURES)
- SEMANTIC DATA (2 HOURS OF LECTURES)
- DATA ENGINEERING NOTIONS: DATA PIPELINES (2 HOURS OF LECTURES)
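
A simple data pipeline of the Extract-Transform-Load kind can be sketched with the standard library alone. This is an illustrative assumption about what such a pipeline might look like, not the course's reference implementation: the CSV content, the column names and the function names `extract`, `transform` and `load` are all hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical in-memory CSV standing in for a real data source.
RAW_CSV = """city,temperature_c
Salerno,21.5
Napoli,
Roma,19.0
"""


def extract(source: str) -> Iterator[dict[str, str]]:
    """Extract: read rows lazily from CSV text."""
    yield from csv.DictReader(io.StringIO(source))


def transform(rows: Iterable[dict[str, str]]) -> Iterator[dict[str, object]]:
    """Transform: skip rows with a missing temperature and convert types."""
    for row in rows:
        if row["temperature_c"]:
            yield {"city": row["city"], "temperature_c": float(row["temperature_c"])}


def load(rows: Iterable[dict[str, object]]) -> list[dict[str, object]]:
    """Load: materialize the rows in a list (a stand-in for a database or file)."""
    return list(rows)


result = load(transform(extract(RAW_CSV)))
print(result)
```

Each stage is a generator, so rows flow through the pipeline one at a time until `load` materializes them.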
Teaching Methods
THE TEACHING ACTIVITIES (42 HOURS OF LECTURES) WILL BE DIVIDED INTO FOUR PARTS AS INDICATED IN THE "CONTENTS" SECTION. AT THE END OF EACH PART, STUDENTS WILL CARRY OUT EXERCISES AIMED AT CONSOLIDATING THE KNOWLEDGE ACQUIRED AND DEVELOPING THE ABILITY TO APPLY IT. THE EXERCISES WILL THEN BE DISCUSSED IN THE CLASSROOM. ATTENDANCE AT LESSONS IS NOT MANDATORY BUT IS STRONGLY RECOMMENDED.
Verification of learning
ASSESSMENT WILL BE BY MEANS OF TWO TESTS: AN INDIVIDUAL PROJECT WORK AND AN ORAL TEST. IN THE PROJECT WORK, THE STUDENT WILL HAVE TO SOLVE A DATA PROCESSING PROBLEM (THE TOPIC AND OBJECTIVES WILL BE AGREED WITH THE LECTURER) USING THE KNOWLEDGE ACQUIRED AND SKILLS DEVELOPED DURING THE COURSE.
THE ORAL TEST WILL LAST ABOUT 30 MINUTES FOR EACH STUDENT AND WILL CONSIST OF TWO PHASES: 1) DISCUSSION OF THE PROGRESS AND RESULTS OF THE PROJECT WORK, AND 2) IN-DEPTH DISCUSSION OF SOME OF THE TOPICS COVERED IN CLASS (METHODOLOGIES, TECHNIQUES AND TOOLS FOR DEFINING DATA PIPELINES IN PYTHON). EACH PHASE WILL BE GRADED SEPARATELY ON A 30-POINT SCALE. THE FINAL GRADE WILL BE CALCULATED AS THE AVERAGE OF THE RESULTS OBTAINED IN THE TWO PHASES. THE MINIMUM SCORE TO PASS THE EXAM IS 18/30 (LIMITED KNOWLEDGE OF THE TOPICS). THE MAXIMUM SCORE IS 30/30 WITH HONORS (THE CANDIDATE DEMONSTRATES SIGNIFICANT MASTERY OF THE CONTENT). IN ADDITION, A MARK GREATER THAN OR EQUAL TO 18/30 IN BOTH PHASES IS REQUIRED TO PASS THE EXAM.

REGARDING THE FIRST PHASE, THE MINIMUM SCORE OF 18/30 IS OBTAINED BY DEVELOPING A SIMPLE DATA PIPELINE (E.G., OF THE EXTRACT-TRANSFORM-LOAD TYPE) WITH NO PARTICULAR COMPLEXITIES ARISING FROM TRANSFORMATION OR DATA MANAGEMENT, AND BY DISCUSSING THE PROJECT IN A WAY THAT DEMONSTRATES SUFFICIENT MASTERY OF THE DOMAIN AND OF THE TECHNIQUES USED. THE MAXIMUM SCORE OF 30/30 IS OBTAINED BY DEVELOPING A DATA PIPELINE WITH AN ORIGINAL DATA ANALYSIS PHASE, AND BY A DISCUSSION THAT ALSO SHOWS CONSIDERABLE MASTERY OF THE APPLICATION DOMAIN AS WELL AS OF THE DATA PROCESSING TECHNIQUES USED.

REGARDING THE SECOND PHASE, THE MINIMUM SCORE OF 18/30 IS OBTAINED BY DEMONSTRATING KNOWLEDGE OF THE MAIN ACTIVITIES OF A DATA ANALYSIS WORKFLOW AND OF THE BASIC FUNCTIONALITY OF THE MAIN LIBRARIES COVERED IN CLASS (PANDAS, NUMPY, MATPLOTLIB). THE MAXIMUM SCORE OF 30/30 IS AWARDED FOR DEMONSTRATING THE ABILITY TO SELECT TECHNIQUES AND TOOLS APPROPRIATE TO A SPECIFIC DATA ANALYSIS APPLICATION SCENARIO AND KNOWLEDGE OF THE ADVANCED FUNCTIONALITY OF THE MAIN DATA ANALYSIS LIBRARIES COVERED IN CLASS. HONORS ARE AWARDED WHEN THE CANDIDATE SCORES 30/30 IN BOTH PHASES AND DEMONSTRATES THE ABILITY TO PROPOSE EFFICIENT AS WELL AS EFFECTIVE SOLUTIONS FOR IMPLEMENTING DATA PIPELINES IN SPECIFIC CONTEXTS.
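
The grading rule described in this section can be restated as a small sketch. The function and constant names are hypothetical, and two points are assumptions rather than stated policy: the syllabus does not say how the average is rounded, so the raw average is returned, and the honors check covers only the numeric condition, since the qualitative judgment is left to the examiner.

```python
from typing import Optional

PASS_MARK = 18  # minimum mark required in each phase
MAX_MARK = 30   # maximum mark in each phase


def final_grade(project_phase: int, topics_phase: int) -> Optional[float]:
    """Return the average of the two phase marks, or None if the exam is failed.

    Assumption: the average is returned unrounded because the syllabus
    does not specify a rounding rule.
    """
    for mark in (project_phase, topics_phase):
        if not 0 <= mark <= MAX_MARK:
            raise ValueError(f"mark out of range: {mark}")
    # Both phases must reach at least 18/30 to pass.
    if project_phase < PASS_MARK or topics_phase < PASS_MARK:
        return None
    return (project_phase + topics_phase) / 2


def meets_numeric_honors_condition(project_phase: int, topics_phase: int) -> bool:
    """Necessary (not sufficient) condition for honors: 30/30 in both phases."""
    return project_phase == MAX_MARK and topics_phase == MAX_MARK


print(final_grade(18, 30))  # 24.0
print(final_grade(17, 30))  # None
```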
Texts
MAIN TEXTBOOKS:

- WES MCKINNEY – PYTHON FOR DATA ANALYSIS – O’REILLY (THIRD EDITION)
More Information
SUPPLEMENTARY TEACHING MATERIAL WILL BE MADE AVAILABLE TO STUDENTS BEFORE THE RELEVANT LESSONS THROUGH THE SHARED WORKSPACE ASSOCIATED WITH THE CLASS (THE LECTURER WILL PROVIDE INFORMATION ON THIS AT THE BEGINNING OF THE COURSE).
  BETA VERSION Data source ESSE3 [Last synchronization: 2025-01-16]