INFORMATION SYSTEMS FOR BIG DATA

Giuseppe FENZA

0222800009
DEPARTMENT OF MANAGEMENT & INNOVATION SYSTEMS
EQF7
DATA SCIENCE E GESTIONE DELL'INNOVAZIONE
2024/2025

COMPULSORY
YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2022
AUTUMN SEMESTER
CFU   HOURS   ACTIVITY
3     21      LESSONS
3     21      LAB
Exam    Date
FENZA   09/12/2024 - 14:30
Objectives
THE COURSE AIMS TO INTRODUCE FUNDAMENTAL CONCEPTS, REQUIREMENTS, TECHNOLOGIES, AND REFERENCE ARCHITECTURES FOR DEFINING AND IMPLEMENTING BIG DATA-ORIENTED INFORMATION SYSTEMS.
SKILLS WILL BE LEARNED BY STUDYING EXISTING TECHNOLOGICAL FRAMEWORKS FOR ACQUISITION; STORAGE THROUGH NOSQL DATABASES (SOLR, MONGODB, NEO4J, ETC.) AND BIG DATA FILE FORMATS (AVRO, PARQUET, ETC.); AND DISTRIBUTED PROCESSING, IN BOTH BATCH AND STREAM MODE (HADOOP, SPARK, ETC.), WITH THE AIM OF COMPUTING ANALYTICS FROM UNSTRUCTURED OR SEMI-STRUCTURED RESOURCES IN A SCALABLE MANNER.
AN INTRODUCTION TO WEB APPLICATIONS FOR ANALYTICS VISUALIZATION WILL ALSO BE PROVIDED, INCLUDING D3.JS AND TECHNOLOGY STACKS SUCH AS APACHE SOLR+BANANA AND ELASTICSEARCH+KIBANA.
AT THE END OF THE COURSE, THE STUDENT WILL BE ABLE TO USE THE MAIN TECHNOLOGICAL TOOLS FOR ACQUIRING, STORING, PROCESSING, AND ANALYZING BIG DATA. FURTHERMORE, THE STUDENT WILL BE ENCOURAGED TO CARRY OUT GROUP WORK AND APPLY THE ACQUIRED KNOWLEDGE TO IMPLEMENT A PROJECT EXHIBITING BIG DATA ANALYTICS FUNCTIONALITIES IN A CHOSEN FIELD (E.G., SOCIAL MEDIA, WEB INTELLIGENCE, SMART ENVIRONMENT, ETC.). THE OBJECTIVE IS TO EXERCISE THE ABILITY TO SELECT AND ADOPT SUITABLE TECHNOLOGIES FOR THE HETEROGENEOUS REQUIREMENTS ARISING FROM THE PROJECT CONTEXT.
Prerequisites
IT IS DESIRABLE THAT STUDENTS KNOW: THE BASIC CONCEPTS OF ALGORITHMS AND DATA STRUCTURES; AT LEAST ONE PROGRAMMING LANGUAGE AMONG JAVA, PYTHON, AND SCALA, WELL ENOUGH TO WRITE SIMPLE PROGRAMS; THE BASICS OF DATABASES AND SQL.
Contents
AFTER A BRIEF INTRODUCTION TO THE MAIN LEARNING OBJECTIVES OF THE COURSE, STUDENTS WILL BE INTRODUCED TO THE BIG DATA WORLD.
IN THE EARLY PART OF THE COURSE, STUDENTS WILL BE ENCOURAGED TO WORK IN TEAMS TO DEFINE A PROJECT IN WHICH THEY APPLY, STEP BY STEP, THE KNOWLEDGE ACQUIRED DURING THE CLASSES.
THE COURSE WILL BE COMPOSED OF THE FOLLOWING MAIN PARTS.

(4 HOURS) INTRODUCTION TO BIG DATA-ENABLED ARCHITECTURES
BIG DATA LANDSCAPE
REQUIREMENTS OF BIG DATA INFORMATION SYSTEMS
LAMBDA AND KAPPA ARCHITECTURES
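
AS AN ILLUSTRATION OF THE LAMBDA ARCHITECTURE LISTED ABOVE, THE FOLLOWING IS A MINIMAL SKETCH, NOT COURSE MATERIAL. IT ASSUMES A PAGE-VIEW COUNTING USE CASE; THE FUNCTION NAMES (batch_view, speed_view, query) AND THE EVENT LAYOUT ARE HYPOTHETICAL AND CHOSEN ONLY FOR THIS EXAMPLE.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts from the full, immutable master dataset."""
    return Counter(event["page"] for event in master_dataset)


def speed_view(recent_events):
    """Speed layer: count only events not yet absorbed by the batch layer."""
    return Counter(event["page"] for event in recent_events)


def query(page, batch, speed):
    """Serving layer: merge the batch view and the real-time view at query time."""
    return batch[page] + speed[page]


# Hypothetical sample data: three historical events and one recent event.
master = [{"page": "home"}, {"page": "home"}, {"page": "about"}]
recent = [{"page": "home"}]

batch = batch_view(master)
speed = speed_view(recent)
print(query("home", batch, speed))  # 3
```

A KAPPA-STYLE DESIGN WOULD DROP THE SEPARATE BATCH LAYER AND OBTAIN THE SAME VIEW BY REPLAYING THE FULL EVENT LOG THROUGH A SINGLE STREAM-PROCESSING PATH.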


(4 HOURS, ONE OF WHICH IS A LABORATORY ACTIVITY) ACQUISITION
SERIALIZATION AND DATA EXCHANGE FORMATS: JSON, AVRO, PARQUET, ETC.
REST AND STREAMING APIS FOR ACCESSING TWITTER, DROPBOX, ETC.
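
THE DIFFERENCE BETWEEN ROW-ORIENTED FORMATS (JSON LINES, AVRO) AND COLUMN-ORIENTED FORMATS (PARQUET) CAN BE SKETCHED WITH THE PYTHON STANDARD LIBRARY ALONE. THIS IS ONLY AN ILLUSTRATION OF THE TWO LAYOUTS: IT DOES NOT PRODUCE REAL AVRO OR PARQUET FILES, AND THE SAMPLE RECORDS AND HELPER NAMES ARE HYPOTHETICAL.

```python
import json

# Hypothetical sample records, one dictionary per row.
records = [
    {"user": "ann", "clicks": 3},
    {"user": "bob", "clicks": 5},
]


def to_json_lines(rows):
    """Row-oriented layout: one self-contained JSON document per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


def to_columns(rows):
    """Column-oriented layout: all values of a field are stored together."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def from_columns(columns):
    """Rebuild the rows from the columnar layout."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


print(to_json_lines(records))
print(to_columns(records))
```

THE COLUMNAR LAYOUT LETS AN ANALYTIC QUERY READ ONLY THE FIELDS IT NEEDS (FOR EXAMPLE, SUMMING clicks WITHOUT TOUCHING user), WHICH IS THE MAIN REASON COLUMN-ORIENTED FORMATS ARE COMMON IN ANALYTICS WORKLOADS.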

(10 HOURS, SEVEN OF WHICH ARE LABORATORY ACTIVITIES) DISTRIBUTED PROCESSING
HADOOP AND RELATED TECHNOLOGIES.
SPARK, AND OTHER BIG DATA PROCESSING ENGINES.
HANDS ON SPARK DATAFRAME
HANDS ON SPARK MACHINE LEARNING
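
THE MAPREDUCE MODEL UNDERLYING HADOOP CAN BE SKETCHED ON A SINGLE MACHINE IN PLAIN PYTHON. THE EXAMPLE BELOW IS A SIMPLIFIED WORD COUNT, NOT HADOOP OR SPARK CODE; THE FUNCTION NAMES ARE HYPOTHETICAL, AND IN A REAL CLUSTER THE MAP AND REDUCE STEPS WOULD RUN IN PARALLEL ON DIFFERENT NODES.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big systems"]))  # {'big': 2, 'data': 1, 'systems': 1}
```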

(10 HOURS, SEVEN OF WHICH ARE LABORATORY ACTIVITIES) STORAGE
INTRODUCTION TO NOSQL DATABASES, SUCH AS KEY-VALUE STORES, DOCUMENT-ORIENTED DATABASES, COLUMN-ORIENTED DATABASES, AND GRAPH DATABASES.
HANDS ON MONGODB
HANDS ON NEO4J
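
THE NOSQL DATA MODELS NAMED ABOVE CAN BE CONTRASTED WITH PLAIN PYTHON STRUCTURES. THIS SKETCH DOES NOT USE MONGODB OR NEO4J; THE SAMPLE DATA AND THE HELPERS find_by_tag AND neighbours ARE HYPOTHETICAL, AND THE COLUMN-ORIENTED MODEL IS OMITTED FOR BREVITY.

```python
import json

# Key-value model: the store sees each value as an opaque blob addressed by key.
key_value_store = {"user:1": json.dumps({"name": "Ann"})}

# Document model: nested documents whose fields can be queried directly.
documents = [
    {"_id": 1, "name": "Ann", "tags": ["spark", "kafka"]},
    {"_id": 2, "name": "Bob", "tags": ["neo4j"]},
]

# Graph model: relationships are first-class (source, label, target) edges.
edges = [(1, "FOLLOWS", 2), (2, "FOLLOWS", 1), (1, "FOLLOWS", 3)]


def find_by_tag(docs, tag):
    """Document-style query: filter on a field inside each document."""
    return [doc["_id"] for doc in docs if tag in doc["tags"]]


def neighbours(edge_list, source, label):
    """Graph-style query: follow outgoing edges with a given label."""
    return [dst for src, lbl, dst in edge_list if src == source and lbl == label]


print(json.loads(key_value_store["user:1"])["name"])
print(find_by_tag(documents, "spark"))
print(neighbours(edges, 1, "FOLLOWS"))
```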

(10 HOURS, FOUR OF WHICH ARE LABORATORY ACTIVITIES) DISTRIBUTED STREAM PROCESSING
INTRODUCTION TO DISTRIBUTED DATA STREAM PROCESSING.
APACHE STORM, SPARK STREAMING, KAFKA STREAMS
HANDS ON SPARK STREAMING
HANDS ON KAFKA STREAMS
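
A CORE IDEA SHARED BY THE STREAM PROCESSING ENGINES ABOVE IS WINDOWED AGGREGATION. THE FOLLOWING IS A MINIMAL SINGLE-PROCESS SKETCH OF A TUMBLING (FIXED-SIZE, NON-OVERLAPPING) WINDOW COUNT; IT IS NOT SPARK STREAMING OR KAFKA STREAMS CODE, AND THE FUNCTION NAME AND THE (TIMESTAMP, KEY) EVENT LAYOUT ARE ASSUMPTIONS MADE FOR THE EXAMPLE.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key inside fixed, non-overlapping time windows.

    events: iterable of (timestamp_in_seconds, key) pairs.
    Returns a dict mapping (window_start, key) to the number of events.
    """
    counts = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical event stream: (timestamp, key).
events = [(0, "a"), (4, "a"), (9, "b"), (10, "a"), (19, "b")]
print(tumbling_window_counts(events, 10))
```

A REAL ENGINE ADDITIONALLY HAS TO HANDLE UNBOUNDED INPUT, LATE OR OUT-OF-ORDER EVENTS, AND STATE DISTRIBUTED ACROSS NODES, WHICH THIS SKETCH DELIBERATELY IGNORES.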

(4 HOURS, TWO OF WHICH ARE LABORATORY ACTIVITIES) BIG DATA ANALYTICS
INTRODUCTION TO ANALYTICS VISUALIZATION THROUGH A WEB APPLICATION, CONSIDERING D3.JS AND THE MOST WIDELY USED TECHNOLOGY STACKS: APACHE SOLR WITH BANANA, AND ELASTICSEARCH WITH KIBANA
HANDS ON APACHE SOLR AND BANANA
Teaching Methods
THE COURSE AIMS TO ENCOURAGE STUDENTS TOWARD LIFELONG LEARNING, THAT IS, THE CONTINUOUS UPDATING OF KNOWLEDGE AND SKILLS THROUGHOUT LIFE, BY STIMULATING CURIOSITY AND INTEREST IN INFORMATION TECHNOLOGY AND IN NEW TECHNOLOGIES RELATED TO THE SUBJECT MATTER OF THE COURSE.
TO ACCUSTOM THEM TO SELF-LEARNING, STUDENTS WILL BE INVITED TO EXPLORE THE COURSE TOPICS IN GREATER DEPTH AND WILL BE GIVEN ACCESS TO ONLINE RESOURCES OF PARTICULAR INTEREST.
DURING THE COURSE, THE TEACHER WILL MAKE AMPLE USE OF EXAMPLES AND GUIDED EXERCISES.
FROM A STRUCTURAL POINT OF VIEW, THE COURSE WILL CONSIST OF:
(21 HOURS) LECTURES.
(21 HOURS) LABORATORY ACTIVITIES.

ATTENDANCE AT LESSONS IS NOT MANDATORY BUT IS STRONGLY RECOMMENDED.
Verification of learning
THE ACHIEVEMENT OF THE COURSE OBJECTIVES IS CERTIFIED THROUGH PASSING AN EXAM GRADED IN THIRTIETHS, DIVIDED INTO TWO PARTS: A THEORETICAL TEST AND A PRACTICAL PROJECT, EACH WITH A MINIMUM PASSING THRESHOLD.

THEORETICAL TEST: THIS CONSISTS OF A 40-MINUTE ORAL PRESENTATION WHERE THE STUDENT DISCUSSES A TOPIC OF TECHNOLOGICAL, METHODOLOGICAL, AND/OR APPLICATIVE INTEREST FROM THE COURSE. THE PRESENTATION SHOULD BE SUPPORTED BY INDIVIDUAL RESEARCH THAT LINKS TO THE THEMES COVERED IN THE LESSONS. THE EVALUATION OF THE ORAL TEST CONSIDERS THE STUDENT’S PRESENTATION SKILLS, KNOWLEDGE OF COURSE TOPICS, AND CRITICAL ANALYSIS.

PRACTICAL PROJECT: THE PROJECT CAN BE CARRIED OUT INDIVIDUALLY OR IN GROUPS AND MUST BE SUBMITTED BEFORE THE ORAL EXAM. IT SHOULD INCLUDE ALL PHASES OF A TYPICAL DATA ANALYSIS PIPELINE: ACQUISITION, STORAGE, PROCESSING, AND VISUALIZATION. DOCUMENTATION OF GROUP PROJECTS MUST SPECIFY EACH MEMBER’S CONTRIBUTION. THE EVALUATION IS BASED ON THE TECHNICAL VALIDITY OF THE CHOICES MADE, THE RELEVANCE OF THE ADOPTED DATA ANALYSIS FLOW, AND THE CLARITY OF THE PRESENTED ANALYSES.

THE FINAL GRADE IS THE AVERAGE OF THE TWO PARTS, EXPRESSED IN THIRTIETHS, WITH THE POSSIBILITY OF HONORS. HONORS ARE AWARDED IF THE CANDIDATE DEMONSTRATES MASTERY OF THEORETICAL AND OPERATIONAL CONTENT, WITH THE ABILITY TO PRESENT AND ELABORATE AUTONOMOUSLY EVEN IN CONTEXTS DIFFERENT FROM THOSE PROPOSED BY THE TEACHER. THE MINIMUM SCORE TO PASS THE EXAM IS 18/30, ASSIGNED IN CASES OF SIGNIFICANT UNCERTAINTIES IN THE USE OF TERMINOLOGY AND CONCEPTS STUDIED, AND INAPPROPRIATE USE OF TOOLS IN THE PROJECT. THE MAXIMUM SCORE, 30/30, IS AWARDED WHEN THE STUDENT SHOWS COMPLETE KNOWLEDGE AND OPERATIONAL CAPABILITY IN THE CARRIED-OUT PROJECT.
Texts
MARZ, N., & WARREN, J. (2015). BIG DATA: PRINCIPLES AND BEST PRACTICES OF SCALABLE REAL-TIME DATA SYSTEMS. NEW YORK: MANNING PUBLICATIONS CO.

SUGGESTED READINGS:

BAHGA, A., & MADISETTI, V. (2016). BIG DATA SCIENCE & ANALYTICS: A HANDS-ON APPROACH. VPT.

More Information
LINKS TO ADDITIONAL MATERIAL AND TEACHING MATERIALS WILL BE PROVIDED.

  BETA VERSION Data source ESSE3 [Last synchronization: 2024-11-18]