Giuseppe FENZA | INFORMATION SYSTEMS FOR BIG DATA
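Course information | |
---|---|
Course | Information Systems for Big Data |
Code | 0222800009 |
Department | Department of Management & Innovation Systems |
EQF level | EQF7 |
Degree program | Data Science e Gestione dell'Innovazione (Data Science and Innovation Management) |
Academic year | 2024/2025 |
Type | Mandatory |
Year of course | 2 |
Year of didactic system (degree regulations) | 2022 |
Semester | Autumn |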
SSD (scientific sector) | CFU (credits) | Hours | Activity |
---|---|---|---|
INF/01 | 3 | 21 | Lessons |
INF/01 | 3 | 21 | Laboratory |
Exam board | Date | Session |
---|---|---|
FENZA | 9 December 2024, 14:30 | Ordinary session |
FENZA | 9 December 2024, 14:30 | Make-up session |
Objectives | |
---|---|
The course aims to introduce the fundamental concepts, requirements, technologies, and reference architectures for defining and implementing Big Data-oriented information systems. Students will acquire these skills by studying existing technological frameworks for data acquisition; for storage, through NoSQL databases (Solr, MongoDB, Neo4j, etc.) and Big Data file formats (Avro, Parquet, etc.); and for distributed processing, in both batch and stream mode (Hadoop, Spark, etc.). The aim is to compute analytics from unstructured or semi-structured sources in a scalable manner. The course also provides an introduction to web applications for analytics visualization, including D3.js and technology stacks such as Apache Solr + Banana and Elasticsearch + Kibana. At the end of the course, students will be able to use the main technological tools for acquiring, storing, processing, and analyzing Big Data. Students will also be encouraged to work in groups and apply the acquired knowledge to implement a project with Big Data analytics functionalities in a chosen field (e.g., social media, web intelligence, smart environments). The goal is to practice selecting and adopting suitable technologies according to the heterogeneous requirements of the project context. |
Prerequisites | |
---|---|
Students should ideally know the basic concepts of algorithms and data structures. They should also be able to write simple programs in at least one of Java, Python, or Scala, and know the basics of databases and SQL. |
Contents | |
---|---|
After a brief introduction to the main learning objectives, students will be introduced to the world of Big Data. Early in the course, students will be encouraged to form teams and define a project. In this project they will apply, step by step, the knowledge acquired during the classes. The course is composed of the following main parts. (1) Introduction to Big Data-enabled architectures (4 hours): the Big Data landscape; requirements of Big Data information systems; Lambda and Kappa architectures. (2) Acquisition (4 hours, including 1 hour of laboratory activities): serialization and data exchange formats (JSON, Avro, Parquet, etc.); REST and stream APIs for accessing Twitter, Dropbox, etc. (3) Distributed processing (10 hours, including 7 hours of laboratory activities): Hadoop and related technologies; Spark and other Big Data processing engines; hands-on Spark DataFrame; hands-on Spark machine learning. (4) Storage (10 hours, including 7 hours of laboratory activities): introduction to NoSQL databases, such as key-value stores and document-oriented, column-oriented, and graph databases; hands-on MongoDB; hands-on Neo4j. (5) Distributed stream processing (10 hours, including 4 hours of laboratory activities): introduction to distributed data stream processing; Apache Storm, Spark Streaming, Kafka Streams; hands-on Spark Streaming; hands-on Kafka Streams. (6) Big Data analytics (4 hours, including 2 hours of laboratory activities): introduction to analytics visualization through a web application, considering D3.js and the most widely used technology stacks (Apache Solr and Banana, Elasticsearch and Kibana); hands-on Apache Solr and Banana. |
Teaching Methods | |
---|---|
The course aims to encourage students to engage in lifelong learning, that is, the continuous updating of knowledge and skills throughout life. It does so by stimulating their curiosity and interest in information technology and in the new technologies related to the course subject matter. To accustom students to self-directed learning, the course will invite them to explore its topics in greater depth through selected online resources of particular interest. During the course, the teacher will make ample use of examples and guided exercises. Structurally, the course consists of 21 hours of lectures and 21 hours of laboratory activities. Attendance is not mandatory but is strongly recommended. |
Verification of learning | |
---|---|
Achievement of the course objectives is certified by passing an exam graded out of 30. The exam is divided into two parts, a theoretical test and a practical project, and each part has a minimum passing threshold. Theoretical test: a 40-minute oral presentation in which the student discusses a topic of technological, methodological, and/or applied interest from the course. The presentation should be supported by individual research linked to the themes covered in the lessons. The evaluation of the oral test considers the student's presentation skills, knowledge of the course topics, and critical analysis. Practical project: the project can be carried out individually or in groups and must be submitted before the oral exam. It should cover all phases of a typical data analysis pipeline: acquisition, storage, processing, and visualization. The documentation of group projects must specify each member's contribution. The evaluation is based on the technical validity of the choices made, the relevance of the adopted data analysis flow, and the clarity of the presented analyses. The final grade is the average of the two parts, expressed out of 30, with the possibility of honors. Honors are awarded if the candidate demonstrates mastery of both theoretical and operational content. The candidate must also be able to present and elaborate autonomously, even in contexts different from those proposed by the teacher. The minimum passing score, 18/30, is assigned when the student shows significant uncertainty in the use of the terminology and concepts studied and uses the tools inappropriately in the project. The maximum score, 30/30, is awarded when the student shows complete knowledge and operational capability in the project carried out. |
Texts | |
---|---|
Marz, N., & Warren, J. (2015). Big Data: Principles and Best Practices of Scalable Real-Time Data Systems. New York: Manning Publications Co. Suggested readings: Bahga, A., & Madisetti, V. (2016). Big Data Science & Analytics: A Hands-On Approach. VPT. |
More Information | |
---|---|
Links to teaching materials and additional resources will be provided. |
Beta version. Data source: ESSE3 [Last synchronization: 2024-11-18]