NATURAL LANGUAGE PROCESSING

Genoveffa TORTORA

0522500138
COMPUTER SCIENCE
EQF7
COMPUTER SCIENCE
2023/2024



YEAR OF DIDACTIC SYSTEM 2016
SPRING SEMESTER
CFU   HOURS   ACTIVITY
4     32      LESSONS
2     16      LAB
Objectives
THE GOAL OF THIS COURSE IS TO PROVIDE STUDENTS WITH METHODOLOGICAL AND TECHNOLOGICAL SKILLS TO DESIGN AND DEVELOP NLP SYSTEMS.

KNOWLEDGE AND UNDERSTANDING:
•KNOWLEDGE OF THE LINGUISTIC PHENOMENA THAT CHARACTERIZE AND COMPLICATE THE DEVELOPMENT OF NLP APPROACHES;
•DEEP KNOWLEDGE OF THE MAIN TECHNIQUES FOR STRUCTURAL ANALYSIS AND SEMANTIC INTERPRETATION OF TEXTS;
•DEEP KNOWLEDGE OF PRACTICAL TOOLS TO PERFORM NLP TASKS;
•KNOWLEDGE OF NLP APPLICATIONS, SUCH AS MACHINE TRANSLATION, INFORMATION EXTRACTION, SENTIMENT ANALYSIS, AND INTERACTIVE DIALOG SYSTEMS (BOTH WRITTEN AND SPOKEN).

APPLYING KNOWLEDGE AND UNDERSTANDING:
•KNOW HOW TO ANALYSE GENERAL PROBLEMS AND APPLY PROPER STRATEGIES IN NATURAL LANGUAGE PROCESSING (NLP);
•KNOW HOW TO CHARACTERIZE THE ROLE OF BOTH DATA AND APPLIED MACHINE LEARNING MODELS WITHIN NLP SYSTEMS;
•KNOW HOW TO DESIGN AND IMPLEMENT SOFTWARE SOLUTIONS TO SELECTED NLP PROBLEMS.
Prerequisites
STUDENTS SHOULD BE FAMILIAR WITH PROBABILITY, LINEAR ALGEBRA, PROGRAMMING, AND MACHINE LEARNING METHODS. NO PREREQUISITE COURSES ARE REQUIRED.
Contents
AFTER INTRODUCING NATURAL LANGUAGE PROCESSING, INCLUDING ITS CHARACTERIZATION AS A DISCIPLINE THAT COMBINES COMPUTER SCIENCE METHODS WITH RESEARCH INSIGHTS FROM LINGUISTICS (THE STUDY OF HUMAN LANGUAGE), THE COURSE WILL FOCUS ON THE FOLLOWING TOPICS:

STRUCTURAL ANALYSIS OF TEXTS
•WORDS, WORD COUNTING, LEXICONS (2 HOURS)
•TEXT NORMALIZATION (2 HOURS)
•DISTANCE MEASURES (2 HOURS)
•PART-OF-SPEECH TAGGING (2 HOURS)

TEXT SEMANTICS AND EMERGING ARCHITECTURES
•VECTOR SEMANTICS AND WORD EMBEDDINGS (2 HOURS)
•TEXT CLASSIFICATION WITH NEURAL NETWORKS (3 HOURS)
•RECURRENT NEURAL NETWORKS AND LANGUAGE MODELS (3 HOURS)
•TRANSFORMER ARCHITECTURE (2 HOURS)
•PRETRAINED MODELS (2 HOURS)

NLP: THE MAIN APPLICATIONS
•INTERACTIVE DIALOG (2 HOURS)
•MACHINE TRANSLATION (2 HOURS)
•INFORMATION RETRIEVAL (2 HOURS)
•SENTIMENT ANALYSIS (2 HOURS)
•TEXT SUMMARIZATION (2 HOURS)
•NATURAL LANGUAGE GENERATION (2 HOURS)

LABORATORY:
•TEXT PROCESSING WITH PYTHON (3 HOURS)
•CATEGORIZATION AND WORD TAGGING (3 HOURS)
•EXTRACTING INFORMATION FROM TEXT (3 HOURS)
•TRANSFORMER ARCHITECTURE: APPLICATION EXAMPLES (3 HOURS)
•DESIGN AND DEVELOPMENT OF NLP SOLUTIONS: PRESENTATION OF CASE STUDIES (4 HOURS)
Teaching Methods
THE COURSE INCLUDES:
•LECTURES COVERING THE COURSE CONTENTS (4 CFU/32 HOURS)
•LABORATORY SESSIONS AND TUTORIALS TO TRAIN STUDENTS IN PRACTICAL AND COLLABORATIVE ACTIVITIES (2 CFU/16 HOURS)
•EACH LECTURE WILL INCLUDE BOTH A PRESENTATION OF THE COURSE CONTENTS BY THE TEACHERS AND TUTORIALS ON THEIR PRACTICAL APPLICATION
Verification of learning
•THE EXAM CONSISTS OF A PRELIMINARY WRITTEN TEST AND AN ORAL EXAMINATION TO VERIFY THE ACQUIRED KNOWLEDGE AND TO DISCUSS THE ACTIVITIES CARRIED OUT DURING THE COURSE. THESE ACTIVITIES INCLUDE A GROUP PROJECT. THE WRITTEN EXAM CAN BE REPLACED BY PROGRESSIVE ASSESSMENT TESTS, WHICH INCLUDE QUESTIONS ON BOTH THE KNOWLEDGE AND UNDERSTANDING OF THE LECTURE TOPICS AND THE ABILITY TO APPLY THEM THROUGH EXERCISES.

•WRITTEN EXAMINATION (2 HOURS): THE TEST EVALUATES THE KNOWLEDGE GAINED OF NATURAL LANGUAGE PROCESSING TECHNIQUES AND SOLUTIONS AND CONSISTS OF OPEN QUESTIONS AND EXERCISES. EACH QUESTION OR EXERCISE IS WORTH BETWEEN 4 AND 10 POINTS, DEPENDING ON ITS COMPLEXITY. THE EVALUATION CRITERIA INCLUDE THE CORRECTNESS AND COMPLETENESS OF THE LEARNING AND THE CLARITY OF THE PRESENTATION. THE FINAL MARK IS OUT OF 30.

•ASSESSMENT TESTS: NON-CUMULATIVE TESTS MAY BE GIVEN DURING THE COURSE. STUDENTS WHO PASS THE TESTS ARE EXEMPTED FROM THE WRITTEN EXAMINATION. THE AIM IS TO ENCOURAGE STUDENTS TO FOLLOW THE COURSE EFFECTIVELY.

•PROJECT: THE PROJECT ALLOWS THE STUDENT TO PRACTICE THE CONTENTS LEARNED DURING THE COURSE. DURING THE ORAL EXAM, THE PROJECT WILL BE DISCUSSED DIRECTLY WITH THE TEACHER, WHO WILL VERIFY THE FOLLOWING:
•ADHERENCE TO THE REQUIREMENTS
•COMPLETENESS AND CORRECTNESS OF THE SOFTWARE PRODUCED
•UNDERSTANDING OF THE ARTIFACTS PRODUCED
•ABILITY TO DESCRIBE THE RESULTS OBTAINED AND TO POINT OUT ANY LIMITATIONS AND OPEN PROBLEMS.

•ORAL EXAMINATION: THE ORAL EXAMINATION EVALUATES THE STUDENT'S GENERAL KNOWLEDGE OF THE ENTIRE COURSE PROGRAM. THE EVALUATION CRITERIA INCLUDE THE COMPLETENESS AND CORRECTNESS OF THE LEARNING AND THE CLARITY OF THE PRESENTATION.

•FINAL EVALUATION: THE FINAL MARK IS BASED ON THE AVERAGE SCORE OF THE ASSESSMENT TESTS (OR THE SCORE OF THE WRITTEN EXAMINATION) AND ON THE POINTS OBTAINED IN THE PROJECT DISCUSSION AND THE ORAL EXAMINATION.
Texts
COURSE BOOK:
D. JURAFSKY AND J. MARTIN. SPEECH AND LANGUAGE PROCESSING, PRENTICE HALL, THIRD EDITION (2022).

RECOMMENDED READING:
•EISENSTEIN, JACOB. INTRODUCTION TO NATURAL LANGUAGE PROCESSING. MIT PRESS, 2019.
•HARDENIYA, NITIN, ET AL. NATURAL LANGUAGE PROCESSING: PYTHON AND NLTK. PACKT PUBLISHING LTD, 2016.
•BIRD, STEVEN, EWAN KLEIN, AND EDWARD LOPER. NATURAL LANGUAGE PROCESSING WITH PYTHON: ANALYZING TEXT WITH THE NATURAL LANGUAGE TOOLKIT. O'REILLY MEDIA, 2009.
More Information
ATTENDANCE AT LECTURES IS STRONGLY ENCOURAGED. STUDENTS MUST SPEND A CONSIDERABLE AMOUNT OF TIME STUDYING AT HOME AND DEVELOPING THE COURSE PROJECT.
INFORMATION CONCERNING THE COURSE IS AVAILABLE ON THE E-LEARNING PLATFORM OF THE DIPARTIMENTO DI INFORMATICA AT HTTP://ELEARNING.INFORMATICA.UNISA.IT/EL-PLATFORM/

CONTACTS
PROF. GENOVEFFA TORTORA
TORTORA@UNISA.IT

PROF. LOREDANA CARUCCIO
LCARUCCIO@UNISA.IT