NATURAL LANGUAGE PROCESSING

GENOVEFFA TORTORA

0522500138
COMPUTER SCIENCE
EQF7
COMPUTER SCIENCE
2022/2023

YEAR OF ACADEMIC REGULATIONS: 2016
SPRING SEMESTER
CFU   HOURS   ACTIVITY
4     32      LESSONS
2     16      LAB
Objectives
THE GOAL OF THIS COURSE IS TO PROVIDE STUDENTS WITH METHODOLOGICAL AND TECHNOLOGICAL SKILLS TO DESIGN AND DEVELOP NLP SYSTEMS.

KNOWLEDGE AND UNDERSTANDING
•KNOWLEDGE OF THE LINGUISTIC PHENOMENA THAT CHARACTERIZE NATURAL LANGUAGE AND MAKE THE DEVELOPMENT OF NLP APPROACHES HARD;
•DEEP KNOWLEDGE OF THE MAIN TECHNIQUES FOR THE STRUCTURAL ANALYSIS AND SEMANTIC INTERPRETATION OF TEXTS;
•DEEP KNOWLEDGE OF PRACTICAL TOOLS TO PERFORM NLP TASKS;
•KNOWLEDGE OF NLP APPLICATIONS, SUCH AS QUESTION ANSWERING, MACHINE TRANSLATION, INFORMATION EXTRACTION, AND INTERACTIVE DIALOG SYSTEMS (BOTH WRITTEN AND SPOKEN).

APPLYING KNOWLEDGE AND UNDERSTANDING:
•KNOW HOW TO ANALYSE GENERAL PROBLEMS IN NATURAL LANGUAGE PROCESSING (NLP) AND APPLY APPROPRIATE STRATEGIES TO SOLVE THEM;
•KNOW HOW TO CHARACTERIZE THE ROLE OF BOTH DATA AND APPLIED MACHINE LEARNING MODELS WITHIN NLP SYSTEMS;
•KNOW HOW TO DESIGN AND IMPLEMENT MODELS FOR SOLVING SOME NLP PROBLEMS.
Prerequisites
STUDENTS SHOULD BE FAMILIAR WITH PROBABILITY, LINEAR ALGEBRA, PROGRAMMING, AND MACHINE LEARNING METHODS.
NO PREVIOUS COURSES ARE REQUIRED.
Contents

AFTER AN INTRODUCTION TO NATURAL LANGUAGE PROCESSING, INCLUDING ITS CHARACTERIZATION AS A DISCIPLINE THAT COMBINES COMPUTER SCIENCE METHODS WITH RESEARCH INSIGHTS FROM LINGUISTICS (THE STUDY OF HUMAN LANGUAGE), THE COURSE FOCUSES ON THE FOLLOWING TOPICS:

STRUCTURAL ANALYSIS OF TEXTS
•WORDS, WORD COUNTING, LEXICONS (3 HOURS)
•DISTANCE MEASURES (2 HOURS)
•PART-OF-SPEECH TAGGING (2 HOURS)
•PROBABILISTIC LANGUAGE MODELING (4 HOURS)

TEXT SEMANTICS
•VECTOR SEMANTICS AND WORD EMBEDDINGS (3 HOURS)
•TEXT CLASSIFICATION WITH LANGUAGE MODELS (3 HOURS)
•TEXT CLASSIFICATION WITH NEURAL NETWORKS (4 HOURS)
•VITERBI ALGORITHM (DYNAMIC PROGRAMMING) (3 HOURS)

NLP: THE MAIN APPLICATIONS
•INTERACTIVE DIALOG (2 HOURS)
•MACHINE TRANSLATION (2 HOURS)
•INFORMATION EXTRACTION (2 HOURS)
•QUESTION ANSWERING SYSTEMS (1 HOUR)
•DEPENDENCY PARSING (1 HOUR)

LABORATORY:
•TEXT PROCESSING WITH PYTHON (3 HOURS)
•CATEGORIZATION AND WORD TAGGING (3 HOURS)
•EXTRACTING INFORMATION FROM TEXT (3 HOURS)
•ANALYZING SENTENCE STRUCTURE (4 HOURS)
•ANALYZING THE MEANING OF SENTENCES (3 HOURS)

Teaching Methods
THE COURSE INCLUDES:
•IN-CLASS LECTURES COVERING THE COURSE CONTENTS (4 CFUS/32 HOURS);
•LABORATORY SESSIONS AND TUTORIALS TO TRAIN STUDENTS IN PRACTICAL AND COLLABORATIVE ACTIVITIES (2 CFUS/16 HOURS).
•EACH LECTURE INCLUDES BOTH THE INSTRUCTORS' PRESENTATION OF THE COURSE CONTENTS AND TUTORIALS ON THEIR PRACTICAL APPLICATION.
Verification of learning
•THE EXAM CONSISTS OF A PRELIMINARY WRITTEN TEST AND AN ORAL EXAMINATION THAT VERIFIES THE ACQUIRED KNOWLEDGE AND DISCUSSES THE ACTIVITIES CARRIED OUT DURING THE COURSE. THESE ACTIVITIES INCLUDE A GROUP PROJECT. THE WRITTEN EXAM CAN BE REPLACED BY ONGOING ASSESSMENT TESTS, WHICH INCLUDE QUESTIONS ON BOTH THE KNOWLEDGE AND UNDERSTANDING OF THE TOPICS COVERED IN CLASS AND THE ABILITY TO APPLY THEM THROUGH EXERCISES.

•WRITTEN EXAMINATION (2 HOURS): TO EVALUATE THE KNOWLEDGE GAINED ON THE COURSE TOPICS, THE TEST CONSISTS OF OPEN QUESTIONS AND EXERCISES. EACH QUESTION OR EXERCISE IS WORTH BETWEEN 4 AND 10 POINTS, DEPENDING ON ITS COMPLEXITY. THE EVALUATION CRITERIA INCLUDE THE CORRECTNESS AND COMPLETENESS OF THE ACQUIRED KNOWLEDGE AND THE CLARITY OF THE PRESENTATION. THE FINAL MARK IS OUT OF 30.

•ASSESSMENT TESTS: NON-CUMULATIVE TESTS MAY BE ADMINISTERED. STUDENTS WHO PASS THESE TESTS ARE EXEMPTED FROM THE WRITTEN EXAMINATION. THE AIM IS TO ENCOURAGE STUDENTS TO FOLLOW THE COURSE CONSISTENTLY.

•PROJECT: THE PROJECT ALLOWS STUDENTS TO PUT INTO PRACTICE THE CONTENTS LEARNED DURING THE COURSE. DURING THE ORAL EXAM, THE PROJECT IS DISCUSSED DIRECTLY WITH THE TEACHER, WHO VERIFIES THE FOLLOWING:
  •COMPLETENESS AND CORRECTNESS OF THE PROJECT;
  •COMPREHENSION OF THE PRODUCED ARTIFACTS;
  •LEVEL OF FAMILIARITY WITH THE PRODUCED SOFTWARE AND ABILITY TO MODIFY IT.

•ORAL EXAMINATION: IT AIMS TO EVALUATE THE STUDENT'S GENERAL KNOWLEDGE OF THE ENTIRE COURSE PROGRAM. THE EVALUATION CRITERIA INCLUDE THE COMPLETENESS AND CORRECTNESS OF THE ACQUIRED KNOWLEDGE AND THE CLARITY OF THE PRESENTATION.

•FINAL EVALUATION: THE FINAL MARK IS BASED ON THE AVERAGE SCORE OF THE ASSESSMENT TESTS (OR THE WRITTEN EXAMINATION), TOGETHER WITH THE POINTS OBTAINED IN THE PROJECT DISCUSSION AND THE ORAL TEST.
Texts
COURSE BOOK:
D. JURAFSKY AND J. MARTIN. SPEECH AND LANGUAGE PROCESSING, PRENTICE HALL, THIRD EDITION (2022).

RECOMMENDED READING:
•EISENSTEIN, JACOB. INTRODUCTION TO NATURAL LANGUAGE PROCESSING. MIT PRESS, 2019.
•HARDENIYA, NITIN, ET AL. NATURAL LANGUAGE PROCESSING: PYTHON AND NLTK. PACKT PUBLISHING LTD, 2016.
•BIRD, STEVEN, EWAN KLEIN, AND EDWARD LOPER. NATURAL LANGUAGE PROCESSING WITH PYTHON: ANALYZING TEXT WITH THE NATURAL LANGUAGE TOOLKIT. O'REILLY MEDIA, INC., 2009.
More Information
ATTENDANCE AT LECTURES IS STRONGLY ENCOURAGED. STUDENTS MUST SPEND A CONSIDERABLE AMOUNT OF TIME STUDYING AT HOME AND DEVELOPING THE COURSE PROJECT.
INFORMATION CONCERNING THE COURSE IS AVAILABLE ON THE E-LEARNING PLATFORM OF THE DIPARTIMENTO DI INFORMATICA AT HTTP://ELEARNING.INFORMATICA.UNISA.IT/EL-PLATFORM/

CONTACTS
PROF. GENOVEFFA TORTORA
TORTORA@UNISA.IT

PROF. LOREDANA CARUCCIO
LCARUCCIO@UNISA.IT
DATA SOURCE: ESSE3 [LAST SYNCHRONIZATION: 2024-08-21]