NATURAL LANGUAGE PROCESSING AND LARGE LANGUAGE MODELS

Nicola CAPUANO

0622700126
DEPARTMENT OF INFORMATION AND ELECTRICAL ENGINEERING AND APPLIED MATHEMATICS
EQF7
COMPUTER ENGINEERING
2024/2025



YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2022
AUTUMN SEMESTER
CFU    HOURS    ACTIVITY
3      24       LESSONS
3      24       LAB
Objectives
THE COURSE PROVIDES THE THEORETICAL, METHODOLOGICAL, TECHNOLOGICAL, AND OPERATIONAL KNOWLEDGE RELATED TO THE AUTOMATIC UNDERSTANDING OF LANGUAGE AND TEXT, FRAMING THE INNOVATIVE PARADIGMS INTRODUCED BY LARGE LANGUAGE MODELS WITHIN THE GENERAL FRAMEWORK FOR THE IMPLEMENTATION OF NATURAL LANGUAGE PROCESSING SYSTEMS AND THE NUMEROUS MODERN APPLICATIONS OF THESE TECHNOLOGIES.

KNOWLEDGE AND UNDERSTANDING
BASIC CONCEPTS OF NATURAL LANGUAGE PROCESSING SYSTEMS. STANDARD LANGUAGE MODELS. LARGE LANGUAGE MODELS BASED ON TRANSFORMERS. NATURAL LANGUAGE PROCESSING APPLICATIONS WITH LARGE LANGUAGE MODELS. PROMPT ENGINEERING. FINE TUNING OF LARGE LANGUAGE MODELS.

ABILITY TO APPLY KNOWLEDGE AND UNDERSTANDING
DESIGN AND IMPLEMENTATION OF A NATURAL LANGUAGE PROCESSING SYSTEM BASED ON LARGE LANGUAGE MODELS, EFFECTIVELY INTEGRATING EXISTING TECHNOLOGIES AND TOOLS AND OPTIMALLY CONFIGURING THEIR OPERATING PARAMETERS.
Prerequisites
PROPAEDEUTIC EXAM: MACHINE LEARNING
Contents
TEACHING UNIT 1: FUNDAMENTALS OF NATURAL LANGUAGE PROCESSING
(LESSON/PRACTICE/WORKSHOP HOURS 10/6/0)
- 1 (2 HOURS LESSON): BASIC CONCEPTS, TASKS, EVOLUTION AND APPLICATIONS OF NATURAL LANGUAGE PROCESSING
- 2 (2 HOURS LESSON): REPRESENTING A TEXT, TOKENIZATION, STEMMING, LEMMATIZATION, BAG OF WORDS, N-GRAMS, SIMILARITY MEASURES, WORD EMBEDDINGS
- 3 (2 HOURS LESSON): TF-IDF VECTORS, CLASSIFICATION AND CLUSTERING OF TEXT, WORD EMBEDDINGS
- 4 (2 HOURS LESSON): NEURAL NETWORKS AND TEXT ANALYSIS, APPLICATION OF CNN, RECURRENT NETWORKS AND LSTM
- 5 (2 HOURS PRACTICE): CREATE A TEXT CLASSIFIER
- 6 (2 HOURS LESSON): EXTRACTION OF INFORMATION FROM THE TEXT, NAMED ENTITY RECOGNITION AND QUESTION ANSWERING
- 7 (4 HOURS PRACTICE): CREATION OF A SIMPLE CHATBOT WITH PYTHON AND SPACY/RASA
KNOWLEDGE AND UNDERSTANDING: KNOWLEDGE OF THE BASIC CONCEPTS AND TECHNIQUES FOR THE PROCESSING OF NATURAL LANGUAGE.
APPLIED KNOWLEDGE AND UNDERSTANDING: APPLY BASIC CONCEPTS AND TECHNIQUES TO THE CREATION OF SIMPLE TEXT CLASSIFICATION AND ANALYSIS TOOLS.

TEACHING UNIT 2: TRANSFORMERS
(LESSON/PRACTICE/WORKSHOP HOURS 6/10/0)
- 1 (2 HOURS LESSON): SELF ATTENTION, MULTI-HEAD ATTENTION, POSITIONAL ENCODING, MASKING
- 2 (2 HOURS LESSON): ENCODER AND DECODER OF A TRANSFORMER
- 3 (2 HOURS PRACTICE): INTRODUCTION TO HUGGINGFACE
- 4 (2 HOURS PRACTICE): ENCODER-DECODER OR SEQ2SEQ MODELS (TRANSLATION AND SUMMARIZATION)
- 5 (2 HOURS PRACTICE): ENCODER-ONLY MODELS (SENTENCE CLASSIFICATION AND NAMED ENTITY RECOGNITION)
- 6 (2 HOURS PRACTICE): DECODER-ONLY MODELS (TEXT GENERATION)
- 7 (2 HOURS LESSON): DEFINITION AND TRAINING OF A LARGE LANGUAGE MODEL
- 8 (2 HOURS PRACTICE): TRAINING A LARGE LANGUAGE MODEL
KNOWLEDGE AND UNDERSTANDING: KNOWLEDGE OF BASIC AND ADVANCED TRANSFORMER CONCEPTS.
APPLIED KNOWLEDGE AND UNDERSTANDING: ABILITY TO DESIGN AND CREATE TRANSFORMERS WITH DIFFERENT ARCHITECTURES SUITABLE FOR SOLVING SPECIFIC REAL-WORLD PROBLEMS.

TEACHING UNIT 3: PROMPT ENGINEERING
(LESSON/PRACTICE/WORKSHOP HOURS 2/4/0)
- 1 (2 HOURS LESSON): ZERO-SHOT, FEW-SHOT AND CHAIN-OF-THOUGHT PROMPTING, SELF-CONSISTENCY, GENERATED KNOWLEDGE, PROMPT CHAINING, REACT, RETRIEVAL-AUGMENTED GENERATION (RAG)
- 2 (2 HOURS PRACTICE): APPLICATION OF BASIC PROMPTING TECHNIQUES
- 3 (2 HOURS PRACTICE): PRACTICE ON RAG AND LANGCHAIN
KNOWLEDGE AND UNDERSTANDING: KNOWLEDGE OF BASIC AND ADVANCED PROMPT ENGINEERING TECHNIQUES.
APPLIED KNOWLEDGE AND UNDERSTANDING: APPLICATION OF PROMPT ENGINEERING METHODOLOGIES AND TOOLS TO SOLVE REAL-WORLD PROBLEMS OF VARYING COMPLEXITY.

TEACHING UNIT 4: LLM FINE TUNING
(LESSON/PRACTICE/WORKSHOP HOURS 4/4/2)
- 1 (2 HOURS LESSON): FEATURE-BASED FINE TUNING, UPDATING THE OUTPUT LAYERS, UPDATING ALL LAYERS, PARAMETER-EFFICIENT FINE TUNING (PEFT) AND LOW-RANK ADAPTATION (LORA)
- 2 (2 HOURS PRACTICE): PRACTICE ON LLM FINE TUNING
- 3 (2 HOURS LESSON): REINFORCEMENT LEARNING FROM HUMAN FEEDBACK (RLHF)
- 4 (2 HOURS PRACTICE): PRACTICE ON RLHF
- 5 (2 HOURS WORKSHOP): FINAL PROJECT
KNOWLEDGE AND UNDERSTANDING: KNOWLEDGE OF BASIC AND ADVANCED FINE TUNING TECHNIQUES OF LARGE LANGUAGE MODELS.
APPLIED KNOWLEDGE AND UNDERSTANDING: APPLICATION OF FINE TUNING METHODOLOGIES TO ADAPT LARGE LANGUAGE MODELS TO SPECIFIC APPLICATIONS OF INTEREST.

TOTAL HOURS LESSON/PRACTICE/WORKSHOP 22/24/2
Teaching Methods
THE COURSE INCLUDES LECTURES AND CLASSROOM EXERCISES. THE LECTURES WILL PROVIDE STUDENTS WITH FUNDAMENTAL KNOWLEDGE OF THE MAIN BASIC AND ADVANCED TECHNIQUES FOR THE REPRESENTATION, ANALYSIS AND CLASSIFICATION OF NATURAL LANGUAGE TEXT WITH LARGE LANGUAGE MODELS. THE EXERCISES WILL DEVELOP THE ABILITY TO APPLY THESE TECHNIQUES TO THE CREATION OF TOOLS FOR TEXT CLASSIFICATION, TEXT ANALYSIS AND QUESTION ANSWERING. ATTENDANCE AT LECTURES IS MANDATORY, AND A MINIMUM ATTENDANCE OF 70% IS REQUIRED TO TAKE THE EXAM. ATTENDANCE WILL BE MONITORED VIA THE AUTOMATIC EASYBADGE SYSTEM PROVIDED BY THE UNIVERSITY.
Verification of learning
THE EXAM CONSISTS OF A PROJECT WORK AND AN ORAL TEST. THE PROJECT WORK REQUIRES STUDENTS TO CRITICALLY APPLY THE METHODOLOGIES LEARNED DURING THE COURSE TO A PRACTICAL CASE. THE ORAL TEST WILL EVALUATE THE THEORETICAL KNOWLEDGE ACQUIRED DURING THE COURSE, THE ABILITY TO JUSTIFY THE DESIGN CHOICES MADE IN THE PROJECT WORK, AND THE ABILITY TO ANSWER QUESTIONS ON SPECIFIC TOPICS COVERED IN THE LECTURES. THE FINAL MARK WILL BE THE AVERAGE OF THE MARKS OBTAINED IN THE PROJECT WORK AND THE ORAL TEST.
Texts
REFERENCE TEXT:
H. LANE, C. HOWARD, H. M. HAPKE: NATURAL LANGUAGE PROCESSING IN ACTION - UNDERSTANDING, ANALYZING AND GENERATING TEXT WITH PYTHON, MANNING.

SUPPLEMENTARY TEACHING MATERIAL WILL BE AVAILABLE IN THE COURSE'S DEDICATED SECTION OF THE UNIVERSITY E-LEARNING PLATFORM (HTTPS://ELEARNING.UNISA.IT), ACCESSIBLE TO STUDENTS OF THE COURSE THROUGH THEIR UNIVERSITY SINGLE SIGN-ON CREDENTIALS.

  BETA VERSION Data source ESSE3 [Last synchronization: 2024-11-18]