FORMAL TOOLS FOR BIOINFORMATICS

Clelia DE FELICE FORMAL TOOLS FOR BIOINFORMATICS

0522500145
COMPUTER SCIENCE
EQF7
COMPUTER SCIENCE
2022/2023



YEAR OF COURSE 1
YEAR OF DIDACTIC SYSTEM 2016
AUTUMN SEMESTER
CFUHOURSACTIVITY
1STRUMENTI FORMALI PER LA BIOINFORMATICA
540LESSONS
18LAB
2STRUMENTI FORMALI PER LA BIOINFORMATICA
324LAB
Objectives
KNOWLEDGE AND UNDERSTANDING
•THE STUDENT MUST HAVE KNOWLEDGE OF COMPUTATIONAL MODELS AND GENERATING SYSTEMS, OF POTENTIALITY, LIMITATIONS, MATHEMATICAL ASPECTS AND AREAS OF APPLICATION OF THESE MODELS.

•MASTERY OF THE TECHNIQUES OF COMBINATORIAL ALGORITHMS ON WORDS AND DATA STRUCTURES USED TO BE ABLE TO TACKLE THE STUDY AND SOLUTION OF COMPUTATIONAL PROBLEMS OF ANALYSIS AND COMPARISON OF BIOLOGICAL SEQUENCES.

•KNOWLEDGE OF MACHINE AND DEEP LEARNING APPLICATIONS IN GENOMICS PROBLEMS.

APPLYING KNOWLEDGE AND UNDERSTANDING
THE STUDENT MUST BE ABLE:
•TO RECOGNIZE WHETHER A PROBLEM CAN BE SOLVED USING SOLUTIONS BASED ON ONE OF THE STUDIED COMPUTATIONAL MODELS.
•TO KNOW HOW TO MODEL THE SOLUTION OF BIOLOGICAL PROBLEMS ON GENOMIC SEQUENCES THROUGH THE FORMULATION OF COMBINATORIAL PROBLEMS
•TO KNOW HOW TO USE GENOMIC DATABASES TO EXTRACT INFORMATION OF INTEREST.
•TO CRITICALLY EVALUATE THE RESULTS, UNDERSTANDING THE DIFFERENCES THAT DIFFERENT TYPES OF TOOLS AND APPROACHES MAY HAVE ON THE OBTAINED DATA.
Prerequisites
IT IS PREFERABLE, BUT NOT MANDATORY, THAT STUDENTS HAVE KNOWLEDGE OF C/C++ AND PYTHON (IN PARTICULAR TENSORFLOW AND PYTORCH LIBRARIES). SIMILARLY, IT IS PREFERABLE, BUT NOT MANDATORY, THAT STUDENTS HAVE KNOWLEDGE OF BASIC COMPUTATION MODELS (FINITE AUTOMATA AND TURING MACHINES).
Contents
FIRST PART (40 HOURS): WORDS AND LANGUAGES. DETERMINISTIC AND NONDETERMINISTIC FINITE AUTOMATA. REGULAR EXPRESSIONS AND LANGUAGES. EQUIVALENCE AND MINIMIZATION OF AUTOMATA. REGULAR GRAMMARS.CONTEXT-FREE GRAMMARS AND LANGUAGES. PARSE TREES. PUSHDOWN AUTOMATA. OUTLINE OF THE CHOMSKY HIERARCHY. RECENT AND RELEVANT APPLICATIONS OF COMPUTATION MODELS AND MAIN ASPECTS OF COMBINATORICS ON WORDS WITH APPLICATION TO BIOINFORMATICS.

SECOND PART (16 HOURS):
INTRODUCTION TO BIOINFORMATICS: FUNDAMENTAL CONCEPTS OF MOLECULAR BIOLOGY AND GENOMICS. DATA STRUCTURES FOR BIOINFORMATICS: BLOOM FILTERS, DE BRUJIN GRAPHS, OVERLAP GRAPHS, SUFFIX ARRAY, WHEELER GRAPH. ALGORITHMS FOR BIOINFORMATICS: SEARCHING FOR PATTERNS IN BIOLOGICAL SEQUENCES. MULTIPLE ALIGNMENT TECHNIQUES OF SEQUENCES. APPLICATION OF THE BURROWS-WHEELER TRANSFORM IN THE SEARCH FOR PATTERNS IN BIOLOGICAL SEQUENCES.

THIRD PART (16 ORE):
INTRODUCTION TO ML (UNSUPERVISED LEARNING, SUPERVISED LEARNING) AND DL (GRAPH NEURAL NETWORKS, CONTRADICTORY GENERATIVE NETWORK, SIAMESE NETWORKS). APPLICATIONS OF ML AND DL ALGORITHMS IN BIOINFORMATICS: DEFINITION OF EMBEDDED SIGNATURES FOR THE CHARACTERIZATION OF SEQUENCES (SEQUENCE EMBEDDING) AND TREES (GRAPH EMBEDDING/GEOMETRIC DEEP LEARNING). USE OF EMBEDDED SIGNATURES FOR THE DESIGN OF SIMILARITY MEASUREMENTS AND ML AND DL METHODS FOR DATA COMPARISON IN METAGENOMICS (SEQUENCE COMPARISON) AND CANCER PHYLOGENY (TREE COMPARISON). SOME RESEARCH CONTEXTS: METAGENOMIC READS/CONTIGS BINNING, COMPARATIVE GENOMICS, CANCER PHYLOGENY AMALGAMATION.
Teaching Methods
CLASS LESSONS. THE LECTURES INCLUDE EXERCISES THAT WILL PRESENT EXAMPLES OF APPLICATIONS OF THOSE CONCEPTS, WITH THE AIM OF ENABLING THE STUDENT TO APPLY THE KNOWLEDGE ACQUIRED.
PART OF THE LESSONS (THOSE RELATED TO THE THIRD PART), WILL BE HELD IN THE LABORATORY.
Verification of learning
THE EXAM WILL CONSIST IN THE DEVELOPMENT OF A PROJECT (THEORETICAL AND/OR PRACTICAL), INDIVIDUAL OR IN GROUP, RELATED TO SPECIFIC PROBLEMS PRESENTED DURING THE COURSE, ACCOMPANIED BY A SHORT PRESENTATION SEMINAR.
THE EVALUATION CRITERIA INCLUDE THE COMPLETENESS AND CORRECTNESS OF THE LEARNING AND THE CLARITY OF THE PRESENTATION.
THE MINIMUM GRADE (18) IS ASSIGNED WHEN THE STUDENT HAS A LIMITED KNOWLEDGE OF THE COMPUTATIONAL MODELS INTRODUCED AND SHOWS UNCERTAINTIES IN THE APPLICATION OF THE STUDIED METHODS.
THE MAXIMUM GRADE (30) IS ASSIGNED WHEN THE STUDENT SHOWS A COMPLETE AND IN-DEPTH UNDERSTANDING OF THE CONCEPTS AND METHODS STUDIED. IT IS ALSO ABLE TO SOLVE THE PROPOSED PROBLEMS GIVING EFFICIENT AND ACCURATE SOLUTIONS AND SHOWS THE ABILITY TO LINK DIFFERENT CONCEPTS TOGETHER.
''LODE'' IS GIVEN WHEN THE CANDIDATE DEMONSTRATES SIGNIFICANT MASTERY OF THE THEORETICAL AND OPERATIONAL CONTENT AND SHOWS THAT SHE/HE IS ABLE TO PRESENT THE TOPICS WITH OWNERSHIP OF LANGUAGE AND AUTONOMOUS PROCESSING SKILLS.
Texts
1.M. SIPSER, INTRODUZIONE ALLA TEORIA DELLA COMPUTAZIONE, APOGEO, 2016
2.J. HOPCROFT, R. MOTWANI, J. ULLMAN, AUTOMI, LINGUAGGI E CALCOLABILITÀ, ADDISON WESLEY PEARSON EDUCATION ITALIA S.R.L, TERZA EDIZIONE, 2009.
3.M. LOTHAIRE, APPLIED COMBINATORICS ON WORDS,
4.M. LOTHAIRE, ALGEBRAIC COMBINATORICS ON WORDS
5.N.C. JONES, P.A. PEVZNER. AN INTRODUCTION TO BIOINFORMATICS ALGORITHMS. HTTP://BIOINFORMATICSALGORITHMS.COM
6.D. GUSFIELD. ALGORITHM ON STRINGS, TREES, AND SEQUENCES: COMPUTER SCIENCE AND COMPUTATIONAL BIOLOGY. CAMBRIDGE PRESS.
7.INTRODUCTION TO MACHINE LEARNING, LECTURE NOTES, MIT, 2019.
8.IAN GOODFELLOW, YOSHUA BENGIO, AARON COURVILLE, DEEP LEARNING, MIT PRESS, 2016.
9.Z.R. YANG, MACHINE LEARNING APPROACHES TO BIOINFORMATICS, SCIENCE, ENGINEERING, AND BIOLOGY INFORMATICS.
10.HABIB IZADKHAH, DEEP LEARNING IN BIOINFORMATICS TECHNIQUES AND APPLICATIONS IN PRACTICE, 1ST EDITION, ELSEVIER.
11.SELECTION OF SCIENTIFIC PAPERS
More Information
COURSE SITE ON
HTTP://ELEARNING.INFORMATICA.UNISA.IT/EL-PLATFORM/
CONTACTS:
CDEFELICE@UNISA.IT, RZIZZA@UNISA.IT, RZACCAGNINO@UNISA.IT
HTTP://DOCENTI.UNISA.IT/001119/HOME
  BETA VERSION Data source ESSE3 [Ultima Sincronizzazione: 2023-06-21]