Natural Language Processing
Natural Language Processing 2022-23
School of Computer Science and Electronic Engineering
Module - Semester 2
Indicative content includes:
- NLP programming in Python using NLTK, and text processing with Unix tools such as sed and awk.
- Topics in Computational Linguistics and Natural Language Processing: Zipf’s Law; Heaps’ Law; the sparse data problem; the zero frequency problem; regular expressions; n-grams; language modelling; the NLP pipeline, e.g. tokenization -> word segmentation -> part of speech (POS) tagging -> phrase chunking -> parsing -> named entity recognition -> information extraction -> question answering.
- Topics in Information Retrieval (IR) – systems, theory and technologies: Boolean search; textual conflation, e.g. stemming and stopword removal; the probabilistic IR model; the Vector Space IR model; relevance feedback; text categorization and information filtering; information extraction; question answering; IR systems and search engines, e.g. Google’s PageRank and IBM’s WebFountain; search engine optimization; evaluation.
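As a rough illustration of two of the topics listed above, the sketch below shows add-one (Laplace) smoothing of bigram estimates as one simple response to the zero frequency problem, and TF-IDF weighting with cosine similarity as one common form of the Vector Space IR model. It uses only the Python standard library rather than NLTK. The function names, the regular expression used for tokenization, and the `log(N/df) + 1` IDF variant are assumptions made for this example and are not taken from the module materials.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bigram_prob(tokens, w1, w2):
    """Add-one smoothed estimate of P(w2 | w1) from a token list.

    Unseen bigrams get a small non-zero probability instead of zero.
    """
    vocab_size = len(set(tokens))
    context_counts = Counter(tokens[:-1])          # how often w1 starts a bigram
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return (bigram_counts[(w1, w2)] + 1) / (context_counts[w1] + vocab_size)


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Assumed IDF variant: log(N / df) + 1, so terms found in every document keep some weight.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    query_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)
```

A call such as `rank("web search", ["the cat sat on the mat", "dogs chase cats", "search engines rank web pages"])` would place the third document first, since it is the only one sharing terms with the query. Production systems normally add the stemming and stopword removal mentioned above before weighting.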
Assessment criteria:
- Threshold (equivalent to 50%): Uses key areas of theory or knowledge to meet the Learning Outcomes of the module. Is able to formulate an appropriate solution to accurately solve tasks and questions. Can identify individual aspects, but lacks an awareness of links between them and the wider contexts. Outputs can be understood, but lack structure and/or coherence.
- Good (equivalent to the range 60%-69%): Is able to analyse a task or problem to decide which aspects of theory and knowledge to apply. Solutions are of a workable quality, demonstrating understanding of underlying principles. Can link major themes appropriately, but may not be able to extend this to individual aspects. Outputs are readily understood, with an appropriate structure, but may lack sophistication.
- Excellent (equivalent to the range 70%+): Assembles critically evaluated, relevant areas of knowledge and theory to construct professional-level solutions to the tasks and questions presented. Is able to cross-link themes and aspects to draw considered conclusions. Presents outputs in a cohesive, accurate, and efficient manner.