Course Description, Elective: WPM Social Networks and Sentiment Analysis
The main objective of this course is to familiarize students with the fundamental principles of extracting knowledge from unstructured and poorly formalized data sets. It is designed as a general introductory course for all students interested in Opinion Mining and Sentiment Analysis, as well as Social Network and Social Behavior Analysis. The main sources for knowledge mining will be textual Internet content and the different types of relationships within Social Networks.
Learning goals: students are expected to understand conceptually, and to be able to choose, appropriate advanced algorithms and technical solutions for knowledge extraction in practical tasks, namely:
- to mine textual Internet content (opinions, reviews, messages, comments, etc.) and represent it in a structured format;
- to build a hierarchical structure of the Topics described in the analyzed text Corpus;
- to extract semantically meaningful words (keywords) and word collocations for each Topic;
- to cluster texts on the basis of their contextual (semantic) similarity;
- to conduct Sentiment Analysis of texts;
- to formalize and present different types of relationships as a Social Network;
- to understand the structure and main characteristics of the whole analyzed Social Network as well as the specific roles of each of its actors;
- to conduct the structural and content analysis of Social Networking Sites.
- Social Network Analysis: Social Networks in real life. Basic concepts of Social Network Analysis. Network Centrality Measures. Community Detection Algorithms. Bipartite Networks.
- Introduction to Text Mining: Defining Text Mining. Main differences between Text Mining and Natural Language Processing. Text Mining application domains. Corpora. Bag-of-words representation of text. Vector Space Model.
- Methods/techniques for text pre-processing: Tokenization. Normalization. Understanding Zipf’s law. Creating a stop-word list. Stop-word removal. Stemming and Lemmatization. Part-of-speech tagging.
- Vector Space Model and Corpora representation: Document Term Matrix. Binary Weights. Term Frequency. Inverse Document Frequency. TF-IDF transformation.
- Text clustering: Text Clustering Applications. Similarity Measures for Text Mining. Euclidean distance. Hierarchical clustering. k-means clustering. Multidimensional Scaling (MDS). Cosine Similarity. Social Network theory in Text Mining: Adjacency Matrix, Cosine Similarity as a Weight of Graph Edges, Community Detection Algorithms within the Cosine Similarity Graph.
- Topic modeling: Discriminant and Probabilistic Methods. Dimensionality Reduction & Latent Semantic Analysis (LSA). Singular Value Decomposition (SVD). LSA-based Similarity Search. Latent Dirichlet Allocation (LDA). Hierarchical topical structure modeling.
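As a rough illustration of how the Vector Space Model topics above fit together, the following minimal sketch builds TF-IDF vectors for a toy corpus and compares documents by Cosine Similarity. It is not course material: the function names (`tokenize`, `tf_idf_vectors`, `cosine_similarity`), the three sample sentences, and the particular weighting (raw term frequency times the natural logarithm of N/df) are assumptions chosen for illustration, and other TF-IDF variants are equally common.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    # Real pre-processing would also normalize, remove stop words,
    # and stem or lemmatize.
    return text.lower().split()


def tf_idf_vectors(docs):
    # Build one TF-IDF vector per document over a shared, sorted vocabulary.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumed IDF variant: natural log of (N / df), without smoothing.
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)  # raw term counts in this document
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (invented for this sketch).
docs = [
    "great phone great battery",
    "battery life is great",
    "terrible service and slow delivery",
]
vocab, vectors = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # documents share terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # no shared terms
```

In this toy corpus the first two documents share the terms "great" and "battery", so their similarity is positive, while the first and third share no terms and have similarity zero. Such pairwise similarities are the values that could serve as edge weights in a Cosine Similarity graph.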
Robert A. Hanneman, Mark Riddle. Introduction to Social Network Methods. faculty.ucr.edu/~hanneman/nettext/
M. E. J. Newman. The structure and function of complex networks arxiv.org/pdf/cond-mat/0303516.pdf
Kate Ehrlich, Inga Carboni. Inside Social Network Analysis ppr.cs.dal.ca/sraza/files/social%20networks(1).pdf
Social Network Analysis Theory and Applications train.ed.psu.edu/WFED-543/SocNetTheoryApp.pdf
Margot Phaneuf. The sociogram, a complementary tool to the genogram and a means of enriching the interview. www.infiressources.ca/fer/Depotdocument_anglais/The_sociogram.pdf
David Easley, Jon Kleinberg. Networks, Crowds, and Markets. www.cs.cornell.edu/home/kleinber/networks-book/networks-book.pdf
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. An Introduction to Information Retrieval. Cambridge University Press, Cambridge, England, 2009. (http://nlp.stanford.edu/IR-book/html/htmledition/contents-1.html)
Daniel Jurafsky, James H. Martin. Speech and Language Processing. Draft of August 24, 2015. (https://web.stanford.edu/~jurafsky/slp3/19.pdf)
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284 (http://lsa.colorado.edu/papers/dp1.LSAintro.pdf)
Scott Deerwester, Susan T. Dumais, Richard Harshman. Indexing by Latent Semantic Analysis (http://lsa3.colorado.edu/papers/JASIS.lsi.90.pdf)
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, Sep 1990; 41, 6. (http://www.cob.unt.edu/itds/faculty/evangelopoulos/dsci5910/LSA_Deerwester1990.pdf)
Dian I. Martin, Michael W. Berry. Mathematical Foundations Behind Latent Semantic Analysis (http://mall.psy.ohio-state.edu/LexicalSemantics/MartinBerry2006.pdf)
Alex Thomo. Latent Semantic Analysis (Tutorial) (http://www.engr.uvic.ca/~seng474/svd.pdf)
David Tobinski, Oliver Kraft. Latent Semantic Analysis as Method for Automatic Question Scoring (http://ceur-ws.org/Vol-1100/paper9.pdf)
Barbara Rosario. Latent Semantic Indexing: An overview. INFOSYS 240 Spring 2000 Final Paper (http://www.cse.msu.edu/~cse960/Papers/LSI/LSI.pdf)
Latent Semantic Indexing (LSI): An Example (taken from Grossman and Frieder’s Information Retrieval: Algorithms and Heuristics) (http://www1.se.cuhk.edu.hk/~seem5680/lecture/LSI-Eg.pdf)
Cluster analysis: Basic concepts and algorithms. (http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf)
- Lectures,
- Workshops,
- Small Research Projects
English
Projects and Presentation
6
(180 h = 60 h contact time and 120 h self-study)