Research Group for Applied Software Engineering
Forschungsgruppe für Angewandte Softwaretechnik

Type: Master's seminar / Hauptseminar (IN2107, IN8901)
Weekly hours per semester (SWS): 2+0

Managing unstructured data - Mining databases and the web (WS 10/11)

Lecturers

Teaching Assistants

  • Emitza Guzman
  • Tobias Röhm

Summary

The spread of the Internet means that ever larger amounts of data are produced and stored, and this data is heterogeneous and dynamic rather than structured and static. Extracting interesting and relevant data and interrelations from it is becoming increasingly challenging for companies, organizations and individuals, both because of the sheer volume of data and because of the heterogeneity of its structure.
This seminar covers methods for addressing this problem from different disciplines, such as data mining, machine learning, business intelligence and the Semantic Web.
We will examine data representations such as relational databases, XML, RDF and ontologies, as well as the inference and deduction mechanisms that operate on this data to derive information and interrelations. Examples of such mechanisms are classification algorithms and web search.

Organizational Issues

  • Presentations will take place on 31.01.11 (TUM) and 01.02.11 (TNS Infratest)
  • Talks will be held in English; the report can be written in German or English
  • The preliminary meeting took place on 26.10. A few topics are still available (see below); please contact roehm [AT] in.tum.de
  • Maximum of 12 participants

Modalities

You will get a certificate with a grade based on the following criteria:
  • Ability to do independent research
  • Oral presentation (30 minutes)
  • Written term paper (10 pages of text)
  • Active participation in all other presentations (attendance is compulsory)

Topics


Data Warehousing and Business Intelligence

| Date | Presenter | Topic | Required Readings |
|  | Benedikt Lell | Data Warehouses: ETL, data cubes / cube operator | Kemper: Datenbanken, tba |

Data Mining & Machine Learning

| Date | Presenter | Topic | Required Readings |
|  | Aparna Halder | Clustering: K-Means, hierarchical clustering, probability-based clustering | Witten: Data Mining, tba |
|  |  | Classification 1: Naive Bayes, Bayesian networks | Witten: Data Mining, tba |
|  | Thomas Varsa | Classification 2: Classification rules, decision trees | Witten: Data Mining, tba |
|  |  | Classification 3: k-nearest neighbour, support vector machines | Witten: Data Mining, tba |
|  |  | Regression: Linear regression, neural nets | Witten: Data Mining, tba |
|  |  | Correlation: Association rules, linear correlation | Witten: Data Mining, Runkler: Data Mining, tba |

Web Mining & Semantic Web

| Date | Presenter | Topic | Required Readings |
|  | Gregor Schneider | Information Retrieval & Web Search: Pre-processing, indexing, web crawling, web search | Liu: Web Data Mining, tba |
|  | Felix Menzel | Web Mining 1: Link analysis (PageRank, ...) | Liu: Web Data Mining, tba |
|  | Yuliya Zhilinskaya | Web Mining 2: Opinion mining | Liu: Web Data Mining, tba |
|  | Benjamin Derwell | Web Mining 3: Web usage mining | Liu: Web Data Mining, tba |
|  | Nigar Gurbanova | Semantic Web 1: Semantics, knowledge representation, RDF, taxonomies | Semantic Web Primer, tba |
|  | Mario Guma | Semantic Web 2: Knowledge representation with ontologies, OWL, DL inference | Semantic Web Primer, tba |
|  |  | Learning of ontologies using machine learning techniques | tba |