CCIS Research Blog: CCIS Talk: Semantics-based Keyword Search over XML and Relational Databases by Prof Ling Tok Wang

CCIS invites all to attend the following talks by Prof Ling Tok Wang:

Title: Semantics-based Keyword Search over XML and Relational Databases
Speaker: Prof Ling Tok Wang, School of Computing, NUS, Singapore.

Date: 19 December 2013 (Thursday)
Time: 11:00 AM – 12:00 PM
Venue: West Room, 6th Floor, SA, UTAR KL Campus, Setapak, Kuala Lumpur

Date: 20 December 2013 (Friday)
Time: 11:00 AM – 12:00 PM
Venue: NF-023, FICT, UTAR Perak Campus, Kampar, Perak.

Admission: FREE
Contact person: Tay Yong Haur

Abstract

Keyword search over XML and relational databases, as opposed to traditional structured querying, has been widely studied in recent years. Users are freed from learning query languages and database schemas: they simply issue a few keywords to query the database. Unlike traditional structured query languages, which can represent a user’s query precisely, a keyword query contains only keywords, which may have different interpretations and cannot capture the user’s intention precisely.

Current approaches to XML keyword search are structure-based because they rely mainly on exploring the structure of the XML data. They can be classified as tree-based or graph-based search. Tree-based search is used when an XML document is modeled as a tree, i.e. without ID references (IDREFs), while graph-based search is used for XML documents with IDREFs. Almost all tree-based approaches are based on some variation of LCA (Least Common Ancestor) semantics, such as SLCA and ELCA. Because they are unaware of the real semantics of the XML data, these LCA-based approaches suffer from several serious limitations, including meaningless answers, duplicated answers, missing answers, and answers that depend heavily on the hierarchical structure of the XML data.
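To make the tree-based idea concrete, here is a minimal Python sketch of SLCA-style matching on a toy document. It is an illustration only, not an algorithm from the talk: the document, the element names, and the helper `slca` are all hypothetical, and the matching rule (a keyword matches a tag name or a word in a node's text) is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names and values are illustrative only.
DOC = """
<university>
  <course>
    <title>Database</title>
    <student><name>Alice</name></student>
    <student><name>Bob</name></student>
  </course>
  <course>
    <title>Networks</title>
    <student><name>Alice</name></student>
  </course>
</university>
"""


def slca(root, keywords):
    """Return nodes whose subtree contains every keyword while no
    descendant's subtree does (a simple reading of SLCA semantics)."""
    keywords = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Keywords matched by this node's own tag or text (assumed rule).
        words = (node.text or "").lower().split()
        found = {k for k in keywords if k == node.tag.lower() or k in words}
        child_has_answer = False
        for child in node:
            child_found, child_answer = visit(child)
            found |= child_found
            child_has_answer = child_has_answer or child_answer
        covers = found == keywords
        if covers and not child_has_answer:
            results.append(node)
        # Report upward whether this subtree already contains an answer.
        return found, covers or child_has_answer

    visit(root)
    return results


root = ET.fromstring(DOC)
print([n.tag for n in slca(root, ["Alice", "Bob"])])
print([n.tag for n in slca(root, ["Bob", "Networks"])])
```

In this toy data, the query "Alice Bob" is answered by the first course element, which seems reasonable. The query "Bob Networks" is answered by the document root, because Bob and Networks occur only in different courses. That is one way a purely structural method can return an answer of little meaning to the user.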

Current approaches to keyword search on relational databases can be classified as data graph based and schema graph based. Data graph based keyword search models a relational database as a data graph, in which each tuple in the database is represented as a node and each foreign key-key reference between two tuples is represented as an edge between the two corresponding nodes. An answer to a user keyword query is defined as a minimal connected subgraph that contains nodes matching the keywords in the query. This sort of graph search is equivalent to the Steiner tree problem, which is NP-complete. Research in relational keyword search has focused on the efficient computation of answers from multiple tuples, as well as on strategies to rank and output the most relevant ones. Existing relational keyword search techniques suffer from the problems of returning incomplete, duplicated, and meaningless answers. Moreover, the resulting answers depend heavily on the schema of the relational database, and it is difficult to interpret the intuitive meaning of the returned Steiner trees as answers.
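The data graph view can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not a technique from the talk: the tuples, the edges, and the function names are hypothetical, "minimal" is read here as "fewest tuples", and the exhaustive enumeration is only workable on tiny inputs, which is consistent with the hardness of the underlying problem.

```python
from itertools import combinations

# Hypothetical toy database: tuple id -> searchable text of the tuple.
TUPLES = {
    "s1": "student alice",
    "s2": "student bob",
    "c1": "course database",
    "e1": "enrol",
    "e2": "enrol",
}
# Undirected edges standing in for foreign key-key references between tuples.
EDGES = {("e1", "s1"), ("e1", "c1"), ("e2", "s2"), ("e2", "c1")}


def is_connected(nodes):
    """Check that the chosen tuples form one connected piece of the data graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for a, b in EDGES:
            for u, v in ((a, b), (b, a)):
                if u == cur and v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == nodes


def covers(nodes, keywords):
    """Every keyword must match a word in at least one chosen tuple."""
    return all(any(k in TUPLES[n].split() for n in nodes) for k in keywords)


def smallest_answers(keywords):
    """Return all connected tuple sets of minimum size that cover the keywords."""
    keywords = [k.lower() for k in keywords]
    ids = sorted(TUPLES)
    for size in range(1, len(ids) + 1):
        found = [set(c) for c in combinations(ids, size)
                 if covers(c, keywords) and is_connected(c)]
        if found:
            return found
    return []


print(smallest_answers(["Alice", "Database"]))
print(smallest_answers(["Alice", "Bob"]))
```

For "Alice Database" the sketch returns the student, enrolment, and course tuples joined together. For "Alice Bob" it returns all five tuples, since the two students are connected only through their enrolments in the same course; a user then has to work out what that connecting subgraph means.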

We identify the mismatches between the answers returned and what ordinary users expect from keyword search over XML and relational databases. Through detailed analysis of these mismatches, we found that their main cause is unawareness of the semantics of objects, relationships, and attributes of objects/relationships in the databases. We refer to these as ORA-semantics. In particular, unawareness of objects causes missing answers, duplicated answers, and meaningless answers.

In this talk, we will discuss how ORA-semantics can be used to overcome the above-mentioned problems in existing keyword search approaches and how to improve the effectiveness and performance of keyword search.

Speaker

Prof. Dr. Ling Tok Wang, NUS, Singapore

Dr Ling Tok Wang is a professor of Computer Science at the National University of Singapore. He was Head of IT Division, Deputy Head of the Department of Information Systems and Computer Science, and Vice Dean of the School of Computing. He received his PhD and M.Math, both in Computer Science, from the University of Waterloo (Canada), and his BSc in Mathematics from Nanyang University (Singapore).

His current research interests include Database Modeling, Semi-Structured Data Models, XML Twig Pattern Query Processing, and XML and Relational Database Keyword Query Processing.

He serves or has served on the steering committees of 4 international conferences, including ER and DASFAA. He served as Conference Co-chair of 10 international conferences, including ER 2004, DASFAA 2005, SIGMOD 2007, and VLDB 2010. He served as Program Committee Co-chair of 6 international conferences, including DASFAA 1995 and ER 1998, 2003, and 2011.

He received the ACM Recognition of Service Award in 2007, the DASFAA Outstanding Contributions Award in 2010, and the Peter P. Chen Award in 2011. He is an ER Fellow.