Title: Semantics-based Keyword Search over XML and Relational Databases
Speaker: Prof Ling Tok Wang, School of Computing, NUS, Singapore.
Date: 19 December 2013 (Thursday)
Time: 11:00 AM – 12:00 PM
Venue: West Room, 6th Floor, SA, UTAR KL Campus, Setapak, Kuala Lumpur
Date: 20 December 2013 (Friday)
Time: 11:00 AM – 12:00 PM
Venue: NF-023, FICT, UTAR Perak Campus, Kampar, Perak.
Contact person: Tay Yong Haur
Current approaches to XML keyword search are structure-based: they rely mainly on exploring the structure of the XML data. They can be classified as tree-based or graph-based. Tree-based search is used when an XML document is modeled as a tree, i.e. without ID references (IDREFs), while graph-based search is used for XML documents with IDREFs. Almost all tree-based approaches are based on some variation of LCA (Lowest Common Ancestor) semantics, such as SLCA and ELCA. Because they are unaware of the real semantics of the XML data, these LCA-based approaches suffer from several serious limitations: meaningless answers, duplicated answers, missing answers, and answers that depend heavily on the hierarchical structure of the XML data.
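To make the LCA idea concrete, the following is a minimal brute-force sketch of SLCA (smallest LCA) search over a tiny XML tree. It is an illustration only, not the speaker's algorithm or any published implementation: the sample document, the matching rule (a keyword matches a tag name or a word in an element's text), and the function names dewey_labels, matches, common_prefix and slca are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical sample document (not from the talk).
doc = ET.fromstring("""
<school>
  <student><name>Anna</name><course>Database</course></student>
  <student><name>Ben</name><course>Networks</course></student>
</school>
""")

def dewey_labels(root):
    """Map each Dewey label (tuple of child positions from the root) to its element."""
    labels = {}
    def walk(node, label):
        labels[label] = node
        for i, child in enumerate(node):
            walk(child, label + (i,))
    walk(root, ())
    return labels

def matches(node, keyword):
    """Assumed matching rule: keyword equals the tag or a word of the element's text."""
    words = (node.text or "").lower().split()
    return keyword.lower() == node.tag.lower() or keyword.lower() in words

def common_prefix(labels):
    """The LCA of several nodes is the longest common prefix of their Dewey labels."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)

def slca(root, keywords):
    """Return LCAs of keyword matches that have no other such LCA as a descendant."""
    labels = dewey_labels(root)
    match_lists = [[l for l, n in labels.items() if matches(n, k)] for k in keywords]
    if any(not m for m in match_lists):
        return []
    # Brute force: one LCA per combination of matches (exponential, for clarity only).
    lcas = {common_prefix(combo) for combo in product(*match_lists)}
    smallest = [l for l in lcas
                if not any(o != l and o[:len(l)] == l for o in lcas)]
    return [labels[l] for l in sorted(smallest)]

print([e.tag for e in slca(doc, ["Anna", "Database"])])
print([e.tag for e in slca(doc, ["Anna", "Networks"])])
```

In this sample, the query "Anna Database" returns the first student element, while "Anna Networks" returns the school root, because the two keywords occur under different students. The second result is structurally valid but tells the user nothing, which is the kind of meaningless answer described above.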
Current approaches to keyword search on relational databases can be classified as data-graph-based and schema-graph-based. Data-graph-based keyword search treats a relational database as a data graph, in which each tuple is represented as a node and each foreign key-key reference between two tuples is represented as an edge between the corresponding nodes. An answer to a user's keyword query is defined as a minimal connected subgraph containing nodes that match the keywords in the query. This kind of graph search is equivalent to the Steiner tree problem, which is NP-complete. Research in relational keyword search has focused on the efficient computation of answers from multiple tuples and on strategies to rank and output the most relevant ones. Existing relational keyword search techniques suffer from returning incomplete, duplicated, and meaningless answers. Moreover, the answers depend heavily on the schema of the relational database, and the intuitive meaning of the returned Steiner trees is difficult to interpret.
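The data graph model can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not one of the techniques surveyed in the talk: the Student, Course and Enrol tuples, the edge list, and the function names are hypothetical, and the search handles only the two-keyword case, where a minimum-edge connecting subgraph is just a shortest path and can be found by breadth-first search. The general multi-keyword case is the Steiner tree problem and needs other methods.

```python
from collections import deque

# Hypothetical tuples: node id -> text content of the tuple.
tuples = {
    "Student:s1": "s1 Anna",
    "Student:s2": "s2 Ben",
    "Course:c1": "c1 Database",
    "Course:c2": "c2 Networks",
    "Enrol:s1-c1": "s1 c1 A",
    "Enrol:s2-c1": "s2 c1 B",
    "Enrol:s2-c2": "s2 c2 A",
}

# Hypothetical foreign key-key references (referencing tuple, referenced tuple).
fk_edges = [
    ("Enrol:s1-c1", "Student:s1"), ("Enrol:s1-c1", "Course:c1"),
    ("Enrol:s2-c1", "Student:s2"), ("Enrol:s2-c1", "Course:c1"),
    ("Enrol:s2-c2", "Student:s2"), ("Enrol:s2-c2", "Course:c2"),
]

def build_graph(edges):
    """Build an undirected adjacency map: each tuple is a node, each reference an edge."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def matching_nodes(tuples, keyword):
    """Nodes whose tuple text contains the keyword as a whole word."""
    return [n for n, text in tuples.items() if keyword.lower() in text.lower().split()]

def shortest_path(graph, sources, targets):
    """Multi-source BFS: a shortest path from any source node to any target node."""
    targets = set(targets)
    queue = deque((s, [s]) for s in sources)
    seen = set(sources)
    while queue:
        node, path = queue.popleft()
        if node in targets:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # the keywords cannot be connected

def two_keyword_answer(tuples, edges, k1, k2):
    graph = build_graph(edges)
    return shortest_path(graph, matching_nodes(tuples, k1), matching_nodes(tuples, k2))

print(two_keyword_answer(tuples, fk_edges, "Anna", "Database"))
print(two_keyword_answer(tuples, fk_edges, "Anna", "Ben"))
```

In this sample data, "Anna Database" connects the student to the course through one Enrol tuple, while "Anna Ben" connects the two students through the enrolments of a shared course. Reading such a chain of tuples back as a meaningful answer is left to the user, which hints at the interpretation difficulty noted above.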
We systematically identify the mismatches between the answers returned and the expectations of common users of keyword search in XML and relational databases. Through detailed analysis of these mismatches, we found that their main cause is unawareness of the semantics of objects, relationships, and attributes of objects/relationships in the databases. We refer to these as ORA-semantics. In particular, unawareness of objects causes missing answers, duplicated answers, and meaningless answers.
In this talk, we will discuss how ORA-semantics can be used to overcome the above-mentioned problems in existing keyword search approaches and how to improve the effectiveness and performance of keyword search.
Prof. Dr. Ling Tok Wang, NUS, Singapore
His current research interests include Database Modeling, Semi-Structured Data Model, XML Twig Pattern Query Processing, and XML and Relational Database Keyword Query Processing.
He serves/served on the steering committees of 4 international conferences, including ER and DASFAA. He served as Conference Co-chair of 10 international conferences, including ER 2004, DASFAA 2005, SIGMOD 2007, and VLDB 2010. He served as Program Committee Co-chair of 6 international conferences, including DASFAA 1995 and ER 1998, 2003, and 2011.
He received the ACM Recognition of Service Award in 2007, the DASFAA Outstanding Contributions Award in 2010, and the Peter P. Chen Award in 2011. He is an ER Fellow.