Amit P. Sheth
Founding Director, AI Institute@UofSC; NCR Chair and Professor, Computer Science & Engineering
On Relationships-centric Views of Semantics: A Brief Research Review
"The process of tying two items together is the important thing." -Vannevar Bush, in his seminal article, As We May Think (The Atlantic Monthly, July 1945).
Bush pointed out the inadequacy of contemporary indexing systems at mimicking the "natural" way we as humans seek out information. He stressed that the human mind works by association: "It [the brain] operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association..." Two decades of our research on the semantics of information have been influenced by the importance of relationships as underlined by Dr. Bush and several other visionaries (e.g., William Woods' "What's in a Link" [Woods1975]). We have recognized that relationships are at the heart of semantics [Sheth2002], observed the changing focus from documents to entities and on to relationships [Sheth2003], and investigated a broad variety of issues related to modeling, validating, discovering and exploiting various types of relationships between entities in content [Sheth+2003].
Our earliest focus on relationships was in terms of mappings that deal with semantic heterogeneity in order to achieve semantic interoperability and schema integration [Sheth-Larson1990, Sheth+1988, BERDI]. Over a decade ago, we introduced a comprehensive definition of semantic proximity [Sheth-Kashyap 92, Kashyap-Sheth 96] to address the difficulty of modeling a notion that is also termed semantic similarity or semantic distance. Semantic metadata is key both to the Semantic Web and to techniques that support semantic relationships. Early work on domain-specific metadata annotation and search, which led to the earliest commercial product of this type (from Bellcore in 1995), came from our InfoHarness system [Shklar+1995] [Shah-Sheth99].
In most of the examples in his article, Dr. Bush describes what we term implicit relationships. We believe that there is a lot of merit in the use of explicit, named relationships for the purpose of resource organization. Extracting such relationships from documents has received considerable interest in the fields of computational linguistics and information retrieval. Despite a few decades of research, it remains a very hard problem to solve. One outcome of our research thus far is our ability to create large instance bases for ontologies from multiple trusted sources [Sheth2004]. We have created many populated ontologies in academic and commercial settings, in which schemas are populated with corresponding knowledge bases containing millions of entity instances and the relationship instances linking them. Some of these are being made publicly available (e.g., SWETO, GlycO, ProPreO), while technologies such as Semagix Freedom have been used to create focused domain ontologies with as many as 14 million entity instances. Such populated ontologies significantly enhance our ability to identify potential relationships from unstructured data as well as from distributed information resources.
Several voices have questioned the viability of, reliance on, and adequacy of formal ontologies. One perspective, represented by Google's Peter Norvig (see AOblog), questions the viability of the Semantic Web on the grounds that creating domain ontologies is impractical; we have rebutted this view (see ShethBlog). A more insightful view, that of Lotfi Zadeh [Zadeh2003], concerns the (in-)adequacy of crisp logics for question-answering systems, which demand far more than Web search engines. More recently, we have explored the dimensions of implicit, formal and powerful semantic representations [Sheth+2005b], and investigated uses of formal and semi-formal ontologies [Sheth-Ramakrishnan2003]. When focusing on deep domain semantics in biology (e.g., GlycO and ProPreO), we explored the need for more expressive probabilistic modeling of relationships, for semantic annotation of scientific experiment data (in addition to textual data), and for integrated discovery over scientific literature and scientific experiment results.
Amit Sheth, September 2005