Dr. Amit Sheth
Amit P. Sheth
Founding Director, AI Institute@UofSC
NCR Chair and Professor, Computer Science & Engineering

Research Impact (October 2017)


Prof. Sheth has made deep, significant, and sustained contributions to the development and use of heterogeneous/federated databases and of semantic and knowledge-based techniques for search, browsing, interoperability, integration, and analysis of diverse (text, social, sensor, multimodal) content. This includes knowledge representation and engineering that often involves ontology/knowledge graph development and their use to transform data into wisdom (insight, decision making, action) along the DIKW theme. During the last decade, he has also done broad work on knowledge-enhanced extraction, NLP, and machine learning.

Database Interoperability/Integration and Database Federations

In the 1980s it became clear that large organizations needed to couple multiple autonomous databases to accomplish certain missions. At that time the problem was not yet well understood from a technical viewpoint. This area of heterogeneous distributed (federated) data management was the first area of Prof. Sheth's significant professional achievement. Starting in 1987, he gave the first, and the largest number of, tutorials in this area at ICDE, VLDB, SIGMOD, and other major conferences. Prof. Sheth became a leader in developing scientific foundations and architectural principles to address database interoperability. Although proposals for federated databases existed, it was Dr. Sheth who developed a clean reference architecture, covered in his best-cited paper on federated databases ["Federated database systems for managing distributed, heterogeneous, and autonomous databases," 1990, 3517 citations]. It provides a comprehensive architecture consisting of a range of tightly coupled (aka global as a view) to loosely coupled (aka local as a view) alternatives to deal with three key dimensions: distribution, heterogeneity, and autonomy. He led the development of the first schema integration tool in the USA ["A Tool for Integrating Conceptual Schemas and User Views," 1988, 145 citations].

Prof. Sheth analyzed the crucial limitations resulting from the autonomy of the individual databases, recognizing that deep integration was not a viable option. He overcame this dilemma by developing specification models for interdatabase dependencies, allowing just the necessary degree of coupling to ensure global consistency for critical applications ["Specifying interdatabase dependencies in a multidatabase environment," 1991, 157 citations]. With his collaborators Georgakopoulos and Rusinkiewicz, Prof. Sheth developed the first practically viable algorithm, the ticketing method, for concurrency control of global transactions that need to see and preserve a consistent state across multiple databases ["On Serializability of Multidatabase Transactions Through Forced Local Conflicts," 1991, 217 citations]. This seminal work was recognized with a best paper award at the Data Engineering Conference in 1991, was awarded a patent, and led to ground-breaking results on multidatabase transactions by other researchers.

His work continued in the areas of integration/interoperability, from networked databases in enterprises to Web-based database access [e.g., "Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics," 1998, 674 citations; "Semantic interoperability in global information systems," 1999, 259 citations]. He has been a leader in characterizing metadata and developing the techniques that extract and use metadata for integrated access to a variety of content ranging from databases to multimedia/multimodal data [e.g., "Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media," 1998, 103 citations; "Metadata for building the multimedia patch quilt," 1996, 72 citations; "Information Brokering Across Heterogeneous Digital Data: A Metadata-based Approach," 2002, 68 citations].

Semantics for Data and the Web

Initially motivated by the need for better integration and interoperability of data, Prof. Sheth introduced the need for semantics and KR to handle semantic/data interoperability, described the need for richer modeling, argued that entities alone are not enough, brought attention to relations ["Attribute relationships: An impediment in automating schema integration," 1989, 78 citations], and used a KL-ONE class of languages for modeling and reasoning ["On automatic reasoning for schema integration," 1993, 112 citations]. In 1992, he gave an influential keynote titled "So far (schematically) yet so near (semantically)" [317 citations of the related paper; also related: "Semantic and schematic similarities between database objects: a context-based approach," 1996, 558 citations], which attested to his early proposals of the need for domain-specific semantics and use of ontological representation for richer semantic modeling/KR, and the need for modeling and using context when looking for similarity between objects. All these constitute components of "semantic proximity," an influential and probably the broadest measure of similarity between objects in different databases. Over two decades, Prof. Sheth's teams in the startups he founded/managed and in academia have developed 50-100 private/commercial and public/open source ontologies for diverse domains (financial, clinical/health, biomedical, etc.), putting him at the forefront of practical ontology design, development, and use in the world.
His work spans the gamut of in-depth, handcrafted ontology development [e.g., "Knowledge modeling and its application in life sciences: a tale of two ontologies," 2006, 52 citations], ontologies for quality, performance evaluation, and provenance ["OntoQA: Metric-based ontology quality analysis," 2005, 258 citations; "SwetoDblp ontology of Computer Science publications," 2007, 82 citations; "Semantic provenance for escience: Managing the deluge of scientific data," 2008, 102 citations], fact/triple extraction from text to enhance/maintain an ontology ["Semantics driven approach for knowledge acquisition from EMRs," 2014, 16 citations], and extensive tooling for populated ontology management (see the next section). His work on using ontologies for information processing encompasses the first discussion of and approach to searching for an ontology ["Towards peer-to-peer semantic web: A distributed environment for sharing semantic knowledge on the web," 2002, 86 citations], automated reasoning for schema integration (above), semantic search, other applications (below), and semantic query processing. The last of these is significant as the first approach for query transformations involving different ontologies for user queries and resources, for federated queries (a concept that is now in vogue in this decade), and for the associated measures and techniques for computing information loss when traversing taxonomic relationships [e.g., "Semantics-based information brokering," 1994, 93 citations; "OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies," 1996, 935 citations].

Transactional Workflows, Distributed Workflow Management, and Semantic Web Services

In workflow management, Dr. Sheth was the first to combine transactional technology with a very expressive and mathematically well-founded specification language for defining the control flow and dependencies between activities in complex, long-running workflows. His seminal work applied temporal logic (CTL) to specify the dynamic behavior of workflows ["Specifying and Enforcing Intertask Dependencies," 1993, 356 citations]. This and other related papers set a new path in that era's popular research area of advanced transaction models ["Merging application-centric and data-centric approaches to support transaction-oriented multi-system workflows," 1993, 168 citations]. This body of work represented a breakthrough towards a more expressive and rigorously founded generation of process coordination ["Managing heterogeneous multi-system tasks to support enterprise-wide operations," 1995, 285 citations]. A survey paper provided a comprehensive understanding of the field ["An overview of workflow management: From process modeling to workflow automation infrastructure," 1995, 2377 citations].

Prof. Sheth's key technical areas of contribution in workflow management include adaptive workflow management ["A taxonomy of adaptive workflow management," 1998, 168 citations], exception handling ["Exception handling in workflow systems," 2000, 199 citations], authorization and access control ["Authorization and access control of application data in workflow systems," 2002, 115 citations], security, optimization, and quality of service ["Workflow quality of service," 2003, 153 citations]. His research matured into very successful system-building projects, including Dr. Sheth's own METEOR system, which included perhaps the first robust ORB-based ["CORBA-based run-time architectures for workflow management systems," 1996, 163 citations] and Web-based distributed workflow systems ["WebWork: METEOR2's web-based workflow management system," 1998, 190 citations]. Prototypes of these systems were used in graduate courses worldwide, and he licensed a commercial counterpart to start Infocosm, Inc., which commercialized it and licensed it to several major companies. This work also strongly influenced industry-leading solutions such as Microsoft's BizTalk. Several operational applications ensued in the areas of biomedicine ["IntelliGEN: A distributed workflow system for discovering protein-protein interactions," 2003, 107 citations] and healthcare [e.g., "Supporting state-wide immunization tracking using multi-paradigm workflow technology," 1996, 87 citations].

When process-centric applications later became important in the context of Web Services, Prof. Sheth again played a key role in shaping the semantics of process specification languages. Prof. Sheth was one of the earliest and the second most cited authors (following Prof. K. Sycara) on the topic of using semantics to improve services interoperability. In a 2003 keynote, he outlined the role of semantics in the entire lifecycle: representation, annotation, discovery, composition, and orchestration/execution. This culminated in the development of the METEOR-S system, which is likely the best-cited semantic Web process or workflow system based on semantic web services ["METEOR-S web service annotation framework," 2004, 639 citations; "Semantic e-workflow composition," 2003, 555 citations; "METEOR-S WSDI: A scalable p2p infrastructure of registries for semantic publication and discovery of web services," 2005, 627 citations; "The METEOR-S approach for configuring and executing dynamic web processes," 2005, 159 citations]. He conceived of WSDL-S ["Adding semantics to web services standards," 2003, 533 citations; "Web Service Semantics: WSDL-S," 2005, 604 citations] and engaged IBM to jointly propose it to the W3C, where it subsequently became the W3C recommendation (standard) SAWSDL. He also conceived of and proposed SA-REST ["SA-REST: Semantically interoperable and easier-to-use services and mashups," 2007, 216 citations] and submitted it to W3C. With over 25 publications in the 101 to 2294 citation range, he remains among the most cited authors in this area.

Relationships at the Heart of Semantics and the Semantic Web

Prof. Sheth, highly influenced by Vannevar Bush's "As we may think", especially the trailblazing concept, has been one of the most notable researchers on the topic of relationships in the context of diverse data and the Web. This is captured by his keynotes ["Relationships at the heart of semantic web: Modeling, discovering, and exploiting complex semantic relationships," 2003; 191 citations for corresponding paper], and a significant body of publications, including work related to the first effort in defining paths and subgraphs in RDF as covered in a series of papers in WWW ["Ranking complex relationships on the semantic web," 2005, 184 citations; "ρ-Queries: enabling querying for semantic associations on the semantic web," 2003, 267 citations; "SemRank: ranking complex relationship search results on the semantic web," 2005, 324 citations; "SPARQ2L: towards support for subgraph extraction queries in RDF databases," 2007, 138 citations] and on finding meaningful complex relationships in graphs ["Context-aware semantic association ranking," 2003, 213 citations; "Discovering informative connection subgraphs in multi-relational graphs," 2005, 103 citations; "A framework for schema-driven relationship discovery from unstructured text," 2006, 70 citations]. He also led the use of the above semantic techniques in numerous scientific and real-world applications [e.g., "An ontology-driven semantic mashup of gene and biological pathway information: Application to the domain of nicotine dependence," 2008, 92 citations]. This also included the first demonstration of a real-world application for W3C Semantic Web for Healthcare and Life Sciences in 2003 and the first semantic electronic medical record system, which has been operationally deployed since 2005 at the Athens Heart Center ["Active semantic electronic medical records," 2006, 29 citations].
The development of a large medical knowledge graph under his guidance, and its use in understanding clinical text in Electronic Medical Records using knowledge graph-enhanced NLP ["Data driven knowledge acquisition method for domain knowledge enrichment in the healthcare," 2013, 11 citations], have also led to the development of computer-assisted (ICD10) coding and computerized document improvement applications by EZDI (ezdi.com), of which he is a cofounder and the chief technology advisor.

Semantics & Knowledge-empowered Information Extraction/NLP/ML, Search, Browsing & Analysis

A sustained area of Prof. Sheth's major contributions is in semantic search, browsing, and analysis, where in addition to research, he has had substantial entrepreneurial success and industry impact. In 1993, he initiated InfoHarness, a system that extracted metadata from diverse content (news, software code, and requirements documents) and pioneered a Mozilla browser-based faceted search (first presented at WWW2004, also "InfoHarness: Use of automatically generated metadata for search and retrieval of heterogeneous information," 2005, 104 citations). This system transitioned into a product by Bellcore in 1995. Bellcore's management did not take his advice to market it to the rapidly growing internet/Web market and focused on serving existing "baby Bell" customers (a space later occupied by Excalibur and Verity). This was followed by a metadata-based search engine for a personal, electronic program guide and Web-based videos for a cable set-top box ["VideoAnywhere: a system for searching and managing distributed heterogeneous video assets," 1999, 15 citations]. He licensed this technology, which he had developed at a university, to Taalee, established in 1999 as the first Semantic Web technology startup, the same year Tim Berners-Lee coined the term Semantic Web. In the first keynote on Semantic Web given anywhere ["Semantic Web and Information Brokering: Opportunities, Early Commercialization, and Challenges," 2000], he presented Taalee's commercial implementation of a semantic search engine, covered in depth in the first ever patent involving Semantic Web technology: "System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising" (filed in 2000, awarded in 2001, 562 citations). One can observe in it significant innovations that have since been made mainstream (especially since Google's introduction of semantic search in 2013).
This 1999-2001 incarnation of semantic search (as described in the patent document) started with extensive tooling to create an ontology/WorldModel™ (today's Knowledge Graph): to design a schema and then automatically extract (through knowledge extraction agents) and incorporate knowledge from multiple high-quality sources to populate the ontology and keep it fresh. This involved extensive machinery for disambiguation to identify what is new and what has changed. One commercial version (also marketed as MediaAnywhere) covered about 25 subdomains (news/politics, sports, entertainment, etc.). Then the data extraction agents, which supported diverse content either pulled (crawled) or pushed (e.g., syndicated news in NewsML), called upon a nine-classifier committee (using Bayesian, HMM, and probably the first ever commercially deployed knowledge-based classifiers) to determine the domains of the content, identify the relevant subset of the ontology to use, and perform semantic annotation. "Semantic Enhancement Engine: A Modular Document Enhancement Platform for Semantic Applications over Heterogeneous Content" (2002, 98 citations) is likely the first publication demonstrating the unusual effectiveness of knowledge-based classifiers compared with more traditional ML techniques. The third component of the system utilized ontology and metadata (annotation) to support semantic search, browsing, profiling, personalization, and advertising. This system also supported a dynamically generated "Rich Media Reference" (aka Google's InfoBox) that not only displayed metadata about the searched entity pulled from the ontology and metabase but also provided what was termed "blended semantic browsing and querying" ["Managing semantic content for the Web," 2002, 249 citations]. Observing how modern semantic search leveraging knowledge graphs has evolved (attested by the semantic search engines in this decade), I would argue that Prof. Sheth's realization of Semantic Web has materialized far better than the intelligent agent-centric vision that would do automatic travel arrangements, as portrayed in the very famous Scientific American article (Berners-Lee et al., 2001).
Complementing the focus on traditional text, Prof. Sheth led efforts in other forms/modalities of data, including social and sensor data. He coined the term "Semantic Sensor Web" (2008, 637 citations), developed the first set of prototypes and applications ["SemSOS: Semantic sensor observation service," 2009, 204 citations], and initiated and co-chaired the W3C effort on Semantic Sensor Networking ["The SSN ontology of the W3C semantic sensor network incubator group," 2012, 767 citations] that resulted in a de facto standard. Prof. Sheth introduced the concept of semantic perception to reflect the need to convert massive amounts of IoT data into higher level abstractions to support human cognition and perception in decision making. It involves the IntellegO ontology-enabled abductive and deductive reasoning framework for iterative hypothesis refinement and validation ["Semantic perception: Converting sensory observations to abstractions," 2012, 69 citations], and is being applied in a growing number of NIH-funded clinical studies for personalized digital health in his kHealth program (currently applied to Asthma in Children, Dementia, and GI surgery).

During the past 10 years, a growing body of his work has also sought to advance text mining, NLP, and machine learning techniques, usually enhanced by use of linguistic, geospatial, domain, and other knowledge, for various issues in understanding and exploiting user generated/social media content (e.g., "Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection," 2006, 201 citations; "Citizen sensing, social signals, and enriching human experience," 2009, 171 citations; "Harnessing Twitter 'big data' for automatic emotion identification," 2012, 140 citations; "What kind of #conversation is Twitter? Mining #psycholinguistic cues for emergency coordination," 2013, 34 citations; "User interests identification on Twitter using a hierarchical knowledge base," 2014, 66 citations; "Extracting city traffic events from social streams," 2015, 36 citations). Unusual insights derived from social media have even led to a subsequent FDA advisory ["'I just wanted to tell you that loperamide WILL WORK': a web-based study of extra-medical use of loperamide," 2013, 59 citations]. Once more, Prof. Sheth led development of a technology, Twitris, licensed it, and commercialized it to (co)found his third startup, Cognovi Labs. Prof. Sheth coined the term "Smart Data" in 2004 to describe enhancing the value of data; he has given several keynotes on this topic, and the term has now gained extensive use in industry. The first recorded use and definitions of "citizen sensing" (2008) and "semantic perception" (2010 - 2012) are also attributable to Prof. Sheth.

Visions That Drive Future Research

Starting in 2008, Prof. Sheth laid out his vision of Computing for Human Experience in several invited talks that take a distinctly human-centric (as opposed to machine-centric) view of AI ["Computing for human experience: Semantics-empowered sensors, services, and social computing on the ubiquitous web," 2010, 49 citations]. It advocates technology that serves, assists, and cooperates with humans to nondestructively and unobtrusively complement and enrich normal activities, with minimal explicit concern or effort on the humans' part.

More recently, Prof. Sheth has been combining top-down, symbolic path processing with bottom-up, sub-symbolic path processing. His first discussion on this topic appears in "Semantics for the semantic web: The implicit, the formal and the powerful" (2005, 219 citations); algorithmic developments and demonstrations were done in the context of "smart data" and "semantic perception" (mentioned above), and the topic was more recently explored in a vision piece, "Semantic, cognitive, and perceptual computing: Paradigms that shape human experience" (2016, 5 citations).

Summing It Up

Based on h-index (his is 97), he is and has been among the top 100 Computer Scientists in the world. Only a handful in this top 100 have substantial contributions relevant to AI. On aminer.org, a major bibliometric database, he is listed in 1st place on these topics/keywords: Semantic Web, Semantic Technology, Semantic Computing, Semantic Sensor Web, Semantic Information. He ranks among the top 5 on topics such as Semantic Social Web, Data Interoperability, and Semantic Web Services. Over 50 of his AI-focused publications appear in WWW, International Semantic Web Conference, VLDB, Web Intelligence, ICWSM, ESWC, IEEE Intelligent Systems, AAAI, AI Magazine, and numerous journals. That he is regarded among the top thinkers in the field is attested by his over 50 keynotes, often at the same venues as those featuring other top leaders (e.g., at IEEE Big Data, where Tom Mitchell also presented; at Web Intelligence, where Turing Award winner Raj Reddy was also featured). His 75+ activities as conference program/general/organization (co)chair and his service on 230+ program committees include serving as a co-chair of the "Semantic Web" and "Semantics and Knowledge" tracks at the WWW and ISWC conferences.
Address
Professor Amit P. Sheth
Artificial Intelligence Institute
Department of Computer Science & Engineering
University of South Carolina