Wednesday, 10 January 2007
Integror is a group of researchers at the Department of Information Systems of Poznan University of Economics interested in different levels of data and information extraction and integration on the Web, ranging from the integration of structured and semi-structured data from information-intensive web sites and Hidden Web sources to the intuitive, semi-automatic visual integration of unstructured content blocks. The integration work builds on innovative and robust formalisms for addressing content within a page and for describing navigational paths.
The first formalism is based on relative XPath addressing and is visibly more stable than state-of-the-art schemes based on absolute XPath. Applying it to information integration made it possible to create myPortal, an intuitive, user-friendly and robust application for building personalized portals from logical content blocks extracted from pre-defined pages. With myPortal, two clicks are enough to create a content block extraction rule; the extracted blocks can then be composed into an integrated information view. The method proved highly resilient to changes when tested on content from the home pages of several portals. Several publications (including demonstrations at the VLDB and WWW conferences) describe myPortal in more detail.
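The difference between absolute and relative addressing can be illustrated with a minimal sketch. It does not reproduce myPortal's actual rule format, which is not described here; the two page versions, the "News" reference point and the function names are all assumptions made for illustration. The absolute address counts positions from the document root, while the relative one first locates a reference point (a heading with known text) and addresses the content block from there.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same portal page (well-formed XHTML).
# In version 2 an advertisement block has been inserted before the content.
PAGE_V1 = """<html><body>
<div><h2>Weather</h2><p>Sunny</p></div>
<div><h2>News</h2><p>Headline A</p></div>
</body></html>"""

PAGE_V2 = """<html><body>
<div><p>Advertisement</p></div>
<div><h2>Weather</h2><p>Sunny</p></div>
<div><h2>News</h2><p>Headline B</p></div>
</body></html>"""


def absolute_lookup(root):
    """Address the block by position from the root: second div under body."""
    node = root.find("./body/div[2]/p")
    return node.text if node is not None else None


def relative_lookup(root, label):
    """Address the block relative to a reference point: the div whose
    h2 heading has the given text, wherever that div sits on the page."""
    node = root.find(f".//div[h2='{label}']/p")
    return node.text if node is not None else None


if __name__ == "__main__":
    for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        root = ET.fromstring(page)
        print(name, "absolute:", absolute_lookup(root))
        print(name, "relative:", relative_lookup(root, "News"))
```

On the first version both addresses reach the news block. After the layout change the positional address silently returns the weather block, whereas the address anchored on the heading still returns the news block.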
The second formalism, based on an FSA (finite-state automaton) description of user navigation and on the pumping lemma, together with relative XPath gave rise to DWDI, an application that integrates data from semi-structured and structured Web sources accessed through navigation (browsing) and forms. With DWDI, a recorded user navigation path can be used to describe the navigation pattern of a Web or Hidden Web source, while relative XPath describes the location of data blocks on the page.
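As a rough sketch of the idea, a recorded navigation path can be turned into a small automaton whose states are page types and whose transitions are user actions. This is not DWDI's actual model, which is not specified here; the page-type labels, action names and function names below are hypothetical. A step that returns to the same state (such as "next page" on a result list) becomes a loop, so one recorded visit generalizes to any number of repetitions, which is the intuition the pumping lemma supplies.

```python
def build_fsa(recorded_path):
    """Build a transition table from recorded (source, action, target) steps."""
    transitions = {}
    for source, action, target in recorded_path:
        transitions[(source, action)] = target
    return transitions


def accepts(transitions, start, accepting, actions):
    """Return True if the action sequence leads from start to an accepting state."""
    state = start
    for action in actions:
        key = (state, action)
        if key not in transitions:
            return False  # the recorded pattern does not allow this step
        state = transitions[key]
    return state in accepting


# Hypothetical recording of one browsing session on a form-based source.
RECORDED = [
    ("search_form", "submit", "result_list"),
    ("result_list", "next_page", "result_list"),  # self-loop: can repeat
    ("result_list", "open_item", "detail_page"),
]

if __name__ == "__main__":
    fsa = build_fsa(RECORDED)
    walk = ["submit", "next_page", "next_page", "next_page", "open_item"]
    print(accepts(fsa, "search_form", {"detail_page"}, walk))
```

The session was recorded with a single "next page" click, yet the automaton accepts a walk with three of them because the loop can be traversed repeatedly.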
Current research aims to create mechanisms for more robust and adaptive addressing of different types of Web objects in a dynamic environment where the structure of Web sites and Web pages changes. It includes addressing content blocks in Web pages by visual characteristics and 2D position, automatic detection of relative XPath reference points, handling of conflicts between multiple addresses of the same Web object, and the ability to operate in the presence of technical problems (non-standards-compliant code, 404 errors, etc.).
The project's research also includes surveys on the applications and business models of enhanced Web object addressing, on the nature and topicality of Deep Web sources, and on the visual and navigational ways of presenting database content on the Web. Future plans also include using experience from the F-Webs project to implement QoS-based schemes for evaluating and selecting Web sources.
Keywords: information integration, data integration, Hidden Web, Web objects, adaptive addressing of content blocks on the Web, XPath, FSAs, Web navigation.