WEBIST 2011 Abstracts


Area 1 - Web Security

Short Papers
Paper Nr: 96
Title:

EMPLOYING MULTI-CORE PROCESSOR ARCHITECTURES TO ACCELERATE JAVA CRYPTOGRAPHY EXTENSIONS

Authors:

Mario Ivkovic and Thomas Zefferer

Abstract: For many years, the increase of clock frequencies has been the preferred approach to raise computational power. Due to physical limitations and cost-effectiveness reasons, hardware vendors were forced to change their strategy. Instead of increasing clock frequencies, processors are nowadays supplied with a growing number of independent cores to increase the overall computational power. This major paradigm shift needs to be considered in software design processes as well. Software needs to be parallelized to exploit the full computing power provided by multi-core architectures. Due to their intrinsic computational complexity, cryptographic algorithms require efficient implementations. On multi-core architectures this comprises the need for parallelism and concurrent execution. To meet this challenge, we have enhanced an existing Java™-based cryptographic library by parallelizing a subset of its algorithms. Measurements have shown speed-ups from 1.35 up to 1.78 resulting from the applied modifications. In this paper we show that regardless of their complexity, several cryptographic algorithms can be parallelized to a certain extent with reasonable effort. The applied parallelization of the Java™-based cryptographic library has significantly enhanced its performance on multi-core architectures and has therefore made a valuable contribution to its sustainability.

Paper Nr: 100
Title:

(role)CAST: A FRAMEWORK FOR ON-LINE SERVICE TESTING

Authors:

Guglielmo De Angelis, Antonia Bertolino and Andrea Polini

Abstract: Engineering of service-oriented systems is still an immature discipline. Traditional software engineering approaches do not adequately fit development needs arising in this widely adopted paradigm. In particular, because of dynamic service composition, several engineering activities typically performed off-line (i.e., pre-deployment) also have to be carried out on-line (i.e., during real usage). In this paper, we present a framework called (role)CAST which supports an instantiation of the concept of on-line testing of services, for the purpose of validating their compliance with role-based service access policies.

Paper Nr: 154
Title:

THE IMPACT OF SOCIAL NETWORKS ON USER PRIVACY - What Social Networks Really Learn about their Users!

Authors:

Steffen Ortmann and Peter Langendörfer

Abstract: Millions of users voluntarily release private and business data on community platforms without considering the potential impact this may have on their real lives. Being used for personalized advertisement or profiling, user data are of utmost importance for the economic success of the platform. Hence, platform providers exploit all promising options to gather data, while privacy seems to be something of a nuisance for them. Besides data voluntarily released by the user, there are techniques and methods to secretly gather more user data, e.g., by proper fusion of miscellaneous information such as analysis of websites visited or social games played. In this article we investigate obvious as well as concealed data gathering options of platform providers. By that we uncover the true level of detail of user data collected by social networks to document our key message, i.e., social networks know EVERYTHING about their users. Finally, we discuss why existing privacy protecting solutions cannot stand up to the threats and risks resulting from easygoing use of social networks.

Area 2 - Internet Technology

Full Papers
Paper Nr: 92
Title:

A GRAPH DECOMPOSITION APPROACH TO WEB SERVICE MATCHMAKING

Authors:

S. Lagraa, H. Seba, R. Khennoufa and H. Kheddouci

Abstract: Web service discovery is becoming a critical issue in several fields. The current approaches for service discovery, mainly for semantic web services such as OWL-S, are limited primarily to the matching of their inputs/outputs at the level of an ontology. Recent studies show that this approach does not suffice to discover web services and that the structure of web services has an important and necessary weight in the efficiency of the matching. The structure of a web service can be represented by a graph. The problem of matching semantic web services is then translated into a problem of graph matching. In this work, we propose a matching approach that first decomposes the web service graph into simpler substructures; the matching process is then performed on these substructures. The proposed algorithms achieve better time complexity than existing ones. We also propose a semantic similarity to enhance our structural similarity.

Paper Nr: 93
Title:

CHARACTERIZING DISTRIBUTED XML PROCESSING - Moving XML Processing from Servers to Networking Nodes

Authors:

Yoshiyuki Uratani, Hiroshi Koide, Dirceu Cavendish and Yuji Oie

Abstract: This study characterizes distributed XML processing on networking nodes. XML documents are sent from a client node to a server node through relay nodes, which process the documents before they arrive at the server. When the relay nodes are connected in tandem, the XML documents are processed in a pipelined manner. When the relay nodes are connected in parallel, the XML documents are processed in a parallel fashion. A characterization of pipelined and parallel processing for well-formedness checking and grammar validation reveals inherent advantages of the parallel processing model.

Paper Nr: 124
Title:

COFOCUS - Compact and Expanded Restful Services for Mobile Environments

Authors:

Li Li and Wu Chou

Abstract: In this paper, we present an approach to enable a collaborative endpoint network for mobile phones. In particular, we expose functions on mobile phones as REST web services that turn mobile phones into web service providers for rapid integration with communication and collaboration applications. Because mobile phones have distinct features and constraints, this paper describes a lightweight and efficient protocol, Compact HTTP, which consists of a small subset of HTTP 1.1 to reduce the footprint of REST services. We expand bindings of HTTP to multiple messaging protocols, including SMS and XMPP, and make the REST services invariant to network and protocol changes. These expanded bindings introduce asynchrony into REST, a desired property for many communication and collaboration services. Furthermore, HTTP over XMPP as described in our approach introduces the concept of hyperlink presence in collaboration, and it is used to mitigate the broken link issue, which is critical in mobile environments. To provide end-to-end message security, a symmetric key based security scheme is described for service authentication and authorization. A prototype system based on the proposed approach has been developed that allows both local operators and remote directors to control and monitor the camera, camcorder, location, telephony, motion, power, etc. on Android phones in a secure manner. Experimental results indicate that the proposed approach is feasible, lightweight, and has satisfactory performance.

Paper Nr: 127
Title:

GRAPH-BASED MANAGEMENT OF COMMUNITIES OF WEB SERVICE REGISTRIES

Authors:

Olfa Bouchaala, Mohamed Sellami, Walid Gaaloul, Samir Tata and Mohamed Jmaiel

Abstract: Web services discovery represents a prominent step in the development process of B2B applications. In such an environment, the number of Web services and registries made available for use can be as large as the number of companies. Thus, the Web services discovery process will be time consuming. To deal with this issue, one obvious solution is to group Web service registries into communities. However, this solution can raise a management issue since registries and communities are dynamic by nature. Our major contribution in this paper is an approach for managing registries and communities that reconciles conflicts resulting from the dynamic nature of Web service registries.

Paper Nr: 132
Title:

GUARANTEEING STRONG (X)HTML COMPLIANCE FOR DYNAMIC WEB APPLICATIONS

Authors:

Paul G. Talaga and Steve J. Chapin

Abstract: We report on the embedding of a domain specific language, (X)HTML, into Haskell and demonstrate how this superficial context-free language can be represented and rendered to guarantee World Wide Web Consortium (W3C) compliance. Compliance of web content is important for the health of the Internet, accessibility, visibility, and reliable search. While tools exist to verify web content is compliant according to the W3C, few systems guarantee that all dynamically produced content is compliant. We present CH-(X)HTML, a library for generating compliant (X)HTML content for all dynamic content by using Haskell to encode the non-trivial syntax of (X)HTML set forth by the W3C. Any compliant document can be represented with this library, while a compilation or run-time error will occur if non-compliant markup is attempted. To demonstrate our library we present examples and performance measurements.

Paper Nr: 141
Title:

ENHANCING XML-CACHE EFFICIENCY BY XML COMPRESSION

Authors:

Stefan Böttcher, Lars Fernhomberg and Rita Hartel

Abstract: Whenever a client with limited resources frequently has to retrieve large parts of a huge XML document that is stored on a remote web server, data exchange from the server to the client in combination with restricted bandwidth may become a serious bottleneck. We present an approach that combines the advantages of caching with the advantages of query-capable and updatable XML compression. We provide a performance evaluation that demonstrates that the combination of the two techniques, caching and compression, yields a benefit in terms of less data volume to be transferred as well as in terms of a reduced answer time for the user. The performance evaluation demonstrates that combining both techniques yields an even stronger benefit than each technique alone.

Short Papers
Paper Nr: 73
Title:

INTEGRATION OF VARIOUS IT SYSTEMS AND SENSOR INFORMATION FOR THE HANDLING OF RFID-ENABLED CATERING GOODS IN THE AVIATION DOMAIN

Authors:

Falk Scheiding, Kresimir Vidackovic, Felix Tönnies and Reinhard Becker

Abstract: The paper describes the main results of a research project as a use case for cross-company system integration in the airline catering domain. Its achievements include connecting the IT systems of different process participants with sensor information along the catering supply chain and replacing paper-based lists. Today, airline caterers suffer from a lack of information and from laborious data transfer to their corresponding IT systems. Airlines, for their part, have no overview of the quality of the catering process. They also lack information on important facts such as the stock of trolleys or high-value catering goods. The paper presents a solution for integration and better analysis options that uses RFID tags on the catering trolleys, middleware software and XML files for the information flow, and a monitoring dashboard to visualize up-to-date statistics for the management of resources.

Paper Nr: 91
Title:

LazyDOM - Transparent Partial DOM Loading and Unloading for Memory Restricted Environments

Authors:

Daniel Peintner, Richard Kuntschke, Jörg Heuer and Harald Kosch

Abstract: Processing XML documents using the Document Object Model (DOM) usually requires loading the entire document into an in-memory DOM prior to processing. Since the in-memory size of a DOM generally is a multiple of the original XML document size, the resulting DOM often consumes a lot of memory and might not even fit into the available memory on memory restricted devices. The LazyDOM approach presented in this paper divides a DOM into XML fragments and loads or unloads these fragments transparently on demand during processing. Thus, the LazyDOM only loads the parts of a DOM that are actually currently needed by an application and unloads them if they are no longer required and memory needs to be freed for other processing tasks. Besides enabling DOM-based processing of large XML documents on memory restricted devices, this approach is able to reduce the amount of memory required for DOM processing at any given time and can also increase the performance of DOM loading if only parts of a DOM are actually needed by an application.

Paper Nr: 135
Title:

DIVERSITY OF THE MASHUP ECOSYSTEM

Authors:

Michael Weiss and Solange Sari

Abstract: Mashups allow users to develop applications from a variety of open APIs. The creation of mashups is supported by a complex ecosystem of interconnected data providers, mashup platforms, and users. A sign of a healthy ecosystem is that the number and diversity of APIs and mashups in the ecosystem increases continuously. In this paper, we describe a model of the evolution of the mashup ecosystem that allows us to estimate the diversification of the mashup ecosystem over time. In this model we show the evolutionary relationships between mashups as branches in a phylogenetic tree. We discuss how the diversification rate of the mashup ecosystem can be estimated by fitting this tree to a birth-death process model. The results of our research show that the diversity of the mashup ecosystem is increasing with time, though not monotonically.

Paper Nr: 10
Title:

W3C WIDGETS - A Solution for Implementing Platform-independent Mobile Applications

Authors:

Stefan Schäfer, Stefan Christmann and Svenja Hagenhoff

Abstract: The mobile internet is very heterogeneous at all levels - this is true for end devices as well as for operating systems and runtime environments of mobile applications. Therefore all software providers face the challenge of developing platform-independent applications and making them usable on a maximum number of end devices. One possible way of doing this is provided by Widgets - small software programs with a limited functional scope that are executed by a Widget-engine. This article looks at the state of development of the most important W3C Widget-standards and analyses their prospects of success.

Paper Nr: 20
Title:

WEB WORKLOAD GENERATORS - A Survey Focusing on User Dynamism Representation

Authors:

Raúl Peña-Ortiz, Julio Sahuquillo, José A. Gil and Ana Pont

Abstract: The evolution of the World Wide Web from hypermedia information repositories to hypermedia distributed applications and service-oriented architectures (SOA) has introduced new features in the current and incoming web. An important feature is the dynamism of its contents and services, which induces the dynamism of the client behavior. This feature represents a major constraint when modeling and generating current web workload. In this paper, we first review the state of the art for web workload generation, focusing on the approach based on workload models. After that, we analyze a representative subset of the state-of-the-art workload generators that use workload models, concentrating on those model characteristics that represent dynamism in the workload generation. The study reveals that five generators present some capabilities to reproduce this dynamism, but only the GUERNICA approach improves the dynamic workload generation by using users’ behavior models. Finally, we discuss GUERNICA and describe how it generates dynamic workload by addressing both user behavior and workload distribution.

Paper Nr: 28
Title:

THE MVC-WEB DESIGN PATTERN

Authors:

Ralph F. Grove and Eray Ozkan

Abstract: The Model-View-Controller design pattern is cited as the architectural basis for many web development frameworks. However, the version of MVC used for web development has changed as it has evolved from the original Smalltalk MVC. This paper presents an analysis of those changes, and proposes a separate Web-MVC pattern that more accurately describes how MVC is implemented in web frameworks.

Paper Nr: 64
Title:

INTERFACE EXPRESSIONS MONITORING FOR BPEL PROCESSES

Authors:

Wassim Jendoubi, Guy Tremblay and Aziz Salah

Abstract: In this paper, we show how Web services descriptions can be extended with simple declarative behavior specification using interface expressions, a form of regular expressions that describe the possible sequences of externally observable events that a WS-BPEL process can perform. We describe how a concrete (executable) WS-BPEL process can be monitored with respect to such interface expressions, to ensure that it satisfies its associated abstract specification or to detect the occurrence, or non-occurrence, of some particular sequences of events. More specifically, we describe the implementation of such a run-time monitor, called BPEL.RPM, that uses the Open ESB BPEL service engine.
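
As a rough illustration of the idea only (this is not the BPEL.RPM implementation), an interface expression can be treated as a regular expression over event names, and a monitor can check an observed event trace against it. The event names, the one-letter encoding and the expression itself are hypothetical, and the sketch checks only completed traces, whereas a run-time monitor would also have to judge partial ones.

```python
import re

# Hypothetical alphabet: each externally observable event gets one letter.
EVENT_CODES = {
    "receiveOrder": "r",
    "checkStock": "c",
    "sendInvoice": "i",
    "sendReject": "x",
}

# Hypothetical interface expression: receive an order, check stock one or
# more times, then either send an invoice or send a rejection.
INTERFACE_EXPRESSION = re.compile(r"rc+(i|x)")


def encode(trace):
    """Translate a list of event names into the one-letter encoding."""
    return "".join(EVENT_CODES[event] for event in trace)


def conforms(trace):
    """Return True if the complete trace matches the interface expression."""
    return INTERFACE_EXPRESSION.fullmatch(encode(trace)) is not None
```

Calling `conforms(["receiveOrder", "checkStock", "sendInvoice"])` accepts the trace, while a trace that skips the stock check is rejected.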

Paper Nr: 84
Title:

IDENTIFYING CONTEXT SOURCES TOWARDS CONTEXT-AWARE ADAPTED WEB SERVICES

Authors:

Georgia M. Kapitsaki

Abstract: Context-awareness refers to the ability of services and applications to proactively adapt their behavior to the characteristics of the execution environment, such as weather conditions, location, etc., namely context. Web services as the most popular implementation of service-oriented applications are usually exploited in this field. During the development process of such services an interesting challenge lies in identifying reusable context properties or services that constitute potential context sources for the adaptation to context. In this paper a solution for this context source matchmaking is proposed. The identification is performed by matching the WSDL specifications participating either as business services or as context sources. The procedure is demonstrated through an evaluation exploiting descriptions from online service registries.

Paper Nr: 85
Title:

MODEL-DRIVEN APPLICATION DEVELOPMENT ENABLING INFORMATION INTEGRATION

Authors:

Georgios Voulalas and Georgios Evangelidis

Abstract: Interoperability is the capability of different software systems to exchange data via a common set of exchange formats. Interoperability between two products is often developed post-facto, due to insufficient adherence to standardization during the software design process. In this paper we present a mechanism that enables the dynamic communication of different software systems at database level, based on the principles of the Enterprise Information Integration architectural framework. The mechanism is built on top of a database schema (meta-model) and extends the framework we elaborate on for the dynamic development and deployment of web-based applications.

Paper Nr: 89
Title:

AN INTEGRATED E-RECRUITMENT SYSTEM FOR CV RANKING BASED ON AHP

Authors:

Evanthia Faliagka, Konstantinos Ramantas, Athanasios Tsakalidis, Manolis Viennas, Eleanna Kafeza and Giannis Tzimas

Abstract: In the last decades the explosion of Information and Communication Technologies has led to a whole new scenario concerning people’s access to new job opportunities and companies’ options for employing the right person for the right job. But is there a way to exploit today’s technological advances as well as people’s web presence in order to achieve this goal? In this work we present a set of techniques that makes the whole recruitment process more effective. We have implemented a system that models the candidates’ CVs in HR-XML, and ranks the candidates based on AHP (Analytic Hierarchy Process). Finally, it presents the results to the recruiter, who evaluates the top candidates and makes the final decision.
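
To make the AHP step concrete, the following is a minimal sketch under stated assumptions, not the authors' system. It derives criterion weights from a pairwise comparison matrix using the row geometric-mean approximation and ranks candidates by weighted score. The three criteria, the matrix values and the candidate scores are hypothetical, and a full AHP would also compare the candidates pairwise under each criterion.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical criteria: education, experience, language skills.
# PAIRWISE[i][j] states how much more important criterion i is than j.
PAIRWISE = [
    [1.0, 1 / 2, 2.0],
    [2.0, 1.0, 4.0],
    [1 / 2, 1 / 4, 1.0],
]


def rank_candidates(scores, weights):
    """Sort candidate names by the weighted sum of per-criterion scores."""
    totals = {
        name: sum(w * s for w, s in zip(weights, values))
        for name, values in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)
```

With this matrix, experience receives the largest weight, so a candidate who scores well on experience is ranked ahead of one who scores well only on education.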

Paper Nr: 95
Title:

CLASSIFICATION AND MODELLING OF WEB TECHNOLOGIES

Authors:

Mark Wallis, Frans Henskens and Michael Hannaford

Abstract: The World Wide Web is a constantly changing environment in which academia, industry and interest groups participate to innovate and design the next evolution of online user interaction. The ad-hoc nature in which new web-based systems and technologies have been developed has led to an increasingly diverse environment, with ill-defined interactions and fuzzy classification systems. Recently, business pioneers in the industry have attempted to classify web applications into large groupings based on certain key characteristics. The high-level taxonomy presented in this paper provides a way to scientifically classify web applications. By classifying applications and studying the progression from one classification to the next, predictions can be made as to the direction of future web application development. After presenting a formal classification model, this research discusses how this model can be used to compare existing web technologies and design the next generation of the World Wide Web.

Paper Nr: 136
Title:

A DISTRIBUTED SELF-HEALING ARCHITECTURE SUPPORTING WS-BASED APPLICATIONS

Authors:

Francisco Moo-Mena, Fernando Curi-Quintal, Juan Garcilazo-Ortiz, Luis Basto-Díaz and Roberto Koh-Dzul

Abstract: A Self-healing infrastructure makes it possible to observe the behavior of a system, determine its health status, and apply measures to restore the correct state of the application. In recent years our work has focused on the design and implementation of Self-healing architectures, which support applications based on Web Services (WS). This paper presents an improved Self-healing architecture, which distributes control among its components. The results obtained by applying this new Self-healing architecture to a distributed Digital Library application show a trend towards better availability of resources.

Paper Nr: 147
Title:

XBRL AND THE MULTIDIMENSIONAL DATA MODEL

Authors:

Ignacio Santos and Elena Castro

Abstract: Over the past ten years, there has been a significant increase in the development of XML and Data Warehouse (DW) applications, and in turn more and more applications need to interact across different technologies. In parallel, economic data have also evolved over the last ten years: companies and financial institutions increasingly need more information, and this information must be reliable and timely. The XBRL standard, which is based on XML, is currently gaining considerable ground. This language is mainly used in accounting reports and consists of a set of taxonomies, which define the different accounting regulations of a specific report. XBRL is becoming a global de facto standard. XBRL reports are created from various sources and are validated at source, so that they are syntactically correct. XBRL represents business information and is multidimensional, and therefore its logical destination is a DW. This paper aims to analyze the data model of XBRL and its semantics, and how to map this data model to the Multidimensional Data Model (Conceptual Model) and in turn to the Logical Model, either ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), or HOLAP (Hybrid OLAP), so that the data can be analyzed by business users.

Paper Nr: 151
Title:

INTEGRATING ASYNCHRONOUS COMMUNICATION INTO THE OSGI SERVICE PLATFORM

Authors:

Marc Schaaf, Volker Ahlers, Arne Koschel, Irina Astrova, Ahto Kalja, David Bosschaert and Roman Roelofsen

Abstract: OSGi is a popular Java-based platform that was originally intended for embedded systems. But today OSGi is used more and more in enterprise systems. To fit this new application area, OSGi is continuously extended by the OSGi Enterprise Expert Group (EEG). For example, recently, support for remote services has been added to OSGi. But this support implies only synchronous communication of remote services, thus limiting the application of OSGi in the area of enterprise systems, as enterprise systems typically embody both synchronous and asynchronous communication. To fill this gap, we propose a novel approach to integrating asynchronous communication into OSGi.

Area 3 - Web Interfaces and Applications

Full Papers
Paper Nr: 17
Title:

ADDRESSING E-GOVERNMENT WEBSITE USABILITY - A Users’ Centred Evaluation

Authors:

Zhao Huang and Laurence Brooks

Abstract: Achieving greater user participation in e-government remains a significant challenge. E-government website usability has been found to be one of the major factors influencing users’ interaction, and it needs to be addressed when developing e-government. This empirical study evaluates a current e-government website to identify specific usability problems. Based on the identified problems, design solutions are proposed, applied in a redesign, and evaluated in order to develop a more usable e-government website. The results indicate that the proposed design solutions address the identified usability problems. Such improvements may promote the overall usability of e-government and improve users’ task performance. These findings can help designers further improve e-government usability.

Paper Nr: 21
Title:

OP2A - Assessing the Quality of the Portal of Open Source Software Products

Authors:

Gabriele Basilico, Luigi Lavazza, Sandro Morasca, Davide Taibi and Davide Tosi

Abstract: Open Source Software (OSS) communities do not often invest in marketing strategies to promote their products in a competitive way. Even the home pages of the web portals of well-known OSS products show technicalities and details that are not relevant for a fast and effective evaluation of the product’s qualities. So, final users and even developers, who are interested in evaluating and potentially adopting an OSS product, are often negatively impressed by the quality perception they have from the web portal of the product and turn to proprietary software solutions or fail to adopt OSS that may be useful in their activities. In this paper, we define an evaluation model and we derive a checklist that OSS developers and web masters can use to design their web portals with all the contents that are expected to be of interest for OSS final users. We exemplify the use of the model by applying it to the Apache Tomcat web portal and we apply the model to 22 well-known OSS portals.

Paper Nr: 27
Title:

MANAGING USER INTERACTION IN AN ONTOLOGY-BASED SYSTEM

Authors:

Anna Goy and Diego Magro

Abstract: In this paper, we present an approach to the management of user interaction within an ontology-based system. In particular, we discuss the case of ARNEIS, an “intelligent” web-based repository of software solutions, that enables software houses to upload a description of their software solutions for business automation, and small-to-medium sized enterprises to find software products or services supporting their activities. To this purpose, as argued in the paper, a particularly meaningful field is Customer Relationship Management, that is thus exploited as a test bed domain. The functionality of ARNEIS is based on an ontological representation of the domain knowledge, which represents the shared conceptual vocabulary to express software descriptions and technological requirements. In this paper we describe our proposal for the management of a user-friendly interaction enabling software houses to upload the semantic representation of the description of their offers. The approach we propose within the ARNEIS scenario represents an example of a more general solution to face the issue of how to build formal representations of resources in an ontology-based IR system.

Paper Nr: 35
Title:

VIZGR - Combining Data on a Visual Level

Authors:

Daniel Hienert, Benjamin Zapilko, Philipp Schaer and Brigitte Mathiak

Abstract: In this paper we present a novel method to connect data on the visualization level. In general, visualizations are a dead end, when it comes to reusability. Yet, users prefer to work with visualizations as evidenced by WYSIWYG editors. To enable users to work with their data in a way that is intuitive to them, we have created Vizgr. Vizgr.com offers basic visualization methods, like graphs, tag clouds, maps and time lines. But unlike normal data visualizations, these can be re-used, connected to each other and to web sites. We offer a simple opportunity to combine diverse data structures, such as geo-locations and networks, with each other by a mouse click. In an evaluation, we found that over 85 % of the participants were able to use and understand this technology without any training or explicit instructions.

Paper Nr: 40
Title:

EXPLORING MULTI-FACTOR TAGGING ACTIVITY FOR PERSONALIZED SEARCH

Authors:

Frederico Durao, Ricardo Lage, Peter Dolog and Nilay Coşkun

Abstract: Coping with ambiguous queries has long been an important part of research on Web Information Systems and Retrieval, but it remains a challenging task. Personalized search has recently received significant attention in the web search community as a way to address this challenge, based on the premise that a user’s general preference may help the search engine disambiguate the true intention of a query. However, studies have shown that users are reluctant to provide any explicit input on their personal preference. In this paper, we study how a search engine can learn a user’s preference automatically based on the user’s tagging activity and how it can use the user preference to personalize search results. Our experiments show that users’ preferences can be learned from multi-factor tagging data and that personalized search based on user preference yields significant precision improvements over the existing ranking mechanisms in the literature.

Paper Nr: 112
Title:

ACCESSIBILITY EVALUATION OF MOODLE CENTRED IN VISUAL IMPAIREMENTS

Authors:

Rocío Calvo, Ana Iglesias and Lourdes Moreno

Abstract: The use of new technologies in education has been increasing in recent years. Specifically, e-learning systems give students more freedom to learn over the Internet when and where they prefer. However, e-learning tools are not perfect. Most of them present accessibility barriers, so not all students are able to use them fully in their studies. The goal of this paper is to evaluate whether the Moodle e-learning web tool is accessible to visually impaired people using assistive technologies such as screen readers. The evaluation was divided into two main objectives. Firstly, Moodle was evaluated from a user perspective. The interaction of a blind student with the system was simulated and two screen readers were used. Secondly, the tool was evaluated from an accessibility expert perspective, analysing whether Moodle conforms to the W3C ATAG and WCAG guidelines.

Short Papers
Paper Nr: 6
Title:

AN EVENT DISTRIBUTION PLATFORM FOR RECOMMENDING CULTURAL ACTIVITIES

Authors:

Toon De Pessemier, Sam Coppens, Erik Mannens, Simon Dooms, Luc Martens and Kristof Geebelen

Abstract: Today, people have limited leisure time which they want to fill in according to their interests. At the same time, cultural organisations offer an enormous amount of activities via their websites. This scarcity of time and the abundance of cultural events reinforce the necessity of recommender systems that assist end-users in discovering events which they are likely to enjoy. However, traditional recommender systems cannot cope with event-specific restrictions such as the availability, time and location of cultural activities. Moreover, aggregating the events, collecting consistent metadata, and enriching these metadata with cross-domain knowledge pose additional challenges for the conventional distribution and recommender systems. In this paper, we show how personalised recommendation, content-based filtering, and distribution of events can be enabled by the enrichment of events metadata via open linked data sets available on the web of data. For consistency across several events providers, we propose an event model using an RDF/OWL representation of the EventsML-G2 standard. Integrating these various functionalities as an extendable bus architecture provides an open, user-friendly event distribution platform that offers the end-user a tool to access useful event information that goes beyond basic information retrieval.
Download

Paper Nr: 7
Title:

CONTENT-BASED RECOMMENDATION ALGORITHMS ON THE HADOOP MAPREDUCE FRAMEWORK

Authors:

Toon De Pessemier, Kris Vanhecke, Simon Dooms and Luc Martens

Abstract: Content-based recommender systems are widely used to generate personal suggestions for content items based on their metadata description. However, due to the required (text) processing of these metadata, the computational complexity of the recommendation algorithms is high, which hampers their application at large scale. This computational load reinforces the necessity of a reliable, scalable and distributed processing platform for calculating recommendations. Hadoop is such a platform: it supports data-intensive distributed applications based on map and reduce tasks. Therefore, we investigated how Hadoop can be utilized as a cloud computing platform to solve the scalability problem of content-based recommendation algorithms. The various MapReduce operations necessary for keyword extraction and for generating content-based suggestions for the end-user are elucidated in this paper. Experimental results on Wikipedia articles demonstrate the appropriateness of Hadoop as an efficient and scalable platform for computing content-based recommendations.
Download
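The map and reduce tasks mentioned in the abstract above can be illustrated with a small in-memory sketch. This is not the authors' implementation and it does not use Hadoop; it only mimics the map, shuffle and reduce stages for counting terms per document, which is a plausible first step of keyword extraction. The function names (`map_phase`, `shuffle`, `reduce_phase`, `term_counts`) and the toy documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit ((doc_id, term), 1) for every token of one document."""
    for token in text.lower().split():
        yield (doc_id, token), 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one (doc_id, term) key."""
    return key, sum(values)


def term_counts(docs):
    """Run map, shuffle and reduce over a dict of {doc_id: text}."""
    pairs = [pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


# Hypothetical toy corpus, not data from the paper.
docs = {"d1": "hadoop map reduce map", "d2": "reduce tasks"}
counts = term_counts(docs)
```

On a real cluster the map and reduce functions would run in parallel on different nodes; here they run sequentially, so the sketch shows only the data flow.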

Paper Nr: 22
Title:

INNOVATION MINING - Supporting Web Mining in Early Innovation Phases

Authors:

Jan Finzen and Maximilien Kintz

Abstract: Fraunhofer IAO conducted a study among 1,000 German innovation professionals regarding their web-based information acquisition needs. The study showed the need for search tools and methods optimised for the target group of innovation professionals. In this paper we derive corresponding concepts from the study’s results. We suggest an “Innovation Mining Process” as a structured approach to web-based information acquisition for early innovation phases. Our software prototype - the Innovation Mining Cockpit (IMC) - picks up essential concepts of this process and implements them as an easy-to-use web portal. The IMC is intended as a central point of contact for innovation-related search activities.
Download

Paper Nr: 33
Title:

QUALITY ASSESSMENT OF WIKIPEDIA EXTERNAL LINKS

Authors:

Paraskevi Tzekou, Sofia Stamou, Nikos Kirtsis and Nikos Zotos

Abstract: Wikipedia is a unique source of information that has been collectively supplied by thousands of people. Since its inception in 2001, Wikipedia has been continuously evolving and, like most websites, it is interconnected via hyperlinks to other web information sources. Wikipedia articles contain two types of links: internal and external. Internal links point to other Wikipedia articles, while external links point outside Wikipedia and are normally not used in the body of the article. Although there exist specific guidelines about both the style and the purpose of an article's external links, no approach has been recorded that tries to capture in a systematic manner the quality of Wikipedia external links. In this paper, we study the quality of Wikipedia external links by assessing the degree to which these conform to their intended purpose; that is, to form a comprehensive list of accurate information sources about the article contents. For our study, we estimate the decay of Wikipedia external links and we investigate their distribution in the Wikipedia articles. Our measurements give perceptible evidence for the value of external links and may imply their corresponding articles' quality in a holistic Wikipedia evaluation.
Download

Paper Nr: 37
Title:

BIODIVERSITY INFORMATION SYSTEMS USING GEOREFERENCING AND PHYSICAL IDENTIFICATION

Authors:

Tiago Lira, Cristina Ribeiro and João Correia Lopes

Abstract: Biodiversity information can be found in diverse information systems, ranging from large technical repositories for research purposes to dedicated databases supporting the web presence of natural sites. To deal with individual botanical objects, it is important to adopt the established scientific taxonomy. Providing a satisfying experience with current interface devices is also mandatory. Two problems have been addressed in this work: the unique identification of the objects and the representation of their physical location. The former has been tackled with QR code, a two-dimensional barcode standard, and the latter with the use of a PostgreSQL/PostGIS spatial database. The paper describes the development and testing of a Web information system integrating physical identification and geo-referencing. The application in biodiversity information management shows that low-cost, state-of-the-art technologies can provide appropriate solutions even for small- and medium-sized natural sites, incorporating existing scientific information on species and contributing to a growing record of their occurrences. The system was tested in a university garden and can be used by site maintainers, visitors and researchers. The dataset collected at the site can be exported according to the Darwin Core biodiversity standard for research purposes.
Download

Paper Nr: 56
Title:

THE STONE AGE IS BACK - HCI Effects on Recommender Systems

Authors:

Yuval Dan-Gur

Abstract: We addressed HCI and social aspects of recommender systems by studying the uncharted domain of the advising group and the user's control over it. We conducted a longitudinal field study in which, for two years, our research tool, QSIA (which means QUESTION in Hebrew), was free for use on the web and was adopted by various institutions and classes of heterogeneous learning domains. QSIA enables the user to be involved in the formation of the advising group. The user was free to choose an advising group for each recommendation sought, while the default choice was the common 'neighbors group'. QSIA yielded high internal validity of acceptance and rejection ratios due to the immediate "usage actions" that followed the recommendation outputs. Although the objective amount of data in QSIA's logs is fairly large (31,000 records, 10,000 items, 3,000 users), the relevant figures for analysis of recommendations are modest: 895 recommendation-seeking records, accepted from 108 users, 3,000 rankings by 300 users, and 1,043 "usage actions" by 51 users. Our findings suggest that the perceived quality of the recommendations (measured in terms of "usage actions") is 14% to 24% higher (α≤0.05) for the user-controlled 'friends group' than for the machine-computed 'neighbors group'. We almost felt that the ancient tribal friends had been "revived" in modern Information Systems.
Download

Paper Nr: 57
Title:

IMPROVING HOW USERS ORGANIZE WEB INFORMATION FOR GATHERING TASKS

Authors:

Anwar Alhenshiri, Michael Shepherd and Carolyn Watters

Abstract: This paper presents a Visual Information Management System (VIMS) for the Web. In this prototype system, the task of Web information gathering was taken into consideration with respect to how users locate information for the task, organize task information, preserve and re-find task information, and compare information for effective reasoning and decision making. VIMS was designed and built based on recommendations from previous studies in a larger research project. A user study was conducted to evaluate VIMS. The results of the study show promising indications of the improvements achieved in VIMS with respect to Web information gathering tasks.
Download

Paper Nr: 61
Title:

ONLINE WEB GENRE CLASSIFICATION, IS IT DOABLE?

Authors:

Hoda Badesh, James Blustein and Anwar Alhenshiri

Abstract: This paper investigates the feasibility and effectiveness of online clustering of Web search results by genre. Although several research studies have investigated the accuracy of classifying Web pages by genre, research has focused only on off-line clustering and classification due to the large number of documents on the Web. This research intends to investigate the feasibility of creating sets of Web pages to represent the main genres on the Web. Each genre, as identified in the work of Santini (2006), will be represented by a set of Web pages. Web search results will be compared to those sets and classified accordingly. Search results will be grouped according to the set of genre representatives they most resemble. The resulting clusters of Web search results will be rendered to the user. A user study will be conducted to examine the validity and accuracy of online clustering based on Web genres.
Download

Paper Nr: 78
Title:

A MULTILEVEL UNL CONCEPT BASED SEARCHING AND RANKING

Authors:

E. Umamaheswari, T. V. Geetha, Ranjani Parthasarathi and Madhan Karky

Abstract: The recent advances in search engines have resulted in a huge explosion of available web documents. Understanding the content on the web and providing meaningful search results to the user have become essential for any search engine. This paper proposes CoReS, a multilevel Concept based Searching and Ranking Algorithm which retrieves and ranks the documents based on the concepts and relationships between the concepts. The search and rank methodology is based on Universal Networking Language (UNL) representation of the documents. The UNL Index based query expansion technique is used to provide more meaningful results to the user. The algorithm has been evaluated on a corpus of tourism documents, and its performance compared with keyword based search. The mean average precision of the concept based search is found to be 0.75 while the keyword based search has a MAP score of 0.45.
Download
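The abstract above reports results as mean average precision (MAP). As a reminder of how that metric is usually computed, the following sketch uses the standard definition: for each query, precision is averaged over the ranks at which relevant documents appear, and the per-query values are then averaged. The function names and the toy rankings are illustrative assumptions and are unrelated to the paper's tourism corpus.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant ids."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of the average precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(ranked, relevant)
               for ranked, relevant in runs) / len(runs)


# Two hypothetical queries with made-up document ids.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```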

Paper Nr: 83
Title:

MANAGING MULTIMODAL AND MULTILINGUAL SEMANTIC CONTENT

Authors:

Michael Martin, Daniel Gerber, Norman Heino, Sören Auer and Timofey Ermilov

Abstract: With the advent and increasing popularity of Semantic Wikis and Linked Data, the management of semantically represented knowledge has become mainstream. However, certain categories of semantically enriched content, such as multimodal documents and multilingual textual resources, are still difficult to handle. In this paper, we present a comprehensive strategy for managing the life-cycle of both multimodal and multilingual semantically enriched content. The strategy is based on extending a number of semantic knowledge management techniques such as authoring, versioning, evolution, access and exploration for semantically enriched multimodal and multilingual content. We showcase an implementation and user interface based on the semantic wiki paradigm and present a use case from the e-tourism domain.
Download

Paper Nr: 88
Title:

TAG CLOUDS FOR SITUATED INTERACTION AND PLACE PROFILING

Authors:

Rui José, Bruno Silva and Fernando Reinaldo Ribeiro

Abstract: Tag clouds have become very popular as visual representations of the main topics in document sets or as navigation tools that can provide quick access to resources related to specific topics. However, their ability to represent the information environment associated with any meaningful reality in a way that is collectively visible, actionable and easily understood may also be very relevant, even when the reality being represented is no longer a set of documents or resources, but a stream of interactions occurring within a particular ubiquitous computing environment. In this paper, we explore the use of tag clouds within the context of situated displays and services. We hypothesise that such tag clouds may have a role as dynamic representations of place and also as interaction controls, supporting the same comprehension and navigation functions as classical tag clouds. We describe two case studies in which this concept of a situated tag cloud has been tried out in real-world settings. The case studies demonstrate two different applications of the tag cloud concept as the basis for place description and situated interaction. The results obtained from the case studies suggest that situated tag clouds can indeed provide valuable representations of place and situations and can also support simple interaction models, allowing people to reason about the system behaviour and how it is being influenced by new interactions.
Download

Paper Nr: 97
Title:

NEWS RECOMMENDING BASED ON TEXT SIMILARITY AND USER BEHAVIOUR

Authors:

Dušan Zeleník and Mária Bieliková

Abstract: In this paper we describe a method for recommending news on a news portal based on our novel representation by a similarity tree. Our method for recommending articles is based on their content. The recommendation employs hierarchical incremental clustering, which is used to discover additional information for effective recommending. The important and novel part of our method is an approach to discovering the interests of individual readers using a tree structure created according to the similarity of articles. We concentrate on enabling recommendations at any time, i.e. we discover the user’s interests in real time. Our method discovers specific interests of the reader using information gained from monitoring his activities in the news portal. We describe the mechanisms for recommending up-to-date and relevant articles. They are based on known solutions, but incorporate a unique representation of user interests by a binary tree. Moreover, our aim was to provide recommendations in real time. Recommendations are thus generated depending on the reader’s current interests. We also present an evaluation of the recommendations in an experiment using the accounts of real readers and their reading history.
Download

Paper Nr: 101
Title:

SEARCHING WEB 3.0 CONTENT - A Semantic Search Approach

Authors:

Kinga Schumacher and Michael Sintek

Abstract: The Web 3.0 is commonly understood as the combination of the Semantic Web and Web 2.0, where conventional and social websites and data sources (e.g., DBs, XML, HTML, plain text) should be integrated and linked as well. Thus, there is a plethora of information in various representation forms which can be mapped to an information pool composed of a knowledge base (in RDF/S) and a text index. In doing so, structured data (e.g., DBs) is usually mapped to the knowledge base while unstructured data (e.g., plain text) to the text index and semi-structured (e.g., XHTML) data to both. Therefore, a search method is required which is able to explore both the knowledge base and the text index exploiting the cross-linking of data. For this purpose, we propose a search approach which combines fact retrieval and semantic document retrieval. It is able to answer queries with facts and documents as well as documents together with facts, and it supports free text and formal queries as well as queries composed of free text and formal parts.
Download

Paper Nr: 105
Title:

BRIDGING NAVIGATION, SEARCH AND ADAPTATION - Adaptive Hypermedia Models Evolution

Authors:

Evgeny Knutov, Paul De Bra, David Smits and Mykola Pechenizkiy

Abstract: Adaptive Hypermedia Systems (AHS) have long been concentrating on adaptive guidance of links between domain concepts. In this paper we first study parallels between navigation and linking in hypertext on the one hand and information searching or querying on the other hand. We show that to a large extent linking and searching can be modeled in the same way. Secondly we present a transition towards search in AHS by aligning the web search process with the layered structure of AHS and link adaptation process. In the end we sketch the on-going implementation of an open corpus adaptation carried out in the context of the ’Grapple’ adaptive e-learning environment.
Download

Paper Nr: 111
Title:

ONTOCOMPO - An Ontology-based Interactive System to Compose Applications

Authors:

Christian Brel, Anne-Marie Dery-Pinna, Catherine Faron-Zucker, Philippe Renevier-Gonin and Michel Riveill

Abstract: In this paper, we present an ontology-based approach and a semantic web system to compose applications while preserving their ergonomic properties. Our composition process relies on the manipulation of User Interfaces (UI) and is intended to be assisted by a knowledge-based system which exploits semantic annotations of applications on their users' aims, UIs and functionalities through semantic queries and inference rules.
Download

Paper Nr: 116
Title:

AN OPEN, EXTENDABLE SERVICE PLATFORM FOR THE ELDERLY

Authors:

Ingo Zinnikus, Christian Husodo-Schulz, Patrick Kapahnke and Darren Maughan

Abstract: Addressing the needs of the elderly is a challenging but imperative task in most Western societies. There is general consensus that the best way to improve this situation is the progressive introduction of technical aids that allow users to achieve a more independent life. However, senior citizens frequently have difficulty making use of the advances in modern information society technology because it requires them to learn from scratch how to use some sort of specific terminal. Elderly users demand specially designed interfaces that are simple, affordable and always ready, integrate easily into their living environment, and offer a single access point for all services. In order to cope with these difficulties, the VITAL project proposes a combination of advanced information and communication technologies that uses a familiar device like the TV as the main vehicle for the delivery of internet services to elderly users in home environments. In this paper, we describe the platform architecture, as well as the web-based applications and their integration into the platform.
Download

Paper Nr: 128
Title:

WEB SERVICE WRAPPING, DISCOVERY AND CONSUMPTION - More Power to the End-user

Authors:

Ismael Rivera, Knud Hinnerk Möller, Siegfried Handschuh and Albert Zündorf

Abstract: In B2B systems integration and web services, many companies see the advantage of increased operational efficiencies and a reduction of costs. In this scenario, highly qualified software developers are responsible for the integration of services with other systems. However, this model fails when targeting the long tail of enterprise software demand, the end-users. Discovery and consumption of web services are difficult tasks for end-users. This means that the potential long tail of end-users creating task-specific applications from existing services is as yet completely untapped. This paper presents an approach to facilitate the discovery and consumption of business web services by end-users, closing the gap between the two. The approach includes: (a) a catalogue which users can browse to search for web services fitting their needs, and (b) a method to generate ready-to-use web service wrappers to use in the catalogue.
Download

Paper Nr: 129
Title:

SEARCH ENGINE RESULT PAGE WITH VISUAL CONTEXT AND ALREADY RENDERED SNIPPETS

Authors:

Kazutaka Maruyama, Masato Igeta and Minoru Terada

Abstract: Search engine result pages (SERPs) provide page titles, URLs, and text-based snippets of the searched pages. These help users decide whether or not to click each searched page. Text-based snippets, however, differ from the real page rendered by web browsers, so after clicking one of the searched pages the user may see an unexpected one. Even if the page is the expected one, the user cannot immediately find where in the page the information they want is described. In this paper, we propose the visual patch, which is provided as a graphical snippet in the SERP and has already been rendered by a web browser. A visual patch is a visual-context-based snippet, not a thumbnail of the whole web page in a smaller size; it is trimmed around the occurrence of the query keyword, at the same size as the original page rendered by web browsers. Users can see the layout around the keyword and read the concrete text just before clicking the searched pages. Thus, a visual patch in the SERP shows not merely that a web page may include the information users want, but whether the information on the page is actually what they want.
Download

Paper Nr: 130
Title:

SUPPORTING LEARNER MODEL EXCHANGE IN EDUCATIONAL WEB SYSTEMS

Authors:

Eddie Walsh, Alexander O'Connor and Vincent Wade

Abstract: The heterogeneity of learner models in structure, syntax and semantics makes sharing them a significant challenge for existing educational web systems. Creating mappings between the different types of learner models is one technique that is used when attempting to overcome these issues. This paper presents an overview of research currently being conducted in the area of learner model exchange and defines a categorization, derived from existing educational web systems, of the different mapping types that are required for learner model mapping. Following this, a framework is presented that supports the creation and validation of these different mapping types and the exchange of learner information between multiple heterogeneous educational web systems.
Download

Paper Nr: 133
Title:

WEB-BASED EXPLORATION OF TEMPORAL DATA IN BIOMEDICINE

Authors:

Martin J. O’Connor, Mike Bingen, Amanda Richards, Samson W. Tu and Amar Das

Abstract: A growing number of biomedical researchers must analyze complex patterns in time-course or longitudinal data. Researchers commonly use Excel, SAS or custom programs to query for relevant patterns in their data. However, these approaches typically entail low-level method development. Analyses are tailored to a specific data set and research question, and they cannot be easily modified or shared. As a result, there is a need for tools to facilitate the specification of temporal analyses at a high level. Such tools would hasten the development process and produce reusable methods. We have developed a web-based application called SWEETInfo to meet this need. Investigators can use it to collaboratively manipulate, explore, and visualize temporal data. Our tool combines semantic web technologies such as OWL and SWRL with a variety of standard web technologies. SWEETInfo can generate complex temporal analyses interactively. It also supports publication of descriptions of analyses, and allows them to be easily shared with others and adapted by them. We evaluated this tool by replicating a longitudinal study of drug effectiveness for HIV anti-retroviral therapy.
Download

Paper Nr: 148
Title:

A PERSONALIZED FORUM ENVIRONMENT

Authors:

Anna Stavrianou and Magdalini Eirinaki

Abstract: Web 2.0 has resulted in an increasing popularity of personalized recommender systems, especially in the context of social networking applications. Although design approaches are available for such systems, most of them make very explicit assumptions about the application domain as well as about the availability and types of data to be used as input. In this position paper, we discuss the requirements and challenges of Forum Recommender Systems. Such systems aim at automatically generating posting recommendations for the different user profiles that deal with a forum. Despite the fact that these systems share characteristics with other social media, they have hardly been explored due to the particularities they present in terms of structure, context and user differences. Here, we discuss the particularities of Forum Recommender Systems and we propose a framework that enables the gathering of profile data and the generation of posting recommendations. The proposed framework can also be adjusted to other social networks.
Download

Paper Nr: 155
Title:

CONTENT-BOOSTED COLLABORATIVE FILTERING USING SEMANTIC SIMILARITY MEASURE

Authors:

Ugur Ceylan and Aysenur Birturk

Abstract: Collaborative filtering is one of the most used recommendation approaches in recommender systems. However, collaborative filtering systems have some major problems such as sparsity, scalability and cold-start problems. In this paper we focus on the sparsity and item cold-start problems in collaborative filtering in order to improve the quality of recommendations. We propose an approach that uses semantic similarities between items based on a priori defined ontology-based metadata in the movie domain. According to the semantic similarities between items and past user preferences, recommendations are made. The results of the evaluation phase show that our approach improves the quality of recommendations.
Download
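The idea in the abstract above, combining item-to-item semantic similarity with past user preferences so that sparsely rated or brand-new items can still be scored, can be sketched as follows. This is only an illustration under loud assumptions: Jaccard overlap of metadata sets stands in for the paper's ontology-based semantic similarity, the prediction is a plain similarity-weighted average, and all names and data (`jaccard`, `predict_rating`, the movie ids and tags) are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity: overlap of two metadata sets (assumption, not the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_rating(user_ratings, metadata, target):
    """Similarity-weighted average of the user's past ratings, usable for an unrated item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        sim = jaccard(metadata[item], metadata[target])
        weighted_sum += sim * rating
        weight_total += sim
    # None signals that no rated item shares any metadata with the target.
    return weighted_sum / weight_total if weight_total else None


# Hypothetical movie metadata and one user's ratings on a 1-5 scale.
metadata = {
    "m1": {"scifi", "nolan"},
    "m2": {"thriller", "romance"},
    "new": {"scifi", "nolan", "thriller"},  # cold-start item with no ratings yet
}
user_ratings = {"m1": 5, "m2": 2}
prediction = predict_rating(user_ratings, metadata, "new")
```

Because the score depends only on item metadata and the user's own history, it is available even when no other user has rated the target item, which is the item cold-start situation the abstract refers to.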

Paper Nr: 157
Title:

NAVIGATION ALONG DATABASE RELATIONSHIPS - An Adaptive Framework for Presenting Database Contents as Object Graphs

Authors:

Ahmet Atli, Katja Nau and Andreas Schmidt

Abstract: In this paper, we will present an approach to visualizing arbitrary relational database contents in the form of object graphs via the World Wide Web. The focus is on the relationships between the datasets rather than on the data itself. The tool allows definition of different node types representing the datasets and edges representing the foreign keys and relationship tables in the database schema. Each node type has a label, and optionally a short description and a user-definable image associated with it. The information for these fields can be extracted 1:1 from the corresponding database tables or otherwise be aggregated from different tables. Along the edges, it is possible to navigate through the content of the database. At any time, exactly one node represents the center of the object graph. Starting from this node, edges and other nodes down to a user-definable depth n are visualized. The depth n may vary along different node and edge types, so that it is possible to customize the representation of the object graph. The graphical representation of arbitrary database contents has been of great help to us. In addition to using it in the initial application area, we intend to use it in some other areas we had left unconsidered. From these, we can infer a number of suggestions as to how to improve our tool and make it more universal.
Download
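The depth-limited view described in the abstract above, one center node plus everything reachable within n edges, corresponds to a bounded breadth-first traversal. The sketch below is a simplified assumption: it uses a single global depth, whereas the abstract says the depth may vary per node and edge type, and the adjacency dictionary with keys such as `order:1` is a made-up stand-in for rows linked by foreign keys.

```python
from collections import deque


def neighbourhood(edges, center, depth):
    """Return {node: distance} for all nodes within `depth` edges of `center`."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in edges.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical rows linked by foreign keys.
edges = {
    "order:1": ["customer:7", "product:3"],
    "customer:7": ["address:2"],
    "address:2": ["city:9"],
}
visible = neighbourhood(edges, "order:1", 2)
```

Re-centering the graph on another node amounts to calling the function again with that node as `center`.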

Paper Nr: 25
Title:

TOWARD REUSABILITY IN WEB MODELING - Using QVT Relations

Authors:

Ali Fatolahi, Stéphane S. Somé and Timothy C. Lethbridge

Abstract: In this paper, a model-driven approach for web development is presented. The approach contains two important elements that serve reusability: an abstract model and a set of transformations. Transformations act as the chaining feature of model-driven development (MDD); that is, transformations add to the value of models by transforming them to those of the desired type. As a standard for developing transformations, QVT relations are used in this paper to specify mappings from a high-level model to an abstract model of web-based applications. This model is abstract since it does not rely on any specific web platform but on the common features of web applications. Having this model and its corresponding transformations, model-driven web development for specific platforms becomes faster and more reusable.
Download

Paper Nr: 54
Title:

SALO - Sharable Auto-adaptive Learning Object

Authors:

Ignacio Gutiérrez Menéndez, Mª del Puerto Paule Ruiz and Juan Ramón Pérez Pérez

Abstract: New pervasive computing scenarios such as ubiquitous learning (u-learning) and mobile learning (m-learning) demand real-time adaptation. Such adaptation is not possible with current technologies. A change is necessary in the development of pervasive e-learning systems, using dynamic technologies and including them in both platforms and e-learning content specifications. In this paper we define and develop the concept of the Sharable Auto-Adaptive Learning Object (SALO) as a Learning Object which includes content and describes its behaviour by means of dynamic languages. Such features allow it to change and to include new resources and behaviour dynamically, using the user’s context at 'anytime, anywhere, from any device'.
Download

Paper Nr: 69
Title:

AN ONLINE EVALUATION OF EXPLICIT FEEDBACK MECHANISMS FOR RECOMMENDER SYSTEMS

Authors:

Simon Dooms, Toon De Pessemier and Luc Martens

Abstract: The success of a recommender system is not only determined by smart algorithm design, but also by the quality of user data and user appreciation. User data are collected by the feedback system that acts as the communication link between the recommender and the user. The proper collection of feedback is thus a key component of the recommender system. If designed incorrectly, worthless or too little feedback may be collected, leading to low-quality recommendations. There is, however, little knowledge of the influence that the design of feedback mechanisms has on the willingness of users to give feedback. In this paper we study user behavior towards four different explicit feedback mechanisms that are most commonly used in online systems: 5-star rating (static and dynamic) and thumbs up/down (static and dynamic). We integrated these systems into a popular (10,000 visitors a day) cultural events website and monitored the interaction of users. In 6 months, over 8,000 ratings were collected and analyzed. Current results show that the distinct feedback systems resulted in different user interaction patterns. Finding the right technique to encourage user interaction may be one of the next big challenges recommender systems have to face.
Download

Paper Nr: 79
Title:

SCALESEM - Evaluation of Semantic Graph based on Model Checking

Authors:

Mahdi Gueffaz, Sylvain Rampacek and Christophe Nicolle

Abstract: Semantic interoperability problems have found their solutions using languages and techniques from the Semantic Web. The proliferation of ontologies and meta-information has improved the understanding of information and the relevance of search engine responses. However, the construction of semantic graphs is a source of numerous errors of interpretation or modelling, and scalability remains a major problem. The processing of large semantic graphs is a limit to the use of semantics in current information systems. The work presented in this paper is part of a new line of research at the border of two areas: the semantic web and model checking. This line of research concerns the adaptation of model checking techniques to semantic graphs. In this paper, we present a first method of converting RDF graphs into the NμSMV and PROMELA languages.
Download

Paper Nr: 81
Title:

A WEB-BASED TOOL FOR SPATIOTEMPORAL FILTERING AND CONTINUOUS ANIMATION

Authors:

Alex Vakaloudis and Simeon Veloudis

Abstract: We describe MoveMap, a front-end tool for spatiotemporal databases with moving objects. Built on Google Maps technology, it is independent of any underlying data model or query language. It accommodates continuous temporal navigation and aims at both precision and abstraction by employing the Google Maps DirectionsService utility. For filtering and controlling the display, it includes a set of spatiotemporal operators that can be dynamically triggered as the navigation proceeds in time. Spatiotemporal querying is thus performed in two different layers: first at the server level, which can be accomplished by any underlying framework, and second on the client, through this mechanism of associating query conditions with browser events.
Download

Paper Nr: 82
Title:

AN ACCESSIBLE WEB-BASED INFORMATION BROADCASTING SYSTEM FOR BLIND PEOPLE

Authors:

Hsiao-Ping Lee, Chien-Hsing Chen and Tzu-Fang Sheu

Abstract: Generally, audio data is most accessible to the blind. One possible way to improve web accessibility for the blind is to provide them with information in audio format. It is essential to develop an automatic and customized system for blind people that converts information on the Internet to audio in real time. In this paper, we propose a novel accessible information broadcasting system, which is specifically designed for blind people. The proposed system provides an audio-based information broadcasting service with an accessible web interface. The proposed system searches for customized information on the Internet or user-specified articles and converts them to audio. Via modern text-to-speech technology, the conversion can be done automatically. By subscribing to preferred subjects, blind people can periodically access the information that they are interested in without barriers. Moreover, the user interface of the proposed system is approved at A+ certification level under web accessibility guidelines. That is, blind people can access abundant information on the Internet on demand by using the proposed system. The proposed system approaches the goal of realizing friendly access to the Internet for blind people.
Download

Paper Nr: 99
Title:

TOWARD THE CREATION OF A GREEN CONTENT MANAGEMENT SYSTEM

Authors:

Matteo Ciman, Ombretta Gaggi and Marco Sbrignadello

Abstract: In this paper we discuss the problem of efficient use of Content Management Systems in the development of a web site, and we propose a new approach towards the definition of a Green Content Management System (GCMS). Our GCMS distinguishes between pages which contain dynamic data and pages which are not supposed to change frequently. Documents of this second kind are generated off-line every time a change occurs, thus avoiding wasted CPU time and improving the response time of pages. For this reason, we call our CMS “green”. Moreover, our GCMS supports the creation of fully accessible web sites.
Download

Paper Nr: 110
Title:

FOUR ASPECTS OF RELEVANCE IN SHARING LOCATION-BASED MEDIA: CONTENT, TIME, LOCATION AND NETWORK

Authors:

Pasi Fränti, Jinhua Chen and Andrei Tabarcea

Abstract: Sharing information via the Internet is popular, but the key problem is how to find relevant information. Two new features are becoming more popular: location and the social network of the user. We hypothesize that the relevance of data is defined by four aspects: content, time, location, and user network. We study how the location aspect is used in a media-sharing service called MOPSI.
Download

Paper Nr: 131
Title:

REQUIREMENTS ENGINEERING OF WEB APPLICATION PRODUCT LINES

Authors:

Hernán Casalánguida and Juan Eduardo Durán

Abstract: Application families (AF) are usually developed to reduce time to market and development costs of applications. Therefore, it is attractive to investigate the development of web AF (WAF) and to have an adequate requirements engineering (RE) method for WAF. A problem little studied is how to classify use cases (UC) for rich internet applications (RIA); to give better guidance to the developer we define a taxonomy for RIA UCs. UCs are described using UML activity diagrams (AD) in some web methods, but the taxonomies for actions proposed in them can be improved, because there are not enough action classes or they are too monolithic; for this reason we define a new action classification for RIA ADs. Studying the AD variability notations, we found a set of requirements for them; we define a notation for ADs satisfying these requirements and fulfilling some of them in a better way than in the literature. Non-functional requirements (NFR) for WAF must be documented. There are some goal-based approaches with variabilities; but they do not consider the modeling of NFRs and only consider examples for hard goals. We extend the NFR framework for the description of NFRs adding to it modeling elements for expressing variabilities.
Download

Paper Nr: 149
Title:

FINDING AND REFINDING WEB PAGES IN CONTEXT - A Tree-based Model of Web History

Authors:

David Briffa and Chris Staff

Abstract: A modern challenge for the World Wide Web (Web) is not just finding information without getting ‘lost’ in hyperspace, but also re-finding it efficiently. Web Nav is an integrated navigation system that combines both history and page recommendations into one context-based tool. Web Nav’s framework signifies a paradigm shift in the viewing of history from a linear stack-based system to a hierarchical tree-based system. Web Nav was evaluated qualitatively and quantitatively, analysing 13 users’ activity over a seven-day period. The results are mixed but there is sufficient evidence to suggest tree-based views of history can be beneficial: to allow users to revisit web pages in context; to show user sessions as trees; and to automatically generate queries based on session contexts to recommend web pages.
Download

Paper Nr: 152
Title:

ADAPTING WEB IMAGES FOR BLIND PEOPLE

Authors:

A. Fatzilah Misman and Peter Blanchfield

Abstract: One way to remedy the gap that evidently exists between image elements on the web and blind web users is to redefine the connection between the image and the most abundant element of the web itself, i.e., text. Studies exploiting this connection have been carried out largely in fields such as HCI, the semantic web and information retrieval, or even with a new hybrid approach. However, many see the problem from the perspective of a third party. This position paper posits that the problem can also be seen from the fundamental reasons for an image being on a web page, without neglecting the connection that develops from the web user’s perspective. Effective and appropriate image tagging may consider this view.
Download

Paper Nr: 159
Title:

BIOPEN: A PROPOSED WEB APPLICATION FOR IMPLEMENTING THE BIOLOGICAL ELECTRONIC PROTOCOL SYSTEM OF ENTEROPATHOGENS CHARACTERIZATION - Personalized Web Sites and Services

Authors:

Terumi Paula Bonfim Kamada, Danielle Forbeci Suzuki, Maicon Silva Costa, Leonardo Magalhaes Cruz and Luiz Antônio Pereira Neves

Abstract: In the present work, we have developed a Biological Electronic Protocol System for Enteropathogens Characterization (BIOPEN) with a Web Application interface that allows users to query data for characterization and analysis of pathogens through an integrated management of accurate collection, storage and retrieval of data, which aids decisions in health actions to combat diseases caused by these microorganisms. The proposed tool is an electronic protocol that aims to facilitate data sharing in health care, using a structured database to store information concerning clinical assays and molecular characterizations of pathogenic strains and clinical isolates. BIOPEN was developed using open source facilities and is freely distributed, allowing a particular laboratory to create a customized, reliable, and low-cost database. Thus, the principal contribution of this work is to provide a tool that stores and integrates results of physiological, biochemical, and molecular clinical tests that can help researchers in taxonomic identification of clinical isolates and prospective epidemiological studies.
Download

Area 4 - Society, e-Business and e-Government

Full Papers
Paper Nr: 80
Title:

DEVELOPING A MODEL OF CITIZENS’ PERCEPTION OF E-GOVERNMENT SYSTEM PERFORMANCE AND THEIR ATTAINED BENEFIT

Authors:

Shatha Al-Haddad and Peter N. Hyland

Abstract: Governments worldwide have increasingly been implementing e-government initiatives for their potentially significant benefits, among which is delivering better services to citizens by increasing citizens’ convenience, satisfaction, and independence, and by saving their time, effort, and cost. Achieving each benefit is an objective for these governments, and fulfilling each objective is considered a critical success factor. Hence, governments need to assess whether they were able to attain their preset goals, and to what degree. This study focuses solely on the citizens’ perspective of the evaluation. However, the relevant literature seems to lack adequate studies that propose an evaluation tool that is sufficient and has been reliably validated. Therefore, the purpose of this study is to fill this gap by proposing a conceptual model which measures e-government performance from the citizens’ perspective, together with their psychological and tangible benefits. While developing the model we also consider the attributes which impact citizens’ perceptions and obtained values which, in turn, influence their adoption.
Download

Short Papers
Paper Nr: 29
Title:

POST-ADOPTION BEHAVIOUR, COMMUNITY SATISFACTION AND PCS - An Analysis of Interaction Effects in the Tuenti Community

Authors:

Manuel J. Sánchez-Franco, Félix A. Martín-Velicia and Borja Sanz-Altamira

Abstract: Our research contributes to the existing literature by examining the community drivers (i.e., participation, organisation and satisfaction) and their effects on the sense of belongingness to a social networking site. Our analysis also emphasises the importance of continuance over initial acceptance; indeed, post-adoption phenomena have traditionally received scarce attention. In particular, our study considers the interaction effects of routinisation on the research model. A survey is conducted for data collection. Partial Least Squares (PLS) is proposed to assess the relationships between the constructs together with the predictive power of the research model. Overall, the results reveal that members’ attachment to an online community is determined by community satisfaction, participation and organisation. Moreover, higher routinisation reduces the impact of community organisation on integration, and in turn increases the impact of satisfaction on integration. The model and results can consequently be used to assess different strategic proposals related to participation, organisation and satisfaction during the implementation process.
Download

Paper Nr: 45
Title:

DATA PRIVACY IN WEB ANALYTICS - An Empirical Study and Declaration Model of Data Collection on Websites

Authors:

Darius Zumstein, Aleksandar Drobnjak and Andreas Meier

Abstract: Web analytics has become a useful instrument for electronic business and website management to analyze and optimize website usage. However, different concerns arise in web analytics regarding the collection, storage and usage of web data. To maintain user confidence in websites, operators need to comply with privacy and to inform truthfully about data collection. One way to achieve transparency would be to inform users about the purpose, methods and processes of data collection and about the subsequent analysis and use of the data. Results of an empirical study show that 16% of the Forbes 500 listed companies do not declare their data collection practices at all. Moreover, 35% do not declare the usage of cookies and 61% do not declare the recording of IP addresses. Surprisingly, 91% of the websites do not name the web analytics system (provider) used to track web usage and user data. A large backlog exists regarding transparency of data collection in privacy policies, especially when Google Analytics is used. This paper demands more transparency in web analytics and proposes a declaration model with seven criteria for the evaluation of data collection on websites: type of data being collected, purpose of data collection, method of data collection as well as the technology and software used for data collection. In addition, websites should provide the right to access personal data and to deactivate data collection during visits.
Download

Paper Nr: 59
Title:

UNDERSTANDING DETERMINANTS OF COMPLAINT INTENTIONS IN ONLINE SHOPPING - The Perspectives of Justice and Technology

Authors:

Ing-Long Wu, Chi-Ying Huang and Chu-Ying Fu

Abstract: Consumers’ complaint behaviors are critical in determining repurchase behaviors in online shopping. An understanding of complaint behaviors can provide insight into consumers’ failed service experiences and, in turn, help to effectively redress consumers’ problems. Therefore, it is important to comprehend the antecedents of complaint intentions in online shopping. The major issue is two-fold: behavior and technology. This study thus integrates justice theory and the expectation-confirmation model to examine the antecedents of complaint intentions in terms of these two issues. Moreover, customer satisfaction is an important mediator in the relationship structure. Data are collected from online shoppers with dissatisfying experiences. Structural equation modeling is used to analyze this model. The results indicate that distributive justice and interactional justice are important in influencing customer satisfaction and complaint intentions while interactional justice is not. Technology-based antecedents, such as perceived usefulness, are all important in determining customer satisfaction and complaint intentions. Implications for managers and scholars are discussed.
Download

Paper Nr: 118
Title:

APPLYING LOGISTIC REGRESSION TO RANK CREDIBILITY IN WEB APPLICATIONS

Authors:

Rafael Lima and Adriano Pereira

Abstract: The popularization of the World Wide Web (WWW) has given rise to new services every day, demanding mechanisms to ensure the credibility of these online services. Until now, little has been done to measure and understand the credibility of this complex Web environment, which itself is a major research challenge. In this work, we use logistic regression to design and evaluate the credibility of a Web application. We call a credibility model a function capable of assigning a credibility value to a transaction of a Web application, considering different criteria of this service and its supplier. In order to validate our proposed methodology, we perform experiments using an actual dataset, on which we evaluate and compare different credibility models built from distinct types of information sources. The obtained results are very good, showing representative gains when compared to a baseline. The results show that the proposed methodology is promising and can be used to strengthen the trust of users in services on the Web.
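As a rough illustration of what a credibility model of this kind could look like, the sketch below scores transactions with a logistic function and ranks them. It is not the authors' model: the feature names, weights and bias are hypothetical values chosen for the example, and in practice the weights would be fitted to labelled data.

```python
import math

def credibility(features, weights, bias):
    """Map a transaction's feature vector to a credibility value in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical features per transaction:
# [seller rating in 0..1, account age in years, relative price deviation]
weights = [2.0, 0.5, -1.5]  # illustrative only, not fitted to any dataset
bias = -1.0

transactions = {
    "t1": [0.9, 4.0, 0.1],
    "t2": [0.2, 0.5, 1.2],
}

# Rank transactions from most to least credible.
ranked = sorted(
    transactions,
    key=lambda t: credibility(transactions[t], weights, bias),
    reverse=True,
)
```

Different credibility models in this sense would differ only in which features are fed to the function and which weights are learned for them.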
Download

Paper Nr: 119
Title:

SELF-ORGANIZATION OF VIRTUAL COMMUNITIES - Need and Members’ Participation

Authors:

Juliana de Melo Bezerra and Celso Massaki Hirata

Abstract: In a self-organizing virtual community, the members are responsible for defining the norms that govern the community. A model for self-organizing virtual communities is proposed. The model is useful to study a self-organizing community and understand how it is structured in order to improve its self-organization feature. The model is used in two investigations. We investigate the need for the self-organization feature in virtual communities. We also investigate the participation of members in the self-organization process, by identifying the categories of members that contribute significantly during the elaboration of norms.
Download

Paper Nr: 158
Title:

CITIZEN CONTROLLED EXCHANGE OF INFORMATION IN E-GOVERNMENT

Authors:

Helder Gomes, André Zúquete and Gonçalo Paiva Dias

Abstract: The online provision of public services to citizens, e-government, is here to stay. Its advantages are huge, both for the government and for the citizens. Life-event service is a sound concept in which services are designed to cater to citizens’ real needs instead of government departments’ needs. But this type of service requires interoperability between government departments. Among other things, interoperability implies the exchange of information between government departments, which traditionally has been implemented by direct communication. This direct communication raises privacy concerns for citizens, since their personal information is potentially exchanged without their knowledge and control. In this paper we propose an e-government model where the citizen controls the exchange of his personal information between government departments.
Download

Paper Nr: 12
Title:

INTERCULTURAL LEARNING AND COLLABORATION ASPECTS IN COMMUNITIES OF PRACTICE

Authors:

Ileana Hamburg

Abstract: Communities of Practice (CoPs) and intercultural learning are approaches that empower people with the skills and knowledge needed to turn change to an advantage and to help them integrate into social life. In this paper we first discuss CoPs as a means to support learning, sharing, transferring and developing knowledge, and then present some statements for building CoPs based on social media and using intercultural learning. We give examples of projects where we develop CoPs and present some conclusions.
Download

Paper Nr: 34
Title:

SUPPLY INFORMATION VISIBILITY FROM SUBCONTRACTOR NETWORK TO MANUFACTURER TO DECREASE SUPPLY RISK

Authors:

Salmela Erno and Happonen Ari

Abstract: This paper presents a web-based operational model which improves the visibility of inventories in a subcontractor network to the manufacturer, and consequently, reduces the supply risk for the manufacturer. There is a need for a new operational model because the supply problems in a subcontractor network disrupt the activity of the manufacturer. In practice, the manufacturer sees these disturbances especially as an increase in production costs, but they also impact the manufacturer's product sales because delayed delivery to end-customers leads to potential losses in sales. The operating model developed offers the manufacturer advance information on the possible supply problems of subcontractors. Consequently, delays in deliveries can be reduced or better anticipated. At present, subcontractors in the supply chain studied mainly manage inventory and production information with Excel, and transfer information by e-mail. A centralised, web-based system would make the activity more organised and systematic, and would make the information flows uniform.

Paper Nr: 36
Title:

BLENDED SHOPPING - Evaluation Method and Interactive Approaches

Authors:

Britta Fuchs, Thomas Ritz, Bastian Halbach and Florian Hartl

Abstract: In this paper we sketch the sales channels retail and eCommerce. Afterwards the integration of both channels into blended shopping is explained. The behaviour of consumers is influenced by technology-driven trends like the development of Web 2.0, especially social networks. The combination of the presented topics is the basis for a method to structure the development of blended shopping concepts for retailers. Two prototypes based on this method are sketched. The paper closes with a future outlook.
Download

Paper Nr: 58
Title:

FACILITATING CONSTRUCTIVISM STUDY IN ONLINE MUSEUM VIA SOCIAL MEDIA

Authors:

Cui Binyue and Yokoi Shigeki

Abstract: With the development of Web 2.0 technologies, online museums generally attract online visitors to constructivism study, which is related but not limited to the physical experience in a real museum. Constructivism study theory indicates that study happens most effectively when learners create physical artifacts or discuss certain topics or knowledge with fellow learners. In this paper, we address integrating social media with online museums, forming a circle of attracting users, supporting communication, accumulating knowledge, disseminating knowledge via social media and then attracting more users. Within this circle, we facilitate users’ constructivism study by providing contextual knowledge relevant to a certain collection and make learning a social activity via social media.
Download

Paper Nr: 70
Title:

COMMUNITY DRIVEN REQUESTS FOR PROPOSALS - Applying Semantics to Match Customer Purchase Intents to Vendor Offers

Authors:

Christophe Debruyne, Davor Meersman, Mathias Baert and Rami Hansenne

Abstract: This paper presents a platform for requests for proposals and describes how ontologies drive the different components: the creation of a proposal, the annotation of vendor data, the transformation of vendor data into other formats and the semantic matching of a proposal against annotated vendor data. The ontology construction started from DOGMA, a methodology with its grounding in the linguistic representation of knowledge that is suitable for community participation in the creation process. The ontologies were created in a modular way, with general product and meta-models that can be extended depending on the domain. In the case of the pilot, the products were holiday packages, more precisely winter sports holiday packages.
Download

Paper Nr: 74
Title:

CHARACTERISTICS OF TRUST IN ONLINE SOCIAL NETWORKS AND COMMUNITY OF TRUST AS A SPECIAL CASE OF ONLINE COMMUNITY

Authors:

David Zejda

Abstract: With the boost of interest in Web 2.0 technologies, appropriate trust models are increasingly important. The first section of the paper presents the state of the art on trust characteristics, in particular multidimensionality, contextuality, scope of relevance, transitivity and asymmetry. Transitivity, as a key aspect utilized in most models, is described in slightly greater detail. The discussion of scope of relevance allows us to introduce a taxonomy of trust from the scope point of view. Based on this general foundation, in the second section we introduce the community of trust as a niche type of online community where users trust each other by default and where trust loses most of its subjective flavour.
Download

Paper Nr: 77
Title:

AGMI - AN AGENT-MINING TOOL AND ITS APPLICATION TO BRAZILIAN GOVERNMENT AUDITING

Authors:

Carlos Vinícius Sarmento Silva and Célia Ghedini Ralha

Abstract: This paper presents research combining two originally separate but increasingly interrelated areas: distributed multi-agent systems and data mining. In our approach, we prove the interaction features in a bilateral and complementary way, since we have defined an integrated architecture and developed a prototype, which has been used in a government auditing case study. In Brazil, government auditing is performed by the Office of the Comptroller General (CGU), where several approaches are being used to prevent and fight corruption. However, some activities such as government purchasing fraud detection are limited by the difficulty in finding effective ways to implement them. From the data mining perspective, we have used different model functions, such as clustering and link analysis with association rules. Our approach integrating multi-agent and data mining techniques resulted in expressive discovered knowledge, which would help detection of cartels acting in public bidding processes at CGU.
Download

Paper Nr: 87
Title:

WEB-BASED SYSTEM FOR AUTOMATICALLY COLLECTING INFORMATION ABOUT LOCATIONS OF VOLUNTEER ACTIVITIES OF CITIZEN GROUPS

Authors:

Akira Hattori and Haruo Hayami

Abstract: A large number of citizen groups, many of which work in a community setting, publish information about their missions and activities on their Websites. However, it is difficult to understand where and what types of activities they do because such information is distributed throughout the Web. We show how citizen groups are currently using maps on their Websites and propose a system for automatically collecting information from their Websites about locations of their volunteer activities. Our system selects numerous URLs of citizen group Websites and extracts information about locations of volunteer activities from each group based on the content and structure of each page on the site. We developed and evaluated a prototype system and found that our proposed system has great potential for understanding volunteer activities of citizen groups in a local community.
Download

Paper Nr: 138
Title:

PUBLICATION AND REUSE OF OPEN GOVERNMENT DATA - A Practical Approach

Authors:

T. Cerdeña Hernández, L. M. Moreno de Antonio, F. Fumero Batista, D. Pérez Barbudo and J. L. Roda García

Abstract: Web 2.0 has changed the way that information is presented to people. Public administrations have an important role to play in this new era. Global institutions and central, regional and local governments gather and produce a wide variety of information that is potentially reusable by citizens and the digital content industry. Gov 2.0 follows the approach of giving citizens access to open public data. Open Government Data (OGD) establishes the principles for providing public data to the public. The development of innovative applications by companies or individuals from these public records will meet citizens’ demand for information, as well as developing the basic principles of transparency, publication and reutilization. We present a local administration experience showing the steps to implement the OGD strategy. Benefits and responsibilities of those involved in the full process are presented. We have developed a real case of publicly available data from the population census of the municipality.
Download

Area 5 - Web Intelligence

Full Papers
Paper Nr: 31
Title:

ENTROPY ON ONTOLOGY AND INDEXING IN INFORMATION RETRIEVAL

Authors:

Yevgeniy Guseynov

Abstract: In this paper, we present a formalization of an Index Assignment process that was applied to documents stored in a text database. The process uses key phrases or terms from a hierarchical thesaurus or ontology and is based on the new notion of entropy on ontology for terms and their weights, which is an extension of the Shannon concept of entropy in Information Theory and the Resnik semantic similarity measure for terms in an ontology. The introduced notion provides a measure of closeness or semantic similarity for a set of terms in an ontology and their weights, and allows the creation of a clustering algorithm that constructively resolves the index assignment task. The algorithm was tested on 30,000 documents randomly extracted from the MEDLINE biomedicine database that are manually indexed by professional indexers. The main output from the experiments shows that, after all 30,000 documents were processed, in seven topics out of ten the presented algorithm and human indexers have the same understanding of documents.
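The abstract does not define entropy on ontology itself, so the sketch below only illustrates the two standard ingredients it names: Shannon entropy over a set of term weights and Resnik similarity over a term hierarchy. The toy hierarchy, the term probabilities and all function names are assumptions made up for this example.

```python
import math

def shannon_entropy(weights):
    """Shannon entropy (in bits) of term weights normalized to probabilities."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical toy hierarchy (child -> parent) and term probabilities.
parent = {"disease": None, "infection": "disease", "cancer": "disease",
          "flu": "infection", "cold": "infection"}
prob = {"disease": 1.0, "infection": 0.5, "cancer": 0.5, "flu": 0.25, "cold": 0.25}

def ancestors(term):
    """Return the term together with all of its ancestors up to the root."""
    chain = []
    while term is not None:
        chain.append(term)
        term = parent[term]
    return chain

def resnik_similarity(a, b):
    """Information content, -log2 p(c), of the most informative common subsumer."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(-math.log2(prob[c]) for c in common)
```

In this toy setting, two sibling terms under a specific parent score higher than two terms whose only common ancestor is the root, and evenly spread weights give higher entropy than weights concentrated on one term.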
Download

Short Papers
Paper Nr: 30
Title:

EXPLOITING THE LANGUAGE OF MODERATED SOURCES FOR CROSS-CLASSIFICATION OF USER GENERATED CONTENT

Authors:

Avaré Stewart and Wolfgang Nejdl

Abstract: Recent pandemics such as Swine Flu have caused concern for public health officials. Given the ever-increasing pace at which infectious diseases can spread globally, officials must be prepared to react sooner and with greater epidemic intelligence gathering capabilities. However, state-of-the-art systems for Epidemic Intelligence have not kept pace with the growing need for more robust public health event detection. Existing systems are limited in that they rely on template-driven approaches to extract information about public health events from human language text. In this paper, we propose a new approach to support Epidemic Intelligence. We tackle the problem of detecting relevant information in unstructured text from a statistical pattern recognition viewpoint. In doing so, we also address the problems associated with the noisy and dynamic nature of blogs by exploiting the language in moderated sources to train a classifier for detecting victim-reporting sentences in blog social media. We refer to this as Cross-Classification. Our experiments show that without using manually labeled data, and with a simple set of features, we are able to achieve a precision as high as 88% and an accuracy of 77%, comparable with the state-of-the-art approaches for the same task.
Download

Paper Nr: 39
Title:

HOW INTEGRATE STRUCTURAL DIMENSION IN RELEVANCE FEEDBACK IN XML RETRIEVAL

Authors:

Inès Kamoun Fourati, Mohamed Tmar and Abdelmajid Ben Hamadou

Abstract: Contrary to classical information retrieval systems, systems that treat structured documents include the structural dimension in the comparison of documents and queries. Thus, the retrieval of relevant results means the retrieval of document fragments that match the user need rather than the whole document. The notion of structure should therefore be taken into account during the retrieval process as well as during reformulation by relevance feedback. In this paper we propose an approach to query reformulation based on structural relevance feedback. We start from the original query on the one hand and the fragments judged as relevant by the user on the other. Structure hints analysis allows us to identify nodes that match the user query and to rebuild it during the relevance feedback step. The main goal of this paper is to show the impact of structural hints in XML query optimization. Some experiments have been undertaken on a dataset provided by INEX (INitiative for the Evaluation of XML retrieval, an evaluation forum that aims at promoting retrieval capabilities on XML documents) to show the effectiveness of our proposals.
Download

Paper Nr: 65
Title:

ORGANOGRAPHS - Multi-faceted Hierarchical Categorization of Web Documents

Authors:

Rodrigo Dias Arruda Senra and Claudia Bauzer Medeiros

Abstract: The data deluge of information in the Web challenges internauts to organize their references to interesting content in the Web as well as in their private storage space off-line. Having an automatically managed personal index to content acquired from the Web is useful for everybody, but critical to researchers and scholars. In this paper, we discuss concepts and problems related to organizing information through multi-faceted hierarchical categorization. We introduce the organograph as a mechanism to specify multiple views of how content is organized. Organographs can help scientists to automatically organize their documents along multiple axes, improving sharing and navigation through themes and concepts according to a particular research objective.
Download

Paper Nr: 115
Title:

CLASSIFYING WEB PAGES BY GENRE - Dealing with Unbalanced Distributions, Multiple Labels and Noise

Authors:

Jane E. Mason, Michael Shepherd, Jack Duffy and Vlado Kešelj

Abstract: Web page genre classification is a potentially powerful tool for filtering the results of online searches. The goal of this research is to develop an approach to the problem of Web page genre classification that is effective not only on balanced, single-label corpora, but also on unbalanced and multi-label corpora, and in the presence of noise, in order to better represent a real world environment. The approach is based on n-gram representations of the Web pages and centroid representations of the genre classes. Experimental results compare very favorably with those of other researchers.
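One simple way to combine n-gram page representations with class centroids is sketched below: each page becomes a profile of character trigram frequencies, each genre is the average of its training profiles, and a page is assigned to the nearest centroid by cosine similarity. The abstract does not give the authors' n-gram type, distance measure or thresholds, so all of those, as well as the toy training texts and genre names, are assumptions.

```python
import math
from collections import Counter, defaultdict

def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams in a page's text."""
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def centroid(profiles):
    """Average several n-gram profiles into one genre centroid."""
    acc = defaultdict(float)
    for profile in profiles:
        for gram, freq in profile.items():
            acc[gram] += freq / len(profiles)
    return dict(acc)

def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def classify(text, centroids, threshold=None):
    """Single label: the nearest genre. With a threshold: every genre at or above it."""
    profile = ngram_profile(text)
    scores = {genre: cosine(profile, c) for genre, c in centroids.items()}
    if threshold is None:
        return max(scores, key=scores.get)
    return sorted(g for g, s in scores.items() if s >= threshold)

# Hypothetical toy training data; real corpora would contain full Web pages.
training = {
    "faq": ["what is this? how do i reset it?", "how do i install it? what do i need?"],
    "shop": ["buy now, add to cart, see the price", "add to cart, checkout, sale price"],
}
centroids = {g: centroid([ngram_profile(t) for t in texts]) for g, texts in training.items()}
```

Passing a `threshold` lets one page receive several genre labels, or none at all, which is one possible way to handle multi-label corpora and noise pages in a centroid scheme.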
Download

Paper Nr: 41
Title:

A SIMPLE METHOD FOR MINING AND VISUALIZING COMPANY RELATIONS BASED ON WEB SOURCES

Authors:

Maximilien Kintz and Jan Finzen

Abstract: One of the important aspects of market and competitive intelligence is the observation and analysis of a partner, customer or competitor’s relations with other companies. Using Web-based sources such as press releases, corporate Web sites or news articles and text mining technologies such as Named Entity Recognition, it is possible to automatically extract company relations out of Web content and to build network graphs showing how companies interact. Visualization software that can be integrated in a Web-based application offers means to explore, search, and analyse these networks and their meaning for a company. In this paper we demonstrate how to build a powerful company relation mining application with very little effort by effectively connecting open source toolkits.
Download

Paper Nr: 86
Title:

EVALUATION OF COLLABORATIVE FILTERING ALGORITHMS USING A SMALL DATASET

Authors:

Fabio Roda, Leo Liberti and Franco Raimondi

Abstract: In this paper we report our experience in the implementation of three collaborative filtering algorithms (user-based k-nearest neighbour, Slope One and TMW, our original algorithm) to provide a recommendation service on an existing website. We carry out the comparison by means of a typical metric, namely the accuracy (RMSE). Usually, evaluations for these kinds of algorithms are carried out using off-line analysis, withholding values from a dataset, and trying to predict them again using the remaining portion of the dataset (the so-called “leave-n-out approach”). We adopt a “live” method on an existing website: when a user rates an item, we also store in parallel the predictions of the algorithms on the same item. We got some unexpected results. In the next sections we describe the algorithms, the benchmark, the testing method, and discuss the outcome of this exercise. Our contribution is a report of the initial phase of a Recommender Systems project with a focus on some possible difficulties on the interpretation of the initial results.
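For readers unfamiliar with the named techniques, the sketch below shows a weighted Slope One predictor and the RMSE computation over stored (prediction, actual rating) pairs, which is the kind of log a "live" evaluation would accumulate. The data layout, function names and the tiny rating table are illustrative assumptions; the authors' TMW algorithm is not described in the abstract and is not reproduced here.

```python
import math

def slope_one_predict(ratings, user, target):
    """Weighted Slope One: predict `user`'s rating of `target` from average item-to-item deviations."""
    numerator, denominator = 0.0, 0
    for item, r_ui in ratings[user].items():
        if item == target:
            continue
        # Deviations target - item over all other users who rated both items.
        diffs = [r[target] - r[item]
                 for other, r in ratings.items()
                 if other != user and target in r and item in r]
        if diffs:
            numerator += (r_ui + sum(diffs) / len(diffs)) * len(diffs)
            denominator += len(diffs)
    return numerator / denominator if denominator else None

def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

# Hypothetical rating table: user -> {item: rating}.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}

# "Live" logging: when lucy later rates item A, store the earlier prediction next to it.
prediction = slope_one_predict(ratings, "lucy", "A")
log = [(prediction, 4.0)]  # 4.0 is an assumed actual rating for the example
```

With this table, item B contributes two co-ratings and item C one, so the prediction for lucy on A is (2 * 2.5 + 1 * 8) / 3.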
Download

Paper Nr: 150
Title:

FREQUENCY OF SENTENTIAL CONTEXTS VS. FREQUENCY OF QUERY TERMS IN OPINION RETRIEVAL

Authors:

Sylvester Olubolu Orimaye, Saadat M. Alhashmi and Siew Eu-Gene

Abstract: Many opinion retrieval techniques use frequency of query terms as a measurement for detecting documents that contain opinion. However, using frequency of query terms leads to bias in context-dependent opinion retrieval such that all documents containing query terms are retrieved, regardless of contextual relevance to the intent of the human seeking the opinion. This can be described as the non-contextual relevance problem in opinion retrieval systems such as Google Blogs Search and Technorati Blog Directory. Instead of using the frequency of individual query terms, sentence-level contextual understanding and grammatical dependencies need to be considered to ensure that retrieved documents contain a large proportion of textual content with the same underlying meaning as the given query. Thus, we present specific challenges with state-of-the-art opinion retrieval techniques that rely on frequency of query terms and we propose a grammar-based technique for efficient context-dependent opinion retrieval. We believe our proposed technique can solve the non-contextual relevance problem common to opinion retrieval systems, and can be used for context-dependent retrieval such as expert search systems, faceted-opinion retrieval, opinion trend analytics, and personalized web search.
Download