University of Borås


Please use this identifier to cite or link to this item: http://hdl.handle.net/2320/4976

Files in This Item:

File | Description | Size | Format
Utilizing Diversity and Performance Measures in Ensemble Creation 1.0.pdf | Full text | 1.23 MB | Adobe PDF
Presentation - Utilizing Diversity and Performance Measures in Ensemble Creation 1.0.pdf | Presentation | 429.05 kB | Adobe PDF
Title: Utilizing Diversity and Performance Measures for Ensemble Creation
Authors: Löfström, Tuve
Department: University of Borås. School of Business and Informatics
Issue Date: 24-Mar-2009
Media type: text
Publication type: licentiate thesis
Keywords: ensemble learning; machine learning; diversity; artificial neural networks; data mining; information fusion
Subject Category: Subject categories::Engineering and Technology::Computer and Information Science::Computer Science::Computer Science
Subject categories::Social Sciences::Computer and Information Science::Computer and Information Science::Information Systems
Area of Research: Computer Science
Abstract: An ensemble is a composite model that aggregates multiple base models into one predictive model. An ensemble prediction is consequently a function of all included base models. Both theory and a wealth of empirical studies have established that ensembles are generally more accurate than single predictive models. The main motivation for using ensembles is that combining several models tends to cancel out uncorrelated base classifier errors. This reasoning, however, requires the base classifiers to commit their errors on different instances; clearly, there is no point in combining identical models. Informally, the key term diversity means that the base classifiers commit their errors independently of each other. The problem addressed in this thesis is how to maximize ensemble performance by analyzing how diversity can be utilized when creating ensembles. A series of studies, each addressing a different facet of the question, is presented. The results show that ensemble accuracy and the diversity measure difficulty are the two individually best measures to use as optimization criteria when selecting ensemble members. However, the results further suggest that combinations of several measures are most often better optimization criteria than single measures. A novel method for finding a useful combination of measures is proposed at the end of the thesis. Furthermore, the results show that it is very difficult to estimate predictive performance on unseen data from results achieved on the available data. Finally, it is also shown that implicit diversity, achieved by varying the artificial neural network (ANN) architecture or by resampling features, is beneficial for ensemble performance.
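Illustration (not taken from the thesis): the argument that combining models only helps when their errors fall on different instances can be sketched with a small Python example. The function names and the toy correctness matrices below are hypothetical. The difficulty function assumes the common definition of the difficulty measure as the variance, over instances, of the fraction of base classifiers that are correct, where lower values indicate more diversity; the thesis itself may define or use it differently.

```python
from math import comb
from statistics import pvariance


def majority_vote_accuracy(n_models, p_correct):
    """Accuracy of a majority vote over an odd number of independent
    classifiers, each correct with probability p_correct (idealized case)."""
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


def ensemble_accuracy(correct):
    """Fraction of instances on which a strict majority of models is correct.
    correct[i][j] is 1 if model j is right on instance i, else 0."""
    hits = sum(1 for row in correct if sum(row) > len(row) / 2)
    return hits / len(correct)


def difficulty(correct):
    """Assumed definition: variance over instances of the fraction of
    models that are correct (lower means errors are more spread out)."""
    return pvariance([sum(row) / len(row) for row in correct])


# Toy data: three models, three instances, every model is 2/3 accurate.
identical = [[1, 1, 1], [1, 1, 1], [0, 0, 0]]  # all models fail together
diverse = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]    # each model fails alone

print(ensemble_accuracy(identical), difficulty(identical))
print(ensemble_accuracy(diverse), difficulty(diverse))
print(majority_vote_accuracy(3, 2 / 3))
```

In the toy data, the three identical models gain nothing from voting, whereas the three models that err on different instances are jointly correct everywhere, although every individual model has the same accuracy in both cases.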
Sponsorship: This work was supported by the Information Fusion Research Program (www.infofusion.se) at the University of Skövde, Sweden, in partnership with the Swedish Knowledge Foundation under grant 2003/0104.
URI: http://hdl.handle.net/2320/4976
Appears in Collections: Licentiatavhandlingar / Licentiate theses (Informatics)

All items in Borås Academic Digital Archive are protected by copyright, with all rights reserved.
