Please use this identifier to cite or link to this item:
http://hdl.handle.net/2320/4976
| Title: | Utilizing Diversity and Performance Measures for Ensemble Creation |
| Authors: | Löfström, Tuve |
| Department: | University of Borås. School of Business and Informatics |
| Issue Date: | 24-Mar-2009 |
| Media type: | text |
| Publication type: | licentiate thesis |
| Keywords: | ensemble learning; machine learning; diversity; artificial neural networks; data mining; information fusion |
| Subject Category: | Subject categories::Engineering and Technology::Computer and Information Science::Computer Science::Computer Science; Subject categories::Social Sciences::Computer and Information Science::Computer and Information Science::Information Systems |
| Area of Research: | Computer Science |
| Abstract: | An ensemble is a composite model that aggregates multiple base models into one
predictive model; an ensemble prediction is consequently a function of all
included base models. Both theory and a wealth of empirical studies have
established that ensembles are generally more accurate than single predictive
models. The main motivation for using ensembles is that combining several
models can cancel out uncorrelated base classifier errors. This reasoning,
however, requires the base classifiers to commit their errors on different
instances; clearly there is no point in combining identical models. Informally,
the key term diversity means that the base classifiers commit their errors
independently of each other. The problem addressed in this thesis is how to
maximize ensemble performance by analyzing how diversity can be utilized when
creating ensembles. A series of studies, addressing different facets of this
question, is presented. The results show that ensemble accuracy and the
diversity measure difficulty are the two individually best measures to use as
optimization criteria when selecting ensemble members. However, the results
further suggest that combinations of several measures are most often better
optimization criteria than single measures, and a novel method for finding a
useful combination of measures is proposed. Furthermore, the results show that
it is very difficult to estimate predictive performance on unseen data based on
results achieved with the available data. Finally, it is also shown that
implicit diversity, achieved by varying the architecture of artificial neural
network (ANN) base models or by resampling features, is beneficial for ensemble
performance. |
| Sponsorship: | This work was supported by the Information Fusion Research Program (www.infofusion.se) at the University of Skövde, Sweden, in partnership with the Swedish Knowledge Foundation under grant 2003/0104. |
| URI: | http://hdl.handle.net/2320/4976 |
| Appears in Collections: | Licentiatavhandlingar / Licentiate theses (Informatics) |
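The abstract names ensemble accuracy and the diversity measure difficulty as the two individually best member-selection criteria, and it notes that combinations of measures often do better. The Python sketch below shows one way these criteria could be computed for a pool of already trained classifiers, given their predicted labels on a validation set. It rests on several assumptions that are not taken from the thesis. Members are combined by plurality voting. Difficulty is computed as the variance, over instances, of the proportion of members that classify each instance correctly, so lower values mean more diversity. `combined_score` is a hypothetical linear combination with arbitrary weights, not the method proposed in the thesis. Selection is an exhaustive search over fixed-size subsets.

```python
from collections import Counter
from itertools import combinations
from statistics import pvariance


def majority_vote(member_preds):
    """Plurality vote per instance; member_preds holds one label list per member.
    Ties go to the label encountered first (Counter.most_common ordering)."""
    n_instances = len(member_preds[0])
    return [
        Counter(preds[i] for preds in member_preds).most_common(1)[0][0]
        for i in range(n_instances)
    ]


def ensemble_accuracy(member_preds, targets):
    """Fraction of instances where the voted ensemble prediction equals the target."""
    voted = majority_vote(member_preds)
    return sum(v == t for v, t in zip(voted, targets)) / len(targets)


def difficulty(member_preds, targets):
    """Difficulty measure: population variance, over instances, of the proportion
    of members that are correct on each instance. Lower = more diverse."""
    n_members = len(member_preds)
    proportions = [
        sum(preds[i] == t for preds in member_preds) / n_members
        for i, t in enumerate(targets)
    ]
    return pvariance(proportions)


def combined_score(member_preds, targets, w_acc=1.0, w_diff=1.0):
    """Hypothetical combination: reward ensemble accuracy, penalize difficulty.
    The weights are illustrative assumptions, not values from the thesis."""
    return (w_acc * ensemble_accuracy(member_preds, targets)
            - w_diff * difficulty(member_preds, targets))


def select_members(pool, targets, size, criterion):
    """Exhaustively evaluate every subset of `size` pool members and return the
    indices of the first subset that maximizes `criterion(members, targets)`."""
    best = max(
        combinations(range(len(pool)), size),
        key=lambda idx: criterion([pool[j] for j in idx], targets),
    )
    return list(best)


if __name__ == "__main__":
    # Toy validation data: three identical members plus two that err elsewhere.
    targets = [1, 0, 1, 0, 1, 0]
    m0 = [1, 0, 1, 0, 1, 1]
    pool = [m0, list(m0), list(m0), [0, 0, 1, 0, 1, 0], [1, 1, 1, 0, 1, 0]]
    print(select_members(pool, targets, 3, ensemble_accuracy))  # [0, 3, 4]
    print(select_members(pool, targets, 3, combined_score))     # [0, 3, 4]
```

In the toy pool, the three identical members all fail on the same instance. Their vote therefore cannot fix that error, and their difficulty is high (5/36). A subset whose members err on different instances reaches full voted accuracy with a lower difficulty (1/36). This mirrors the abstract's point that combining identical models gains nothing. Exhaustive search is only practical for small pools, and a real study would likely use a heuristic search instead.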
All items in Borås Academic Digital Archive are protected by copyright, with all rights reserved.