University of Borås

Title: Obtaining accurate and comprehensible classifiers using oracle coaching
Authors: Johansson, Ulf; Sönströd, Cecilia; Löfström, Tuwe; Boström, Henrik
Department: University of Borås. School of Business and IT
Issue Date: 2012
Journal Title: Intelligent Data Analysis
ISSN: 1088-467X
Volume: 16
Issue: 2
Pages: 247-263
Publisher: IOS Press
Media type: text
Publication type: article, peer reviewed scientific
Keywords: Classification; Decision trees; Decision lists; Oracle coaching
Subject Category: Engineering and Technology::Computer and Information Science::Computer Science; Social Sciences::Computer and Information Science::Computer and Information Science::Computer Science
Research Group: CSL@BS
Area of Research: Machine learning; Data mining
Strategic Research Area: Business and IT
Abstract: While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method for improving the predictive performance of transparent models in the very common situation where the instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method to this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under the ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when the oracle-classified production data is used together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.
DOI: 10.3233/IDA-2012-0522
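The oracle coaching procedure summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbour majority vote (`knn_oracle`) stands in for the random forest or bagged RBF network oracle, and a weighted one-level decision tree (`fit_stump`) stands in for a transparent learner such as J48 (Weka's C4.5) or JRip (Weka's RIPPER). All function names, the toy data, and the `oracle_weight` and `use_train` parameters are assumptions made for illustration; fidelity is taken here to mean the fraction of production instances on which the transparent model agrees with the oracle.

```python
from collections import Counter, defaultdict
from math import dist


def knn_oracle(train_X, train_y, k=3):
    """Hypothetical stand-in oracle: k-nearest-neighbour majority vote.
    (The paper uses random forests or bagged RBF networks instead.)"""
    def oracle(x):
        nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
        return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
    return oracle


def fit_stump(X, y, w):
    """Fit a weighted decision stump (x[f] <= t -> a, else b) by minimising
    weighted training error. A stand-in for a transparent learner such as J48."""
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left, right = defaultdict(float), defaultdict(float)
            for xi, yi, wi in zip(X, y, w):
                (left if xi[f] <= t else right)[yi] += wi
            a = max(left, key=left.get)
            b = max(right, key=right.get)
            err = (sum(left.values()) - left[a]) + (sum(right.values()) - right[b])
            if best is None or err < best[0]:
                best = (err, f, t, a, b)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1:]  # (feature, threshold, left label, right label)


def predict(stump, x):
    f, t, a, b = stump
    return a if x[f] <= t else b


def oracle_coach(train_X, train_y, prod_X, oracle, oracle_weight=1.0, use_train=True):
    """Label the production data with the oracle, then fit the transparent
    model on the oracle data, optionally together with the training data."""
    oracle_y = [oracle(x) for x in prod_X]
    X, y, w = [], [], []
    if use_train:
        X += train_X
        y += train_y
        w += [1.0] * len(train_X)
    X += prod_X
    y += oracle_y
    w += [oracle_weight] * len(prod_X)  # relative weight of oracle data
    return fit_stump(X, y, w), oracle_y


def fidelity(stump, prod_X, oracle_y):
    """Fraction of production instances on which the model agrees with the oracle."""
    return sum(predict(stump, x) == o for x, o in zip(prod_X, oracle_y)) / len(prod_X)


# Toy one-feature data set (illustrative only).
train_X = [(0.0,), (1.0,), (2.0,), (3.0,), (6.0,), (7.0,), (8.0,), (9.0,)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
prod_X = [(4.0,), (4.5,), (5.5,), (10.0,)]

baseline = fit_stump(train_X, train_y, [1.0] * len(train_X))
coached, oracle_y = oracle_coach(train_X, train_y, prod_X, knn_oracle(train_X, train_y))
print(baseline)  # (0, 4.5, 'a', 'b')
print(coached)   # (0, 5.0, 'a', 'b')
print(oracle_y, fidelity(coached, prod_X, oracle_y))  # ['a', 'a', 'b', 'b'] 1.0
```

On this toy data the training-only stump splits at 4.5, while the coached stump, which also sees the oracle-labelled production instances, moves its threshold to 5.0. Calling `oracle_coach` with `use_train=False` corresponds to using oracle data only, and `oracle_weight` is one simple way to vary the relative weight of oracle data versus training data.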