Project Synopsis
Title: Rule extraction for tree ensembles
  • Development of a post-hoc explainability method for accurate tree ensemble models
  • The size of the models is significantly reduced with almost no loss in model performance
Period: 2018 – Present
  • Ensemble learning
  • Rule-based classification
  • Explainable Machine Learning
  • Software engineering
  • Academic writing

Project details

Machine learning is now widely used in practical applications that require predictive analytics. New methods are constantly being proposed, each incrementally improving on the performance of older models. However, the improvement in predictive performance usually comes with an increase in model complexity, which makes the decision mechanisms of the models difficult for humans to understand. Figure 1 shows this trade-off. The purpose of this research project was therefore to increase the interpretability of tree ensembles for classification.

Figure 1. Trade-off between accuracy and interpretability of machine learning models.

Interpretability is the degree to which a human can understand the cause of a decision

Miller, 2017

For this purpose, RuleCOSI (Rule COmbination and SImplification), a novel heuristic method that extracts, combines, and simplifies decision rules from ensembles, was proposed. The initial algorithm was published in an academic paper [1] in 2019. My research has evolved since then, and it became the main topic of my doctoral dissertation in 2020. In this short post I introduce the main characteristics of the algorithm and show a small example result.

RuleCOSI algorithm

The algorithm has three basic steps, as depicted in Figure 2.

Figure 2. Overview of the algorithm

The first step is to extract a ruleset from each of the trees that form the ensemble. This is done with a simple procedure: one rule is created for each path from the root node of the tree to a leaf node.
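The path-to-rule idea can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the nested-dictionary tree, the feature names (`area`, `luminosity`), and the `extract_rules` helper are hypothetical and are not the data structures used by RuleCOSI or by any ensemble library.

```python
# A toy binary decision tree as nested dicts (hypothetical structure, for
# illustration only). Internal nodes test `feature <= threshold`; leaves
# carry a class label.
toy_tree = {
    "feature": "area", "threshold": 50.0,
    "left": {"label": 0},
    "right": {
        "feature": "luminosity", "threshold": 0.3,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}


def extract_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if "label" in node:
        # Leaf reached: the conditions collected along the path form a rule.
        return [(list(conditions), node["label"])]
    feat, thr = node["feature"], node["threshold"]
    # Left branch satisfies the test, right branch negates it.
    left = extract_rules(node["left"], conditions + ((feat, "<=", thr),))
    right = extract_rules(node["right"], conditions + ((feat, ">", thr),))
    return left + right


rules = extract_rules(toy_tree)
for conds, label in rules:
    body = " AND ".join(f"{f} {op} {t}" for f, op, t in conds)
    print(f"IF {body} THEN class = {label}")
```

A tree with three leaves yields three rules, so an ensemble of many trees quickly produces a large number of rules, which is what motivates the combination and simplification steps.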

The second step is to combine the extracted rules over the feature space they cover. The final step is to generalize and simplify the combined rules based on their pessimistic error.
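To give a feeling for these two steps, here is a minimal sketch under stated assumptions. It represents a rule as a dictionary of per-feature intervals (my own hypothetical representation), combines two rules by intersecting those intervals, and estimates a pessimistic error with the C4.5-style upper confidence bound. The exact combination procedure and error formula used by RuleCOSI are described in the paper [1] and may differ from this sketch; the names `combine` and `pessimistic_error` and the value z = 0.69 (roughly a 25% confidence level) are assumptions.

```python
import math

INF = float("inf")


def combine(rule_a, rule_b):
    """Intersect two rules given as {feature: (low, high)}, meaning low < x <= high.

    Returns None when the conditions contradict each other, i.e. the
    combined rule would cover an empty region of the feature space.
    """
    combined = dict(rule_a)
    for feat, (lo_b, hi_b) in rule_b.items():
        lo_a, hi_a = combined.get(feat, (-INF, INF))
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo >= hi:
            return None
        combined[feat] = (lo, hi)
    return combined


def pessimistic_error(errors, covered, z=0.69):
    """Upper confidence bound on a rule's error rate (C4.5-style estimate)."""
    if covered == 0:
        return 1.0
    f = errors / covered  # observed error rate on the covered samples
    num = f + z * z / (2 * covered) + z * math.sqrt(
        f / covered - f * f / covered + z * z / (4 * covered * covered)
    )
    return num / (1 + z * z / covered)


rule_a = {"area": (-INF, 50.0)}
rule_b = {"area": (20.0, INF), "luminosity": (-INF, 0.3)}
rule_c = {"area": (60.0, INF)}

print(combine(rule_a, rule_b))  # overlapping conditions are merged
print(combine(rule_a, rule_c))  # contradictory conditions give None

# Same observed error rate (10%), but the rule covering fewer samples
# receives a more pessimistic estimate.
print(pessimistic_error(1, 10), pessimistic_error(10, 100))
```

A simplification step built on this estimate could, for example, drop a condition from a rule whenever doing so does not increase the pessimistic error, which yields shorter and more general rules.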

The output of the algorithm is a single set of decision rules that are much simpler and have a similar performance to that of the tree ensemble.
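As a rough illustration of how such a ruleset could be used as a classifier on its own, the sketch below evaluates the rules in order and returns the class of the first rule that matches, falling back to a default class. The ordered, first-match evaluation, the default class, and the rules themselves are assumptions made for this example and are not taken from the RuleCOSI output format.

```python
# Hypothetical ruleset: each rule is (conditions, predicted_class), and each
# condition is (feature, operator, threshold).
ruleset = [
    ([("area", ">", 50.0), ("luminosity", "<=", 0.3)], 1),
    ([("area", "<=", 50.0)], 0),
]
DEFAULT_CLASS = 0  # assumed fallback when no rule matches


def matches(conditions, sample):
    """True when the sample satisfies every condition of the rule."""
    return all(
        sample[feat] <= thr if op == "<=" else sample[feat] > thr
        for feat, op, thr in conditions
    )


def predict(sample):
    """Return the class of the first matching rule, else the default class."""
    for conditions, label in ruleset:
        if matches(conditions, sample):
            return label
    return DEFAULT_CLASS


print(predict({"area": 80.0, "luminosity": 0.1}))
print(predict({"area": 80.0, "luminosity": 0.9}))
```

Because every prediction can be traced back to a single short rule, the reasoning behind each decision stays readable.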

RuleCOSI is able to handle two types of tree ensembles: boosting and bagging. The Python library can work with several implementations of these ensemble types, such as Random Forests, XGBoost, CatBoost, and LightGBM.


Here I present an example using the UCI steel plates faults dataset. The dataset contains 27 indicators that approximately describe the geometric shape of the defect and its outline. The task is to classify the type of surface defect. Because RuleCOSI can only work with binary classification problems, for this example I considered only the dirtiness fault type.

Figure 3. Steel plate

The first step is to train a tree ensemble. In this case I trained an XGBoost ensemble with 50 trees. The F-measure of this model is 0.9958. The first 10 trees are shown in Figure 4.
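For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below computes it for the positive class on a tiny made-up label vector. I am assuming here that the reported value is the standard F1 score on the positive class; the toy labels are not from the steel plates dataset.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1 score of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f_measure(y_true, y_pred)
print(round(score, 4))
```

The F-measure is a common choice for imbalanced problems like fault detection, because plain accuracy can look high even when the minority class is mostly missed.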

Figure 4. First 10 trees of the XGBoost model for classifying the dirtiness fault type in the steel plates faults dataset.

After applying RuleCOSI to the trained ensemble, the simplified ruleset has just 7 rules, with an F-measure of 0.9926. The generated rules are presented in Figure 5.

Figure 5. Combined and simplified ruleset obtained from the XGBoost tree ensemble from Figure 4.

Python library

The library was implemented as a Python package and is available on GitHub. The documentation of the library is available at this link.


Tree ensembles are widely used to improve classification performance in many domains, including fault detection using manufacturing data. However, the complexity of these ensembles makes them very hard for humans to interpret.

The results of RuleCOSI were satisfactory: it improved the interpretability of tree ensembles while largely preserving their classification performance.


[1] Obregon, J., Kim, A., & Jung, J. Y. (2019). RuleCOSI: Combination and simplification of production rules from boosted decision trees for imbalanced classification. Expert Systems with Applications, 126, 64-82.