Rule-based explanations for manufacturing

Project Synopsis
Title: Rule-based explanations based on ensemble machine learning for detecting sink mark defects in the injection moulding process
Highlights:
  • Ensemble learning models can predict the quality in the injection molding process.
  • Rule-based explanations are developed for interpreting the ensemble models.
  • The method generates decision rules and visualizes them with PDP and ICE plots.
Period: August 2020 – July 2021
Outputs:
Skills & Technologies:
  • Ensemble learning
  • Explainable Machine Learning
  • Injection Molding Process
  • Academic writing

Project details

Manufacturing quality control (QC) in plastic injection moulding is of the utmost importance, since almost one third of plastic products are manufactured via the injection moulding process. Moreover, smart manufacturing technologies are enabling the generation of huge amounts of data in production lines. This data can be used to predict the quality of manufactured plastic products with machine learning methods, allowing companies to save costs and improve their production efficiency. However, high-performance machine learning models are usually too complicated to be understood by human intuition. Therefore, we introduced a rule-based explanations (RBE) framework that combines several machine learning interpretation methods to help understand the decision mechanisms of accurate and complex predictive models, specifically tree ensemble models. The generated rules can be used to easily and visually understand the main factors that affect quality in the manufacturing process. To demonstrate the applicability of RBE, we present two experiments with real industrial data gathered from a plastic injection moulding machine in a Singapore model factory. The collected datasets contain condition data for several manufacturing processes as well as the QC results for sink mark defects in the production of small plastic products. The experiments revealed that it is possible to extract meaningful explanations in the form of simple decision rules, enhanced with partial dependence plots and feature importance rankings, for a better understanding of the underlying mechanisms and data relationships of accurate tree ensembles.
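As a rough illustration of the PDP and ICE plots mentioned above, the following minimal sketch computes the underlying curves in plain Python. It is not the project's code: the toy model, the feature names (`melt_temp`, `hold_pressure`), and the integer risk scores are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean


def ice_curves(predict, rows, feature, grid):
    """Return one ICE curve per row: the model output as `feature`
    is swept over `grid` while the row's other features stay fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """The partial dependence curve is the pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [mean(curve[i] for curve in curves) for i in range(len(grid))]


def toy_predict(row):
    """Hypothetical stand-in for a trained model: a defect risk score (0-100)."""
    score = 20
    if row["melt_temp"] > 230:
        score += 50
    if row["hold_pressure"] < 40:
        score += 30
    return score


# Hypothetical process records, for illustration only.
rows = [
    {"melt_temp": 220, "hold_pressure": 35},
    {"melt_temp": 240, "hold_pressure": 50},
]
grid = [210, 225, 240]

ice = ice_curves(toy_predict, rows, "melt_temp", grid)
pdp = partial_dependence(toy_predict, rows, "melt_temp", grid)
print(ice)
print(pdp)
```

Plotting each ICE curve as a thin line and the averaged curve as a thick line gives the usual combined PDP/ICE view of how one process parameter influences the predicted quality.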

Figure 1. Plastic parts obtained using the injection molding process.

Machine learning explainability for tree ensembles

Project Synopsis
Title: Rule extraction for tree ensembles
Highlights:
  • Development of a post-hoc explainability method for accurate tree ensemble models
  • The size of the models is significantly reduced without decreasing the model performance
Period: 2018 – Present
Outputs:
Skills & Technologies:
  • Ensemble learning
  • Rule-based classification
  • Explainable Machine Learning
  • Software engineering
  • Academic writing

Project details

Nowadays, machine learning is widely used in practical applications for solving problems that require predictive analytics. New methods are constantly presented in the field, incrementally improving on the performance of older models. However, the improvement in predictive performance usually comes with an increase in model complexity, which makes the decision mechanisms of the models difficult for humans to understand intuitively (Figure 1). Therefore, the purpose of this research project was to increase the interpretability of tree ensembles for classification.

Figure 1. Trade-off between accuracy and interpretability of machine learning models.

Interpretability is the degree to which a human can understand the cause of a decision

Miller, 2017

For this purpose, RuleCOSI (Rule COmbination and SImplification), a novel heuristic method that extracts, combines and simplifies decision rules from ensembles, was presented. The initial algorithm was published in an academic paper [1] in 2019. My research has evolved since then, and it was the main topic of my doctoral dissertation in 2020. In this short post I introduce the main characteristics of the algorithm and show a small example result.

RuleCOSI algorithm

The algorithm has three basic steps, as depicted in Figure 2.

Figure 2. Overview of the algorithm

The first step is to extract a ruleset from each of the trees forming the tree ensemble. This is done with a simple procedure in which one rule is created from each path from the root node of the tree to a leaf node.
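The path-to-rule procedure can be sketched as follows. This is a minimal illustration and not the RuleCOSI implementation: the nested-dictionary tree format and the feature names `x1` and `x2` are assumptions made for the example.

```python
def extract_rules(node, conditions=()):
    """Walk a binary decision tree and return one rule per leaf.

    Each rule is a pair (list of (feature, operator, threshold), class label),
    built from the conditions met on the path from the root to that leaf.
    """
    if "label" in node:                      # leaf node: the path is complete
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + ((feature, "<=", threshold),))
    right = extract_rules(node["right"], conditions + ((feature, ">", threshold),))
    return left + right


# Two hypothetical trees standing in for the members of a tree ensemble.
tree_a = {
    "feature": "x1", "threshold": 5.0,
    "left": {"label": 0},
    "right": {
        "feature": "x2", "threshold": 2.0,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}
tree_b = {
    "feature": "x2", "threshold": 3.0,
    "left": {"label": 1},
    "right": {"label": 0},
}

# One ruleset per tree in the ensemble.
rulesets = [extract_rules(tree) for tree in (tree_a, tree_b)]

for conditions, label in rulesets[0]:
    text = " AND ".join(f"{f} {op} {thr}" for f, op, thr in conditions)
    print(f"IF {text} THEN class = {label}")
```

A tree with k leaves always yields exactly k rules, so the extracted rulesets are as large as the trees themselves; the reduction in size comes from the later steps.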

The second step is to combine the rulesets extracted from the different trees over their shared feature space. The final step is to generalize and simplify the combined rules based on their pessimistic error.
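To give a feel for these two steps, here is a simplified sketch. It is not the published algorithm: it assumes rules are stored as per-feature intervals, combines two rules by intersecting those intervals, and generalizes a rule by greedily dropping conditions. The pessimistic error used here is the C4.5-style upper confidence bound (z = 0.69, roughly a 25% confidence level), which is an assumption, since the text above does not state the exact formula. The data and feature names are hypothetical.

```python
import math

INF = math.inf


def combine(cond_a, cond_b):
    """Intersect two conjunctions of interval conditions.

    Conditions map feature -> (low, high), meaning low < x <= high.
    Returns None when the rules cover no common region of feature space.
    """
    merged = dict(cond_a)
    for feature, (low, high) in cond_b.items():
        cur_low, cur_high = merged.get(feature, (-INF, INF))
        low, high = max(low, cur_low), min(high, cur_high)
        if low >= high:                      # contradictory conditions
            return None
        merged[feature] = (low, high)
    return merged


def covers(cond, row):
    return all(low < row[f] <= high for f, (low, high) in cond.items())


def pessimistic_error(n_errors, n_covered, z=0.69):
    """Upper confidence bound on the error rate (assumed C4.5-style formula)."""
    if n_covered == 0:
        return 1.0
    f, n = n_errors / n_covered, n_covered
    spread = z * math.sqrt(f * (1 - f) / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


def rule_error(cond, label, rows, labels):
    covered = [y for row, y in zip(rows, labels) if covers(cond, row)]
    errors = sum(1 for y in covered if y != label)
    return pessimistic_error(errors, len(covered))


def generalize(cond, label, rows, labels):
    """Greedily drop conditions while the pessimistic error does not increase."""
    cond = dict(cond)
    best = rule_error(cond, label, rows, labels)
    improved = True
    while improved and cond:
        improved = False
        for feature in list(cond):
            trial = {k: v for k, v in cond.items() if k != feature}
            error = rule_error(trial, label, rows, labels)
            if error <= best:
                cond, best, improved = trial, error, True
                break
    return cond


# Hypothetical data: the class is 1 exactly when x1 > 5; x2 is irrelevant.
rows = [{"x1": 6, "x2": 1}, {"x1": 7, "x2": 3}, {"x1": 8, "x2": 1},
        {"x1": 9, "x2": 4}, {"x1": 2, "x2": 1}, {"x1": 3, "x2": 3}]
labels = [1, 1, 1, 1, 0, 0]

combined = combine({"x1": (5, INF)}, {"x1": (-INF, 8), "x2": (-INF, 2)})
simplified = generalize({"x1": (5, INF), "x2": (-INF, 2)}, 1, rows, labels)
print(combined)
print(simplified)
```

In this toy case the condition on `x2` is dropped because the shorter rule covers more samples with no additional errors, which lowers its pessimistic error. The pessimistic bound penalizes rules that cover few samples, so generalization is favored when it does not cost accuracy.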

The output of the algorithm is a single set of decision rules that is much simpler than the tree ensemble and has similar performance.

RuleCOSI is able to handle two types of tree ensembles: boosting and bagging. The Python library can work with several implementations of these ensemble types, such as Random Forests, XGBoost, CatBoost and LightGBM.

Example

Here I present an example using the UCI steel plates faults dataset. The dataset contains 27 indicators that approximately describe the geometric shape of the defect and its outline. The task is to classify the type of surface defect. Because RuleCOSI can only work with binary classification problems, I considered only the dirtiness fault type for this example.

Figure 3. Steel plate

The first step is to train a tree ensemble. In this case I trained an XGBoost ensemble with 50 trees. The F-measure of this model is 0.9958. The first 10 trees are shown in Figure 4.

Figure 4. First 10 trees of the XGBoost model for classifying the dirtiness fault type in the steel plates faults dataset.

After applying RuleCOSI to the result, the simplified ruleset has just 7 rules, with an F-measure value of 0.9926. The generated rules are presented in Figure 5.
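For readers unfamiliar with how such a ruleset is scored, the sketch below applies a ruleset to a few samples and computes the F-measure. It assumes the ruleset is evaluated as an ordered list with a default class, which may differ from the library's behaviour; the rule, the data, and the feature names are hypothetical and are not the rules from Figure 5.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}


def predict(ruleset, default, row):
    """Return the label of the first rule whose conditions all hold,
    or the default class when no rule fires."""
    for conditions, label in ruleset:
        if all(OPS[op](row[feature], threshold)
               for feature, op, threshold in conditions):
            return label
    return default


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical one-rule ruleset and samples, for illustration only.
ruleset = [([("x1", ">", 5.0), ("x2", "<=", 2.0)], 1)]
rows = [{"x1": 6, "x2": 1}, {"x1": 6, "x2": 3},
        {"x1": 4, "x2": 1}, {"x1": 7, "x2": 2}]
y_true = [1, 1, 0, 1]

y_pred = [predict(ruleset, 0, row) for row in rows]
score = f_measure(y_true, y_pred)
print(y_pred, round(score, 4))
```

The F-measure is a reasonable choice for fault detection because the fault class is typically rare, and plain accuracy would reward a model that never predicts a fault.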

Figure 5. Combined and simplified ruleset obtained from the XGBoost tree ensemble from Figure 4.

Python library

The library was implemented as a Python package, available on GitHub. The documentation of the library is available at this link.

Conclusions

Tree ensembles are widely used for improving classification performance in many domains, including fault detection using manufacturing data. However, the complexity of the ensembles makes them very hard for humans to interpret.

RuleCOSI gave satisfactory results in improving the interpretability of tree ensembles while largely preserving their classification performance.

References

[1] Obregon, J., Kim, A., & Jung, J. Y. (2019). RuleCOSI: Combination and simplification of production rules from boosted decision trees for imbalanced classification. Expert Systems with Applications, 126, 64-82.