rulecosi package

Module contents

class rulecosi.Condition(att_index, op, value, att_name=None)

Bases: object

Class representing a Rule Condition.

A condition is an evaluation of an operator with a value. The operator could be any of the ones contained in op_dict = {'eq': '=', 'gt': '>', 'lt': '<', 'ge': '≥', 'le': '≤', 'ne': '!='}

satisfies(value)

Evaluates if the condition is satisfied or not using the provided value.

Parameters

value (int) – The value used for evaluating the condition.

Returns

boolean – True if the condition is satisfied and False otherwise.

satisfies_array(arr)

Evaluates if the condition is satisfied or not for all the records in the provided array.

It applies the operator to the column of arr whose index equals the attribute index of this condition and returns an array of booleans.

Parameters

arr (array-like, shape (n_samples, n_features)) – The input samples.

Returns

array of booleans – Whether the condition was satisfied for each of the records in arr.
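The column-wise evaluation described for satisfies_array can be illustrated with a small, dependency-free sketch. The OPS mapping and the function name satisfies_array_sketch are hypothetical and only mirror the op_dict keys listed above; the library itself may well use vectorized NumPy comparisons instead.

```python
import operator

# Assumed mapping from the op_dict keys to Python comparison functions.
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "ne": operator.ne,
}


def satisfies_array_sketch(arr, att_index, op, value):
    """Return one boolean per record: does column att_index satisfy `op value`?"""
    compare = OPS[op]
    return [compare(row[att_index], value) for row in arr]


X = [[1.0, 5.0], [3.0, 2.0], [4.0, 9.0]]
mask = satisfies_array_sketch(X, att_index=1, op="gt", value=4.0)
print(mask)  # [True, False, True]
```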

class rulecosi.Rule(conditions, class_dist=None, logit_score=None, y=None, y_class_index=None, n_samples=None, n_outputs=1, classes=None, weight=0)

Bases: object

Represents a single rule which has the form r: A -> y.

A is a set of conditions, also called the body of the rule. y is the predicted class, also called the head of the rule.

add_condition(condition)

Add a condition to the rule

Parameters

condition – Condition to be added

get_condition(att_index)

Returns the condition associated with att_index.

Parameters

att_index (int) – The index of the attribute whose condition is to be retrieved.

Returns

The condition for the attribute if it exists, otherwise None.

predict(X, condition_map, proba=False)

Make predictions for the input values in X.

Parameters
  • X (array-like, shape (n_samples, n_features)) – The input samples.

  • condition_map (dictionary of <condition_id, Condition>) – Dictionary of Conditions used in the ruleset. condition_id is an integer uniquely identifying the Condition.

  • proba (boolean, default=False) – Determines whether the predictions are the target values or the target probability values.

Returns

(y_pred, covered_mask) – A tuple containing:

  • y_pred: array-like, shape (n_samples,), or ndarray of shape (n_samples, n_classes)

  • covered_mask: array of booleans marking the samples covered by this rule

set_heuristics(heuristics_dict)

Set the heuristics of this rule contained in the heuristics dictionary.

Parameters

heuristics_dict – A dictionary containing the heuristics values for this Rule

class rulecosi.RuleCOSIClassifier(base_ensemble=None, n_estimators=5, tree_max_depth=3, cov_threshold=0.0, conf_threshold=0.5, min_samples=1, early_stop=0.3, metric='gmean', column_names=None, random_state=None, rule_order='cov', verbose=0)

Bases: sklearn.base.ClassifierMixin, rulecosi._rulecosi.BaseRuleCOSI

Tree ensemble Rule COmbination and SImplification algorithm for classification

RuleCOSI extracts, combines and simplifies rules from a variety of tree ensembles and then constructs a single rule-based model that can be used for classification [1]. The resulting model is simpler than the original ensemble and has a similar classification performance. Currently it only accepts binary classification (March 2021).

Parameters
  • base_ensemble (BaseEnsemble object, default = None) –

    A BaseEnsemble estimator object. The supported types are:

    • sklearn.ensemble.RandomForestClassifier

    • sklearn.ensemble.BaggingClassifier

    • sklearn.ensemble.GradientBoostingClassifier

    • xgboost.XGBClassifier

    • catboost.CatBoostClassifier

    • lightgbm.LGBMClassifier

    If the estimator is already fitted, the n_estimators and max_depth parameters used for fitting the ensemble are used for the combination process. If the estimator is not fitted, it is first fitted using the parameters provided in the RuleCOSI object. The default value is None, which uses a sklearn.ensemble.GradientBoostingClassifier ensemble.

  • n_estimators (int, default=5) – The number of estimators used for fitting the ensemble, if it is not already fitted.

  • tree_max_depth (int, default=3) – The maximum depth of the individual tree estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

  • cov_threshold (float, default=0.0) – Coverage threshold of a rule to be considered for further combinations. The greater the value, the more rules are discarded. The default value is 0.0, which discards only rules with null coverage.

  • conf_threshold (float, default=0.5) – Confidence or rule accuracy threshold of a rule to be considered for further combinations. The greater the value, the more rules are discarded. Rules with high confidence are accurate rules. Default value is 0.5, which represents rules with higher than random guessing accuracy.

  • min_samples (int, default=1) – The minimum number of samples required to be at a rule in the simplified ruleset.

  • early_stop (float, default=0.30) – This parameter allows the algorithm to stop if a certain number of iterations have passed without improving the metric. The number is obtained as the truncated integer of n_estimators * early_stop.

  • metric (string, default='gmean') –

    Metric that is optimized in the combination process. The default is gmean because the algorithm was developed specially for imbalanced classification problems. Other accepted measures are:

    • 'f1' for F-measure

    • 'roc_auc' for AUC under the ROC curve

    • 'accuracy' for Accuracy

  • column_names (array of string, default=None) – Array of strings with the name of the columns in the data. This is useful for displaying the name of the features in the generated rules.

  • random_state (int, RandomState instance or None, default=None) – Controls the random seed given to the ensembles when trained. RuleCOSI does not have any random process, so it affects only the ensemble training.

  • rule_order (string, default='cov') – Defines the way in which the rules are ordered on each iteration. 'cov' orders the rules first by coverage and 'conf' orders the rules first by confidence or rule accuracy. This parameter affects the combination process and can be chosen depending on the desired results.

  • verbose (int, default=0) –

    Controls the output of the algorithm during the combination process. It can have the following values:

    • 0 is silent

    • 1 outputs only the main stages of the algorithm

    • 2 outputs information for each iteration
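The early_stop arithmetic in the parameter list above can be made concrete with a short sketch. The function names are hypothetical, and the stopping loop is only one plausible reading of "iterations without improving the metric"; the exact comparison used by the library is not stated in this reference.

```python
def early_stop_patience(n_estimators, early_stop):
    """Truncated integer of n_estimators * early_stop, as described above."""
    return int(n_estimators * early_stop)


def last_iteration_run(scores, patience):
    """Index of the last iteration executed before stopping.

    Assumption: the process stops once `patience` consecutive iterations
    fail to improve on the best score seen so far.
    """
    best = float("-inf")
    stale = 0
    for i, score in enumerate(scores):
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return len(scores) - 1


print(early_stop_patience(5, 0.3))    # 1
print(early_stop_patience(100, 0.3))  # 30
print(last_iteration_run([0.5, 0.6, 0.6, 0.6, 0.7], patience=2))  # 3
```

With the default values (n_estimators=5, early_stop=0.3) the truncation yields a patience of a single iteration, which is worth keeping in mind when using small ensembles.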

X_

The input passed during fit().

Type

ndarray, shape (n_samples, n_features)

y_

The labels passed during fit().

Type

ndarray, shape (n_samples,)

classes_

The classes seen at fit().

Type

ndarray, shape (n_classes,)

original_rulesets_

The original rulesets extracted from the base ensemble.

Type

array of RuleSet, shape (n_estimators,)

simplified_ruleset_

Combined and simplified ruleset extracted from the base ensemble.

Type

RuleSet

n_combinations_

Number of rule-level combinations performed by the algorithm.

Type

int

combination_time_

Time spent for the combination and simplification process

Type

float

ensemble_training_time_

Time spent for the ensemble training. If the ensemble was already trained, this is 0.

Type

float

References

1

Obregon, J., Kim, A., & Jung, J. Y., “RuleCOSI: Combination and simplification of production rules from boosted decision trees for imbalanced classification”, 2019.

Examples

>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.datasets import make_classification
>>> from rulecosi import RuleCOSIClassifier
>>> X, y = make_classification(n_samples=1000, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = RuleCOSIClassifier(base_ensemble=GradientBoostingClassifier(),
...                          n_estimators=100, random_state=0)
>>> clf.fit(X, y)
RuleCOSIClassifier(base_ensemble=GradientBoostingClassifier(),
                   n_estimators=100, random_state=0)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
>>> clf.score(X, y)
0.966...

fit(X, y, sample_weight=None)

Combines and simplifies the decision trees from the base ensemble and builds a rule-based classifier using the training set (X, y).

Parameters
  • X (array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,)) – The target values. An array of int.

  • sample_weight – Currently this is not supported and it is here just for compatibility reasons.

Returns

self

Return type

object

predict(X)

Predict classes for X.

The prediction uses the simplified ruleset and evaluates the rules one by one. When a rule covers a sample, the head of the rule is returned as the predicted class.

Parameters

X (array-like, shape (n_samples, n_features)) – The input samples.

Returns

y – The predicted class. The class with the highest value in the class distribution of the fired rule.

Return type

ndarray, shape (n_samples,)
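The one-by-one evaluation described for predict behaves like a decision list. The following is a minimal sketch under stated assumptions: rules are represented as hypothetical (conditions, head) pairs, each condition is an (att_index, op, value) triple, and default_class is an assumed fallback for samples that no rule covers, which this reference does not specify.

```python
import operator

OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt,
       "ge": operator.ge, "le": operator.le, "ne": operator.ne}


def predict_one(ordered_rules, sample, default_class):
    """Return the head of the first rule whose body covers the sample."""
    for conditions, head in ordered_rules:
        # A rule covers a sample when every condition in its body holds.
        if all(OPS[op](sample[i], v) for i, op, v in conditions):
            return head
    return default_class  # assumed fallback, not taken from the library


rules = [
    ([(0, "le", 2.5), (1, "gt", 1.0)], 1),  # x0 <= 2.5 and x1 > 1.0 -> 1
    ([(0, "gt", 2.5)], 0),                  # x0 > 2.5 -> 0
]
samples = [[2.0, 3.0], [4.0, 3.0], [2.0, 0.5]]
preds = [predict_one(rules, s, default_class=0) for s in samples]
print(preds)  # [1, 0, 0]
```

Because the first matching rule wins, the order of the rules matters, which is consistent with the rule_order parameter affecting the resulting model.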

predict_proba(X)

Predict class probabilities for X.

The predicted class probabilities of an input sample are obtained from the class distribution of the fired rule.

Parameters

X (array-like, shape (n_samples, n_features)) – The input samples.

Returns

p – The class probabilities of the input samples. The order of outputs is the same as that of the classes_ attribute.

Return type

ndarray of shape (n_samples, n_classes)

class rulecosi.RuleSet(rules=None, condition_map=None, ruleset=None)

Bases: object

A set of ordered rules that can be used to make predictions.

Parameters
  • rules (array of Rules, default=None) – Rules belonging to the ruleset

  • condition_map (dictionary of <condition_id, Condition>, default=None) – Dictionary of Conditions used in the ruleset. condition_id is an integer uniquely identifying the Condition.

  • ruleset (RuleSet) – If not None, the properties of that ruleset are copied into this object

compute_all_classification_performance(X, y_true)

Compute all the classification performance measures of this RuleSet

Parameters
  • X (array-like, shape (n_samples, n_features)) – The input samples.

  • y_true (array-like, shape (n_samples,)) – The real target values

compute_classification_performance(X, y_true, metric='gmean')

Compute the classification performance measures of this RuleSet

Parameters
  • X (array-like, shape (n_samples, n_features)) – The input samples.

  • y_true (array-like, shape (n_samples,)) – The real target values

  • metric (string, default='gmean') –

    Metric that is computed for this RuleSet. Other accepted measures are:

    • 'f1' for F-measure

    • 'roc_auc' for AUC under the ROC curve

    • 'accuracy' for Accuracy
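The default 'gmean' metric is not defined in this reference. Assuming it denotes the usual geometric mean of sensitivity and specificity used in imbalanced binary classification, it can be sketched as follows. The function name gmean_binary and the positive argument are hypothetical.

```python
import math


def gmean_binary(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
g = gmean_binary(y_true, y_pred)
print(round(g, 4))  # 0.7906
```

Here sensitivity is 3/4 and specificity is 5/6, so the score is the square root of 0.625. A classifier that ignores the minority class gets a score of 0 under this definition, which is why the measure is popular for imbalanced data.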

compute_interpretability_measures()

Compute the following interpretability measures:

  • n_rules: number of rules

  • n_uniq_ant: number of unique antecedents or conditions

  • n_total_ant: total number of antecedents or conditions
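The three interpretability measures listed above are simple counts. As an illustrative sketch, assume each rule body is represented by a set of condition ids (a hypothetical representation chosen only for this example):

```python
def interpretability_measures(rule_bodies):
    """rule_bodies: list of sets of condition ids, one set per rule."""
    unique_conditions = set()
    for body in rule_bodies:
        unique_conditions |= body
    return {
        "n_rules": len(rule_bodies),
        "n_uniq_ant": len(unique_conditions),
        "n_total_ant": sum(len(body) for body in rule_bodies),
    }


ruleset = [{1, 2}, {2, 3}, {4}]
measures = interpretability_measures(ruleset)
print(measures)  # {'n_rules': 3, 'n_uniq_ant': 4, 'n_total_ant': 5}
```

Condition 2 appears in two rules, so it counts twice towards n_total_ant but only once towards n_uniq_ant.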

metric(metric='gmean')

Return the metric value of this RuleSet

Parameters

metric (string, default='gmean') – The metric to be returned. Other accepted measures are:

  • 'f1' for F-measure

  • 'roc_auc' for AUC under the ROC curve

  • 'accuracy' for Accuracy

predict(X)

Make predictions for the input values in X.

Parameters

X (array-like, shape (n_samples, n_features)) – The input samples.

Returns

y_pred (array-like, shape (n_samples,)) – The predicted target values

predict_proba(X)

Make probability predictions for the input values in X.

Parameters

X (array-like, shape (n_samples, n_features)) – The input samples.

Returns

ndarray of shape (n_samples, n_classes) – The class probabilities of the input samples.

print_rules(return_object=None, heuristics_digits=4, condition_digits=3)

Print the rules in a string format. It can also return an object containing the rules and their heuristics.

Parameters
  • return_object (string, default=None) –

    Indicates whether the rules should be returned in an object. Possible values are:

    • 'string': returns a string containing the rules in a readable format

    • 'dataframe': returns a pandas.DataFrame object containing the rules

  • heuristics_digits – number of decimal digits to be displayed in the heuristics of the rules

  • condition_digits – number of decimal digits to be displayed in the conditions of the rules

Returns

str or pandas.DataFrame

prune_condition_map()

Prune the condition map in this ruleset to contain only the conditions present in the rules

rulecosi.rule_extraction module

This module contains the functions used for extracting the rules from different types of base ensembles.

The module structure is the following:

  • The BaseRuleExtractor base class implements the get_base_ruleset and recursive_extraction methods shared by all the extractors in the module.

    • rule_extraction.DecisionTreeRuleExtractor implements rule extraction from a single decision tree

    • rule_extraction.ClassifierRuleExtractor implements rule extraction from classifier ensembles such as Bagging and Random Forests

    • rule_extraction.GBMClassifierRuleExtractor implements rule extraction from the sklearn GBM classifier and works as the base class for the other GBM implementations

      • rule_extraction.XGBClassifierExtractor implements rule extraction from XGBoost classifiers

      • rule_extraction.LGBMClassifierExtractor implements rule extraction from LightGBM classifiers

      • rule_extraction.CatBoostClassifierExtractor implements rule extraction from CatBoost classifiers

class rulecosi.rule_extraction.BaseRuleExtractor(_ensemble, _column_names, classes_, X_)

Bases: object

Base abstract class for a rule extractor from tree ensembles

abstract create_new_rule(node_index, tree_dict, condition_set=None, logit_score=None, weights=None, tree_index=None)

Creates a new rule with all the information in the parameters

Parameters
  • node_index – the index of the leaf node

  • tree_dict – a dictionary containing the information of the base_tree (arrays of the sklearn.tree.Tree class)

  • condition_set – set of rulecosi.rules.Condition objects of the new rule

  • logit_score – logit_score of the rule (only applies for Gradient Boosting Trees)

  • weights – weight of the new rule

  • tree_index – index of the tree inside the ensemble

Returns

a rulecosi.rules.Rule object

abstract extract_rules()

Main method for extracting the rules of tree ensembles

Returns

an array of rulecosi.rules.RuleSet objects

get_base_ruleset(tree_dict, class_index=None, condition_map=None, tree_index=None)

Parameters
  • tree_dict – a dictionary containing the information of the base_tree (arrays of the sklearn.tree.Tree class)

  • class_index – Currently unused; it will be used when multiclass is supported

  • condition_map – dictionary of <condition_id, Condition>, default=None. Dictionary of Conditions extracted from all the ensembles. condition_id is an integer uniquely identifying the Condition.

  • tree_index – index of the tree in the ensemble

Returns

a rulecosi.rules.RuleSet object

get_split_operators()

Return the operator applied for the left and right branches of the tree. This function is needed because different implementations of trees use different operators for the children nodes.

Returns

a tuple containing the left and right operators used for creating conditions

get_tree_dict(base_tree, n_nodes=0)

Create a dictionary with the information inside the base_tree

Parameters
  • base_tree – a sklearn.tree.Tree object, which is an array representation of a tree

  • n_nodes – number of nodes in the tree

Returns

a dictionary containing the information of the base_tree

recursive_extraction(tree_dict, tree_index=0, node_index=0, condition_map=None, condition_set=None)

Recursive function for extracting a ruleset from a tree

Parameters
  • tree_dict – a dictionary containing the information of the base_tree (arrays of the sklearn.tree.Tree class)

  • tree_index – index of the tree in the ensemble

  • node_index – the index of the leaf node

  • condition_map – dictionary of <condition_id, Condition>, default=None. Dictionary of Conditions extracted from all the ensembles. condition_id is an integer uniquely identifying the Condition.

  • condition_set – set of rulecosi.rules.Condition objects

Returns

array of rulecosi.rules.Rule objects
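The idea behind recursive_extraction can be sketched with a toy tree stored as parallel arrays, in the spirit of the array representation of sklearn.tree.Tree. The dictionary keys, the -1 leaf marker, and the ('le', 'gt') operator pair for the left and right branches are assumptions made for this illustration; the actual tree_dict layout and the operators returned by get_split_operators may differ per extractor.

```python
def extract_rules_sketch(tree, node=0, conditions=(), ops=("le", "gt")):
    """Walk the tree depth-first and emit one (body, leaf_value) per leaf."""
    left = tree["children_left"][node]
    right = tree["children_right"][node]
    if left == -1:  # leaf node: the path followed so far is the rule body
        return [(list(conditions), tree["value"][node])]
    feat, thr = tree["feature"][node], tree["threshold"][node]
    left_op, right_op = ops
    rules = extract_rules_sketch(
        tree, left, conditions + ((feat, left_op, thr),), ops)
    rules += extract_rules_sketch(
        tree, right, conditions + ((feat, right_op, thr),), ops)
    return rules


# A toy tree with two internal nodes (0 and 2) and three leaves (1, 3, 4).
toy_tree = {
    "children_left":  [1, -1, 3, -1, -1],
    "children_right": [2, -1, 4, -1, -1],
    "feature":        [0, -2, 1, -2, -2],
    "threshold":      [2.5, None, 1.0, None, None],
    "value":          [None, "A", None, "B", "A"],
}
extracted = extract_rules_sketch(toy_tree)
for body, head in extracted:
    print(body, "->", head)
```

Each leaf yields exactly one rule, so a tree with three leaves produces a ruleset of three rules whose bodies are the conditions along the root-to-leaf paths.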

class rulecosi.rule_extraction.CatBoostClassifierExtractor(_ensemble, _column_names, classes_, X_)

Bases: rulecosi.rule_extraction.GBMClassifierRuleExtractor

Rule extraction for a Gradient Boosting Tree ensemble classifier. This class accepts only the CatBoost implementation.

Parameters
  • base_ensemble (BaseEnsemble object, default = None) –

    A BaseEnsemble estimator object. The supported types are:

    • catboost.CatBoostClassifier

  • column_names (array of string, default=None) – Array of strings with the names of the columns in the data. This is useful for displaying the names of the features in the generated rules.

  • classes (ndarray, shape (n_classes,)) – The classes seen when fitting the ensemble.

  • X (array-like, shape (n_samples, n_features)) – The training input samples.

extract_rules()

Main method for extracting the rules of tree ensembles

Returns

an array of rulecosi.rules.RuleSet objects

get_tree_dict(base_tree, n_nodes=0)

Create a dictionary with the information inside the base_tree

Parameters
  • base_tree – a sklearn.tree.Tree object, which is an array representation of a tree

  • n_nodes – number of nodes in the tree

Returns

a dictionary containing the information of the base_tree

class rulecosi.rule_extraction.ClassifierRuleExtractor(_ensemble, _column_names, classes_, X_)

Bases: rulecosi.rule_extraction.BaseRuleExtractor

Rule extraction of a tree ensemble classifier such as Bagging or Random Forest

Parameters
  • base_ensemble (BaseEnsemble object, default = None) –

    A BaseEnsemble estimator object. The supported types are:

    • sklearn.ensemble.RandomForestClassifier

    • sklearn.ensemble.BaggingClassifier

  • column_names (array of string, default=None) – Array of strings with the names of the columns in the data. This is useful for displaying the names of the features in the generated rules.

  • classes (ndarray, shape (n_classes,)) – The classes seen when fitting the ensemble.

  • X (array-like, shape (n_samples, n_features)) – The training input samples.

create_new_rule(node_index, tree_dict, condition_set=None, logit_score=None, weights=None, tree_index=None)

Creates a new rule with all the information in the parameters

Parameters
  • node_index – the index of the leaf node

  • tree_dict – a dictionary containing the information of the base_tree (arrays of the sklearn.tree.Tree class)

  • condition_set – set of rulecosi.rules.Condition objects of the new rule

  • logit_score – logit_score of the rule (only applies for Gradient Boosting Trees)

  • weights – weight of the new rule

  • tree_index – index of the tree inside the ensemble

Returns

a rulecosi.rules.Rule object

extract_rules()

Main method for extracting the rules of tree ensembles

Returns

an array of rulecosi.rules.RuleSet objects

class rulecosi.rule_extraction.DecisionTreeRuleExtractor(_ensemble, _column_names, classes_, X_)

Bases: rulecosi.rule_extraction.BaseRuleExtractor

Rule extraction of a single decision tree classifier

Parameters
  • base_ensemble – Parameter kept just for compatibility with the other classes

  • column_names (array of string, default=None) – Array of strings with the names of the columns in the data. This is useful for displaying the names of the features in the generated rules.

  • classes (ndarray, shape (n_classes,)) – The classes seen when fitting the ensemble.

  • X (array-like, shape (n_samples, n_features)) – The training input samples.

create_new_rule(node_index, tree_dict, condition_set=None, logit_score=None, weights=None, tree_index=None)

Creates a new rule with all the information in the parameters

Parameters
  • node_index – the index of the leaf node

  • tree_dict – a dictionary containing the information of the base_tree (arrays of the sklearn.tree.Tree class)

  • condition_set – set of rulecosi.rules.Condition objects of the new rule

  • logit_score – logit_score of the rule (only applies for Gradient Boosting Trees)

  • weights – weight of the new rule

  • tree_index – index of the tree inside the ensemble

Returns

a rulecosi.rules.Rule object

extract_rules()

Main method for extracting the rules of tree ensembles

Returns

an array of rulecosi.rules.RuleSet objects

class rulecosi.rule_extraction.GBMClassifierRuleExtractor(_ensemble, _column_names, classes_, X_)

Bases: rulecosi.rule_extraction.BaseRuleExtractor

Rule extraction for a Gradient Boosting Tree ensemble classifier. This class accepts only the sklearn GBM implementation.

Parameters
  • base_ensemble (BaseEnsemble object, default = None) –

    A BaseEnsemble estimator object. The supported types are:

    • sklearn.ensemble.GradientBoostingClassifier

  • column_names (array of string, default=None) – Array of strings with the names of the columns in the data. This is useful for displaying the names of the features in the generated rules.

  • classes (ndarray, shape (n_classes,)) – The classes seen when fitting the ensemble.

  • X (array-like, shape (n_samples, n_features)) – The training input samples.

create_new_rule(node_index, tree_dict, condition_set=None, logit_score=None, weights=None, tree_index=None)

Creates a new rule with all the information in the parameters

Parameters
  • node_index – the index of the leaf node

  • tree_dict – a dictionary containing the information of the base_tree (arrays of the sklearn.tree.Tree class)

  • condition_set – set of rulecosi.rules.Condition objects of the new rule

  • logit_score – logit_score of the rule (only applies for Gradient Boosting Trees)

  • weights – weight of the new rule

  • tree_index – index of the tree inside the ensemble

Returns

a rulecosi.rules.Rule object

extract_rules()

Main method for extracting the rules of tree ensembles

Returns

an array of rulecosi.rules.RuleSet objects

class rulecosi.rule_extraction.LGBMClassifierExtractor(_ensemble, _column_names, classes_, X_)

Bases: rulecosi.rule_extraction.GBMClassifierRuleExtractor

Rule extraction for a Gradient Boosting Tree ensemble classifier. This class accepts only the LightGBM implementation.

Parameters
  • base_ensemble (BaseEnsemble object, default = None) –

    A BaseEnsemble estimator object. The supported types are:

    • lightgbm.LGBMClassifier

  • column_names (array of string, default=None) – Array of strings with the names of the columns in the data. This is useful for displaying the names of the features in the generated rules.

  • classes (ndarray, shape (n_classes,)) – The classes seen when fitting the ensemble.

  • X (array-like, shape (n_samples, n_features)) – The training input samples.

extract_rules()

Main method for extracting the rules of tree ensembles

Returns

an array of rulecosi.rules.RuleSet objects

get_tree_dict(base_tree, n_nodes=0)

Create a dictionary with the information inside the base_tree

Parameters
  • base_tree – a sklearn.tree.Tree object, which is an array representation of a tree

  • n_nodes – number of nodes in the tree

Returns

a dictionary containing the information of the base_tree

class rulecosi.rule_extraction.RuleExtractorFactory

Bases: object

Factory class for getting an implementation of a BaseRuleExtractor

get_rule_extractor(column_names, classes, X)

Parameters
  • base_ensemble (BaseEnsemble object, default = None) –

    A BaseEnsemble estimator object. The supported types are:

    • sklearn.ensemble.RandomForestClassifier

    • sklearn.ensemble.BaggingClassifier

    • sklearn.ensemble.GradientBoostingClassifier

    • xgboost.XGBClassifier

    • catboost.CatBoostClassifier

    • lightgbm.LGBMClassifier

  • column_names (array of string, default=None) – Array of strings with the names of the columns in the data. This is useful for displaying the names of the features in the generated rules.

  • classes (ndarray, shape (n_classes,)) – The classes seen when fitting the ensemble.

  • X (array-like, shape (n_samples, n_features)) – The training input samples.

Returns

An instance of a BaseRuleExtractor implementation to be used for extracting rules from trees

class rulecosi.rule_extraction.XGBClassifierExtractor(_ensemble, _column_names, classes_, X_)

Bases: rulecosi.rule_extraction.GBMClassifierRuleExtractor

Rule extraction for a Gradient Boosting Tree ensemble classifier. This class accepts only the XGBoost implementation.

Parameters
  • base_ensemble (BaseEnsemble object, default = None) –

    A BaseEnsemble estimator object. The supported types are:

    • xgboost.XGBClassifier

  • column_names (array of string, default=None) – Array of strings with the names of the columns in the data. This is useful for displaying the names of the features in the generated rules.

  • classes (ndarray, shape (n_classes,)) – The classes seen when fitting the ensemble.

  • X (array-like, shape (n_samples, n_features)) – The training input samples.

extract_rules()

Main method for extracting the rules of tree ensembles

Returns

an array of rulecosi.rules.RuleSet objects

get_split_operators()

Return the operator applied for the left and right branches of the tree. This function is needed because different implementations of trees use different operators for the children nodes.

Returns

a tuple containing the left and right operators used for creating conditions

get_tree_dict(base_tree, n_nodes=0)

Create a dictionary with the information inside the base_tree

Parameters
  • base_tree – a sklearn.tree.Tree object, which is an array representation of a tree

  • n_nodes – number of nodes in the tree

Returns

a dictionary containing the information of the base_tree

rulecosi.rule_heuristics module

Class used for measuring the heuristics of the rules

class rulecosi.rule_heuristics.RuleHeuristics(X, y, classes_, condition_map, cov_threshold=0.0, conf_threshold=0.5, min_samples=1)

Bases: object

This class controls the computation of heuristics of the rules.

For fast computation we use the bitarray class. At the beginning, an N-size bitarray is computed for each condition, with N=n_samples. This array contains 1 if the record satisfies the condition and 0 otherwise. When a combination is performed, these bitarrays are combined using the set intersection operation to find out how many records are covered by the new rule (which is a combination of conditions). Additionally, there are two extra bitarrays, one covering each of the classes (right now only binary classification is supported). The cardinalities of all these bitarrays are used to compute the coverage and confidence of the rules very quickly.

Parameters
  • X (array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,)) – The target values. An array of int.

  • condition_map (dictionary of <condition_id, Condition>, default=None) – Dictionary of Conditions extracted from all the ensembles. condition_id is an integer uniquely identifying the Condition.

  • classes (ndarray, shape (n_classes,)) – The classes seen in the ensemble fit method.

  • cov_threshold (float, default=0.0) – Coverage threshold of a rule to be considered for further combinations. The greater the value, the more rules are discarded. The default value is 0.0, which discards only rules with null coverage.

  • conf_threshold (float, default=0.5) – Confidence or rule accuracy threshold of a rule to be considered for further combinations. The greater the value, the more rules are discarded. Rules with high confidence are accurate rules. The default value is 0.5, which represents rules with higher than random guessing accuracy.

  • min_samples (int, default=1) – The minimum number of samples required to be at a rule in the simplified ruleset.
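The bitset idea described for this class can be illustrated without any third-party dependency. In this sketch, plain Python integers stand in for the bitarray objects the class description mentions (an assumption made to keep the example self-contained), and the helper name condition_bitset is hypothetical.

```python
def condition_bitset(column, predicate):
    """Build an integer whose bit i is 1 when record i satisfies the condition."""
    bits = 0
    for i, v in enumerate(column):
        if predicate(v):
            bits |= 1 << i
    return bits


x0 = [1.0, 3.0, 2.0, 5.0, 0.5]
x1 = [4.0, 0.0, 6.0, 7.0, 1.0]

b0 = condition_bitset(x0, lambda v: v <= 2.5)  # records 0, 2, 4
b1 = condition_bitset(x1, lambda v: v > 3.0)   # records 0, 2, 3

# Combining two conditions into one rule body is a bitwise intersection.
combined = b0 & b1                              # records 0 and 2
n_covered = bin(combined).count("1")            # cardinality of the set
print(n_covered)  # 2
```

The per-condition bitsets are computed once; every later combination only needs a bitwise AND and a population count, which is what makes the heuristics cheap to evaluate.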

compute_rule_heuristics(ruleset, uncovered_mask=None, sequential_coverage=False)

Compute rule heuristics, but without the sequential_coverage parameter, and without removing the rules that do not meet the thresholds

Parameters
  • ruleset – RuleSet object representing a ruleset

  • uncovered_mask – if not None, mask out the records that are already covered from the training set. Default is None.

  • sequential_coverage – If True, the examples covered by one rule are removed. Additionally, a rule that does not meet the thresholds is discarded. If False, the heuristics are computed with all the records in the training set for all the rules. Default is False.

create_empty_heuristics()

Create an empty dictionary for the heuristics to be computed.

Returns

a dictionary with the heuristics to be computed and populated

get_conditions_heuristics(conditions, uncovered_mask=None)

Compute the heuristics of the combination of conditions using the bitsets of each condition from the training set. An intersection operation is made and the cardinality of the resultant set is used for computing the heuristics

Parameters
  • conditions – set of condition ids

  • uncovered_mask – if different than None, mask out the records that are already covered from the training set. Default is None.

Returns

a dictionary with the following keys:

  • cov_set: array of bitsets representing the coverage by class and the total coverage

  • cov: the coverage of the conditions

  • conf: array of the confidence values of the conditions by class

  • supp: array of the support values of the conditions by class
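The returned cov, conf and supp values are not defined formally here. A minimal sketch, assuming the usual rule-learning definitions (coverage = covered records / all records, confidence per class = covered records of that class / covered records, support per class = covered records of that class / all records) and using integers as stand-in bitsets, could look like this. The function names are hypothetical.

```python
def popcount(bits):
    """Number of 1 bits, i.e. the cardinality of the bitset."""
    return bin(bits).count("1")


def conditions_heuristics_sketch(cond_bits, class_bits, n_samples):
    """cond_bits: bitset of covered records; class_bits: one bitset per class."""
    covered = popcount(cond_bits)
    per_class = [popcount(cond_bits & cb) for cb in class_bits]
    return {
        "cov": covered / n_samples,
        "conf": [c / covered if covered else 0.0 for c in per_class],
        "supp": [c / n_samples for c in per_class],
    }


# Five records: class 0 holds records 1, 3, 4 and class 1 holds records 0, 2.
class_bits = [0b11010, 0b00101]
cond_bits = 0b01101  # the conditions cover records 0, 2 and 3
h = conditions_heuristics_sketch(cond_bits, class_bits, n_samples=5)
print(h["cov"])  # 0.6
```

In this toy case the conditions cover three of five records, two of which belong to class 1, so the confidence for class 1 is 2/3 and its support is 2/5.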

initialize_sets()

Initialize the sets that are going to be used during the combination and simplification process. This includes the bitsets for the training data as well as the bitsets for each of the conditions.

rule_is_accurate(rule, uncovered_instances)

Determine if a rule meets the coverage and confidence thresholds

Parameters
  • rule – a Rule object

  • uncovered_instances – mask out the records that are already covered from the training set.

Returns

boolean indicating if the rule satisfies the thresholds
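The threshold check can be sketched from the parameter descriptions of this class. The strict comparisons below are an inference from those descriptions (a coverage threshold of 0.0 discards only rules with null coverage, and a confidence threshold of 0.5 keeps rules better than random guessing); the function name and the exact comparisons are assumptions, not the library's confirmed behaviour.

```python
def rule_is_accurate_sketch(cov, conf, n_covered,
                            cov_threshold=0.0, conf_threshold=0.5,
                            min_samples=1):
    """Return True when a rule passes all three filters (assumed semantics)."""
    return (cov > cov_threshold
            and conf > conf_threshold
            and n_covered >= min_samples)


print(rule_is_accurate_sketch(cov=0.6, conf=2 / 3, n_covered=3))  # True
print(rule_is_accurate_sketch(cov=0.0, conf=0.0, n_covered=0))    # False
print(rule_is_accurate_sketch(cov=0.2, conf=0.5, n_covered=1))    # False
```

The last call fails only because a confidence of exactly 0.5 is not above the threshold under the strict reading assumed here.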

rulecosi.rules module

Supporting classes for handling rulesets, rules and conditions for RuleCOSI

class rulecosi.rules.Condition(att_index, op, value, att_name=None)

Bases: object

Class representing a Rule Condition.

A condition is an evaluation of an operator with a value. The operator could be any of the ones contained in op_dict = {'eq': '=', 'gt': '>', 'lt': '<', 'ge': '≥', 'le': '≤', 'ne': '!='}

satisfies(value)

Evaluates if the condition is satisfied or not using the provided value.

Parameters

value (int) – The value used for evaluating the condition.

Returns

boolean – True if the condition is satisfied and False otherwise.

satisfies_array(arr)

Evaluates if the condition is satisfied or not for all the records in the provided array.

It applies the operator to the column of arr whose index equals the attribute index of this condition and returns an array of booleans.

Parameters

arr (array-like, shape (n_samples, n_features)) – The input samples.

Returns

array of booleans – Whether the condition was satisfied for each of the records in arr.

class rulecosi.rules.Rule(conditions, class_dist=None, logit_score=None, y=None, y_class_index=None, n_samples=None, n_outputs=1, classes=None, weight=0)

Bases: object

Represents a single rule which has the form r: A -> y.

A is a set of conditions, also called the body of the rule. y is the predicted class, also called the head of the rule.

add_condition(condition)

Add a condition to the rule

Parameters

condition – Condition to be added

get_condition(att_index)

Returns the condition associated with att_index

Parameters

att_index – int Index of the attribute whose condition should be retrieved.

Returns

If a condition on the attribute exists, returns that condition; otherwise returns None

predict(X, condition_map, proba=False)

Make predictions for the input values in X.

Parameters
  • X – array-like, shape (n_samples, n_features) The input samples.

  • condition_map – dictionary of <condition_id, Condition> Dictionary of Conditions used in the ruleset. condition_id is an integer uniquely identifying the Condition.

  • proba – boolean, default=False Determines whether the predictions are the target values or the target probability values.

Returns

(y_pred, covered_mask): tuple containing:

  • 0 – array-like, shape (n_samples,) or ndarray of shape (n_samples, n_classes)

  • 1 – array of booleans containing the mask of the samples covered by this rule
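How a rule body and a condition map combine into a covered mask can be sketched as below. This is an illustration under assumptions, not the package's implementation: the conditions are modelled as plain tuples, the body is treated as a conjunction, and uncovered samples receive None.

```python
import operator

# Hypothetical condition_map: condition_id -> (att_index, comparison, value).
condition_map = {
    10: (0, operator.gt, 1.0),
    11: (1, operator.le, 5.0),
}


def rule_predict_sketch(X, rule_conditions, y, condition_map):
    """Return (y_pred, covered_mask) for one rule with a conjunctive body."""
    covered_mask = [
        all(op(row[att], val)
            for att, op, val in (condition_map[c] for c in rule_conditions))
        for row in X
    ]
    # The head y is predicted only for covered samples; None elsewhere.
    y_pred = [y if covered else None for covered in covered_mask]
    return y_pred, covered_mask


X = [[2.0, 4.0], [0.5, 4.0], [3.0, 9.0]]
y_pred, covered_mask = rule_predict_sketch(X, [10, 11], "pos", condition_map)
```

Referring to conditions by id lets several rules share one Condition object, which is what the condition_map parameter suggests.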

set_heuristics(heuristics_dict)

Set the heuristics of this rule contained in the heuristics dictionary.

Parameters

heuristics_dict – A dictionary containing the heuristics values for this Rule

class rulecosi.rules.RuleSet(rules=None, condition_map=None, ruleset=None)

Bases: object

A set of ordered rules that can be used to make predictions.

Parameters
  • rules (array of Rules, default=None) – Rules belonging to the ruleset

  • condition_map (dictionary of <condition_id, Condition>, default=None) – Dictionary of Conditions used in the ruleset. condition_id is an integer uniquely identifying the Condition.

  • ruleset (RuleSet) – If not None, copy the properties of that ruleset into this object

compute_all_classification_performance(X, y_true)

Compute all the classification performance measures of this RuleSet

Parameters
  • X – array-like, shape (n_samples, n_features) The input samples.

  • y_true – array-like, shape (n_samples,) The real target value

compute_classification_performance(X, y_true, metric='gmean')

Compute the classification performance measures of this RuleSet

Parameters
  • X – array-like, shape (n_samples, n_features) The input samples.

  • y_true – array-like, shape (n_samples,) The real target value

  • metric

    string, default='gmean'

    Metric that is computed for this RuleSet. Other accepted measures are:

    • 'f1' for F-measure

    • 'roc_auc' for area under the ROC curve

    • 'accuracy' for Accuracy
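For readers unfamiliar with the default metric, the sketch below assumes that 'gmean' denotes the geometric mean of the per-class recalls, as is common in imbalanced classification. Whether the package computes it exactly this way is not stated in this reference, and `gmean_sketch` is a hypothetical helper.

```python
import math


def gmean_sketch(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed meaning of 'gmean')."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1 / len(recalls))


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
# Recall is 3/4 for class 0 and 1/2 for class 1.
score = gmean_sketch(y_true, y_pred)
```

Unlike accuracy, this score drops to zero as soon as one class is never predicted correctly, which makes it a reasonable default for imbalanced data.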

compute_interpretability_measures()

Compute the following interpretability measures:

  • n_rules: number of rules

  • n_uniq_ant: number of unique antecedents or conditions

  • n_total_ant: total number of antecedents or conditions
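The three measures can be sketched on a simplified representation in which each rule is just a list of condition ids. `interpretability_sketch` is a hypothetical helper, and counting a rule with an empty body toward n_rules is an assumption.

```python
def interpretability_sketch(rules):
    """rules: list of rules, each given as a list of condition ids."""
    all_conditions = [c for rule in rules for c in rule]
    return {
        "n_rules": len(rules),
        "n_uniq_ant": len(set(all_conditions)),   # distinct conditions
        "n_total_ant": len(all_conditions),       # with repetitions
    }


# Two regular rules plus one rule with an empty body.
measures = interpretability_sketch([[10, 11], [10, 12, 13], []])
```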

metric(metric='gmean')

Return the metric value of this RuleSet

Parameters

metric – string, default='gmean' Other accepted measures are:

  • 'f1' for F-measure

  • 'roc_auc' for area under the ROC curve

  • 'accuracy' for Accuracy

predict(X)

Make predictions for the input values in X.

Parameters

X – array-like, shape (n_samples, n_features) The input samples.

Returns

y_pred: array-like, shape (n_samples,) The predicted target values
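Because a RuleSet is described as a set of ordered rules, one plausible reading is that the first rule covering a sample decides its prediction. The sketch below illustrates that reading only; the first-match semantics, the default label and `ruleset_predict_sketch` are assumptions rather than documented behaviour.

```python
def ruleset_predict_sketch(X, rules, default):
    """rules: ordered list of (predicate, label); the first match decides."""
    y_pred = []
    for row in X:
        for predicate, label in rules:
            if predicate(row):
                y_pred.append(label)
                break
        else:
            # No rule covered this sample; fall back to a default label.
            y_pred.append(default)
    return y_pred


rules = [
    (lambda r: r[0] > 2.0, "a"),
    (lambda r: r[1] <= 1.0, "b"),
]
X = [[3.0, 0.5], [1.0, 0.5], [1.0, 2.0]]
y_pred = ruleset_predict_sketch(X, rules, default="c")
```

Note that the first sample satisfies both rules but receives the label of the earlier one, which is the point of ordering the rules.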

predict_proba(X)

Make probability predictions for the input values in X.

Parameters

X – array-like, shape (n_samples, n_features) The input samples.

Returns

ndarray of shape (n_samples, n_classes) The class probabilities of the input samples.

print_rules(return_object=None, heuristics_digits=4, condition_digits=3)

Print the rules in a string format. It can also return an object containing the rules and their heuristics.

Parameters
  • return_object

    string, default=None Indicates if the rules should be returned in an object. Possible values are:

    • 'string': returns a string containing the rules in a readable format

    • 'dataframe': returns a pandas.DataFrame object containing the rules

  • heuristics_digits – number of decimal digits to be displayed in the heuristics of the rules

  • condition_digits – number of decimal digits to be displayed in the conditions of the rules

Returns

str or pandas.DataFrame

prune_condition_map()

Prune the condition map in this ruleset to contain only the conditions present in the rules
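The pruning step can be sketched on a simplified representation where rules are lists of condition ids. `prune_condition_map_sketch` is hypothetical and returns a new dictionary, whereas the method described above works on the ruleset's own condition map.

```python
def prune_condition_map_sketch(condition_map, rules):
    """Keep only the entries whose id appears in at least one rule body."""
    used = {c for rule in rules for c in rule}
    return {cid: cond for cid, cond in condition_map.items() if cid in used}


# Condition 12 is not referenced by any rule, so it is dropped.
condition_map = {10: "x0 > 1.0", 11: "x1 <= 5.0", 12: "x2 = 0"}
pruned = prune_condition_map_sketch(condition_map, [[10], [10, 11]])
```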

rulecosi.helpers module

rulecosi.helpers.count_keys(dict_, key)

Return the number of times that key occurs in a dictionary and its sub-dictionaries

Parameters
  • dict_ – dictionary to be explored

  • key – key that should be found

Returns

the number of occurrences of the key in dict_
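A recursive count of this kind can be sketched as below. `count_keys_sketch` is a hypothetical re-implementation written from the description alone; it assumes that only values that are themselves dictionaries are explored, so dictionaries nested inside lists are not visited.

```python
def count_keys_sketch(dict_, key):
    """Count occurrences of key in dict_ and in any nested dictionaries."""
    count = 1 if key in dict_ else 0
    for value in dict_.values():
        if isinstance(value, dict):
            count += count_keys_sketch(value, key)
    return count


# A small nested dictionary resembling a serialized tree (toy data).
tree = {
    "feature": 0,
    "left": {"feature": 1, "left": {"value": 0.3}},
    "right": {"value": 0.7},
}
n_feature = count_keys_sketch(tree, "feature")
n_value = count_keys_sketch(tree, "value")
```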

rulecosi.helpers.one_bitarray(size)

Return a bitarray of 1’s of the size given by the parameter

Parameters

size – size of the returned array

Returns

bitarray of 1’s

rulecosi.helpers.total_n_rules(list_of_rulesets)

Returns the total number of rules across all the rulesets in a list of RuleSet objects

Parameters

list_of_rulesets – list of RuleSet

Returns

total number of rules
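The computation amounts to summing the rule counts of the individual rulesets. In this minimal sketch each ruleset is modelled as a plain list of rules, which is an assumption; `total_n_rules_sketch` is a hypothetical name.

```python
def total_n_rules_sketch(list_of_rulesets):
    """Sum the number of rules over every ruleset in the list."""
    return sum(len(ruleset) for ruleset in list_of_rulesets)


# Three toy rulesets holding 2, 1 and 0 rules respectively.
total = total_n_rules_sketch([["r1", "r2"], ["r3"], []])
```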