sklearn.estimators

Submodule of khiops.sklearn

Scikit-Learn Estimator Classes for the Khiops AutoML Suite

Class Overview

The diagram below describes the relationships in this module:

KhiopsEstimator(ABC, BaseEstimator)
    |
    +- KhiopsCoclustering(ClusterMixin)
    |
    +- KhiopsSupervisedEstimator
       |
       +- KhiopsPredictor
       |  |
       |  +- KhiopsClassifier(ClassifierMixin)
       |  |
       |  +- KhiopsRegressor(RegressorMixin)
       |
       +- KhiopsEncoder(TransformerMixin)
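As a rough illustration of the diagram above, the sketch below rebuilds the same inheritance tree with empty stand-in classes. The base class and mixins here are hypothetical placeholders for the scikit-learn ones, so the snippet runs without khiops or scikit-learn installed. It only demonstrates the subclass relationships, not any behavior.

```python
from abc import ABC


# Hypothetical stand-ins for the scikit-learn base class and mixins.
class BaseEstimator: ...
class ClusterMixin: ...
class ClassifierMixin: ...
class RegressorMixin: ...
class TransformerMixin: ...


# Stand-in classes mirroring the diagram (no behavior, only inheritance).
class KhiopsEstimator(ABC, BaseEstimator): ...
class KhiopsCoclustering(KhiopsEstimator, ClusterMixin): ...
class KhiopsSupervisedEstimator(KhiopsEstimator): ...
class KhiopsPredictor(KhiopsSupervisedEstimator): ...
class KhiopsClassifier(KhiopsPredictor, ClassifierMixin): ...
class KhiopsRegressor(KhiopsPredictor, RegressorMixin): ...
class KhiopsEncoder(KhiopsSupervisedEstimator, TransformerMixin): ...


def khiops_ancestors(cls):
    """Return the names of the Khiops classes in the MRO of cls, nearest first."""
    return [c.__name__ for c in cls.__mro__ if c.__name__.startswith("Khiops")]
```

For example, `khiops_ancestors(KhiopsClassifier)` walks up through KhiopsPredictor and KhiopsSupervisedEstimator to KhiopsEstimator, while KhiopsEncoder is a supervised estimator but not a predictor.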

Classes

KhiopsClassifier

Khiops Selective Naive Bayes Classifier

KhiopsCoclustering

A Khiops Coclustering model

KhiopsEncoder

Khiops supervised discretization/grouping encoder

KhiopsEstimator

Base class for Khiops Scikit-learn estimators

KhiopsPredictor

Abstract Khiops Selective Naive Bayes Predictor

KhiopsRegressor

Khiops Selective Naive Bayes Regressor

KhiopsSupervisedEstimator

Abstract Khiops Supervised Estimator

class khiops.sklearn.estimators.KhiopsClassifier(n_features=100, n_pairs=0, n_trees=10, n_selected_features=0, n_evaluated_features=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, group_target_value=False, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)

Bases: KhiopsPredictor, ClassifierMixin

Khiops Selective Naive Bayes Classifier

This classifier supports automatic feature engineering on multi-table datasets. See Multi-Table Learning Primer for more details.

Note

Visit the Khiops site to learn about the automatic feature engineering algorithm.

Parameters:
n_featuresint, default 100

Multi-table only: Maximum number of multi-table aggregate features to construct. See Multi-Table Learning Primer for more details.

n_pairsint, default 0

Maximum number of pair features to construct. These features represent a 2D grid partition of the domain of a pair of features, optimized so that the cells are as pure as possible with respect to the target. Only pairs that are jointly more informative than their univariate components may be taken into account in the classifier.

n_treesint, default 10

Maximum number of decision tree features to construct. The constructed trees combine other features, either native or constructed. These features usually improve the classifier’s performance at the cost of interpretability of the model.

n_selected_featuresint, default 0

Maximum number of features to be selected in the SNB predictor. If equal to 0 it selects all the features kept in the training.

n_evaluated_featuresint, default 0

Maximum number of features to be evaluated in the SNB predictor training. If equal to 0 it evaluates all informative features.

specific_pairslist of tuple, optional

User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit n_pairs).
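The expansion rule for partially specified tuples can be pictured with the sketch below. It is not the Khiops implementation: `expand_pairs` is a hypothetical helper, the feature names are made up, and the order in which pairs are generated and truncated is an assumption chosen for illustration.

```python
def expand_pairs(specific_pairs, feature_names, n_pairs):
    """Expand user-specified pairs into explicit 2-tuples (illustrative only).

    A tuple with one empty name stands for every pair containing the other
    name. The result is capped at n_pairs entries.
    """
    expanded = []
    for first, second in specific_pairs:
        if first and second:
            candidates = [(first, second)]
        else:
            anchor = first or second
            if not anchor:
                continue  # Both names empty: nothing to expand.
            candidates = [(anchor, other) for other in feature_names if other != anchor]
        for pair in candidates:
            # Treat (a, b) and (b, a) as the same pair.
            if pair not in expanded and pair[::-1] not in expanded:
                expanded.append(pair)
    return expanded[:n_pairs]


features = ["age", "workclass", "education"]
pairs = expand_pairs([("age", "workclass"), ("education", "")], features, n_pairs=10)
```

Here the second tuple names only `education`, so it stands for every pair that contains that feature.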

all_possible_pairsbool, default True

If True, tries to create all possible pairs within the limit n_pairs. The pairs and features given in specific_pairs have priority.

construction_ruleslist of str, optional

Allowed rules for the automatic feature construction. If not set, it uses all possible rules.

group_target_valuebool, default False

Allows grouping of the target values in classification. It can substantially increase the training time.

verbosebool, default False

If True it prints debug information and it does not erase temporary files when fitting, predicting or transforming.

output_dirstr, optional

Path of the output directory for the AllReports.khj report file and the Modeling.kdic modeling dictionary file. By default these files are deleted.

auto_sortbool, default True

Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit, predict and predict_proba methods. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
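To make the sort order concrete, here is a small sketch of a left-to-right lexicographic sort on a composite key. The table contents and the column names `customer_id` and `order_id` are invented for illustration, and comparing key values as strings is an assumption. Khiops performs the sort internally; this snippet only mirrors the ordering described above.

```python
rows = [
    {"customer_id": "c2", "order_id": "10", "amount": 5.0},
    {"customer_id": "c1", "order_id": "9", "amount": 7.5},
    {"customer_id": "c1", "order_id": "10", "amount": 1.0},
]
key_columns = ["customer_id", "order_id"]


def sort_by_key(rows, key_columns):
    """Sort rows by the key columns, compared left to right as strings."""
    return sorted(rows, key=lambda row: tuple(str(row[col]) for col in key_columns))


sorted_rows = sort_by_key(rows, key_columns)
```

Because the comparison in this sketch is on strings, the order id "10" sorts before "9" within the same customer.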

keystr, optional

Multi-table only: The name of the column to be used as key. Deprecated: will be removed in Khiops 11. Use dict dataset specifications in fit, fit_predict, predict and predict_proba.

internal_sortbool, optional

Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit, predict and predict_proba methods. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner. Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.

Attributes:
n_classes_int

The number of classes seen in training.

classes_ndarray of shape (n_classes_,)

The list of classes seen in training. Depending on the training target, the contents are int or str.

n_features_evaluated_int

The number of features evaluated by the classifier.

feature_evaluated_names_ndarray of shape (n_features_evaluated_,)

Names of the features evaluated by the classifier.

feature_evaluated_importances_ndarray of shape (n_features_evaluated_,)

Level of the features evaluated by the classifier. See below for a definition of the level.

n_features_used_int

The number of features used by the classifier.

feature_used_names_ndarray of shape (n_features_used_, )

Names of the features used by the classifier.

feature_used_importances_ndarray of shape (n_features_used_, 3)

Level, Weight and Importance of the features used by the classifier:

  • Level: A measure of the predictive importance of the feature taken individually. It ranges between 0 (no predictive interest) and 1 (optimal predictive importance).

  • Weight: A measure of the predictive importance of the feature taken relative to all features selected by the classifier. It ranges between 0 (little contribution to the model) and 1 (large contribution to the model).

  • Importance: The geometric mean between the Level and the Weight.
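The Importance column can be reproduced from the other two with a one-line formula. The sketch below assumes the usual two-term geometric mean, sqrt(Level × Weight). The feature names and numeric values are made up.

```python
import math


def importance(level, weight):
    """Geometric mean of a feature's Level and Weight (both in [0, 1])."""
    return math.sqrt(level * weight)


# Made-up (Level, Weight) pairs for three features.
features = {"age": (0.25, 1.0), "education": (0.16, 0.25), "noise": (0.0, 0.5)}
importances = {name: importance(lv, wt) for name, (lv, wt) in features.items()}
```

A feature with a Level of 0 gets an Importance of 0 regardless of its Weight, since the geometric mean vanishes when either term does.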

is_fitted_bool

True if the estimator is fitted.

is_multitable_model_bool

True if the model was fitted on a multi-table dataset.

model_DictionaryDomain

The Khiops dictionary domain for the trained classifier.

model_main_dictionary_name_str

The name of the main Khiops dictionary within the model_ domain.

model_report_AnalysisResults

The Khiops report object.

model_report_raw_dict

JSON object of the Khiops report. Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.

Examples

See the following functions of the samples_sklearn.py documentation script:
fit(X, y, **kwargs)

Fits a Selective Naive Bayes classifier according to X, y

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

yarray-like of shape (n_samples,)

The target values.

Deprecated input types (will be removed in Khiops 11):

  • str: A path to a data table file for file-based dict dataset specifications.

Returns:
selfKhiopsClassifier

The calling estimator instance.

predict(X)

Predicts the most probable class for the test dataset X

The predicted class of an input sample is the arg-max of the conditional probabilities P(y|X) for each value of y.
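A minimal sketch of the arg-max rule, using made-up class labels and probabilities rather than the output of a fitted model:

```python
def most_probable_class(class_probabilities):
    """Return the class whose conditional probability P(y|X) is largest."""
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical P(y|X) values for two samples.
samples = [
    {"less": 0.8, "more": 0.2},
    {"less": 0.35, "more": 0.65},
]
predictions = [most_probable_class(p) for p in samples]
```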

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

Returns:
ndarray

An array containing the predicted classes. In multi-table mode, a first column containing the key column ids is added. The numpy.dtype of the array is integer if the classifier was trained with an integer y; otherwise it is str.

predict_proba(X)

Predicts the class probabilities for the test dataset X

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

Returns:
numpy.array or str

The probability of the samples for each class in the model. The columns are named with the pattern Prob<class> for each <class> found in the training dataset. The output data container depends on X:

  • Dataframe or dataframe-based dict dataset specification: numpy.array

  • File-based dict dataset specification: A CSV file (the method returns its path).

The key columns are added for multi-table tasks.
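The column naming pattern can be illustrated as follows. The class labels are invented, and `proba_column_names` is a hypothetical helper, not part of the khiops API.

```python
def proba_column_names(classes):
    """Build the Prob<class> column names for the given class labels."""
    return [f"Prob{label}" for label in classes]


column_names = proba_column_names(["less", "more"])
```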

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KhiopsClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.

class khiops.sklearn.estimators.KhiopsCoclustering(verbose=False, output_dir=None, auto_sort=True, build_name_var=True, build_distance_vars=False, build_frequency_vars=False, max_part_numbers=None, key=None, variables=None, internal_sort=None)

Bases: KhiopsEstimator, ClusterMixin

A Khiops Coclustering model

A coclustering is a non-supervised piecewise constant density estimator.

Parameters:
build_distance_varsbool, default False

If True includes a cluster distance variable in the deployment.

build_frequency_varsbool, default False

If True includes the frequency variables in the deployment.

build_name_varbool, default True

If True includes a cluster id variable in the deployment.

verbosebool, default False

If True it prints debug information and it does not erase temporary files when fitting, predicting or transforming.

output_dirstr, optional

Path of the output directory for the Coclustering.khcj report file and the Coclustering.kdic modeling dictionary file.

auto_sortbool, default True

Advanced. Only for multi-table inputs: If True, input tables are automatically sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the predict method. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner.

max_part_numbersdict, optional

Maximum number of clusters for each co-clustered column. Specifically, a key-value pair of this dictionary represents the column name and its respective maximum number of clusters. If not specified, no maximum number of clusters is imposed on any column. Deprecated: will be removed in Khiops 11. Use the max_part_numbers parameter of the fit method.

variableslist of str, optional

A list of column names/indexes to use in the coclustering. Deprecated: will be removed in Khiops 11. Use the columns parameter of the fit method.

keystr, optional

Multi-table only: The name of the column to be used as key. Deprecated: will be removed in Khiops 11. Use the id_column parameter of the fit method.

internal_sortbool, optional

Advanced. Only for multi-table inputs: If True, input tables are automatically sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the predict method. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner. Deprecated: will be removed in Khiops 11. Use the auto_sort parameter of the estimator instead.

Attributes:
is_fitted_bool

True if the estimator is fitted.

is_multitable_model_bool

True if the model was fitted on a multi-table dataset.

model_DictionaryDomain

The Khiops dictionary domain for the trained coclustering. For coclustering it is a multi-table dictionary even though the model is single-table.

model_main_dictionary_name_str

The name of the main Khiops dictionary within the model_ domain.

model_report_CoclusteringResults

The Khiops report object.

model_report_raw_dict

JSON object of the Khiops report. Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.

Examples

See the following functions of the samples_sklearn.py documentation script:
fit(X, y=None, **kwargs)

Trains a Khiops Coclustering model

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

id_columnstr

The column that contains the id of the instance.

columnslist, optional

The columns to be co-clustered. If not specified it uses all columns.

max_part_numbersdict, optional

Maximum number of clusters for each co-clustered column. Specifically, a key-value pair of this dictionary represents the column name and its respective maximum number of clusters. If not specified, then no maximum number of clusters is imposed on any column. Deprecated (will be removed in Khiops 11). Use the simplify method instead.

Returns:
selfKhiopsCoclustering

The calling estimator instance.

fit_predict(X, y=None, **kwargs)

Performs clustering on X and returns result (instead of labels)

predict(X)

Predicts the most probable cluster for the test dataset X

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

Returns:
ndarray

An array containing the encoded columns. A first column containing key column ids is added in multi-table mode.

simplify(max_preserved_information=0, max_cells=0, max_total_parts=0, max_part_numbers=None)

Creates a simplified coclustering model from the current instance

Parameters:
max_preserved_informationint, default 0

Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.

max_cellsint, default 0

Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.

max_total_partsint, default 0

Maximum number of parts totaled over all variables. If equal to 0 there is no limit.

max_part_numbersdict, optional

Maximum number of clusters for each co-clustered column. Specifically, a key-value pair of this dictionary represents the column name and its respective maximum number of clusters. If not specified, then no maximum number of clusters is imposed on any column.

Returns:
selfKhiopsCoclustering

A new, simplified KhiopsCoclustering estimator instance.

class khiops.sklearn.estimators.KhiopsEncoder(categorical_target=True, n_features=100, n_pairs=0, n_trees=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, informative_features_only=True, group_target_value=False, keep_initial_variables=False, transform_type_categorical='part_id', transform_type_numerical='part_id', transform_pairs='part_id', verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)

Bases: KhiopsSupervisedEstimator, TransformerMixin

Khiops supervised discretization/grouping encoder

Parameters:
categorical_targetbool, default True

True if the target column is categorical.

n_featuresint, default 100

Multi-table only: Maximum number of multi-table aggregate features to construct. See Multi-Table Learning Primer for more details.

n_pairsint, default 0

Maximum number of pair features to construct. These features represent a 2D grid partition of the domain of a pair of features, optimized so that the cells are as pure as possible with respect to the target.

n_treesint, default 0

Maximum number of decision tree features to construct. The constructed trees combine other features, either native or constructed. These features usually improve a predictor’s performance at the cost of interpretability of the model.

specific_pairslist of tuple, optional

User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit n_pairs).

all_possible_pairsbool, default True

If True, tries to create all possible pairs within the limit n_pairs. The pairs and features given in specific_pairs have priority.

construction_ruleslist of str, optional

Allowed rules for the automatic feature construction. If not set, it uses all possible rules.

informative_features_onlybool, default True

If True keeps only informative features.

group_target_valuebool, default False

Allows grouping of the target values in classification. It can substantially increase the training time.

keep_initial_variablesbool, default False

If True the original columns are kept in the transformed data.

transform_type_categoricalstr, default “part_id”
Type of transformation for categorical features. Valid values:
  • “part_id”

  • “part_label”

  • “dummies”

  • “conditional_info”

See the documentation for the categorical_recoding_method parameter of the train_recoder function for more details.

transform_type_numericalstr, default “part_id”
Type of transformation for numerical features. Valid values:
  • “part_id”

  • “part_label”

  • “dummies”

  • “conditional_info”

  • “center_reduction”

  • “0-1_normalization”

  • “rank_normalization”

See the documentation for the numerical_recoding_method parameter of the train_recoder function for more details.
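For two of the normalization options that are not partition-based, the sketch below shows one common reading of the names. These formulas are assumptions based on the usual definitions (standardization for "center_reduction" and min-max scaling for "0-1_normalization"). The exact computation, for example which standard deviation estimator is used, is defined by the train_recoder documentation, not by this snippet. The function names and sample values are made up.

```python
import statistics


def center_reduction(values):
    """Subtract the mean and divide by the (population) standard deviation.

    Assumes the values are not all equal, otherwise the deviation is zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def zero_one_normalization(values):
    """Rescale linearly so that the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all equal, otherwise the range is zero.
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


ages = [20.0, 30.0, 40.0, 50.0]
centered = center_reduction(ages)
scaled = zero_one_normalization(ages)
```

After center reduction the values have zero mean and unit deviation, and after 0-1 normalization they span exactly the interval from 0 to 1.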

transform_pairs: str, default “part_id”
Type of transformation for bivariate features. Valid values:
  • “part_id”

  • “part_label”

  • “dummies”

  • “conditional_info”

verbosebool, default False

If True it prints debug information and it does not erase temporary files when fitting, predicting or transforming.

output_dirstr, optional

Path of the output directory for the AllReports.khj report file and the Modeling.kdic modeling dictionary file. By default these files are deleted.

auto_sortbool, default True

Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and transform methods. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner.

keystr, optional

Multi-table only: The name of the column to be used as key. Deprecated: will be removed in Khiops 11. Use dict dataset specifications in fit and transform.

internal_sortbool, optional

Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and transform methods. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner. Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.

Attributes:
n_features_evaluated_int

The number of features evaluated by the encoder.

feature_evaluated_names_ndarray of shape (n_features_evaluated_,)

Names of the features evaluated by the encoder.

feature_evaluated_importances_ndarray of shape (n_features_evaluated_,)

Level of the features evaluated by the encoder. The Level is a measure of the predictive importance of the feature taken individually. It ranges between 0 (no predictive interest) and 1 (optimal predictive importance).

is_fitted_bool

True if the estimator is fitted.

is_multitable_model_bool

True if the model was fitted on a multi-table dataset.

model_DictionaryDomain

The Khiops dictionary domain for the trained encoder.

model_main_dictionary_name_str

The name of the main Khiops dictionary within the model_ domain.

model_report_AnalysisResults

The Khiops report object.

model_report_raw_dict

JSON object of the Khiops report. Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.

Examples

See the following functions of the samples_sklearn.py documentation script:
fit(X, y=None, **kwargs)

Fits the Khiops Encoder according to X, y

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

yarray-like of shape (n_samples,)

The target values.

Deprecated input types (will be removed in Khiops 11):

  • str: A path to a data table file for file-based dict dataset specifications.

Returns:
selfKhiopsEncoder

The calling estimator instance.

fit_transform(X, y=None, **kwargs)

Fits and transforms its inputs

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

yarray-like of shape (n_samples,)

The target values.

Deprecated input types (will be removed in Khiops 11):

  • str: A path to a data table file for file-based dict dataset specifications.

Returns:
ndarray

An array containing the encoded columns. A first column containing key column ids is added in multi-table mode.

transform(X)

Transforms X with a fitted Khiops supervised encoder

Note

Numerical features are encoded to categorical ones. See the transform_type_numerical parameter for details.

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Dataset to transform. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using key estimator parameter.

Returns:
ndarray

An array containing the encoded columns. A first column containing key column ids is added in multi-table mode.

class khiops.sklearn.estimators.KhiopsEstimator(key=None, verbose=False, output_dir=None, auto_sort=True, internal_sort=None)

Bases: ABC, BaseEstimator

Base class for Khiops Scikit-learn estimators

Parameters:
verbosebool, default False

If True it prints debug information and it does not erase temporary files when fitting, predicting or transforming.

output_dirstr, optional

Path of the output directory for the resulting artifacts of Khiops learning tasks. See concrete estimator classes for more information about this parameter.

auto_sortbool, default True

Advanced. See concrete estimator classes for information about this parameter.

keystr, optional

The name of the column to be used as key. Deprecated: will be removed in Khiops 11.

internal_sortbool, optional

Advanced. See concrete estimator classes for information about this parameter. Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.

export_dictionary_file(dictionary_file_path)

Exports the model’s Khiops dictionary file (.kdic)

export_report_file(report_file_path)

Exports the model report to a JSON file

Parameters:
report_file_pathstr

The location of the exported report file.

Raises:
ValueError

When the instance is not fitted.

fit(X, y=None, **kwargs)

Fit the estimator

Returns:
selfKhiopsEstimator

The fitted estimator instance.

class khiops.sklearn.estimators.KhiopsPredictor(n_features=100, n_pairs=0, n_trees=10, n_selected_features=0, n_evaluated_features=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)

Bases: KhiopsSupervisedEstimator

Abstract Khiops Selective Naive Bayes Predictor

predict(X)

Predicts the target variable for the test dataset X

See the documentation of concrete subclasses for more details.

class khiops.sklearn.estimators.KhiopsRegressor(n_features=100, n_pairs=0, n_trees=0, n_selected_features=0, n_evaluated_features=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)

Bases: KhiopsPredictor, RegressorMixin

Khiops Selective Naive Bayes Regressor

This regressor supports automatic feature engineering on multi-table datasets. See Multi-Table Learning Primer for more details.

Note

Visit the Khiops site to learn about the automatic feature engineering algorithm.

Parameters:
n_featuresint, default 100

Multi-table only: Maximum number of multi-table aggregate features to construct. See Multi-Table Learning Primer for more details.

n_pairsint, default 0

Maximum number of pair features to construct. These features represent a 2D grid partition of the domain of a pair of features, optimized so that the cells are as pure as possible with respect to the target. Only pairs that are jointly more informative than their univariate components may be taken into account in the regressor.

n_selected_featuresint, default 0

Maximum number of features to be selected in the SNB predictor. If equal to 0 it selects all the features kept in the training.

n_evaluated_featuresint, default 0

Maximum number of features to be evaluated in the SNB predictor training. If equal to 0 it evaluates all informative features.

specific_pairslist of tuple, optional

User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit n_pairs).

all_possible_pairsbool, default True

If True, tries to create all possible pairs within the limit n_pairs. The pairs and features given in specific_pairs have priority.

construction_ruleslist of str, optional

Allowed rules for the automatic feature construction. If not set, it uses all possible rules.

verbosebool, default False

If True it prints debug information and it does not erase temporary files when fitting, predicting or transforming.

output_dirstr, optional

Path of the output directory for the AllReports.khj report file and the Modeling.kdic modeling dictionary file. By default these files are deleted.

auto_sortbool, default True

Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and predict methods. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner.

keystr, optional

Multi-table only: The name of the column to be used as key. Deprecated: will be removed in Khiops 11. Use dict dataset specifications in fit, fit_predict and predict.

internal_sortbool, optional

Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and predict methods. Note: the sort by key is performed in a left-to-right, hierarchical, lexicographic manner. Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.

Attributes:
n_features_evaluated_int

The number of features evaluated by the regressor.

feature_evaluated_names_ndarray of shape (n_features_evaluated_,)

Names of the features evaluated by the regressor.

feature_evaluated_importances_ndarray of shape (n_features_evaluated_,)

Level of the features evaluated by the regressor. See below for a definition of the level.

n_features_used_int

The number of features used by the regressor.

feature_used_names_ndarray of shape (n_features_used_, )

Names of the features used by the regressor.

feature_used_importances_ndarray of shape (n_features_used_, 3)

Level, Weight and Importance of the features used by the regressor:

  • Level: A measure of the predictive importance of the feature taken individually. It ranges between 0 (no predictive interest) and 1 (optimal predictive importance).

  • Weight: A measure of the predictive importance of the feature taken relative to all features selected by the regressor. It ranges between 0 (little contribution to the model) and 1 (large contribution to the model).

  • Importance: The geometric mean between the Level and the Weight.

is_fitted_bool

True if the estimator is fitted.

is_multitable_model_bool

True if the model was fitted on a multi-table dataset.

model_DictionaryDomain

The Khiops dictionary domain for the trained regressor.

model_main_dictionary_name_str

The name of the main Khiops dictionary within the model_ domain.

model_report_AnalysisResults

The Khiops report object.

model_report_raw_dict

JSON object of the Khiops report. Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.

Examples

See the following functions of the samples_sklearn.py documentation script:

fit(X, y=None, **kwargs)

Fits a Selective Naive Bayes regressor according to X, y

Warning

Make sure that the type of y is float. This is easily done with y = y.astype(float).
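A minimal sketch of the conversion suggested in the warning, assuming y is a NumPy array of integer targets (the values are made up for illustration):

```python
import numpy as np

# Hypothetical integer-valued target vector.
y = np.array([3, 1, 4, 1, 5])

# Convert to floating point before passing y to fit.
y = y.astype(float)
print(y.dtype)
```

The same astype(float) call is also available on a pandas Series, should the target be stored that way.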

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes, paths, or (path, separator) pairs. The first element of the list is the main table and the following ones are secondary tables joined to the main table using the key estimator parameter.

yarray-like of shape (n_samples,)

The target values.

Deprecated input types (will be removed in Khiops 11):

  • str: A path to a data table file for file-based dict dataset specifications.

Returns:
selfKhiopsRegressor

The calling estimator instance.

predict(X)

Predicts the regression values for the test dataset X

The predicted value is estimated by the Selective Naive Bayes Regressor learned during the fit step.

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes, paths, or (path, separator) pairs. The first element of the list is the main table and the following ones are secondary tables joined to the main table using the key estimator parameter.

Returns:
numpy.ndarray or str

An array containing the predicted values. In multi-table mode, a first column containing the key column ids is added. The array is in the form of:

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KhiopsRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.

class khiops.sklearn.estimators.KhiopsSupervisedEstimator(n_features=100, n_pairs=0, n_trees=10, specific_pairs=None, all_possible_pairs=True, construction_rules=None, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)

Bases: KhiopsEstimator

Abstract Khiops Supervised Estimator

fit(X, y=None, **kwargs)

Fits a supervised estimator according to X, y

Called by the concrete sub-classes KhiopsEncoder, KhiopsClassifier, KhiopsRegressor.

Parameters:
Xarray-like of shape (n_samples, n_features_in) or dict

Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).

Deprecated input types (will be removed in Khiops 11):

  • tuple: A pair (path_to_file, separator).

  • list: A sequence of dataframes, paths, or (path, separator) pairs. The first element of the list is the main table and the following ones are secondary tables joined to the main table using the key estimator parameter.

yarray-like of shape (n_samples,)

The target values.

Deprecated input types (will be removed in Khiops 11):

  • str: A path to a data table file for file-based dict dataset specifications.

Returns:
selfKhiopsSupervisedEstimator

The calling estimator instance.