sklearn.estimators
Submodule of khiops.sklearn
Scikit-Learn Estimator Classes for the Khiops AutoML Suite
Class Overview
The diagram below describes the relationships in this module:
KhiopsEstimator(ABC, BaseEstimator)
    |
    +- KhiopsCoclustering(ClusterMixin)
    |
    +- KhiopsSupervisedEstimator
       |
       +- KhiopsPredictor
       |  |
       |  +- KhiopsClassifier(ClassifierMixin)
       |  |
       |  +- KhiopsRegressor(RegressorMixin)
       |
       +- KhiopsEncoder(TransformerMixin)
Classes
KhiopsClassifier | Khiops Selective Naive Bayes Classifier
KhiopsCoclustering | A Khiops Coclustering model
KhiopsEncoder | Khiops supervised discretization/grouping encoder
KhiopsEstimator | Base class for Khiops Scikit-learn estimators
KhiopsPredictor | Abstract Khiops Selective Naive Bayes Predictor
KhiopsRegressor | Khiops Selective Naive Bayes Regressor
KhiopsSupervisedEstimator | Abstract Khiops Supervised Estimator
- class khiops.sklearn.estimators.KhiopsClassifier(n_features=100, n_pairs=0, n_trees=10, n_selected_features=0, n_evaluated_features=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, group_target_value=False, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)
Bases: KhiopsPredictor, ClassifierMixin
Khiops Selective Naive Bayes Classifier
This classifier supports automatic feature engineering on multi-table datasets. See Multi-Table Learning Primer for more details.
Note
Visit the Khiops site to learn about the automatic feature engineering algorithm.
- Parameters:
- n_features : int, default 100
Multi-table only: Maximum number of multi-table aggregate features to construct. See Multi-Table Learning Primer for more details.
- n_pairs : int, default 0
Maximum number of pair features to construct. These features represent a 2D grid partition of the domain of a pair of features, optimized so that the cells are as pure as possible with respect to the target. Only pairs that are jointly more informative than their univariate components may be taken into account in the classifier.
- n_trees : int, default 10
Maximum number of decision tree features to construct. The constructed trees combine other features, either native or constructed. These features usually improve the classifier’s performance at the cost of interpretability of the model.
- n_selected_features : int, default 0
Maximum number of features to be selected in the SNB predictor. If equal to 0, it selects all the features kept in the training.
- n_evaluated_features : int, default 0
Maximum number of features to be evaluated in the SNB predictor training. If equal to 0, it evaluates all informative features.
- specific_pairs : list of tuple, optional
User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit n_pairs).
- all_possible_pairs : bool, default True
If True, tries to create all possible pairs within the limit n_pairs. The pairs and features given in specific_pairs have priority.
- construction_rules : list of str, optional
Allowed rules for the automatic feature construction. If not set, it uses all possible rules.
- group_target_value : bool, default False
Allows grouping of the target values in classification. It can substantially increase the training time.
- verbose : bool, default False
If True, it prints debug information and it does not erase temporary files when fitting, predicting or transforming.
- output_dir : str, optional
Path of the output directory for the AllReports.khj report file and the Modeling.kdic modeling dictionary file. By default these files are deleted.
- auto_sort : bool, default True
Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit, predict and predict_proba methods.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
- key : str, optional
Multi-table only: The name of the column to be used as key.
Deprecated: will be removed in Khiops 11. Use dict dataset specifications in fit, fit_predict, predict and predict_proba.
- internal_sort : bool, optional
Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit, predict and predict_proba methods.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.
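The left-to-right, hierarchical ordering mentioned for auto_sort can be pictured with the sketch below. It is not the Khiops implementation: the rows are made up, and it assumes that key fields are compared as strings (so "10" sorts before "9"), which is one reading of "lexicographic".

```python
# Hypothetical rows of a secondary table; the first two fields form the key.
rows = [
    ("B", "2", 7.5),
    ("A", "10", 1.0),
    ("A", "9", 3.2),
    ("B", "1", 0.4),
]


def sort_by_key(rows, key_len):
    """Sort rows on their first `key_len` fields, left to right.

    Assumption: each key field is compared as a string.
    """
    return sorted(rows, key=lambda row: tuple(str(v) for v in row[:key_len]))


sorted_rows = sort_by_key(rows, key_len=2)
for row in sorted_rows:
    print(row)
```

Tables that already follow such an order are the case where setting auto_sort to False avoids the extra sorting pass.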
- Attributes:
- n_classes_ : int
The number of classes seen in training.
- classes_ : ndarray of shape (n_classes_,)
The list of classes seen in training. Depending on the training target, the contents are int or str.
- n_features_evaluated_ : int
The number of features evaluated by the classifier.
- feature_evaluated_names_ : ndarray of shape (n_features_evaluated_,)
Names of the features evaluated by the classifier.
- feature_evaluated_importances_ : ndarray of shape (n_features_evaluated_,)
Level of the features evaluated by the classifier. See below for a definition of the level.
- n_features_used_ : int
The number of features used by the classifier.
- feature_used_names_ : ndarray of shape (n_features_used_,)
Names of the features used by the classifier.
- feature_used_importances_ : ndarray of shape (n_features_used_, 3)
Level, Weight and Importance of the features used by the classifier:
Level: A measure of the predictive importance of the feature taken individually. It ranges between 0 (no predictive interest) and 1 (optimal predictive importance).
Weight: A measure of the predictive importance of the feature taken relative to all features selected by the classifier. It ranges between 0 (little contribution to the model) and 1 (large contribution to the model).
Importance: The geometric mean between the Level and the Weight.
- is_fitted_ : bool
True if the estimator is fitted.
- is_multitable_model_ : bool
True if the model was fitted on a multi-table dataset.
- model_ : DictionaryDomain
The Khiops dictionary domain for the trained classifier.
- model_main_dictionary_name_ : str
The name of the main Khiops dictionary within the model_ domain.
- model_report_ : AnalysisResults
The Khiops report object.
- model_report_raw_ : dict
JSON object of the Khiops report.
Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.
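As a small illustration of how the three columns of feature_used_importances_ relate, the sketch below computes the Importance as the geometric mean of Level and Weight. The feature names and values are invented for the example and do not come from a Khiops report.

```python
import math


def importance(level, weight):
    """Geometric mean of Level and Weight, both assumed to lie in [0, 1]."""
    return math.sqrt(level * weight)


# Hypothetical (name, Level, Weight) triples.
features = [("age", 0.25, 0.64), ("income", 0.09, 1.0), ("noise", 0.0, 0.5)]

for name, level, weight in features:
    print(f"{name}: level={level}, weight={weight}, "
          f"importance={importance(level, weight):.3f}")
```

Because it is a geometric mean, the Importance is zero as soon as either the Level or the Weight is zero.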
Examples
See the following functions of the samples_sklearn.py documentation script:
- fit(X, y, **kwargs)
Fits a Selective Naive Bayes classifier according to X, y
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- y : array-like of shape (n_samples,)
The target values.
Deprecated input types (will be removed in Khiops 11):
str: A path to a data table file for file-based dict dataset specifications.
- Returns:
- self : KhiopsClassifier
The calling estimator instance.
- predict(X)
Predicts the most probable class for the test dataset X
The predicted class of an input sample is the arg-max of the conditional probabilities P(y|X) for each value of y.
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- Returns:
ndarray
An array containing the predicted classes. The numpy.dtype of the array is integer if the classifier was learned with an integer y. Otherwise it will be str.
The key columns are added for multi-table tasks: a first column containing key column ids is added in multi-table mode.
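The arg-max rule described for predict can be sketched in a few lines. This only illustrates the decision rule, not the classifier itself: the class names and probabilities below are invented.

```python
def predict_class(probabilities):
    """Return the class with the highest conditional probability P(y|X)."""
    return max(probabilities, key=probabilities.get)


# Hypothetical per-sample conditional probabilities.
samples = [
    {"setosa": 0.7, "versicolor": 0.2, "virginica": 0.1},
    {"setosa": 0.1, "versicolor": 0.3, "virginica": 0.6},
]

predictions = [predict_class(p) for p in samples]
print(predictions)
```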
- predict_proba(X)
Predicts the class probabilities for the test dataset X
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- Returns:
numpy.array or str
The probability of the samples for each class in the model. The columns are named with the pattern Prob<class> for each <class> found in the training dataset. The output data container depends on X:
Dataframe or dataframe-based dict dataset specification: numpy.array
File-based dict dataset specification: A CSV file (the method returns its path).
The key columns are added for multi-table tasks.
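To make the Prob<class> naming pattern concrete, the sketch below labels rows of probabilities with column names built from the class values. The helper label_probabilities and the sample data are hypothetical and only illustrate the naming convention stated above.

```python
def label_probabilities(classes, proba_rows):
    """Attach Prob<class> column names to rows of class probabilities."""
    names = [f"Prob{c}" for c in classes]
    return [dict(zip(names, row)) for row in proba_rows]


# Hypothetical classes and probabilities, one row per sample.
classes = ["less", "more"]
proba_rows = [[0.8, 0.2], [0.35, 0.65]]

for labeled in label_probabilities(classes, proba_rows):
    print(labeled)
```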
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> KhiopsClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.
- class khiops.sklearn.estimators.KhiopsCoclustering(verbose=False, output_dir=None, auto_sort=True, build_name_var=True, build_distance_vars=False, build_frequency_vars=False, max_part_numbers=None, key=None, variables=None, internal_sort=None)
Bases: KhiopsEstimator, ClusterMixin
A Khiops Coclustering model
A coclustering is a non-supervised piecewise constant density estimator.
- Parameters:
- build_distance_vars : bool, default False
If True, includes a cluster distance variable in the deployment.
- build_frequency_vars : bool, default False
If True, includes the frequency variables in the deployment.
- build_name_var : bool, default True
If True, includes a cluster id variable in the deployment.
- verbose : bool, default False
If True, it prints debug information and it does not erase temporary files when fitting, predicting or transforming.
- output_dir : str, optional
Path of the output directory for the Coclustering.khcj report file and the Coclustering.kdic modeling dictionary file.
- auto_sort : bool, default True
Advanced. Only for multi-table inputs: If True, input tables are automatically sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the predict method.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
- max_part_numbers : dict, optional
Maximum number of clusters for each of the co-clustered columns. Specifically, a key-value pair of this dictionary represents the column name and its respective maximum number of clusters. If not specified, no maximum number of clusters is imposed on any column.
Deprecated: will be removed in Khiops 11. Use the max_part_numbers parameter of the fit method.
- variables : list of str, optional
A list of column names/indexes to use in the coclustering.
Deprecated: will be removed in Khiops 11. Use the columns parameter of the fit method.
- key : str, optional
Multi-table only: The name of the column to be used as key.
Deprecated: will be removed in Khiops 11. Use the id_column parameter of the fit method.
- internal_sort : bool, optional
Advanced. Only for multi-table inputs: If True, input tables are automatically sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the predict method.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
Deprecated: will be removed in Khiops 11. Use the auto_sort parameter of the estimator instead.
- Attributes:
- is_fitted_ : bool
True if the estimator is fitted.
- is_multitable_model_ : bool
True if the model was fitted on a multi-table dataset.
- model_ : DictionaryDomain
The Khiops dictionary domain for the trained coclustering. For coclustering it is a multi-table dictionary even though the model is single-table.
- model_main_dictionary_name_ : str
The name of the main Khiops dictionary within the model_ domain.
- model_report_ : CoclusteringResults
The Khiops report object.
- model_report_raw_ : dict
JSON object of the Khiops report.
Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.
Examples
See the following functions of the samples_sklearn.py documentation script:
- fit(X, y=None, **kwargs)
Trains a Khiops Coclustering model
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- id_column : str
The column that contains the id of the instance.
- columns : list, optional
The columns to be co-clustered. If not specified, it uses all columns.
- max_part_numbers : dict, optional
Maximum number of clusters for each of the co-clustered columns. Specifically, a key-value pair of this dictionary represents the column name and its respective maximum number of clusters. If not specified, then no maximum number of clusters is imposed on any column.
Deprecated (will be removed in Khiops 11). Use the simplify method instead.
- Returns:
- self : KhiopsCoclustering
The calling estimator instance.
- fit_predict(X, y=None, **kwargs)
Performs clustering on X and returns the result (instead of labels)
- predict(X)
Predicts the most probable cluster for the test dataset X
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- Returns:
ndarray
An array containing the encoded columns. A first column containing key column ids is added in multi-table mode.
- simplify(max_preserved_information=0, max_cells=0, max_total_parts=0, max_part_numbers=None)
Creates a simplified coclustering model from the current instance
- Parameters:
- max_preserved_information : int, default 0
Maximum information preserved in the simplified coclustering. If equal to 0, there is no limit.
- max_cells : int, default 0
Maximum number of cells in the simplified coclustering. If equal to 0, there is no limit.
- max_total_parts : int, default 0
Maximum number of parts totaled over all variables. If equal to 0, there is no limit.
- max_part_numbers : dict, optional
Maximum number of clusters for each of the co-clustered columns. Specifically, a key-value pair of this dictionary represents the column name and its respective maximum number of clusters. If not specified, then no maximum number of clusters is imposed on any column.
- Returns:
- self : KhiopsCoclustering
A new, simplified KhiopsCoclustering estimator instance.
- class khiops.sklearn.estimators.KhiopsEncoder(categorical_target=True, n_features=100, n_pairs=0, n_trees=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, informative_features_only=True, group_target_value=False, keep_initial_variables=False, transform_type_categorical='part_id', transform_type_numerical='part_id', transform_pairs='part_id', verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)
Bases: KhiopsSupervisedEstimator, TransformerMixin
Khiops supervised discretization/grouping encoder
- Parameters:
- categorical_target : bool, default True
True if the target column is categorical.
- n_features : int, default 100
Multi-table only: Maximum number of multi-table aggregate features to construct. See Multi-Table Learning Primer for more details.
- n_pairs : int, default 0
Maximum number of pair features to construct. These features represent a 2D grid partition of the domain of a pair of features, optimized so that the cells are as pure as possible with respect to the target.
- n_trees : int, default 0
Maximum number of decision tree features to construct. The constructed trees combine other features, either native or constructed. These features usually improve a predictor’s performance at the cost of interpretability of the model.
- specific_pairs : list of tuple, optional
User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit n_pairs).
- all_possible_pairs : bool, default True
If True, tries to create all possible pairs within the limit n_pairs. The pairs and features given in specific_pairs have priority.
- construction_rules : list of str, optional
Allowed rules for the automatic feature construction. If not set, it uses all possible rules.
- informative_features_only : bool, default True
If True, keeps only informative features.
- group_target_value : bool, default False
Allows grouping of the target values in classification. It can substantially increase the training time.
- keep_initial_variables : bool, default False
If True, the original columns are kept in the transformed data.
- transform_type_categorical : str, default “part_id”
Type of transformation for categorical features. Valid values:
“part_id”
“part_label”
“dummies”
“conditional_info”
See the documentation for the categorical_recoding_method parameter of the train_recoder function for more details.
- transform_type_numerical : str, default “part_id”
Type of transformation for numerical features. Valid values:
“part_id”
“part_label”
“dummies”
“conditional_info”
“center_reduction”
“0-1_normalization”
“rank_normalization”
See the documentation for the numerical_recoding_method parameter of the train_recoder function for more details.
- transform_pairs : str, default “part_id”
Type of transformation for bivariate features. Valid values:
“part_id”
“part_label”
“dummies”
“conditional_info”
- verbose : bool, default False
If True, it prints debug information and it does not erase temporary files when fitting, predicting or transforming.
- output_dir : str, optional
Path of the output directory for the AllReports.khj report file and the Modeling.kdic modeling dictionary file. By default these files are deleted.
- auto_sort : bool, default True
Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and transform methods.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
- key : str, optional
Multi-table only: The name of the column to be used as key.
Deprecated: will be removed in Khiops 11. Use dict dataset specifications in fit and transform.
- internal_sort : bool, optional
Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and transform methods.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.
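For readers unfamiliar with the last numerical options, the sketch below shows the textbook meaning of two of them. It assumes that “center_reduction” is standardization with the population standard deviation and that “0-1_normalization” is min-max scaling; the exact Khiops definitions are in the train_recoder documentation and may differ in details.

```python
import statistics


def center_reduce(values):
    """Standardize: subtract the mean, divide by the population std deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def normalize_01(values):
    """Min-max scaling of the values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


column = [2.0, 4.0, 6.0]  # hypothetical numerical feature
print(center_reduce(column))
print(normalize_01(column))
```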
- Attributes:
- n_features_evaluated_ : int
The number of features evaluated by the encoder.
- feature_evaluated_names_ : ndarray of shape (n_features_evaluated_,)
Names of the features evaluated by the encoder.
- feature_evaluated_importances_ : ndarray of shape (n_features_evaluated_,)
Level of the features evaluated by the encoder. The Level is a measure of the predictive importance of the feature taken individually. It ranges between 0 (no predictive interest) and 1 (optimal predictive importance).
- is_fitted_ : bool
True if the estimator is fitted.
- is_multitable_model_ : bool
True if the model was fitted on a multi-table dataset.
- model_ : DictionaryDomain
The Khiops dictionary domain for the trained encoder.
- model_main_dictionary_name_ : str
The name of the main Khiops dictionary within the model_ domain.
- model_report_ : AnalysisResults
The Khiops report object.
- model_report_raw_ : dict
JSON object of the Khiops report.
Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.
Examples
See the following functions of the samples_sklearn.py documentation script:
- fit(X, y=None, **kwargs)
Fits the Khiops Encoder according to X, y
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- y : array-like of shape (n_samples,)
The target values.
Deprecated input types (will be removed in Khiops 11):
str: A path to a data table file for file-based dict dataset specifications.
- Returns:
- self : KhiopsEncoder
The calling estimator instance.
- fit_transform(X, y=None, **kwargs)
Fits the encoder and transforms its inputs
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- y : array-like of shape (n_samples,)
The target values.
Deprecated input types (will be removed in Khiops 11):
str: A path to a data table file for file-based dict dataset specifications.
- Returns:
- self : KhiopsEncoder
The calling estimator instance.
- transform(X)
Transforms X with a fitted Khiops supervised encoder
Note
Numerical features are encoded to categorical ones. See the transform_type_numerical parameter for details.
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Dataset to transform. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- Returns:
ndarray
An array containing the encoded columns. A first column containing key column ids is added in multi-table mode.
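The idea of encoding a numerical feature into a categorical one can be sketched as follows. This is an illustration only: the interval bounds are hypothetical stand-ins for what a supervised discretization might learn, and the conventions used here (1-based part ids, right-closed intervals, one indicator per part for “dummies”) are assumptions, not the documented Khiops output format.

```python
import bisect

# Hypothetical interval bounds: three parts
# (-inf, 2.5], (2.5, 5.0] and (5.0, +inf).
bounds = [2.5, 5.0]


def part_id(value, bounds):
    """1-based index of the interval containing `value` (assumed convention)."""
    return bisect.bisect_left(bounds, value) + 1


def dummies(value, bounds):
    """One indicator per part: 1 for the part containing `value`, else 0."""
    index = part_id(value, bounds) - 1
    return [1 if i == index else 0 for i in range(len(bounds) + 1)]


for value in [1.0, 2.5, 3.0, 6.0]:
    print(value, part_id(value, bounds), dummies(value, bounds))
```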
- class khiops.sklearn.estimators.KhiopsEstimator(key=None, verbose=False, output_dir=None, auto_sort=True, internal_sort=None)
Bases: ABC, BaseEstimator
Base class for Khiops Scikit-learn estimators
- Parameters:
- verbose : bool, default False
If True, it prints debug information and it does not erase temporary files when fitting, predicting or transforming.
- output_dir : str, optional
Path of the output directory for the resulting artifacts of Khiops learning tasks. See concrete estimator classes for more information about this parameter.
- auto_sort : bool, default True
Advanced: See concrete estimator classes for information about this parameter.
- key : str, optional
The name of the column to be used as key.
Deprecated: will be removed in Khiops 11.
- internal_sort : bool, optional
Advanced: See concrete estimator classes for information about this parameter.
Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.
- export_dictionary_file(dictionary_file_path)
Exports the model’s Khiops dictionary file (.kdic)
- export_report_file(report_file_path)
Exports the model report to a JSON file
- Parameters:
- report_file_path : str
The location of the exported report file.
- Raises:
ValueError
When the instance is not fitted.
- fit(X, y=None, **kwargs)
Fits the estimator
- Returns:
- self : KhiopsEstimator
The fitted estimator instance.
- class khiops.sklearn.estimators.KhiopsPredictor(n_features=100, n_pairs=0, n_trees=10, n_selected_features=0, n_evaluated_features=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)
Bases: KhiopsSupervisedEstimator
Abstract Khiops Selective Naive Bayes Predictor
- predict(X)
Predicts the target variable for the test dataset X
See the documentation of concrete subclasses for more details.
- class khiops.sklearn.estimators.KhiopsRegressor(n_features=100, n_pairs=0, n_trees=0, n_selected_features=0, n_evaluated_features=0, specific_pairs=None, all_possible_pairs=True, construction_rules=None, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)
Bases: KhiopsPredictor, RegressorMixin
Khiops Selective Naive Bayes Regressor
This regressor supports automatic feature engineering on multi-table datasets. See Multi-Table Learning Primer for more details.
Note
Visit the Khiops site to learn about the automatic feature engineering algorithm.
- Parameters:
- n_features : int, default 100
Multi-table only: Maximum number of multi-table aggregate features to construct. See Multi-Table Learning Primer for more details.
- n_pairs : int, default 0
Maximum number of pair features to construct. These features represent a 2D grid partition of the domain of a pair of features, optimized so that the cells are as pure as possible with respect to the target. Only pairs that are jointly more informative than their univariate components may be taken into account in the regressor.
- n_selected_features : int, default 0
Maximum number of features to be selected in the SNB predictor. If equal to 0, it selects all the features kept in the training.
- n_evaluated_features : int, default 0
Maximum number of features to be evaluated in the SNB predictor training. If equal to 0, it evaluates all informative features.
- specific_pairs : list of tuple, optional
User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit n_pairs).
- all_possible_pairs : bool, default True
If True, tries to create all possible pairs within the limit n_pairs. The pairs and features given in specific_pairs have priority.
- construction_rules : list of str, optional
Allowed rules for the automatic feature construction. If not set, it uses all possible rules.
- verbose : bool, default False
If True, it prints debug information and it does not erase temporary files when fitting, predicting or transforming.
- output_dir : str, optional
Path of the output directory for the AllReports.khj report file and the Modeling.kdic modeling dictionary file. By default these files are deleted.
- auto_sort : bool, default True
Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and predict methods.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
- key : str, optional
Multi-table only: The name of the column to be used as key.
Deprecated: will be removed in Khiops 11. Use dict dataset specifications in fit, fit_predict and predict.
- internal_sort : bool, optional
Advanced. Only for multi-table inputs: If True, input tables are pre-sorted by their key before executing Khiops. If the input tables are already sorted by their keys, set this parameter to False to speed up the processing. This affects the fit and predict methods.
Note: The sort by key is performed in a left-to-right, hierarchical, lexicographic manner.
Deprecated: will be removed in Khiops 11. Use the auto_sort estimator parameter instead.
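The specific_pairs rule (a tuple with a single non-empty name stands for every pair containing that feature, capped by n_pairs) can be sketched as below. The helper expand_pairs, the feature names, and the order in which pairs are generated and truncated are all assumptions made for the illustration; Khiops performs this expansion internally.

```python
def expand_pairs(specific_pairs, feature_names, n_pairs):
    """Expand user-specified pairs, keeping at most n_pairs distinct pairs."""
    expanded, seen = [], set()
    for first, second in specific_pairs:
        if first and second:
            candidates = [(first, second)]
        else:
            anchor = first or second
            if not anchor:  # both names empty: nothing to expand
                continue
            candidates = [(anchor, name) for name in feature_names if name != anchor]
        for pair in candidates:
            key = frozenset(pair)  # (a, b) and (b, a) count as the same pair
            if key not in seen and len(expanded) < n_pairs:
                seen.add(key)
                expanded.append(pair)
    return expanded


features = ["age", "income", "city", "job"]  # hypothetical feature names
pairs = expand_pairs([("age", "income"), ("city", "")], features, n_pairs=3)
print(pairs)
```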
- Attributes:
- n_features_evaluated_ : int
The number of features evaluated by the regressor.
- feature_evaluated_names_ : ndarray of shape (n_features_evaluated_,)
Names of the features evaluated by the regressor.
- feature_evaluated_importances_ : ndarray of shape (n_features_evaluated_,)
Level of the features evaluated by the regressor. See below for a definition of the level.
- n_features_used_ : int
The number of features used by the regressor.
- feature_used_names_ : ndarray of shape (n_features_used_,)
Names of the features used by the regressor.
- feature_used_importances_ : ndarray of shape (n_features_used_, 3)
Level, Weight and Importance of the features used by the regressor:
Level: A measure of the predictive importance of the feature taken individually. It ranges between 0 (no predictive interest) and 1 (optimal predictive importance).
Weight: A measure of the predictive importance of the feature taken relative to all features selected by the regressor. It ranges between 0 (little contribution to the model) and 1 (large contribution to the model).
Importance: The geometric mean between the Level and the Weight.
- is_fitted_ : bool
True if the estimator is fitted.
- is_multitable_model_ : bool
True if the model was fitted on a multi-table dataset.
- model_ : DictionaryDomain
The Khiops dictionary domain for the trained regressor.
- model_main_dictionary_name_ : str
The name of the main Khiops dictionary within the model_ domain.
- model_report_ : AnalysisResults
The Khiops report object.
- model_report_raw_ : dict
JSON object of the Khiops report.
Deprecated: will be removed in Khiops 11. Use the json_data attribute of the model_report_ estimator attribute instead.
Examples
See the following functions of the samples_sklearn.py documentation script:
- fit(X, y=None, **kwargs)¶
Fits a Selective Naive Bayes regressor according to X, y.
Warning
Make sure that the type of y is float. This is easily done with y = y.astype(float).
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- y : array-like of shape (n_samples,)
The target values.
Deprecated input types (will be removed in Khiops 11):
str: A path to a data table file for file-based dict dataset specifications.
- Returns:
- self : KhiopsRegressor
The calling estimator instance.
- predict(X)¶
Predicts the regression values for the test dataset X.
The predicted value is estimated by the Selective Naive Bayes Regressor learned during the fit step.
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Test dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- Returns:
numpy.ndarray or str
An array containing the predicted values. In multi-table mode, a first column containing the key column ids is added. The array is in the form of:
numpy.ndarray if X is array-like, or a dataset spec containing pandas.DataFrame tables.
str (a path for the file containing the array) if X is a dataset spec containing file-path tables.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KhiopsRegressor ¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.
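The four request options listed above can be modeled with a small pure-Python function. This is a toy sketch of the documented decision rules only, not scikit-learn's routing implementation; the function name route_sample_weight, the alias my_weights, and the use of ValueError for the error case are assumptions made for illustration.

```python
def route_sample_weight(request, provided):
    """Return the keyword arguments a meta-estimator would forward to score.

    request:  True, False, None, or a str alias (the value given to
              set_score_request for sample_weight).
    provided: dict of metadata the user passed to the meta-estimator.
    """
    if request is True:
        # Requested: forward it if the user provided it, otherwise ignore.
        if "sample_weight" in provided:
            return {"sample_weight": provided["sample_weight"]}
        return {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested: providing it anyway is an error.
        if "sample_weight" in provided:
            raise ValueError("sample_weight was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: the user passes the metadata under the alias name.
        if request in provided:
            return {"sample_weight": provided[request]}
        return {}
    raise TypeError("request must be True, False, None, or a str")


weights = [1.0, 2.0, 0.5]
forwarded = route_sample_weight("my_weights", {"my_weights": weights})
```

In this model the alias case renames the metadata on the way in: the user supplies my_weights to the meta-estimator, and score receives it as sample_weight.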
- class khiops.sklearn.estimators.KhiopsSupervisedEstimator(n_features=100, n_pairs=0, n_trees=10, specific_pairs=None, all_possible_pairs=True, construction_rules=None, verbose=False, output_dir=None, auto_sort=True, key=None, internal_sort=None)¶
Bases: KhiopsEstimator
Abstract Khiops Supervised Estimator
- fit(X, y=None, **kwargs)¶
Fits a supervised estimator according to X, y.
Called by the concrete sub-classes KhiopsEncoder, KhiopsClassifier, KhiopsRegressor.
- Parameters:
- X : array-like of shape (n_samples, n_features_in) or dict
Training dataset. Either an array-like or a dict specification for multi-table datasets (see Multi-Table Learning Primer).
Deprecated input types (will be removed in Khiops 11):
tuple: A pair (path_to_file, separator).
list: A sequence of dataframes or paths, or pairs path-separator. The first element of the list is the main table and the following are secondary ones joined to the main table using the key estimator parameter.
- y : array-like of shape (n_samples,)
The target values.
Deprecated input types (will be removed in Khiops 11):
str: A path to a data table file for file-based dict dataset specifications.
- Returns:
- self : KhiopsSupervisedEstimator
The calling estimator instance.