
Submodule of khiops.core

API for the execution of the Khiops AutoML suite

The functions in this module let you run all Khiops and Khiops Coclustering tasks.

See also:



build_deployed_dictionary
Builds a dictionary file to read the output table of a deployed model

build_dictionary_from_data_table
Builds a dictionary file by analyzing a data table file

build_multi_table_dictionary
Builds a multi-table dictionary from a dictionary with a key

check_database
Checks if a data table is compatible with a dictionary file

deploy_model
Deploys a model on a data table

detect_data_table_format
Detects the format of a data table

evaluate_predictor
Evaluates the predictors in a dictionary file on a database

export_dictionary_as_json
Exports a Khiops dictionary file to JSON format (.kdicj)

extract_clusters
Extracts clusters to a tab separated (TSV) file

extract_keys_from_data_table
Extracts the unique occurrences of a key variable from a data table

get_khiops_version
Returns the Khiops version

get_samples_dir
Returns the path of the Khiops samples directory

prepare_coclustering_deployment
Prepares an individual-variable coclustering deployment

simplify_coclustering
Simplifies a coclustering model

sort_data_table
Sorts a data table

train_coclustering
Trains a coclustering model from a data table

train_predictor
Trains a model from a data table

train_recoder
Trains a recoding model from a data table

khiops.core.api.build_deployed_dictionary(dictionary_file_path_or_domain, dictionary_name, output_dictionary_file_path, batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Builds a dictionary file to read the output table of a deployed model

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


dictionary_name : str

Name of the dictionary to be analyzed.

output_dictionary_file_path : str

Path of the output dictionary file.

See Common Parameters.


TypeError

Invalid type of an argument.


See the following functions of the documentation script:
khiops.core.api.build_dictionary_from_data_table(data_table_path, output_dictionary_name, output_dictionary_file_path, detect_format=True, header_line=None, field_separator=None, batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Builds a dictionary file by analyzing a data table file


data_table_path : str

Path of the data table file.

output_dictionary_name : str

Name of the dictionary to be created.

output_dictionary_file_path : str

Path of the output dictionary file.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

See Common Parameters.
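
The interplay between detect_format, header_line and field_separator can be read as a small resolution rule. The sketch below is one interpretation of that rule in plain Python. resolve_format_options is a hypothetical helper, not part of the Khiops API, and the fallback values (a header line and a tab separator) are taken from the documented defaults.

```python
def resolve_format_options(detect_format=True, header_line=None, field_separator=None):
    """Return the effective (detect_format, header_line, field_separator).

    Hypothetical helper: it mirrors the documented rule, not Khiops internals.
    """
    # Setting either explicit option switches automatic detection off.
    if header_line is not None or field_separator is not None:
        detect_format = False
    if detect_format:
        # Both values are left to the detection heuristic.
        return True, None, None
    # Documented defaults: a header line and a tab separator ("" means tab).
    if header_line is None:
        header_line = True
    if not field_separator:
        field_separator = "\t"
    return False, header_line, field_separator


print(resolve_format_options())                     # automatic detection
print(resolve_format_options(field_separator=","))  # explicit separator
```

Under this reading, passing only field_separator is enough to disable detection, and header_line then falls back to its default.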

khiops.core.api.build_multi_table_dictionary(dictionary_file_path_or_domain, root_dictionary_name, secondary_table_variable_name, output_dictionary_file_path, overwrite_dictionary_file=False, batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False)

Builds a multi-table dictionary from a dictionary with a key


This method is deprecated since Khiops 10.1.3 and will be removed in Khiops 11. Use the build_multi_table_dictionary_domain helper function to the same effect.

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


root_dictionary_name : str

Name for the new root dictionary.

secondary_table_variable_name : str

Name, in the root dictionary, for the “table” variable of the secondary table.

output_dictionary_file_path : str

Path of the output dictionary file.

overwrite_dictionary_file : bool, default False

If True it will overwrite an input dictionary file.

See Common Parameters.


ValueError

Invalid values of an argument.

khiops.core.api.check_database(dictionary_file_path_or_domain, dictionary_name, data_table_path, detect_format=True, header_line=None, field_separator=None, sample_percentage=100.0, sampling_mode='Include sample', selection_variable='', selection_value='', additional_data_tables=None, max_messages=20, batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Checks if a data table is compatible with a dictionary file

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


dictionary_name : str

Name of the dictionary of the table to be checked.

data_table_path : str

Path of the data table file.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

sample_percentage : float, default 100.0

See the sampling_mode option below.

sampling_mode : “Include sample” or “Exclude sample”

If equal to “Include sample” it checks sample_percentage percent of the data; if equal to “Exclude sample” it checks the complement of the data selected with “Include sample”. See also Database Sampling.

selection_variable : str, default “”

It checks only the records such that the value of selection_variable is equal to selection_value. Ignored if equal to “”.

selection_value : str or int or float, default “”

See selection_variable option above. Ignored if equal to “”.

additional_data_tables : dict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see Multi-Table Learning Primer.

max_messages : int, default 20

Maximum number of error messages to write in the log file.

See Common Parameters.
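
The record selection controlled by selection_variable and selection_value can be pictured as a simple filter. The following is a minimal sketch in plain Python, not the Khiops implementation: select_records is a hypothetical helper, rows are modelled as dictionaries, and comparing values as strings is an assumption made because selection_value may be a str, an int or a float.

```python
def select_records(rows, selection_variable="", selection_value=""):
    """Keep the rows whose selection_variable equals selection_value.

    Hypothetical helper; rows is a list of dicts mapping names to values.
    """
    # An empty selection variable disables the filter.
    if selection_variable == "":
        return list(rows)
    # Assumption: values are compared through their string form.
    target = str(selection_value)
    return [row for row in rows if str(row[selection_variable]) == target]


rows = [
    {"Id": "1", "Country": "FR"},
    {"Id": "2", "Country": "DE"},
    {"Id": "3", "Country": "FR"},
]
print(select_records(rows, "Country", "FR"))
```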


See the following function of the documentation script:
khiops.core.api.deploy_model(dictionary_file_path_or_domain, dictionary_name, data_table_path, output_data_table_path, detect_format=True, header_line=None, field_separator=None, sample_percentage=100.0, sampling_mode='Include sample', selection_variable='', selection_value='', additional_data_tables=None, output_header_line=True, output_field_separator='\t', output_additional_data_tables=None, batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Deploys a model on a data table

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object. This file/object defines the model to be deployed. Note that this model is not necessarily a predictor, it can be a generic table transformation.


dictionary_name : str

Name of the dictionary to be analyzed.

data_table_path : str

Path of the data table file.

output_data_table_path : str

Path of the output data file.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

sample_percentage : float, default 100.0

See sampling_mode option below.

sampling_mode : “Include sample” or “Exclude sample”

If equal to “Include sample” it deploys the model on sample_percentage percent of the data. If equal to “Exclude sample” it deploys the model on the complement of the data selected with “Include sample”. See also Database Sampling.

selection_variable : str, default “”

It deploys only the records such that the value of selection_variable is equal to selection_value. Ignored if equal to “”.

selection_value : str or int or float, default “”

See selection_variable option above. Ignored if equal to “”.

additional_data_tables : dict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see Multi-Table Learning Primer.

output_header_line : bool, default True

If True writes a header line with the column names in the output table.

output_field_separator : str, default “\t”

The field separator character for the output table (“” counts as “\t”).

output_additional_data_tables : dict, optional

A dictionary containing the output data paths and file paths for a multi-table dictionary file. For more details see Multi-Table Learning Primer.

results_prefix : str, default “”

Prefix of the result files. Deprecated: will be removed in Khiops 11.

See Common Parameters.


TypeError

Invalid type of an argument.
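
As an illustration of a multi-table deployment, the sketch below wraps a deploy_model call in a small function. It rests on several assumptions: the dictionary name "SNB_Accident", the table variable "Vehicles" and the backtick-separated data path follow the conventions of the Khiops 10 sample scripts, and deploy_accident_model itself is a hypothetical wrapper. The import is done inside the function so that the definition loads even when Khiops is not installed.

```python
def deploy_accident_model(model_dictionary_path, accidents_path, vehicles_path, output_path):
    """Deploy a multi-table model; all arguments are caller-supplied paths."""
    # Imported lazily so that this sketch loads even without Khiops installed.
    from khiops import core as kh

    kh.deploy_model(
        model_dictionary_path,
        "SNB_Accident",  # assumed name of the dictionary to deploy
        accidents_path,
        output_path,
        # Assumed data path syntax: "<root dictionary>`<table variable>".
        additional_data_tables={"SNB_Accident`Vehicles": vehicles_path},
        # Write a comma-separated output table instead of the default tab.
        output_field_separator=",",
    )
```

Check the data path syntax against the Multi-Table Learning Primer for the installed Khiops version before relying on it.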


See the following functions of the documentation script:
khiops.core.api.detect_data_table_format(data_table_path, dictionary_file_path_or_domain=None, dictionary_name=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='')

Detects the format of a data table

Runs a heuristic to detect the format of a data table. The detection heuristic is more accurate if a dictionary with the table schema is provided.


data_table_path : str

Path of the data table file.

dictionary_file_path_or_domain : str or DictionaryDomain, optional

Path of a Khiops dictionary file or a DictionaryDomain object.

dictionary_name : str, optional

Name of the dictionary.

See Common Parameters.

A 2-tuple containing:
  • the header_line boolean

  • the field_separator character

These are exactly the parameters expected in many Khiops Python API functions.
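
Because the returned pair matches the header_line and field_separator parameters, it can be forwarded directly to another task. The sketch below shows one way to do this with check_database. check_with_detected_format is a hypothetical wrapper, and the import sits inside it so the definition loads without Khiops installed.

```python
def check_with_detected_format(dictionary_file_path, dictionary_name, data_table_path):
    """Detect the table format once, then reuse it for a database check."""
    # Imported lazily so that this sketch loads even without Khiops installed.
    from khiops import core as kh

    # Passing the dictionary makes the detection heuristic more accurate.
    header_line, field_separator = kh.detect_data_table_format(
        data_table_path,
        dictionary_file_path_or_domain=dictionary_file_path,
        dictionary_name=dictionary_name,
    )
    # Explicit values switch off automatic detection in the second call.
    kh.check_database(
        dictionary_file_path,
        dictionary_name,
        data_table_path,
        header_line=header_line,
        field_separator=field_separator,
    )
    return header_line, field_separator
```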


See the following function of the documentation script:
khiops.core.api.evaluate_predictor(dictionary_file_path_or_domain, train_dictionary_name, data_table_path, results_dir, detect_format=True, header_line=None, field_separator=None, sample_percentage=100.0, sampling_mode='Include sample', selection_variable='', selection_value='', additional_data_tables=None, main_target_value='', results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Evaluates the predictors in a dictionary file on a database

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


train_dictionary_name : str

Name of the main dictionary used while training the models.

data_table_path : str

Path of the evaluation data table file.

results_dir : str

Path of the results directory.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

sample_percentage : float, default 100.0

See sampling_mode option below.

sampling_mode : “Include sample” or “Exclude sample”

If equal to “Include sample” it evaluates the predictor on sample_percentage percent of the data. If equal to “Exclude sample” it evaluates the predictor on the complement of the data selected with “Include sample”. See also Database Sampling.

selection_variable : str, default “”

It evaluates with only the records such that the value of selection_variable is equal to selection_value. Ignored if equal to “”.

selection_value : str or int or float, default “”

See selection_variable option above. Ignored if equal to “”.

additional_data_tables : dict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see Multi-Table Learning Primer.


Use the initial dictionary name in the data paths.

main_target_value : str, default “”

If this target value is specified then it guarantees the calculation of lift curves for it.

results_prefix : str, default “”

Prefix of the result files. Deprecated: will be removed in Khiops 11.

See Common Parameters.


The path of the JSON evaluation report (extension .khj).


TypeError

Invalid type of an argument.


See the following functions of the documentation script:
khiops.core.api.export_dictionary_as_json(dictionary_file_path_or_domain, json_dictionary_file_path, batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='')

Exports a Khiops dictionary file to JSON format (.kdicj)

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.

See Common Parameters.


See the following function of the documentation script:
khiops.core.api.extract_clusters(coclustering_file_path, cluster_variable, clusters_file_path, max_preserved_information=0, max_cells=0, batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Extracts clusters to a tab separated (TSV) file


coclustering_file_path : str

Path of the coclustering model file (extension .khc or .khcj).

cluster_variable : str

Name of the variable for which the clusters are extracted.

clusters_file_path : str

Path of the output clusters TSV file.

max_preserved_information : int, default 0

Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.

max_cells : int, default 0

Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.

See Common Parameters.


See the following function of the documentation script:
khiops.core.api.extract_keys_from_data_table(dictionary_file_path_or_domain, dictionary_name, data_table_path, output_data_table_path, detect_format=True, header_line=None, field_separator=None, output_header_line=True, output_field_separator='\t', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Extracts the unique occurrences of a key variable from a data table

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


dictionary_name : str

Name of the dictionary of the data table.

data_table_path : str

Path of the data table file.

output_data_table_path : str

Path of the output data file.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

output_header_line : bool, default True

If True writes a header line with the column names in the output table.

output_field_separator : str, default “\t”

The field separator character for the output table (“” counts as “\t”).

See Common Parameters.


TypeError

Invalid type of an argument.
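
To make the notion of key extraction concrete, the following plain-Python sketch collects the distinct values of a key column from a small tab-separated table. It only illustrates the result of the task. It is not how Khiops implements it, and extract_unique_keys is a hypothetical helper.

```python
import csv
import io


def extract_unique_keys(table_text, key_variable, field_separator="\t"):
    """Return the distinct values of key_variable, in order of first appearance."""
    # The first line is read as the header, as with header_line=True.
    reader = csv.DictReader(io.StringIO(table_text), delimiter=field_separator)
    seen = set()
    keys = []
    for row in reader:
        value = row[key_variable]
        if value not in seen:
            seen.add(value)
            keys.append(value)
    return keys


table = "Id\tAmount\nA1\t10\nA1\t25\nB7\t3\n"
print(extract_unique_keys(table, "Id"))
```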


See the following function of the documentation script:

khiops.core.api.get_khiops_version()

Returns the Khiops version


The Khiops version of the current KhiopsRunner backend.


khiops.core.api.get_samples_dir()

Returns the path of the Khiops samples directory


The path of the Khiops samples directory.
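
The samples directory is typically used to build the paths of example datasets. The sketch below assumes the function is exposed as get_samples_dir in khiops.core, and that the samples contain an "Adult" folder with "Adult.kdic" and "Adult.txt"; these dataset names are assumptions and adult_sample_paths is a hypothetical helper.

```python
import os


def adult_sample_paths():
    """Return (dictionary path, data table path) for the assumed Adult sample."""
    # Imported lazily so that this sketch loads even without Khiops installed.
    from khiops import core as kh

    samples_dir = kh.get_samples_dir()
    # "Adult" and its file names are assumed to match the installed samples.
    dictionary_file_path = os.path.join(samples_dir, "Adult", "Adult.kdic")
    data_table_path = os.path.join(samples_dir, "Adult", "Adult.txt")
    return dictionary_file_path, data_table_path
```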

khiops.core.api.prepare_coclustering_deployment(dictionary_file_path_or_domain, dictionary_name, coclustering_file_path, table_variable, deployed_variable_name, results_dir, max_preserved_information=0, max_cells=0, max_part_numbers=None, build_cluster_variable=True, build_distance_variables=False, build_frequency_variables=False, variables_prefix='', results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Prepares an individual-variable coclustering deployment

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


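dictionary_name : str

Name of the dictionary to be analyzed.

coclustering_file_path : str

Path of the coclustering model file (extension .khc or .khcj).

table_variable : str

Name of the table variable in the dictionary.

deployed_variable_name : str

Name of the coclustering variable to deploy.

results_dir : str

Path of the results directory.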

max_preserved_information : int, default 0

Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.

max_cells : int, default 0

Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.

max_part_numbers : dict, optional

Dictionary associating variable names to their maximum number of parts to preserve in the simplified coclustering. For variables not present in max_part_numbers there is no limit.

build_cluster_variable : bool, default True

If True includes a cluster id variable in the deployment.

build_distance_variables : bool, default False

If True includes a cluster distance variable in the deployment.

build_frequency_variables : bool, default False

If True includes the frequency variables in the deployment.

variables_prefix : str, default “”

Prefix for the variables in the deployment dictionary.

results_prefix : str, default “”

Prefix of the result files. Deprecated: will be removed in Khiops 11.

See Common Parameters.


TypeError

Invalid type of an argument.


See the following function of the documentation script:
khiops.core.api.simplify_coclustering(coclustering_file_path, simplified_coclustering_file_path, results_dir, max_preserved_information=0, max_cells=0, max_total_parts=0, max_part_numbers=None, results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Simplifies a coclustering model


coclustering_file_path : str

Path of the coclustering file (extension .khc or .khcj).

simplified_coclustering_file_path : str

Path of the output coclustering file.

results_dir : str

Path of the results directory.

max_preserved_information : int, default 0

Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.

max_cells : int, default 0

Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.

max_total_parts : int, default 0

Maximum number of parts totaled over all variables. If equal to 0 there is no limit.

max_part_numbers : dict, optional

Dictionary that associates variable names to their maximum number of parts to preserve in the simplified coclustering. If not set there is no limit.

results_prefix : str, default “”

Prefix of the result files. Deprecated: will be removed in Khiops 11.

See Common Parameters.


TypeError

Invalid type of an argument.


See the following function of the documentation script:
khiops.core.api.sort_data_table(dictionary_file_path_or_domain, dictionary_name, data_table_path, output_data_table_path, sort_variables=None, detect_format=True, header_line=None, field_separator=None, output_header_line=True, output_field_separator='\t', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Sorts a data table

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


dictionary_name : str

Name of the dictionary to be analyzed.

data_table_path : str

Path of the data table file.

output_data_table_path : str

Path of the output data file.

sort_variables : list of str, optional

The names of the variables to sort by. If not set, it sorts the table by its key.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

output_header_line : bool, default True

If True writes a header line with the column names in the output table.

output_field_separator : str, default “\t”

The field separator character for the output table (“” counts as “\t”).

See Common Parameters.


TypeError

Invalid type of an argument.
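
The fallback from sort_variables to the table key can be sketched in a few lines of plain Python. This is an illustration of the documented behavior only: sort_rows is a hypothetical helper, rows are modelled as dictionaries of strings, and the ordering used here is Python's string ordering, which may differ from the one Khiops applies.

```python
def sort_rows(rows, key_variables, sort_variables=None):
    """Sort rows by sort_variables, or by the key variables when none are given.

    Hypothetical helper; rows is a list of dicts and key_variables stands for
    the key declared in the table's dictionary.
    """
    variables = sort_variables if sort_variables else key_variables
    # Rows are compared on the tuple of their values for the chosen variables.
    return sorted(rows, key=lambda row: tuple(row[name] for name in variables))


rows = [{"Id": "b", "Rank": "2"}, {"Id": "a", "Rank": "9"}]
print(sort_rows(rows, ["Id"]))            # sorted by the key
print(sort_rows(rows, ["Id"], ["Rank"]))  # sorted by an explicit variable
```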


See the following functions of the documentation script:
khiops.core.api.train_coclustering(dictionary_file_path_or_domain, dictionary_name, data_table_path, coclustering_variables, results_dir, detect_format=True, header_line=None, field_separator=None, sample_percentage=100.0, sampling_mode='Include sample', selection_variable='', selection_value='', additional_data_tables=None, frequency_variable='', min_optimization_time=0, results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Trains a coclustering model from a data table

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


dictionary_name : str

Name of the dictionary to be analyzed.

data_table_path : str

Path of the data table file.

coclustering_variables : list of str

The names of variables to use in coclustering. Min length: 2. Max length: 10.


results_dir : str

Path of the results directory.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

sample_percentage : float, default 100.0

See sampling_mode option below.

sampling_mode : “Include sample” or “Exclude sample”

If equal to “Include sample” it trains the coclustering estimator on sample_percentage percent of the data. If equal to “Exclude sample” it trains the coclustering estimator on the complement of the data selected with “Include sample”. See also Database Sampling.

selection_variable : str, default “”

It trains with only the records such that the value of selection_variable is equal to selection_value. Ignored if equal to “”.

selection_value : str or int or float, default “”

See selection_variable option above. Ignored if equal to “”.

additional_data_tables : dict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see Multi-Table Learning Primer.

frequency_variable : str, default “”

Name of the frequency variable.

min_optimization_time : int, default 0

Minimum optimization time in seconds.

results_prefix : str, default “”

Prefix of the result files. Deprecated: will be removed in Khiops 11.

See Common Parameters.


The path of the resulting coclustering file.


ValueError

Number of coclustering variables out of the range 2-10.


TypeError

Invalid type of an argument.
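
The two error conditions above (a wrong argument type, and a number of coclustering variables outside the range 2-10) can be checked before launching a long task. The sketch below is a minimal pre-check in plain Python. check_coclustering_variables is a hypothetical helper, and mapping the conditions to TypeError and ValueError follows common Python practice rather than a confirmed detail of the Khiops implementation.

```python
def check_coclustering_variables(coclustering_variables):
    """Validate the variable list before calling train_coclustering."""
    # Type check first: a list whose elements are all strings.
    if not isinstance(coclustering_variables, list) or not all(
        isinstance(name, str) for name in coclustering_variables
    ):
        raise TypeError("coclustering_variables must be a list of str")
    # Documented bounds: at least 2 and at most 10 variables.
    if not 2 <= len(coclustering_variables) <= 10:
        raise ValueError(
            "coclustering_variables must contain between 2 and 10 names, "
            f"got {len(coclustering_variables)}"
        )
    return coclustering_variables


check_coclustering_variables(["age", "occupation"])
```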


See the following function of the documentation script:
khiops.core.api.train_predictor(dictionary_file_path_or_domain, dictionary_name, data_table_path, target_variable, results_dir, detect_format=True, header_line=None, field_separator=None, sample_percentage=70.0, sampling_mode='Include sample', use_complement_as_test=True, selection_variable='', selection_value='', additional_data_tables=None, main_target_value='', snb_predictor=True, univariate_predictor_number=0, max_evaluated_variables=0, max_selected_variables=0, max_constructed_variables=100, construction_rules=None, max_trees=10, max_pairs=0, all_possible_pairs=True, specific_pairs=None, group_target_value=False, discretization_method=None, min_interval_frequency=0, max_intervals=0, grouping_method=None, min_group_frequency=0, max_groups=0, results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Trains a model from a data table

dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


dictionary_name : str

Name of the dictionary to be analyzed.

data_table_path : str

Path of the data table file.

target_variable : str

Name of the target variable. If the specified variable is categorical it constructs a classifier and if it is numerical a regressor. If equal to “” it performs an unsupervised analysis.

results_dir : str

Path of the results directory.

detect_format : bool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_line : bool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separator : str, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.

sample_percentage : float, default 70.0

See the sampling_mode option below.

sampling_mode : “Include sample” or “Exclude sample”

If equal to “Include sample” it trains the predictor on sample_percentage percent of the data and tests the model on the remainder of the data if use_complement_as_test is set to True. If equal to “Exclude sample” the train and test datasets above are exchanged. See also Database Sampling.

use_complement_as_test : bool, default True

Uses the complement of the sampled database as test database for computing the model’s performance metrics.

fill_test_database_settings : bool, default False

It creates a test database as the complement of the train database. Deprecated: will be removed in Khiops 11; use use_complement_as_test instead.

selection_variable : str, default “”

It trains with only the records such that the value of selection_variable is equal to selection_value. Ignored if equal to “”.

selection_value : str or int or float, default “”

See selection_variable option above. Ignored if equal to “”.

additional_data_tables : dict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see Multi-Table Learning Primer.

main_target_value : str, default “”

If this target value is specified then it guarantees the calculation of lift curves for it.

snb_predictor : bool, default True

If True it trains a Selective Naive Bayes predictor. Deprecated: will be removed in Khiops 11.

univariate_predictor_number : int, default 0

Number of univariate predictors to train. Deprecated: will be removed in Khiops 11.

map_predictor : bool, default False

If True trains a Maximum a Posteriori Naive Bayes predictor. Deprecated: will be removed in Khiops Python 11.

max_evaluated_variables : int, default 0

Maximum number of variables to be evaluated in the SNB predictor training. If equal to 0 it evaluates all informative variables.

max_selected_variables : int, default 0

Maximum number of variables to be selected in the SNB predictor. If equal to 0 it selects all the variables kept in the training.

max_constructed_variables : int, default 100

Maximum number of variables to construct.

construction_rules : list of str, optional

Allowed rules for the automatic variable construction. If not set it uses all possible rules.

max_trees : int, default 10

Maximum number of trees to construct. Not yet available in regression.

max_pairs : int, default 0

Maximum number of variable pairs to construct.

specific_pairs : list of tuple, optional

User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit max_pairs). These pairs have top priority: they are constructed first.

all_possible_pairs : bool, default True

If True tries to create all possible pairs within the limit max_pairs. Pairs specified with specific_pairs have top priority: they are constructed first.

only_pairs_with : str, default “”

Constructs only pairs with the specified variable name. If equal to the empty string “” it considers all variables to make pairs. Deprecated: will be removed in Khiops Python 11; use specific_pairs instead.

group_target_value : bool, default False

Allows grouping of the target variable values in classification. It can substantially increase the training time.

discretization_method : str, optional

Name of the discretization method. Its valid values depend on the task:
  • Supervised: “MODL” (default), “EqualWidth” or “EqualFrequency”

  • Unsupervised: “EqualWidth” (default), “EqualFrequency” or “None”

min_interval_frequency : int, default 0

Minimum number of instances in an interval. If equal to 0 it is automatically calculated. Deprecated: will be removed in Khiops 11.

max_intervals : int, default 0

Maximum number of intervals to construct. If equal to 0 it is automatically calculated. Deprecated: will be replaced by max_parts in Khiops 11.

grouping_method : str, optional

Name of the grouping method. Its valid values depend on the task:
  • Supervised: “MODL” (default) or “BasicGrouping”

  • Unsupervised: “BasicGrouping” (default) or “None”

min_group_frequency : int, default 0

Minimum number of instances for a group. Deprecated: will be removed in Khiops 11.

max_groups : int, default 0

Maximum number of groups. If equal to 0 it is automatically calculated. Deprecated: will be replaced by max_parts in Khiops 11.

results_prefix : str, default “”

Prefix of the result files. Deprecated: will be removed in Khiops 11.

See Common Parameters.
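
The way sample_percentage, sampling_mode and use_complement_as_test combine into train and test sets can be sketched as follows. This is an illustration in plain Python under stated assumptions: split_train_test is a hypothetical helper, the random draw stands in for whatever sampling procedure Khiops actually uses, and returning None when no complement is used is a choice of this sketch.

```python
import random


def split_train_test(records, sample_percentage=70.0, sampling_mode="Include sample",
                     use_complement_as_test=True, seed=0):
    """Split records into (train, test) following the documented options."""
    if sampling_mode not in ("Include sample", "Exclude sample"):
        raise ValueError(f"Unknown sampling mode: {sampling_mode!r}")
    rng = random.Random(seed)
    sample, complement = [], []
    for record in records:
        # Each record falls in the sample with probability sample_percentage %.
        drawn = rng.random() * 100.0 < sample_percentage
        (sample if drawn else complement).append(record)
    # "Exclude sample" exchanges the roles of the two subsets.
    if sampling_mode == "Exclude sample":
        sample, complement = complement, sample
    test = complement if use_complement_as_test else None
    return sample, test


train, test = split_train_test(list(range(100)))
print(len(train), len(test))
```

With the same seed, switching sampling_mode from “Include sample” to “Exclude sample” swaps the two returned subsets, which mirrors the exchange of train and test datasets described for sampling_mode.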

A 2-tuple containing:
  • The reports file path

  • The modeling dictionary file path in the supervised case.
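
A typical call unpacks the returned 2-tuple directly. The sketch below assumes the "Adult" sample dataset with a "class" target, as used in the Khiops sample scripts; these names, like the train_adult_predictor wrapper itself, are illustrative assumptions. The import is placed inside the function so the definition loads without Khiops installed.

```python
import os


def train_adult_predictor(samples_dir, results_dir):
    """Train a classifier and return (report path, model dictionary path)."""
    # Imported lazily so that this sketch loads even without Khiops installed.
    from khiops import core as kh

    # Assumed sample layout: <samples_dir>/Adult/Adult.kdic and Adult.txt.
    dictionary_file_path = os.path.join(samples_dir, "Adult", "Adult.kdic")
    data_table_path = os.path.join(samples_dir, "Adult", "Adult.txt")
    report_path, model_dictionary_path = kh.train_predictor(
        dictionary_file_path,
        "Adult",
        data_table_path,
        "class",  # categorical target, so a classifier is built
        results_dir,
        sample_percentage=70.0,
        max_trees=0,  # skip tree construction to keep the run short
    )
    return report_path, model_dictionary_path
```

The second element is only meaningful in the supervised case, so code that also handles unsupervised analyses should not assume it points to an existing file.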


ValueError

Invalid values of an argument.


TypeError

Invalid type of an argument.


See the following functions of the documentation script:
khiops.core.api.train_recoder(dictionary_file_path_or_domain, dictionary_name, data_table_path, target_variable, results_dir, detect_format=True, header_line=None, field_separator=None, sample_percentage=100.0, sampling_mode='Include sample', selection_variable='', selection_value='', additional_data_tables=None, max_constructed_variables=100, construction_rules=None, max_trees=0, max_pairs=0, all_possible_pairs=True, specific_pairs=None, informative_variables_only=True, max_variables=0, keep_initial_categorical_variables=False, keep_initial_numerical_variables=False, categorical_recoding_method='part Id', numerical_recoding_method='part Id', pairs_recoding_method='part Id', group_target_value=False, discretization_method=None, min_interval_frequency=0, max_intervals=0, grouping_method=None, min_group_frequency=0, max_groups=0, results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False, stdout_file_path='', stderr_file_path='', max_cores=None, memory_limit_mb=None, temp_dir='', scenario_prologue='', **kwargs)

Trains a recoding model from a data table

A recoding model consists of the discretization of numerical variables and the grouping of categorical variables.

If target_variable is specified, these partitions are constructed in supervised mode, meaning that each resulting discretization or grouping best separates the target variable while maintaining a simple interval/group model of the data. Different recoding methods can be specified via the numerical_recoding_method, categorical_recoding_method and pairs_recoding_method options.

The output files of this process include a dictionary file (.kdic) that can be used to recode databases with the deploy_model function.

dictionary_file_path_or_domainstr or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.


Name of the dictionary to be recoded.


Path of the data table file.


Name of the target variable. If equal to “” it trains an unsupervised recoder.


Path of the results directory.

detect_formatbool, default True

If True detects automatically whether the data table file has a header and its field separator. It is set to False if header_line or field_separator are set.

header_linebool, optional (default True)

If True it uses the first line of the data as column names. Sets detect_format to False if set. Ignored if detect_format is True.

field_separatorstr, optional (default “\t”)

A field separator character. “” has the same effect as “\t”. Sets detect_format to False if set. Ignored if detect_format is True.
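The interplay between detect_format, header_line and field_separator described above can be sketched as a small helper. This only restates the documented rule; it is not the library's code, and the name resolve_format_options is hypothetical.

```python
def resolve_format_options(detect_format=True, header_line=None, field_separator=None):
    """Return the effective (detect_format, header_line, field_separator).

    Hypothetical helper that restates the documented rule; it is not part
    of khiops.core.api.
    """
    # Setting either explicit option switches automatic detection off
    if header_line is not None or field_separator is not None:
        detect_format = False
    if detect_format:
        # Nothing was set explicitly: the format is left to auto-detection
        return True, None, None
    # Documented defaults when detection is off: header line, tab separator
    if header_line is None:
        header_line = True
    # "" has the same effect as "\t"
    if field_separator is None or field_separator == "":
        field_separator = "\t"
    return False, header_line, field_separator
```

For instance, passing only field_separator=";" yields detection off, a header line, and ";" as separator.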

sample_percentagefloat, default 100.0

See sampling_mode option below.

sampling_mode“Include sample” or “Exclude sample”

If equal to “Include sample” it trains the recoder on sample_percentage percent of the data. If equal to “Exclude sample” it trains the recoder on the complement of the data selected with “Include sample”. See also Database Sampling.

selection_variablestr, default “”

Trains only on the records whose value of selection_variable is equal to selection_value. Ignored if equal to “”.

selection_value: str or int or float, default “”

See selection_variable option above. Ignored if equal to “”.
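Because “Exclude sample” selects the complement of “Include sample” for the same sample_percentage, the two modes can be paired to obtain disjoint subsets of one data table. A minimal sketch, assuming a hypothetical helper split_sampling_kwargs that only builds keyword arguments:

```python
def split_sampling_kwargs(train_percentage):
    """Build keyword arguments for two complementary samples.

    Hypothetical helper: it only prepares the sample_percentage and
    sampling_mode options documented above.
    """
    if not 0.0 <= train_percentage <= 100.0:
        raise ValueError("train_percentage must be between 0 and 100")
    shared = {"sample_percentage": float(train_percentage)}
    # Same percentage, opposite modes: the two selections do not overlap
    train_kwargs = {**shared, "sampling_mode": "Include sample"}
    holdout_kwargs = {**shared, "sampling_mode": "Exclude sample"}
    return train_kwargs, holdout_kwargs


train_kwargs, holdout_kwargs = split_sampling_kwargs(70)
```

train_kwargs can then be unpacked into the training call with **train_kwargs, and holdout_kwargs into a later call that accepts the same two options.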

additional_data_tablesdict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see Multi-Table Learning Primer.

max_constructed_variablesint, default 100

Maximum number of variables to construct.

construction_ruleslist of str, optional

Allowed rules for the automatic variable construction. If not set it uses all possible rules.

max_treesint, default 0

Maximum number of trees to construct. Not yet available in regression.

max_pairsint, default 0

Maximum number of variable pairs to construct.

specific_pairslist of tuple, optional

User-specified pairs as a list of 2-tuples of feature names. If a given tuple contains only one non-empty feature name, then it generates all the pairs containing it (within the maximum limit max_pairs). These pairs have top priority: they are constructed first.

all_possible_pairsbool, default True

If True tries to create all possible pairs within the limit max_pairs. Pairs specified with specific_pairs have top priority: they are constructed first.
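The two kinds of entries accepted by specific_pairs (a full pair, or a pair with a single non-empty name that stands for all pairs containing that variable) can be illustrated as follows. The helper check_specific_pairs and the variable names are hypothetical; the sketch only checks the shape described above.

```python
def check_specific_pairs(specific_pairs):
    """Split a specific_pairs list into full pairs and single-variable entries.

    Hypothetical helper, not part of khiops.core.api.
    """
    full_pairs, single_variables = [], []
    for pair in specific_pairs:
        if not isinstance(pair, tuple) or len(pair) != 2:
            raise TypeError(f"expected a 2-tuple, got {pair!r}")
        names = [name for name in pair if name]
        if len(names) == 2:
            full_pairs.append(pair)
        elif len(names) == 1:
            # Stands for every pair that contains this variable
            single_variables.append(names[0])
        else:
            raise ValueError("a pair needs at least one non-empty name")
    return full_pairs, single_variables


# Illustrative variable names only
specific_pairs = [("age", "education"), ("occupation", "")]
full_pairs, single_variables = check_specific_pairs(specific_pairs)
```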

only_pairs_withstr, default “”

Constructs only pairs with the specified variable name. If equal to the empty string “” it considers all variables to make pairs. Deprecated: will be removed in Khiops Python 11; use specific_pairs instead.

group_target_valuebool, default False

Allows grouping of the target variable values in classification. It can substantially increase the training time.

Name of the discretization method. Its valid values depend on the task:
  • Supervised: “MODL” (default), “EqualWidth” or “EqualFrequency”

  • Unsupervised: “EqualWidth” (default), “EqualFrequency” or “None”

min_interval_frequencyint, default 0

Minimum number of instances in an interval. If equal to 0 it is automatically calculated. Deprecated: will be removed in Khiops 11.

max_intervalsint, default 0

Maximum number of intervals to construct. If equal to 0 it is automatically calculated. Deprecated: will be replaced by max_parts in Khiops 11.

informative_variables_onlybool, default True

If True keeps only informative variables.

max_variablesint, default 0

Maximum number of variables to keep. If equal to 0 keeps all variables.

keep_initial_categorical_variablesbool, default False

If True keeps the initial categorical variables.

keep_initial_numerical_variablesbool, default False

If True keeps initial numerical variables.

Type of recoding for categorical variables. Types available:
  • “part Id” (default): An id for the interval/group

  • “part label”: A label for the interval/group

  • “0-1 binarization”: A coding of the interval/group id with 0’s and 1’s

  • “conditional info”: Conditional information of the interval/group

  • “none”: Keeps the variable as-is

Type of recoding for numerical variables. Types available:
  • “part Id” (default): An id for the interval/group

  • “part label”: A label for the interval/group

  • “0-1 binarization”: A coding of the interval/group id with 0’s and 1’s

  • “conditional info”: Conditional information of the interval/group

  • “center-reduction”: “(X - Mean(X)) / StdDev(X)”

  • “0-1 normalization”: “(X - Min(X)) / (Max(X) - Min(X))”

  • “rank normalization”: mean normalized rank (between 0 and 1) of the instances

  • “none”: Keeps the variable as-is
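The two numerical recodings given above as explicit formulas, “center-reduction” and “0-1 normalization”, can be worked through in plain Python. This is an illustration of the formulas only, not of how Khiops computes them; in particular the use of the population standard deviation is an assumption, and rank normalization is left out because its tie handling is not specified here.

```python
from statistics import fmean, pstdev


def center_reduce(values):
    """(X - Mean(X)) / StdDev(X), assuming the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population, not sample, deviation
    # A constant column (std == 0) would raise ZeroDivisionError here
    return [(x - mean) / std for x in values]


def normalize_01(values):
    """(X - Min(X)) / (Max(X) - Min(X))."""
    low, high = min(values), max(values)
    # A constant column (high == low) would raise ZeroDivisionError here
    return [(x - low) / (high - low) for x in values]


values = [2.0, 4.0, 6.0, 8.0]
centered = center_reduce(values)   # mean 0, unit deviation
scaled = normalize_01(values)      # smallest value maps to 0, largest to 1
```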

Type of recoding for bivariate variables. Types available:
  • “part Id” (default): An id for the interval/group

  • “part label”: A label for the interval/group

  • “0-1 binarization”: A coding of the interval/group id with 0’s and 1’s

  • “conditional info”: Conditional information of the interval/group

  • “none”: Keeps the variable as-is

Name of the grouping method. Its valid values depend on the task:
  • Supervised: “MODL” (default) or “BasicGrouping”

  • Unsupervised: “BasicGrouping” (default) or “None”
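Since the valid discretization and grouping methods depend on whether a target variable is given, a caller may want to check its choice before launching a task. A minimal sketch, assuming a hypothetical helper check_method; the value tables are copied from the lists in this section.

```python
# Valid values as listed in this section; the first entry is the default
VALID_METHODS = {
    "discretization_method": {
        "supervised": ("MODL", "EqualWidth", "EqualFrequency"),
        "unsupervised": ("EqualWidth", "EqualFrequency", "None"),
    },
    "grouping_method": {
        "supervised": ("MODL", "BasicGrouping"),
        "unsupervised": ("BasicGrouping", "None"),
    },
}


def check_method(parameter, value, target_variable):
    """Return the effective method name or raise ValueError.

    Hypothetical helper, not part of khiops.core.api.
    """
    # An empty target variable means the task is unsupervised
    task = "supervised" if target_variable else "unsupervised"
    allowed = VALID_METHODS[parameter][task]
    if value is None:
        return allowed[0]
    if value not in allowed:
        raise ValueError(
            f"{parameter}={value!r} is not valid for a {task} task; "
            f"expected one of {allowed}"
        )
    return value
```

With this table, "MODL" is accepted as a grouping method only when target_variable is non-empty.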

min_group_frequencyint, default 0

Minimum number of instances for a group. Deprecated: will be removed in Khiops 11.

max_groupsint, default 0

Maximum number of groups. If equal to 0 it is automatically calculated. Deprecated: will be replaced by max_parts in Khiops 11.

results_prefixstr, default “”

Prefix of the result files. Deprecated: will be removed in Khiops 11.
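Several options in this section are marked as deprecated. When preparing code for Khiops 11 it can help to list which of them a call still sets to a non-default value. The helper below is a hypothetical sketch; the option names and defaults are taken from the parameter descriptions above.

```python
# Deprecated options named in this section, with their documented defaults
DEPRECATED_DEFAULTS = {
    "only_pairs_with": "",
    "min_interval_frequency": 0,
    "max_intervals": 0,
    "min_group_frequency": 0,
    "max_groups": 0,
    "results_prefix": "",
}


def find_deprecated_options(kwargs):
    """Return the sorted names of deprecated options set to non-default values.

    Hypothetical helper, not part of khiops.core.api.
    """
    return sorted(
        name
        for name, default in DEPRECATED_DEFAULTS.items()
        if name in kwargs and kwargs[name] != default
    )


# Only max_groups is both deprecated and changed from its default here
used = find_deprecated_options({"max_groups": 5, "max_trees": 10, "results_prefix": ""})
```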

See Common Parameters.

A 2-tuple containing:
  • The path of the JSON file report of the process

  • The path of the dictionary containing the recoding model
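A call to train_recoder and the unpacking of its 2-tuple might look like the sketch below. The positional arguments follow the signature shown above. The function name train_recoder_sketch, the kh_api parameter (the khiops.core.api module, passed in so that the sketch does not import Khiops itself), and the use of the Adult sample files and its class target are assumptions made for illustration.

```python
import os


def train_recoder_sketch(kh_api, samples_dir, results_dir):
    """Train a supervised recoder and return the two documented paths.

    kh_api is expected to be the khiops.core.api module. The Adult sample
    dataset and its file layout are assumptions of this sketch.
    """
    dictionary_path = os.path.join(samples_dir, "Adult", "Adult.kdic")
    data_table_path = os.path.join(samples_dir, "Adult", "Adult.txt")

    # The return value is a 2-tuple: JSON report path, recoding dictionary path
    report_path, recoding_dictionary_path = kh_api.train_recoder(
        dictionary_path,
        "Adult",          # dictionary_name
        data_table_path,
        "class",          # target_variable; "" would train an unsupervised recoder
        results_dir,
        categorical_recoding_method="part label",
        numerical_recoding_method="part label",
    )
    return report_path, recoding_dictionary_path
```

The returned dictionary path is the file that can then be passed to deploy_model to recode a data table.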


See the following functions of the documentation script: