core.helpers

Submodule of khiops.core

Helper functions for specific and/or advanced treatments

Functions

build_multi_table_dictionary_domain

Builds a multi-table dictionary domain from a dictionary with a key

deploy_coclustering

Deploys an individual-variable coclustering on a data table

deploy_predictor_for_metrics

Deploys the necessary data to estimate the performance metrics of a predictor

khiops.core.helpers.build_multi_table_dictionary_domain(dictionary_domain, root_dictionary_name, secondary_table_variable_name)

Builds a multi-table dictionary domain from a dictionary with a key

Parameters:
dictionary_domain : DictionaryDomain

DictionaryDomain object. Its root dictionary must have its key set.

root_dictionary_name : str

Name for the new root dictionary

secondary_table_variable_name : str

Name, in the root dictionary, for the “table” variable of the secondary table.

Raises:
TypeError

Invalid type of an argument

ValueError

Invalid values of an argument:
  • the dictionary domain doesn’t contain at least one dictionary

  • the dictionary domain’s root dictionary doesn’t have a key set
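A minimal usage sketch for the function above. It assumes that the helper returns the newly built DictionaryDomain (this page does not list a return value) and that the input dictionary file is loaded with kh.read_dictionary_file. The wrapper name build_root_domain and its arguments are hypothetical.

```python
def build_root_domain(dictionary_file_path, root_dictionary_name, table_variable_name):
    """Hypothetical wrapper: load a dictionary file and build a multi-table domain."""
    # Imported lazily so that the sketch can be defined without Khiops installed
    from khiops import core as kh
    from khiops.core.helpers import build_multi_table_dictionary_domain

    # The root dictionary of the loaded domain must have its key set,
    # otherwise the helper raises ValueError (see "Raises" above)
    domain = kh.read_dictionary_file(dictionary_file_path)

    # Assumption: the helper returns the new multi-table domain
    return build_multi_table_dictionary_domain(
        domain, root_dictionary_name, table_variable_name
    )
```

Catching TypeError and ValueError around the call is a reasonable way to report an unusable input dictionary to the caller.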

khiops.core.helpers.deploy_coclustering(dictionary_file_path_or_domain, dictionary_name, data_table_path, coclustering_file_path, key_variable_names, deployed_variable_name, results_dir, detect_format=True, header_line=None, field_separator=None, output_header_line=True, output_field_separator='\t', max_preserved_information=0, max_cells=0, max_part_numbers=None, build_cluster_variable=True, build_distance_variables=False, build_frequency_variables=False, variables_prefix='', results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False)

Deploys an individual-variable coclustering on a data table

This procedure generates the following files in results_dir:
  • Coclustering.kdic: A multi-table dictionary file for further deployments of the coclustering with deploy_model

  • Keys<data_table_file_name>: A data table file containing only the keys of the individuals

  • Deployed<data_table_file_name>: A data table file containing the deployed coclustering model
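The naming scheme above can be sketched as a small path helper. This is an illustration only: expected_result_paths is a hypothetical function, not part of khiops, and it assumes an empty results_prefix because the effect of a non-empty prefix on these names is not described here.

```python
import os


def expected_result_paths(results_dir, data_table_path):
    """Hypothetical helper: paths of the files listed above.

    Assumes an empty results_prefix.
    """
    data_table_file_name = os.path.basename(data_table_path)
    return {
        "dictionary": os.path.join(results_dir, "Coclustering.kdic"),
        "keys": os.path.join(results_dir, "Keys" + data_table_file_name),
        "deployed": os.path.join(results_dir, "Deployed" + data_table_file_name),
    }


# "Visits.txt" is a made-up data table name
paths = expected_result_paths("results", os.path.join("data", "Visits.txt"))
print(os.path.basename(paths["keys"]))      # KeysVisits.txt
print(os.path.basename(paths["deployed"]))  # DeployedVisits.txt
```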

Parameters:
dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.

dictionary_name : str

Name of the dictionary to be analyzed.

data_table_path : str

Path of the data table file.

coclustering_file_path : str

Path of the coclustering model file (extension .khc or .khcj)

key_variable_names : list of str

Names of the variables forming the unique keys of the individuals.

deployed_variable_name : str

Name of the coclustering variable to deploy.

results_dir : str

Path of the results directory.

detect_format : bool, default True

If True, automatically detects whether the data table file has a header and which field separator it uses. It is ignored if header_line or field_separator is set.

header_line : bool, optional (default True if detect_format is False)

If True, uses the first line of the data as column names. Overrides detect_format if set.

field_separator : str, optional (default “\t” if detect_format is False)

A field separator character. Overrides detect_format if set (“” counts as “\t”).
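The interplay of the three input format options can be sketched in plain Python. This is one reading of the descriptions above, not the library's actual code; resolve_input_format is a hypothetical helper.

```python
def resolve_input_format(detect_format=True, header_line=None, field_separator=None):
    """Hypothetical helper mirroring the documented precedence rules."""
    # Setting either explicit option disables automatic detection
    if header_line is not None or field_separator is not None:
        detect_format = False
    if detect_format:
        # Header and separator are left to automatic detection
        return {"detect_format": True, "header_line": None, "field_separator": None}
    return {
        "detect_format": False,
        # Documented default when detection is off
        "header_line": True if header_line is None else header_line,
        # None falls back to tab, and "" counts as tab as well
        "field_separator": field_separator if field_separator else "\t",
    }


print(resolve_input_format(field_separator=";"))
# {'detect_format': False, 'header_line': True, 'field_separator': ';'}
```

The same rules apply to deploy_predictor_for_metrics, whose format parameters are described in identical terms.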

output_header_line : bool, default True

If True, writes a header line containing the column names in the output table.

output_field_separator : str, default “\t”

A field separator character (empty string counts as tab).

max_preserved_information : int, default 0

Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.

max_cells : int, default 0

Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.

max_part_numbers : dict, optional

Dictionary associating variable names to their maximum number of parts to preserve in the simplified coclustering. For variables not present in max_part_numbers there is no limit.

build_cluster_variable : bool, default True

If True, includes a cluster id variable in the deployment.

build_distance_variables : bool, default False

If True, includes a cluster distance variable in the deployment.

build_frequency_variables : bool, default False

If True, includes the frequency variables in the deployment.

variables_prefix : str, default “”

Prefix for the variables in the deployment dictionary.

results_prefix : str, default “”

Prefix of the result files.

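The remaining parameters in the signature (batch_mode, log_file_path, output_scenario_path, task_file_path, trace) are options of the KhiopsRunner.run method of the KhiopsRunner class.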

Returns:
tuple

A 2-tuple containing:

  • The deployed data table path

  • The deployment dictionary file path.

Raises:
TypeError

Invalid type of dictionary_file_path_or_domain or key_variable_names

ValueError

If the type of the dictionary key variables is not equal to Categorical
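A hedged call sketch based only on the signature and return value documented above. The wrapper deploy_and_report is hypothetical, as are the chosen option values. All paths and names are supplied by the caller.

```python
def deploy_and_report(dictionary_file_path, dictionary_name, data_table_path,
                      coclustering_file_path, key_variable_names,
                      deployed_variable_name, results_dir):
    """Hypothetical wrapper around deploy_coclustering."""
    # Imported lazily so that the sketch can be defined without Khiops installed
    from khiops.core.helpers import deploy_coclustering

    # The function returns a 2-tuple: deployed table path, then dictionary path
    deployed_table_path, deployment_dictionary_path = deploy_coclustering(
        dictionary_file_path,
        dictionary_name,
        data_table_path,
        coclustering_file_path,
        key_variable_names,      # these variables must be Categorical
        deployed_variable_name,
        results_dir,
        header_line=True,        # explicit format, so detect_format is ignored
        field_separator="\t",
        # Illustrative limit: keep at most 10 parts for the deployed variable
        max_part_numbers={deployed_variable_name: 10},
        build_distance_variables=True,
    )
    return deployed_table_path, deployment_dictionary_path
```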

Examples

See the following function of the samples.py documentation script:

khiops.core.helpers.deploy_predictor_for_metrics(dictionary_file_path_or_domain, dictionary_name, data_table_path, output_data_table_path, detect_format=True, header_line=None, field_separator=None, sample_percentage=70, sampling_mode='Include sample', additional_data_tables=None, output_header_line=True, output_field_separator='\t', trace=False)

Deploys the necessary data to estimate the performance metrics of a predictor

For each instance it deploys:

  • The true value of the target variable

  • The predicted value of the target variable

  • The probabilities of each value of the target variable (classifier only)

Note

To obtain the data of the default Khiops test dataset use sample_percentage = 70 and sampling_mode = "Exclude sample".
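Once the true and predicted target values are deployed, a metric such as accuracy can be computed from the output table. The sketch below uses only the standard library and assumes the table was written with output_header_line=True. The column names in the toy table are hypothetical, since this page does not specify how the deployed columns are named.

```python
import csv
import io


def accuracy_from_deployed_table(lines, true_column, predicted_column,
                                 field_separator="\t"):
    """Fraction of instances whose predicted value equals the true value."""
    reader = csv.DictReader(lines, delimiter=field_separator)
    total = 0
    correct = 0
    for row in reader:
        total += 1
        if row[true_column] == row[predicted_column]:
            correct += 1
    if total == 0:
        raise ValueError("the deployed table contains no instances")
    return correct / total


# Toy deployed table with hypothetical column names
toy_table = io.StringIO(
    "class\tPredictedclass\n"
    "more\tmore\n"
    "less\tmore\n"
    "less\tless\n"
    "more\tmore\n"
)
print(accuracy_from_deployed_table(toy_table, "class", "Predictedclass"))  # 0.75
```

For a real output file, pass a file object opened with open(output_data_table_path, newline="") in place of the StringIO buffer.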

Parameters:
dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.

dictionary_name : str

Name of the predictor dictionary.

data_table_path : str

Path of the data table file.

output_data_table_path : str

Path of the scores output data file.

detect_format : bool, default True

If True, automatically detects whether the data table file has a header and which field separator it uses. It is ignored if header_line or field_separator is set.

header_line : bool, optional (default True if detect_format is False)

If True, uses the first line of the data as column names. Overrides detect_format if set.

field_separator : str, optional (default “\t” if detect_format is False)

A field separator character. Overrides detect_format if set (“” counts as “\t”).

sample_percentage : int, default 70

See the sampling_mode option below.

sampling_mode : “Include sample” or “Exclude sample”, default “Include sample”

If equal to “Include sample”, deploys the predictor on sample_percentage percent of the data; if equal to “Exclude sample”, deploys it on the complementary 100 - sample_percentage percent.
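The arithmetic behind the two sampling modes can be sketched as follows. deployed_percentage is a hypothetical helper used only to illustrate the rule; it is not part of khiops.

```python
def deployed_percentage(sample_percentage=70, sampling_mode="Include sample"):
    """Hypothetical helper: percentage of the data the predictor is deployed on."""
    if sampling_mode == "Include sample":
        return sample_percentage
    if sampling_mode == "Exclude sample":
        # The complementary part of the data
        return 100 - sample_percentage
    raise ValueError(f"unknown sampling_mode: {sampling_mode!r}")


print(deployed_percentage(70, "Include sample"))  # 70
print(deployed_percentage(70, "Exclude sample"))  # 30
```

With sample_percentage = 70 and sampling_mode = "Exclude sample", the predictor is therefore deployed on the remaining 30 percent of the data.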

additional_data_tables : dict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see the Multi-Table Learning Primer documentation.

output_header_line : bool, default True

If True, writes a header line containing the column names in the output table.

output_field_separator : str, default “\t”

A field separator character (“” counts as “\t”).

The remaining options, such as trace, are those of the KhiopsRunner.run method of the KhiopsRunner class.