core.helpers

Submodule of khiops.core

Helper functions for specific and/or advanced treatments

Functions

build_multi_table_dictionary_domain

Builds a multi-table dictionary domain from a dictionary with a key

deploy_coclustering

Deploys an individual-variable coclustering on a data table

deploy_predictor_for_metrics

Deploys the necessary data to estimate the performance metrics of a predictor

visualize_report

Opens a Khiops or Khiops Coclustering report with the desktop visualization app

khiops.core.helpers.build_multi_table_dictionary_domain(dictionary_domain, root_dictionary_name, secondary_table_variable_name)

Builds a multi-table dictionary domain from a dictionary with a key

Note

This is a special-purpose function whose goal is to assist in preparing the coclustering deployment.

This function builds a new root dictionary and adds it to an existing dictionary domain. The new root dictionary only contains one field, which references a preexisting dictionary from the input dictionary domain as a new (secondary) Table variable. The preexisting dictionary must have a key set on it, as this is the join key with the new root table.

Warning

This function is deprecated since Khiops 10.3.1.0 and will be removed in Khiops 11.

Parameters:
dictionary_domain : DictionaryDomain

DictionaryDomain object. Its root dictionary must have its key set.

root_dictionary_name : str

Name for the new root dictionary.

secondary_table_variable_name : str

Name, in the root dictionary, for the “table” variable of the secondary table.

Returns:
DictionaryDomain

The new dictionary domain

Raises:
TypeError

Invalid type of an argument

ValueError

Invalid values of an argument:
  • the dictionary domain doesn’t contain at least one dictionary

  • the dictionary domain’s root dictionary doesn’t have a key set
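
The structure described above can be illustrated with a toy model. The sketch below does not use the khiops classes: ToyDictionary, ToyVariable and build_toy_multi_table_domain are hypothetical stand-ins, and the choice of reusing the key names of the existing dictionary as the key of the new root is an assumption drawn from the "join key" remark in the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVariable:
    """Hypothetical stand-in for a dictionary variable (not the khiops class)."""

    name: str
    type: str
    object_type: str = ""  # name of the referenced dictionary for Table variables


@dataclass
class ToyDictionary:
    """Hypothetical stand-in for a dictionary (not the khiops class)."""

    name: str
    key: list = field(default_factory=list)
    variables: list = field(default_factory=list)
    root: bool = False


def build_toy_multi_table_domain(
    dictionaries, root_dictionary_name, secondary_table_variable_name
):
    """Returns a new list [root, secondary] modelling the resulting domain."""
    # Mirror the two documented ValueError conditions
    if not dictionaries:
        raise ValueError("the domain must contain at least one dictionary")
    secondary = dictionaries[0]
    if not secondary.key:
        raise ValueError("the existing dictionary must have a key set")

    # The new root holds a single Table variable pointing at the existing
    # dictionary. Assumption: the root is joined on the same key names.
    table_variable = ToyVariable(
        name=secondary_table_variable_name,
        type="Table",
        object_type=secondary.name,
    )
    root = ToyDictionary(
        name=root_dictionary_name,
        key=list(secondary.key),
        variables=[table_variable],
        root=True,
    )
    return [root, secondary]
```

With a keyed dictionary named "Events", the toy function returns a root whose only variable is a Table variable referencing "Events"; an empty domain or a dictionary without a key raises ValueError, as documented.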

khiops.core.helpers.deploy_coclustering(dictionary_file_path_or_domain, dictionary_name, data_table_path, coclustering_file_path, key_variable_names, deployed_variable_name, results_dir, detect_format=True, header_line=None, field_separator=None, output_header_line=True, output_field_separator='\t', max_preserved_information=0, max_cells=0, max_part_numbers=None, build_cluster_variable=True, build_distance_variables=False, build_frequency_variables=False, variables_prefix='', results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False)

Deploys an individual-variable coclustering on a data table

This procedure generates the following files in results_dir:
  • Coclustering.kdic: A multi-table dictionary file for further deployments of the coclustering with deploy_model

  • Keys<data_table_file_name>: A data table file containing only the keys of the individuals

  • Deployed<data_table_file_name>: A data table file containing the deployed coclustering model
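
The file names listed above can be sketched as follows. This is a minimal illustration, assuming that <data_table_file_name> is the base name of data_table_path (extension included) and that results_prefix is left empty; expected_deployment_files is a hypothetical helper, not part of khiops.

```python
from pathlib import Path


def expected_deployment_files(results_dir, data_table_path):
    """Returns the paths of the three files expected in results_dir."""
    # Assumption: the base name of the data table, extension included
    table_file_name = Path(data_table_path).name
    results = Path(results_dir)
    return {
        "dictionary": results / "Coclustering.kdic",
        "keys": results / f"Keys{table_file_name}",
        "deployed": results / f"Deployed{table_file_name}",
    }
```

For a data table stored as data/Adult.txt and a results directory named out, the sketch yields out/Coclustering.kdic, out/KeysAdult.txt and out/DeployedAdult.txt.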

Parameters:
dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.

dictionary_name : str

Name of the dictionary to be analyzed.

data_table_path : str

Path of the data table file.

coclustering_file_path : str

Path of the coclustering model file (extension .khc or .khcj).

key_variable_names : list of str

Names of the variables forming the unique keys of the individuals.

deployed_variable_name : str

Name of the coclustering variable to deploy.

results_dir : str

Path of the results directory.

detect_format : bool, default True

If True, automatically detects whether the data table file has a header line and which field separator it uses. Ignored if header_line or field_separator is set.

header_line : bool, optional (default True if detect_format is False)

If True, uses the first line of the data as column names. Overrides detect_format if set.

field_separator : str, optional (default “\t” if detect_format is False)

A field separator character. Overrides detect_format if set (“” counts as “\t”).

output_header_line : bool, default True

If True, writes a header line containing the column names in the output table.

output_field_separator : str, default “\t”

A field separator character (“” counts as “\t”).

max_preserved_information : int, default 0

Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.

max_cells : int, default 0

Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.

max_part_numbers : dict, optional

Dictionary associating variable names to their maximum number of parts to preserve in the simplified coclustering. For variables not present in max_part_numbers there is no limit.

build_cluster_variable : bool, default True

If True, includes a cluster id variable in the deployment.

build_distance_variables : bool, default False

If True, includes a cluster distance variable in the deployment.

build_frequency_variables : bool, default False

If True, includes the frequency variables in the deployment.

variables_prefix : str, default “”

Prefix for the variables in the deployment dictionary.

results_prefix : str, default “”

Prefix of the result files.

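The remaining arguments (batch_mode, log_file_path, output_scenario_path, task_file_path and trace) are options of the KhiopsRunner.run method from the class KhiopsRunner.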

Returns:
tuple

A 2-tuple containing:

  • The deployed data table path

  • The deployment dictionary file path.

Raises:
TypeError

Invalid type of dictionary_file_path_or_domain or key_variable_names

ValueError

If the type of the dictionary key variables is not equal to Categorical
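
A call following the signature documented above might look like the sketch below. The wrapper deploy_coclustering_sketch and its dry_run flag are hypothetical and exist only so that the argument layout can be inspected without a Khiops installation; the khiops import is deferred for the same reason. The choice of options is illustrative, not a recommendation.

```python
def deploy_coclustering_sketch(
    dictionary_file_path,
    dictionary_name,
    data_table_path,
    coclustering_file_path,
    key_variable_names,
    deployed_variable_name,
    results_dir,
    max_part_numbers=None,
    dry_run=False,  # hypothetical flag, not a khiops parameter
):
    # Check the key names early, in the spirit of the documented TypeError
    if not isinstance(key_variable_names, list) or not all(
        isinstance(name, str) for name in key_variable_names
    ):
        raise TypeError("key_variable_names must be a list of str")

    # Positional arguments, in the order of the documented signature
    arguments = (
        dictionary_file_path,
        dictionary_name,
        data_table_path,
        coclustering_file_path,
        key_variable_names,
        deployed_variable_name,
        results_dir,
    )
    # Illustrative options: cluster id plus cluster distance variables
    options = {
        "build_cluster_variable": True,
        "build_distance_variables": True,
        "max_part_numbers": max_part_numbers,
    }
    if dry_run:
        return arguments, options

    # Imported lazily so the sketch can be read and dry-run without Khiops
    from khiops.core import helpers

    # Documented return value: (deployed table path, dictionary file path)
    deployed_table_path, deployment_dictionary_path = helpers.deploy_coclustering(
        *arguments, **options
    )
    return deployed_table_path, deployment_dictionary_path
```

Passing max_part_numbers={"SomeVariable": 5} would ask for a simplified coclustering with at most 5 parts for that variable, while variables absent from the dictionary remain unconstrained.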

Examples

See the corresponding function of the samples.py documentation script.

khiops.core.helpers.deploy_predictor_for_metrics(dictionary_file_path_or_domain, dictionary_name, data_table_path, output_data_table_path, detect_format=True, header_line=None, field_separator=None, sample_percentage=70, sampling_mode='Include sample', additional_data_tables=None, output_header_line=True, output_field_separator='\t', trace=False)

Deploys the necessary data to estimate the performance metrics of a predictor

For each instance it deploys:

  • The true value of the target variable

  • The predicted value of the target variable

  • The probabilities of each value of the target variable (classifier only)

Note

To obtain the data of the default Khiops test dataset use sample_percentage = 70 and sampling_mode = "Exclude sample".
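
The relation between the two sampling modes can be sketched with a toy selection function. This is only an illustration of the complementarity: select_records and its seed argument are hypothetical, and the random draw used here is an assumption, since this section does not describe how Khiops actually samples records.

```python
import random


def select_records(
    n_records, sample_percentage=70, sampling_mode="Include sample", seed=0
):
    """Returns the indices of the records kept under the given sampling mode."""
    if sampling_mode not in ("Include sample", "Exclude sample"):
        raise ValueError(f"unknown sampling_mode: {sampling_mode!r}")

    # Toy sampling: with a fixed seed, each record is marked as sampled
    # or not in the same way on every call
    rng = random.Random(seed)
    in_sample = [rng.uniform(0, 100) < sample_percentage for _ in range(n_records)]

    # "Include sample" keeps the sampled records, "Exclude sample" the others
    keep_sampled = sampling_mode == "Include sample"
    return [index for index, sampled in enumerate(in_sample) if sampled == keep_sampled]
```

Calling the function twice with the same percentage and seed, once per mode, yields two disjoint index sets that together cover every record, which is the property that makes "Exclude sample" suitable for retrieving the test part of a train/test split.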

Parameters:
dictionary_file_path_or_domain : str or DictionaryDomain

Path of a Khiops dictionary file or a DictionaryDomain object.

dictionary_name : str

Name of the predictor dictionary.

data_table_path : str

Path of the data table file.

output_data_table_path : str

Path of the scores output data file.

detect_format : bool, default True

If True, automatically detects whether the data table file has a header line and which field separator it uses. Ignored if header_line or field_separator is set.

header_line : bool, optional (default True if detect_format is False)

If True, uses the first line of the data as column names. Overrides detect_format if set.

field_separator : str, optional (default “\t” if detect_format is False)

A field separator character. Overrides detect_format if set (“” counts as “\t”).

sample_percentage : int, default 70

See the sampling_mode option below.

sampling_mode : “Include sample” or “Exclude sample”, default “Include sample”

If equal to “Include sample”, deploys the predictor on sample_percentage percent of the data; if equal to “Exclude sample”, deploys it on the complementary 100 - sample_percentage percent.

additional_data_tables : dict, optional

A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see the Multi-Table Learning Primer documentation.

output_header_line : bool, default True

If True, writes a header line containing the column names in the output table.

output_field_separator : str, default “\t”

A field separator character (“” counts as “\t”).

The remaining arguments, such as trace, are options of the KhiopsRunner.run method from the class KhiopsRunner.
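
The interplay between detect_format, header_line and field_separator described in the parameter list can be summarized in a few lines. The sketch below is one reading of that description, not the library's code: resolve_input_format is a hypothetical helper, and filling the unset option with its documented default once detection is switched off is an assumption.

```python
def resolve_input_format(detect_format=True, header_line=None, field_separator=None):
    """Returns the effective input format settings as a dict."""
    # Setting either option explicitly overrides automatic detection
    if header_line is not None or field_separator is not None:
        detect_format = False

    if detect_format:
        # Both values are left to be detected from the data table file
        return {"detect_format": True, "header_line": None, "field_separator": None}

    # Documented defaults when detection is off
    if header_line is None:
        header_line = True
    if field_separator is None or field_separator == "":
        field_separator = "\t"  # the empty string counts as tab
    return {
        "detect_format": False,
        "header_line": header_line,
        "field_separator": field_separator,
    }
```

Under this reading, passing only field_separator="," turns detection off and keeps header_line at its default of True, while passing an empty separator is equivalent to passing a tab.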

khiops.core.helpers.visualize_report(report_path)

Opens a Khiops or Khiops Coclustering report with the desktop visualization app

Before using this function, make sure you have installed the Khiops Visualization app and/or the Khiops Co-Visualization app. More info at https://khiops.org/setup/visualization/

Parameters:
report_path : str

The path of the report file to be opened. It must have the extension ‘.khj’ (Khiops report) or ‘.khcj’ (Khiops Coclustering report).

Raises:
ValueError

If the report file path does not have extension ‘.khj’ or ‘.khcj’.

FileNotFoundError

If the report file does not exist.

RuntimeError

If the report file is executable.
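
The three documented error conditions can be checked ahead of time with the standard library alone. The sketch below is a hypothetical pre-check, not the library's implementation: check_report_path is an invented name, the order of the checks is an assumption, and os.access with os.X_OK is only one possible way to test whether a file is executable (its result is platform dependent).

```python
import os


def check_report_path(report_path):
    """Returns the kind of report, or raises one of the documented errors."""
    # The extension decides which report type is expected
    extension = os.path.splitext(report_path)[1]
    if extension not in (".khj", ".khcj"):
        raise ValueError(f"'{report_path}' must have extension '.khj' or '.khcj'")

    if not os.path.isfile(report_path):
        raise FileNotFoundError(report_path)

    # Assumption: executability is tested through the file permissions
    if os.access(report_path, os.X_OK):
        raise RuntimeError(f"'{report_path}' is executable")

    return "Khiops report" if extension == ".khj" else "Khiops Coclustering report"
```

Running such a check before calling visualize_report lets a script report a clear message for a wrong extension or a missing file without launching the desktop application.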