core.helpers
Submodule of khiops.core
Helper functions for specific and/or advanced treatments
Functions
- build_multi_table_dictionary_domain: Builds a multi-table dictionary domain from a dictionary with a key
- deploy_coclustering: Deploys an individual-variable coclustering on a data table
- deploy_predictor_for_metrics: Deploys the necessary data to estimate the performance metrics of a predictor
- visualize_report: Opens a Khiops or Khiops Coclustering report with the desktop visualization app
- khiops.core.helpers.build_multi_table_dictionary_domain(dictionary_domain, root_dictionary_name, secondary_table_variable_name)
Builds a multi-table dictionary domain from a dictionary with a key
Note
This is a special-purpose function whose goal is to assist in preparing the coclustering deployment.
This function builds a new root dictionary and adds it to an existing dictionary domain. The new root dictionary only contains one field, which references a preexisting dictionary from the input dictionary domain as a new (secondary) Table variable. The preexisting dictionary must have a key set on it, as this is the join key with the new root table.
Warning
This function has been deprecated since Khiops 10.3.1.0 and will be removed in Khiops 11.
- Parameters:
  - dictionary_domain : DictionaryDomain
    DictionaryDomain object. Its root dictionary must have its key set.
  - root_dictionary_name : str
    Name for the new root dictionary.
  - secondary_table_variable_name : str
    Name, in the root dictionary, for the “table” variable of the secondary table.
- Returns:
  - DictionaryDomain
    The new dictionary domain.
- Raises:
  - TypeError
    Invalid type of an argument.
  - ValueError
    Invalid values of an argument:
    - the dictionary domain doesn’t contain at least one dictionary
    - the dictionary domain’s root dictionary doesn’t have a key set
- khiops.core.helpers.deploy_coclustering(dictionary_file_path_or_domain, dictionary_name, data_table_path, coclustering_file_path, key_variable_names, deployed_variable_name, results_dir, detect_format=True, header_line=None, field_separator=None, output_header_line=True, output_field_separator='\t', max_preserved_information=0, max_cells=0, max_part_numbers=None, build_cluster_variable=True, build_distance_variables=False, build_frequency_variables=False, variables_prefix='', results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False)
Deploys an individual-variable coclustering on a data table
This procedure generates the following files in results_dir:
- Coclustering.kdic: A multi-table dictionary file for further deployments of the coclustering with deploy_model
- Keys<data_table_file_name>: A data table file containing only the keys of the individuals
- Deployed<data_table_file_name>: A data table file containing the deployed coclustering model
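The naming scheme of the generated files can be sketched as a small path helper. This is an illustration only: expected_deployment_files is a hypothetical name that is not part of khiops, and the sketch assumes that results_prefix is empty and that <data_table_file_name> is the base name of data_table_path.

```python
import os


def expected_deployment_files(results_dir, data_table_path):
    """Return the output paths listed above (hypothetical helper, not a khiops API).

    Assumptions: results_prefix is empty and <data_table_file_name> is the
    base name of data_table_path.
    """
    data_table_file_name = os.path.basename(data_table_path)
    return {
        "dictionary": os.path.join(results_dir, "Coclustering.kdic"),
        "keys": os.path.join(results_dir, "Keys" + data_table_file_name),
        "deployed": os.path.join(results_dir, "Deployed" + data_table_file_name),
    }


# Example with made-up paths
files = expected_deployment_files("results", os.path.join("data", "Adult.txt"))
for role, path in files.items():
    print(role, path)
```

A helper of this kind can be handy for checking, after a deployment, that the expected files are present in the results directory.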
- Parameters:
  - dictionary_file_path_or_domain : str or DictionaryDomain
    Path of a Khiops dictionary file or a DictionaryDomain object.
  - dictionary_name : str
    Name of the dictionary to be analyzed.
  - data_table_path : str
    Path of the data table file.
  - coclustering_file_path : str
    Path of the coclustering model file (extension .khc or .khcj).
  - key_variable_names : list of str
    Names of the variables forming the unique keys of the individuals.
  - deployed_variable_name : str
    Name of the coclustering variable to deploy.
  - results_dir : str
    Path of the results directory.
  - detect_format : bool, default True
    If True, automatically detects whether the data table file has a header and its field separator. It is ignored if header_line or field_separator are set.
  - header_line : bool, optional (default True if detect_format is False)
    If True, uses the first line of the data as column names. Overrides detect_format if set.
  - field_separator : str, optional (default “\t” if detect_format is False)
    A field separator character. Overrides detect_format if set (“” counts as “\t”).
  - output_header_line : bool, default True
    If True, writes a header line containing the column names in the output table.
  - output_field_separator : str, default “\t”
    A field separator character (“” counts as “\t”).
  - max_preserved_information : int, default 0
    Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.
  - max_cells : int, default 0
    Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.
  - max_part_numbers : dict, optional
    Dictionary associating variable names to their maximum number of parts to preserve in the simplified coclustering. For variables not present in max_part_numbers there is no limit.
  - build_cluster_variable : bool, default True
    If True, includes a cluster id variable in the deployment.
  - build_distance_variables : bool, default False
    If True, includes a cluster distance variable in the deployment.
  - build_frequency_variables : bool, default False
    If True, includes the frequency variables in the deployment.
  - variables_prefix : str, default “”
    Prefix for the variables in the deployment dictionary.
  - results_prefix : str, default “”
    Prefix of the result files.
  - …
    Options of the KhiopsRunner.run method from the class KhiopsRunner.
- Returns:
  - tuple
    A 2-tuple containing:
    - The deployed data table path
    - The deployment dictionary file path
- Raises:
  - TypeError
    Invalid type of dictionary_file_path_or_domain or key_variable_names.
  - ValueError
    If the type of the dictionary key variables is not equal to Categorical.
Examples
- See the following function of the samples.py documentation script:
- khiops.core.helpers.deploy_predictor_for_metrics(dictionary_file_path_or_domain, dictionary_name, data_table_path, output_data_table_path, detect_format=True, header_line=None, field_separator=None, sample_percentage=70, sampling_mode='Include sample', additional_data_tables=None, output_header_line=True, output_field_separator='\t', trace=False)
Deploys the necessary data to estimate the performance metrics of a predictor
For each instance it deploys:
The true value of the target variable
The predicted value of the target variable
The probabilities of each value of the target variable (classifier only)
Note
To obtain the data of the default Khiops test dataset use sample_percentage = 70 and sampling_mode = "Exclude sample".
- Parameters:
  - dictionary_file_path_or_domain : str or DictionaryDomain
    Path of a Khiops dictionary file or a DictionaryDomain object.
  - dictionary_name : str
    Name of the predictor dictionary.
  - data_table_path : str
    Path of the data table file.
  - output_data_table_path : str
    Path of the scores output data file.
  - detect_format : bool, default True
    If True, automatically detects whether the data table file has a header and its field separator. It is ignored if header_line or field_separator are set.
  - header_line : bool, optional (default True if detect_format is False)
    If True, uses the first line of the data as column names. Overrides detect_format if set.
  - field_separator : str, optional (default “\t” if detect_format is False)
    A field separator character. Overrides detect_format if set (“” counts as “\t”).
  - sample_percentage : int, default 70
    See the sampling_mode option below.
  - sampling_mode : “Include sample” or “Exclude sample”, default “Include sample”
    If equal to “Include sample”, deploys the predictor on sample_percentage percent of the data; if equal to “Exclude sample”, deploys it on the complementary 100 - sample_percentage percent of the data.
  - additional_data_tables : dict, optional
    A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see the Multi-Table Learning Primer documentation.
  - output_header_line : bool, default True
    If True, writes a header line containing the column names in the output table.
  - output_field_separator : str, default “\t”
    A field separator character (“” counts as “\t”).
  - …
    Options of the KhiopsRunner.run method from the class KhiopsRunner.
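The interplay between sample_percentage and sampling_mode described for deploy_predictor_for_metrics can be sketched in a few lines. The function deployed_percentage below is a hypothetical illustration, not part of khiops; it only computes the share of the data that the documented rule selects and performs no deployment.

```python
def deployed_percentage(sample_percentage=70, sampling_mode="Include sample"):
    """Share of the data (in percent) selected by the documented sampling rule.

    Hypothetical helper for illustration; it is not a khiops function.
    """
    if not 0 <= sample_percentage <= 100:
        raise ValueError("sample_percentage must be between 0 and 100")
    if sampling_mode == "Include sample":
        return sample_percentage
    if sampling_mode == "Exclude sample":
        # Complementary part of the data
        return 100 - sample_percentage
    raise ValueError("sampling_mode must be 'Include sample' or 'Exclude sample'")


# Default settings: the predictor is deployed on 70 percent of the data
print(deployed_percentage())  # 70
# Settings of the note above (default Khiops test dataset): the remaining 30 percent
print(deployed_percentage(70, "Exclude sample"))  # 30
```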
- khiops.core.helpers.visualize_report(report_path)
Opens a Khiops or Khiops Coclustering report with the desktop visualization app
Before using this function, make sure you have installed the Khiops Visualization app and/or the Khiops Co-Visualization app. More info at https://khiops.org/setup/visualization/
- Parameters:
  - report_path : str
    The path of the report file to be opened. It must have the extension ‘.khj’ (Khiops report) or ‘.khcj’ (Khiops Coclustering report).
- Raises:
  - ValueError
    If the report file path does not have extension ‘.khj’ or ‘.khcj’.
  - FileNotFoundError
    If the report file does not exist.
  - RuntimeError
    If the report file is executable.
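Two of the error conditions documented for visualize_report can be checked before calling it, for instance to give users an earlier message. The sketch below is a hypothetical pre-check, not the library's implementation: the name check_report_path, the order of the checks and the returned labels are assumptions, and the executable-file check is deliberately not reproduced.

```python
import os

# Report kinds by file extension, as documented for report_path
REPORT_KINDS = {
    ".khj": "Khiops report",
    ".khcj": "Khiops Coclustering report",
}


def check_report_path(report_path):
    """Return the kind of report, raising errors similar to the documented ones.

    Hypothetical helper for illustration; it is not a khiops function.
    """
    extension = os.path.splitext(report_path)[1]
    if extension not in REPORT_KINDS:
        raise ValueError(
            f"'{report_path}' must have extension '.khj' or '.khcj'"
        )
    if not os.path.isfile(report_path):
        raise FileNotFoundError(f"Report file not found: '{report_path}'")
    return REPORT_KINDS[extension]
```

For an existing file named AnalysisResults.khj, this helper would return "Khiops report"; the file name here is only an example.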