core.helpers
Submodule of khiops.core
Helper functions for specific and/or advanced treatments
Functions
- build_multi_table_dictionary_domain: Builds a multi-table dictionary domain from a dictionary with a key
- deploy_coclustering: Deploys an individual-variable coclustering on a data table
- deploy_predictor_for_metrics: Deploys the necessary data to estimate the performance metrics of a predictor
- khiops.core.helpers.build_multi_table_dictionary_domain(dictionary_domain, root_dictionary_name, secondary_table_variable_name)
Builds a multi-table dictionary domain from a dictionary with a key
- Parameters:
- dictionary_domain : DictionaryDomain
DictionaryDomain object. Its root dictionary must have its key set.
- root_dictionary_name : str
Name for the new root dictionary.
- secondary_table_variable_name : str
Name, in the root dictionary, for the “table” variable of the secondary table.
- Raises:
TypeError
Invalid type of an argument
ValueError
Invalid values of an argument:
- the dictionary domain contains no dictionary
- the dictionary domain’s root dictionary doesn’t have a key set
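The following is a conceptual sketch, not the library's implementation. It assumes the helper creates a new root dictionary that holds the source dictionary's key variables plus one Table variable pointing to the source dictionary, which then becomes the secondary table. The Dictionary and Variable dataclasses and the function sketch_multi_table_domain are hypothetical stand-ins, reduced to the fields needed here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the khiops.core dictionary classes.
@dataclass
class Variable:
    name: str
    type: str
    object_type: Optional[str] = None  # referenced dictionary for Table variables


@dataclass
class Dictionary:
    name: str
    key: List[str] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)
    root: bool = False


def sketch_multi_table_domain(
    source: Dictionary, root_dictionary_name: str, secondary_table_variable_name: str
) -> List[Dictionary]:
    """Wrap a keyed dictionary as the secondary table of a new root dictionary."""
    if not source.key:
        raise ValueError("the root dictionary must have its key set")
    # The new root keeps only the key variables of the source dictionary...
    key_variables = [v for v in source.variables if v.name in source.key]
    # ...plus one Table variable pointing to the source dictionary.
    table_variable = Variable(
        secondary_table_variable_name, "Table", object_type=source.name
    )
    new_root = Dictionary(
        root_dictionary_name,
        list(source.key),
        key_variables + [table_variable],
        root=True,
    )
    # The source dictionary becomes a non-root (secondary) dictionary.
    secondary = Dictionary(
        source.name, list(source.key), list(source.variables), root=False
    )
    return [new_root, secondary]


customer = Dictionary(
    "Customer",
    key=["id"],
    variables=[Variable("id", "Categorical"), Variable("age", "Numerical")],
    root=True,
)
new_root, secondary = sketch_multi_table_domain(customer, "CustomerMulti", "Customers")
print([v.name for v in new_root.variables])  # ['id', 'Customers']
```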
- khiops.core.helpers.deploy_coclustering(dictionary_file_path_or_domain, dictionary_name, data_table_path, coclustering_file_path, key_variable_names, deployed_variable_name, results_dir, detect_format=True, header_line=None, field_separator=None, output_header_line=True, output_field_separator='\t', max_preserved_information=0, max_cells=0, max_part_numbers=None, build_cluster_variable=True, build_distance_variables=False, build_frequency_variables=False, variables_prefix='', results_prefix='', batch_mode=True, log_file_path=None, output_scenario_path=None, task_file_path=None, trace=False)
Deploys an individual-variable coclustering on a data table
- This procedure generates the following files in results_dir:
- Coclustering.kdic: A multi-table dictionary file for further deployments of the coclustering with deploy_model
- Keys<data_table_file_name>: A data table file containing only the keys of individuals
- Deployed<data_table_file_name>: A data table file containing the deployed coclustering model
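A small sketch of the naming pattern in the file list, assuming the names are formed literally as listed there. The effect of results_prefix is not modeled, and expected_coclustering_outputs is a hypothetical helper, not part of khiops.core.

```python
import os


def expected_coclustering_outputs(results_dir: str, data_table_path: str) -> dict:
    """Return the output paths listed for deploy_coclustering (results_prefix ignored)."""
    # <data_table_file_name> is taken to be the base name of the input table.
    data_table_file_name = os.path.basename(data_table_path)
    return {
        "dictionary": os.path.join(results_dir, "Coclustering.kdic"),
        "keys": os.path.join(results_dir, "Keys" + data_table_file_name),
        "deployed": os.path.join(results_dir, "Deployed" + data_table_file_name),
    }
```

Such a helper could, for instance, be used to check that the results directory contains the expected files after a run.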
- Parameters:
- dictionary_file_path_or_domain : str or DictionaryDomain
Path of a Khiops dictionary file or a DictionaryDomain object.
- dictionary_name : str
Name of the dictionary to be analyzed.
- data_table_path : str
Path of the data table file.
- coclustering_file_path : str
Path of the coclustering model file (extension .khc or .khcj).
- key_variable_names : list of str
Names of the variables forming the unique keys of the individuals.
- deployed_variable_name : str
Name of the coclustering variable to deploy.
- results_dir : str
Path of the results directory.
- detect_format : bool, default True
If True detects automatically whether the data table file has a header and its field separator. It’s ignored if header_line or field_separator are set.
- header_line : bool, optional (default True if detect_format is False)
If True it uses the first line of the data as column names. Overrides detect_format if set.
- field_separator : str, optional (default “\t” if detect_format is False)
A field separator character, overrides detect_format if set (“” counts as “\t”).
- output_header_line : bool, default True
If True writes a header line containing the column names in the output table.
- output_field_separator : str, default “\t”
A field separator character (empty string counts as tab).
- max_preserved_information : int, default 0
Maximum information preserved in the simplified coclustering. If equal to 0 there is no limit.
- max_cells : int, default 0
Maximum number of cells in the simplified coclustering. If equal to 0 there is no limit.
- max_part_numbers : dict, optional
Dictionary associating variable names to their maximum number of parts to preserve in the simplified coclustering. For variables not present in max_part_numbers there is no limit.
- build_cluster_variable : bool, default True
If True includes a cluster id variable in the deployment.
- build_distance_variables : bool, default False
If True includes a cluster distance variable in the deployment.
- build_frequency_variables : bool, default False
If True includes the frequency variables in the deployment.
- variables_prefix : str, default “”
Prefix for the variables in the deployment dictionary.
- results_prefix : str, default “”
Prefix of the result files.
- …
Options of the KhiopsRunner.run method from the class KhiopsRunner.
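The interplay between detect_format, header_line and field_separator can be read as the precedence rule sketched below. This is one reading of the parameter descriptions, not Khiops code. The function resolve_input_format is hypothetical, and it assumes that when only one of the two explicit options is set, the other takes its detect_format=False default.

```python
from typing import Optional, Tuple


def resolve_input_format(
    detect_format: bool = True,
    header_line: Optional[bool] = None,
    field_separator: Optional[str] = None,
) -> Tuple[bool, Optional[bool], Optional[str]]:
    """Return (detect_format, header_line, field_separator) after precedence rules.

    None in the result means the value is left to automatic detection.
    """
    # Setting either explicit option overrides detect_format.
    if header_line is not None or field_separator is not None:
        detect_format = False
    if detect_format:
        return True, None, None
    # Documented defaults when detection is off (assumed to also apply when
    # only one of the two explicit options was given).
    if header_line is None:
        header_line = True
    if not field_separator:
        field_separator = "\t"  # "" counts as a tab
    return False, header_line, field_separator


print(resolve_input_format(field_separator=","))  # (False, True, ',')
```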
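The simplification options max_preserved_information, max_cells and max_part_numbers use 0 (or absence from max_part_numbers) to mean "no limit". The hypothetical helper below only makes that documented convention explicit by mapping it to math.inf. It does not perform any simplification, and the variable names in the example are illustrative.

```python
import math
from typing import Dict, Iterable, Optional


def simplification_limits(
    variable_names: Iterable[str],
    max_preserved_information: int = 0,
    max_cells: int = 0,
    max_part_numbers: Optional[Dict[str, int]] = None,
) -> dict:
    """Translate the simplification options into explicit limits (inf = no limit)."""
    # A missing max_part_numbers behaves like an empty dict: no per-variable limits.
    part_numbers = max_part_numbers or {}
    return {
        # 0 means "no limit" for the two global options.
        "max_preserved_information": max_preserved_information or math.inf,
        "max_cells": max_cells or math.inf,
        # Variables absent from max_part_numbers have no limit.
        "max_part_numbers": {
            name: part_numbers.get(name, math.inf) for name in variable_names
        },
    }


print(simplification_limits(["age"], max_cells=100))
# {'max_preserved_information': inf, 'max_cells': 100, 'max_part_numbers': {'age': inf}}
```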
- Returns:
- tuple
A 2-tuple containing the deployed data table path and the deployment dictionary file path, in that order.
- Raises:
TypeError
Invalid type of dictionary_file_path_or_domain or key_variable_names.
ValueError
If the type of the dictionary key variables is not equal to Categorical.
Examples
- See the following function of the samples.py documentation script:
- khiops.core.helpers.deploy_predictor_for_metrics(dictionary_file_path_or_domain, dictionary_name, data_table_path, output_data_table_path, detect_format=True, header_line=None, field_separator=None, sample_percentage=70, sampling_mode='Include sample', additional_data_tables=None, output_header_line=True, output_field_separator='\t', trace=False)
Deploys the necessary data to estimate the performance metrics of a predictor
For each instance it deploys:
- The true value of the target variable
- The predicted value of the target variable
- The probabilities of each value of the target variable (classifier only)
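Once the deployed table is written, metrics can be computed from the true and predicted columns. The sketch below computes accuracy from a tab-separated table with a header line. The function accuracy_from_deployed_table and the column names Target and PredictedTarget are hypothetical, so check the header of the actual output file.

```python
import csv
import io


def accuracy_from_deployed_table(
    table_text: str, true_column: str, predicted_column: str, field_separator: str = "\t"
) -> float:
    """Fraction of rows whose predicted value equals the true value."""
    # The table is assumed to have a header line (output_header_line=True).
    reader = csv.DictReader(io.StringIO(table_text), delimiter=field_separator)
    rows = list(reader)
    if not rows:
        raise ValueError("the deployed table has no rows")
    hits = sum(row[true_column] == row[predicted_column] for row in rows)
    return hits / len(rows)


deployed = (
    "Target\tPredictedTarget\n"
    "yes\tyes\n"
    "no\tyes\n"
    "no\tno\n"
    "yes\tyes\n"
)
print(accuracy_from_deployed_table(deployed, "Target", "PredictedTarget"))  # 0.75
```

For a real output file, the inline string can be replaced by the contents of output_data_table_path, opened with newline="" as the csv module recommends.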
Note
To obtain the data of the default Khiops test dataset use sample_percentage = 70 and sampling_mode = "Exclude sample".
- Parameters:
- dictionary_file_path_or_domain : str or DictionaryDomain
Path of a Khiops dictionary file or a DictionaryDomain object.
- dictionary_name : str
Name of the predictor dictionary.
- data_table_path : str
Path of the data table file.
- output_data_table_path : str
Path of the scores output data file.
- detect_format : bool, default True
If True detects automatically whether the data table file has a header and its field separator. It’s ignored if header_line or field_separator are set.
- header_line : bool, optional (default True if detect_format is False)
If True it uses the first line of the data as column names. Overrides detect_format if set.
- field_separator : str, optional (default “\t” if detect_format is False)
A field separator character, overrides detect_format if set (“” counts as “\t”).
- sample_percentage : int, default 70
See sampling_mode option below.
- sampling_mode : “Include sample” or “Exclude sample”
If equal to “Include sample” deploys the predictor on sample_percentage percent of data and if equal to “Exclude sample” on the complementary 100 - sample_percentage percent of data.
- additional_data_tables : dict, optional
A dictionary containing the data paths and file paths for a multi-table dictionary file. For more details see the Multi-Table Learning Primer documentation.
- output_header_line : bool, default True
If True writes a header line containing the column names in the output table.
- output_field_separator : str, default “\t”
A field separator character (“” counts as “\t”).
- …
Options of the KhiopsRunner.run method from the class KhiopsRunner.
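The two sampling modes are complementary. With the same sample_percentage, the "Include sample" and "Exclude sample" deployments cover disjoint subsets whose union is the whole table, which is what the Note above relies on. The sketch below models only that property. The per-row random draw with a fixed seed is an assumption for illustration, not the sampling procedure Khiops uses, and split_sample is hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_sample(
    rows: Sequence[T], sample_percentage: int, sampling_mode: str, seed: int = 0
) -> List[T]:
    """Select the rows that the given sampling mode would keep (illustrative only)."""
    if sampling_mode not in ("Include sample", "Exclude sample"):
        raise ValueError(f"invalid sampling_mode: {sampling_mode!r}")
    # A fresh generator with the same seed gives the same draws in both modes.
    rng = random.Random(seed)
    include = sampling_mode == "Include sample"
    selected = []
    for row in rows:
        in_sample = rng.random() * 100 < sample_percentage
        # Keep sampled rows for "Include sample", the others for "Exclude sample".
        if in_sample == include:
            selected.append(row)
    return selected


rows = list(range(1000))
included = split_sample(rows, 70, "Include sample")
excluded = split_sample(rows, 70, "Exclude sample")
print(len(included) + len(excluded))  # 1000
```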