core.analysis_results

Submodule of khiops.core

Classes to access Khiops JSON reports

Class Overview

Below we describe with diagrams the relationships between the classes in this module. They are mostly compositions (has-a relations), and we omit native attributes (str, int, float, etc.).

The main class of this module is AnalysisResults. It is largely a composition of sub-report objects with the following structure:

AnalysisResults
|- preparation_report            ->  PreparationReport
|- bivariate_preparation_report  ->  BivariatePreparationReport
|- modeling_report               ->  ModelingReport
|- train_evaluation_report      |
|- test_evaluation_report       |->  EvaluationReport
|- evaluation_report            |

These sub-classes in turn use other tertiary classes to represent specific information pieces of each report. The dependencies for the classes PreparationReport and BivariatePreparationReport are:

PreparationReport
|- variables_statistics -> list of VariableStatistics

BivariatePreparationReport
|- variable_pair_statistics -> list of VariablePairStatistics

VariableStatistics
|- data_grid -> DataGrid

VariablePairStatistics
|- data_grid -> DataGrid

DataGrid
|- dimensions -> list of DataGridDimension

DataGridDimension
|- partition -> list of PartInterval OR
|               list of PartValue OR
|               list of PartValueGroup

for class ModelingReport:

ModelingReport
|- trained_predictors -> list of TrainedPredictor

TrainedPredictor
|- selected_variables -> list of SelectedVariable

and for class EvaluationReport:

EvaluationReport
|- predictors_performance -> list of PredictorPerformance
|- classification_lift_curves -> list of PredictorCurve (classification only)
|- regression_rec_curves -> list of PredictorCurve (regression only)

PredictorPerformance
|- confusion_matrix -> ConfusionMatrix (classification only)

For a complete illustration of how to access the information in all classes of this module, see their write_report methods, which write TSV (tab-separated values) reports.

Functions

read_analysis_results_file

Reads a Khiops JSON report

Classes

AnalysisResults

Main class containing the information of a Khiops JSON file

BivariatePreparationReport

Bivariate data preparation report: 2D grid models

ConfusionMatrix

A classifier's confusion matrix

DataGrid

A piecewise constant probability density estimation

DataGridDimension

A dimension (variable) of a data grid

EvaluationReport

Evaluation report for predictors

ModelingReport

Modeling report of all predictors created in a supervised analysis

PartInterval

Element of a numerical interval partition in a data grid

PartValue

Element of a value partition (singletons) in a data grid

PartValueGroup

Element of a categorical partition in a data grid

PredictorCurve

A lift curve for a classifier or a REC curve for a regressor

PredictorPerformance

A predictor's performance evaluation

PreparationReport

Univariate data preparation report: discretizations and groupings

SelectedVariable

Information about a selected variable in a predictor

TrainedPredictor

Trained predictor information

VariablePairStatistics

Variable pair information and statistics

VariableStatistics

Variable information and statistics

class khiops.core.analysis_results.AnalysisResults(json_data=None)

Bases: KhiopsJSONObject

Main class containing the information of a Khiops JSON file

Sub-reports not available in the JSON data are optional (set to None).
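As a rough illustration of which sub-reports would be populated, the sketch below checks a raw report dictionary for the top-level field names mentioned on this page. It works on a hand-written toy dictionary, does not use the khiops package, and the helper name available_sub_reports as well as the toy "tool" and "version" keys and values are assumptions made for the example.

```python
# Top-level JSON fields named on this page, in the order of the class overview.
SUB_REPORT_FIELDS = [
    "preparationReport",
    "bivariatePreparationReport",
    "modelingReport",
    "trainEvaluationReport",
    "testEvaluationReport",
    "evaluationReport",
]


def available_sub_reports(json_data):
    """Hypothetical helper: list the sub-report fields present in a report dict."""
    return [field for field in SUB_REPORT_FIELDS if field in json_data]


# Hand-written stand-in for a report that only contains a preparation report.
# The "tool" and "version" keys and their values are illustrative only.
toy_report = {"tool": "Khiops", "version": "0.0", "preparationReport": {}}

present = available_sub_reports(toy_report)
print(present)
```

Fields that are absent from the list correspond to the sub-report attributes that would be left as None.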

Parameters:
json_datadict, optional

A dictionary representing the data of a Khiops JSON report file. If not specified it returns an empty instance.

Note

See also the read_analysis_results_file function from the core API to obtain an instance of this class from a Khiops JSON file.

Attributes:
toolstr

Name of the Khiops tool that generated the report.

versionstr

Version of the Khiops tool that generated the report.

short_descriptionstr

Short description defined by the user.

logslist of tuples

2-tuples linking each sub-task name to a list containing the warnings and errors found during the execution of that sub-task. Available only if there were errors or warnings.

preparation_reportPreparationReport

A report about the variables’ discretizations and groupings.

bivariate_preparation_reportBivariatePreparationReport, optional

A report of the grid models created from pairs of variables. Available only when pairs of variables were created in the analysis.

modeling_reportModelingReport

A report describing the predictor models. Available only in supervised analysis.

train_evaluation_reportEvaluationReport

An evaluation report of the trained models on the train dataset split. Available only in supervised analysis.

test_evaluation_reportEvaluationReport

An evaluation report of the trained models on the test dataset split. Available only in supervised analysis and when the test split was not empty.

evaluation_reportEvaluationReport

An EvaluationReport instance for evaluations created with an explicit evaluation (either with the evaluate_predictor core API function or the Evaluate Predictor feature of the Khiops desktop app). Available only when the report was generated with the aforementioned features.

get_reports()

Returns all available sub-reports

Returns:
list

All available sub-reports.

write_report(stream_or_writer)

Writes the instance’s TSV report into a writer object

Parameters:
stream_or_writerio.IOBase or KhiopsOutputWriter

Output stream or writer.

write_report_file(report_file_path)

Writes a TSV report file with the object’s information

Parameters:
report_file_pathstr

Path of the output TSV report file.

class khiops.core.analysis_results.BivariatePreparationReport(json_data=None)

Bases: object

Bivariate data preparation report: 2D grid models

The attributes related to the target variable and null model are available only in the case of a supervised learning task (only classification in the bivariate case).

Parameters:
json_datadict, optional

JSON data of the bivariatePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
report_type“BivariatePreparation” (only possible value)

Report type.

dictionarystr

Name of the training data table dictionary.

variable_typeslist of str

The different types of variables.

variable_numberslist of int

The number of variables for each type in variable_types (synchronized lists).

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_modestr

Sampling mode used to split the train and test datasets.

selection_variablestr

Variable used to select instances for training.

selection_valuestr

Value of selection_variable to select instances for training.

instance_numberint

Number of training instances.

learning_taskstr
Name of the associated learning task. Possible values:
  • “Classification analysis”

  • “Regression analysis”

  • “Unsupervised analysis”

target_variablestr

Target variable name in supervised analysis.

main_target_valuestr

Main modality of the target variable in supervised case.

target_stats_modestr

Mode of a categorical target variable.

target_stats_mode_frequencyint

Mode frequency of a categorical target variable.

target_valueslist of str

Values of a categorical target variable.

target_value_frequencieslist of int

Frequencies for each value in target_values (synchronized lists).

evaluated_pair_numberint

Number of variable pairs evaluated.

informative_pair_numberint

Number of informative variable pairs. A pair is considered informative if its level is greater than the sum of its components’ levels.

variable_pair_statisticslist of VariablePairStatistics

Statistics for each analyzed pair of variables.

get_variable_pair_names()

Returns the pairs of variable names available on this report

Returns:
list of tuple

The pairs of variable names available in this report.

get_variable_pair_statistics(variable_name_1, variable_name_2)

Returns the statistics of the specified pair of variables

Note

The variable names can be given in any order.

Parameters:
variable_name_1str

Name of the first variable.

variable_name_2str

Name of the second variable.

Returns:
VariablePairStatistics

The statistics of the specified pair of variables.

Raises:
KeyError

If no pair with the specified names exists.
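The order-insensitive lookup described above can be pictured with the following sketch. It is not the library's implementation: find_pair is a hypothetical helper working on plain dictionaries whose keys mirror the name1, name2 and level attributes of VariablePairStatistics, and the variable names and levels are made up.

```python
def find_pair(pairs, variable_name_1, variable_name_2):
    """Hypothetical lookup: return the entry matching both names, in any order."""
    wanted = {variable_name_1, variable_name_2}
    for pair in pairs:
        if {pair["name1"], pair["name2"]} == wanted:
            return pair
    # Mirrors the documented behavior of raising KeyError for an unknown pair
    raise KeyError(f"no pair for ({variable_name_1!r}, {variable_name_2!r})")


# Toy data standing in for the statistics of two variable pairs
toy_pairs = [
    {"name1": "age", "name2": "education", "level": 0.12},
    {"name1": "age", "name2": "occupation", "level": 0.09},
]

# The names are given in the reverse order of the stored pair
found = find_pair(toy_pairs, "education", "age")
print(found["level"])
```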

write_report(writer)

Writes the instance’s TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.ConfusionMatrix(json_data=None)

Bases: object

A classifier’s confusion matrix

Parameters:
json_datadict, optional

JSON data of the confusionMatrix field of an element of the dictionary found at the predictorsDetailedPerformances field within one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty object.

Attributes:
valueslist of str

Values of the target variable.

matrixlist

Matrix of predicted frequencies vs target frequencies. This list is synchronized with values. Each list element represents a row of the confusion matrix, that is, the target frequencies for a fixed predicted target value.
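To make the row convention concrete, here is a minimal sketch that derives accuracy and per-value recall from a toy matrix. It uses plain lists rather than a ConfusionMatrix instance, the frequencies are made up, and it assumes that the columns of each row follow the same order as values.

```python
# Toy stand-ins for the values and matrix attributes
values = ["less", "more"]
matrix = [
    [90, 15],  # predicted "less": 90 actually "less", 15 actually "more"
    [10, 35],  # predicted "more": 10 actually "less", 35 actually "more"
]

# Correct predictions lie on the diagonal
correct = sum(matrix[i][i] for i in range(len(values)))
total = sum(sum(row) for row in matrix)
accuracy = correct / total

# Recall of a target value: diagonal entry over its column sum
# (assumes column j corresponds to values[j])
recall = {
    value: matrix[j][j] / sum(row[j] for row in matrix)
    for j, value in enumerate(values)
}

print(accuracy, recall)
```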

write_report(writer)

Writes the instance’s TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.DataGrid(json_data=None)

Bases: object

A piecewise constant probability density estimation

A data grid represents one or many variables referred to as “dimensions” to differentiate them from the original data variables. Each dimension can be partitioned by:

  • Intervals for numerical variables

  • Values (singletons) / Value groups for categorical variables

The Cartesian product of the unidimensional partitions provides a multivariate partition into cells, whose frequencies make it possible to estimate the multivariate probability density.

In the univariate case, the data grid is simply a histogram. In the case of multiple variables, the data grid may be supervised or not. If supervised, the target variable is the last one, and the data grid represents the conditional density estimator of the source variables with respect to the target. Otherwise, it represents a joint density estimator.

In case of an unsupervised data grid, the cells are described by their index on the variable partitions, together with their frequencies. For a supervised data grid, the cells are described by their index on the input variables partitions, and a vector of target frequencies is associated to each cell.
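The cell structure of an unsupervised bivariate grid can be sketched as follows. The part labels and frequencies are made up, the variables are plain lists rather than DataGrid attributes, and the sketch computes the probability mass of each cell (not a density normalized by cell size).

```python
from itertools import product

# Toy unidimensional partitions: intervals for a numerical variable,
# value groups for a categorical one (labels are illustrative only)
age_parts = ["]-inf;30]", "]30;+inf["]
education_parts = ["{Bachelors}", "{Masters, Doctorate}"]

# Each cell is identified by one part index per dimension
cell_part_indexes = [
    list(indexes)
    for indexes in product(range(len(age_parts)), range(len(education_parts)))
]

# Made-up cell frequencies, synchronized with cell_part_indexes
cell_frequencies = [40, 10, 30, 20]

# Estimated probability mass of each cell
total = sum(cell_frequencies)
cell_probabilities = [frequency / total for frequency in cell_frequencies]

for indexes, probability in zip(cell_part_indexes, cell_probabilities):
    print(age_parts[indexes[0]], education_parts[indexes[1]], probability)
```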

Parameters:
json_datadict, optional

JSON data at a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
is_supervisedbool

True if the data grid is supervised (there is a target).

dimensionslist of DataGridDimension

The dimensions of the data grid.

frequencieslist of int

Unsupervised only: Frequencies for each part.

part_interestslist of float

Supervised univariate only: Prediction interests for each part of the input dimension. Synchronized with dimensions[0].partition.

part_target_frequencieslist

Supervised univariate only: List of frequencies per target value for each part of the input dimension. Synchronized with dimensions[0].partition.

cell_idslist of str

Multivariate only: Unique identifiers of the grid’s cells.

cell_part_indexeslist

Multivariate only: List of dimension indexes defining each cell. Synchronized with cell_ids.

cell_frequencieslist of int

Unsupervised multivariate only: Frequencies for each cell. Synchronized with cell_ids.

cell_target_frequencieslist

Supervised multivariate only: List of frequencies per target value for each cell. Synchronized with cell_ids.
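For the supervised univariate case, the synchronization between part_target_frequencies and dimensions[0].partition can be illustrated with the sketch below. It uses toy lists in place of a DataGrid, the numbers are invented, and it assumes that the frequencies of each part are listed in the same order as the target values.

```python
# Toy stand-ins: target values, parts of the input dimension,
# and per-part target frequencies (synchronized with the parts)
target_values = ["less", "more"]
partition = ["]-inf;30]", "]30;50]", "]50;+inf["]
part_target_frequencies = [[45, 5], [30, 30], [10, 20]]

# Conditional distribution of the target within each part
conditional = {}
for part, frequencies in zip(partition, part_target_frequencies):
    part_total = sum(frequencies)
    conditional[part] = {
        value: frequency / part_total
        for value, frequency in zip(target_values, frequencies)
    }

for part, distribution in conditional.items():
    print(part, distribution)
```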

write_report(writer)

Writes the instance’s TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.DataGridDimension(json_data=None)

Bases: object

A dimension (variable) of a data grid

Parameters:
json_datadict, optional

JSON data of an element at the dimensions field of a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
variablestr

Variable name

type“Numerical” or “Categorical”

Variable type.

partition_type“Intervals”, “Values” or “Value groups”

Partition type.

partitionlist
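The dimension parts. The list objects are of type:
  • PartInterval, when partition_type is “Intervals”

  • PartValue, when partition_type is “Values”

  • PartValueGroup, when partition_type is “Value groups”
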
write_report(writer)

Writes the instance’s TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.EvaluationReport(json_data=None)

Bases: object

Evaluation report for predictors

Parameters:
json_datadict, optional
JSON data of the fields:
  • trainEvaluationReport: predictor training

  • testEvaluationReport: predictor training & non-empty test split

  • evaluationReport: explicit evaluation

The first two fields are set when doing a supervised analysis: either with the “Train Model” feature of the Khiops app or the train_predictor function of the Khiops Python core API. The third field is set when doing an explicit evaluation: either with the Evaluate Predictor feature of the Khiops app or the evaluate_predictor function of the Khiops Python core API.

If not specified it returns an empty instance.

Attributes:
report_type“Evaluation” (only possible value)

Report type.

evaluation_type“Train”, “Test” or “”

Evaluation type. The value “” is set when the evaluation was explicit.

dictionarystr

Name of the training data table dictionary.

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_modestr

Sampling mode used to split the train and test datasets.

selection_variablestr

Variable used to select instances for training.

selection_valuestr

Value of selection_variable to select instances for training.

instance_numberint

Number of training instances.

learning_task“Classification analysis” or “Regression analysis”

Type of learning task.

target_variablestr

Name of the target variable.

main_target_valuestr

Main value of the target variable.

predictors_performancelist of PredictorPerformance

Performance metrics for each predictor.

regression_rec_curveslist of PredictorCurve

REC curves for each regressor.

classification_target_valueslist of str

Target variable values for which a classifier lift curve was evaluated.

classification_lift_curveslist of PredictorCurve

Lift curves for each target value in classification_target_values. The lift curve for the optimal predictor is prepended to those of the target values.

get_classifier_lift_curve(classifier_name, target_value)

Returns the lift curve for the specified classifier and target value

Parameters:
classifier_namestr

A name of a classifier.

target_valuestr

A specific value of the target variable.

Returns:
PredictorCurve

The lift curve for the specified classifier and target value.

Raises:
KeyError

If no classifier with the specified name exists or the specified target value does not exist.

get_predictor_names()

Returns the names of the available predictors in the report

Returns:
list of str

The names of the available predictors.

get_predictor_performance(predictor_name)

Returns the performance metrics for the specified predictor

Parameters:
predictor_namestr

A predictor name.

Returns:
PredictorPerformance

The performance metrics for the specified predictor.

Raises:
KeyError

If no predictor with the specified name exists.

get_regressor_rec_curve(regressor_name)

Returns the REC curve for the specified regressor

Parameters:
regressor_namestr

Name of a regressor.

Returns:
PredictorCurve

The REC curve for the specified regressor.

Raises:
ValueError

If no regressor curves are available.

KeyError

If no regressor with the specified name exists.

get_snb_lift_curve(target_value)

Returns the lift curve of the Selective Naive Bayes classifier for a given target value

Parameters:
target_valuestr

A specific value of the target variable.

Returns:
PredictorCurve

The lift curve of the Selective Naive Bayes classifier for the specified target value.

Raises:
ValueError

If the Selective Naive Bayes classifier information is not available.

KeyError

If no target value with the specified name exists.

get_snb_performance()

Returns the performance metrics for the Selective Naive Bayes predictor

Returns:
PredictorPerformance

The performance metrics for the Selective Naive Bayes predictor.

Raises:
ValueError

If the Selective Naive Bayes information is not available in the report.

get_snb_rec_curve()

Returns the REC curve for the Selective Naive Bayes regressor

Returns:
PredictorCurve

The REC curve for the Selective Naive Bayes regressor.

Raises:
ValueError

If the Selective Naive Bayes information is not available in the report.

write_report(writer)

Writes the instance’s TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer object.

class khiops.core.analysis_results.ModelingReport(json_data=None)

Bases: object

Modeling report of all predictors created in a supervised analysis

Parameters:
json_datadict, optional

JSON data of the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
report_type“Modeling” (only possible value)

Report type.

dictionarystr

Name of the training data table dictionary.

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_mode“Include sample” or “Exclude sample”

Sampling mode used to split the train and test datasets.

selection_variablestr

Variable used to select instances for training.

selection_valuestr

Value of selection_variable to select instances for training.

learning_task“Classification analysis” or “Regression analysis”

Name of the associated learning task.

target_variablestr

Name of the target variable.

main_target_valuestr

Main value of the target variable.

trained_predictorslist of TrainedPredictor

The predictors trained in the task.

get_predictor(predictor_name)

Returns the specified predictor

Parameters:
predictor_namestr

Name of the predictor.

Returns:
TrainedPredictor

The predictor object for the specified name.

Raises:
KeyError

If there is no predictor with the specified name.

get_predictor_names()

Returns the names of the available predictor reports

Returns:
list of str

The names of the available predictor reports.

get_snb_predictor()

Returns the Selective Naive Bayes predictor

Returns:
TrainedPredictor

The predictor object for “Selective Naive Bayes”.

Raises:
KeyError

If there is no predictor named “Selective Naive Bayes”.

write_report(writer)

Writes the instance’s TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PartInterval(json_data=None)

Bases: object

Element of a numerical interval partition in a data grid

Parameters:
json_datalist, optional

JSON data of the partition field of a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
lower_boundfloat

The lower bound of the interval.

upper_boundfloat

The upper bound of the interval.

is_missingbool

True if it is the missing values part (bounds are None).

is_left_openbool

True if the interval has no minimum. lower_bound still contains the minimum value seen on data.

is_right_openbool

True if the interval has no maximum. upper_bound still contains the maximum value seen on data.
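One way to read these flags together is sketched below. ToyInterval is a stand-in dataclass with the same attribute names, not the khiops class, and the label notation produced by interval_label is a hypothetical choice for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyInterval:
    """Stand-in carrying the attribute names documented for PartInterval."""

    lower_bound: Optional[float]
    upper_bound: Optional[float]
    is_missing: bool = False
    is_left_open: bool = False
    is_right_open: bool = False


def interval_label(part):
    """Hypothetical helper: build a readable label from the flags and bounds."""
    if part.is_missing:
        return "Missing"
    lower = "-inf" if part.is_left_open else str(part.lower_bound)
    upper = "+inf" if part.is_right_open else str(part.upper_bound)
    closing = "[" if part.is_right_open else "]"
    return f"]{lower};{upper}{closing}"


toy_partition = [
    ToyInterval(None, None, is_missing=True),
    ToyInterval(17.0, 30.5, is_left_open=True),
    ToyInterval(30.5, 90.0, is_right_open=True),
]
labels = [interval_label(part) for part in toy_partition]
print(labels)
```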

part_type()

Type of this part

Returns:
str

Only possible value: “Interval”.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PartValue(json_data=None)

Bases: object

Element of a value partition (singletons) in a data grid

Parameters:
json_datastr, optional

The value contained in this singleton part. If not specified it returns an empty object.

Attributes:
valuestr

A representation of the value defining the singleton.

part_type()

Type of the instance

Returns:
str

Only possible value: “Value”.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PartValueGroup(json_data=None)

Bases: object

Element of a categorical partition in a data grid

Parameters:
json_datalist of str, optional

The list of values of the group. If not specified it returns an empty instance.

Attributes:
valueslist of str

The group’s values.

is_default_partbool

True if this part is dedicated to all unknown values.

part_type()

Type of the instance

Returns:
str

Only possible value: “Value group”.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PredictorCurve(json_data=None)

Bases: object

A lift curve for a classifier or a REC curve for a regressor

Parameters:
json_datadict, optional

JSON data of an element of the liftCurves or recCurves field of one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
type“Lift” (classifier) or “REC” (regressor)

Type of predictor curve.

namestr

Name of evaluated predictor.

valueslist of float

The curve’s y-axis values.

class khiops.core.analysis_results.PredictorPerformance(json_data=None)

Bases: object

A predictor’s performance evaluation

This class describes the performance of a predictor (classifier or regressor).

Parameters:
json_datadict, optional

JSON data of an element of the dictionary found at the predictorPerformances field within one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The confusion_matrix field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.

Attributes:
rankstr

A string index representing the order in the report.

type“Classifier” or “Regressor”

Type of the predictor.

namestr

Human readable name.

data_gridDataGrid

Data grid representing the distribution of the target values per part of the descriptive variable in the evaluated dataset.

accuracyfloat

Classifier only: Accuracy.

compressionfloat

Classifier only: Compression rate.

aucfloat

Classifier only: Area under the ROC curve.

confusion_matrixConfusionMatrix

Classifier only: Confusion matrix.

rmsefloat

Regressor only: Root mean square error.

maefloat

Regressor only: Mean absolute error.

nlpdfloat

Regressor only: Negative log predictive density.

rank_rmsefloat

Regressor only: Root mean square error on the target’s value rank.

rank_maefloat

Regressor only: Mean absolute error on the target’s value rank.

rank_nlpdfloat

Regressor only: Negative log predictive density on the target’s value rank.

get_metric(metric_name)

Returns the value of the specified metric

Note

The available metrics can be listed with the get_metric_names method.

Parameters:
metric_namestr

A metric name (case insensitive).

Returns:
float

The value of the specified metric.

get_metric_names()

Returns the available univariate metrics

Returns:
list of str

The names of the available metrics.

init_details(json_data=None)

Initializes the details’ attributes from a python JSON object

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.

write_report_details(writer)

Writes the details of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PreparationReport(json_data=None)

Bases: object

Univariate data preparation report: discretizations and groupings

The attributes related to the target variable and null model are available only in the case of a supervised learning task (classification or regression).

Parameters:
json_datadict, optional

JSON data of the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
report_type“Preparation” (only possible value)

Report type.

dictionarystr

Name of the training data table dictionary.

variable_typeslist of str

The different types of variables.

variable_numberslist of int

Number of variables for each type. Synchronized with variable_types.

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_modestr

Sampling mode used to split the train and test datasets.

selection_variablestr

Name of the variable used to select training instances.

selection_valuestr

Value of selection_variable to select training instances.

instance_numberint

Number of training instances.

learning_taskstr
Name of the associated learning task. Possible values:
  • “Classification analysis”

  • “Regression analysis”

  • “Unsupervised analysis”

target_variablestr

Target variable name.

main_target_valuestr

Main value of a categorical target variable.

target_stats_minfloat

Minimum of a numerical target variable.

target_stats_maxfloat

Maximum of a numerical target variable.

target_stats_meanfloat

Mean of a numerical target variable.

target_stats_std_devfloat

Standard deviation of a numerical target variable.

target_stats_missing_numberint

Number of missing values for a numerical target variable.

target_stats_modestr

Mode of a categorical target variable.

target_stats_mode_frequencyint

Mode frequency of a categorical target variable.

target_valueslist of str

Values of a categorical target variable.

target_value_frequencieslist of int

Frequencies for each target value. Synchronized with target_values.

evaluated_variable_numberint

Number of variables analyzed.

informative_variable_numberint

Supervised analysis only: Number of informative variables.

max_constructed_variablesint

Maximum number of constructed variables specified for the analysis.

max_treesint

Maximum number of constructed trees specified for the analysis.

max_pairsint

Maximum number of constructed variable pairs specified for the analysis.

discretizationstr

Type of discretization method used.

value_groupingstr

Type of grouping method used.

null_model_construction_costfloat

Coding length of the null construction model.

null_model_preparation_costfloat

Coding length of the null preparation model.

null_model_data_costfloat

Coding length of the data given the null model.

variables_statisticslist of VariableStatistics

Variable statistics for each variable analyzed.

get_variable_names()

Returns the names of the variables analyzed during the preparation

Returns:
list of str

The names of the variables analyzed during the preparation.

get_variable_statistics(variable_name)

Returns the statistics of the specified variable

Parameters:
variable_namestr

Name of the variable.

Returns:
VariableStatistics

The statistics of the specified variable.

Raises:
KeyError

If no variable with the specified name exists.

write_report(writer)

Writes the instance’s TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.SelectedVariable(json_data=None)

Bases: object

Information about a selected variable in a predictor

Parameters:
json_datadict, optional

JSON data representing an element of the selectedVariables list in the trainedPredictorsDetails field within the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
namestr

Human readable variable name.

prepared_namestr

Internal variable name.

levelfloat

Variable level.

weightfloat

Variable weight in the model.

importancefloat

A measure of overall importance of the variable in the model. It is the geometric mean of the level and weight.

mapbool

True if the variable is in the MAP model. Deprecated: Will be removed in Khiops Python 11.
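The importance attribute is described above as the geometric mean of level and weight, which for two values is the square root of their product. A minimal sketch with made-up numbers (the function name is hypothetical and not part of the khiops API):

```python
import math


def geometric_mean_importance(level, weight):
    """Geometric mean of two non-negative values: sqrt(level * weight)."""
    return math.sqrt(level * weight)


# Made-up level and weight for one selected variable
importance = geometric_mean_importance(0.09, 0.64)
print(round(importance, 6))
```

A variable with a high level but a near-zero weight therefore ends up with a low importance, and vice versa.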

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.TrainedPredictor(json_data=None)

Bases: object

Trained predictor information

Parameters:
json_datadict, optional

JSON data of an element of the list found at the trainedPredictors field within the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The selected_variables field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.
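The two-step initialization described in this note can be pictured with the stand-in class below. It is a sketch of the pattern only, not the khiops implementation: ToyPredictor and the JSON keys "name" and "variables" are assumptions, while "selectedVariables" is the field name mentioned on this page.

```python
class ToyPredictor:
    """Stand-in illustrating summary fields first, details later."""

    def __init__(self, json_data=None):
        json_data = json_data or {}
        # Summary fields are read in the constructor
        self.name = json_data.get("name", "")
        self.variable_number = json_data.get("variables", 0)
        # Detail field: left unset until init_details is called
        self.selected_variables = None

    def init_details(self, json_data=None):
        # Without data the object is left as-is
        if json_data is None:
            return
        self.selected_variables = list(json_data.get("selectedVariables", []))

    def is_detailed(self):
        return self.selected_variables is not None


predictor = ToyPredictor({"name": "Selective Naive Bayes", "variables": 2})
detailed_before = predictor.is_detailed()

predictor.init_details({"selectedVariables": [{"name": "age"}, {"name": "education"}]})
detailed_after = predictor.is_detailed()
print(detailed_before, detailed_after)
```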

Attributes:
typestr

Predictor type. Valid values are found in the predictor_types class attribute. They are:

  • “Selective Naive Bayes”

  • “MAP Naive Bayes” Deprecated

  • “Naive Bayes”

  • “Univariate”

family“Classifier” or “Regressor”

Predictor family name. Valid values are found in the predictor_families class variable.

namestr

Human readable predictor name.

variable_numberint

Number of variables used by the predictor.

selected_variableslist of SelectedVariable

Variables used by the predictor. Only for types “Selective Naive Bayes” and “MAP Naive Bayes”.

init_details(json_data=None)

Initializes the details’ attributes from a Python JSON object

Parameters:
json_datadict, optional

JSON data of the dictionary found at the trainedPredictorsDetails field within the modelingReport field of a Khiops JSON report file. If not specified it leaves the object as-is.

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.

write_report_details(writer)

Writes the details of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.VariablePairStatistics(json_data=None)

Bases: object

Variable pair information and statistics

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesPairStatistics field within the bivariatePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The data_grid field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.

Attributes:
rankstr

Variable rank with respect to its level. Lower Rank = Higher Level.

name1str

Name of the pair’s first variable.

name2str

Name of the pair’s second variable.

levelfloat

Predictive importance of the pair.

level1float

Predictive importance of the first variable.

level2float

Predictive importance of the second variable.

delta_levelfloat

Difference between the pair’s level and the sum of those of its components (delta_level = level - level1 - level2).

variable_numberint
Number of active variables in the pair:
  • 0 means that there is no information in any of the variables

  • 1 means that the pair information reduces to that of one of its components

  • 2 means that the two variables are jointly informative
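
As a rough illustration of the definitions above, the relation between level, level1, level2 and delta_level, and the reading of variable_number, can be restated in plain Python. The helper names (compute_delta_level, describe_variable_number) are hypothetical and not part of the khiops API; they operate on ordinary numbers rather than on a VariablePairStatistics instance.

```python
def compute_delta_level(level, level1, level2):
    """Return level - level1 - level2, the documented definition of delta_level."""
    return level - level1 - level2


# Hypothetical lookup table restating the three documented cases
VARIABLE_NUMBER_MEANING = {
    0: "no information in any of the variables",
    1: "pair information reduces to that of one component",
    2: "the two variables are jointly informative",
}


def describe_variable_number(variable_number):
    """Translate the variable_number attribute into a short description."""
    return VARIABLE_NUMBER_MEANING[variable_number]


# Made-up levels, chosen only to show the arithmetic
example_delta = compute_delta_level(0.5, 0.25, 0.125)
example_meaning = describe_variable_number(2)
```

With these made-up numbers the pair is worth more than the sum of its parts, so the delta is positive.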

part_number1int

Number of parts of the first variable partition.

part_number2int

Number of parts of the second variable partition.

cell_numberint

Number of cells generated for the pair grid.

construction_costfloat

Advanced: Construction cost of the variable. More complex variables cost more.

preparation_costfloat

Advanced: Partition model cost. More complex partitions cost more.

data_costfloat

Advanced: Negative log-likelihood of the variable given a preparation model and a construction model.

data_gridDataGrid

A density estimation of the partitioned pair of variables with respect to the target.

init_details(json_data=None)

Initializes the details’ attributes from a Python JSON object

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesPairsDetailedStatistics field within the bivariatePreparationReport field of a Khiops JSON report file. If not specified it leaves the object as-is.

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.

write_report_details(writer)

Writes the details’ attributes into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.VariableStatistics(json_data=None)

Bases: object

Variable information and statistics

Note

The statistics in this class are for both numerical and categorical data.

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The data_grid field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.

Attributes:
rankstr

Variable rank with respect to its level. Lower Rank = Higher Level.

namestr

Variable name.

typestr
Variable type. Valid values:
  • “Numerical”

  • “Categorical”

  • “Date”

  • “Time”

  • “Timestamp”

  • “Table”

  • “Entity”

  • “Structure”

levelfloat

Variable predictive importance.

target_part_numberint
  • In regression: Number of target intervals

  • In classification with target grouping: Number of target groups

part_numberint

Number of parts of the variable partition.

value_numberint

Number of distinct values of the variable.

minfloat

Minimum value of the variable.

maxfloat

Maximum value of the variable.

meanfloat

Mean value of the variable.

std_devfloat

Standard deviation of the variable.

missing_numberint

Number of missing values of the variable.

modefloat

Most common value.

mode_frequencyint

Frequency of the most common value.

input_valueslist of str

Distinct values taken by the variable. If there are too many values, only the most frequent ones are available.

input_value_frequencieslist of int

The frequencies for each input value. Synchronized with input_values.
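
Because input_values and input_value_frequencies are synchronized, they can be read together position by position. The sketch below is only an illustration: value_frequency_table is a hypothetical helper, and the object it receives is a stand-in with made-up data rather than a real VariableStatistics instance. Treating a missing list as empty is a defensive assumption, not documented behavior.

```python
from types import SimpleNamespace


def value_frequency_table(stats):
    """Pair each input value with its frequency, most frequent first."""
    # Assumption: a missing list is treated as empty
    values = stats.input_values or []
    frequencies = stats.input_value_frequencies or []
    pairs = zip(values, frequencies)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Stand-in exposing the two documented attributes, filled with made-up data
stand_in = SimpleNamespace(
    input_values=["a", "b", "c"],
    input_value_frequencies=[10, 30, 5],
)
table = value_frequency_table(stand_in)
```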

construction_costfloat

Construction cost of the variable. More complex variables cost more.

preparation_costfloat

Partition model cost. More complex partitions cost more.

data_costfloat

Negative log-likelihood of the variable given a preparation model and a construction model.

derivation_rulestr

If the variable is not native, the Khiops dictionary function used to derive it. Otherwise it is set to None.

data_gridDataGrid

A density estimation of the partitioned variable with respect to the target.

init_details(json_data=None)

Initializes the details’ attributes from a Python JSON object

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it leaves the object as-is.

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.
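
The two-step initialization described for this class (constructor first, init_details later) might be pictured with the sketch below. StandInStatistics is a hypothetical stand-in written for this example, not the khiops implementation, and the JSON keys it reads are illustrative assumptions. The helper ensure_details only relies on the documented init_details and is_detailed methods.

```python
class StandInStatistics:
    """Hypothetical stand-in mimicking the documented two-step initialization."""

    def __init__(self, json_data=None):
        json_data = json_data or {}
        self.name = json_data.get("name", "")
        self.data_grid = None  # detail: left unset by the constructor

    def init_details(self, json_data=None):
        # With no data the object is left as-is, as documented for init_details
        if json_data is not None:
            self.data_grid = json_data.get("dataGrid")

    def is_detailed(self):
        return self.data_grid is not None


def ensure_details(stats, details_json):
    """Load the details only when they are missing, then report their presence."""
    if not stats.is_detailed():
        stats.init_details(details_json)
    return stats.is_detailed()


stats = StandInStatistics({"name": "age"})
before = stats.is_detailed()            # summary only, no details yet
stats.init_details()                    # no argument: object unchanged
unchanged = stats.is_detailed()
after = ensure_details(stats, {"dataGrid": {"dimensions": []}})
```

Deferring the details this way keeps the summary attributes cheap to load while the heavier data grid is loaded only on request.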

write_report_details(writer)

Writes the details’ attributes into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Parameters:
writerKhiopsOutputWriter

Output writer.

khiops.core.analysis_results.read_analysis_results_file(json_file_path)

Reads a Khiops JSON report

Parameters:
json_file_pathstr

Path of the JSON report file.

Returns:
AnalysisResults

An instance of AnalysisResults containing the report’s information.
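
A minimal usage sketch follows, assuming a report produced by a supervised analysis. The helpers top_variables and top_variables_from_file are hypothetical names introduced here; they rely only on read_analysis_results_file and on the preparation_report, variable_statistics, name and level attributes documented in this module. Skipping a missing preparation report and treating an unset level as 0.0 are defensive assumptions.

```python
def top_variables(results, count=5):
    """Return (name, level) pairs for the `count` variables with the highest level."""
    report = results.preparation_report
    if report is None:  # assumption: the report may be absent
        return []
    ranked = sorted(
        report.variable_statistics,
        key=lambda variable: variable.level or 0.0,  # assumption: unset level counts as 0.0
        reverse=True,
    )
    return [(variable.name, variable.level) for variable in ranked[:count]]


def top_variables_from_file(json_file_path, count=5):
    """Read a Khiops JSON report and rank its variables by level."""
    # Imported here so the sketch can be loaded even where khiops is not installed
    from khiops.core.analysis_results import read_analysis_results_file

    return top_variables(read_analysis_results_file(json_file_path), count)
```

Calling top_variables_from_file with the path of a report file would then return the highest-level variables of its preparation report.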

Examples

See the following functions of the samples.py documentation script: