core.coclustering_results

Submodule of khiops.core

Classes to access Khiops Coclustering JSON reports

Class Overview

Below we describe with diagrams the relationships between the classes in this module. They are mostly compositions (has-a relations), and we omit native attributes (str, int, float, etc.).

The main class of this module is CoclusteringResults. It is largely a composition of sub-report objects with the following structure:

CoclusteringResults
|- coclustering_report -> CoclusteringReport

CoclusteringReport
|- dimensions -> list of CoclusteringDimension
|- cells      -> list of CoclusteringCell

CoclusteringDimension
|- parts        -> list of CoclusteringDimensionPart
|- clusters     -> list of CoclusteringCluster
|- root_cluster -> CoclusteringCluster

CoclusteringDimensionPartValueGroup
|- values -> list of CoclusteringDimensionPartValue

CoclusteringCluster
|- leaf_part        -> CoclusteringDimensionPart or None
|- parent_cluster  |
|- child_cluster1  |-> CoclusteringCluster or None
|- child_cluster2  |

For a complete illustration of how to access the information in all classes of this module, see their write_report methods, which write TSV (tab-separated values) reports.
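The composition described above can be navigated with plain attribute access. The following is a minimal sketch: describe_structure is a hypothetical helper, and the SimpleNamespace objects are stand-ins that only mimic the documented attribute names, not real Khiops objects.

```python
from types import SimpleNamespace


def describe_structure(results):
    """Return one summary line per dimension of a CoclusteringResults-like object.

    Only attributes documented in this module are used:
    coclustering_report, cells, dimensions, name, parts and clusters.
    """
    report = results.coclustering_report
    lines = [f"cells: {len(report.cells)}"]
    for dimension in report.dimensions:
        lines.append(
            f"{dimension.name}: {len(dimension.parts)} parts, "
            f"{len(dimension.clusters)} clusters"
        )
    return lines


# Stand-in object mimicking the documented attributes (assumption, not a real report).
fake_results = SimpleNamespace(
    coclustering_report=SimpleNamespace(
        cells=[object(), object()],
        dimensions=[
            SimpleNamespace(name="age", parts=[1, 2], clusters=[1, 2, 3]),
        ],
    )
)
structure_lines = describe_structure(fake_results)
```

With a real report, the same helper would presumably be called on the object returned by read_coclustering_results_file.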

Functions

read_coclustering_results_file

Reads a Khiops Coclustering JSON report

Classes

CoclusteringCell

A coclustering cell

CoclusteringCluster

A cluster in a coclustering dimension hierarchy

CoclusteringDimension

A coclustering dimension (variable)

CoclusteringDimensionPart

An element of a partition of a dimension

CoclusteringDimensionPartInterval

An interval of a numerical partition

CoclusteringDimensionPartValue

A specific value of a variable in a dimension value group.

CoclusteringDimensionPartValueGroup

A value group of a categorical partition

CoclusteringReport

Main coclustering report

CoclusteringResults

Main class containing the information of a Khiops Coclustering JSON file

class khiops.core.coclustering_results.CoclusteringCell

Bases: object

A coclustering cell

Note

This class has only a no-parameter constructor initializing an instance with the default values.

Attributes:
parts: list of CoclusteringDimensionPart

Parts for each coclustering dimension.

frequency: int

Frequency of this cell.

write_line(writer)

Writes a line of the instance’s report to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer.
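As an illustration of how the parts and frequency attributes of a cell combine, the sketch below sums cell frequencies per part of one dimension. marginal_frequencies is a hypothetical helper, the stand-in objects are not real Khiops instances, and it assumes that the parts list of a cell follows the order of the report's dimensions.

```python
from collections import Counter
from types import SimpleNamespace


def marginal_frequencies(cells, dimension_index):
    """Sum cell frequencies per part of one dimension.

    Parts are identified by their documented cluster_name attribute.
    Assumption: cell.parts is ordered like the report's dimensions.
    """
    totals = Counter()
    for cell in cells:
        part = cell.parts[dimension_index]
        totals[part.cluster_name] += cell.frequency
    return dict(totals)


# Stand-in parts and cells that only mimic the documented attributes.
part_a = SimpleNamespace(cluster_name="A")
part_b = SimpleNamespace(cluster_name="B")
part_x = SimpleNamespace(cluster_name="X")
part_y = SimpleNamespace(cluster_name="Y")
fake_cells = [
    SimpleNamespace(parts=[part_a, part_x], frequency=3),
    SimpleNamespace(parts=[part_b, part_x], frequency=5),
    SimpleNamespace(parts=[part_a, part_y], frequency=2),
]
first_marginal = marginal_frequencies(fake_cells, 0)
second_marginal = marginal_frequencies(fake_cells, 1)
```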

class khiops.core.coclustering_results.CoclusteringCluster(json_data=None)

Bases: object

A cluster in a coclustering dimension hierarchy

Parameters:
json_data: dict, optional

JSON data of an element of the list at the dimensionHierarchies field within the coclusteringReport field of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.

Attributes:
name: str

Name of the cluster.

parent_cluster_name: str

Name of the parent cluster.

frequency: int

Number of individuals in the cluster.

interest: float

The cluster’s interest/informativeness.

hierarchical_level: float

A measure interpretable as the distance of the cluster to the root. Between 0 and 1.

rank: int

Rank of clusters in the top-down list of clusters, with the smallest ranks at the top.

hierarchical_rank: int

Rank of clusters in the hierarchy, with the smallest ranks being closest to the root of the hierarchy.

is_leaf: bool

True if the cluster is a leaf of the hierarchy.

short_description: str

Succinct cluster description.

description: str

Cluster description.

leaf_part: CoclusteringDimensionPart

On a leaf cluster: Its unique associated partition element. Otherwise None.

parent_cluster: CoclusteringCluster

On a non-root cluster: Its unique parent cluster. Otherwise None.

child_cluster1: CoclusteringCluster

On a non-leaf cluster: The first child cluster. Otherwise None.

child_cluster2: CoclusteringCluster

On a non-leaf cluster: The second child cluster. Otherwise None.
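Since each non-leaf cluster has two children, the hierarchy is a binary tree that can be walked recursively. The sketch below collects the names of the leaf clusters under a given cluster. collect_leaf_names and make_cluster are hypothetical helpers, and the stand-in clusters only mimic the documented attributes.

```python
from types import SimpleNamespace


def collect_leaf_names(cluster):
    """Return the names of the leaf clusters under `cluster`, first child first."""
    if cluster is None:
        return []
    if cluster.is_leaf:
        return [cluster.name]
    return collect_leaf_names(cluster.child_cluster1) + collect_leaf_names(
        cluster.child_cluster2
    )


def make_cluster(name, child1=None, child2=None):
    """Build a stand-in cluster (assumption: not a real CoclusteringCluster)."""
    return SimpleNamespace(
        name=name,
        is_leaf=child1 is None and child2 is None,
        child_cluster1=child1,
        child_cluster2=child2,
    )


fake_root = make_cluster(
    "root",
    make_cluster("a"),
    make_cluster("bc", make_cluster("b"), make_cluster("c")),
)
leaf_names = collect_leaf_names(fake_root)
```

On a real dimension, the natural starting point would be its root_cluster attribute.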

write_annotation_header_line(writer)

Writes the “annotation” section’s header to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_annotation_line(writer)

Writes a line of the “annotation” section to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_hierarchy_header_line(writer)

Writes the “hierarchy” section’s header to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_hierarchy_line(writer)

Writes a line of the “hierarchy” section to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_hierarchy_structure_report(writer)

Writes the hierarchical structure from this instance to a writer object

This method is mainly a test of the encoding of the cluster hierarchy.

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

class khiops.core.coclustering_results.CoclusteringDimension

Bases: object

A coclustering dimension (variable)

A coclustering dimension is a hierarchical clustering of an input variable. Each leaf of this hierarchy is linked to an element of a partition of the input variable, so leaf clusters have variable parts as their children.

It only has a no-parameter constructor.

Note

The instance information is initialized with the init_summary, init_partition and init_hierarchy methods. Its owner object (class CoclusteringReport) uses the information found in the dimensionSummaries, dimensionPartitions and dimensionHierarchies fields to coherently initialize all the dimensions with these methods.

Attributes:
name: str

Name of the variable associated to this dimension.

type: “Numerical” or “Categorical”

Dimension type.

part_number: int

Number of parts of the variable associated to this dimension.

initial_part_number: int

Number of initial parts. Note that part_number <= initial_part_number after a coclustering simplification (see simplify_coclustering).

value_number: int

Number of values of the dimension’s variable.

interest: float

Interest of the dimension with respect to the other coclustering dimensions.

description: str

Description of the dimension/variable.

min: float

Minimum value of a numerical dimension/variable.

max: float

Maximum value of a numerical dimension/variable.

parts: list of CoclusteringDimensionPart

Partition of this dimension.

clusters: list of CoclusteringCluster

Clusters of this dimension’s hierarchy. Note that this includes intermediate clusters.

root_cluster: CoclusteringCluster

Root cluster of the hierarchy.

get_cluster(cluster_name)

Returns the specified cluster

Parameters:
cluster_name: str

Name of the cluster.

Returns:
CoclusteringCluster

The specified cluster.

Raises:
KeyError

If there is no cluster with the specified name.

get_part(part_name)

Returns a part of the dimension given the part’s name

Parameters:
part_name: str

Name of the part.

Returns:
CoclusteringDimensionPart

The part with the specified name.

Raises:
KeyError

If there is no part with the specified name.
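Because get_part raises KeyError for an unknown name (as does get_cluster), callers that prefer a None result can wrap the lookup. The sketch below is one way to do so: find_part_or_none is a hypothetical helper, and the stand-in dimension only mimics the documented KeyError behavior.

```python
from types import SimpleNamespace


def find_part_or_none(dimension, part_name):
    """Return the named part of `dimension`, or None if it does not exist."""
    try:
        return dimension.get_part(part_name)
    except KeyError:
        return None


# Stand-in dimension whose get_part mimics the documented KeyError behavior.
_fake_parts = {"{a, b}": SimpleNamespace(cluster_name="{a, b}")}
fake_dimension = SimpleNamespace(get_part=lambda part_name: _fake_parts[part_name])

found_part = find_part_or_none(fake_dimension, "{a, b}")
missing_part = find_part_or_none(fake_dimension, "{z}")
```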

init_hierarchy(json_data)

Initializes the hierarchy attributes from a Python JSON object

Parameters:
json_data: dict, optional

Python dictionary representing the data of an element of the list found at the dimensionHierarchies field of a Khiops Coclustering JSON report file. If not specified it leaves the object as-is.

Returns:
self

A reference to the caller instance.

init_partition(json_data=None)

Initializes the partition attributes from a Python JSON object

Parameters:
json_data: dict, optional

Python dictionary representing the data of an element of the list found at the dimensionPartitions field of a Khiops Coclustering JSON report file. If not specified it leaves the object as-is.

Returns:
self

A reference to the caller instance.

init_summary(json_data=None)

Initializes the summary attributes from a Python JSON object

Parameters:
json_data: dict, optional

Dictionary representing the data of an element of the list found at the dimensionSummaries field of a Khiops Coclustering JSON report file. If not specified it leaves the object as-is.

Returns:
self

A reference to the caller instance.

needs_annotation_report()

Status about the annotation report

Returns:
bool

True if the “annotation” section is reported.

write_annotation(writer)

Writes the “annotation” section to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_composition(writer)

Writes the “composition” section to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_dimension_header_line(writer)

Writes the “dimensions” section header to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_dimension_line(writer)

Writes the “dimensions” section line to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_hierarchy(writer)

Writes the “hierarchy” section to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_hierarchy_structure_report_file(report_file_path)

Writes the hierarchical structure of the clusters to a file

This method is mainly a test of the encoding of the cluster hierarchy.

Parameters:
report_file_path: str

Path of the output file.

class khiops.core.coclustering_results.CoclusteringDimensionPart(json_data=None)

Bases: object

An element of a partition of a dimension

Abstract class.

Parameters:
json_data: dict, optional

See child classes for specific information about this parameter.

Attributes:
cluster_name: str

Name of the cluster to which this part belongs.

class khiops.core.coclustering_results.CoclusteringDimensionPartInterval(json_data=None)

Bases: CoclusteringDimensionPart

An interval of a numerical partition

Parameters:
json_data: dict, optional

Python dictionary representing an element of type “Numerical” of the list at the dimensionPartitions field of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.

Attributes:
cluster_name: str

Name of the cluster containing this interval.

lower_bound: float

Lower bound of the interval.

upper_bound: float

Upper bound of the interval.

is_missing: bool

True if the instance represents the missing values. In this case lower_bound and upper_bound are set to None.

is_left_open: bool

True if the interval is unbounded below. In this case, lower_bound may contain the minimum value of the training data.

is_right_open: bool

True if the interval is unbounded above. In this case, upper_bound may contain the maximum value of the training data.

Raises:
KhiopsJSONError

If json_data does not contain a “cluster” key.
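The interval attributes above are enough to build a readable label for a part. The sketch below shows one possible way: format_interval and fake_interval are hypothetical helpers, and the label format is an arbitrary choice for illustration, not the format used in Khiops reports.

```python
from types import SimpleNamespace


def format_interval(interval):
    """Build a readable label from the documented interval attributes."""
    if interval.is_missing:
        # Both bounds are None for the part holding the missing values.
        return "Missing"
    lower = "-inf" if interval.is_left_open else str(interval.lower_bound)
    upper = "+inf" if interval.is_right_open else str(interval.upper_bound)
    return f"{lower} .. {upper}"


def fake_interval(lower, upper, missing=False, left_open=False, right_open=False):
    """Build a stand-in interval (assumption: not a real Khiops object)."""
    return SimpleNamespace(
        lower_bound=lower,
        upper_bound=upper,
        is_missing=missing,
        is_left_open=left_open,
        is_right_open=right_open,
    )


missing_label = format_interval(fake_interval(None, None, missing=True))
open_label = format_interval(fake_interval(1.0, 2.5, left_open=True))
bounded_label = format_interval(fake_interval(2.5, 7.0))
```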

part_type()

Part type of this instance

Returns:
str

Only possible value: “Interval”.

class khiops.core.coclustering_results.CoclusteringDimensionPartValue

Bases: object

A specific value of a variable in a dimension value group.

Note

This class has only a no-parameter constructor initializing an instance with the default values.

Attributes:
value: str

String representation of the value.

frequency: int

Number of individuals having this value.

typicality: float

Indicates how much the value is representative of the cluster. Ranges from 0 to 1, 1 being completely representative.

class khiops.core.coclustering_results.CoclusteringDimensionPartValueGroup(json_data=None)

Bases: CoclusteringDimensionPart

A value group of a categorical partition

Parameters:
json_data: dict, optional

Python dictionary representing an element of type “Categorical” of the list at the dimensionPartitions field of a Khiops Coclustering JSON report file. If None it returns an empty instance.

Attributes:
cluster_name: str

Name of the cluster containing this group.

values: list of CoclusteringDimensionPartValue

The singleton parts composing this group part.

is_default_part: bool

True if the instance represents the “unknown values” group.

Raises:
KhiopsJSONError

If json_data does not contain a “cluster” key.

part_type()

Part type of this instance

Returns:
str

Only possible value: “Value group”.
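A value group can be summarized by its most representative values, using the typicality attribute of each CoclusteringDimensionPartValue. The sketch below is a hypothetical helper (most_typical_values) run on stand-in objects that only mimic the documented attributes.

```python
from types import SimpleNamespace


def most_typical_values(value_group, count=3):
    """Return up to `count` values of the group, most typical first."""
    ranked = sorted(value_group.values, key=lambda v: v.typicality, reverse=True)
    return [v.value for v in ranked[:count]]


# Stand-in value group (assumption: not a real Khiops object).
fake_group = SimpleNamespace(
    cluster_name="{red, blue, green}",
    values=[
        SimpleNamespace(value="red", frequency=10, typicality=0.4),
        SimpleNamespace(value="blue", frequency=25, typicality=1.0),
        SimpleNamespace(value="green", frequency=5, typicality=0.7),
    ],
)
top_values = most_typical_values(fake_group, count=2)
```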

class khiops.core.coclustering_results.CoclusteringReport(json_data=None)

Bases: object

Main coclustering report

A coclustering is an unsupervised data grid equipped with additional structures to ease its exploration. In particular, it is a piecewise constant density estimator of the data distribution. The additional structures are the following:

  • A cluster hierarchy for each dimension

  • Indicators (such as the interest) for each variable, part and value.

A coclustering consists of one to many variables (dimensions), where each variable is partitioned as:

  • Intervals in the numerical case

  • Individual values or value groups in the categorical case.

The cross-product of the partitions forms a multivariate partition of cells, whose frequencies make it possible to estimate the multivariate density.

In case of an unsupervised data grid, the cells are described by their index on the variable partitions, together with their frequencies.
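As a rough illustration of the estimate mentioned above, the sketch below turns cell frequencies into empirical cell probabilities. cell_probabilities is a hypothetical helper working on stand-in cells; a proper density for numerical variables would additionally account for the size of each cell, which is not attempted here.

```python
from types import SimpleNamespace


def cell_probabilities(cells):
    """Return the share of the total frequency held by each cell."""
    total = sum(cell.frequency for cell in cells)
    if total == 0:
        # No observations: there is nothing to estimate.
        return []
    return [cell.frequency / total for cell in cells]


# Stand-in cells exposing only the documented frequency attribute.
fake_cells = [SimpleNamespace(frequency=1), SimpleNamespace(frequency=3)]
probabilities = cell_probabilities(fake_cells)
```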

Parameters:
json_data: dict, optional

JSON data of the coclusteringReport field of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.

Attributes:
instance_number: int

Number of individuals in the learning data table.

cell_number: int

Number of coclustering cells.

null_cost: float

Cost of the null model.

level: float

Measure between 0 and 1 measuring the information gain over the null model.

initial_dimension_number: int

Initial number of dimensions. The number of dimensions (len(dimensions)) may be less than this quantity after a simplification (see simplify_coclustering).

frequency_variable: str

Name of the variable to be aggregated in the cells. By default, the cells aggregate the number of individuals.

dictionary: str

Name of the dictionary from which the model was learned.

database: str

Path of the main training data table file.

sample_percentage: float

Percentage of instances used in training.

sampling_mode: “Include sample” or “Exclude sample”

Sampling mode used to split the train and test datasets.

selection_variable: str

Variable used to select instances for training.

selection_value: str

Value of selection_variable to select instances for training.

dimensions: list of CoclusteringDimension

Coclustering dimensions (variables).

cells: list of CoclusteringCell

Coclustering cells.

get_dimension(dimension_name)

Returns the specified dimension

Parameters:
dimension_name: str

Name of the dimension (variable).

Returns:
CoclusteringDimension

The specified dimension.

Raises:
KeyError

If no dimension with the specified name exists.

get_dimension_names()

Returns the names of the available dimensions

Returns:
list of str

The names of the available dimensions.

write_annotations(writer)

Writes the dimensions’ “annotation” sections to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_bounds(writer)

Writes the “bounds” section of the TSV report to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_cells(writer)

Writes the “cells” section of the TSV report to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_coclustering_stats(writer)

Writes the “stats” section of the TSV report to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_compositions(writer)

Writes the dimensions’ “composition” sections to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_dimensions(writer)

Writes the “dimensions” section of the TSV report to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_hierarchies(writer)

Writes the dimension reports’ “hierarchy” sections to a writer object

Parameters:
writer: KhiopsOutputWriter

Output writer for the report file.

write_report(writer)

Writes the instance’s TSV report to a writer object

Parameters:
writer: KhiopsOutputWriter

Output stream or writer.

class khiops.core.coclustering_results.CoclusteringResults(json_data=None)

Bases: KhiopsJSONObject

Main class containing the information of a Khiops Coclustering JSON file

Parameters:
json_data: dict, optional

Python dictionary representing the data of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.

Note

Prefer the read_coclustering_results_file function from the core API to obtain an instance of this class from a Khiops Coclustering JSON file.

Attributes:
tool: str

Name of the Khiops tool that generated the JSON file.

version: str

Version of the Khiops tool that generated the JSON file.

coclustering_report: CoclusteringReport

Coclustering modeling report.

write_report(stream_or_writer)

Writes the instance’s TSV report to a writer object

Parameters:
stream_or_writer: io.IOBase or KhiopsOutputWriter

Output stream or writer.

write_report_file(report_file_path)

Writes a TSV report file with the object’s information

Parameters:
report_file_path: str

Path of the output TSV report file.

khiops.core.coclustering_results.read_coclustering_results_file(json_file_path)

Reads a Khiops Coclustering JSON report

Parameters:
json_file_path: str

Path of the JSON report file.

Returns:
CoclusteringResults

An instance of CoclusteringResults containing the report’s information.
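A typical use of this function might look like the sketch below, which loads a report and gathers a few documented attributes into a dictionary. summarize_coclustering and summarize_results are hypothetical helpers. The Khiops import is done inside the function so that the rest of the example runs on the stand-in object without Khiops installed.

```python
from types import SimpleNamespace


def summarize_results(results):
    """Collect a few documented attributes of a CoclusteringResults-like object."""
    report = results.coclustering_report
    return {
        "tool": results.tool,
        "version": results.version,
        "instance_number": report.instance_number,
        "cell_number": report.cell_number,
        "dimension_names": report.get_dimension_names(),
    }


def summarize_coclustering(json_file_path):
    """Load a Khiops Coclustering JSON report and summarize it."""
    # Imported here so the example can be loaded without Khiops installed.
    from khiops.core.coclustering_results import read_coclustering_results_file

    results = read_coclustering_results_file(json_file_path)
    return summarize_results(results)


# Stand-in results object with made-up values (assumption: not a real report).
fake_results = SimpleNamespace(
    tool="Khiops Coclustering",
    version="0.0",
    coclustering_report=SimpleNamespace(
        instance_number=100,
        cell_number=4,
        get_dimension_names=lambda: ["age", "job"],
    ),
)
summary = summarize_results(fake_results)
```

With Khiops installed, summarize_coclustering would be called with the path of an actual JSON report file.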