core.coclustering_results¶
Submodule of khiops.core
Classes to access Khiops Coclustering JSON reports
Class Overview¶
The diagrams below describe the relationships between the classes in this module. They are mostly compositions (has-a relations); native attributes (str, int, float, etc.) are omitted.
The main class of this module is CoclusteringResults. It is largely a composition of sub-report objects with the following structure:
CoclusteringResults
|- coclustering_report -> CoclusteringReport
CoclusteringReport
|- dimensions -> list of CoclusteringDimension
|- cells -> list of CoclusteringCell
CoclusteringDimension
|- parts -> list of CoclusteringDimensionPart
|- clusters -> list of CoclusteringCluster
|- root_cluster -> CoclusteringCluster
CoclusteringDimensionPartValueGroup
|- values -> list of CoclusteringDimensionPartValue
CoclusteringCluster
|- leaf_part -> CoclusteringDimensionPart or None
|- parent_cluster |
|- child_cluster1 |-> CoclusteringCluster or None
|- child_cluster2 |
For a complete illustration of how to access the information of all the classes in this module, see their write_report methods, which write TSV (tab-separated values) reports.
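The composition above can be walked attribute by attribute. The following is a minimal sketch, assuming only the attribute names shown in the diagram; the `outline` function and the `SimpleNamespace` stand-in objects are hypothetical and exist only so the example runs without a Khiops report file.

```python
from types import SimpleNamespace


def outline(results):
    """Return one summary line per dimension of a CoclusteringResults-like object."""
    report = results.coclustering_report
    lines = []
    for dimension in report.dimensions:
        lines.append(
            f"{dimension.name}: {len(dimension.parts)} parts, "
            f"{len(dimension.clusters)} clusters, root={dimension.root_cluster.name}"
        )
    lines.append(f"cells: {len(report.cells)}")
    return lines


# Hypothetical stand-in data (not produced by Khiops), shaped like the diagram.
root = SimpleNamespace(name="A")
dimension = SimpleNamespace(
    name="age", parts=["p1", "p2"], clusters=[root], root_cluster=root
)
report = SimpleNamespace(dimensions=[dimension], cells=["c1", "c2", "c3"])
results = SimpleNamespace(coclustering_report=report)

for line in outline(results):
    print(line)
```

With a real report, the same function should accept the object returned by read_coclustering_results_file, since it relies only on the documented attributes.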
Functions¶
read_coclustering_results_file | Reads a Khiops Coclustering JSON report
Classes¶
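CoclusteringCell | A coclustering cell
CoclusteringCluster | A cluster in a coclustering dimension hierarchy
CoclusteringDimension | A coclustering dimension (variable)
CoclusteringDimensionPart | An element of a partition of a dimension
CoclusteringDimensionPartInterval | An interval of a numerical partition
CoclusteringDimensionPartValue | A specific value of a variable in a dimension value group.
CoclusteringDimensionPartValueGroup | A value group of a categorical partition
CoclusteringReport | Main coclustering report
CoclusteringResults | Main class containing the information of a Khiops Coclustering JSON file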
- class khiops.core.coclustering_results.CoclusteringCell¶
Bases:
object
A coclustering cell
Note
This class has only a no-parameter constructor initializing an instance with the default values.
- Attributes:
- parts: list of CoclusteringDimensionPart
Parts for each coclustering dimension.
- frequency: int
Frequency of this cell.
- write_line(writer)¶
Writes a line of the instance’s report to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer.
- class khiops.core.coclustering_results.CoclusteringCluster(json_data=None)¶
Bases:
object
A cluster in a coclustering dimension hierarchy
- Parameters:
- json_data: dict, optional
JSON data of an element of the list at the dimensionHierarchies field within the coclusteringReport field of a Khiops Coclustering JSON report file. If not specified, an empty instance is returned.
- Attributes:
- name: str
Name of the cluster.
- parent_cluster_name: str
Name of the parent cluster.
- frequency: int
Number of individuals in the cluster.
- interest: float
The cluster’s interest/informativeness.
- hierarchical_level: float
A measure interpretable as the distance of the cluster to the root. Between 0 and 1.
- rank: int
Rank of the cluster in the top-down list of clusters, with the smallest ranks at the top.
- hierarchical_rank: int
Rank of the cluster in the hierarchy, with the smallest ranks being the closest to the root of the hierarchy.
- is_leaf: bool
True if the cluster is a leaf of the hierarchy.
- short_description: str
Succinct cluster description.
- description: str
Cluster description.
- leaf_part: CoclusteringDimensionPart
On a leaf cluster: its unique associated partition element. Otherwise None.
- parent_cluster: CoclusteringCluster
On a non-root cluster: its unique parent cluster. Otherwise None.
- child_cluster1: CoclusteringCluster
On a non-leaf cluster: the first child cluster. Otherwise None.
- child_cluster2: CoclusteringCluster
On a non-leaf cluster: the second child cluster. Otherwise None.
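Because every non-leaf cluster references its two children, the hierarchy can be traversed recursively. The sketch below assumes only the documented is_leaf, child_cluster1, child_cluster2 and parent_cluster attributes; `iter_leaf_clusters` and `make_cluster` are hypothetical helpers, and the tree is hand-built stand-in data rather than Khiops output.

```python
from types import SimpleNamespace


def iter_leaf_clusters(cluster):
    """Yield the leaf clusters below `cluster`, first child before second child."""
    if cluster is None:
        return
    if cluster.is_leaf:
        yield cluster
        return
    yield from iter_leaf_clusters(cluster.child_cluster1)
    yield from iter_leaf_clusters(cluster.child_cluster2)


def make_cluster(name, child1=None, child2=None):
    """Hypothetical helper: build a stand-in with the documented attribute names."""
    cluster = SimpleNamespace(
        name=name,
        parent_cluster=None,
        child_cluster1=child1,
        child_cluster2=child2,
        is_leaf=child1 is None and child2 is None,
    )
    # Link each child back to its parent, as the documented attributes describe.
    for child in (child1, child2):
        if child is not None:
            child.parent_cluster = cluster
    return cluster


root = make_cluster(
    "root",
    make_cluster("left"),
    make_cluster("right", make_cluster("r1"), make_cluster("r2")),
)
leaf_names = [cluster.name for cluster in iter_leaf_clusters(root)]
print(leaf_names)
```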
- write_annotation_header_line(writer)¶
Writes the “annotation” section’s header to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_annotation_line(writer)¶
Writes a line of the “annotation” section to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_header_line(writer)¶
Writes the “hierarchy” section’s header to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_line(writer)¶
Writes a line of the “hierarchy” section to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_structure_report(writer)¶
Writes the hierarchical structure from this instance to a writer object
This method is mainly a test of the encoding of the cluster hierarchy.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- class khiops.core.coclustering_results.CoclusteringDimension¶
Bases:
object
A coclustering dimension (variable)
A coclustering dimension is a hierarchical clustering of an input variable. Each leaf of this hierarchy is linked to an element of a partition of the input variable. Leaf clusters have variable parts as their children.
It only has a no-parameter constructor.
Note
The instance information is initialized with the init_summary, init_partition and init_hierarchy methods. Its owner object (class CoclusteringReport) uses the information found in the fields dimensionSummaries, dimensionPartitions and dimensionHierarchies to coherently initialize all the dimensions with these methods.
- Attributes:
- name: str
Name of the variable associated with this dimension.
- type: “Numerical” or “Categorical”
Dimension type.
- part_number: int
Number of parts of the variable associated with this dimension.
- initial_part_number: int
Number of initial parts. Note that part_number <= initial_part_number after a coclustering simplification (see simplify_coclustering).
- value_number: int
Number of values of the dimension’s variable.
- interest: float
Interest of the dimension with respect to the other coclustering dimensions.
- description: str
Description of the dimension/variable.
- min: float
Minimum value of a numerical dimension/variable.
- max: float
Maximum value of a numerical dimension/variable.
- parts: list of CoclusteringDimensionPart
Partition of this dimension.
- clusters: list of CoclusteringCluster
Clusters of this dimension’s hierarchy. Note that this includes intermediary clusters.
- root_cluster: CoclusteringCluster
Root cluster of the hierarchy.
- get_cluster(cluster_name)¶
Returns the specified cluster
- Parameters:
- cluster_name: str
Name of the cluster.
- Returns:
CoclusteringCluster
The specified cluster.
- Raises:
KeyError
If there is no cluster with the specified name.
- get_part(part_name)¶
Returns a part of the dimension given the part’s name
- Parameters:
- part_name: str
Name of the part.
- Returns:
CoclusteringDimensionPart
The part with the specified name.
- Raises:
KeyError
If there is no part with the specified name.
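Both get_cluster and get_part signal an unknown name with KeyError, so callers that want a fallback value need to catch it. The following is a minimal sketch of that pattern; `find_part` and `FakeDimension` are hypothetical and only mimic the documented contract (return the part or raise KeyError).

```python
def find_part(dimension, part_name, default=None):
    """Return dimension.get_part(part_name), or `default` when it raises KeyError."""
    try:
        return dimension.get_part(part_name)
    except KeyError:
        return default


class FakeDimension:
    """Hypothetical stand-in that mimics only the documented get_part contract."""

    def __init__(self, parts_by_name):
        self._parts_by_name = dict(parts_by_name)

    def get_part(self, part_name):
        # A plain dict lookup raises KeyError for unknown names.
        return self._parts_by_name[part_name]


dimension = FakeDimension({"P1": "first part"})
print(find_part(dimension, "P1"))
print(find_part(dimension, "P9", default="n/a"))
```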
- init_hierarchy(json_data)¶
Initializes the hierarchy attributes from a Python JSON object
- Parameters:
- json_data: dict, optional
Python dictionary representing the data of an element of the list found at the dimensionHierarchies field of a Khiops Coclustering JSON report file. If not specified, the object is left as-is.
- Returns:
- self
A reference to the caller instance.
- init_partition(json_data=None)¶
Initializes the partition attributes from a Python JSON object
- Parameters:
- json_data: dict, optional
Python dictionary representing the data of an element of the list found at the dimensionPartitions field of a Khiops Coclustering JSON report file. If not specified, the object is left as-is.
- Returns:
- self
A reference to the caller instance.
- init_summary(json_data=None)¶
Initializes the summary attributes from a Python JSON object
- Parameters:
- json_data: dict, optional
Python dictionary representing the data of an element of the list found at the dimensionSummaries field of a Khiops Coclustering JSON report file. If not specified, the object is left as-is.
- Returns:
- self
A reference to the caller instance.
- needs_annotation_report()¶
Status about the annotation report
- Returns:
- bool
True if the “annotation” section is reported
- write_annotation(writer)¶
Writes the “annotation” section to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_composition(writer)¶
Writes the “composition” section to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_dimension_header_line(writer)¶
Writes the “dimensions” section header to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_dimension_line(writer)¶
Writes the “dimensions” section line to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy(writer)¶
Writes the “hierarchy” section to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_structure_report_file(report_file_path)¶
Writes the hierarchical structure of the clusters to a file
This method is mainly a test of the encoding of the cluster hierarchy.
- Parameters:
- report_file_path: str
Path of the output file.
- class khiops.core.coclustering_results.CoclusteringDimensionPart(json_data=None)¶
Bases:
object
An element of a partition of a dimension
Abstract class.
- Parameters:
- json_data: dict, optional
See child classes for specific information about this parameter.
- Attributes:
- cluster_name: str
Name of the cluster to which this part belongs.
- class khiops.core.coclustering_results.CoclusteringDimensionPartInterval(json_data=None)¶
Bases:
CoclusteringDimensionPart
An interval of a numerical partition
- Parameters:
- json_data: dict, optional
Python dictionary representing an element of type “Numerical” of the list at the dimensionPartitions field of a Khiops Coclustering JSON report file. If not specified, an empty instance is returned.
- Attributes:
- cluster_name: str
Name of the cluster containing this interval.
- lower_bound: float
Lower bound of the interval.
- upper_bound: float
Upper bound of the interval.
- is_missing: bool
True if the instance represents the missing values. In this case lower_bound and upper_bound are set to None.
- is_left_open: bool
True if the interval is unbounded below. In this case lower_bound may contain the minimum value of the training data.
- is_right_open: bool
True if the interval is unbounded above. In this case upper_bound may contain the maximum value of the training data.
- Raises:
KhiopsJSONError
If json_data does not contain a “cluster” key.
- part_type()¶
Part type of this instance
- Returns:
- str
Only possible value: “Interval”.
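The interval flags combine as follows when building a human-readable label: is_missing takes precedence, then each open side replaces its bound with an infinity marker. This is a sketch under the attribute semantics described above; `interval_label` and its "lower .. upper" output format are assumptions made for illustration, not a Khiops convention, and the sample objects are hand-built stand-ins.

```python
from types import SimpleNamespace


def interval_label(part):
    """Build a display label from the documented interval attributes."""
    if part.is_missing:
        # Bounds are None for the missing-values part, so do not read them.
        return "Missing"
    lower = "-inf" if part.is_left_open else str(part.lower_bound)
    upper = "+inf" if part.is_right_open else str(part.upper_bound)
    return f"{lower} .. {upper}"


# Hypothetical stand-ins carrying only the documented attribute names.
missing = SimpleNamespace(
    is_missing=True, is_left_open=False, is_right_open=False,
    lower_bound=None, upper_bound=None,
)
first = SimpleNamespace(
    is_missing=False, is_left_open=True, is_right_open=False,
    lower_bound=17.0, upper_bound=30.5,
)
middle = SimpleNamespace(
    is_missing=False, is_left_open=False, is_right_open=False,
    lower_bound=30.5, upper_bound=45.0,
)

for part in (missing, first, middle):
    print(interval_label(part))
```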
- class khiops.core.coclustering_results.CoclusteringDimensionPartValue¶
Bases:
object
A specific value of a variable in a dimension value group.
Note
This class has only a no-parameter constructor initializing an instance with the default values.
- Attributes:
- value: str
String representation of the value.
- frequency: int
Number of individuals having this value.
- typicality: float
Indicates how representative the value is of the cluster. Ranges from 0 to 1, 1 being completely representative.
- class khiops.core.coclustering_results.CoclusteringDimensionPartValueGroup(json_data=None)¶
Bases:
CoclusteringDimensionPart
A value group of a categorical partition
- Parameters:
- json_data: dict, optional
Python dictionary representing an element of type “Categorical” of the list at the dimensionPartitions field of a Khiops Coclustering JSON report file. If None, an empty instance is returned.
- Attributes:
- cluster_name: str
Name of the cluster containing this group.
- values: list of CoclusteringDimensionPartValue
The singleton parts composing this group part.
- is_default_part: bool
True if the instance represents the “unknown values” group.
- Raises:
KhiopsJSONError
If json_data does not contain a “cluster” key.
- part_type()¶
Part type of this instance
- Returns:
- str
Only possible value: “Value group”.
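A value group can be summarized by its most representative values, using the typicality attribute of each CoclusteringDimensionPartValue. The sketch below assumes only the documented values, value and typicality attributes; `most_typical_values` is a hypothetical helper and the group is hand-built stand-in data.

```python
from types import SimpleNamespace


def most_typical_values(group, count=3):
    """Return the `count` value strings with the highest typicality, best first."""
    ranked = sorted(group.values, key=lambda item: item.typicality, reverse=True)
    return [item.value for item in ranked[:count]]


# Hypothetical stand-in for a value group with three values.
group = SimpleNamespace(
    values=[
        SimpleNamespace(value="a", frequency=10, typicality=0.2),
        SimpleNamespace(value="b", frequency=40, typicality=1.0),
        SimpleNamespace(value="c", frequency=25, typicality=0.7),
    ]
)
print(most_typical_values(group, count=2))
```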
- class khiops.core.coclustering_results.CoclusteringReport(json_data=None)¶
Bases:
object
Main coclustering report
A coclustering is an unsupervised data grid equipped with additional structures to ease its exploration. In particular, it is a piecewise constant density estimator of the data distribution. The additional structures are the following:
- A cluster hierarchy for each dimension.
- Indicators (such as the interest) for each variable, part and value.
A coclustering consists of one to many variables (dimensions), where each variable is partitioned as:
- Intervals in the numerical case.
- Individual values or value groups in the categorical case.
The cross-product of the partitions forms a multivariate partition of cells, and their frequencies make it possible to estimate the multivariate density.
In the case of an unsupervised data grid, the cells are described by their indices in the variable partitions, together with their frequencies.
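As an illustration of how cell frequencies lead to a density estimate, the empirical probability of a cell is its frequency divided by the total frequency over all cells. The sketch below computes only these probabilities (it does not divide by cell volume) and assumes only the documented frequency attribute; `cell_probabilities` is a hypothetical helper and the cells are stand-in data.

```python
import math
from types import SimpleNamespace


def cell_probabilities(cells):
    """Return frequency / total frequency for each cell, in the given order."""
    total = sum(cell.frequency for cell in cells)
    if total == 0:
        # No individuals at all: there is nothing to estimate.
        return []
    return [cell.frequency / total for cell in cells]


# Hypothetical stand-ins for three coclustering cells.
cells = [
    SimpleNamespace(frequency=30),
    SimpleNamespace(frequency=50),
    SimpleNamespace(frequency=20),
]
probabilities = cell_probabilities(cells)
print(probabilities)
```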
- Parameters:
- json_data: dict, optional
JSON data of the coclusteringReport field of a Khiops Coclustering JSON report file. If not specified, an empty instance is returned.
- Attributes:
- instance_number: int
Number of individuals in the learning data table.
- cell_number: int
Number of coclustering cells.
- null_cost: float
Cost of the null model.
- level: float
Value between 0 and 1 measuring the information gain over the null model.
- initial_dimension_number: int
Initial number of dimensions. The number of dimensions (len(dimensions)) may be less than this quantity after a simplification (see simplify_coclustering).
- frequency_variable: str
Name of the variable to be aggregated in the cells. By default it is the number of individuals.
- dictionary: str
Name of the dictionary from which the model was learned.
- database: str
Path of the main training data table file.
- sample_percentage: float
Percentage of instances used in training.
- sampling_mode: “Include sample” or “Exclude samples”
Sampling mode used to split the train and test datasets.
- selection_variable: str
Variable used to select instances for training.
- selection_value: str
Value of selection_variable to select instances for training.
- dimensions: list of CoclusteringDimension
Coclustering dimensions (variables).
- cells: list of CoclusteringCell
Coclustering cells.
- get_dimension(dimension_name)¶
Returns the specified dimension
- Parameters:
- dimension_name: str
Name of the dimension (variable).
- Returns:
CoclusteringDimension
The specified dimension.
- Raises:
KeyError
If no dimension with the specified name exists.
- get_dimension_names()¶
Returns the names of the available dimensions
- Returns:
- list of str
The names of the available dimensions.
- write_annotations(writer)¶
Writes the dimensions’ “annotation” sections to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_bounds(writer)¶
Writes the “bounds” section of the TSV report to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_cells(writer)¶
Writes the “cells” section of the TSV report to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_coclustering_stats(writer)¶
Writes the “stats” section of the TSV report to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_compositions(writer)¶
Writes the dimensions’ “composition” sections to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_dimensions(writer)¶
Writes the “dimensions” section of the TSV report to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchies(writer)¶
Writes the dimension reports’ “hierarchy” sections to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_report(writer)¶
Writes the instance’s TSV report to a writer object
- Parameters:
- writer: KhiopsOutputWriter
Output stream or writer.
- class khiops.core.coclustering_results.CoclusteringResults(json_data=None)¶
Bases:
KhiopsJSONObject
Main class containing the information of a Khiops Coclustering JSON file
- Parameters:
- json_data: dict, optional
Python dictionary representing the data of a Khiops Coclustering JSON report file. If not specified, an empty instance is returned.
Note
Prefer the read_coclustering_results_file function from the core API to obtain an instance of this class from a Khiops Coclustering JSON file.
- Attributes:
- tool: str
Name of the Khiops tool that generated the JSON file.
- version: str
Version of the Khiops tool that generated the JSON file.
- coclustering_report: CoclusteringReport
Coclustering modeling report.
- write_report(stream_or_writer)¶
Writes the instance’s TSV report to a writer object
- Parameters:
- stream_or_writer: io.IOBase or KhiopsOutputWriter
Output stream or writer.
- write_report_file(report_file_path)¶
Writes a TSV report file with the object’s information
- Parameters:
- report_file_path: str
Path of the output TSV report file.
- khiops.core.coclustering_results.read_coclustering_results_file(json_file_path)¶
Reads a Khiops Coclustering JSON report
- Parameters:
- json_file_path: str
Path of the JSON report file.
- Returns:
CoclusteringResults
An instance of CoclusteringResults containing the report’s information.
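A typical use of this module chains the reader function with one of the report writers. The sketch below uses only the calls documented on this page (read_coclustering_results_file and CoclusteringResults.write_report_file); the `export_tsv_report` wrapper and its parameter names are hypothetical, and the import is placed inside the function so the snippet can be loaded even where the khiops package is not installed.

```python
def export_tsv_report(json_file_path, report_file_path):
    """Read a Khiops Coclustering JSON report and write its TSV report.

    Both paths are supplied by the caller; no file name is assumed here.
    """
    # Imported lazily so that defining this function does not require khiops.
    from khiops.core.coclustering_results import read_coclustering_results_file

    results = read_coclustering_results_file(json_file_path)
    results.write_report_file(report_file_path)
    return results
```

Calling the function requires the khiops package and an existing Coclustering JSON report; the returned object then gives access to coclustering_report and its dimensions and cells.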