sklearn.tables#

Submodule of khiops.sklearn

Classes for handling diverse data tables

Functions#

get_khiops_type

Translates a numpy type to a Khiops dictionary type

read_internal_data_table

Reads a data table file into a DataFrame using the internal format settings

write_internal_data_table

Writes a DataFrame to a data table file using the internal format settings

Classes#

Dataset

A representation of a dataset

DatasetTable

A generic dataset table

FileTable

A table representing a delimited text file

NumpyTable

Table encapsulating (X,y) pair with types (ndarray, ndarray)

PandasTable

Table encapsulating (X,y) pair with types (pandas.DataFrame, pandas.Series)

class khiops.sklearn.tables.Dataset(X, y=None, categorical_target=True, key=None)#

Bases: object

A representation of a dataset

Parameters:
X : pandas.DataFrame or dict (Deprecated types: tuple and list)
Either:
  • A single dataframe

  • A dict dataset specification

y : pandas.Series or str, optional

The target column.

categorical_target : bool, default True

True if the vector y should be considered as a categorical variable. If False it is considered as numeric. Ignored if y is None.

key : str

The name of the key column for all tables. Deprecated: Will be removed in pyKhiops 11.

copy()#

Creates a copy of the dataset

Referenced dataframes in tables are copied as references

create_khiops_dictionary_domain()#

Creates a Khiops dictionary domain representing this dataset

Returns:
DictionaryDomain

The dictionary domain object representing this dataset

create_table_files_for_khiops(target_dir, sort=True)#

Prepares the tables of the dataset to be used by Khiops

If this is a multi-table dataset, it will create sorted copies of the tables.

Parameters:
target_dir : str

The directory where the sorted tables will be created

Returns:
tuple

A tuple containing:

  • The path of the main table

  • A dictionary containing the relation [table-name -> file-path] for the secondary tables. The dictionary is empty for monotable datasets.

is_in_memory()#

Tests whether the dataset is in memory

A dataset is in memory if it consists only of pandas.DataFrame tables or only of numpy.ndarray tables.

Returns:
bool

True if the dataset consists only of pandas.DataFrame tables or only of numpy.ndarray tables.

is_multitable()#

Tests whether the dataset is a multi-table one

Returns:
bool

True if the dataset is multi-table.

property target_column_type#

The target column’s type

class khiops.sklearn.tables.DatasetTable(name, categorical_target=True, key=None)#

Bases: ABC

A generic dataset table

check_key()#

Checks that the key columns exist

create_khiops_dictionary()#

Creates a Khiops dictionary representing this table

Returns:
Dictionary

The Khiops Dictionary object describing this table’s schema

abstract create_table_file_for_khiops(output_dir, sort=True)#

Creates a copy of the table at the specified directory

n_features()#

Returns the number of features of the table

The target column does not count.

class khiops.sklearn.tables.FileTable(name, path, target_column_id=None, categorical_target=True, key=None, sep='\t', header=True)#

Bases: DatasetTable

A table representing a delimited text file

Parameters:
name : str

Name for the table.

path : str

Path of the file containing the table.

sep : str, optional

Field separator character. If not specified it will be inferred from the file.

header : bool, optional

Indicates whether the first line of the file contains the column names.

key : list-like of str, optional

The names of the columns composing the key

target_column_id : str, optional

Name of the target variable column.

categorical_target : bool, default True

True if the target column is categorical.

create_table_file_for_khiops(output_dir, sort=True)#

Creates a copy of the table at the specified directory

class khiops.sklearn.tables.NumpyTable(name, array, key=None, target_column=None, categorical_target=True)#

Bases: DatasetTable

Table encapsulating (X,y) pair with types (ndarray, ndarray)

Parameters:
name : str

Name for the table.

array : array-like of shape (n_samples, n_features_in)

The array to be encapsulated.

key : array-like of int, optional

The names of the columns composing the key

target_column : array-like of shape (n_samples,), optional

The array representing the target column.

categorical_target : bool, default True

True if the target column is categorical.

create_table_file_for_khiops(output_dir, sort=True)#

Creates a copy of the table at the specified directory

get_khiops_variable_name(column_id)#

Returns the Khiops variable name associated with a column id

class khiops.sklearn.tables.PandasTable(name, dataframe, key=None, target_column=None, categorical_target=True)#

Bases: DatasetTable

Table encapsulating (X,y) pair with types (pandas.DataFrame, pandas.Series)

Parameters:
name : str

Name for the table.

dataframe : pandas.DataFrame

The data frame to be encapsulated.

key : list-like of str, optional

The names of the columns composing the key

target_column : array-like, optional

The array containing the target column.

categorical_target : bool, default True

True if the target column is categorical.

create_table_file_for_khiops(output_dir, sort=True)#

Creates a copy of the table at the specified directory

get_khiops_variable_name(column_id)#

Returns the Khiops variable name associated with a column id

khiops.sklearn.tables.get_khiops_type(numpy_type)#

Translates a numpy type to a Khiops dictionary type

Parameters:
numpy_type : numpy.dtype

Numpy type of the column

Returns:
str

Khiops type name. Either “Categorical”, “Numerical” or “Timestamp”
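The translation can be pictured as a three-way classification of the dtype. The sketch below is an illustration only: the function name `sketch_khiops_type` is hypothetical, and the matching rule (integer and float dtypes map to "Numerical", datetime dtypes to "Timestamp", everything else to "Categorical") is an assumption, not the library's documented behavior. It works on the dtype's name as a string so that it runs without numpy.

```python
def sketch_khiops_type(dtype_name):
    """Map a dtype name such as "int64" to a Khiops type name.

    Hypothetical stand-in for get_khiops_type; the matching rule
    below is an assumption made for illustration.
    """
    name = str(dtype_name).lower()
    # Assumed rule: any integer or floating-point dtype is numerical
    if "int" in name or "float" in name:
        return "Numerical"
    # Assumed rule: datetime dtypes become timestamps
    if "datetime" in name:
        return "Timestamp"
    # Everything else (object, str, bool, ...) falls back to categorical
    return "Categorical"


examples = {
    name: sketch_khiops_type(name)
    for name in ["int64", "float32", "datetime64[ns]", "object"]
}
```

With a real numpy column, the equivalent input would be `str(column.dtype)`.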

khiops.sklearn.tables.read_internal_data_table(file_path_or_stream)#

Reads a data table file into a DataFrame using the internal format settings

The table is read with the following settings:

  • Use tab as separator

  • Read the column names from the first line

  • Use the double quote (") as quote character

  • Use csv.QUOTE_MINIMAL

  • Double quoting enabled (a quote inside a quoted field is escaped as "")

  • UTF-8 encoding

Parameters:
file_path_or_stream : str or file object

The path of the internal data table file to be read or a readable file object.

Returns:
pandas.DataFrame

The dataframe representation.
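To make the format settings concrete, here is a minimal sketch that parses the same kind of file with Python's standard `csv` module. It is not the library's implementation: the helper `read_tab_table` is hypothetical, and it returns a list of dicts rather than a pandas.DataFrame so that it runs without pandas.

```python
import csv
import io


def read_tab_table(stream):
    """Parse a tab-separated text stream with the settings listed above."""
    reader = csv.reader(
        stream,
        delimiter="\t",             # tab as separator
        quotechar='"',              # double quote as quote character
        quoting=csv.QUOTE_MINIMAL,
        doublequote=True,           # "" inside a quoted field means one "
    )
    header = next(reader)           # column names come from the first line
    return [dict(zip(header, row)) for row in reader]


# For a file on disk, open it with: open(path, encoding="utf-8", newline="")
sample = 'Id\tComment\n1\t"said ""hi"""\n2\tplain\n'
rows = read_tab_table(io.StringIO(sample))
```

Here `rows[0]["Comment"]` is the string `said "hi"`, which shows the doubled quotes being collapsed on read.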

khiops.sklearn.tables.write_internal_data_table(dataframe, file_path_or_stream)#

Writes a DataFrame to a data table file using the internal format settings

The table is written with the following settings:

  • Use tab as separator

  • Write the column names on the first line

  • Use the double quote (") as quote character

  • Use csv.QUOTE_MINIMAL

  • Double quoting enabled (a quote inside a quoted field is escaped as "")

  • UTF-8 encoding

  • The index is not written

Parameters:
dataframe : pandas.DataFrame

The dataframe to write.

file_path_or_stream : str or file object

The path of the internal data table file to be written or a writable file object.
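As an illustration of what these settings produce, the following sketch writes a small table with Python's standard `csv` module. The helper `write_tab_table` is hypothetical and takes plain lists instead of a pandas.DataFrame; the `"\n"` line terminator is an assumption, since the settings above do not specify one.

```python
import csv
import io


def write_tab_table(header, rows, stream):
    """Write rows to a text stream with the settings listed above."""
    writer = csv.writer(
        stream,
        delimiter="\t",             # tab as separator
        quotechar='"',              # double quote as quote character
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        doublequote=True,           # embedded " is written as ""
        lineterminator="\n",        # assumption: not stated in the settings
    )
    writer.writerow(header)         # column names on the first line
    writer.writerows(rows)          # data rows only, no index column


# For a file on disk, open it with: open(path, "w", encoding="utf-8", newline="")
buffer = io.StringIO()
write_tab_table(["Id", "Comment"], [[1, 'said "hi"'], [2, "plain"]], buffer)
text = buffer.getvalue()
```

Only the field containing a quote character is quoted in `text`; the other fields are written bare, which is the effect of `csv.QUOTE_MINIMAL`.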