sklearn.tables
Submodule of khiops.sklearn
Classes for handling diverse data tables
Functions
get_khiops_type | Translates a numpy type to a Khiops dictionary type
read_internal_data_table | Reads into a DataFrame a data table file with the internal format settings
write_internal_data_table | Writes a DataFrame to a data table file with the internal format settings
Classes
Dataset | A representation of a dataset
DatasetTable | A generic dataset table
FileTable | A table representing a delimited text file
NumpyTable | Table encapsulating (X,y) pair with types (ndarray, ndarray)
PandasTable | Table encapsulating the features dataframe X and the target labels y
SparseTable | Table encapsulating feature matrix X and target array y
- class khiops.sklearn.tables.Dataset(X, y=None, categorical_target=True, key=None)
Bases: object
A representation of a dataset
- Parameters:
- X : pandas.DataFrame or dict (Deprecated types: tuple and list)
Either:
A single dataframe
A dict dataset specification
- y : pandas.Series or str, optional
The target column.
- categorical_target : bool, default True
True if the vector y should be considered as a categorical variable. If False it is considered as numeric. Ignored if y is None.
- key : str
The name of the key column for all tables. Deprecated: Will be removed in pyKhiops 11.
- copy()
Creates a copy of the dataset
Referenced dataframes in tables are copied as references
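The note that referenced dataframes are copied as references describes shallow-copy semantics. The sketch below illustrates that general idea with the standard `copy` module only. `TableHolder` is a hypothetical stand-in class (not part of khiops) and a plain list stands in for a dataframe.

```python
import copy


class TableHolder:
    """Hypothetical stand-in for an object that references tabular data."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # Reference to the underlying data


original = TableHolder("main", [[1, "a"], [2, "b"]])
duplicate = copy.copy(original)  # Shallow copy: new holder, same data object

# The holders are distinct objects...
distinct_holders = duplicate is not original
# ...but both point at the same underlying data
shared_data = duplicate.data is original.data

# A change made through one holder is therefore visible through the other
duplicate.data.append([3, "c"])
rows_seen_by_original = len(original.data)
```

If `Dataset.copy` behaves as described, an in-place modification of a dataframe after copying may be visible from both the original dataset and its copy.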
- create_khiops_dictionary_domain()
Creates a Khiops dictionary domain representing this dataset
- Returns:
DictionaryDomain
The dictionary domain object representing this dataset
- create_table_files_for_khiops(target_dir, sort=True)
Prepares the tables of the dataset to be used by Khiops
If this is a multi-table dataset it will create sorted copies of the tables.
- Parameters:
- target_dir : str
The directory where the sorted tables will be created
- Returns:
- tuple
A tuple containing:
The path of the main table
A dictionary containing the relation [table-name -> file-path] for the secondary tables. The dictionary is empty for monotable datasets.
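As a rough illustration of the behavior described above (sorted copies written to a directory, and a return value made of the main table path plus a name-to-path dictionary), here is a sketch written with pandas only. The helper `write_sorted_tables`, the table names, and the `.txt` file naming are assumptions made for this example. It is not the khiops implementation.

```python
import os
import tempfile

import pandas as pd


def write_sorted_tables(main_name, tables, key, target_dir):
    """Hypothetical helper: writes key-sorted copies of the tables to target_dir.

    Returns a tuple (main table path, {secondary table name: file path}).
    """
    main_path = None
    secondary_paths = {}
    for name, frame in tables.items():
        path = os.path.join(target_dir, f"{name}.txt")
        # Sort a copy by the key column; the input dataframe is left untouched
        frame.sort_values(by=key).to_csv(path, sep="\t", index=False)
        if name == main_name:
            main_path = path
        else:
            secondary_paths[name] = path
    return main_path, secondary_paths


clients = pd.DataFrame({"id": [2, 1], "age": [40, 31]})
orders = pd.DataFrame({"id": [2, 1, 1], "amount": [9.5, 3.0, 7.25]})

with tempfile.TemporaryDirectory() as target_dir:
    main_path, secondary_paths = write_sorted_tables(
        "clients", {"clients": clients, "orders": orders}, "id", target_dir
    )
    sorted_main = pd.read_csv(main_path, sep="\t")
    main_file_name = os.path.basename(main_path)
    secondary_names = sorted(secondary_paths)
```

For a single-table input the returned dictionary in this sketch would simply be empty, which mirrors the return value documented above.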
- is_in_memory()
Tests whether the dataset is in memory
A dataset is in memory if it consists only of pandas.DataFrame, numpy.ndarray, or scipy.sparse.spmatrix tables.
- Returns:
- bool
True if the dataset is in memory.
- is_multitable()
Tests whether the dataset is a multi-table one
- Returns:
- bool
True if the dataset is multi-table.
- property target_column_type
The target column’s type
- class khiops.sklearn.tables.DatasetTable(name, categorical_target=True, key=None)
Bases: ABC
A generic dataset table
- check_key()
Checks that the key columns exist
- create_khiops_dictionary()
Creates a Khiops dictionary representing this table
- Returns:
Dictionary
The Khiops Dictionary object describing this table’s schema
- abstract create_table_file_for_khiops(output_dir, sort=True)
Creates a copy of the table at the specified directory
- n_features()
Returns the number of features of the table
The target column does not count.
- class khiops.sklearn.tables.FileTable(name, path, target_column_id=None, categorical_target=True, key=None, sep='\t', header=True)
Bases: DatasetTable
A table representing a delimited text file
- Parameters:
- name : str
Name for the table.
- path : str
Path of the file containing the table.
- sep : str, optional
Field separator character. If not specified it will be inferred from the file.
- header : bool, optional
Indicates whether the table file has a header line.
- key : list-like of str, optional
The names of the columns composing the key
- target_column_id : str, optional
Name of the target variable column.
- categorical_target : bool, default True
True if the target column is categorical.
- create_table_file_for_khiops(output_dir, sort=True)
Creates a copy of the table at the specified directory
- class khiops.sklearn.tables.NumpyTable(name, array, key=None, target_column=None, categorical_target=True)
Bases: DatasetTable
Table encapsulating (X,y) pair with types (ndarray, ndarray)
- Parameters:
- name : str
Name for the table.
- array : array-like of shape (n_samples, n_features_in)
The array to be encapsulated.
- key : array-like of int, optional
The names of the columns composing the key
- target_column : array-like of shape (n_samples,), optional
The array representing the target column.
- categorical_target : bool, default True
True if the target column is categorical.
- create_table_file_for_khiops(output_dir, sort=True)
Creates a copy of the table at the specified directory
- get_khiops_variable_name(column_id)
Returns the Khiops variable name associated with a column id
- class khiops.sklearn.tables.PandasTable(name, dataframe, key=None, target_column=None, categorical_target=True)
Bases: DatasetTable
Table encapsulating the features dataframe X and the target labels y
X is of type pandas.DataFrame. y is of type pandas.Series or pandas.DataFrame.
- Parameters:
- name : str
Name for the table.
- dataframe : pandas.DataFrame
The data frame to be encapsulated.
- key : list-like of str, optional
The names of the columns composing the key
- target_column : array-like, optional
The array containing the target column.
- categorical_target : bool, default True
True if the target column is categorical.
- create_table_file_for_khiops(output_dir, sort=True)
Creates a copy of the table at the specified directory
- get_khiops_variable_name(column_id)
Returns the Khiops variable name associated with a column id
- class khiops.sklearn.tables.SparseTable(name, matrix, key=None, target_column=None, categorical_target=True)
Bases: DatasetTable
Table encapsulating feature matrix X and target array y
X is of type scipy.sparse.spmatrix. y is array-like.
- Parameters:
- name : str
Name for the table.
- matrix : scipy.sparse.spmatrix
The sparse matrix to be encapsulated.
- key : list-like of str, optional
The names of the columns composing the key
- target_column : array-like, optional
The array containing the target column.
- categorical_target : bool, default True
True if the target column is categorical.
- create_khiops_dictionary()
Creates a Khiops dictionary representing this sparse table
Adds metadata to each sparse variable
- Returns:
Dictionary
The Khiops Dictionary object describing this table’s schema
- create_table_file_for_khiops(output_dir, sort=True)
Creates a copy of the table at the specified directory
- get_khiops_variable_name(column_id)
Returns the Khiops variable name associated with a column id
- khiops.sklearn.tables.get_khiops_type(numpy_type)
Translates a numpy type to a Khiops dictionary type
- Parameters:
- numpy_type : numpy.dtype
Numpy type of the column
- Returns:
- str
Khiops type name. Either “Categorical”, “Numerical” or “Timestamp”
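To make the three possible return values concrete, here is a minimal sketch of one plausible mapping from numpy dtypes to these type names. The function `sketch_khiops_type` and its rules (numbers to "Numerical", datetimes to "Timestamp", everything else to "Categorical") are assumptions for illustration. The actual rules used by `get_khiops_type` may differ.

```python
import numpy as np


def sketch_khiops_type(numpy_type):
    """Hypothetical mapping from a numpy dtype to a Khiops type name."""
    dtype = np.dtype(numpy_type)
    if np.issubdtype(dtype, np.datetime64):
        return "Timestamp"
    if np.issubdtype(dtype, np.number):
        return "Numerical"
    # Assumption: everything else (strings, objects, booleans) is categorical
    return "Categorical"


type_names = {
    "int64": sketch_khiops_type(np.int64),
    "float32": sketch_khiops_type(np.float32),
    "datetime64[ns]": sketch_khiops_type("datetime64[ns]"),
    "object": sketch_khiops_type(object),
}
```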
- khiops.sklearn.tables.read_internal_data_table(file_path_or_stream)
Reads into a DataFrame a data table file with the internal format settings
The table is read with the following settings:
Use tab as separator
Read the column names from the first line
Use '"' as quote character
Double quoting enabled (quotes within quotes can be escaped with '""')
UTF-8 encoding
- Parameters:
- file_path_or_stream : str or file object
The path of the internal data table file to be read or a readable file object.
- Returns:
pandas.DataFrame
The dataframe representation.
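The read settings listed above correspond closely to standard `pandas.read_csv` options. The sketch below applies those options directly to an in-memory stream to show their effect. It is an approximation of the documented settings, not the library's own code, and the sample data is made up.

```python
import io

import pandas as pd

# Tab-separated bytes with a header line; one value contains escaped quotes
raw = 'id\tcomment\n1\t"said ""hi"""\n2\tplain\n'.encode("utf-8")

df = pd.read_csv(
    io.BytesIO(raw),
    sep="\t",          # Tab as separator
    header=0,          # Column names come from the first line
    quotechar='"',     # Double quote as quote character
    doublequote=True,  # '""' inside a quoted field stands for one '"'
    encoding="utf-8",
)

first_comment = df.loc[0, "comment"]
```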
- khiops.sklearn.tables.write_internal_data_table(dataframe, file_path_or_stream)
Writes a DataFrame to a data table file with the internal format settings
The table is written with the following settings:
Use tab as separator
Write the column names on the first line
Use '"' as quote character
Double quoting enabled (quotes within quotes can be escaped with '""')
UTF-8 encoding
The index is not written
- Parameters:
- dataframe : pandas.DataFrame
The dataframe to write.
- file_path_or_stream : str or file object
The path of the internal data table file to be written or a writable file object.
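The write settings can be approximated with `pandas.DataFrame.to_csv`, as sketched below on an in-memory text stream. This is an illustration of the documented settings rather than the library's implementation. The minimal quoting policy (pandas' default) is an assumption, and the sample data is made up. When writing to a path instead of a stream, `encoding="utf-8"` would be passed as well.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "comment": ['said "hi"', "plain"]})

buffer = io.StringIO()
df.to_csv(
    buffer,
    sep="\t",          # Tab as separator
    header=True,       # Column names on the first line
    quotechar='"',     # Double quote as quote character
    doublequote=True,  # Embedded quotes are escaped by doubling them
    index=False,       # The index is not written
)

# Split on line boundaries so the result does not depend on the line terminator
written_lines = buffer.getvalue().splitlines()
```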