Core Basics 4: Train a Coclustering

The steps to train a coclustering model with Khiops are very similar to what we have already seen in the basic classifier tutorials.

Make sure you have installed Khiops and Khiops CoVisualization.

We start by importing Khiops, checking its installation and defining some helper functions:

import os
import platform
import subprocess
from khiops import core as kh

# Define helper functions
def peek(file_path, n=10):
    """Shows the first n lines of a file"""
    with open(file_path, encoding="utf8", errors="replace") as file:
        for line in file.readlines()[:n]:
            print(line, end="")
    print("")


# If there are any issues, you may print the Khiops status with the following command
# kh.get_runner().print_status()

As stated before, an unsupervised analysis sometimes benefits from a more adapted visualization. We illustrate this point with the CountriesByOrganization dataset, which contains the country-organization relation for a large number of organizations and countries (it is a bit outdated, though).

countries_kdic = os.path.join(
    "data", "CountriesByOrganization", "CountriesByOrganization.kdic"
)
countries_data_file = os.path.join(
    "data", "CountriesByOrganization", "CountriesByOrganization.csv"
)

print(f"CountriesByOrganization dictionary file location: {countries_kdic}")
print("")
peek(countries_kdic)

print(f"CountriesByOrganization data table file location: {countries_data_file}")
print("")
peek(countries_data_file)
CountriesByOrganization dictionary file location: data/CountriesByOrganization/CountriesByOrganization.kdic

Dictionary  CountriesByOrganization
{
    Categorical     Country;
    Categorical     Organization;
};

CountriesByOrganization data table file location: data/CountriesByOrganization/CountriesByOrganization.csv

Country;Organization
Afghanistan;AsDB
Afghanistan;COLOMBO
Afghanistan;ECO
Afghanistan;ICCROM
Afghanistan;NAM
Afghanistan;PIARC
Afghanistan;SAARC
Afghanistan;WHO
Afghanistan;UN

We now create a coclustering model for this dataset:

countries_results_dir = os.path.join("exercises", "CountriesByOrganization")

countries_cc_report = kh.train_coclustering(
    countries_kdic,
    dictionary_name="CountriesByOrganization",
    data_table_path=countries_data_file,
    coclustering_variables=["Country", "Organization"],
    results_dir=countries_results_dir,
    field_separator=";",
)

We can now browse the results with the Khiops CoVisualization app:

# To visualize uncomment the line below
# kh.visualize_report(countries_cc_report)

We can now dump the country clusters and their metrics to a file with the extract_clusters function:

country_clusters_file = os.path.join(
    "exercises", "CountriesByOrganization", "CountryClusters.txt"
)
kh.extract_clusters(
    countries_cc_report,
    cluster_variable="Country",
    clusters_file_path=country_clusters_file,
)
peek(country_clusters_file, n=100)
Cluster     Value   Frequency       Typicality
{Germany, France, Netherlands, ...} Germany 106     1
{Germany, France, Netherlands, ...} France  125     0.979874
{Germany, France, Netherlands, ...} Netherlands     105     0.957798
{Germany, France, Netherlands, ...} Denmark 101     0.951486
{Germany, France, Netherlands, ...} Sweden  102     0.945064
{Germany, France, Netherlands, ...} Belgium 104     0.918546
{Germany, France, Netherlands, ...} Finland 100     0.890127
{Germany, France, Netherlands, ...} Italy   105     0.882327
{Germany, France, Netherlands, ...} Norway  96      0.875153
{Germany, France, Netherlands, ...} Spain   103     0.857387
{Germany, France, Netherlands, ...} Portugal        94      0.760038
{Germany, France, Netherlands, ...} Austria 88      0.757942
{Germany, France, Netherlands, ...} United Kingdom  102     0.749886
{Germany, France, Netherlands, ...} Luxembourg      81      0.735935
{Germany, France, Netherlands, ...} Switzerland     90      0.734097
{Germany, France, Netherlands, ...} Greece  87      0.697847
{Germany, France, Netherlands, ...} Ireland 75      0.641185
{Germany, France, Netherlands, ...} Iceland 55      0.434573
{United States of America, Canada, Japan, ...}      United States of America        92      1
{United States of America, Canada, Japan, ...}      Canada  85      0.81184
{United States of America, Canada, Japan, ...}      Japan   81      0.750277
{United States of America, Canada, Japan, ...}      Australia       75      0.745923
{United States of America, Canada, Japan, ...}      New Zealand     60      0.535378
{United States of America, Canada, Japan, ...}      South Korea     69      0.507395
{United States of America, Canada, Japan, ...}      Taiwan  7       0.111344
{United States of America, Canada, Japan, ...}       *      0       0
{Poland, Hungary, Turkey, ...}      Poland  79      1
{Poland, Hungary, Turkey, ...}      Hungary 72      0.901571
{Poland, Hungary, Turkey, ...}      Turkey  78      0.896009
{Poland, Hungary, Turkey, ...}      Czech Republic  64      0.87056
{Poland, Hungary, Turkey, ...}      Russia  80      0.848669
{Poland, Hungary, Turkey, ...}      Bulgaria        70      0.846619
{Poland, Hungary, Turkey, ...}      Romania 69      0.846295
{Poland, Hungary, Turkey, ...}      Slovakia        58      0.789467
{Poland, Hungary, Turkey, ...}      Slovenia        56      0.719738
{Poland, Hungary, Turkey, ...}      Ukraine 53      0.692657
{Poland, Hungary, Turkey, ...}      Croatia 57      0.675641
{Poland, Hungary, Turkey, ...}      Estonia 46      0.620249
{Poland, Hungary, Turkey, ...}      Latvia  45      0.609433
{Poland, Hungary, Turkey, ...}      Lithuania       43      0.544669
{Poland, Hungary, Turkey, ...}      Albania 47      0.461515
{Poland, Hungary, Turkey, ...}      Cyprus  62      0.45359
{Poland, Hungary, Turkey, ...}      Serbia  42      0.451178
{Poland, Hungary, Turkey, ...}      Israel  57      0.444308
{Poland, Hungary, Turkey, ...}      Macedonia       39      0.431943
{Poland, Hungary, Turkey, ...}      Malta   52      0.411462
{Poland, Hungary, Turkey, ...}      Liechtenstein   20      0.332344
{Poland, Hungary, Turkey, ...}      Monaco  32      0.317019
{Poland, Hungary, Turkey, ...}      Bosnia and Herzegovina  33      0.28569
{Poland, Hungary, Turkey, ...}      San Marino      17      0.157491
{Poland, Hungary, Turkey, ...}      Andorra 13      0.152848
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Kazakhstan      47      1
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Kyrgyzstan      45      0.952391
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Azerbaijan      41      0.88496
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Uzbekistan      41      0.876753
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Moldova 47      0.862644
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Tajikistan      35      0.807963
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Turkmenistan    35      0.803721
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Belarus 38      0.751301
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Georgia 42      0.741386
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Armenia 37      0.722033
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...}   Mongolia        36      0.357377
{Venezuela, Nicaragua, Ecuador, ...}        Venezuela       87      1
{Venezuela, Nicaragua, Ecuador, ...}        Nicaragua       73      0.966554
{Venezuela, Nicaragua, Ecuador, ...}        Ecuador 79      0.947947
{Venezuela, Nicaragua, Ecuador, ...}        Costa Rica      74      0.937068
{Venezuela, Nicaragua, Ecuador, ...}        Colombia        83      0.936774
{Venezuela, Nicaragua, Ecuador, ...}        Bolivia 72      0.909502
{Venezuela, Nicaragua, Ecuador, ...}        Guatemala       71      0.909374
{Venezuela, Nicaragua, Ecuador, ...}        Mexico  87      0.908228
{Venezuela, Nicaragua, Ecuador, ...}        Panama  72      0.904032
{Venezuela, Nicaragua, Ecuador, ...}        Peru    79      0.88861
{Venezuela, Nicaragua, Ecuador, ...}        Brazil  86      0.871975
{Venezuela, Nicaragua, Ecuador, ...}        Argentina       84      0.856383
{Venezuela, Nicaragua, Ecuador, ...}        Honduras        67      0.850245
{Venezuela, Nicaragua, Ecuador, ...}        El Salvador     64      0.841953
{Venezuela, Nicaragua, Ecuador, ...}        Uruguay 72      0.825788
{Venezuela, Nicaragua, Ecuador, ...}        Chile   80      0.824771
{Venezuela, Nicaragua, Ecuador, ...}        Paraguay        65      0.79883
{Venezuela, Nicaragua, Ecuador, ...}        Dominican Republic      67      0.735771
{Venezuela, Nicaragua, Ecuador, ...}        Cuba    63      0.527546
{Venezuela, Nicaragua, Ecuador, ...}        Haiti   62      0.502219
{Trinidad and Tobago, Barbados, Grenada, ...}       Trinidad and Tobago     63      1
{Trinidad and Tobago, Barbados, Grenada, ...}       Barbados        56      0.991871
{Trinidad and Tobago, Barbados, Grenada, ...}       Grenada 50      0.920456
{Trinidad and Tobago, Barbados, Grenada, ...}       Jamaica 63      0.906127
{Trinidad and Tobago, Barbados, Grenada, ...}       Belize  51      0.835825
{Trinidad and Tobago, Barbados, Grenada, ...}       Guyana  56      0.811175
{Trinidad and Tobago, Barbados, Grenada, ...}       Dominica        47      0.808271
{Trinidad and Tobago, Barbados, Grenada, ...}       Antigua and Barbuda     43      0.806807
{Trinidad and Tobago, Barbados, Grenada, ...}       Saint Lucia     46      0.777665
{Trinidad and Tobago, Barbados, Grenada, ...}       Saint Vincent and the Grenadines        41      0.771643
{Trinidad and Tobago, Barbados, Grenada, ...}       The Bahamas     49      0.747276
{Trinidad and Tobago, Barbados, Grenada, ...}       Suriname        48      0.694766
{Trinidad and Tobago, Barbados, Grenada, ...}       Saint Kitts and Nevis   36      0.689034
{Niger, Ivory Coast, Benin, ...}    Niger   66      1
{Niger, Ivory Coast, Benin, ...}    Ivory Coast     83      0.991187
{Niger, Ivory Coast, Benin, ...}    Benin   76      0.985146
{Niger, Ivory Coast, Benin, ...}    Burkina Faso    75      0.98505
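
The clusters file can also be processed programmatically. The sketch below groups its rows by cluster using only the standard library. It assumes the file is tab-separated with the four columns shown in the header above; the helper name `load_clusters` is hypothetical, and the sample rows are copied from the listing rather than read from disk.

```python
import csv
import io
from collections import defaultdict


def load_clusters(lines):
    """Groups the rows of a clusters file by cluster name.

    Assumption: tab-separated columns named Cluster, Value, Frequency
    and Typicality, as suggested by the output of extract_clusters.
    """
    clusters = defaultdict(list)
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        clusters[row["Cluster"]].append(
            (row["Value"], int(row["Frequency"]), float(row["Typicality"]))
        )
    return dict(clusters)


# Three rows copied from the listing above, joined with tabs (assumed separator)
sample = io.StringIO(
    "Cluster\tValue\tFrequency\tTypicality\n"
    "{Germany, France, Netherlands, ...}\tGermany\t106\t1\n"
    "{Germany, France, Netherlands, ...}\tFrance\t125\t0.979874\n"
    "{Niger, Ivory Coast, Benin, ...}\tNiger\t66\t1\n"
)
clusters = load_clusters(sample)

# Print the number of values and the total frequency of each cluster
for name, members in clusters.items():
    total = sum(frequency for _, frequency, _ in members)
    print(f"{name}: {len(members)} values, total frequency {total}")
```

To apply it to the real file, one would pass an open file object instead of the in-memory sample, for example `open(country_clusters_file, encoding="utf8")`.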

Exercise

We’ll build a coclustering for the Tokyo2021 dataset. It is extracted from the Athletes table of the Tokyo 2021 Kaggle dataset, and each record contains three variables:

- Name: the name of a competing athlete
- Country: the country (or organization) the athlete represents
- Discipline: the athlete's discipline

The idea for this exercise is to build a coclustering between Country and Discipline and see which countries resemble each other the most in terms of the athletes they bring to the Olympics.
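
To make the idea concrete, it may help to think of the data as a contingency table of (Country, Discipline) counts, whose rows and columns a coclustering groups together. The sketch below only builds those counts from a handful of pairs copied from the first rows of the Athletes table; it is an illustration of the input, not a reproduction of the Khiops algorithm.

```python
from collections import Counter

# (Country, Discipline) pairs copied from the first rows of Athletes.csv
records = [
    ("Norway", "Cycling Road"),
    ("Spain", "Artistic Gymnastics"),
    ("Italy", "Rowing"),
    ("Spain", "Basketball"),
    ("Spain", "Basketball"),
    ("France", "Handball"),
    ("Chile", "Rowing"),
]

# Cell counts of the contingency table, plus its row and column totals
cell_counts = Counter(records)
country_counts = Counter(country for country, _ in records)
discipline_counts = Counter(discipline for _, discipline in records)

print(cell_counts[("Spain", "Basketball")])  # 2
print(country_counts["Spain"])               # 3
print(discipline_counts["Rowing"])           # 2
```

Countries with similar rows in this table (similar mixes of disciplines) are the ones we would expect to end up in the same cluster.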

We start by saving the dataset dictionary file and data table location into variables:

tokyo_kdic = os.path.join("data", "Tokyo2021", "Athletes.kdic")
tokyo_data_file = os.path.join("data", "Tokyo2021", "Athletes.csv")
tokyo_results_dir = os.path.join("exercises", "Tokyo2021")

Peek at the contents of the dictionary and data files:

print(f"Tokyo2021 dictionary file: {tokyo_kdic}")
print("")
peek(tokyo_kdic, n=15)

print(f"Tokyo data table file: {tokyo_data_file}")
print("")
peek(tokyo_data_file)
Tokyo2021 dictionary file: data/Tokyo2021/Athletes.kdic

Dictionary  Athletes
{
    Categorical     Name;
    Categorical     Country;
    Categorical     Discipline;
};

Tokyo data table file: data/Tokyo2021/Athletes.csv

Name,Country,Discipline
AALERUD Katrine,Norway,Cycling Road
ABAD Nestor,Spain,Artistic Gymnastics
ABAGNALE Giovanni,Italy,Rowing
ABALDE Alberto,Spain,Basketball
ABALDE Tamara,Spain,Basketball
ABALO Luc,France,Handball
ABAROA Cesar,Chile,Rowing
ABASS Abobakr,Sudan,Swimming
ABBASALI Hamideh,Islamic Republic of Iran,Karate

Train the coclustering for the variables Country and Discipline.

Do not forget that the field separator of this data file is a comma (,).
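
If you are unsure which separator a data file uses, a quick check of its header line is usually enough. The helper below is a hypothetical, minimal heuristic (not part of Khiops): it returns the candidate character that occurs most often in the header.

```python
def guess_separator(header_line, candidates=(";", ",", "\t")):
    """Returns the candidate separator that occurs most often in the header."""
    return max(candidates, key=header_line.count)


# Header lines of the two data files used in this tutorial
print(repr(guess_separator("Country;Organization")))     # ';'
print(repr(guess_separator("Name,Country,Discipline")))  # ','
```

This heuristic can be fooled by headers whose field names contain the separator character, so treat its answer as a hint and confirm it with peek.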

tokyo_cc_report = kh.train_coclustering(
    tokyo_kdic,
    dictionary_name="Athletes",
    coclustering_variables=["Country", "Discipline"],
    data_table_path=tokyo_data_file,
    results_dir=tokyo_results_dir,
    field_separator=",",
)

You can view the coclustering with the Khiops CoVisualization app:

# To visualize uncomment the line below
# kh.visualize_report(tokyo_cc_report)

Use extract_clusters to extract the country clusters and peek at the contents of the resulting file:

tokyo_country_clusters_file = os.path.join(
    "exercises", "Tokyo2021", "CountryClusters.txt"
)

kh.extract_clusters(
    tokyo_cc_report,
    cluster_variable="Country",
    clusters_file_path=tokyo_country_clusters_file,
)
peek(tokyo_country_clusters_file, n=200)
Cluster     Value   Frequency       Typicality
{Ghana, Kosovo, Republic of Moldova, ...}   Ghana   14      1
{Ghana, Kosovo, Republic of Moldova, ...}   Kosovo  10      0.899515
{Ghana, Kosovo, Republic of Moldova, ...}   Republic of Moldova     19      0.896598
{Ghana, Kosovo, Republic of Moldova, ...}   Tajikistan      8       0.880054
{Ghana, Kosovo, Republic of Moldova, ...}   Cameroon        11      0.879744
{Ghana, Kosovo, Republic of Moldova, ...}   Jordan  11      0.853091
{Ghana, Kosovo, Republic of Moldova, ...}   Turkmenistan    8       0.84905
{Ghana, Kosovo, Republic of Moldova, ...}   Guatemala       22      0.828027
{Ghana, Kosovo, Republic of Moldova, ...}   Pakistan        10      0.80784
{Ghana, Kosovo, Republic of Moldova, ...}   Niger   7       0.804549
{Ghana, Kosovo, Republic of Moldova, ...}   Bosnia and Herzegovina  7       0.796023
{Ghana, Kosovo, Republic of Moldova, ...}   Haiti   6       0.774886
{Ghana, Kosovo, Republic of Moldova, ...}   Madagascar      6       0.774886
{Ghana, Kosovo, Republic of Moldova, ...}   Lebanon 6       0.76636
{Ghana, Kosovo, Republic of Moldova, ...}   Albania 8       0.755625
{Ghana, Kosovo, Republic of Moldova, ...}   Panama  9       0.739769
{Ghana, Kosovo, Republic of Moldova, ...}   Gabon   5       0.72859
{Ghana, Kosovo, Republic of Moldova, ...}   North Macedonia 8       0.719399
{Ghana, Kosovo, Republic of Moldova, ...}   Democratic Republic of the Congo        7       0.719013
{Ghana, Kosovo, Republic of Moldova, ...}   Burundi 6       0.717007
{Ghana, Kosovo, Republic of Moldova, ...}   Malawi  5       0.712125
{Ghana, Kosovo, Republic of Moldova, ...}   Nepal   5       0.712125
{Ghana, Kosovo, Republic of Moldova, ...}   Tonga   5       0.707212
{Ghana, Kosovo, Republic of Moldova, ...}   Burkina Faso    7       0.706431
{Ghana, Kosovo, Republic of Moldova, ...}   Benin   7       0.703938
{Ghana, Kosovo, Republic of Moldova, ...}   Mauritius       7       0.70232
{Ghana, Kosovo, Republic of Moldova, ...}   Nicaragua       8       0.700832
{Ghana, Kosovo, Republic of Moldova, ...}   Qatar   14      0.695168
{Ghana, Kosovo, Republic of Moldova, ...}   Kuwait  10      0.692288
{Ghana, Kosovo, Republic of Moldova, ...}   Bangladesh      6       0.689707
{Ghana, Kosovo, Republic of Moldova, ...}   Grenada 6       0.68711
{Ghana, Kosovo, Republic of Moldova, ...}   Antigua and Barbuda     6       0.677322
{Ghana, Kosovo, Republic of Moldova, ...}   Lao People's Democratic Republic        4       0.67469
{Ghana, Kosovo, Republic of Moldova, ...}   Sierra Leone    4       0.67469
{Ghana, Kosovo, Republic of Moldova, ...}   Papua New Guinea        7       0.670767
{Ghana, Kosovo, Republic of Moldova, ...}   Malta   6       0.662244
{Ghana, Kosovo, Republic of Moldova, ...}   Seychelles      5       0.661433
{Ghana, Kosovo, Republic of Moldova, ...}   Eswatini        4       0.661251
{Ghana, Kosovo, Republic of Moldova, ...}   United Arab Emirates    4       0.660965
{Ghana, Kosovo, Republic of Moldova, ...}   Guam    5       0.660256
{Ghana, Kosovo, Republic of Moldova, ...}   Guinea  5       0.660256
{Ghana, Kosovo, Republic of Moldova, ...}   Mozambique      8       0.659989
{Ghana, Kosovo, Republic of Moldova, ...}   Guyana  7       0.659306
{Ghana, Kosovo, Republic of Moldova, ...}   Cape Verde      6       0.658089
{Ghana, Kosovo, Republic of Moldova, ...}   Palestine       4       0.652895
{Ghana, Kosovo, Republic of Moldova, ...}   Afghanistan     5       0.651983
{Ghana, Kosovo, Republic of Moldova, ...}   Oman    5       0.651983
{Ghana, Kosovo, Republic of Moldova, ...}   El Salvador     5       0.647994
{Ghana, Kosovo, Republic of Moldova, ...}   Sudan   5       0.647511
{Ghana, Kosovo, Republic of Moldova, ...}   Iceland 4       0.644787
{Ghana, Kosovo, Republic of Moldova, ...}   Virgin Islands, US      4       0.644787
{Ghana, Kosovo, Republic of Moldova, ...}   Sri Lanka       9       0.63606
{Ghana, Kosovo, Republic of Moldova, ...}   Djibouti        4       0.627987
{Ghana, Kosovo, Republic of Moldova, ...}   Mali    4       0.614548
{Ghana, Kosovo, Republic of Moldova, ...}   Aruba   3       0.613398
{Ghana, Kosovo, Republic of Moldova, ...}   Cambodia        3       0.60769
{Ghana, Kosovo, Republic of Moldova, ...}   Democratic Republic of Timor-Leste      3       0.60769
{Ghana, Kosovo, Republic of Moldova, ...}   Federated States of Micronesia  3       0.60769
{Ghana, Kosovo, Republic of Moldova, ...}   Palau   3       0.60769
{Ghana, Kosovo, Republic of Moldova, ...}   Uruguay 11      0.59931
{Ghana, Kosovo, Republic of Moldova, ...}   San Marino      4       0.59027
{Ghana, Kosovo, Republic of Moldova, ...}   Monaco  6       0.589476
{Ghana, Kosovo, Republic of Moldova, ...}   Solomon Islands 3       0.585894
{Ghana, Kosovo, Republic of Moldova, ...}   Rwanda  5       0.584187
{Ghana, Kosovo, Republic of Moldova, ...}   Bolivia 5       0.577973
{Ghana, Kosovo, Republic of Moldova, ...}   Cayman Islands  5       0.577973
{Ghana, Kosovo, Republic of Moldova, ...}   Marshall Islands        2       0.57664
{Ghana, Kosovo, Republic of Moldova, ...}   St Vincent and the Grenadines   2       0.57664
{Ghana, Kosovo, Republic of Moldova, ...}   Kiribati        3       0.572096
{Ghana, Kosovo, Republic of Moldova, ...}   Libya   4       0.571817
{Ghana, Kosovo, Republic of Moldova, ...}   Maldives        4       0.570575
{Ghana, Kosovo, Republic of Moldova, ...}   Saint Lucia     5       0.569901
{Ghana, Kosovo, Republic of Moldova, ...}   Yemen   3       0.569429
{Ghana, Kosovo, Republic of Moldova, ...}   Bhutan  3       0.564304
{Ghana, Kosovo, Republic of Moldova, ...}   Congo   3       0.560987
{Ghana, Kosovo, Republic of Moldova, ...}   Equatorial Guinea       3       0.560987
{Ghana, Kosovo, Republic of Moldova, ...}   Virgin Islands, British 3       0.560987
{Ghana, Kosovo, Republic of Moldova, ...}   Chad    3       0.555631
{Ghana, Kosovo, Republic of Moldova, ...}   American Samoa  5       0.548105
{Ghana, Kosovo, Republic of Moldova, ...}   Comoros 3       0.547188
{Ghana, Kosovo, Republic of Moldova, ...}   Gambia  3       0.547188
{Ghana, Kosovo, Republic of Moldova, ...}   Zimbabwe        5       0.543396
{Ghana, Kosovo, Republic of Moldova, ...}   Cyprus  14      0.537432
{Ghana, Kosovo, Republic of Moldova, ...}   Brunei Darussalam       2       0.532671
{Ghana, Kosovo, Republic of Moldova, ...}   Central African Republic        2       0.532671
{Ghana, Kosovo, Republic of Moldova, ...}   Liberia 3       0.506278
{Ghana, Kosovo, Republic of Moldova, ...}   Nauru   2       0.505434
{Ghana, Kosovo, Republic of Moldova, ...}   Somalia 2       0.505434
{Ghana, Kosovo, Republic of Moldova, ...}   Cook Islands    6       0.503043
{Ghana, Kosovo, Republic of Moldova, ...}   Samoa   8       0.502988
{Ghana, Kosovo, Republic of Moldova, ...}   Iraq    4       0.489768
{Ghana, Kosovo, Republic of Moldova, ...}   Dominica        2       0.480526
{Ghana, Kosovo, Republic of Moldova, ...}   Lesotho 2       0.480526
{Ghana, Kosovo, Republic of Moldova, ...}   Mauritania      2       0.480526
{Ghana, Kosovo, Republic of Moldova, ...}   Saint Kitts and Nevis   2       0.480526
{Ghana, Kosovo, Republic of Moldova, ...}   South Sudan     2       0.480526
{Ghana, Kosovo, Republic of Moldova, ...}   Tuvalu  2       0.480526
{Ghana, Kosovo, Republic of Moldova, ...}   United Republic of Tanzania     2       0.480526
{Ghana, Kosovo, Republic of Moldova, ...}   Guinea-Bissau   4       0.469222
{Ghana, Kosovo, Republic of Moldova, ...}   Paraguay        8       0.463871
{Ghana, Kosovo, Republic of Moldova, ...}   Belize  3       0.462988
{Ghana, Kosovo, Republic of Moldova, ...}   Syrian Arab Republic    6       0.462108
{Ghana, Kosovo, Republic of Moldova, ...}   Vanuatu 2       0.459968
{Ghana, Kosovo, Republic of Moldova, ...}   Liechtenstein   5       0.456494
{Ghana, Kosovo, Republic of Moldova, ...}   Togo    4       0.446108
{Ghana, Kosovo, Republic of Moldova, ...}   Senegal 9       0.437299
{Ghana, Kosovo, Republic of Moldova, ...}   Suriname        3       0.436313
{Ghana, Kosovo, Republic of Moldova, ...}   Myanmar 2       0.420465
{Ghana, Kosovo, Republic of Moldova, ...}   Andorra 2       0.399377
{Ghana, Kosovo, Republic of Moldova, ...}   Sao Tome and Principe   3       0.39837
{Ghana, Kosovo, Republic of Moldova, ...}   Bermuda 2       0.340473
{Ghana, Kosovo, Republic of Moldova, ...}    *      0       0
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Jamaica 60      1
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Ethiopia        42      0.825223
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Trinidad and Tobago     31      0.432461
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Uganda  24      0.430859
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Bahamas 16      0.357461
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Botswana        13      0.239674
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Eritrea 13      0.211894
{Jamaica, Ethiopia, Trinidad and Tobago, ...}       Barbados        7       0.175004
{Kenya, Fiji}       Kenya   78      1
{Kenya, Fiji}       Fiji    28      0.542238
{Uzbekistan, Azerbaijan, Mongolia, ...}     Uzbekistan      63      1
{Uzbekistan, Azerbaijan, Mongolia, ...}     Azerbaijan      41      0.879047
{Uzbekistan, Azerbaijan, Mongolia, ...}     Mongolia        43      0.839974
{Uzbekistan, Azerbaijan, Mongolia, ...}     Georgia 35      0.833922
{Uzbekistan, Azerbaijan, Mongolia, ...}     Cuba    69      0.776003
{Uzbekistan, Azerbaijan, Mongolia, ...}     Bulgaria        41      0.668226
{Uzbekistan, Azerbaijan, Mongolia, ...}     Refugee Olympic Team    26      0.442847
{Uzbekistan, Azerbaijan, Mongolia, ...}     Kyrgyzstan      16      0.43093
{Uzbekistan, Azerbaijan, Mongolia, ...}     Armenia 16      0.428469
{Serbia, Islamic Republic of Iran}  Serbia  83      1
{Serbia, Islamic Republic of Iran}  Islamic Republic of Iran        66      0.926275
{Turkey, Tunisia, Venezuela, ...}   Turkey  102     1
{Turkey, Tunisia, Venezuela, ...}   Tunisia 57      0.859262
{Turkey, Tunisia, Venezuela, ...}   Venezuela       43      0.621774
{Turkey, Tunisia, Venezuela, ...}   Algeria 41      0.464331
{Chinese Taipei, Thailand, Indonesia, ...}  Chinese Taipei  67      1
{Chinese Taipei, Thailand, Indonesia, ...}  Thailand        39      0.785704
{Chinese Taipei, Thailand, Indonesia, ...}  Indonesia       26      0.726278
{Chinese Taipei, Thailand, Indonesia, ...}  Malaysia        29      0.66791
{Chinese Taipei, Thailand, Indonesia, ...}  Vietnam 17      0.42853
{Chinese Taipei, Thailand, Indonesia, ...}  Philippines     18      0.399828
{Switzerland, Austria, Hong Kong, China, ...}       Switzerland     115     1
{Switzerland, Austria, Hong Kong, China, ...}       Austria 72      0.611503
{Switzerland, Austria, Hong Kong, China, ...}       Hong Kong, China        40      0.49501
{Switzerland, Austria, Hong Kong, China, ...}       Estonia 33      0.3928
{Switzerland, Austria, Hong Kong, China, ...}       Singapore       23      0.327429
{Switzerland, Austria, Hong Kong, China, ...}       Luxembourg      11      0.16727
{Switzerland, Austria, Hong Kong, China, ...}       Namibia 11      0.135898
{Colombia, Morocco, Ecuador, ...}   Colombia        64      1
{Colombia, Morocco, Ecuador, ...}   Morocco 48      0.923096
{Colombia, Morocco, Ecuador, ...}   Ecuador 46      0.853756
{Colombia, Morocco, Ecuador, ...}   Finland 45      0.613799
{Colombia, Morocco, Ecuador, ...}   Peru    33      0.576108
{Colombia, Morocco, Ecuador, ...}   Latvia  29      0.473062
{Colombia, Morocco, Ecuador, ...}   Costa Rica      13      0.275235
{Ukraine, Belarus, Slovakia}        Ukraine 152     1
{Ukraine, Belarus, Slovakia}        Belarus 104     0.750629
{Ukraine, Belarus, Slovakia}        Slovakia        38      0.253781
{Kazakhstan, Croatia, Greece}       Kazakhstan      92      1
{Kazakhstan, Croatia, Greece}       Croatia 57      0.795678
{Kazakhstan, Croatia, Greece}       Greece  75      0.585687
{Japan}     Japan   586     1
{Argentina} Argentina       180     1
{Republic of Korea} Republic of Korea       223     1
{Egypt}     Egypt   133     1
{Israel, Dominican Republic}        Israel  85      1
{Israel, Dominican Republic}        Dominican Republic      61      0.982487
{Mexico}    Mexico  155     1
{Zambia, Saudi Arabia, Honduras, ...}       Zambia  29      1
{Zambia, Saudi Arabia, Honduras, ...}       Saudi Arabia    32      0.968523
{Zambia, Saudi Arabia, Honduras, ...}       Honduras        25      0.960536
{Zambia, Saudi Arabia, Honduras, ...}       Côte d'Ivoire   29      0.940942
{Zambia, Saudi Arabia, Honduras, ...}       Chile   56      0.629075
{Romania}   Romania 99      1
{Great Britain, Ireland}    Great Britain   366     1
{Great Britain, Ireland}    Ireland 116     0.578612
{New Zealand}       New Zealand     202     1
{Australia} Australia       470     1
{Canada}    Canada  368     1
{People's Republic of China}        People's Republic of China      401     1
{United States of America}  United States of America        615     1
{Italy}     Italy   356     1
{Poland, Lithuania} Poland  195     1
{Poland, Lithuania} Lithuania       37      0.262821
{Germany, Belgium}  Germany 400     1
{Germany, Belgium}  Belgium 125     0.527562
{India}     India   117     1
{Czech Republic, Nigeria, Slovenia, ...}    Czech Republic  117     1
{Czech Republic, Nigeria, Slovenia, ...}    Nigeria 59      0.958011
{Czech Republic, Nigeria, Slovenia, ...}    Slovenia        51      0.842486
{Czech Republic, Nigeria, Slovenia, ...}    Puerto Rico     35      0.639146
{Spain}     Spain   324     1
{South Africa}      South Africa    171     1
{Netherlands}       Netherlands     274     1
{ROC}       ROC     318     1
{Hungary, Montenegro}       Hungary 155     1
{Hungary, Montenegro}       Montenegro      35      0.49365

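Use extract_clusters to extract the discipline clusters and peek at the contents of the resulting file:

# Use a separate file name so the country clusters file is not overwritten
tokyo_discipline_clusters_file = os.path.join(
    "exercises", "Tokyo2021", "DisciplineClusters.txt"
)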

kh.extract_clusters(
    tokyo_cc_report,
    cluster_variable="Discipline",
    clusters_file_path=tokyo_discipline_clusters_file,
)
peek(tokyo_discipline_clusters_file, n=200)
Cluster     Value   Frequency       Typicality
{Handball}  Handball        343     1
{Football}  Football        567     1
{Baseball/Softball} Baseball/Softball       220     1
{Rugby Sevens}      Rugby Sevens    283     1
{Hockey}    Hockey  406     1
{Basketball}        Basketball      280     1
{Athletics} Athletics       2068    1
{Swimming}  Swimming        743     1
{Wrestling, Karate} Wrestling       279     1
{Wrestling, Karate} Karate  77      0.224612
{Boxing, Weightlifting, Taekwondo}  Boxing  270     1
{Boxing, Weightlifting, Taekwondo}  Weightlifting   187     0.761327
{Boxing, Weightlifting, Taekwondo}  Taekwondo       123     0.532844
{Judo}      Judo    373     1
{Volleyball}        Volleyball      274     1
{Fencing, 3x3 Basketball}   Fencing 249     1
{Fencing, 3x3 Basketball}   3x3 Basketball  62      0.260895
{Rowing, Cycling Track}     Rowing  496     1
{Rowing, Cycling Track}     Cycling Track   208     0.493616
{Water Polo}        Water Polo      269     1
{Sailing, Equestrian}       Sailing 336     1
{Sailing, Equestrian}       Equestrian      237     0.805468
{Cycling Road, Triathlon, Cycling Mountain Bike, ...}       Cycling Road    190     1
{Cycling Road, Triathlon, Cycling Mountain Bike, ...}       Triathlon       106     0.645708
{Cycling Road, Triathlon, Cycling Mountain Bike, ...}       Cycling Mountain Bike   74      0.574422
{Cycling Road, Triathlon, Cycling Mountain Bike, ...}       Canoe Slalom    78      0.472006
{Cycling Road, Triathlon, Cycling Mountain Bike, ...}       Marathon Swimming       49      0.36855
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...}  Skateboarding   77      1
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...}  Beach Volleyball        90      0.96865
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...}  Cycling BMX Racing      43      0.695325
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...}  Surfing 38      0.625796
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...}  Sport Climbing  37      0.328895
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...}  Cycling BMX Freestyle   19      0.252887
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...}   *      0       0
{Badminton, Golf, Diving}   Badminton       164     1
{Badminton, Golf, Diving}   Golf    115     0.737186
{Badminton, Golf, Diving}   Diving  133     0.595778
{Shooting, Archery} Shooting        342     1
{Shooting, Archery} Archery 122     0.418538
{Table Tennis, Tennis, Artistic Gymnastics} Table Tennis    164     1
{Table Tennis, Tennis, Artistic Gymnastics} Tennis  178     0.946405
{Table Tennis, Tennis, Artistic Gymnastics} Artistic Gymnastics     187     0.880643
{Canoe Sprint, Modern Pentathlon}   Canoe Sprint    236     1
{Canoe Sprint, Modern Pentathlon}   Modern Pentathlon       69      0.259697
{Rhythmic Gymnastics, Artistic Swimming, Trampoline Gymnastics}     Rhythmic Gymnastics     95      1
{Rhythmic Gymnastics, Artistic Swimming, Trampoline Gymnastics}     Artistic Swimming       98      0.900649
{Rhythmic Gymnastics, Artistic Swimming, Trampoline Gymnastics}     Trampoline Gymnastics   31      0.278322
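
The Typicality column can be used to keep only the most representative values of each cluster. The sketch below does this for a few rows copied from the listing above; the helper name `representative_values` and the 0.5 threshold are arbitrary choices made for illustration.

```python
# (Cluster, Value, Frequency, Typicality) rows copied from the listing above
rows = [
    ("{Wrestling, Karate}", "Wrestling", 279, 1.0),
    ("{Wrestling, Karate}", "Karate", 77, 0.224612),
    ("{Sailing, Equestrian}", "Sailing", 336, 1.0),
    ("{Sailing, Equestrian}", "Equestrian", 237, 0.805468),
]


def representative_values(rows, min_typicality=0.5):
    """Keeps, per cluster, the values at or above a typicality threshold."""
    kept = {}
    for cluster, value, _frequency, typicality in rows:
        if typicality >= min_typicality:
            kept.setdefault(cluster, []).append(value)
    return kept


print(representative_values(rows))
# {'{Wrestling, Karate}': ['Wrestling'], '{Sailing, Equestrian}': ['Sailing', 'Equestrian']}
```

Here Karate is dropped from its cluster because its typicality (0.224612) falls below the threshold, while Equestrian stays alongside Sailing.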