Core Basics 4: Train a Coclustering¶
The steps to train a coclustering model with Khiops are very similar to what we have already seen in the basic classifier tutorials.
Make sure you have installed Khiops and Khiops CoVisualization.
We start by importing Khiops, checking its installation and defining some helper functions:
import os
import platform
import subprocess
from khiops import core as kh
# Define helper functions
def peek(file_path, n=10):
    """Shows the first n lines of a file"""
    with open(file_path, encoding="utf8", errors="replace") as file:
        for line in file.readlines()[:n]:
            print(line, end="")
    print("")
# If there are any issues, you may print the Khiops status with the following command
# kh.get_runner().print_status()
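The peek helper above reads the whole file before slicing off the first n lines. For large data tables, a lazier variant may be preferable. The sketch below is one possible alternative; the name peek_lazy is hypothetical and not part of Khiops or of the original tutorial.

```python
from itertools import islice


def peek_lazy(file_path, n=10):
    """Print the first n lines of a file without reading the rest of it.

    Hypothetical variant of the tutorial's peek helper.
    """
    with open(file_path, encoding="utf8", errors="replace") as file:
        # islice stops iterating after n lines, so the remainder is never read
        for line in islice(file, n):
            print(line, end="")
    print("")
```

It can be used exactly like peek, for instance peek_lazy(some_file, n=5).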
As stated before, an unsupervised analysis sometimes benefits from a
better-adapted visualization. We illustrate this point with the dataset
CountriesByOrganization, which contains the country-organization relation
for a large number of organizations and countries (it is a bit outdated,
though):
countries_kdic = os.path.join(
    "data", "CountriesByOrganization", "CountriesByOrganization.kdic"
)
countries_data_file = os.path.join(
    "data", "CountriesByOrganization", "CountriesByOrganization.csv"
)
print(f"CountriesByOrganization dictionary file location: {countries_kdic}")
print("")
peek(countries_kdic)
print(f"CountriesByOrganization data table file location: {countries_data_file}")
print("")
peek(countries_data_file)
CountriesByOrganization dictionary file location: data/CountriesByOrganization/CountriesByOrganization.kdic
Dictionary CountriesByOrganization
{
Categorical Country;
Categorical Organization;
};
CountriesByOrganization data table file location: data/CountriesByOrganization/CountriesByOrganization.csv
Country;Organization
Afghanistan;AsDB
Afghanistan;COLOMBO
Afghanistan;ECO
Afghanistan;ICCROM
Afghanistan;NAM
Afghanistan;PIARC
Afghanistan;SAARC
Afghanistan;WHO
Afghanistan;UN
We now create a coclustering model for this dataset:
countries_results_dir = os.path.join("exercises", "CountriesByOrganization")
countries_cc_report = kh.train_coclustering(
    countries_kdic,
    dictionary_name="CountriesByOrganization",
    data_table_path=countries_data_file,
    coclustering_variables=["Country", "Organization"],
    results_dir=countries_results_dir,
    field_separator=";",
)
We can now browse the results with the Khiops CoVisualization app:
# To visualize uncomment the line below
# kh.visualize_report(countries_cc_report)
We can now dump the country clusters and their metrics to a file with the
extract_clusters function:
country_clusters_file = os.path.join(
    "exercises", "CountriesByOrganization", "CountryClusters.txt"
)
kh.extract_clusters(
    countries_cc_report,
    cluster_variable="Country",
    clusters_file_path=country_clusters_file,
)
peek(country_clusters_file, n=100)
Cluster Value Frequency Typicality
{Germany, France, Netherlands, ...} Germany 106 1
{Germany, France, Netherlands, ...} France 125 0.979874
{Germany, France, Netherlands, ...} Netherlands 105 0.957798
{Germany, France, Netherlands, ...} Denmark 101 0.951486
{Germany, France, Netherlands, ...} Sweden 102 0.945064
{Germany, France, Netherlands, ...} Belgium 104 0.918546
{Germany, France, Netherlands, ...} Finland 100 0.890127
{Germany, France, Netherlands, ...} Italy 105 0.882327
{Germany, France, Netherlands, ...} Norway 96 0.875153
{Germany, France, Netherlands, ...} Spain 103 0.857387
{Germany, France, Netherlands, ...} Portugal 94 0.760038
{Germany, France, Netherlands, ...} Austria 88 0.757942
{Germany, France, Netherlands, ...} United Kingdom 102 0.749886
{Germany, France, Netherlands, ...} Luxembourg 81 0.735935
{Germany, France, Netherlands, ...} Switzerland 90 0.734097
{Germany, France, Netherlands, ...} Greece 87 0.697847
{Germany, France, Netherlands, ...} Ireland 75 0.641185
{Germany, France, Netherlands, ...} Iceland 55 0.434573
{United States of America, Canada, Japan, ...} United States of America 92 1
{United States of America, Canada, Japan, ...} Canada 85 0.81184
{United States of America, Canada, Japan, ...} Japan 81 0.750277
{United States of America, Canada, Japan, ...} Australia 75 0.745923
{United States of America, Canada, Japan, ...} New Zealand 60 0.535378
{United States of America, Canada, Japan, ...} South Korea 69 0.507395
{United States of America, Canada, Japan, ...} Taiwan 7 0.111344
{United States of America, Canada, Japan, ...} * 0 0
{Poland, Hungary, Turkey, ...} Poland 79 1
{Poland, Hungary, Turkey, ...} Hungary 72 0.901571
{Poland, Hungary, Turkey, ...} Turkey 78 0.896009
{Poland, Hungary, Turkey, ...} Czech Republic 64 0.87056
{Poland, Hungary, Turkey, ...} Russia 80 0.848669
{Poland, Hungary, Turkey, ...} Bulgaria 70 0.846619
{Poland, Hungary, Turkey, ...} Romania 69 0.846295
{Poland, Hungary, Turkey, ...} Slovakia 58 0.789467
{Poland, Hungary, Turkey, ...} Slovenia 56 0.719738
{Poland, Hungary, Turkey, ...} Ukraine 53 0.692657
{Poland, Hungary, Turkey, ...} Croatia 57 0.675641
{Poland, Hungary, Turkey, ...} Estonia 46 0.620249
{Poland, Hungary, Turkey, ...} Latvia 45 0.609433
{Poland, Hungary, Turkey, ...} Lithuania 43 0.544669
{Poland, Hungary, Turkey, ...} Albania 47 0.461515
{Poland, Hungary, Turkey, ...} Cyprus 62 0.45359
{Poland, Hungary, Turkey, ...} Serbia 42 0.451178
{Poland, Hungary, Turkey, ...} Israel 57 0.444308
{Poland, Hungary, Turkey, ...} Macedonia 39 0.431943
{Poland, Hungary, Turkey, ...} Malta 52 0.411462
{Poland, Hungary, Turkey, ...} Liechtenstein 20 0.332344
{Poland, Hungary, Turkey, ...} Monaco 32 0.317019
{Poland, Hungary, Turkey, ...} Bosnia and Herzegovina 33 0.28569
{Poland, Hungary, Turkey, ...} San Marino 17 0.157491
{Poland, Hungary, Turkey, ...} Andorra 13 0.152848
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Kazakhstan 47 1
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Kyrgyzstan 45 0.952391
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Azerbaijan 41 0.88496
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Uzbekistan 41 0.876753
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Moldova 47 0.862644
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Tajikistan 35 0.807963
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Turkmenistan 35 0.803721
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Belarus 38 0.751301
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Georgia 42 0.741386
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Armenia 37 0.722033
{Kazakhstan, Kyrgyzstan, Azerbaijan, ...} Mongolia 36 0.357377
{Venezuela, Nicaragua, Ecuador, ...} Venezuela 87 1
{Venezuela, Nicaragua, Ecuador, ...} Nicaragua 73 0.966554
{Venezuela, Nicaragua, Ecuador, ...} Ecuador 79 0.947947
{Venezuela, Nicaragua, Ecuador, ...} Costa Rica 74 0.937068
{Venezuela, Nicaragua, Ecuador, ...} Colombia 83 0.936774
{Venezuela, Nicaragua, Ecuador, ...} Bolivia 72 0.909502
{Venezuela, Nicaragua, Ecuador, ...} Guatemala 71 0.909374
{Venezuela, Nicaragua, Ecuador, ...} Mexico 87 0.908228
{Venezuela, Nicaragua, Ecuador, ...} Panama 72 0.904032
{Venezuela, Nicaragua, Ecuador, ...} Peru 79 0.88861
{Venezuela, Nicaragua, Ecuador, ...} Brazil 86 0.871975
{Venezuela, Nicaragua, Ecuador, ...} Argentina 84 0.856383
{Venezuela, Nicaragua, Ecuador, ...} Honduras 67 0.850245
{Venezuela, Nicaragua, Ecuador, ...} El Salvador 64 0.841953
{Venezuela, Nicaragua, Ecuador, ...} Uruguay 72 0.825788
{Venezuela, Nicaragua, Ecuador, ...} Chile 80 0.824771
{Venezuela, Nicaragua, Ecuador, ...} Paraguay 65 0.79883
{Venezuela, Nicaragua, Ecuador, ...} Dominican Republic 67 0.735771
{Venezuela, Nicaragua, Ecuador, ...} Cuba 63 0.527546
{Venezuela, Nicaragua, Ecuador, ...} Haiti 62 0.502219
{Trinidad and Tobago, Barbados, Grenada, ...} Trinidad and Tobago 63 1
{Trinidad and Tobago, Barbados, Grenada, ...} Barbados 56 0.991871
{Trinidad and Tobago, Barbados, Grenada, ...} Grenada 50 0.920456
{Trinidad and Tobago, Barbados, Grenada, ...} Jamaica 63 0.906127
{Trinidad and Tobago, Barbados, Grenada, ...} Belize 51 0.835825
{Trinidad and Tobago, Barbados, Grenada, ...} Guyana 56 0.811175
{Trinidad and Tobago, Barbados, Grenada, ...} Dominica 47 0.808271
{Trinidad and Tobago, Barbados, Grenada, ...} Antigua and Barbuda 43 0.806807
{Trinidad and Tobago, Barbados, Grenada, ...} Saint Lucia 46 0.777665
{Trinidad and Tobago, Barbados, Grenada, ...} Saint Vincent and the Grenadines 41 0.771643
{Trinidad and Tobago, Barbados, Grenada, ...} The Bahamas 49 0.747276
{Trinidad and Tobago, Barbados, Grenada, ...} Suriname 48 0.694766
{Trinidad and Tobago, Barbados, Grenada, ...} Saint Kitts and Nevis 36 0.689034
{Niger, Ivory Coast, Benin, ...} Niger 66 1
{Niger, Ivory Coast, Benin, ...} Ivory Coast 83 0.991187
{Niger, Ivory Coast, Benin, ...} Benin 76 0.985146
{Niger, Ivory Coast, Benin, ...} Burkina Faso 75 0.98505
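To work with the extracted clusters in Python rather than just printing them, the file can be grouped by cluster label. The following is a minimal sketch, not part of the Khiops API. It assumes the clusters file is tab-separated with the header Cluster, Value, Frequency, Typicality, and it treats the * row with zero frequency as a placeholder to skip; both points are assumptions based on the output shown above. The name load_clusters is hypothetical.

```python
import csv
from collections import defaultdict


def load_clusters(clusters_file_path):
    """Group the rows of a clusters file by cluster label.

    Assumption: the file is tab-separated with the columns
    Cluster, Value, Frequency and Typicality.
    """
    clusters = defaultdict(list)
    with open(
        clusters_file_path, encoding="utf8", errors="replace", newline=""
    ) as file:
        reader = csv.DictReader(file, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Assumption: the "*" row is a placeholder, not an actual value
            if row["Value"] == "*":
                continue
            clusters[row["Cluster"]].append(
                (row["Value"], int(row["Frequency"]), float(row["Typicality"]))
            )
    return dict(clusters)
```

Calling load_clusters(country_clusters_file) would then return a dictionary mapping each cluster label to its list of (value, frequency, typicality) tuples, in file order.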
Exercise¶
We’ll build a coclustering for the Tokyo2021 dataset. It is extracted
from the Athletes table of the Tokyo 2021 Kaggle dataset, and each
record contains three variables:
- Name: the name of a competing athlete
- Country: the country (or organization) the athlete represents
- Discipline: the athlete’s discipline
The idea for this exercise is to make a coclustering between Country
and Discipline and see which countries resemble each other the most in
terms of the athletes they bring to the Olympics.
We start by saving the dataset dictionary file and data table location into variables:
tokyo_kdic = os.path.join("data", "Tokyo2021", "Athletes.kdic")
tokyo_data_file = os.path.join("data", "Tokyo2021", "Athletes.csv")
tokyo_results_dir = os.path.join("exercises", "Tokyo2021")
peek the contents of the dictionary and data files¶
print(f"Tokyo2021 dictionary file: {tokyo_kdic}")
print("")
peek(tokyo_kdic, n=15)
print(f"Tokyo data table file: {tokyo_data_file}")
print("")
peek(tokyo_data_file)
Tokyo2021 dictionary file: data/Tokyo2021/Athletes.kdic
Dictionary Athletes
{
Categorical Name;
Categorical Country;
Categorical Discipline;
};
Tokyo data table file: data/Tokyo2021/Athletes.csv
Name,Country,Discipline
AALERUD Katrine,Norway,Cycling Road
ABAD Nestor,Spain,Artistic Gymnastics
ABAGNALE Giovanni,Italy,Rowing
ABALDE Alberto,Spain,Basketball
ABALDE Tamara,Spain,Basketball
ABALO Luc,France,Handball
ABAROA Cesar,Chile,Rowing
ABASS Abobakr,Sudan,Swimming
ABBASALI Hamideh,Islamic Republic of Iran,Karate
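Before training, it can help to look at the raw material of the exercise: how many athletes each country sends to each discipline. The sketch below counts the (Country, Discipline) pairs with the standard library only. It is an optional sanity check, not a step of the Khiops workflow, and the name count_pairs is hypothetical.

```python
import csv
from collections import Counter


def count_pairs(data_file, first, second, separator=","):
    """Count how often each (first, second) value pair occurs in a CSV file.

    The file is expected to have a header line naming its columns.
    """
    with open(data_file, encoding="utf8", errors="replace", newline="") as file:
        reader = csv.DictReader(file, delimiter=separator)
        # The Counter is fully built before the file is closed
        return Counter((row[first], row[second]) for row in reader)
```

For example, count_pairs(tokyo_data_file, "Country", "Discipline").most_common(5) would list the five most frequent country-discipline pairs.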
Train the coclustering for the variables Country and Discipline¶
Do not forget that the field separator is a comma (,):
tokyo_cc_report = kh.train_coclustering(
    tokyo_kdic,
    dictionary_name="Athletes",
    coclustering_variables=["Country", "Discipline"],
    data_table_path=tokyo_data_file,
    results_dir=tokyo_results_dir,
    field_separator=",",
)
You may view the coclustering with the Khiops CoVisualization app:
# To visualize uncomment the line below
# kh.visualize_report(tokyo_cc_report)
Use extract_clusters to extract the country clusters and peek its contents¶
tokyo_country_clusters_file = os.path.join(
    "exercises", "Tokyo2021", "CountryClusters.txt"
)
kh.extract_clusters(
    tokyo_cc_report,
    cluster_variable="Country",
    clusters_file_path=tokyo_country_clusters_file,
)
peek(tokyo_country_clusters_file, n=200)
Cluster Value Frequency Typicality
{Ghana, Kosovo, Republic of Moldova, ...} Ghana 14 1
{Ghana, Kosovo, Republic of Moldova, ...} Kosovo 10 0.899515
{Ghana, Kosovo, Republic of Moldova, ...} Republic of Moldova 19 0.896598
{Ghana, Kosovo, Republic of Moldova, ...} Tajikistan 8 0.880054
{Ghana, Kosovo, Republic of Moldova, ...} Cameroon 11 0.879744
{Ghana, Kosovo, Republic of Moldova, ...} Jordan 11 0.853091
{Ghana, Kosovo, Republic of Moldova, ...} Turkmenistan 8 0.84905
{Ghana, Kosovo, Republic of Moldova, ...} Guatemala 22 0.828027
{Ghana, Kosovo, Republic of Moldova, ...} Pakistan 10 0.80784
{Ghana, Kosovo, Republic of Moldova, ...} Niger 7 0.804549
{Ghana, Kosovo, Republic of Moldova, ...} Bosnia and Herzegovina 7 0.796023
{Ghana, Kosovo, Republic of Moldova, ...} Haiti 6 0.774886
{Ghana, Kosovo, Republic of Moldova, ...} Madagascar 6 0.774886
{Ghana, Kosovo, Republic of Moldova, ...} Lebanon 6 0.76636
{Ghana, Kosovo, Republic of Moldova, ...} Albania 8 0.755625
{Ghana, Kosovo, Republic of Moldova, ...} Panama 9 0.739769
{Ghana, Kosovo, Republic of Moldova, ...} Gabon 5 0.72859
{Ghana, Kosovo, Republic of Moldova, ...} North Macedonia 8 0.719399
{Ghana, Kosovo, Republic of Moldova, ...} Democratic Republic of the Congo 7 0.719013
{Ghana, Kosovo, Republic of Moldova, ...} Burundi 6 0.717007
{Ghana, Kosovo, Republic of Moldova, ...} Malawi 5 0.712125
{Ghana, Kosovo, Republic of Moldova, ...} Nepal 5 0.712125
{Ghana, Kosovo, Republic of Moldova, ...} Tonga 5 0.707212
{Ghana, Kosovo, Republic of Moldova, ...} Burkina Faso 7 0.706431
{Ghana, Kosovo, Republic of Moldova, ...} Benin 7 0.703938
{Ghana, Kosovo, Republic of Moldova, ...} Mauritius 7 0.70232
{Ghana, Kosovo, Republic of Moldova, ...} Nicaragua 8 0.700832
{Ghana, Kosovo, Republic of Moldova, ...} Qatar 14 0.695168
{Ghana, Kosovo, Republic of Moldova, ...} Kuwait 10 0.692288
{Ghana, Kosovo, Republic of Moldova, ...} Bangladesh 6 0.689707
{Ghana, Kosovo, Republic of Moldova, ...} Grenada 6 0.68711
{Ghana, Kosovo, Republic of Moldova, ...} Antigua and Barbuda 6 0.677322
{Ghana, Kosovo, Republic of Moldova, ...} Lao People's Democratic Republic 4 0.67469
{Ghana, Kosovo, Republic of Moldova, ...} Sierra Leone 4 0.67469
{Ghana, Kosovo, Republic of Moldova, ...} Papua New Guinea 7 0.670767
{Ghana, Kosovo, Republic of Moldova, ...} Malta 6 0.662244
{Ghana, Kosovo, Republic of Moldova, ...} Seychelles 5 0.661433
{Ghana, Kosovo, Republic of Moldova, ...} Eswatini 4 0.661251
{Ghana, Kosovo, Republic of Moldova, ...} United Arab Emirates 4 0.660965
{Ghana, Kosovo, Republic of Moldova, ...} Guam 5 0.660256
{Ghana, Kosovo, Republic of Moldova, ...} Guinea 5 0.660256
{Ghana, Kosovo, Republic of Moldova, ...} Mozambique 8 0.659989
{Ghana, Kosovo, Republic of Moldova, ...} Guyana 7 0.659306
{Ghana, Kosovo, Republic of Moldova, ...} Cape Verde 6 0.658089
{Ghana, Kosovo, Republic of Moldova, ...} Palestine 4 0.652895
{Ghana, Kosovo, Republic of Moldova, ...} Afghanistan 5 0.651983
{Ghana, Kosovo, Republic of Moldova, ...} Oman 5 0.651983
{Ghana, Kosovo, Republic of Moldova, ...} El Salvador 5 0.647994
{Ghana, Kosovo, Republic of Moldova, ...} Sudan 5 0.647511
{Ghana, Kosovo, Republic of Moldova, ...} Iceland 4 0.644787
{Ghana, Kosovo, Republic of Moldova, ...} Virgin Islands, US 4 0.644787
{Ghana, Kosovo, Republic of Moldova, ...} Sri Lanka 9 0.63606
{Ghana, Kosovo, Republic of Moldova, ...} Djibouti 4 0.627987
{Ghana, Kosovo, Republic of Moldova, ...} Mali 4 0.614548
{Ghana, Kosovo, Republic of Moldova, ...} Aruba 3 0.613398
{Ghana, Kosovo, Republic of Moldova, ...} Cambodia 3 0.60769
{Ghana, Kosovo, Republic of Moldova, ...} Democratic Republic of Timor-Leste 3 0.60769
{Ghana, Kosovo, Republic of Moldova, ...} Federated States of Micronesia 3 0.60769
{Ghana, Kosovo, Republic of Moldova, ...} Palau 3 0.60769
{Ghana, Kosovo, Republic of Moldova, ...} Uruguay 11 0.59931
{Ghana, Kosovo, Republic of Moldova, ...} San Marino 4 0.59027
{Ghana, Kosovo, Republic of Moldova, ...} Monaco 6 0.589476
{Ghana, Kosovo, Republic of Moldova, ...} Solomon Islands 3 0.585894
{Ghana, Kosovo, Republic of Moldova, ...} Rwanda 5 0.584187
{Ghana, Kosovo, Republic of Moldova, ...} Bolivia 5 0.577973
{Ghana, Kosovo, Republic of Moldova, ...} Cayman Islands 5 0.577973
{Ghana, Kosovo, Republic of Moldova, ...} Marshall Islands 2 0.57664
{Ghana, Kosovo, Republic of Moldova, ...} St Vincent and the Grenadines 2 0.57664
{Ghana, Kosovo, Republic of Moldova, ...} Kiribati 3 0.572096
{Ghana, Kosovo, Republic of Moldova, ...} Libya 4 0.571817
{Ghana, Kosovo, Republic of Moldova, ...} Maldives 4 0.570575
{Ghana, Kosovo, Republic of Moldova, ...} Saint Lucia 5 0.569901
{Ghana, Kosovo, Republic of Moldova, ...} Yemen 3 0.569429
{Ghana, Kosovo, Republic of Moldova, ...} Bhutan 3 0.564304
{Ghana, Kosovo, Republic of Moldova, ...} Congo 3 0.560987
{Ghana, Kosovo, Republic of Moldova, ...} Equatorial Guinea 3 0.560987
{Ghana, Kosovo, Republic of Moldova, ...} Virgin Islands, British 3 0.560987
{Ghana, Kosovo, Republic of Moldova, ...} Chad 3 0.555631
{Ghana, Kosovo, Republic of Moldova, ...} American Samoa 5 0.548105
{Ghana, Kosovo, Republic of Moldova, ...} Comoros 3 0.547188
{Ghana, Kosovo, Republic of Moldova, ...} Gambia 3 0.547188
{Ghana, Kosovo, Republic of Moldova, ...} Zimbabwe 5 0.543396
{Ghana, Kosovo, Republic of Moldova, ...} Cyprus 14 0.537432
{Ghana, Kosovo, Republic of Moldova, ...} Brunei Darussalam 2 0.532671
{Ghana, Kosovo, Republic of Moldova, ...} Central African Republic 2 0.532671
{Ghana, Kosovo, Republic of Moldova, ...} Liberia 3 0.506278
{Ghana, Kosovo, Republic of Moldova, ...} Nauru 2 0.505434
{Ghana, Kosovo, Republic of Moldova, ...} Somalia 2 0.505434
{Ghana, Kosovo, Republic of Moldova, ...} Cook Islands 6 0.503043
{Ghana, Kosovo, Republic of Moldova, ...} Samoa 8 0.502988
{Ghana, Kosovo, Republic of Moldova, ...} Iraq 4 0.489768
{Ghana, Kosovo, Republic of Moldova, ...} Dominica 2 0.480526
{Ghana, Kosovo, Republic of Moldova, ...} Lesotho 2 0.480526
{Ghana, Kosovo, Republic of Moldova, ...} Mauritania 2 0.480526
{Ghana, Kosovo, Republic of Moldova, ...} Saint Kitts and Nevis 2 0.480526
{Ghana, Kosovo, Republic of Moldova, ...} South Sudan 2 0.480526
{Ghana, Kosovo, Republic of Moldova, ...} Tuvalu 2 0.480526
{Ghana, Kosovo, Republic of Moldova, ...} United Republic of Tanzania 2 0.480526
{Ghana, Kosovo, Republic of Moldova, ...} Guinea-Bissau 4 0.469222
{Ghana, Kosovo, Republic of Moldova, ...} Paraguay 8 0.463871
{Ghana, Kosovo, Republic of Moldova, ...} Belize 3 0.462988
{Ghana, Kosovo, Republic of Moldova, ...} Syrian Arab Republic 6 0.462108
{Ghana, Kosovo, Republic of Moldova, ...} Vanuatu 2 0.459968
{Ghana, Kosovo, Republic of Moldova, ...} Liechtenstein 5 0.456494
{Ghana, Kosovo, Republic of Moldova, ...} Togo 4 0.446108
{Ghana, Kosovo, Republic of Moldova, ...} Senegal 9 0.437299
{Ghana, Kosovo, Republic of Moldova, ...} Suriname 3 0.436313
{Ghana, Kosovo, Republic of Moldova, ...} Myanmar 2 0.420465
{Ghana, Kosovo, Republic of Moldova, ...} Andorra 2 0.399377
{Ghana, Kosovo, Republic of Moldova, ...} Sao Tome and Principe 3 0.39837
{Ghana, Kosovo, Republic of Moldova, ...} Bermuda 2 0.340473
{Ghana, Kosovo, Republic of Moldova, ...} * 0 0
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Jamaica 60 1
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Ethiopia 42 0.825223
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Trinidad and Tobago 31 0.432461
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Uganda 24 0.430859
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Bahamas 16 0.357461
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Botswana 13 0.239674
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Eritrea 13 0.211894
{Jamaica, Ethiopia, Trinidad and Tobago, ...} Barbados 7 0.175004
{Kenya, Fiji} Kenya 78 1
{Kenya, Fiji} Fiji 28 0.542238
{Uzbekistan, Azerbaijan, Mongolia, ...} Uzbekistan 63 1
{Uzbekistan, Azerbaijan, Mongolia, ...} Azerbaijan 41 0.879047
{Uzbekistan, Azerbaijan, Mongolia, ...} Mongolia 43 0.839974
{Uzbekistan, Azerbaijan, Mongolia, ...} Georgia 35 0.833922
{Uzbekistan, Azerbaijan, Mongolia, ...} Cuba 69 0.776003
{Uzbekistan, Azerbaijan, Mongolia, ...} Bulgaria 41 0.668226
{Uzbekistan, Azerbaijan, Mongolia, ...} Refugee Olympic Team 26 0.442847
{Uzbekistan, Azerbaijan, Mongolia, ...} Kyrgyzstan 16 0.43093
{Uzbekistan, Azerbaijan, Mongolia, ...} Armenia 16 0.428469
{Serbia, Islamic Republic of Iran} Serbia 83 1
{Serbia, Islamic Republic of Iran} Islamic Republic of Iran 66 0.926275
{Turkey, Tunisia, Venezuela, ...} Turkey 102 1
{Turkey, Tunisia, Venezuela, ...} Tunisia 57 0.859262
{Turkey, Tunisia, Venezuela, ...} Venezuela 43 0.621774
{Turkey, Tunisia, Venezuela, ...} Algeria 41 0.464331
{Chinese Taipei, Thailand, Indonesia, ...} Chinese Taipei 67 1
{Chinese Taipei, Thailand, Indonesia, ...} Thailand 39 0.785704
{Chinese Taipei, Thailand, Indonesia, ...} Indonesia 26 0.726278
{Chinese Taipei, Thailand, Indonesia, ...} Malaysia 29 0.66791
{Chinese Taipei, Thailand, Indonesia, ...} Vietnam 17 0.42853
{Chinese Taipei, Thailand, Indonesia, ...} Philippines 18 0.399828
{Switzerland, Austria, Hong Kong, China, ...} Switzerland 115 1
{Switzerland, Austria, Hong Kong, China, ...} Austria 72 0.611503
{Switzerland, Austria, Hong Kong, China, ...} Hong Kong, China 40 0.49501
{Switzerland, Austria, Hong Kong, China, ...} Estonia 33 0.3928
{Switzerland, Austria, Hong Kong, China, ...} Singapore 23 0.327429
{Switzerland, Austria, Hong Kong, China, ...} Luxembourg 11 0.16727
{Switzerland, Austria, Hong Kong, China, ...} Namibia 11 0.135898
{Colombia, Morocco, Ecuador, ...} Colombia 64 1
{Colombia, Morocco, Ecuador, ...} Morocco 48 0.923096
{Colombia, Morocco, Ecuador, ...} Ecuador 46 0.853756
{Colombia, Morocco, Ecuador, ...} Finland 45 0.613799
{Colombia, Morocco, Ecuador, ...} Peru 33 0.576108
{Colombia, Morocco, Ecuador, ...} Latvia 29 0.473062
{Colombia, Morocco, Ecuador, ...} Costa Rica 13 0.275235
{Ukraine, Belarus, Slovakia} Ukraine 152 1
{Ukraine, Belarus, Slovakia} Belarus 104 0.750629
{Ukraine, Belarus, Slovakia} Slovakia 38 0.253781
{Kazakhstan, Croatia, Greece} Kazakhstan 92 1
{Kazakhstan, Croatia, Greece} Croatia 57 0.795678
{Kazakhstan, Croatia, Greece} Greece 75 0.585687
{Japan} Japan 586 1
{Argentina} Argentina 180 1
{Republic of Korea} Republic of Korea 223 1
{Egypt} Egypt 133 1
{Israel, Dominican Republic} Israel 85 1
{Israel, Dominican Republic} Dominican Republic 61 0.982487
{Mexico} Mexico 155 1
{Zambia, Saudi Arabia, Honduras, ...} Zambia 29 1
{Zambia, Saudi Arabia, Honduras, ...} Saudi Arabia 32 0.968523
{Zambia, Saudi Arabia, Honduras, ...} Honduras 25 0.960536
{Zambia, Saudi Arabia, Honduras, ...} Côte d'Ivoire 29 0.940942
{Zambia, Saudi Arabia, Honduras, ...} Chile 56 0.629075
{Romania} Romania 99 1
{Great Britain, Ireland} Great Britain 366 1
{Great Britain, Ireland} Ireland 116 0.578612
{New Zealand} New Zealand 202 1
{Australia} Australia 470 1
{Canada} Canada 368 1
{People's Republic of China} People's Republic of China 401 1
{United States of America} United States of America 615 1
{Italy} Italy 356 1
{Poland, Lithuania} Poland 195 1
{Poland, Lithuania} Lithuania 37 0.262821
{Germany, Belgium} Germany 400 1
{Germany, Belgium} Belgium 125 0.527562
{India} India 117 1
{Czech Republic, Nigeria, Slovenia, ...} Czech Republic 117 1
{Czech Republic, Nigeria, Slovenia, ...} Nigeria 59 0.958011
{Czech Republic, Nigeria, Slovenia, ...} Slovenia 51 0.842486
{Czech Republic, Nigeria, Slovenia, ...} Puerto Rico 35 0.639146
{Spain} Spain 324 1
{South Africa} South Africa 171 1
{Netherlands} Netherlands 274 1
{ROC} ROC 318 1
{Hungary, Montenegro} Hungary 155 1
{Hungary, Montenegro} Montenegro 35 0.49365
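With this many rows, a per-cluster summary is easier to read than the raw listing. The sketch below reports, for each cluster, how many values it contains and their total frequency, largest cluster first. It is an illustration using only the standard library: it assumes a tab-separated clusters file with the header Cluster, Value, Frequency, Typicality and skips the * row, and the name summarize_clusters is hypothetical.

```python
import csv
from collections import Counter


def summarize_clusters(clusters_file_path):
    """Return (cluster, number of values, total frequency) tuples.

    Clusters are ordered from the most to the least populated.
    Assumption: the file is tab-separated with a header line.
    """
    sizes = Counter()
    frequencies = Counter()
    with open(
        clusters_file_path, encoding="utf8", errors="replace", newline=""
    ) as file:
        reader = csv.DictReader(file, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Assumption: the "*" row is a placeholder, not an actual value
            if row["Value"] == "*":
                continue
            sizes[row["Cluster"]] += 1
            frequencies[row["Cluster"]] += int(row["Frequency"])
    return [
        (cluster, count, frequencies[cluster])
        for cluster, count in sizes.most_common()
    ]
```

Applied to tokyo_country_clusters_file, this would make the contrast visible between the one large cluster of countries with few athletes and the many single-country clusters.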
Use extract_clusters to extract the discipline clusters and peek its contents¶
tokyo_discipline_clusters_file = os.path.join(
    "exercises", "Tokyo2021", "DisciplineClusters.txt"
)
kh.extract_clusters(
    tokyo_cc_report,
    cluster_variable="Discipline",
    clusters_file_path=tokyo_discipline_clusters_file,
)
peek(tokyo_discipline_clusters_file, n=200)
Cluster Value Frequency Typicality
{Handball} Handball 343 1
{Football} Football 567 1
{Baseball/Softball} Baseball/Softball 220 1
{Rugby Sevens} Rugby Sevens 283 1
{Hockey} Hockey 406 1
{Basketball} Basketball 280 1
{Athletics} Athletics 2068 1
{Swimming} Swimming 743 1
{Wrestling, Karate} Wrestling 279 1
{Wrestling, Karate} Karate 77 0.224612
{Boxing, Weightlifting, Taekwondo} Boxing 270 1
{Boxing, Weightlifting, Taekwondo} Weightlifting 187 0.761327
{Boxing, Weightlifting, Taekwondo} Taekwondo 123 0.532844
{Judo} Judo 373 1
{Volleyball} Volleyball 274 1
{Fencing, 3x3 Basketball} Fencing 249 1
{Fencing, 3x3 Basketball} 3x3 Basketball 62 0.260895
{Rowing, Cycling Track} Rowing 496 1
{Rowing, Cycling Track} Cycling Track 208 0.493616
{Water Polo} Water Polo 269 1
{Sailing, Equestrian} Sailing 336 1
{Sailing, Equestrian} Equestrian 237 0.805468
{Cycling Road, Triathlon, Cycling Mountain Bike, ...} Cycling Road 190 1
{Cycling Road, Triathlon, Cycling Mountain Bike, ...} Triathlon 106 0.645708
{Cycling Road, Triathlon, Cycling Mountain Bike, ...} Cycling Mountain Bike 74 0.574422
{Cycling Road, Triathlon, Cycling Mountain Bike, ...} Canoe Slalom 78 0.472006
{Cycling Road, Triathlon, Cycling Mountain Bike, ...} Marathon Swimming 49 0.36855
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...} Skateboarding 77 1
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...} Beach Volleyball 90 0.96865
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...} Cycling BMX Racing 43 0.695325
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...} Surfing 38 0.625796
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...} Sport Climbing 37 0.328895
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...} Cycling BMX Freestyle 19 0.252887
{Skateboarding, Beach Volleyball, Cycling BMX Racing, ...} * 0 0
{Badminton, Golf, Diving} Badminton 164 1
{Badminton, Golf, Diving} Golf 115 0.737186
{Badminton, Golf, Diving} Diving 133 0.595778
{Shooting, Archery} Shooting 342 1
{Shooting, Archery} Archery 122 0.418538
{Table Tennis, Tennis, Artistic Gymnastics} Table Tennis 164 1
{Table Tennis, Tennis, Artistic Gymnastics} Tennis 178 0.946405
{Table Tennis, Tennis, Artistic Gymnastics} Artistic Gymnastics 187 0.880643
{Canoe Sprint, Modern Pentathlon} Canoe Sprint 236 1
{Canoe Sprint, Modern Pentathlon} Modern Pentathlon 69 0.259697
{Rhythmic Gymnastics, Artistic Swimming, Trampoline Gymnastics} Rhythmic Gymnastics 95 1
{Rhythmic Gymnastics, Artistic Swimming, Trampoline Gymnastics} Artistic Swimming 98 0.900649
{Rhythmic Gymnastics, Artistic Swimming, Trampoline Gymnastics} Trampoline Gymnastics 31 0.278322