config

poi_interlinking.config.sort_thres = 0.55

Similarity threshold that controls whether toponym tokens are sorted before comparison. Sorting is triggered when the similarity score falls below this threshold.

Type

float
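
To make the trigger condition concrete, here is a minimal sketch of threshold-gated sorting. It assumes difflib.SequenceMatcher as a stand-in similarity measure, and the helper names (similarity, sort_tokens, compare) are hypothetical; the framework's own measure and control flow may differ.

```python
from difflib import SequenceMatcher

SORT_THRES = 0.55  # mirrors poi_interlinking.config.sort_thres


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the framework's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sort_tokens(name: str) -> str:
    """Lower-case a toponym and sort its whitespace-separated tokens."""
    return " ".join(sorted(name.lower().split()))


def compare(a: str, b: str, thres: float = SORT_THRES) -> float:
    """Score two toponyms, rescoring on sorted tokens when the raw score is low."""
    score = similarity(a, b)
    if score < thres:
        # Below the threshold: token order may be hiding a match, so sort first.
        score = similarity(sort_tokens(a), sort_tokens(b))
    return score
```

Raising the threshold makes sorting apply to more pairs; a threshold of 0 disables it entirely in this sketch.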

poi_interlinking.config.seed_no = 13

Seed used by each of the random number generators.

Type

int
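
The effect of a shared seed can be illustrated with the standard library alone. The helper make_rng below is hypothetical; with NumPy or scikit-learn the same value would typically be passed as a seed or random_state argument.

```python
import random

SEED_NO = 13  # mirrors poi_interlinking.config.seed_no


def make_rng(seed: int = SEED_NO) -> random.Random:
    """Return an independent generator seeded with the configured value."""
    return random.Random(seed)


rng_a = make_rng()
rng_b = make_rng()
draws_a = [rng_a.randint(0, 99) for _ in range(5)]
draws_b = [rng_b.randint(0, 99) for _ in range(5)]
# Both generators start from the same seed, so the two sequences are identical.
```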

class poi_interlinking.config.MLConf[source]

This class initializes parameters that correspond to the machine learning part of the framework.

These variables define the parameter grid for GridSearchCV:

Variables
  • SVM_hyperparameters (list) – Defines the search space for SVM.

  • MLP_hyperparameters (dict) – Defines the search space for MLP.

  • DecisionTree_hyperparameters (dict) – Defines the search space for Decision Trees.

  • RandomForest_hyperparameters (dict) – Defines the search space for Random Forests and Extra-Trees.

  • XGBoost_hyperparameters (dict) – Defines the search space for XGBoost.

These variables define the parameter grid for RandomizedSearchCV where continuous distributions are used for continuous parameters (whenever this is feasible):

Variables
  • SVM_hyperparameters_dist (dict) – Defines the search space for SVM.

  • MLP_hyperparameters_dist (dict) – Defines the search space for MLP.

  • DecisionTree_hyperparameters_dist (dict) – Defines the search space for Decision Trees.

  • RandomForest_hyperparameters_dist (dict) – Defines the search space for Random Forests and Extra-Trees.

  • XGBoost_hyperparameters_dist (dict) – Defines the search space for XGBoost.
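
The difference between the two families of variables can be sketched as follows. The parameter names and ranges are illustrative assumptions, not the project's actual search spaces, and the sampling is written with the standard library; with scikit-learn, the second dictionary would usually hold scipy.stats distribution objects instead of callables.

```python
import itertools
import random

# Grid style: every listed combination is evaluated.
svm_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}


def expand_grid(grid):
    """Enumerate the Cartesian product of all listed values."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


# Distribution style: each candidate is drawn at random.
svm_dist = {
    "C": lambda rng: 10 ** rng.uniform(-1, 2),  # log-uniform over roughly [0.1, 100]
    "kernel": lambda rng: rng.choice(["linear", "rbf"]),
}


def sample_candidates(dist, n_iter, seed=13):
    """Draw n_iter parameter settings from the given samplers."""
    rng = random.Random(seed)
    return [{name: draw(rng) for name, draw in dist.items()} for _ in range(n_iter)]


grid_candidates = expand_grid(svm_grid)          # 4 * 2 = 8 fixed settings
random_candidates = sample_candidates(svm_dist, n_iter=5)
```

A grid can only ever try the listed values of C, whereas the sampled candidates can land anywhere in the continuous range.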

kfold_no = 5

The number of outer folds into which the dataset is split for k-fold cross-validation.

Type

int

kfold_inner_parameter = 4

The number of inner folds into which the dataset is split for k-fold cross-validation.

Type

int
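
One common way to combine outer and inner folds is nested cross-validation, where hyperparameters are tuned on the inner folds and the tuned model is scored on the outer test fold. The sketch below assumes that reading; the kfold_indices helper is hypothetical and uses contiguous, unshuffled folds for simplicity.

```python
KFOLD_NO = 5               # outer folds (MLConf.kfold_no)
KFOLD_INNER_PARAMETER = 4  # inner folds (MLConf.kfold_inner_parameter)


def kfold_indices(indices, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    n = len(indices)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


samples = list(range(20))
inner_fits = 0
for outer_train, outer_test in kfold_indices(samples, KFOLD_NO):
    # Hyperparameters would be tuned using outer_train only ...
    for inner_train, inner_val in kfold_indices(outer_train, KFOLD_INNER_PARAMETER):
        inner_fits += 1
    # ... and the tuned model evaluated once on outer_test.
```

With the default values, each hyperparameter candidate is fitted 5 * 4 = 20 times in this scheme.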

n_jobs = 4

Number of parallel jobs to run. A value of -1 uses all available processors.

Type

int
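
The -1 convention can be sketched with a small helper. resolve_n_jobs is a hypothetical name and a simplification: it handles only positive values and -1, whereas parallel backends may accept other negative values as well.

```python
import os


def resolve_n_jobs(n_jobs: int) -> int:
    """Translate an n_jobs setting into a concrete worker count."""
    if n_jobs == -1:
        # os.cpu_count() can return None, so fall back to a single worker.
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError(f"unsupported n_jobs value: {n_jobs}")
    return n_jobs
```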

classification_method = 'lgm'

The group of classification features to use (basic | basic_sorted | lgm).

See also

Features

Details on the supported groups.

Type

str

hyperparams_search_method = 'grid'

Search method to use for finding the best hyperparameters (randomized | grid).

See also

fineTuneClassifiers()

Details on the supported methods.

Type

str

max_iter = 300

Number of iterations that RandomizedSearchCV should execute. It applies only when hyperparams_search_method is set to 'randomized'.

Type

int
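
How the search method and max_iter interact can be sketched by counting the candidates each method evaluates. The candidate_count helper and the example grid are hypothetical, and the randomized branch assumes a search space with continuous distributions, so that the full number of iterations is drawn.

```python
from math import prod


def candidate_count(method: str, grid: dict, max_iter: int = 300) -> int:
    """Return how many hyperparameter settings a search would evaluate."""
    if method == "grid":
        # Exhaustive: one candidate per combination of listed values.
        return prod(len(values) for values in grid.values())
    if method == "randomized":
        # Sampled: max_iter alone decides the number of candidates.
        return max_iter
    raise ValueError(f"unknown search method: {method!r}")


example_grid = {"n_estimators": [100, 250, 500], "max_depth": [10, 50, None]}
grid_total = candidate_count("grid", example_grid)          # 3 * 3 = 9
random_total = candidate_count("randomized", example_grid)  # 300
```

Under 'grid', max_iter has no effect, which matches the note that it applies only to the randomized search.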

max_features_to_show = 10

Number of ranked features to print.

Type

int

classifiers = ['RandomForest']

Defines the classifiers to apply when the code is executed. Accepted values are:

  • SVM

  • DecisionTree

  • RandomForest

  • ExtraTrees

  • XGBoost

  • MLP

Type

list of str
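
Because the setting is a free-form list of strings, a typo in a name is easy to miss. A minimal check against the accepted values listed above might look like this; validate_classifiers is a hypothetical helper, not part of the framework.

```python
ACCEPTED_CLASSIFIERS = (
    "SVM", "DecisionTree", "RandomForest", "ExtraTrees", "XGBoost", "MLP",
)


def validate_classifiers(names):
    """Return the names unchanged, or raise if any is not an accepted value."""
    unknown = [name for name in names if name not in ACCEPTED_CLASSIFIERS]
    if unknown:
        raise ValueError(f"Unsupported classifier(s): {', '.join(unknown)}")
    return list(names)


selected = validate_classifiers(["RandomForest", "XGBoost"])
```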

score = 'roc_auc'

The metric to optimize during hyperparameter tuning. Valid values are the predefined scoring names in scikit-learn.

Type

str
