config
poi_interlinking.config.sort_thres = 0.55
Similarity threshold that decides whether toponym tokens are sorted before comparison. Sorting is triggered when the similarity score falls below this threshold.
- Type
float
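The role of the threshold can be sketched as follows: two toponyms are compared as given, and only if the score is below sort_thres are their tokens sorted and compared again. This is a minimal illustration, not the framework's code. It assumes difflib.SequenceMatcher as a stand-in similarity measure, and the helpers similarity, sort_tokens and compare_toponyms are hypothetical names that do not come from poi_interlinking.

```python
from difflib import SequenceMatcher

SORT_THRES = 0.55  # mirrors poi_interlinking.config.sort_thres


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the framework's own measures may differ."""
    return SequenceMatcher(None, a, b).ratio()


def sort_tokens(name: str) -> str:
    """Lowercase a toponym and order its whitespace-separated tokens alphabetically."""
    return " ".join(sorted(name.lower().split()))


def compare_toponyms(a: str, b: str, thres: float = SORT_THRES) -> float:
    """Score two toponyms, sorting their tokens only when the raw score is below thres."""
    score = similarity(a.lower(), b.lower())
    if score < thres:
        # Low raw score: token order may be the cause, so compare the sorted forms.
        score = similarity(sort_tokens(a), sort_tokens(b))
    return score
```

With this stand-in measure, compare_toponyms("Athens Hotel", "Hotel Athens") scores the raw strings below the threshold, so the tokens are sorted and the two names then compare as identical.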
poi_interlinking.config.seed_no = 13
Seed used by each of the random number generators.
- Type
int
class poi_interlinking.config.MLConf
This class initializes the parameters that correspond to the machine learning part of the framework.
These variables define the parameter grid for GridSearchCV:
- Variables
SVM_hyperparameters (list) – Defines the search space for SVM.
MLP_hyperparameters (dict) – Defines the search space for MLP.
DecisionTree_hyperparameters (dict) – Defines the search space for Decision Trees.
RandomForest_hyperparameters (dict) – Defines the search space for Random Forests and Extra-Trees.
XGBoost_hyperparameters (dict) – Defines the search space for XGBoost.
These variables define the parameter grid for RandomizedSearchCV, where continuous distributions are used for continuous parameters (whenever this is feasible):
- Variables
SVM_hyperparameters_dist (dict) – Defines the search space for SVM.
MLP_hyperparameters_dist (dict) – Defines the search space for MLP.
DecisionTree_hyperparameters_dist (dict) – Defines the search space for Decision Trees.
RandomForest_hyperparameters_dist (dict) – Defines the search space for Random Forests and Extra-Trees.
XGBoost_hyperparameters_dist (dict) – Defines the search space for XGBoost.
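The difference between the two families of variables can be illustrated without scikit-learn. The sketch below uses made-up search spaces, since the actual values stored in MLConf are not shown on this page. A grid lists discrete values and is enumerated exhaustively, whereas a randomized search draws a fixed number of candidates and can sample continuous parameters from a distribution. The names example_grid, example_dist, expand_grid and sample_candidates are hypothetical.

```python
import itertools
import random

# Hypothetical search spaces; the real MLConf values are not reproduced here.
example_grid = {"max_depth": [5, 10, 50], "n_estimators": [100, 250]}
example_dist = {
    "max_depth": [5, 10, 50],                             # discrete: pick one value
    "learning_rate": lambda rng: rng.uniform(0.01, 0.3),  # continuous: draw a float
}


def expand_grid(grid):
    """Return every combination in the grid, as an exhaustive search would try them."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def sample_candidates(dist, n_iter, seed):
    """Draw n_iter candidates; lists are sampled uniformly, callables draw a value."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_iter):
        candidates.append({
            name: spec(rng) if callable(spec) else rng.choice(spec)
            for name, spec in dist.items()
        })
    return candidates
```

In this sketch the grid size grows multiplicatively with each added parameter, while the randomized variant always evaluates exactly n_iter candidates, and a fixed seed makes the draws repeatable.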
kfold_no = 5
The number of outer folds into which the dataset is split for the k-fold cross-validation.
- Type
int
kfold_inner_parameter = 4
The number of inner folds into which the dataset is split for the k-fold cross-validation.
- Type
int
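To see how the two fold counts interact, the sketch below assumes the outer and inner folds are nested in the usual way, with the inner folds subdividing each outer training split. That nesting is an assumption, as this page does not spell it out. The helpers kfold_indices and nested_split_sizes are hypothetical, use contiguous unshuffled folds, and only report split sizes.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous (train, test) index pairs."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        # The first `extra` folds take one additional sample each.
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


def nested_split_sizes(n_samples, kfold_no=5, kfold_inner_parameter=4):
    """For each outer fold, report its test size and the inner validation sizes."""
    report = []
    for outer_train, outer_test in kfold_indices(n_samples, kfold_no):
        inner = kfold_indices(len(outer_train), kfold_inner_parameter)
        report.append((len(outer_test), [len(val) for _, val in inner]))
    return report
```

Under this assumption and the default values, each model configuration is fitted kfold_no × kfold_inner_parameter = 20 times during tuning, which is worth keeping in mind when choosing n_jobs.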
n_jobs = 4
Number of parallel jobs to run. A value of -1 uses all available processors.
- Type
int
classification_method = 'lgm'
The group of classification features to use (basic | basic_sorted | lgm).
See also
Features
Details on the supported groups.
- Type
str
hyperparams_search_method = 'grid'
Search method to use for finding the best hyperparameters (randomized | grid).
See also
fineTuneClassifiers()
Details on the supported methods.
- Type
str
max_iter = 300
Number of iterations that RandomizedSearchCV should execute. It applies only when hyperparams_search_method equals 'randomized'.
- Type
int
max_features_to_show = 10
Number of ranked features to print.
- Type
int
classifiers = ['RandomForest']
The classifiers to apply when the code is executed. Accepted values are:
SVM
DecisionTree
RandomForest
ExtraTrees
XGBoost
MLP
- Type
list of str
score = 'roc_auc'
The metric to optimize during hyperparameter tuning. Valid values are the predefined scoring names listed in the scikit-learn documentation.
- Type
str
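Because several of the settings above accept only a fixed set of strings, a small check before a long run can catch typos early. The sketch below is an illustration only: validate_settings and the three constant sets are hypothetical names, and the accepted values are copied from the descriptions on this page.

```python
ACCEPTED_METHODS = {"basic", "basic_sorted", "lgm"}
ACCEPTED_SEARCH = {"randomized", "grid"}
ACCEPTED_CLASSIFIERS = {
    "SVM", "DecisionTree", "RandomForest", "ExtraTrees", "XGBoost", "MLP",
}


def validate_settings(classification_method, hyperparams_search_method, classifiers):
    """Return a list of problems found; an empty list means all values are accepted."""
    problems = []
    if classification_method not in ACCEPTED_METHODS:
        problems.append(f"unknown classification_method: {classification_method!r}")
    if hyperparams_search_method not in ACCEPTED_SEARCH:
        problems.append(
            f"unknown hyperparams_search_method: {hyperparams_search_method!r}"
        )
    unknown = [name for name in classifiers if name not in ACCEPTED_CLASSIFIERS]
    if unknown:
        problems.append(f"unknown classifiers: {unknown}")
    return problems
```

Called with the defaults documented here, validate_settings("lgm", "grid", ["RandomForest"]) returns an empty list.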