config
interlinking.config.default_data_path = 'data'
The folder name, relative to the root path, that contains all required input files, e.g., train/test datasets and frequent terms.
- Type
str
interlinking.config.fieldnames = ['s1', 's2', 'status', 'c1', 'c2', 'a1', 'a2', 'cc1', 'cc2']
A list of names assigned to the columns of the train/test dataset. If the dataset file already contains a header row, set this to None.
- Type
list of str
interlinking.config.use_cols = {'label': 'status', 's1': 's1', 's2': 's2'}
A dictionary that maps the columns actually used (the label and the two strings, s1 and s2) to their names in the dataset.
interlinking.config.delimiter = '\t'
The delimiter used to separate columns in CSV input files.
- Type
char
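As a rough illustration of how fieldnames, use_cols and delimiter fit together, the sketch below reads a tab-separated sample with Python's standard csv module and keeps only the columns named in use_cols. The sample rows and the load_pairs helper are hypothetical, and the framework itself may load its data differently.

```python
import csv
import io

# Values copied from the defaults documented above.
fieldnames = ['s1', 's2', 'status', 'c1', 'c2', 'a1', 'a2', 'cc1', 'cc2']
use_cols = {'label': 'status', 's1': 's1', 's2': 's2'}
delimiter = '\t'

# Hypothetical two-row sample standing in for a real train/test file.
sample = (
    "Athens\tAthina\tTRUE\tc\tc\ta\ta\tGR\tGR\n"
    "Paris\tLondon\tFALSE\tc\tc\ta\ta\tFR\tGB\n"
)


def load_pairs(handle):
    """Return one dict per row, keyed by the logical names in use_cols."""
    # With fieldnames=None, csv.DictReader takes the names from the first row.
    reader = csv.DictReader(handle, fieldnames=fieldnames, delimiter=delimiter)
    return [{key: row[col] for key, col in use_cols.items()} for row in reader]


pairs = load_pairs(io.StringIO(sample))
```

Here the header-less sample is given explicit column names; a file that already has a header row would be read with fieldnames set to None.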
interlinking.config.sort_thres = 0.55
The similarity threshold that decides whether sorting of toponym tokens is applied. Sorting is triggered when the similarity score falls below this threshold.
- Type
float
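The thresholding idea can be sketched as follows. This is only an approximation under stated assumptions: difflib.SequenceMatcher stands in for whatever similarity function the framework applies, and the similarity, sort_tokens and compare helpers are hypothetical names.

```python
from difflib import SequenceMatcher

sort_thres = 0.55  # default documented above


def similarity(a, b):
    """Stand-in similarity in [0, 1]; not the framework's own measure."""
    return SequenceMatcher(None, a, b).ratio()


def sort_tokens(text):
    """Lower-case the string and put its whitespace-separated tokens in order."""
    return ' '.join(sorted(text.lower().split()))


def compare(a, b, thres=sort_thres):
    """Return (score, sorted_applied) for a pair of toponyms."""
    score = similarity(a.lower(), b.lower())
    if score < thres:
        # Low score: retry on token-sorted strings to undo word reordering.
        return similarity(sort_tokens(a), sort_tokens(b)), True
    return score, False


reordered = compare("New York", "York New")
close = compare("Athens", "Athena")
```

In this sketch the reordered pair falls below the threshold, so it is compared again after sorting, while the second pair is already similar enough and is left untouched.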
class interlinking.config.MLConf
This class initializes the parameters that correspond to the machine learning part of the framework.
- Variables
opt_values (dict of dicts) – The learned parameters of the LGM-Sim meta-similarity function: \(θ_{split}, w_b, w_m, w_f\).
clf_custom_params (dict of dicts) – Custom hyper-parameters to use for the specified classifiers. These parameters are used when the evaluate command is executed in the CLI.
These variables define the parameter grid for GridSearchCV:
- Variables
SVM_hyperparameters (list) – Defines the search space for SVM.
MLP_hyperparameters (dict) – Defines the search space for MLP.
DecisionTree_hyperparameters (dict) – Defines the search space for Decision Trees.
RandomForest_hyperparameters (dict) – Defines the search space for Random Forests and Extra-Trees.
XGBoost_hyperparameters (dict) – Defines the search space for XGBoost.
These variables define the parameter distributions for RandomizedSearchCV, where continuous distributions are used for continuous parameters whenever this is feasible:
- Variables
SVM_hyperparameters_dist (dict) – Defines the search space for SVM.
MLP_hyperparameters_dist (dict) – Defines the search space for MLP.
DecisionTree_hyperparameters_dist (dict) – Defines the search space for Decision Trees.
RandomForest_hyperparameters_dist (dict) – Defines the search space for Random Forests and Extra-Trees.
XGBoost_hyperparameters_dist (dict) – Defines the search space for XGBoost.
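To make the difference between the two families of variables concrete, the following standard-library sketch contrasts an exhaustive grid with random draws from distributions. The search spaces svm_grid and svm_dist are hypothetical and are not the values stored in MLConf; the helpers only mimic what a grid search and a randomized search enumerate, without calling scikit-learn.

```python
import itertools
import random

# Hypothetical search spaces; the real ones are defined in MLConf.
svm_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
svm_dist = {
    # Continuous parameter: log-uniform draw on [0.01, 100].
    'C': lambda rng: 10 ** rng.uniform(-2, 2),
    # Categorical parameter: uniform choice among the options.
    'kernel': lambda rng: rng.choice(['linear', 'rbf']),
}


def grid_candidates(grid):
    """Enumerate every combination of the listed values."""
    keys = sorted(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


def random_candidates(dist, n_iter, seed=0):
    """Draw n_iter independent candidates from the given distributions."""
    rng = random.Random(seed)
    return [{name: draw(rng) for name, draw in dist.items()} for _ in range(n_iter)]


grid_points = grid_candidates(svm_grid)        # 3 * 2 = 6 combinations
random_points = random_candidates(svm_dist, n_iter=4)
```

The grid grows multiplicatively with every added value, whereas the randomized variant evaluates a fixed number of candidates regardless of how fine-grained the continuous ranges are.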
kfold_no = 5
The number of outer folds into which the dataset is split for k-fold cross-validation.
- Type
int
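A minimal sketch of what splitting into kfold_no folds means, written with plain Python rather than the scikit-learn splitters the framework presumably relies on. The kfold_indices helper is hypothetical, and it does not shuffle or stratify.

```python
kfold_no = 5  # default documented above


def kfold_indices(n_samples, k=kfold_no):
    """Yield (train, test) index lists for k consecutive folds."""
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(kfold_indices(12))
```

Each sample appears in exactly one test fold, so every fold serves once for evaluation while the remaining folds are used for training.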
n_jobs = 4
The number of parallel jobs to run. A value of -1 uses all available processors.
- Type
int
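The meaning of -1 can be illustrated with a small helper that turns the setting into a concrete worker count. The resolve_n_jobs function is hypothetical, and rejecting values other than -1 and positive integers is an assumption of this sketch, not documented behavior.

```python
import os


def resolve_n_jobs(n_jobs=4):
    """Translate an n_jobs setting into a concrete worker count."""
    if n_jobs == -1:
        # os.cpu_count() may return None, so fall back to a single worker.
        return os.cpu_count() or 1
    if n_jobs >= 1:
        return n_jobs
    # Assumption of this sketch: any other value is rejected outright.
    raise ValueError(f"unsupported n_jobs value: {n_jobs}")


workers = resolve_n_jobs(-1)
```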
classification_method = 'lgm'
The group of features to use for classification (basic | basic_sorted | lgm).
See also
Features
Details on available inputs.
- Type
str
hyperparams_search_method = 'randomized'
The search method to use for finding the best hyper-parameters (randomized | grid).
See also
fineTuneClassifiers()
Details on available inputs.
- Type
str
classifiers = ['RandomForest']
The classifiers to apply on code execution. Accepted values are:
SVM
DecisionTree
RandomForest
ExtraTrees
XGBoost
MLP.
- Type
list of str
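Because classification_method, hyperparams_search_method and classifiers each accept only a fixed set of values, a typo is easy to catch before a long run starts. The check below is a hypothetical helper, not part of the framework; the accepted values are copied from the descriptions above, and case-sensitive matching is an assumption.

```python
# Accepted values as listed in the attribute descriptions above.
ACCEPTED = {
    'classification_method': {'basic', 'basic_sorted', 'lgm'},
    'hyperparams_search_method': {'randomized', 'grid'},
    'classifiers': {'SVM', 'DecisionTree', 'RandomForest',
                    'ExtraTrees', 'XGBoost', 'MLP'},
}


def validate_settings(classification_method, hyperparams_search_method, classifiers):
    """Return a list of human-readable problems; empty means all values are accepted."""
    problems = []
    if classification_method not in ACCEPTED['classification_method']:
        problems.append(f"unknown classification method: {classification_method!r}")
    if hyperparams_search_method not in ACCEPTED['hyperparams_search_method']:
        problems.append(f"unknown search method: {hyperparams_search_method!r}")
    for name in classifiers:
        # Assumption: names are matched case-sensitively.
        if name not in ACCEPTED['classifiers']:
            problems.append(f"unknown classifier: {name!r}")
    return problems


ok = validate_settings('lgm', 'randomized', ['RandomForest'])
bad = validate_settings('lgm', 'grid', ['randomforest'])
```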
score = 'accuracy'
The metric to optimize during hyper-parameter tuning. Valid values are the predefined scoring names listed in the scikit-learn documentation.
- Type
str