tune hyperparameters

class interlinking.hyperparam_tuning.ParamTuning

This class provides the main methods for selecting and fine-tuning hyperparameters, and for training and testing the best classifier for toponym matching. The following classifiers are examined:

  • Support Vector Machine (SVM)

  • Decision Trees

  • Multi-Layer Perceptron (MLP)

  • Random Forest

  • Extra-Trees

  • eXtreme Gradient Boosting (XGBoost)

fineTuneClassifiers(X, y)

Search over specified parameter values for various estimators/classifiers and choose the best one.

This method searches over the specified parameter values and selects the classifier that achieves the best average accuracy score across all evaluations. The supported search methods are:

  • GridSearchCV: Exhaustive search over specified parameter values for supported estimators. The following variables are defined in MLConf():

      • MLP_hyperparameters

      • RandomForests_hyperparameters

      • XGBoost_hyperparameters

      • SVM_hyperparameters

      • DecisionTree_hyperparameters

  • RandomizedSearchCV: Randomized search over a continuous distribution space. max_iter defines the number of parameter settings that are sampled and so trades off runtime against the quality of the solution. The following variables are defined in MLConf():

      • MLP_hyperparameters_dist

      • RandomForests_hyperparameters_dist

      • XGBoost_hyperparameters_dist

      • SVM_hyperparameters_dist

      • DecisionTree_hyperparameters_dist

Parameters
  • X (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.

  • y (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

Returns

out – A dictionary with two keys: accuracy, i.e., the accuracy score achieved, and classifier, i.e., the name of the model that achieved it.

Return type

dict of {str: int, str: str}
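
The selection logic described for fineTuneClassifiers can be sketched roughly as follows. This is a minimal illustration built directly on scikit-learn, not the library's actual implementation: the param_grids dictionary, the fine_tune helper and the synthetic data are hypothetical stand-ins for the variables defined in MLConf(). Note that scikit-learn's RandomizedSearchCV takes the number of sampled settings as n_iter; how max_iter maps onto it is assumed here.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical grids standing in for the *_hyperparameters variables in MLConf().
param_grids = {
    "DecisionTree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 8]},
    ),
    "RandomForest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [10, 50], "max_depth": [4, 8]},
    ),
}


def fine_tune(X, y, cv=3):
    """Grid-search every candidate and keep the one with the best mean CV accuracy."""
    best = {"accuracy": 0.0, "classifier": None}
    for name, (estimator, grid) in param_grids.items():
        search = GridSearchCV(estimator, grid, scoring="accuracy", cv=cv)
        search.fit(X, y)
        if search.best_score_ > best["accuracy"]:
            best = {"accuracy": search.best_score_, "classifier": name}
    return best


# Synthetic binary data in place of real toponym-pair features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
result = fine_tune(X, y)

# Randomized variant: sample 5 settings from distributions instead of a fixed grid.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": randint(10, 100), "max_depth": randint(2, 10)},
    n_iter=5,
    scoring="accuracy",
    cv=3,
    random_state=0,
)
random_search.fit(X, y)
```

After running the sketch, result holds the accuracy and classifier keys, and random_search.best_params_ holds the best of the sampled settings.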

trainClassifier(X_train, y_train, model)

Build a classifier from the training set (X_train, y_train).

Parameters
  • X_train (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.

  • y_train (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

  • model (classifier object) – An instance of a classifier.

Returns

It returns a trained classifier.

Return type

classifier object
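
As a rough illustration of what trainClassifier does, the sketch below fits a scikit-learn estimator and returns it. The train_classifier helper and the synthetic data are assumptions made for this example; the actual method may perform additional steps.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def train_classifier(X_train, y_train, model):
    """Fit the given classifier on the training set and return it."""
    model.fit(X_train, y_train)
    return model


# Synthetic data in place of real toponym-pair features.
X_train, y_train = make_classification(n_samples=100, n_features=6, random_state=0)
trained = train_classifier(
    X_train, y_train, DecisionTreeClassifier(max_depth=3, random_state=0)
)
```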

testClassifier(X_test, y_test, model)

Evaluate a classifier on a testing set (X_test, y_test).

Parameters
  • X_test (array-like or sparse matrix, shape = [n_samples, n_features]) – The test input samples.

  • y_test (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

  • model (classifier object) – A trained classifier.

Returns

Returns the computed metrics, i.e., accuracy, precision, recall and f1, for the specified model on the test dataset.

Return type

tuple of (float, float, float, float)
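
The four metrics returned by testClassifier can be computed with scikit-learn along the following lines. This is a sketch under assumptions: the evaluate_classifier helper is hypothetical, binary labels (match / no match) are assumed, and the real method may compute or average the scores differently.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_classifier(X_test, y_test, model):
    """Return (accuracy, precision, recall, f1) of a trained model on a test set."""
    y_pred = model.predict(X_test)
    return (
        accuracy_score(y_test, y_pred),
        precision_score(y_test, y_pred),
        recall_score(y_test, y_pred),
        f1_score(y_test, y_pred),
    )


# Synthetic binary data in place of real toponym-pair features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
metrics = evaluate_classifier(X_test, y_test, model)
```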
