learning

Hyperparameters

class poi_interlinking.learning.hyperparam_tuning.ParamTuning[source]

This class provides the main methods for selecting and fine-tuning hyperparameters, and for training and testing the best classifier for toponym matching. The following classifiers are examined:

  • Support Vector Machine (SVM)

  • Decision Trees

  • Multi-Layer Perceptron (MLP)

  • Random Forest

  • Extra-Trees

  • eXtreme Gradient Boosting (XGBoost)

fineTuneClassifiers(X, y)[source]

Search over specified parameter values for various estimators/classifiers and choose the best one.

This method searches over the specified parameter values and selects the classifier that achieves the best average accuracy score across all evaluations. The supported search methods are:

  • GridSearchCV: Exhaustive search over specified parameter values for supported estimators. The following variables are defined in MLConf:

      • MLP_hyperparameters

      • RandomForests_hyperparameters

      • XGBoost_hyperparameters

      • SVM_hyperparameters

      • DecisionTree_hyperparameters

  • RandomizedSearchCV: Randomized search over a continuous distribution space. max_iter defines the number of parameter settings that are sampled, and so trades off runtime against the quality of the solution. The following variables are defined in MLConf:

      • MLP_hyperparameters_dist

      • RandomForests_hyperparameters_dist

      • XGBoost_hyperparameters_dist

      • SVM_hyperparameters_dist

      • DecisionTree_hyperparameters_dist
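The difference between the two search methods can be illustrated with a minimal scikit-learn sketch. It is not the implementation of fineTuneClassifiers: the toy data, the grid and dist dictionaries and the tune helper are hypothetical stand-ins for the values defined in MLConf, and only a decision tree is tuned. scikit-learn's RandomizedSearchCV takes the number of sampled settings as n_iter; that this corresponds to the max_iter setting is an assumption.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the similarity features of toponym pairs.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical stand-ins for MLConf.DecisionTree_hyperparameters and
# MLConf.DecisionTree_hyperparameters_dist; the real values live in MLConf.
grid = {"max_depth": [2, 4, 8], "min_samples_split": [2, 10]}
dist = {"max_depth": randint(2, 16), "min_samples_split": randint(2, 20)}


def tune(search, name):
    """Fit a search object and report its best mean cross-validated accuracy."""
    search.fit(X, y)
    return {
        "accuracy": search.best_score_,
        "classifier": name,
        "n_candidates": len(search.cv_results_["params"]),
    }


# Exhaustive search: every combination in the grid is evaluated (3 * 2 = 6).
grid_result = tune(
    GridSearchCV(DecisionTreeClassifier(random_state=0), grid,
                 scoring="accuracy", cv=5),
    "DecisionTree",
)

# Randomized search: only n_iter settings are sampled from the distributions.
rand_result = tune(
    RandomizedSearchCV(DecisionTreeClassifier(random_state=0), dist, n_iter=5,
                       scoring="accuracy", cv=5, random_state=0),
    "DecisionTree",
)
```

The grid search cost grows with the product of the list lengths, whereas the randomized search cost is fixed by the number of sampled settings, which is why the latter is usually preferred for large or continuous search spaces.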

Parameters
  • X (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.

  • y (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

Returns

out – A dictionary with the keys accuracy, i.e., the score achieved by the best classifier, and classifier, i.e., the name of that model.

Return type

dict of {str: int, str: str}

trainClassifier(X_train, y_train, model)[source]

Build a classifier from the training set (X_train, y_train).

Parameters
  • X_train (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.

  • y_train (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

  • model (classifier object) – An instance of a classifier.

Returns

It returns a trained classifier.

Return type

classifier object

testClassifier(X_test, y_test, model)[source]

Evaluate a classifier on a testing set (X_test, y_test).

Parameters
  • X_test (array-like or sparse matrix, shape = [n_samples, n_features]) – The test input samples.

  • y_test (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

  • model (classifier object) – A trained classifier.

Returns

Returns the computed metrics, i.e., accuracy, precision, recall and f1, for the specified model on the test dataset.

Return type

tuple of (float, float, float, float)

Similarity thresholds and weights

poi_interlinking.learning.parameters.learn_thres(fname, sim_group='basic')[source]

Learn the optimal thresholds of the supported similarity metrics, i.e., the thresholds that achieve the highest accuracy on the input data.

Parameters
  • fname (str) – Input filename to search for optimal thresholds.

  • sim_group (str) – The group of metrics to search for optimal thresholds. This applies to all groups except for lgm.

See also

Features

Details on the supported groups.
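The idea of learning a threshold that maximises accuracy can be sketched for a single similarity metric as follows. This is a simplified illustration under stated assumptions, not the logic of learn_thres: the function learn_threshold, the candidate grid and the sample scores are all hypothetical, and a pair is predicted as a match when its score is at or above the threshold.

```python
def learn_threshold(scores, labels, candidates):
    """Return the (threshold, accuracy) pair with the highest accuracy.

    A pair is predicted as a match when its similarity score is greater
    than or equal to the threshold. Ties keep the lowest threshold.
    """
    best_thres, best_acc = None, -1.0
    for thres in candidates:
        correct = sum((s >= thres) == bool(l) for s, l in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_thres, best_acc = thres, acc
    return best_thres, best_acc


# Hypothetical similarity scores for six toponym pairs and their true labels
# (1 = the two names refer to the same place, 0 = they do not).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]

# Candidate thresholds 0.1, 0.2, ..., 0.9.
candidates = [i / 10 for i in range(1, 10)]

best_thres, best_acc = learn_threshold(scores, labels, candidates)
```

With several metrics in a group, the same search would be repeated per metric, each on its own score column.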

poi_interlinking.learning.parameters.learn_params_for_lgm(fname, encoding)[source]

Learn the optimal thresholds and weights for the lgm group of similarity metrics, i.e., those that achieve the highest accuracy on the input data.

Parameters
  • fname (str) – Input filename to search for optimal thresholds.

  • encoding (str) – The encoding of the input file fname. Valid options are latin or global.

See also

Features

Details on the supported groups.
