learning

Hyperparameters

class poi_interlinking.learning.hyperparam_tuning.ParamTuning

This class provides the main methods for selecting and fine-tuning hyperparameters, and for training and testing the best classifier for toponym matching. The following classifiers are examined:

Support Vector Machine (SVM)

Decision Trees

Multi-Layer Perceptron (MLP)

Random Forest

Extra-Trees

eXtreme Gradient Boosting (XGBoost)

fineTuneClassifiers(X, y)

Search over specified parameter values for various estimators/classifiers and choose the best one.

This method searches over the specified values and selects the classifier that achieves the highest average accuracy across all evaluations. The supported search methods are:

GridSearchCV: Exhaustive search over the specified parameter values for the supported estimators. The following variables are defined in MLConf:

MLP_hyperparameters

RandomForests_hyperparameters

XGBoost_hyperparameters

SVM_hyperparameters

DecisionTree_hyperparameters

RandomizedSearchCV: Randomized search over a space of continuous distributions. max_iter sets the number of parameter settings that are sampled, trading off runtime against the quality of the solution. The following variables are defined in MLConf:

MLP_hyperparameters_dist

RandomForests_hyperparameters_dist

XGBoost_hyperparameters_dist

SVM_hyperparameters_dist

DecisionTree_hyperparameters_dist

Parameters

X (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.

y (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

Returns

out – A dictionary with two keys: accuracy, i.e., the score achieved by the best model, and classifier, i.e., the name of that model.

Return type

dict of {str: int, str: str}
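The selection procedure described above can be sketched with scikit-learn directly. This is a minimal illustration, not the package's implementation: the `candidates` dictionary and the `pick_best` helper are hypothetical names, and the small parameter grids merely stand in for the values that MLConf defines.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins for the grids defined in MLConf.
candidates = {
    "DecisionTree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 8]},
    ),
    "RandomForest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [10, 50], "max_depth": [4, None]},
    ),
}


def pick_best(X, y, candidates, cv=3):
    """Grid-search every candidate and keep the one with the best mean accuracy."""
    best = {"accuracy": -1.0, "classifier": None}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, scoring="accuracy", cv=cv)
        search.fit(X, y)
        # best_score_ is the mean cross-validated accuracy of the best setting.
        if search.best_score_ > best["accuracy"]:
            best = {"accuracy": search.best_score_, "classifier": name}
    return best


# Synthetic binary data, used only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
result = pick_best(X, y, candidates)
print(result)
```

A randomized variant would swap `GridSearchCV` for `RandomizedSearchCV` and pass distributions instead of lists; in scikit-learn the number of sampled settings is controlled by the `n_iter` argument.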

trainClassifier(X_train, y_train, model)

Build a classifier from the training set (X_train, y_train).

Parameters

X_train (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.

y_train (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

model (classifier object) – An instance of a classifier.

Returns

It returns a trained classifier.

Return type

classifier object

testClassifier(X_test, y_test, model)

Evaluate a classifier on a testing set (X_test, y_test).

Parameters

X_test (array-like or sparse matrix, shape = [n_samples, n_features]) – The test input samples.

y_test (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.

model (classifier object) – A trained classifier.

Returns

Returns the computed metrics, i.e., accuracy, precision, recall and f1, for the specified model on the test dataset.

Return type

tuple of (float, float, float, float)
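As a rough illustration of the train-then-test workflow these two methods cover, the sketch below fits a classifier and computes the same four metrics with scikit-learn. The functions `train` and `evaluate` are hypothetical stand-ins, not the methods of ParamTuning, and the data is synthetic and binary.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train(X_train, y_train, model):
    """Fit the given classifier on the training set and return it."""
    model.fit(X_train, y_train)
    return model


def evaluate(X_test, y_test, model):
    """Return (accuracy, precision, recall, f1) on the test set."""
    y_pred = model.predict(X_test)
    return (
        accuracy_score(y_test, y_pred),
        precision_score(y_test, y_pred),
        recall_score(y_test, y_pred),
        f1_score(y_test, y_pred),
    )


# Synthetic binary data, split into training and test parts.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = train(X_train, y_train, DecisionTreeClassifier(max_depth=4, random_state=0))
metrics = evaluate(X_test, y_test, model)
print(metrics)
```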

Similarity thresholds and weights

poi_interlinking.learning.parameters.learn_thres(fname, sim_group='basic')

Learn the thresholds of the supported similarity metrics that achieve the highest accuracy on the input data.

Parameters

fname (str) – Name of the input file on which to search for optimal thresholds.

sim_group (str) – The group of metrics to search for optimal thresholds. This applies to all groups except for lgm.

See also

Features
Details on the supported groups.
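The idea of learning a threshold that maximizes accuracy can be illustrated with a simple sweep. This is a sketch under stated assumptions, not the routine used by learn_thres: `best_threshold` is a hypothetical helper, the scores and labels are made-up toy values, and a pair is assumed to be predicted as a match when its similarity score is at or above the threshold.

```python
def best_threshold(scores, labels, candidates):
    """Return the (threshold, accuracy) pair with the highest accuracy.

    A pair is predicted as a match when its score is >= the threshold.
    On ties, the smallest candidate threshold is kept.
    """
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum((s >= t) == bool(l) for s, l in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy similarity scores for six candidate pairs and their true labels.
scores = [0.95, 0.80, 0.62, 0.55, 0.40, 0.30]
labels = [1, 1, 1, 0, 0, 0]

# Candidate thresholds 0.05, 0.10, ..., 0.95.
candidates = [i / 100 for i in range(5, 100, 5)]

threshold, accuracy = best_threshold(scores, labels, candidates)
print(threshold, accuracy)  # 0.6 1.0
```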

poi_interlinking.learning.parameters.learn_params_for_lgm(fname, encoding)

Learn the thresholds and weights for the lgm group of similarity metrics that achieve the highest accuracy on the input data.

Parameters

fname (str) – Name of the input file on which to search for optimal thresholds and weights.

encoding (str) – The encoding of fname. Valid options are latin or global.

See also

Features
Details on the supported groups.
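Learning weights together with a threshold can be pictured as a joint grid search. The sketch below is purely illustrative and makes several assumptions that the documentation does not confirm: each pair is described by three component similarity scores, the final score is their weighted sum with weights that sum to 1, and a pair is a match when that score reaches the threshold. The helper names and toy data are hypothetical.

```python
from itertools import product


def weighted_score(parts, weights):
    """Combine component scores into one score using the given weights."""
    return sum(p * w for p, w in zip(parts, weights))


def accuracy_for(samples, labels, weights, threshold):
    """Accuracy when predicting a match for weighted scores >= threshold."""
    correct = sum(
        (weighted_score(parts, weights) >= threshold) == bool(label)
        for parts, label in zip(samples, labels)
    )
    return correct / len(labels)


def search_weights_and_threshold(samples, labels, steps=10):
    """Exhaustively try weight triples summing to 1 and thresholds in [0, 1]."""
    best = (None, None, -1.0)
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        for k in range(steps + 1):
            threshold = k / steps
            acc = accuracy_for(samples, labels, weights, threshold)
            if acc > best[2]:
                best = (weights, threshold, acc)
    return best


# Toy data: three hypothetical component scores per pair, plus true labels.
samples = [(0.9, 0.2, 0.5), (0.8, 0.1, 0.4), (0.3, 0.9, 0.5), (0.2, 0.8, 0.6)]
labels = [1, 1, 0, 0]

weights, threshold, accuracy = search_weights_and_threshold(samples, labels)
print(weights, threshold, accuracy)
```

An exhaustive grid like this grows quickly with the number of weights and steps, so it is only practical for a handful of components.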
