learning
Hyperparameters
- class poi_interlinking.learning.hyperparam_tuning.ParamTuning
This class provides the main methods for selecting classifiers, fine-tuning their hyperparameters, and training and testing the best classifier for toponym matching. The following classifiers are examined:
Support Vector Machine (SVM)
Decision Trees
Multi-Layer Perceptron (MLP)
Random Forest
Extra-Trees
eXtreme Gradient Boosting (XGBoost)
fineTuneClassifiers(X, y)
Search over specified parameter values for various estimators/classifiers and choose the best one.
This method searches over the specified values and selects the classifier that achieves the best average accuracy score across all evaluations. The supported search methods are:
GridSearchCV: Exhaustive search over specified parameter values for supported estimators. The following variables are defined in MLConf:
MLP_hyperparameters
RandomForests_hyperparameters
XGBoost_hyperparameters
SVM_hyperparameters
DecisionTree_hyperparameters
RandomizedSearchCV: Randomized search over a continuous distribution space. max_iter defines the number of parameter settings that are sampled, so it trades off runtime against the quality of the solution. The following variables are defined in MLConf:
MLP_hyperparameters_dist
RandomForests_hyperparameters_dist
XGBoost_hyperparameters_dist
SVM_hyperparameters_dist
DecisionTree_hyperparameters_dist
- Parameters
X (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.
y (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.
- Returns
out – A dictionary with two keys: accuracy, i.e., the score used for the comparison, and classifier, i.e., the name of the corresponding model.
- Return type
dict of {str: int, str: str}
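The difference between the two search methods, and the final selection step, can be illustrated with a minimal standard-library sketch. It is not the code of ParamTuning: the search spaces, the fold scores and the helper names (grid_candidates, random_candidates, pick_best) are hypothetical, and the actual spaces are the MLConf variables listed above. Note that in scikit-learn itself the number of sampled settings is passed to RandomizedSearchCV as n_iter.

```python
import itertools
import random
from statistics import mean

# Hypothetical search spaces; the real ones are defined in MLConf.
param_grid = {"max_depth": [5, 10, 50], "n_estimators": [100, 250]}
param_dist = {
    # Each distribution is modelled as a callable taking a random generator.
    "max_depth": lambda rng: rng.randint(5, 50),
    "learning_rate": lambda rng: rng.uniform(0.01, 0.3),
}


def grid_candidates(grid):
    """Yield every combination of the listed values (exhaustive search)."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def random_candidates(space, max_iter, seed=0):
    """Yield max_iter settings, each drawn independently from the space."""
    rng = random.Random(seed)
    for _ in range(max_iter):
        yield {name: draw(rng) for name, draw in space.items()}


def pick_best(cv_scores):
    """Return the classifier with the best average accuracy over all folds."""
    name = max(cv_scores, key=lambda n: mean(cv_scores[n]))
    return {"accuracy": mean(cv_scores[name]), "classifier": name}


# The grid always evaluates 3 * 2 settings; the random search evaluates max_iter.
grid_settings = list(grid_candidates(param_grid))
random_settings = list(random_candidates(param_dist, max_iter=4))

# Made-up per-fold accuracies, only to show the selection step.
best = pick_best({"SVM": [0.80, 0.82], "Random Forest": [0.90, 0.88]})
```

The grid size grows multiplicatively with every added value, whereas the cost of the randomized search is fixed by max_iter regardless of how large the space is.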
trainClassifier(X_train, y_train, model)
Build a classifier from the training set (X_train, y_train).
- Parameters
X_train (array-like or sparse matrix, shape = [n_samples, n_features]) – The training input samples.
y_train (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.
model (classifier object) – An instance of a classifier.
- Returns
It returns a trained classifier.
- Return type
classifier object
testClassifier(X_test, y_test, model)
Evaluate a classifier on a testing set (X_test, y_test).
- Parameters
X_test (array-like or sparse matrix, shape = [n_samples, n_features]) – The test input samples.
y_test (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The target values, i.e. class labels.
model (classifier object) – A trained classifier.
- Returns
Returns the computed metrics, i.e., accuracy, precision, recall and f1, for the specified model on the test dataset.
- Return type
tuple of (float, float, float, float)
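For reference, the four returned metrics can be computed from true and predicted labels as in the sketch below. It assumes a binary problem with 1 as the positive class and uses a hypothetical helper, binary_metrics; whether testClassifier averages the metrics this way is not stated here. In scikit-learn, the equivalent functions are accuracy_score, precision_score, recall_score and f1_score in sklearn.metrics.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against empty denominators when no positives are predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall, f1 = binary_metrics(y_test, y_pred)
```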
Similarity thresholds and weights
poi_interlinking.learning.parameters.learn_thres(fname, sim_group='basic')
Learn the thresholds of the supported similarity metrics that achieve the highest accuracy on the input data.
- Parameters
fname (str) – Input filename to search for optimal thresholds.
sim_group (str) – The group of metrics to search for optimal thresholds. This applies to all groups except for lgm.
See also
Features
Details on the supported groups.
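The idea of learning a threshold that maximizes accuracy can be sketched as a simple sweep over candidate values. This is an illustration only, not the implementation of learn_thres: the helper best_threshold, the candidate grid and the sample scores are assumptions, and a pair is predicted as a match when its similarity score is at least the threshold.

```python
def best_threshold(scores, labels, candidates):
    """Return (threshold, accuracy) for the candidate with highest accuracy.

    Ties are resolved in favour of the first candidate encountered.
    """
    best_thres, best_acc = None, -1.0
    for thres in candidates:
        predictions = [score >= thres for score in scores]
        hits = sum(1 for p, l in zip(predictions, labels) if p == l)
        acc = hits / len(labels)
        if acc > best_acc:
            best_thres, best_acc = thres, acc
    return best_thres, best_acc


# Made-up similarity scores for six pairs and their true match labels.
scores = [0.95, 0.80, 0.62, 0.55, 0.40, 0.30]
labels = [True, True, True, False, False, False]
candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95

thres, acc = best_threshold(scores, labels, candidates)
```

With these scores, any threshold above 0.55 and up to 0.62 separates the two classes perfectly, so the sweep settles on the first such candidate.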
poi_interlinking.learning.parameters.learn_params_for_lgm(fname, encoding)
Learn the thresholds and weights for the lgm group of similarity metrics that achieve the highest accuracy on the input data.
- Parameters
fname (str) – Input filename to search for optimal thresholds.
encoding (str) – The encoding of the file fname. Valid options are latin or global.
See also
Features
Details on the supported groups.
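Learning weights together with a threshold can be pictured as a joint grid search. The sketch below assumes, purely for illustration, that the final score is a weighted sum of per-pair component similarities with weights that sum to 1. The actual definition of the lgm metrics is given in Features, and the names weight_grid, learn_weights_and_threshold and components are hypothetical.

```python
import itertools


def weight_grid(step=0.25, parts=3):
    """Yield all weight tuples on a grid of the given step that sum to 1."""
    ticks = round(1 / step)
    for combo in itertools.product(range(ticks + 1), repeat=parts):
        if sum(combo) == ticks:
            yield tuple(c * step for c in combo)


def learn_weights_and_threshold(components, labels, thresholds, step=0.25):
    """Search weights and threshold jointly, keeping the most accurate pair."""
    best = {"accuracy": -1.0, "weights": None, "threshold": None}
    for weights in weight_grid(step, parts=len(components[0])):
        combined = [sum(w * c for w, c in zip(weights, row))
                    for row in components]
        for thres in thresholds:
            hits = sum(1 for score, label in zip(combined, labels)
                       if (score >= thres) == label)
            accuracy = hits / len(labels)
            if accuracy > best["accuracy"]:
                best = {"accuracy": accuracy, "weights": weights,
                        "threshold": thres}
    return best


# Made-up component similarities (three per pair) and true match labels.
components = [(0.9, 0.2, 0.5), (0.8, 0.9, 0.1), (0.3, 0.9, 0.9), (0.2, 0.1, 0.8)]
labels = [True, True, False, False]

best = learn_weights_and_threshold(components, labels, thresholds=[0.3, 0.5, 0.7])
```

The number of weight combinations grows quickly as the step shrinks, so a coarse step is a reasonable starting point before refining around the best setting.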