features
class interlinking.features.Features
This class loads the dataset and the frequent terms, and builds the features used as input to the supported classification groups:
basic: similarity features based on basic similarity measures.
basic_sorted: similarity features based on sorted versions of the basic similarity measures used in the basic group.
lgm: similarity features based on variations of LGM-Sim similarity measures.
See also
compute_features()
Details on the metrics each classification group implements.
build()
Build features depending on the value of the classification_method parameter and return the values (fX, y) as ndarrays of floats.
- Returns
fX (ndarray) – The computed features that will be used as input to ML classifiers.
y (ndarray) – Binary labels {True, False} to train the classifiers.
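The shape of the returned pair can be illustrated with a small sketch. It is not the library's implementation: it assumes labelled toponym pairs are available as (s1, s2, label) tuples, which is an assumption about the dataset layout, and it uses difflib.SequenceMatcher from the standard library as a stand-in for the real feature computation. The name toy_build is hypothetical, and plain lists are used in place of ndarrays.

```python
from difflib import SequenceMatcher


def toy_build(pairs):
    """Return (fX, y): one feature row and one label per toponym pair."""
    fX, y = [], []
    for s1, s2, label in pairs:
        # Single stand-in feature; the real class computes many measures.
        ratio = SequenceMatcher(None, s1.lower(), s2.lower()).ratio()
        fX.append([ratio])
        y.append(1.0 if label else 0.0)  # binary label stored as a float
    return fX, y


# Hypothetical labelled pairs: (toponym 1, toponym 2, do they match?)
pairs = [
    ("Athens", "Athens", True),
    ("Athens", "Sparta", False),
]
fX, y = toy_build(pairs)
```

Each row of fX lines up with the label at the same index in y, which is the layout most ML classifiers expect for training data.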
compute_features(s1, s2, sorted=True, lgm_sims=True)
Depending on the group assigned to the classification_method parameter, this method builds an ndarray of the following groups of features:
basic: various similarity measures, i.e., damerau_levenshtein(), jaro(), jaro_winkler() and its reversed variant, sorted_winkler(), cosine(), jaccard(), strike_a_match(), monge_elkan(), soft_jaccard(), davies(), tuned_jaro_winkler() and its reversed variant, skipgrams().
basic_sorted: sorted versions of the similarity measures used in the basic group, except for sorted_winkler().
lgm: LGM-Sim variations that internally integrate the similarity measures used in the basic group, except for sorted_winkler().
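How a sorted variant differs from a basic measure can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: toy_compute_features, bigram_jaccard and sort_tokens are hypothetical names, a single character-bigram Jaccard measure stands in for the full list of measures, and "sorted" is assumed to mean alphabetical ordering of whitespace-separated tokens. The lgm group is left out because its definition is not given here. The flag is named sorted_group to avoid shadowing Python's built-in sorted.

```python
def bigrams(s):
    """Return the set of character bigrams of a lower-cased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_jaccard(s1, s2):
    """Jaccard similarity over character bigrams (stand-in measure)."""
    a, b = bigrams(s1), bigrams(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sort_tokens(s):
    """Sort whitespace-separated tokens alphabetically (assumed meaning of 'sorted')."""
    return " ".join(sorted(s.lower().split()))


def toy_compute_features(s1, s2, sorted_group=True):
    """Hypothetical analogue of compute_features with one measure per group."""
    feats = [bigram_jaccard(s1, s2)]  # 'basic' group
    if sorted_group:
        # 'basic_sorted' group: same measure applied to token-sorted strings
        feats.append(bigram_jaccard(sort_tokens(s1), sort_tokens(s2)))
    return feats


feats = toy_compute_features("Mount Olympus", "Olympus Mount")
```

In this sketch, the token-sorted measure treats "Mount Olympus" and "Olympus Mount" as identical, while the basic measure does not, because the bigrams spanning the word boundary differ.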
- Parameters
s1, s2 (str) – Input toponyms.
sorted (bool, optional) – If True, build features for both the basic and basic_sorted groups; if False, build features only for the basic group.
lgm_sims (bool, optional) – If True, also build features for the lgm group; if False, skip them.
- Returns
A list (vector) of features.
- Return type