miscellaneous
poi_interlinking.helpers.strip_accents(s)
Transliterate any Unicode string into the closest possible representation in ASCII text.
- Parameters
s (str) – Input string
- Returns
The transliterated string.
- Return type
str
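The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation: `strip_accents_sketch` is a hypothetical name, and it assumes that decomposing characters and discarding non-ASCII code points is an acceptable stand-in for transliteration. Characters with no ASCII decomposition are dropped rather than transliterated.

```python
import unicodedata


def strip_accents_sketch(s):
    """Hypothetical stand-in for strip_accents, using only the standard library."""
    # NFKD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", s)
    # Encoding to ASCII with errors="ignore" discards the combining marks
    # (and any other character that has no ASCII form).
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents_sketch("Crème Brûlée"))  # Creme Brulee
```

A dedicated transliteration library would handle scripts such as Greek or Cyrillic, which this sketch simply removes.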
poi_interlinking.helpers.transform(s1, s2, sorting=False, canonical=False, delimiter=' ', simple_sorting=False)
Normalize the input strings by applying steps such as lowercasing, transliteration, and punctuation/accentuation alignment.
- Parameters
s1 (str) – The first string.
s2 (str) – The second string.
sorting (bool) – Whether to apply the custom sorting mechanism. Specifically, alphanumerical sorting is applied only when the similarity of the strings is below sort_thres. If True and simple_sorting is False, the custom type of sorting is performed.
canonical (bool) – Whether to perform canonical decomposition, i.e., translate each character into its decomposed form, and afterwards apply compatibility decomposition, i.e., replace all compatibility characters with their equivalents.
delimiter (str) – Character used to split s1 and s2.
simple_sorting (bool) – If True, apply alphanumeric sorting on s1 and s2.
- Returns
s1, s2 – The transformed strings according to the selected parameters, e.g., canonical, sorting or simple_sorting.
- Return type
str
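To illustrate how the parameters interact, here is a rough sketch of the simpler normalization steps. It is not the package's code: `transform_sketch` is a hypothetical name, `canonical=True` is assumed to mean NFKD normalization followed by dropping combining marks, and the custom sorting controlled by `sorting` and `sort_thres` is omitted because its details are not given here.

```python
import unicodedata


def transform_sketch(s1, s2, canonical=False, delimiter=" ", simple_sorting=False):
    """Hypothetical approximation of transform; the custom sorting is not reproduced."""
    results = []
    for s in (s1, s2):
        # Lowercase so that comparison is case-insensitive.
        s = s.lower()
        if canonical:
            # Assumption: decompose with NFKD, then drop non-ASCII residue.
            s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
        if simple_sorting:
            # Split on the delimiter, sort the tokens, and join them back.
            s = delimiter.join(sorted(s.split(delimiter)))
        results.append(s)
    return results[0], results[1]


a, b = transform_sketch("Café Roma", "roma cafe", canonical=True, simple_sorting=True)
print(a, "|", b)  # cafe roma | cafe roma
```

After normalization, the two differently ordered and accented names become identical, which is the kind of alignment the function is meant to produce before a similarity measure is applied.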