miscellaneous

poi_interlinking.helpers.strip_accents(s)[source]

Transliterate any Unicode string into its closest possible ASCII representation.

Parameters

s (str) – Input string

Returns

The transliterated string.

Return type

str
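
As an illustration of accent stripping, the following minimal sketch uses only the standard library. It is not the implementation of strip_accents; the name strip_accents_sketch is hypothetical, and the approach (NFKD decomposition followed by dropping combining marks) is an assumption about one common way to achieve this.

```python
import unicodedata


def strip_accents_sketch(s: str) -> str:
    """Hypothetical stand-in for strip_accents, for illustration only."""
    # Decompose characters so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", s)
    # Drop the combining marks (accents, diaereses, and so on).
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard any character that still has no ASCII equivalent.
    return no_marks.encode("ascii", "ignore").decode("ascii")


print(strip_accents_sketch("Caf\u00e9 Z\u00fcrich"))
```

Note that this sketch simply drops characters from non-Latin scripts, whereas a true transliteration would map them to Latin letters.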

poi_interlinking.helpers.transform(s1, s2, sorting=False, canonical=False, delimiter=' ', simple_sorting=False)[source]

Normalize the input strings by applying steps such as lowercasing, transliteration, and punctuation/accentuation alignment.

Parameters
  • s1 (str) – The first string.

  • s2 (str) – The second string.

  • sorting (bool) – Whether to apply the custom sorting mechanism. With this mechanism, alphanumerical sorting is applied only when the similarity of the strings is below sort_thres. The custom sorting is used when this flag is True and simple_sorting is False.

  • canonical (bool) – Whether to perform canonical decomposition, i.e., translate each character into its decomposed form, and afterwards apply compatibility decomposition, i.e., replace all compatibility characters with their equivalents.

  • delimiter (str) – Character used to split s1 and s2.

  • simple_sorting (bool) – If True, apply alphanumeric sorting to s1 and s2.

Returns

s1, s2 – The transformed strings according to the selected parameters, e.g., canonical, sorting or simple_sorting.

Return type

str
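
The effect of the canonical, delimiter, and simple_sorting parameters can be sketched as follows. This is a simplified assumption about the behavior described above, not the library's code: transform_sketch is a hypothetical name, and the threshold-based custom sorting controlled by sorting and sort_thres is deliberately left out.

```python
import unicodedata


def transform_sketch(s1, s2, canonical=False, delimiter=" ", simple_sorting=False):
    """Hypothetical, simplified stand-in for transform, for illustration only."""
    # Lowercase both strings so the comparison is case-insensitive.
    a, b = s1.lower(), s2.lower()
    if canonical:
        # Assumed here: NFKD applies canonical, then compatibility decomposition.
        a = unicodedata.normalize("NFKD", a)
        b = unicodedata.normalize("NFKD", b)
    if simple_sorting:
        # Split on the delimiter, sort the tokens, and join them again.
        a = delimiter.join(sorted(a.split(delimiter)))
        b = delimiter.join(sorted(b.split(delimiter)))
    return a, b


print(transform_sketch("Hotel Athens", "athens HOTEL", simple_sorting=True))
```

With simple_sorting enabled, two names that differ only in case and word order end up identical, which is what makes them easy to compare afterwards.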

poi_interlinking.helpers.sorted_nicely(l)[source]

Sort the given iterable in the order a human reader would expect (natural alphanumeric order).

Parameters

l (list or set of str) – The iterable to be sorted.

Returns

A sorted list of strings.

Return type

list
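
A common way to sort strings "as expected" is natural ordering, where embedded numbers compare by value rather than character by character. The sketch below assumes that this is the intended behavior; sorted_nicely_sketch and natural_key are hypothetical names, not part of the library.

```python
import re


def sorted_nicely_sketch(items):
    """Hypothetical stand-in for sorted_nicely, for illustration only."""

    def natural_key(text):
        # Split into digit and non-digit runs, e.g. "item10" -> ["item", "10", ""].
        # Digit runs become ints so that 2 sorts before 10.
        return [int(tok) if tok.isdecimal() else tok
                for tok in re.split(r"(\d+)", text)]

    return sorted(items, key=natural_key)


print(sorted_nicely_sketch(["item10", "item2", "item1"]))
```

A plain sorted() call would place "item10" before "item2", because it compares the strings character by character.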

poi_interlinking.misc.writers.write_results(fpath, results, delimiter='&')[source]

Writes full and averaged experiment results.

Parameters
  • fpath (str) – Path of the file to write.

  • results (dict) – Contains metrics as keys and the corresponding values.

  • delimiter (str) – Field delimiter to use.
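
To make the role of the parameters concrete, here is a minimal sketch of a writer with the same signature. The structure of results is an assumption (each metric maps to a list of per-run values), as is the output layout (one row per metric, with the mean appended); write_results_sketch is a hypothetical name and does not reproduce the library's actual file format.

```python
from statistics import mean


def write_results_sketch(fpath, results, delimiter="&"):
    """Hypothetical stand-in for write_results, for illustration only."""
    with open(fpath, "w", encoding="utf-8") as f:
        for metric, values in results.items():
            # Full results: every per-run value, formatted to three decimals.
            fields = [metric] + [f"{v:.3f}" for v in values]
            # Averaged result: the mean of the values, appended as the last field.
            fields.append(f"{mean(values):.3f}")
            f.write(delimiter.join(fields) + "\n")


write_results_sketch("results_sketch.txt", {"accuracy": [0.5, 1.0]})
```

Each row then holds the metric name, the individual values, and their average, separated by the chosen delimiter.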
