Textual features

textual_features.create_textual_index(poi_gdf, path)

Creates an index containing the names of the given pois.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains pois to be stored in the index

  • path (str) – Path to save the index

Returns

None

textual_features.get_similarity_per_class(poi_gdf, textual_index_path, nlabels)

Creates a features array. For each poi p (each row), column c of the array contains a score representing how similar p’s name is to the names in poi category c.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains pois for which the features will be created

  • textual_index_path (str) – Path to the stored index

  • nlabels (int) – Number of poi categories

Returns

The features array of shape (n_samples, n_features), here (len(poi_gdf), nlabels)

Return type

numpy.ndarray
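
The shape of this output can be illustrated with a minimal sketch. The scoring rule used by the stored index is not described here, so the sketch swaps in Jaccard token overlap as a stand-in. The function name similarity_per_class, the in-memory lists replacing the index on disk, and the use of integer labels 0..nlabels-1 are all assumptions made for illustration.

```python
import numpy as np

def tokenize(name):
    """Lowercase a name and split it into a set of whitespace tokens."""
    return set(name.lower().split())

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_per_class(query_names, indexed_names, indexed_labels, nlabels):
    """Return an array of shape (len(query_names), nlabels).

    Entry [i, c] is the highest Jaccard similarity between query name i
    and any indexed name labelled c. This is a stand-in scoring rule,
    not necessarily the one used by the library.
    """
    indexed_tokens = [tokenize(n) for n in indexed_names]
    features = np.zeros((len(query_names), nlabels))
    for i, name in enumerate(query_names):
        q = tokenize(name)
        for tokens, label in zip(indexed_tokens, indexed_labels):
            features[i, label] = max(features[i, label], jaccard(q, tokens))
    return features

# Hypothetical data: three indexed pois in two categories (0 and 1).
indexed_names = ["Central Cafe", "City Pharmacy", "Corner Cafe Bar"]
indexed_labels = [0, 1, 0]
X = similarity_per_class(
    ["Cafe Central", "Night Pharmacy"], indexed_names, indexed_labels, nlabels=2
)
print(X.shape)  # (2, 2): one row per queried poi, one column per category
```

Each row can be read as a soft vote over categories: "Cafe Central" scores highest in category 0 because an indexed name in that category shares all of its tokens.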

textual_features.get_top_k_fourgrams(poi_gdf, names, k)

Creates a features array. First, the top k % fourgrams among names are selected, forming a set of fourgrams T. Then, for each poi p (each row), the array contains 1 (True) in column c if fourgram T[c] appears in p’s name, and 0 (False) otherwise.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains pois for which the features will be created

  • names (list) – Contains the names of train pois

  • k (float) – Percentage of top fourgrams to be considered

Returns

The features array of shape (n_samples, n_features), here (len(poi_gdf), len(T))

Return type

numpy.ndarray
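
The two steps described above (select T, then mark presence) can be sketched as follows. This is an illustrative sketch rather than the library's implementation: it assumes that fourgrams are overlapping character 4-grams of the lowercased name, that "top" means most frequent, and that k is a fraction in (0, 1]. The helper names char_ngrams and top_k_ngram_features are hypothetical, and plain lists of names stand in for poi_gdf.

```python
from collections import Counter
import numpy as np

def char_ngrams(name, n):
    """Return all overlapping character n-grams of a lowercased name."""
    s = name.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def top_k_ngram_features(poi_names, train_names, k, n=4):
    """Binary presence features for the top k fraction of n-grams."""
    # Step 1: rank n-grams by frequency over the training names.
    counts = Counter(g for name in train_names for g in char_ngrams(name, n))
    n_keep = max(1, int(k * len(counts)))  # assumption: k is a fraction in (0, 1]
    T = [g for g, _ in counts.most_common(n_keep)]
    # Step 2: one row per poi, one column per selected n-gram.
    features = np.zeros((len(poi_names), len(T)), dtype=int)
    for row, name in enumerate(poi_names):
        present = set(char_ngrams(name, n))
        features[row] = [int(g in present) for g in T]
    return features, T

train_names = ["cafe", "cafeteria", "bar"]
features, T = top_k_ngram_features(["Cafe Roma", "bakery"], train_names, k=0.2)
print(T)         # ['cafe']
print(features)  # [[1], [0]]
```

Names shorter than four characters (such as "bar") contribute no fourgrams, so they cannot influence T.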

textual_features.get_top_k_terms(poi_gdf, names, k)

Creates a features array. First, the top k % terms among names are selected, forming a set of terms T. Then, for each poi p (each row), the array contains 1 (True) in column c if term T[c] appears in p’s name, and 0 (False) otherwise.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains pois for which the features will be created

  • names (list) – Contains the names of train pois

  • k (float) – Percentage of top terms to be considered

Returns

The features array of shape (n_samples, n_features), here (len(poi_gdf), len(T))

Return type

numpy.ndarray
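
A term-level variant might look like the sketch below. It assumes that a term is a lowercased, whitespace-separated token, that "top" means most frequent, and that k is a fraction in (0, 1]; none of these details are confirmed by the description above. The names terms and top_k_term_features are hypothetical, and plain lists of names stand in for poi_gdf.

```python
from collections import Counter
import numpy as np

def terms(name):
    """Split a lowercased name into whitespace-separated terms."""
    return name.lower().split()

def top_k_term_features(poi_names, train_names, k):
    """Binary presence features for the top k fraction of terms."""
    counts = Counter(t for name in train_names for t in terms(name))
    n_keep = max(1, int(k * len(counts)))  # assumption: k is a fraction in (0, 1]
    T = [t for t, _ in counts.most_common(n_keep)]
    features = np.zeros((len(poi_names), len(T)), dtype=int)
    for row, name in enumerate(poi_names):
        present = set(terms(name))
        features[row] = [int(t in present) for t in T]
    return features, T

train_names = ["Hotel Plaza", "Grand Hotel", "Plaza Cafe", "Hotel Bar"]
features, T = top_k_term_features(["Plaza Hotel", "Sea Cafe"], train_names, k=0.5)
print(T)         # ['hotel', 'plaza']
print(features)  # [[1, 1], [0, 0]]
```

Whole-term matching is stricter than character n-gram matching: "Sea Cafe" gets no hits here because "cafe" did not make the top half of the term ranking.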

textual_features.get_top_k_trigrams(poi_gdf, names, k)

Creates a features array. First, the top k % trigrams among names are selected, forming a set of trigrams T. Then, for each poi p (each row), the array contains 1 (True) in column c if trigram T[c] appears in p’s name, and 0 (False) otherwise.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains pois for which the features will be created

  • names (list) – Contains the names of train pois

  • k (float) – Percentage of top trigrams to be considered

Returns

The features array of shape (n_samples, n_features), here (len(poi_gdf), len(T))

Return type

numpy.ndarray
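
Since T is derived from names (the train pois) rather than from poi_gdf, the number of columns depends only on names and k. The sketch below illustrates this by building T once and applying it to both a train and a test list, so the two arrays share a column layout. It assumes character trigrams of the lowercased name, ranking by the number of names containing each trigram, and k as a fraction in (0, 1]. The names trigrams, build_top_trigrams and featurize are hypothetical.

```python
from collections import Counter
import numpy as np

def trigrams(name):
    """Return the set of character trigrams in a lowercased name."""
    s = name.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_top_trigrams(train_names, k):
    """Select the top k fraction of trigrams, ranked by how many names contain them."""
    counts = Counter(g for name in train_names for g in trigrams(name))
    n_keep = max(1, int(k * len(counts)))  # assumption: k is a fraction in (0, 1]
    return [g for g, _ in counts.most_common(n_keep)]

def featurize(names, T):
    """Binary presence matrix of shape (len(names), len(T))."""
    features = np.zeros((len(names), len(T)), dtype=int)
    for row, name in enumerate(names):
        present = trigrams(name)
        features[row] = [int(g in present) for g in T]
    return features

train_names = ["bakery", "baker street", "bank"]
test_names = ["old bakery"]

T = build_top_trigrams(train_names, k=0.25)  # built from train names only
X_train = featurize(train_names, T)
X_test = featurize(test_names, T)
print(X_train.shape, X_test.shape)  # (3, 3) (1, 3)
```

Keeping T fixed between the two calls is what lets a classifier fitted on X_train be applied to X_test without any column remapping.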