Textual features
- textual_features.create_textual_index(poi_gdf, path)
Creates an index containing the names of the given POIs.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – POIs to be stored in the index
path (str) – Path to save the index
- Returns
None
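The storage format of the index is not documented in this reference. As a rough illustration only, the sketch below persists each POI name with its category label to a JSON file at `path`. The helper names `create_textual_index_sketch` and `load_textual_index_sketch`, the JSON format, and passing names and labels as plain lists (instead of columns of `poi_gdf`) are all hypothetical.

```python
import json
from typing import List


def create_textual_index_sketch(names: List[str], labels: List[int], path: str) -> None:
    """Hypothetical stand-in for create_textual_index.

    Writes a JSON list of {"name": ..., "label": ...} records to `path`.
    The real function receives a GeoDataFrame; here the relevant columns
    are passed as plain lists (an assumption made for illustration).
    """
    if len(names) != len(labels):
        raise ValueError("names and labels must have the same length")
    records = [{"name": name, "label": int(label)} for name, label in zip(names, labels)]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)


def load_textual_index_sketch(path: str) -> List[dict]:
    """Reads back the records written by create_textual_index_sketch."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Keeping the label next to each name is what a per-category similarity step needs later, since scores are aggregated by category.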
- textual_features.get_similarity_per_class(poi_gdf, textual_index_path, nlabels)
Creates a features array. For each POI p (each row), the array contains a score in column c representing how similar p’s name is to POI category c.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – POIs for which the features will be created
textual_index_path (str) – Path to the stored index
nlabels (int) – Number of POI categories
- Returns
The features array of shape (n_samples, n_features), here (len(poi_gdf), nlabels)
- Return type
numpy.ndarray
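This reference does not say how the per-category score is computed. The sketch below substitutes a simple, plainly different technique: the score in column c is the highest word-level Jaccard similarity between p’s name and any indexed name labeled c. The helper names, the Jaccard measure, and the in-memory index (a list of `(name, label)` pairs with labels in `range(nlabels)`, standing in for the index stored at `textual_index_path`) are assumptions. It returns nested lists; `numpy.asarray` would give the documented `(len(poi_gdf), nlabels)` array.

```python
from typing import List, Sequence, Set, Tuple


def tokens(name: str) -> Set[str]:
    """Lower-cases a name and splits it on whitespace into a set of words."""
    return set(name.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_per_class_sketch(
    names: Sequence[str],
    index: Sequence[Tuple[str, int]],
    nlabels: int,
) -> List[List[float]]:
    """Hypothetical stand-in for get_similarity_per_class.

    Row i holds, for every category c in range(nlabels), the best Jaccard
    similarity between names[i] and the indexed names whose label is c.
    Categories with no indexed names score 0.0.
    """
    # Tokenize the indexed names once, grouped by category label.
    by_label: List[List[Set[str]]] = [[] for _ in range(nlabels)]
    for indexed_name, label in index:
        by_label[label].append(tokens(indexed_name))

    features = []
    for name in names:
        t = tokens(name)
        row = [max((jaccard(t, other) for other in group), default=0.0)
               for group in by_label]
        features.append(row)
    return features
```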
- textual_features.get_top_k_fourgrams(poi_gdf, names, k)
Creates a features array. First, the top k % of fourgrams among names are selected (call this set T). Then, for each POI p (each row), the array contains 1 (True) in column c if fourgram T[c] appears in p’s name, and 0 (False) otherwise.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – POIs for which the features will be created
names (list) – Names of the training POIs
k (float) – Percentage of top fourgrams to be considered
- Returns
The features array of shape (n_samples, n_features), here (len(poi_gdf), len(T))
- Return type
numpy.ndarray
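A minimal sketch of the two steps, assuming that "top" means appearing in the most training names, that k is a fraction in (0, 1] rather than a value out of 100, and that fourgrams are overlapping character 4-grams of the whole lower-cased name (spaces included). The names `char_ngrams` and `top_k_fourgram_features`, and passing names as plain lists, are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def char_ngrams(name: str, n: int) -> List[str]:
    """Distinct overlapping character n-grams of a lower-cased name, in first-seen order."""
    s = name.lower()
    return list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))


def top_k_fourgram_features(
    names: Sequence[str], train_names: Sequence[str], k: float
) -> List[List[int]]:
    """Hypothetical stand-in for get_top_k_fourgrams (k taken as a fraction in (0, 1]).

    1. Count the training names that contain each fourgram.
    2. Keep the top k fraction of distinct fourgrams as T (at least one).
    3. Row i, column c is 1 if T[c] occurs in names[i], else 0.
    """
    counts = Counter(g for n in train_names for g in char_ngrams(n, 4))
    n_keep = max(1, int(len(counts) * k)) if counts else 0
    # Counter.most_common keeps first-encountered order among equal counts.
    T = [g for g, _ in counts.most_common(n_keep)]
    rows = []
    for name in names:
        grams = set(char_ngrams(name, 4))
        rows.append([int(g in grams) for g in T])
    return rows
```

For example, with training names `["cafe", "cafeteria", "bank"]` and `k=0.2`, only `"cafe"` is kept, so `"Cafe Roma"` maps to `[1]` and `"Bank"` to `[0]`.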
- textual_features.get_top_k_terms(poi_gdf, names, k)
Creates a features array. First, the top k % of terms among names are selected (call this set T). Then, for each POI p (each row), the array contains 1 (True) in column c if term T[c] appears in p’s name, and 0 (False) otherwise.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – POIs for which the features will be created
names (list) – Names of the training POIs
k (float) – Percentage of top terms to be considered
- Returns
The features array of shape (n_samples, n_features), here (len(poi_gdf), len(T))
- Return type
numpy.ndarray
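The sketch below illustrates the term variant and also returns the selected vocabulary T, so the meaning of each column is visible. It assumes terms are runs of word characters in the lower-cased name, "top" means occurring in the most training names, and k is a fraction in (0, 1]. The names `name_terms` and `top_k_term_features` are hypothetical, and returning T alongside the rows is an addition for illustration; the documented function returns only the array.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple


def name_terms(name: str) -> List[str]:
    """Distinct lower-cased word tokens of a name, in order of first occurrence.

    Tokens are runs of word characters, so "Cafe-Bar" yields ["cafe", "bar"].
    """
    return list(dict.fromkeys(re.findall(r"\w+", name.lower())))


def top_k_term_features(
    names: Sequence[str], train_names: Sequence[str], k: float
) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical stand-in for get_top_k_terms; returns (T, feature rows)."""
    # Number of training names in which each term occurs.
    counts = Counter(t for n in train_names for t in name_terms(n))
    # Keep the top k fraction of distinct terms (at least one if any exist).
    n_keep = max(1, int(len(counts) * k)) if counts else 0
    T = [t for t, _ in counts.most_common(n_keep)]
    rows = []
    for name in names:
        present = set(name_terms(name))
        rows.append([int(t in present) for t in T])
    return T, rows
```

Returning T makes it easy to check that test POIs are encoded against the vocabulary learned from the training names only, which is why the function takes `names` separately from `poi_gdf`.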
- textual_features.get_top_k_trigrams(poi_gdf, names, k)
Creates a features array. First, the top k % of trigrams among names are selected (call this set T). Then, for each POI p (each row), the array contains 1 (True) in column c if trigram T[c] appears in p’s name, and 0 (False) otherwise.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – POIs for which the features will be created
names (list) – Names of the training POIs
k (float) – Percentage of top trigrams to be considered
- Returns
The features array of shape (n_samples, n_features), here (len(poi_gdf), len(T))
- Return type
numpy.ndarray
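A sketch of the trigram variant under different, explicitly assumed choices: trigrams are taken inside each lower-cased word so none span a space, ties in frequency are broken alphabetically so the column order of T is deterministic, and k is a fraction in (0, 1]. The names `word_trigrams` and `top_k_trigram_features` are hypothetical, and the documented function may tokenize differently.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def word_trigrams(name: str) -> Set[str]:
    """Character trigrams taken inside each lower-cased word (none span a space)."""
    return {w[i:i + 3] for w in name.lower().split() for i in range(len(w) - 2)}


def top_k_trigram_features(
    names: Sequence[str], train_names: Sequence[str], k: float
) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical stand-in for get_top_k_trigrams; returns (T, feature rows)."""
    # Number of training names in which each trigram occurs.
    counts = Counter(g for n in train_names for g in word_trigrams(n))
    # Sort by descending count, breaking ties alphabetically so T does not
    # depend on set iteration order.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    T = ranked[:max(1, int(len(ranked) * k))] if ranked else []
    rows = []
    for name in names:
        grams = word_trigrams(name)
        rows.append([int(g in grams) for g in T])
    return T, rows
```

With training names `["Pizza Hut", "Pizza Roma"]` and `k=0.5`, T is `["izz", "piz", "zza"]`, and `"Pizzeria"` is encoded as `[1, 1, 0]`.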