Features utilities

features_utilities.create_args_dict(poi_gdf, train_idxs, required_args, read_path, write_path)

Initializes and prepares the structures required during feature extraction.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which features will be created
train_idxs (numpy.ndarray) – Contains the train indexes
required_args (set) – Contains the names of the required args
read_path (str) – Path to read from
write_path (str) – Path to write to

Returns

A dict with argument names as keys and their corresponding structures as values

Return type

dict

features_utilities.create_concatenated_features(poi_gdf, train_idxs, test_idxs, fold_path)

Loads the arrays of the included features and concatenates them into the final X_train and X_test arrays. It then saves these arrays along with the corresponding y_train and y_test arrays, and finally writes the included features configuration to a file.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
train_idxs (numpy.ndarray) – Contains the train indexes
test_idxs (numpy.ndarray) – Contains the test indexes
fold_path (str) – Path to save features arrays

Returns

None
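The concatenation step can be illustrated with a minimal sketch. The helper name `concatenate_features`, the in-memory inputs, and the column-wise stacking are assumptions made for illustration; the actual function loads its arrays from fold_path, and its internal layout is not documented here.

```python
import numpy as np


def concatenate_features(feature_arrays, train_idxs, test_idxs):
    """Hypothetical helper: stack per-feature arrays column-wise,
    then split the rows into train and test parts."""
    # Each array has one row per poi; 1-D arrays become single columns.
    columns = [np.asarray(a).reshape(len(a), -1) for a in feature_arrays]
    X = np.hstack(columns)
    return X[train_idxs], X[test_idxs]


# Two made-up feature arrays for four pois.
f1 = np.array([0.1, 0.2, 0.3, 0.4])                # one column
f2 = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])    # two columns

X_train, X_test = concatenate_features(
    [f1, f2], np.array([0, 2]), np.array([1, 3])
)
print(X_train.shape, X_test.shape)
```

Both resulting arrays have one column per concatenated feature dimension, so every row keeps the same feature order in train and test.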

features_utilities.create_finetuned_features(poi_gdf, features_info, best_feature_params, features_path, results_path)

Creates and saves the X_train features array for the model_training step.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
features_info (list) – Contains the features to be extracted and whether each should be normalized
best_feature_params (dict) – Contains the best parameter values found for the features
features_path (str) – Path from which to read required external files (such as the OSM streets file)
results_path (str) – Path to write to

Returns

The features array for model_training step

Return type

numpy.ndarray

features_utilities.create_single_feature(f, args, train_idxs, norm, scaler)

Creates the features array given a feature’s name f.

Parameters

f (str) – Feature name to be created
args (dict) – Contains the required arguments for feature f
train_idxs (numpy.ndarray) – Contains the train indexes
norm (bool) – Indicates whether the feature should be normalized
scaler (sklearn.preprocessing.MinMaxScaler) – The scaler to be utilized

Returns

numpy.ndarray: The features array of feature f

sklearn.preprocessing.MinMaxScaler: The scaler utilized

Return type

tuple

features_utilities.create_single_features(poi_gdf, train_idxs, fold_path)

Creates all the included features arrays and saves them in fold_path.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
train_idxs (numpy.ndarray) – Contains the train indexes
fold_path (str) – Path to save features arrays

Returns

None

features_utilities.create_test_args_dict(test_poi_gdf, required_args, read_path1, read_path2)

Instantiates and prepares the structures required during feature extraction in the model_deployment step.

Parameters

test_poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which features will be created
required_args (set) – Contains the names of the required args
read_path1 (str) – Path to features_extraction step results
read_path2 (str) – Path to model_training step results

Returns

A dict with argument names as keys and their corresponding structures as values

Return type

dict

features_utilities.create_test_features(poi_gdf, features, features_path, model_training_path, results_path)

Creates and saves the X_test features array for the model_deployment step.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
features (list) – Contains the features to be extracted, along with their best found configuration
features_path (str) – Path to features_extraction step results
model_training_path (str) – Path to model_training step results
results_path (str) – Path to write to

Returns

The features array for model_deployment step

Return type

numpy.ndarray

features_utilities.encode_labels(poi_gdf, encoder=None)

Encodes the target column with integer values.

Parameters

poi_gdf (geopandas.GeoDataFrame) – The GeoDataFrame containing the column to be encoded
encoder (sklearn.preprocessing.LabelEncoder, optional) – The label encoder to be utilized

Returns

geopandas.GeoDataFrame: The GeoDataFrame with the encoded column

sklearn.preprocessing.LabelEncoder: The label encoder utilized

Return type

tuple
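A minimal sketch of this kind of label encoding follows. The helper `encode_column`, the column name `category`, and the use of a plain pandas DataFrame in place of a GeoDataFrame are assumptions for illustration; the documentation does not state which column is encoded or how an existing encoder is reused.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_column(df, column, encoder=None):
    """Hypothetical helper: replace a column's labels with integer codes."""
    if encoder is None:
        # No encoder given: learn the label-to-integer mapping from this data.
        encoder = LabelEncoder().fit(df[column])
    df = df.copy()
    df[column] = encoder.transform(df[column])
    return df, encoder


# Fit on training data, then reuse the same encoder on new data.
train = pd.DataFrame({"category": ["cafe", "bank", "cafe"]})
train_enc, enc = encode_column(train, "category")

test = pd.DataFrame({"category": ["bank"]})
test_enc, _ = encode_column(test, "category", encoder=enc)

print(train_enc["category"].tolist(), test_enc["category"].tolist())
```

Passing the fitted encoder back in keeps the integer codes consistent between steps. Note that `LabelEncoder.transform` raises a `ValueError` for labels it did not see during fitting.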

features_utilities.get_bbox_coords(poi_gdf)

Returns a bounding box containing all poi_gdf’s pois.

Parameters: poi_gdf (geopandas.GeoDataFrame) – Contains the pois
Returns: The bounding box coords as (south, west, north, east)
Return type: tuple

features_utilities.get_pois_by_street(poi_gdf, street_gdf)

Matches each poi in poi_gdf to its nearest street.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains pois to be matched to a street
street_gdf (geopandas.GeoDataFrame) – Contains the streets among which the nearest one to each poi is searched

Returns

A dict with street ids as keys and, as values, lists of the pois that belong to each street

Return type

dict

features_utilities.get_required_external_files(poi_gdf, feature_sets_path)

Checks if external files are required and if so, downloads them using the Overpass API.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains pois in order to define the area to query with Overpass API
feature_sets_path (str) – Path to store the downloaded elements

Returns

None

features_utilities.get_top_k(names, k, mode='term')

Extracts the top k % terms or ngrams of names, based on mode.

Parameters

names (list) – Contains the names to be considered
k (float) – Percentage of top terms or ngrams to be considered
mode (str, optional) – May be ‘term’, ‘trigram’ or ‘fourgram’

Returns

The top k % terms or ngrams

Return type

list
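One way such a selection could work is sketched below. Several details are assumptions, since the documentation does not specify them: k is treated as a fraction in [0, 1], names are lowercased and split on whitespace, ranking is by raw frequency, and the helper names `char_ngrams` and `top_k_percent` are hypothetical.

```python
from collections import Counter


def char_ngrams(n, word):
    """Yield all character n-grams of word (hypothetical helper)."""
    return (word[i:i + n] for i in range(len(word) - n + 1))


def top_k_percent(names, k, mode="term"):
    """Hypothetical helper: return the most frequent k fraction of
    distinct terms or character n-grams found in names."""
    n = {"trigram": 3, "fourgram": 4}.get(mode)  # None means whole terms
    counts = Counter()
    for name in names:
        for term in name.lower().split():
            if n is None:
                counts[term] += 1
            else:
                counts.update(char_ngrams(n, term))
    keep = int(len(counts) * k)  # assumption: k is a fraction, not 0-100
    return [item for item, _ in counts.most_common(keep)]


names = ["Blue Cafe", "Red Cafe", "Blue Bank", "Cafe"]
top_terms = top_k_percent(names, 0.5)
top_trigrams = top_k_percent(["cafe cafe"], 1.0, mode="trigram")
print(top_terms, top_trigrams)
```

With four distinct terms and k = 0.5, the two most frequent terms are kept.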

features_utilities.load_poi_gdf(poi_fpath)

Loads the pois in poi_fpath into a geopandas.GeoDataFrame and projects their geometries.

Parameters: poi_fpath (str) – Path to file containing the pois
Returns: geopandas.GeoDataFrame

features_utilities.load_street_gdf(street_fpath)

Loads the streets in street_fpath into a geopandas.GeoDataFrame and projects their geometries.

Parameters: street_fpath (str) – Path to file containing the streets
Returns: geopandas.GeoDataFrame

features_utilities.ngrams(n, word)

Generator of all n-grams of word.

Parameters

n (int) – The length of character ngrams to be extracted
word (str) – The word of which the ngrams are to be extracted

Yields

str – ngram
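A character n-gram generator with this signature can be sketched as follows. The name `char_ngrams` is hypothetical, and whether the actual function pads the word or handles short words differently is not documented, so this sketch assumes no padding.

```python
def char_ngrams(n, word):
    """Yield every contiguous substring of length n in word."""
    for i in range(len(word) - n + 1):
        yield word[i:i + n]


trigrams = list(char_ngrams(3, "bakery"))
print(trigrams)
```

Under this assumption, a word shorter than n yields nothing.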

features_utilities.normalize_features(X, train_idxs, scaler=None)

Normalizes features to [0, 1].

Parameters

X (numpy.ndarray) – Features array to be normalized
train_idxs (numpy.ndarray) – Contains the train indexes
scaler (sklearn.preprocessing.MinMaxScaler, optional) – Scaler to be utilized

Returns

numpy.ndarray: The normalized features array

sklearn.preprocessing.MinMaxScaler: The scaler utilized

Return type

tuple
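The role of train_idxs can be illustrated with a minimal sketch, assuming the scaler is fitted only on the training rows and then applied to the whole array, and that a supplied scaler is reused without refitting. The helper name `minmax_normalize` is hypothetical, and this behavior is an assumption rather than something the documentation confirms.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def minmax_normalize(X, train_idxs, scaler=None):
    """Hypothetical helper: scale X using min/max learned from train rows."""
    if scaler is None:
        # Fit on the training rows only, so test rows do not leak into the range.
        scaler = MinMaxScaler().fit(X[train_idxs])
    return scaler.transform(X), scaler


X = np.array([[0.0], [5.0], [10.0], [20.0]])
train_idxs = np.array([0, 1, 2])

X_norm, scaler = minmax_normalize(X, train_idxs)
print(X_norm.ravel())
```

Under this assumption, rows outside the training set can map outside [0, 1] when their values exceed the training range, as the last row does here.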