Features utilities

features_utilities.create_args_dict(poi_gdf, train_idxs, required_args, read_path, write_path)[source]

Initializes and prepares structures required during features extraction.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which features will be created

  • train_idxs (numpy.ndarray) – Contains the train indexes

  • required_args (set) – Contains the names of the required args

  • read_path (str) – Path to read from

  • write_path (str) – Path to write to

Returns

Dictionary with argument names as keys and their corresponding structures as values

Return type

dict

features_utilities.create_concatenated_features(poi_gdf, train_idxs, test_idxs, fold_path)[source]

Loads the included features arrays and concatenates them into the final X_train and X_test arrays. Then saves these arrays as well as the corresponding y_train and y_test arrays. Finally, writes the configuration of the included features to a file.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created

  • train_idxs (numpy.ndarray) – Contains the train indexes

  • test_idxs (numpy.ndarray) – Contains the test indexes

  • fold_path (str) – Path to save features arrays

Returns

None

features_utilities.create_finetuned_features(poi_gdf, features_info, best_feature_params, features_path, results_path)[source]

Creates and saves the X_train features array for the model_training step.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created

  • features_info (list) – Contains the features to be extracted and whether each should be normalized

  • best_feature_params (dict) – Contains the best parameter values found for the features

  • features_path (str) – Path from which to read required external files (such as the OSM streets file)

  • results_path (str) – Path to write to

Returns

The features array for model_training step

Return type

numpy.ndarray

features_utilities.create_single_feature(f, args, train_idxs, norm, scaler)[source]

Creates the features array given a feature’s name f.

Parameters
  • f (str) – Feature name to be created

  • args (dict) – Contains the required arguments for feature f

  • train_idxs (numpy.ndarray) – Contains the train indexes

  • norm (bool) – Whether the feature should be normalized

  • scaler (sklearn.preprocessing.MinMaxScaler) – The scaler to be utilized

Returns

numpy.ndarray: The features array of feature f

sklearn.preprocessing.MinMaxScaler: The scaler utilized

Return type

tuple

features_utilities.create_single_features(poi_gdf, train_idxs, fold_path)[source]

Creates all the included features arrays and saves them in fold_path.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created

  • train_idxs (numpy.ndarray) – Contains the train indexes

  • fold_path (str) – Path to save features arrays

Returns

None

features_utilities.create_test_args_dict(test_poi_gdf, required_args, read_path1, read_path2)[source]

Initializes and prepares structures required during features extraction in the model_deployment step.

Parameters
  • test_poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which features will be created

  • required_args (set) – Contains the names of the required args

  • read_path1 (str) – Path to features_extraction step results

  • read_path2 (str) – Path to model_training step results

Returns

Dictionary with argument names as keys and their corresponding structures as values

Return type

dict

features_utilities.create_test_features(poi_gdf, features, features_path, model_training_path, results_path)[source]

Creates and saves the X_test features array for the model_deployment step.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created

  • features (list) – Contains the features to be extracted, together with their best found configuration

  • features_path (str) – Path to features_extraction step results

  • model_training_path (str) – Path to model_training step results

  • results_path (str) – Path to write to

Returns

The features array for model_deployment step

Return type

numpy.ndarray

features_utilities.encode_labels(poi_gdf, encoder=None)[source]

Encodes the target column with integer values.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – The GeoDataFrame containing the column to be encoded

  • encoder (sklearn.preprocessing.LabelEncoder, optional) – The label encoder to be utilized

Returns

geopandas.GeoDataFrame: The GeoDataFrame with the encoded column

sklearn.preprocessing.LabelEncoder: The label encoder utilized

Return type

tuple
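
The optional encoder argument suggests a fit-once, reuse-later pattern: fit a new encoder when none is supplied, otherwise apply the given one. The sketch below illustrates that pattern only. The function name encode_labels_sketch and the column name "category" are hypothetical (the source does not name the target column), and a plain pandas.DataFrame stands in for the GeoDataFrame.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_labels_sketch(df, encoder=None, column="category"):
    """Hypothetical stand-in: integer-encode one column of a DataFrame."""
    df = df.copy()
    if encoder is None:
        # No encoder given: learn the label-to-integer mapping from this data.
        encoder = LabelEncoder()
        df[column] = encoder.fit_transform(df[column])
    else:
        # Encoder given: reuse its mapping so labels stay consistent.
        df[column] = encoder.transform(df[column])
    return df, encoder


train_df = pd.DataFrame({"category": ["bar", "cafe", "bar"]})
train_enc, enc = encode_labels_sketch(train_df)

test_df = pd.DataFrame({"category": ["cafe"]})
test_enc, _ = encode_labels_sketch(test_df, encoder=enc)
```

Reusing the fitted encoder keeps the integer assigned to each label identical across steps. Note that LabelEncoder.transform raises an error for labels it did not see during fitting.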

features_utilities.get_bbox_coords(poi_gdf)[source]

Returns a bounding box containing all poi_gdf’s pois.

Parameters

poi_gdf (geopandas.GeoDataFrame) – Contains the pois

Returns

The bounding box coords as (south, west, north, east)

Return type

tuple
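
The (south, west, north, east) ordering is the one the Overpass API uses for bounding boxes. As a rough illustration of how such a tuple can be derived, the sketch below works on a plain list of (longitude, latitude) pairs rather than a GeoDataFrame. The name bbox_from_lonlat is hypothetical, and the sketch assumes the coordinates are already in degrees; the actual function may need to reproject the geometries first.

```python
def bbox_from_lonlat(points):
    """Hypothetical stand-in: bounding box of (lon, lat) pairs."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    # Order the corners as (south, west, north, east).
    return (min(lats), min(lons), max(lats), max(lons))


pois = [(23.72, 37.98), (23.75, 37.95), (23.70, 38.01)]
bbox = bbox_from_lonlat(pois)
```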

features_utilities.get_pois_by_street(poi_gdf, street_gdf)[source]

Matches each poi in poi_gdf to its nearest street.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains pois to be matched to a street

  • street_gdf (geopandas.GeoDataFrame) – Contains the streets among which the nearest to each poi is searched

Returns

Dictionary with street ids as keys and, for each street, the list of pois that belong to it as values

Return type

dict
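
To make the shape of the returned dictionary concrete, the sketch below groups points by their nearest polyline using a brute-force point-to-segment distance. It is not the library's implementation: the names point_segment_distance and pois_by_street_sketch are hypothetical, plain dictionaries of coordinates stand in for the two GeoDataFrames, and the coordinates are assumed to be planar (already projected).

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def pois_by_street_sketch(pois, streets):
    """Hypothetical stand-in: map each street id to its nearest pois."""
    grouped = defaultdict(list)
    for poi_id, point in pois.items():
        nearest = min(
            streets,
            key=lambda sid: min(
                point_segment_distance(point, a, b)
                for a, b in zip(streets[sid], streets[sid][1:])
            ),
        )
        grouped[nearest].append(poi_id)
    return dict(grouped)


streets = {"s1": [(0, 0), (10, 0)], "s2": [(0, 5), (10, 5)]}
pois = {"a": (1, 1), "b": (2, 4), "c": (9, 0.5)}
matched = pois_by_street_sketch(pois, streets)
```

The brute-force search compares every poi with every street segment, which is fine for illustration; a spatial index would be the usual choice for larger datasets.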

features_utilities.get_required_external_files(poi_gdf, feature_sets_path)[source]

Checks if external files are required and if so, downloads them using the Overpass API.

Parameters
  • poi_gdf (geopandas.GeoDataFrame) – Contains pois in order to define the area to query with Overpass API

  • feature_sets_path (str) – Path to store the downloaded elements

Returns

None

features_utilities.get_top_k(names, k, mode='term')[source]

Extracts the top k percent of the terms or ngrams found in names, depending on mode.

Parameters
  • names (list) – Contains the names to be considered

  • k (float) – Percentage of top terms or ngrams to be considered

  • mode (str, optional) – May be ‘term’, ‘trigram’ or ‘fourgram’

Returns

Contains the top k terms or ngrams

Return type

list
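
One plausible reading of this behaviour is frequency-based selection: count every term (or character ngram) across the names and keep the most frequent share. The sketch below follows that reading under stated assumptions: top_k_sketch is a hypothetical name, k is interpreted as a fraction between 0 and 1 of the distinct tokens, and names are lowercased and split on whitespace. The actual function may rank or tokenize differently.

```python
from collections import Counter


def top_k_sketch(names, k, mode="term"):
    """Hypothetical stand-in: most frequent terms or character ngrams."""
    sizes = {"trigram": 3, "fourgram": 4}
    counts = Counter()
    for name in names:
        for word in name.lower().split():
            if mode == "term":
                counts[word] += 1
            else:
                n = sizes[mode]
                # Count every contiguous substring of length n in the word.
                counts.update(word[i:i + n] for i in range(len(word) - n + 1))
    # Assumption: k is a fraction of the number of distinct tokens.
    keep = int(k * len(counts))
    return [token for token, _ in counts.most_common(keep)]


names = ["Cafe Roma", "Cafe Luna", "Bar Roma", "Cafe Bar"]
top_terms = top_k_sketch(names, 0.25)
all_trigrams = top_k_sketch(names, 1.0, mode="trigram")
```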

features_utilities.load_poi_gdf(poi_fpath)[source]

Loads pois in poi_fpath into a geopandas.GeoDataFrame and projects their geometries.

Parameters

poi_fpath (str) – Path to file containing the pois

Returns

geopandas.GeoDataFrame

features_utilities.load_street_gdf(street_fpath)[source]

Loads streets in street_fpath into a geopandas.GeoDataFrame and projects their geometries.

Parameters

street_fpath (str) – Path to file containing the streets

Returns

geopandas.GeoDataFrame

features_utilities.ngrams(n, word)[source]

Generator that yields all character n-grams of word.

Parameters
  • n (int) – The length of character ngrams to be extracted

  • word (str) – The word of which the ngrams are to be extracted

Yields

str – ngram
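
A character n-gram generator of this kind can be sketched as a sliding window over the word. The name char_ngrams is hypothetical, and the sketch assumes no padding characters are added at the word boundaries, which the source does not specify.

```python
def char_ngrams(n, word):
    """Hypothetical stand-in: yield each contiguous substring of length n."""
    for i in range(len(word) - n + 1):
        yield word[i:i + n]


print(list(char_ngrams(3, "hotel")))  # ['hot', 'ote', 'tel']
```

A word shorter than n yields nothing under this assumption.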

features_utilities.normalize_features(X, train_idxs, scaler=None)[source]

Normalizes features to [0, 1].

Parameters
  • X (numpy.ndarray) – Features array to be normalized

  • train_idxs (numpy.ndarray) – Contains the train indexes

  • scaler (sklearn.preprocessing.MinMaxScaler, optional) – Scaler to be utilized

Returns

numpy.ndarray: The normalized features array

sklearn.preprocessing.MinMaxScaler: The scaler utilized

Return type

tuple
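
The train_idxs parameter suggests that the scaler is fitted on the training rows only and then applied to the whole array. The sketch below illustrates that pattern with sklearn.preprocessing.MinMaxScaler. The name normalize_sketch is hypothetical, and the fit-on-train-rows behaviour is an assumption rather than something the source states.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def normalize_sketch(X, train_idxs, scaler=None):
    """Hypothetical stand-in: min-max scale X with a train-fitted scaler."""
    if scaler is None:
        # Assumption: learn the min and max from the training rows only.
        scaler = MinMaxScaler()
        scaler.fit(X[train_idxs])
    # Apply the same scaling to every row, train and test alike.
    return scaler.transform(X), scaler


X = np.array([[0.0], [5.0], [10.0], [20.0]])
X_norm, fitted = normalize_sketch(X, np.array([0, 1, 2]))
```

Under this assumption, rows outside the training set can land outside [0, 1] when their values exceed the training range, as the last row does here.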