Features utilities
-
features_utilities.create_args_dict(poi_gdf, train_idxs, required_args, read_path, write_path)
Initializes and prepares structures required during features extraction.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which features will be created
train_idxs (numpy.ndarray) – Contains the train indexes
required_args (set) – Contains the names of the required args
read_path (str) – Path to read from
write_path (str) – Path to write to
- Returns
Dictionary with argument names as keys and their corresponding structures as values
- Return type
dict
-
features_utilities.create_concatenated_features(poi_gdf, train_idxs, test_idxs, fold_path)
Loads the arrays of the included features and concatenates them into the final X_train and X_test arrays. Then saves these arrays as well as the corresponding y_train and y_test arrays. Finally, writes the included features configuration into a file.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
train_idxs (numpy.ndarray) – Contains the train indexes
test_idxs (numpy.ndarray) – Contains the test indexes
fold_path (str) – Path to save features arrays
- Returns
None
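The concatenation step can be pictured as a column-wise stack of per-feature arrays followed by a row split on the train and test indexes. The following is only a minimal sketch under that assumption: the feature names, array shapes and the in-memory dictionary are hypothetical stand-ins for the arrays the function loads from fold_path, and the actual implementation may organize the loading and splitting differently.

```python
import numpy as np

# Hypothetical per-feature arrays, one row per poi (4 pois here).
feature_arrays = {
    "feature_a": np.array([[0.1, 0.9], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]]),
    "feature_b": np.array([[1.0], [0.0], [0.5], [0.25]]),
}
included_features = ["feature_a", "feature_b"]  # assumed configuration

# Stack the included features side by side: shape (n_pois, total_columns).
X = np.hstack([feature_arrays[name] for name in included_features])

# Split rows by the train and test indexes.
train_idxs = np.array([0, 1, 2])
test_idxs = np.array([3])
X_train, X_test = X[train_idxs], X[test_idxs]

print(X_train.shape, X_test.shape)
```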
-
features_utilities.create_finetuned_features(poi_gdf, features_info, best_feature_params, features_path, results_path)
Creates and saves the X_train features array for the model_training step.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
features_info (list) – Contains the features to be extracted and whether each should be normalized
best_feature_params (dict) – Contains the best parameter values found for the features
features_path (str) – Path from which required external files (such as the OSM streets file) are read
results_path (str) – Path to write to
- Returns
The features array for the model_training step
- Return type
numpy.ndarray
-
features_utilities.create_single_feature(f, args, train_idxs, norm, scaler)
Creates the features array given a feature’s name f.
- Parameters
f (str) – Name of the feature to be created
args (dict) – Contains the required arguments for feature f
train_idxs (numpy.ndarray) – Contains the train indexes
norm (bool) – Indicates whether the feature should be normalized
scaler (sklearn.preprocessing.MinMaxScaler) – The scaler to be utilized
- Returns
numpy.ndarray: The features array of feature f
sklearn.preprocessing.MinMaxScaler: The scaler utilized
- Return type
tuple
-
features_utilities.create_single_features(poi_gdf, train_idxs, fold_path)
Creates all the included features arrays and saves them in fold_path.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
train_idxs (numpy.ndarray) – Contains the train indexes
fold_path (str) – Path to save features arrays
- Returns
None
-
features_utilities.create_test_args_dict(test_poi_gdf, required_args, read_path1, read_path2)
Initializes and prepares structures required during features extraction in the model_deployment step.
- Parameters
test_poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which features will be created
required_args (set) – Contains the names of the required args
read_path1 (str) – Path to features_extraction step results
read_path2 (str) – Path to model_training step results
- Returns
Dictionary with argument names as keys and their corresponding structures as values
- Return type
dict
-
features_utilities.create_test_features(poi_gdf, features, features_path, model_training_path, results_path)
Creates and saves the X_test features array for the model_deployment step.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains the pois for which the features will be created
features (list) – Contains the features to be extracted, along with their best found configuration
features_path (str) – Path to features_extraction step results
model_training_path (str) – Path to model_training step results
results_path (str) – Path to write to
- Returns
The features array for the model_deployment step
- Return type
numpy.ndarray
-
features_utilities.encode_labels(poi_gdf, encoder=None)
Encodes the target column with integer values.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – The GeoDataFrame containing the column to be encoded
encoder (sklearn.preprocessing.LabelEncoder, optional) – The label encoder to be utilized
- Returns
geopandas.GeoDataFrame: The GeoDataFrame with the encoded column
sklearn.preprocessing.LabelEncoder: The label encoder utilized
- Return type
tuple
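The optional encoder argument suggests a fit-once, reuse-later pattern: fit a new LabelEncoder when none is given, otherwise apply the one passed in. The sketch below illustrates that pattern on a plain list of labels rather than on a GeoDataFrame column; the helper name encode_labels_sketch and the sample labels are hypothetical, and the fit-when-None behaviour is an assumption about the actual function.

```python
from sklearn.preprocessing import LabelEncoder

def encode_labels_sketch(labels, encoder=None):
    """Encode string labels as integers, fitting a new encoder if none is given."""
    if encoder is None:
        encoder = LabelEncoder()
        encoder.fit(labels)  # classes are stored in sorted order
    return encoder.transform(labels), encoder

# Fit on the training labels.
train_labels = ["cafe", "bank", "cafe", "school"]
encoded, encoder = encode_labels_sketch(train_labels)
print(list(encoder.classes_))  # ['bank', 'cafe', 'school']
print(list(encoded))           # [1, 0, 1, 2]

# Reuse the fitted encoder so new labels map to the same integers.
new_encoded, _ = encode_labels_sketch(["school", "bank"], encoder)
print(list(new_encoded))       # [2, 0]
```

Reusing the same encoder keeps the label-to-integer mapping identical across steps; note that LabelEncoder raises an error for labels it did not see during fitting.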
-
features_utilities.get_bbox_coords(poi_gdf)
Returns a bounding box containing all poi_gdf’s pois.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains the pois
- Returns
The bounding box coords as (south, west, north, east)
- Return type
tuple
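The (south, west, north, east) order is the one the Overpass query language expects for bounding boxes. As a minimal sketch of that ordering, the example below computes a bounding box from plain (longitude, latitude) pairs; the helper name bbox_sketch and the sample coordinates are hypothetical, and the actual function works on a GeoDataFrame and may handle coordinate reference systems in ways not shown here.

```python
def bbox_sketch(lonlat_points):
    """Return (south, west, north, east) for a list of (lon, lat) pairs."""
    lons = [lon for lon, _ in lonlat_points]
    lats = [lat for _, lat in lonlat_points]
    return (min(lats), min(lons), max(lats), max(lons))

# Hypothetical poi coordinates as (lon, lat).
pois = [(23.72, 37.98), (23.76, 37.95), (23.70, 38.01)]
south, west, north, east = bbox_sketch(pois)
print(south, west, north, east)  # 37.95 23.7 38.01 23.76
```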
-
features_utilities.get_pois_by_street(poi_gdf, street_gdf)
Matches each poi in poi_gdf to its nearest street.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains pois to be matched to a street
street_gdf (geopandas.GeoDataFrame) – Contains the streets among which the nearest one to each poi is searched
- Returns
Dictionary with street ids as keys and, as values, lists of the pois that belong to each street
- Return type
dict
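The shape of the returned dictionary can be illustrated with a brute-force nearest-street search in planar coordinates. This is a sketch only: it represents each street as a single straight segment and each poi as an (x, y) tuple, all ids and coordinates are hypothetical, and the actual function operates on GeoDataFrames and may well use a spatial index instead of comparing every poi with every street.

```python
import math
from collections import defaultdict

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def pois_by_street_sketch(pois, streets):
    """Group poi ids under the id of their nearest street."""
    grouped = defaultdict(list)
    for poi_id, point in pois.items():
        nearest = min(
            streets,
            key=lambda sid: point_segment_distance(point, *streets[sid]),
        )
        grouped[nearest].append(poi_id)
    return dict(grouped)

# Hypothetical data: two parallel streets and three pois.
streets = {"s1": ((0, 0), (10, 0)), "s2": ((0, 5), (10, 5))}
pois = {"p1": (2, 1), "p2": (8, 4), "p3": (5, 0.5)}
print(pois_by_street_sketch(pois, streets))
# {'s1': ['p1', 'p3'], 's2': ['p2']}
```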
-
features_utilities.get_required_external_files(poi_gdf, feature_sets_path)
Checks if external files are required and if so, downloads them using the Overpass API.
- Parameters
poi_gdf (geopandas.GeoDataFrame) – Contains the pois that define the area to query with the Overpass API
feature_sets_path (str) – Path to store the downloaded elements
- Returns
None
-
features_utilities.get_top_k(names, k, mode='term')
Extracts the top k % terms or ngrams of names, based on mode.
- Parameters
names (list) – Contains the names to be considered
k (float) – Percentage of top terms or ngrams to be considered
mode (str, optional) – One of ‘term’, ‘trigram’ or ‘fourgram’; defaults to ‘term’
- Returns
Contains the top k terms or ngrams
- Return type
list
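A frequency-based selection of this kind can be sketched with collections.Counter. The example below is illustrative only and rests on several assumptions not stated in the documentation: k is treated as a fraction in (0, 1], names are lowercased and split on whitespace, the number of kept items is rounded up, and ranking is by raw frequency. The names top_k_sketch and char_ngrams are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(n, word):
    """Yield every contiguous character n-gram of word."""
    for i in range(len(word) - n + 1):
        yield word[i:i + n]

def top_k_sketch(names, k, mode="term"):
    """Return the most frequent fraction k of terms or character n-grams."""
    counts = Counter()
    for name in names:
        terms = name.lower().split()  # assumed tokenization
        if mode == "term":
            counts.update(terms)
        elif mode in ("trigram", "fourgram"):
            n = 3 if mode == "trigram" else 4
            for term in terms:
                counts.update(char_ngrams(n, term))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    n_keep = math.ceil(k * len(counts))  # assumed rounding rule
    return [item for item, _ in counts.most_common(n_keep)]

names = ["Blue Cafe", "Cafe Roma", "Roma Pizza", "Cafe Blue"]
print(top_k_sketch(names, 0.25))             # ['cafe']
print(top_k_sketch(names, 0.25, "trigram"))
```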
-
features_utilities.load_poi_gdf(poi_fpath)
Loads the pois in poi_fpath into a geopandas.GeoDataFrame and projects their geometries.
- Parameters
poi_fpath (str) – Path to file containing the pois
- Returns
geopandas.GeoDataFrame
-
features_utilities.load_street_gdf(street_fpath)
Loads the streets in street_fpath into a geopandas.GeoDataFrame and projects their geometries.
- Parameters
street_fpath (str) – Path to file containing the streets
- Returns
geopandas.GeoDataFrame
-
features_utilities.ngrams(n, word)
Generator of all n-grams of word.
- Parameters
n (int) – The length of character ngrams to be extracted
word (str) – The word from which the ngrams are to be extracted
- Yields
str – ngram
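A character n-gram generator matching this signature can be written as a sliding window over the word. The version below is a minimal sketch, assuming no padding or boundary markers are added to the word; the actual implementation may differ in such details.

```python
def ngrams(n, word):
    """Yield every contiguous character n-gram of word, left to right."""
    for i in range(len(word) - n + 1):
        yield word[i:i + n]

print(list(ngrams(3, "bakery")))  # ['bak', 'ake', 'ker', 'ery']
print(list(ngrams(4, "abc")))     # [] (word shorter than n)
```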
-
features_utilities.normalize_features(X, train_idxs, scaler=None)
Normalize features to [0, 1].
- Parameters
X (numpy.ndarray) – Features array to be normalized
train_idxs (numpy.ndarray) – Contains the train indexes
scaler (sklearn.preprocessing.MinMaxScaler, optional) – Scaler to be utilized
- Returns
numpy.ndarray: The normalized features array
sklearn.preprocessing.MinMaxScaler: The scaler utilized
- Return type
tuple
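The train_idxs parameter suggests that the scaler is fitted on the training rows only and then applied to the whole array. The following sketch illustrates that pattern with scikit-learn's MinMaxScaler; the helper name normalize_sketch and the sample data are hypothetical, and fitting only when no scaler is passed is an assumption about the actual function.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def normalize_sketch(X, train_idxs, scaler=None):
    """Scale X with a MinMaxScaler fitted on the training rows only."""
    if scaler is None:
        scaler = MinMaxScaler()
        scaler.fit(X[train_idxs])  # min and max come from training rows only
    return scaler.transform(X), scaler

X = np.array([[0.0], [5.0], [10.0], [20.0]])
train_idxs = np.array([0, 1, 2])  # the last row is held out

X_norm, scaler = normalize_sketch(X, train_idxs)
print(X_norm.ravel())  # [0.  0.5 1.  2. ]

# Reuse the fitted scaler on new data, as in a later pipeline step.
X_new, _ = normalize_sketch(np.array([[15.0]]), None, scaler)
print(X_new.ravel())   # [1.5]
```

Under this assumption, only the training rows are guaranteed to land in [0, 1]; rows outside the training set can fall outside that range when their values exceed the training minimum or maximum.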