utilities
related to classifiers
geocoding.clf_utilities.create_clf_params_product_generator(params_grid)
Generates all possible combinations of a classifier’s hyperparameter values.
- Parameters
params_grid (dict) – Contains the classifier’s hyperparameter names as keys and the corresponding search spaces as values
- Yields
dict – Contains a classifier’s hyperparameters configuration
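A minimal sketch of how such a generator could work, assuming it takes the Cartesian product of the value lists with `itertools.product`. The function name `params_product` and the example grid are hypothetical and are not taken from the library source.

```python
from itertools import product


def params_product(params_grid):
    """Yield one dict per combination of hyperparameter values."""
    names = list(params_grid)
    # product() walks the Cartesian product of the per-parameter value lists.
    for combo in product(*(params_grid[name] for name in names)):
        yield dict(zip(names, combo))


# Hypothetical search space, for illustration only.
grid = {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}
configs = list(params_product(grid))
```

Because the combinations are yielded lazily, a large grid does not have to be materialised in memory before the search starts.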
geocoding.clf_utilities.evaluate(y_test, y_pred)
Evaluates model predictions through a series of metrics.
- Parameters
y_test (numpy.ndarray) – True labels
y_pred (numpy.ndarray) – Predicted labels
- Returns
Contains metrics names as keys and the corresponding values as values
- Return type
dict
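The documentation does not list which metrics are computed. As an illustration only, the sketch below assumes accuracy and macro-averaged F1 and returns them in a dict keyed by metric name; `evaluate_sketch` and the key names are hypothetical.

```python
def evaluate_sketch(y_test, y_pred):
    """Return a dict of metric name -> value (accuracy and macro F1 here)."""
    pairs = list(zip(y_test, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for label in sorted(set(y_test)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return {"accuracy": accuracy, "f1_macro": sum(f1_scores) / len(f1_scores)}


scores = evaluate_sketch([0, 1, 1, 0], [0, 1, 0, 0])
```

Note that this sketch averages F1 over the labels present in `y_test` only, which is a simplification.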
geocoding.clf_utilities.get_predictions(model, X_test)
Makes predictions utilizing model over X_test.
- Parameters
model (object) – The model to be used for predictions
X_test (numpy.ndarray) – The test features array
- Returns
Contains predictions in (label, score) pairs
- Return type
list
geocoding.clf_utilities.inverse_transform_labels(encoder, preds)
Utilizes encoder to transform encoded labels back to the original strings.
- Parameters
encoder (sklearn.preprocessing.LabelEncoder) – The encoder to be utilized
preds (list) – Contains predictions in (label, score) pairs
- Returns
Contains predictions in (label, score) pairs, where label is now in the original string format
- Return type
list
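The following sketch shows one way the (label, score) pairs could be decoded. To stay dependency-free it uses a small stand-in class, `TinyEncoder`, that exposes the same `inverse_transform` call as `sklearn.preprocessing.LabelEncoder`; the helper name and the label strings are hypothetical.

```python
class TinyEncoder:
    """Stand-in exposing the same inverse_transform call as LabelEncoder."""

    def __init__(self, classes):
        self.classes_ = sorted(classes)

    def inverse_transform(self, encoded):
        return [self.classes_[i] for i in encoded]


def inverse_transform_pairs(encoder, preds):
    """Map (encoded_label, score) pairs back to (original_label, score)."""
    encoded = [label for label, _ in preds]
    scores = [score for _, score in preds]
    # Decode all labels in one call, then re-attach the untouched scores.
    return list(zip(encoder.inverse_transform(encoded), scores))


encoder = TinyEncoder(["service_a", "service_b", "service_c"])  # hypothetical labels
decoded = inverse_transform_pairs(encoder, [(2, 0.7), (0, 0.2)])
```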
geocoding.clf_utilities.is_valid(clf_name)
Checks whether clf_name is a valid classifier’s name with respect to the experiment setup.
- Parameters
clf_name (str) – Classifier’s name
- Returns
Returns True if given classifier’s name is valid
- Return type
bool
geocoding.clf_utilities.normalize_scores(scores)
Normalizes prediction scores to a probability-like format.
- Parameters
scores (list) – Contains the prediction scores produced by the model
- Returns
The normalized scores
- Return type
list
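The normalization scheme is not specified here. One common choice, assumed purely for illustration, is a softmax, which maps arbitrary real-valued scores to positive values that sum to 1; `normalize_scores_sketch` is a hypothetical name.

```python
import math


def normalize_scores_sketch(scores):
    """Softmax: map arbitrary real scores to positive values summing to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = normalize_scores_sketch([2.0, 1.0, 0.0])
```

The ordering of the scores is preserved, so the top-ranked label stays the same after normalization.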
geocoding.clf_utilities.train_classifier(clf_name, X_train, y_train)
Trains a classifier through grid search.
- Parameters
clf_name (str) – Name of the classifier to be trained
X_train (numpy.ndarray) – Train features array
y_train (numpy.ndarray) – Train labels array
- Returns
The trained classifier
- Return type
object
related to OSM
geocoding.osm_utilities.cluster_points(X)
Clusters points given in X.
- Parameters
X (numpy.ndarray) – Contains the points to be clustered
- Returns
The predicted cluster labels
- Return type
numpy.ndarray
geocoding.osm_utilities.download_cell(cell, fpath)
Downloads cell from the Overpass API, writes the results to fpath, and then parses them into a pandas.DataFrame.
- Parameters
cell (list) – Contains the bounding box coords
fpath (str) – Path to write results and then to read from in order to parse them
- Returns
Contains all street elements included in cell
- Return type
pandas.DataFrame
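As a rough sketch of the query side only (no network call), the code below builds an Overpass QL request for the ways inside a bounding box. It assumes that cell is ordered as [south, west, north, east], which is the order Overpass QL expects, and that streets are selected through the `highway` tag; both are assumptions, and `build_streets_query` is a hypothetical helper.

```python
def build_streets_query(cell):
    """Build an Overpass QL query for ways tagged as highways inside a bbox."""
    south, west, north, east = cell  # assumed order; Overpass expects (S, W, N, E)
    return (
        "[out:json];"
        f'way["highway"]({south},{west},{north},{east});'
        "out geom;"
    )


query = build_streets_query([37.9, 23.7, 38.0, 23.8])
```

The resulting string would then be sent to an Overpass endpoint and the response written to fpath before parsing.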
geocoding.osm_utilities.extract_streets(points, path)
A wrapper function that manages the streets download.
- Parameters
points (numpy.ndarray) – Contains the data points that define the area to extract from Overpass API
path (str) – Path to write
- Returns
None
geocoding.osm_utilities.get_clusters_bboxes(X, labels)
Extracts a bounding box for each one of the clusters.
- Parameters
X (numpy.ndarray) – Contains the clustered points
labels (numpy.ndarray) – Contains the cluster label for each point in X
- Returns
Contains the cluster labels as keys and the corresponding bounding box as values
- Return type
dict
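A minimal, dependency-free sketch of the idea: for each cluster label, track the minimum and maximum coordinate along each axis. The (min_x, min_y, max_x, max_y) tuple layout and the function name are assumptions; the actual bounding box format is not documented here.

```python
def clusters_bboxes_sketch(X, labels):
    """Return {label: (min_x, min_y, max_x, max_y)} for each cluster."""
    bboxes = {}
    for (x, y), label in zip(X, labels):
        if label not in bboxes:
            # First point of this cluster: the box collapses to that point.
            bboxes[label] = (x, y, x, y)
        else:
            min_x, min_y, max_x, max_y = bboxes[label]
            bboxes[label] = (
                min(min_x, x),
                min(min_y, y),
                max(max_x, x),
                max(max_y, y),
            )
    return bboxes


points = [(0.0, 0.0), (1.0, 2.0), (5.0, 5.0), (6.0, 4.0)]
bboxes = clusters_bboxes_sketch(points, [0, 0, 1, 1])
```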
related to features
geocoding.features_utilities.create_test_features(df, in_path, scalers_path, out_path, features=None)
Creates all the included test features arrays and saves them in out_path.
- Parameters
df (pandas.DataFrame) – Contains the test points
in_path (str) – Path to read required items
scalers_path (str) – Path to load required scalers
out_path (str) – Path to write
features (list, optional) – Contains the names of the features to extract
- Returns
The test features array
- Return type
numpy.ndarray
geocoding.features_utilities.create_train_features(df, in_path, out_path, features=None)
Creates all the included train features arrays and saves them in out_path.
- Parameters
df (pandas.DataFrame) – Contains the train points
in_path (str) – Path to read required items
out_path (str) – Path to write
features (list, optional) – Contains the names of the features to extract
- Returns
The train features array
- Return type
numpy.ndarray
geocoding.features_utilities.encode_labels(df, encoder=None)
Encodes the target column with integer values.
- Parameters
df (pandas.DataFrame) – The DataFrame containing the column to be encoded
encoder (sklearn.preprocessing.LabelEncoder, optional) – The label encoder to be utilized
- Returns
pandas.DataFrame: The DataFrame with the encoded column
sklearn.preprocessing.LabelEncoder: The label encoder utilized
- Return type
tuple
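The optional encoder argument suggests a fit-once, reuse-later pattern: build the encoding on the train set, then pass it back in for the test set. The sketch below illustrates that pattern with a plain list and a dict standing in for the DataFrame column and the LabelEncoder; all names in it are hypothetical.

```python
def encode_labels_sketch(labels, mapping=None):
    """Return (encoded_labels, mapping); build the mapping only if none is given."""
    if mapping is None:
        # Sorted order gives a deterministic label -> integer assignment.
        mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


train_codes, mapping = encode_labels_sketch(["b", "a", "b", "c"])
test_codes, _ = encode_labels_sketch(["c", "a"], mapping)
```

Reusing the mapping keeps the integer codes consistent between the train and test sets.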
geocoding.features_utilities.filter(values)
Filters values by replacing values greater than config.distance_thr with config.distance_thr.
- Parameters
values (list) – Contains distances created by various features
- Returns
Contains the filtered distances
- Return type
list
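The replacement rule described above amounts to clipping each distance at the threshold. A minimal sketch, where `DISTANCE_THR` is a hypothetical stand-in for config.distance_thr and its value is made up:

```python
DISTANCE_THR = 5000.0  # hypothetical stand-in for config.distance_thr


def clip_distances(values, thr=DISTANCE_THR):
    """Replace every value greater than thr with thr; leave the rest as-is."""
    return [min(v, thr) for v in values]


clipped = clip_distances([120.0, 7300.5, 5000.0])
```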
geocoding.features_utilities.filter2(values)
Filters values by replacing values greater than config.distance_thr with config.distance_thr.
- Parameters
values (list) – Contains distances created by various features
- Returns
Contains the filtered distances
- Return type
list
geocoding.features_utilities.get_points(df)
Builds an array of all points appearing in df. This array will have a shape of (len(df) * number_of_services, 2).
- Parameters
df (pandas.DataFrame) – Contains the data points
- Returns
numpy.ndarray
geocoding.features_utilities.get_required_external_files(df, path, features=None)
Checks if external files are required and if so, downloads them using the Overpass API.
- Parameters
df (pandas.DataFrame) – Contains points in order to define the area to query with Overpass API
path (str) – Path to save the downloaded elements
features (list, optional) – Contains the names of the included features
- Returns
None
geocoding.features_utilities.load_points_df(points_fpath)
Loads points in points_fpath into a pandas.DataFrame and projects their geometries.
- Parameters
points_fpath (str) – Path to file containing the points
- Returns
pandas.DataFrame
geocoding.features_utilities.load_street_gdf(street_fpath)
Loads streets in street_fpath into a geopandas.GeoDataFrame and projects their geometries.
- Parameters
street_fpath (str) – Path to file containing the streets
- Returns
geopandas.GeoDataFrame
geocoding.features_utilities.normalize_features(X, scaler=None)
Normalizes features to [0, 1].
- Parameters
X (numpy.ndarray) – Features array to be normalized
scaler (sklearn.preprocessing.MinMaxScaler, optional) – Scaler to be utilized
- Returns
numpy.ndarray: The normalized features array
sklearn.preprocessing.MinMaxScaler: The scaler utilized
- Return type
tuple
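For illustration, min-max scaling maps each column value v to (v - min) / (max - min), with min and max taken from the data the scaler was fitted on. The dependency-free sketch below mirrors the (array, scaler) return shape described above; the helper names are hypothetical, and a pair of lists stands in for the MinMaxScaler.

```python
def fit_min_max(X):
    """Record per-column minima and maxima (the 'scaler' in this sketch)."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize_features_sketch(X, scaler=None):
    """Scale X column-wise; fit a new scaler only when none is supplied."""
    if scaler is None:
        scaler = fit_min_max(X)
    mins, maxs = scaler
    scaled = [
        # Constant columns (hi == lo) are mapped to 0.0 to avoid dividing by zero.
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in X
    ]
    return scaled, scaler


X_train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
X_train_scaled, scaler = normalize_features_sketch(X_train)
X_test_scaled, _ = normalize_features_sketch([[2.5, 40.0]], scaler)
```

When a previously fitted scaler is reused on new data, values outside the fitted range land outside [0, 1], as the second test column does here.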
geocoding.features_utilities.prepare_feats_args(df, required_args, path)
Prepares required arguments during features extraction.
- Parameters
df (pandas.DataFrame) – Contains the points for which features will be created
required_args (set) – Contains the names of the required args
path (str) – Path to read from
- Returns
Contains argument names as keys and their corresponding structures as values
- Return type
dict