features

geocoding.features.get_centroid_coords_distances(df)

Creates a features array. For each address (each row), calculates the distances between the coordinates of the corresponding centroid and the coordinates suggested by the different services.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services * 2)

Return type

numpy.ndarray
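As an illustration of the idea, the sketch below computes per-coordinate distances to a centroid with NumPy. It is not the library's implementation: it assumes the centroid is the mean of the service points, that "distance" means the absolute difference along each axis, and that the input is a plain array rather than a DataFrame. The function name `centroid_coords_distances` is hypothetical.

```python
import numpy as np

def centroid_coords_distances(coords):
    """coords: array of shape (n_samples, n_services, 2) with (x, y) per service.

    Assumption: the centroid is the mean of the service points and the
    distance is the absolute difference along each coordinate axis.
    """
    centroid = coords.mean(axis=1, keepdims=True)   # (n_samples, 1, 2)
    diffs = np.abs(coords - centroid)               # (n_samples, n_services, 2)
    return diffs.reshape(len(coords), -1)           # (n_samples, n_services * 2)

# One address geocoded by three hypothetical services.
example = np.array([[[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]])
features = centroid_coords_distances(example)
print(features.shape)  # (1, 6)
```

With three services the result has six columns, which matches the documented shape (len(df), number_of_services * 2).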

geocoding.features.get_centroid_points_distances(df)

Creates a features array. For each address (each row), calculates the distances between the corresponding centroid and the points suggested by the different services.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services)

Return type

numpy.ndarray
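A minimal sketch of point-to-centroid distances follows. It assumes the centroid is the mean of the service points and that the distance is Euclidean in a projected (planar) coordinate system; the source states neither, and the function name is hypothetical.

```python
import numpy as np

def centroid_points_distances(coords):
    """coords: array of shape (n_samples, n_services, 2).

    Assumption: Euclidean distance from each service point to the mean
    of all service points for the same address.
    """
    centroid = coords.mean(axis=1, keepdims=True)       # (n_samples, 1, 2)
    return np.linalg.norm(coords - centroid, axis=2)    # (n_samples, n_services)

example = np.array([[[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]])
features = centroid_points_distances(example)
print(features.shape)  # (1, 3)
```

Here each service contributes a single column, in line with the documented shape (len(df), number_of_services).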

geocoding.features.get_common_nearest_street_distance(df, street_gdf, k=3)

Creates a features array. For each address (each row) and for each service, calculates the distance to the nearest street that is common to all geocoding sources.

Parameters
  • df (pandas.DataFrame) – Contains data points for which the features will be created

  • street_gdf (geopandas.GeoDataFrame) – Contains all streets extracted from OSM, along with their geometries

  • k (int) – The number of closest streets to fetch per geocoding source.

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services)

Return type

numpy.ndarray
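The role of k can be sketched as follows: fetch the k closest streets for each service point, intersect the candidate sets, and measure every point against the shared street. This is only an illustration under stated assumptions. Streets are simplified to single line segments, ties between several shared streets are broken by the smallest total distance, and NaN is returned when no street is shared. None of these choices is confirmed by the source, and all names are hypothetical.

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ab, ap = b - a, p - a
    denom = ab @ ab
    # Clamp the projection parameter so the closest point stays on the segment.
    t = 0.0 if denom == 0 else np.clip((ap @ ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def common_nearest_street_distances(points, streets, k=3):
    """points: (n_services, 2); streets: (n_streets, 2, 2) segment endpoints.

    Returns one distance per service, or NaNs when the k nearest streets
    of the services have no street in common (an assumed convention).
    """
    dists = np.array([[point_segment_distance(p, a, b) for a, b in streets]
                      for p in points])                  # (n_services, n_streets)
    nearest_k = np.argsort(dists, axis=1)[:, :k]         # k candidates per service
    common = set(nearest_k[0]).intersection(*map(set, nearest_k[1:]))
    if not common:
        return np.full(len(points), np.nan)
    # Assumed tie-break: the shared street with the smallest total distance.
    best = min(common, key=lambda j: dists[:, j].sum())
    return dists[:, best]

streets = np.array([[[0.0, 0.0], [10.0, 0.0]],
                    [[0.0, 5.0], [10.0, 5.0]],
                    [[0.0, 100.0], [10.0, 100.0]]])
points = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 4.0]])
result = common_nearest_street_distances(points, streets, k=2)
```

A larger k makes it more likely that the services share at least one candidate street, at the cost of admitting streets that are farther from each point.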

geocoding.features.get_intersects_on_common_nearest_street(df, street_gdf, k=3)

Creates a features array. For each address (each row) and for each service, identifies the nearest street that is common to all geocoding sources and returns True if the point suggested by the service intersects or touches that street, or False otherwise.

Parameters
  • df (pandas.DataFrame) – Contains data points for which the features will be created

  • street_gdf (geopandas.GeoDataFrame) – Contains all streets extracted from OSM, along with their geometries

  • k (int) – The number of closest streets to fetch per geocoding source.

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services)

Return type

numpy.ndarray

geocoding.features.get_mean_centroids_coords_distances(df)

Creates a features array. For each address (each row), calculates the mean distances between the coordinates of the corresponding centroid and the coordinates suggested by the different services.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), 2)

Return type

numpy.ndarray

geocoding.features.get_mean_centroids_points_distances(df)

Creates a features array. For each address (each row), calculates the mean distance between the corresponding centroid and the points suggested by the different services.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), 1)

Return type

numpy.ndarray

geocoding.features.get_nearest_street_distance_by_centroid(df, street_gdf)

Creates a features array. For each address (each row), first identifies the street nearest to the corresponding centroid, then calculates the distances between this street and the points suggested by the different services.

Parameters
  • df (pandas.DataFrame) – Contains data points for which the features will be created

  • street_gdf (geopandas.GeoDataFrame) – Contains all streets extracted from OSM, along with their geometries

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services)

Return type

numpy.ndarray
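The two-step procedure (pick the street closest to the centroid, then measure every service point against it) can be sketched as below. The sketch assumes streets are single line segments, planar coordinates, and a centroid equal to the mean of the service points. The real function works on a GeoDataFrame of street geometries, and the helper names here are hypothetical.

```python
import numpy as np

def distances_to_segment(points, a, b):
    """Euclidean distances from each row of points (m, 2) to the segment a-b."""
    ab = b - a
    denom = ab @ ab
    if denom == 0:
        t = np.zeros(len(points))
    else:
        # Projection parameter of each point, clamped onto the segment.
        t = np.clip((points - a) @ ab / denom, 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1)

def nearest_street_distance_by_centroid(points, streets):
    """points: (n_services, 2); streets: (n_streets, 2, 2) segment endpoints."""
    centroid = points.mean(axis=0)
    # Step 1: find the street nearest to the centroid.
    to_centroid = [distances_to_segment(centroid[None, :], a, b)[0]
                   for a, b in streets]
    a, b = streets[int(np.argmin(to_centroid))]
    # Step 2: distance from every service point to that one street.
    return distances_to_segment(points, a, b)

streets = np.array([[[0.0, 0.0], [10.0, 0.0]],
                    [[0.0, 5.0], [10.0, 5.0]]])
points = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 6.0]])
result = nearest_street_distance_by_centroid(points, streets)
```

In this example the centroid (2, 3) is closer to the second street, so all three distances are measured against it even though the first two points lie nearer to the first street.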

geocoding.features.get_nearest_street_distance_per_service(df, street_gdf)

Creates a features array. For each address (each row) and for each service, calculates the distance to the nearest street.

Parameters
  • df (pandas.DataFrame) – Contains data points for which the features will be created

  • street_gdf (geopandas.GeoDataFrame) – Contains all streets extracted from OSM, along with their geometries

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services)

Return type

numpy.ndarray

geocoding.features.get_normalized_coords(df)

Creates a features array. Centers each longitude and latitude column by subtracting that column's mean value from it.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services * 2)

Return type

numpy.ndarray
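Mean-centering can be illustrated with a small pandas example. The column names (`svc_a_lon` and so on) are hypothetical, since the page does not document the DataFrame schema the library expects.

```python
import numpy as np
import pandas as pd

# Hypothetical schema: one longitude and one latitude column per service.
df = pd.DataFrame({
    "svc_a_lon": [23.0, 24.0, 25.0],
    "svc_a_lat": [37.0, 38.0, 39.0],
    "svc_b_lon": [23.5, 24.5, 25.5],
    "svc_b_lat": [37.5, 38.5, 39.5],
})

# Subtract each column's mean from that column.
features = (df - df.mean()).to_numpy()
print(features.shape)  # (3, 4)
```

After centering, every column has zero mean, so the features describe each point's offset from the dataset average rather than its absolute position.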

geocoding.features.get_pairwise_coords_distances(df)

Creates a features array. For each address (each row), calculates the pairwise distances among the coordinates suggested by the different services.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services * (number_of_services-1))

Return type

numpy.ndarray

geocoding.features.get_pairwise_points_distances(df)

Creates a features array. For each address (each row), calculates the pairwise distances among the points suggested by the different services.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), (number_of_services * (number_of_services-1)) / 2)

Return type

numpy.ndarray
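The column count follows from the number of unordered service pairs, which the sketch below makes explicit. It assumes Euclidean distance in planar coordinates and a plain array input; the function name is hypothetical.

```python
from itertools import combinations

import numpy as np

def pairwise_points_distances(coords):
    """coords: (n_samples, n_services, 2) -> (n_samples, s * (s - 1) / 2)."""
    # Every unordered pair of services yields exactly one column.
    pairs = list(combinations(range(coords.shape[1]), 2))
    columns = [np.linalg.norm(coords[:, i] - coords[:, j], axis=1)
               for i, j in pairs]
    return np.stack(columns, axis=1)

example = np.array([[[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]]])
features = pairwise_points_distances(example)
print(features.shape)  # (1, 3)
```

With three services there are three pairs, (0, 1), (0, 2) and (1, 2), which gives the three columns above.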

geocoding.features.get_points_area(df)

Creates a features array. For each address (each row), builds a polygon from the coordinates of all geocoding sources and calculates its area.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), 1)

Return type

numpy.ndarray
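One common way to obtain the area of a polygon from its vertices is the shoelace formula, sketched below. Whether the library uses this formula or a geometry library is not stated here. The sketch assumes planar coordinates and vertices taken in the order given, which is unambiguous for three services (a triangle).

```python
import numpy as np

def polygon_area(vertices):
    """Shoelace formula for a simple polygon; vertices has shape (n_vertices, 2)."""
    x, y = vertices[:, 0], vertices[:, 1]
    # Sum of cross products of consecutive vertices, wrapping around at the end.
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Three hypothetical service points for a single address.
triangle = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
area = polygon_area(triangle)
print(area)  # 6.0
```

A small area means the services agree closely on the location, while a large area signals disagreement.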

geocoding.features.get_polar_coords(df)

Creates a features array. Transforms the Cartesian coordinates suggested by each service to polar coordinates.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), number_of_services * 2)

Return type

numpy.ndarray
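The standard Cartesian-to-polar conversion is r = sqrt(x² + y²) and θ = atan2(y, x). The sketch below applies it per service. The column layout (radius and angle interleaved per service) and the choice of origin are assumptions, and the function name is hypothetical.

```python
import numpy as np

def to_polar(coords):
    """coords: (n_samples, n_services, 2) -> (n_samples, n_services * 2).

    Assumed layout: (r, theta) for service 0, then (r, theta) for service 1, ...
    """
    x, y = coords[..., 0], coords[..., 1]
    polar = np.stack([np.hypot(x, y), np.arctan2(y, x)], axis=-1)
    return polar.reshape(len(coords), -1)

example = np.array([[[3.0, 4.0], [0.0, 2.0]]])
features = to_polar(example)
print(features.shape)  # (1, 4)
```

Using `arctan2` rather than `arctan(y / x)` keeps the angle in the correct quadrant and avoids division by zero when x is 0.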

geocoding.features.get_zip_codes(df)

Creates a features array. For each address (each row), the first two digits of its zip code are extracted and one-hot encoded: row r contains 1 (True) in column c if c represents the two digits that r's zip code starts with, and 0 elsewhere.

Parameters

df (pandas.DataFrame) – Contains data points for which the features will be created

Returns

The features array of shape (n_samples, n_features), here (len(df), 76), because there are 76 valid two-digit zip code prefixes in Greece

Return type

numpy.ndarray
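The encoding described above is a one-hot encoding over zip code prefixes, sketched below. The prefix list is a short illustrative stand-in, not the 76 prefixes the library uses, and mapping an unknown prefix to an all-zero row is an assumption. The function name is hypothetical.

```python
import numpy as np

def one_hot_zip_prefixes(zip_codes, prefixes):
    """Return an array of shape (len(zip_codes), len(prefixes)) of 0/1 flags."""
    index = {prefix: col for col, prefix in enumerate(prefixes)}
    features = np.zeros((len(zip_codes), len(prefixes)), dtype=int)
    for row, code in enumerate(zip_codes):
        col = index.get(str(code)[:2])   # column for the first two digits
        if col is not None:              # assumed: unknown prefixes stay all zero
            features[row, col] = 1
    return features

prefixes = ["10", "11", "54"]            # illustrative only, not the real list
zip_codes = ["10431", "54622", "99999"]
features = one_hot_zip_prefixes(zip_codes, prefixes)
print(features.shape)  # (3, 3)
```

Fixing the column order up front, rather than deriving it from the data, keeps the feature layout identical between training and prediction.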
