Welcome to LGM-PolygonClassification’s documentation!¶
LGM-PolygonClassification¶
A Python library for effectively classifying land parcel polygons with respect to their provenance information.
About LGM-PolygonClassification¶
LGM-PolygonClassification is a Python library that implements a full machine learning workflow for training classification algorithms on annotated datasets containing pairs of matched polygons, each of which belongs to a distinct polygon variant. LGM-PolygonClassification constructs a series of training features that take into account the individual characteristics of each polygon, as well as the geospatial relationship between the polygons of each matched pair. Further, it encapsulates grid-search and cross-validation functionality, based on the scikit-learn toolkit, assessing a series of classification models and parameterizations in order to find the model that best fits the data at hand. Indicatively, we achieve 98.44% accuracy with the Gradient Boosting Trees classifier (see References).
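As a point of reference, the sketch below illustrates the kind of individual and pairwise geometric features such a workflow can build on; the function name and the exact feature set are assumptions made for illustration, not the library's actual implementation.

# Illustrative only: example individual and pairwise geometric features for a
# matched pair of polygons; the function name and feature set are assumptions,
# not the library's API.
from shapely.geometry import Polygon

def example_pair_features(original: Polygon, final: Polygon) -> dict:
    """Compute a few simple per-polygon and pairwise features."""
    inter = original.intersection(final).area
    union = original.union(final).area
    return {
        "orig_area": original.area,
        "final_area": final.area,
        "area_ratio": final.area / original.area if original.area else 0.0,
        "iou": inter / union if union else 0.0,  # overlap between the matched pair
        "centroid_dist": original.centroid.distance(final.centroid),
    }

# Tiny usage example with two overlapping squares.
a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
b = Polygon([(1, 0), (3, 0), (3, 2), (1, 2)])
print(example_pair_features(a, b))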
The source code was tested using Python 3 (>=3.6) and Scikit-Learn 0.22.1 on a Linux server.
Dependencies¶
python>=3.6
click==7.1.1
fiona==1.8.13.post1
geopandas==0.7.0
numpy==1.18.1
pandas==1.0.2
scikit-learn==0.22.1
scipy==1.4.1
shapely==1.7.0
tabulate==0.8.6
xgboost==1.0.2
Setup procedure¶
Download the latest version from the GitHub repository, change to the main directory and run:
pip install -r pip_requirements.txt
It should install all required dependencies automatically.
Usage¶
The input dataset needs to be in CSV format. Specifically, a valid dataset should contain at least the following fields/columns:
The geometry of the initial, land allocated polygon.
The geometry of the final polygon.
The ORI_TYPE label, e.g., {1, 4}, which denotes the dominant provenance of the final polygon, i.e., the land parcel.
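For illustration, the following minimal sketch shows how such a dataset could be loaded, assuming the geometries are stored as WKT strings; the geometry column names used here are assumptions, since only the ORI_TYPE field is named above.

# Illustrative only: load a dataset shaped as described above. The geometry
# column names ("orig_geom", "final_geom") and the WKT encoding are assumptions;
# only the ORI_TYPE label column is named in this documentation.
import pandas as pd
from shapely import wkt

df = pd.read_csv("train-dataset.csv")
orig_polygons = df["orig_geom"].apply(wkt.loads)    # initial, land allocated polygons
final_polygons = df["final_geom"].apply(wkt.loads)  # final land parcel polygons
labels = df["ORI_TYPE"]                             # dominant provenance label, e.g., 1 or 4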
The library implements the following distinct processes:
- Feature extraction
The build function constructs a set of training features to use within classifiers for polygon classification.
- Algorithm and model selection
The functionality of the fineTuneClassifiers function is twofold. Firstly, it chooses, among a list of supported machine learning algorithms, the one that achieves the highest average accuracy score on the examined dataset. Secondly, it searches for the best model, i.e., the best hyper-parameters for the algorithm identified in the first step.
- Model training
The trainClassifier function trains the model selected in the previous process, i.e., the ML algorithm with tuned hyper-parameters that best fits the data, on the whole train dataset, without splitting it into folds.
- Model deployment
The testClassifier function applies the trained model to new, unseen data (see the sketch below).
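For illustration, the following rough sketch outlines these steps conceptually using plain scikit-learn; it is not the library's actual implementation, and the candidate classifiers, parameter grids and synthetic data are assumptions made purely for illustration.

# Illustrative only: a conceptual outline of the model selection, training and
# deployment steps using plain scikit-learn; not the library's actual code.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins for the extracted features and labels.
X_train, y_train = make_classification(n_samples=200, n_features=6, random_state=0)
X_test, _ = make_classification(n_samples=20, n_features=6, random_state=1)

# Algorithm and model selection: cross-validated grid search over candidates.
candidates = [
    (GradientBoostingClassifier(), {"n_estimators": [100, 300], "max_depth": [3, 5]}),
    (RandomForestClassifier(), {"n_estimators": [100, 300]}),
]
best_model, best_score = None, 0.0
for clf, grid in candidates:
    search = GridSearchCV(clf, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_model, best_score = search.best_estimator_, search.best_score_

# Model training: refit the winning configuration on the whole train dataset.
best_model.fit(X_train, y_train)

# Model deployment: apply the trained model to new, unseen data.
predictions = best_model.predict(X_test)
print(best_model, predictions[:5])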
A complete pipeline of the above processes, i.e., feature extraction, training and evaluation of state-of-the-art classifiers for polygon classification (provenance recommendation for a land parcel), can be executed with the following command:
$ python -m polygon_classification.cli run --train_dataset <path/to/train-dataset>
--test_dataset <path/to/test-dataset>
Additionally, help is available on the command-line interface (CLI). Enter the following to list all supported commands, or the options for a given command, along with a short description:
$ python -m polygon_classification.cli -h
Usage: cli.py [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
Commands:
evaluate evaluate the effectiveness of the proposed methods
run A complete process of distinct steps in figuring out the best ML algorithm with optimal hyperparameters...
train tune various classifiers and select the best hyper-parameters on a train dataset
Documentation¶
Source code documentation is available from linkgeoml.github.io.
References¶
Kaffes et al. Determining the provenance of land parcel polygons via machine learning. SSDBM ’20.