Edge classification problem for dummies

ifitch · March 3, 2020, 11:09am

I have a task of classifying spatial data from a geographic information system. More precisely, I need a way to filter out unnecessary line segments from the CAD system before loading into the GIS (see the attached picture, colors for illustrative purposes only).

The problem is that there are much more variations of objects than in the picture. The task is difficult to solve in an algorithmic way.

I tried to apply a bunch of classification algorithms from the Scikit-learn package and, in general, got significant results. GradientBoostingClassifier and ExtraTreeClassifier achive an accuracy about 96-98%, but:

this accuracy is achieved in the context of individual segments into which I explode the source objects (hundreds of thousands of objects) After reverse aggregation of objects it may turn out that one of their segments in each object is classified incorrectly. The error in the context of the source objects is high
significant computational resources and time are required for the preparation of the source data, and calculation of features for classifiers
it is impossible to use the source 2d coordinates of objects in algorithms, but only their derivatives

I tried to find good examples of using deep neural networks for this kind of tasks / for spatial data, but it seems that this direction is just starting to develop.

From what I learned, I realized that it would be a good idea to transform my spatial data into a graph and use graph neural networks to classify edge types (significant line / unnecessary line).

I converted data to a graph. As features of the nodes, I assigned the values of X and Y of the vertices of the geometry, and assigned to the edges the values of the length of the edge and its azimuth.

However, it turned out that if classification algorithms for edges exist, it is extremely difficult to find their implementation (I did not succeed). All the graph neural networks that I found are focused on the classification of nodes, whole graph and the prediction of the presence of edges, but not their classification.

Converting a graph into an line graph with the subsequent classification of nodes seems to be a possible solution, but it seems that when converting to an edge graph, the features of nodes and edges that are important for classification are lost (line_graph() function from the networkX python package).

I would like to get some guidance on solving my problem using DGL. Which algorithms I need to pay attention to, what examples to look at, or maybe there are simpler ways to solve my problem.

Also, please note that in general, I am new to deep learning and a simpler explanation may be required.

kristianeschenburg · June 24, 2021, 10:55pm

Hi @ifitch

I’m not sure if this would be possible for you, but have you tried converting your graph G_{o} = (V_{o},E_{o}) into a new one where the edges in the old graph become nodes in the new graph G_{n} = (V_{n}, E_{n})?

You could try defining G_{n} such that if two edges e_{o, i}, e_{o, j} \in E_{o} are incident on the same node v_{o, i} \in V_{o}, then they share an edge e_{n, k} \in E_{n}. You could compute some sort of feature interpolation to create node features in this new graph.

Some food for thought.

k