Plotting OpenStreetMap relations does not generate continous lines

Plotting OpenStreetMap relations does not generate continous lines - python

All,
I have been working on an index of all MTB trails worldwide. I'm a Python person so for all steps involved I try to use Python modules.
I was able to grab relations from the OSM overpass API like this:
from OSMPythonTools.overpass import Overpass
overpass = Overpass()
def fetch_relation_coords(relation):
rel = overpass.query('rel(%s); (._;>;); out;' % relation)
return rel
rel = fetch_relation_coords("6750628")
I'm choosing this particular relation (6750628) because it is one of several that is resulting in discontinuous (or otherwise erroneous) plots.
I process the "rel" object to get a pandas.DataFrame like this:
elements = pd.DataFrame(rel.toJSON()['elements'])
"elements" looks like this:
The Elements pandas.DataFrame contains rows of the types "relation" (1 in this case), several of the type "way" and many of the type "node". It was my understanding that I would use the "relation" row, "members" column to extract the order of the ways (which point to the nodes), and use that order to make a list of the latitudes and longitudes of the nodes (for later use in leaflet), in the correct order, that is, the order that leads to continuous path on a map.
However, that is not the case. For this particular relation, I end up with the following plot:
If we compare that with the way the relation is displayed on openstreetmap.org itself, we see that it goes wrong (focus on the middle, eastern part of the trail). I have many examples of this happening, although there are also a lot of relations that do display correctly.
So I was wondering, what am I missing? Are there nodes with tags that need to be ignored? I already tried several things, including leaving out nodes with any tags, this does not help. Somewhere my processing is wrong but I don't understand where.

You need to sort the ways inside the relation yourself. Only a few relation types require sorted members, for example some route relations such as route=bus and route=tram. Others may have sorted members, such as route=hiking, route=bicycle etc., but they don't require them. Various other relations, such as boundary relations (type=boundary), usually don't have sorted members.
I'm pretty sure there are already various tools for sorting relation members, obviously this includes the openstreetmap.org website where this relation is shown correctly. Unfortunately I'm not able to point you to these tools but I guess a little bit research will reveal others.

If I opt to just plot the different way on top of each other, I indeed get a continuous plot (index contains the indexes for all nodes per way):
In the Database I would have preferred to have the nodes sorted anyway because I could use them to make a GPX file on the fly. But I guess I did answer my own question with this approach, thank you #scai for tipping me into this direction.

You could have a look at shapely.ops.linemerge, which seems to be smart enough to chain multiple linestrings even if the directions are inconsistent. For example (adapted from here):
from shapely import geometry, ops
line_a = geometry.LineString([[0,0], [1,1]])
line_b = geometry.LineString([[1,0], [2,5], [1,1]]) # <- switch direction
line_c = geometry.LineString([[1,0], [2,0]])
multi_line = geometry.MultiLineString([line_a, line_b, line_c])
merged_line = ops.linemerge(multi_line)
print(merged_line)
# output:
LINESTRING (0 0, 1 1, 2 5, 1 0, 2 0)
Then you just need to make sure that the endpoints match exactly.

Related

How to select specific dictionaries (with no key names) from a list?

I am currently working with road networks and followed this post that explained exactly what I was looking for.
I, however, am facing a problem. I tried running the script on a small part of my domain and came across the error shown bellow.
NetworkXError: Edge tuple (35,) must be a 2-tuple or 3-tuple.
After checking every variable I was working with, I found out that the problems is from the l = [set(x) for x in G.edges()]
I basically have a list of dictionaries that contain two set of coordinates (which represent the start and the end points of each roads of my domain) but some of the roads only have one set of coordinates in the dictionary.
Here's a simplified version of what I get when I print(l):
[{(2.455183, 48.7774425), (2.4551873, 48.7776523)},
{(2.4574735, 48.7736999), (2.4577528, 48.7738954)},
{(2.4574735, 48.7736999)},
{(2.4577528, 48.7738954), (2.4578287, 48.7723847)},
{(2.4585674, 48.7823935), (2.4586793, 48.7825114)}]
What I would like to know is if there is a way to select the dictionaries that only have one set of coordinates and either duplicate it inside the dictionary (so instead of having {(2.4574735, 48.7736999)}, I would have {(2.4574735, 48.7736999), (2.4574735, 48.7736999)}) or delete them if the previous proposition is not doable.
I found several ways of selecting dictionaries in a list based on their key names but I don't have any keys in mine so nothing I tried worked so far.
Any help would be greatly appreciated.

These are not dictionaries. These are sets. And your first requirement is not possible, as sets can not contain duplicate entries. So you can just check the lengths of the sets, and skip the ones that are less than length 2. So something like this:
l = [set(x) for x in G.edges() if len(set(x))>1]

Reverse selection in ArcPy?

I am trying to find out the counties that don't contain any stores in ArcGIS using python.
I have a point layer (representing the stores) and a polygon layer (counties). I have managed to write some code to find out the counties that DO contain the stores. The code is below.
import arcpy
arcpy.env.overwriteOutput = True
path="C:/Users/XARDAS/Documents/ArcGIS/Packages/Romania1000k_9E5B7FEC-6005-4D3A-81EA-E95FAACEF69E/v101/ro1mil.gdb"
arcpy.MakeFeatureLayer_management(path+"/Counties", "Counties_lyr")
arcpy.MakeFeatureLayer_management(path+"/Stores", "Stores_lyr")
arcpy.SelectLayerByAttribute_management("Stores_lyr", "NEW_SELECTION","Type=1")
arcpy.SelectLayerByLocation_management("Counties_lyr","INTERSECT","Stores_lyr",0,"NEW_SELECTION")
So this gives me the counties that have stores but I would like to somehow inverse the intersection for the program to give me the ones that don't have any stores. I have thought about just deleting the selected counties but I don't think that it would be too nice.

Since you have selected everything that you don't want selected, inverting (or switching) the selection will give you what you want. (ref help page)
Add this line at the end:
arcpy.SelectLayerByAttribute_management("Counties_lyr", "SWITCH_SELECTION")

The answer above works but SelectLayerByLocation allows you to invert the selection within the function:
arcpy.SelectLayerByLocation_management("Counties_lyr","INTERSECT","Stores_lyr",0,"NEW_SELECTION","INVERT")

How to use distancematrix function from Biopython?

I would like to calculate the distance matrix (using genetic distance function) on a data set using http://biopython.org/DIST/docs/api/Bio.Cluster.Record-class.html#distancematrix, but I seem to keep getting errors, typically telling me the rank is not of 2. I'm not actually sure what it wants as an input since the documentation never says and there are no examples online.
Say I read in some aligned gene sequences:
SingleLetterAlphabet() alignment with 7 rows and 52 columns
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRL...SKA COATB_BPIKE/30-81
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIKL...SRA Q9T0Q8_BPIKE/1-52
DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRL...SKA COATB_BPI22/32-83
AEGDDP---AKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPM13/24-72
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPZJ2/1-49
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA Q9T0Q9_BPFD/1-49
FAADDATSQAKAAFDSLTAQATEMSGYAWALVVLVVGATVGIKL...SRA COATB_BPIF1/22-73
which would be done by
data = Align.read("dataset.fasta","fasta")
But the distance matrix in Cluster.Record class does not accept this. How can I get it to work! ie
dist_mtx = distancematrix(data)

The short answer: You don't.
From the documentation:
A Record stores the gene expression data and related information
The Cluster object is used for gene expression data and not for MSA.
I would recommend using an external tool like MSARC which runs in Python as well.

Implementing Disjoint Set Data Structure in Python

I'm working on a small project involving cluster, and I think the code given here https://www.ics.uci.edu/~eppstein/PADS/UnionFind.py might be a good starting point for my work. However, I have come across a few difficulties implementing it to my work:
If I make a set containing all my clusters cluster=set([0,1,2,3,4,...,99]) (there are 100 points with the numbers labelling them), then I would like to to group the numbers into cluster, do I simply write cluster=UnionFind()? Now what is the data type of cluster?
How can I perform the usual operations for set on cluster? For instance, I would like to read all the points (which may have been grouped together) in cluster, but type print cluster results in <main.UnionFind instance at 0x00000000082F6408>. I would also like to keep adding new elements to cluster, how do I do it? Do I need to write the specific methods for UnionFind()?
How do I know all the members of a group with one of its member is called? For instance, 0,1,3,4 are grouped together, then if I call 3, I want it to print 0,1,3,4, how do I do this?
thanks

Here's a small sample code on how to use the provided UnionFind class.
Initialization
The only way to create a set using the provided class is to FIND it, because it creates a set for a point only when it doesn't find it. You might want to create an initialization method instead.
union_find = UnionFind()
clusters = set([0,1,2,3,4])
for i in clusters:
union_find[i]
Union
# Merge clusters 0 and 1
union_find.union(0, 1)
# Add point 2 to the same set
union_find.union(0, 2)
Find
# Get the set for clusters 0 and 1
print union_find[0]
print union_find[1]
Getting all Clusters
# print all clusters and their sets
for cluster in union_find:
print cluster, union_find[cluster]
Note:
There is no direct way that gets you all the points given a cluster number. You can loop over all the points and pick the ones that have the required cluster number. You might want to modify the given class to support that operation more efficiently.

How to Sort Arrays in Dictionary?

I'm currently writing a program in Python to track statistics on video games. An example of the dictionary I'm using to track the scores :
ten = 1
sec = 9
fir = 10
thi5 = 6
sec5 = 8
games = {
'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"],
'nethack': [fir+fir+fir+sec+thi5, "Nethack"]
}
Right now, I'm going about this the hard way, and making a big long list of nested ifs, but I don't think that's the proper way to go about it. I was trying to figure out a way to sort the dictionary, via the arrays, and then, finding a way to display the first ten that pop up... instead of having to work deep in the if statements.
So... basically, my question is : Do you have any ideas that I could use to about making this easier, instead of wayyyy, way harder?
===== EDIT ====
the ten+fir produces numbers. I want to find a way to go about sorting the lists (I lack the knowledge of proper terminology) to go by the number (basically, whichever ones have the highest number in the first part of the array go first.
Here's an example of my current way of going about it (though, it's incomplete, due to it being very tiresome : Example Nests (paste2) (let's try this one?)
==== SECOND EDIT ====
In case someone doesn't see my comment below :
ten, fir, et cetera - these are just variables for scores. Basically, it goes from a top ten list into a variable number.
ten = 1, nin = 2, fir = 10, fir5 = 10, sec5 = 8, sec = 9...
so : 'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"] actually registers as : 'adom': [1+10+9+8, "Ancient Domain of Mysteries"] , which ends up looking like :
'adom': [28, "Ancient Domain of Mysteries"]
So, basically, if I ended up doing the "top two" out of my example, it'd be :
((1)) Nethack (48)
((2)) ADOM (28)
I'd write an actual number, but I'm thinking of changing a few things up, so the numbers might be a touch different, and I wouldn't want to rewrite it.
== THIRD (AND HOPEFULLY THE FINAL) EDIT ==
Fixed my original code example.

How about something like this:
scores = games.items()
scores.sort(key = lambda key, value: value[0])
return scores[:10]
This will return the first 10 items, sorted by the first item in the array.
I'm not sure if this is what you want though, please update the question (and fix the example link) if you need something else...

import heapq
return heapq.nlargest(10, games.iteritems(), key=lambda k, v: v[0])
is the most direct way to get the top ten key / value pairs, sorted by the first item of each "value" list. If you can define more precisely what output you want (just the names, the name / value pairs, or what else?) and the sorting criterion, this is easy to adjust, of course.

Wim's solution is good, but I'd say that you should probably go the extra mile and push this work off onto a database, rather than relying on Python. Python interfaces well with most types of databases, where much of what you're exploring is already a solved problem.
For example, instead of worrying about shifting your dictionaries to various other data types in order to properly sort them, you can simply get all the data for each pertinent entry pre-sorted based on the criteria of your query. There goes the need for convoluted sorting and resorting right there.
While dictionaries are tempting to use, because they give the illusion of database-like abilities to access data based on its attributes, I still think they stumble quite a bit with respect to implementation. I don't really have any numbers to throw at you, but just from personal experience, anything you do on Python when it comes to manipulating large amounts of data, you can do much faster and more efficient both in code and computation with something like MySQL.
I'm not sure what you have planned as far as the structure of your data goes, but along with adding data, changing its structure is a lot easier using a database, too.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Plotting OpenStreetMap relations does not generate continous lines - python

Related

How to select specific dictionaries (with no key names) from a list?

Reverse selection in ArcPy?

How to use distancematrix function from Biopython?

Implementing Disjoint Set Data Structure in Python

How to Sort Arrays in Dictionary?

Categories

Resources