mpltPath.Path(polygon).contains_points(points)

I have a list of a polygon's corner points (polygon) and a list of x,y points (points).
When I do mpltPath.Path(polygon).contains_points(points) I get a list of True/False indicating, for each index, whether the corresponding point in points lies inside the polygon.
My problem is that I don't know how to extract those specific points directly in a fast way.
I tried a for loop that collects all the indexes of the True entries, but it takes too long (it's a list of 100M points), and I wondered if there is a faster way to get them directly from the mpltPath package.
Here is what I tried to get the indexes, but it took too long:
list(locate(mpltPath.Path(polygon).contains_points(points), lambda x: x == 'True'))
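The boolean array returned by contains_points can be used directly as a mask, so there is no need to loop over it in Python at all. A minimal sketch of that idea, assuming points is (or is converted to) a NumPy array; the polygon and point values below are made up:
import numpy as np
import matplotlib.path as mpltPath

polygon = [(0, 0), (1, 0), (1, 1), (0, 1)]        # hypothetical square
points = np.random.rand(1000000, 2) * 2           # hypothetical x,y points

mask = mpltPath.Path(polygon).contains_points(points)   # boolean numpy array

inside_points = points[mask]             # the points that are inside
inside_indices = np.nonzero(mask)[0]     # their indices, if needed
Boolean indexing and np.nonzero run in compiled code, so no Python-level loop over the 100M entries is needed.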

Related

Python: map two arrays with similar values

I have two arrays, let's say A and B. I want to find all elements in B that are within a certain value of each element in A.
To be exact, I am working with a database of galaxies and I want to compare simulated galaxies (S) with observed ones (O).
I have 2 arrays, brightness (Br) and distance (D), for both the simulations and the observations (so 4 arrays in total).
For each simulated galaxy, I want to find the set of observed galaxies that have similar brightness and distance.
I have already written code that could work, but my program says it will take around 9-10 hours to run. Is there any way I can change the code to speed up the process?
Here is my code
list1 = flux_hb
list2 = flux_inp_ha
# one independent list per simulated galaxy; a list comprehension avoids the
# aliasing that [[]]*len(list1) would cause (every row would share one list)
flux_MC = [[] for _ in range(len(list1))]   # map flux from MAMBO to COSMOS (mambo --> list of cosmos galaxies)
noise = [[] for _ in range(len(list1))]
z_MC = [[] for _ in range(len(list1))]
index = np.zeros(len(list1))
flux_ha_temp = np.zeros(len(list1))
for i in tqdm(range(len(list1))):
    temp_flux_diff = abs(list1[i] - list2) / list1[i]   # element-wise relative flux difference
    temp_z_diff = abs(z_geo[i] - zinp) / z_geo[i]
    for j in range(len(temp_flux_diff)):
        if temp_flux_diff[j] < 0.01 and temp_z_diff[j] < 0.01:
            flux_MC[i].append(list2[j])
            noise[i].append(snr_ha[j])
            z_MC[i].append(zinp[j])
My approach is to make an empty list to save the information I need.
For each element in list1, I compute the relative difference with every element of list2 (temp_flux_diff).
Then, for each element in list2, if that difference is smaller than the required tolerance, I call the values similar and save them in the empty list.
Hopefully I was able to explain what I need and how I did it. I just want to speed up the code.
Update 1: As per the suggestion in the comments by @СергейКох, I changed my second loop to:
    index = np.where(np.logical_and(temp_flux_diff < 0.01, temp_z_diff < 0.01))[0]
    flux_MC[i].append(flux_inp_ha[index])
    noise[i].append(snr_ha[index])
    z_MC[i].append(zinp[index])
and my code now takes around 1 hour instead of 9 hours.
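If the arrays fit in memory, even the remaining outer loop can be removed by broadcasting both criteria at once. This is only a sketch with made-up toy arrays standing in for the flux and redshift columns; for very large catalogues the per-row np.where above (or a spatial index such as a k-d tree) may be the safer choice:
import numpy as np

# hypothetical toy arrays standing in for the real flux/redshift columns
flux_sim = np.array([1.0, 2.0, 3.0])      # simulated fluxes (list1)
flux_obs = np.array([1.005, 2.5, 2.98])   # observed fluxes (list2)
z_sim = np.array([0.10, 0.20, 0.30])
z_obs = np.array([0.1001, 0.25, 0.299])

# broadcast to an (n_sim, n_obs) matrix of relative differences and
# find every (i, j) pair where both criteria hold at once
flux_diff = np.abs(flux_sim[:, None] - flux_obs[None, :]) / flux_sim[:, None]
z_diff = np.abs(z_sim[:, None] - z_obs[None, :]) / z_sim[:, None]
sim_idx, obs_idx = np.nonzero((flux_diff < 0.01) & (z_diff < 0.01))

# matches[i] holds the observed indices that match simulated galaxy i
matches = [obs_idx[sim_idx == i] for i in range(len(flux_sim))]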

Getting Keys Within Range/Finding Nearest Neighbor From Dictionary Keys Stored As Tuples

I have a dictionary which has coordinates as keys. They are by default in 3 dimensions, like dictionary[(x,y,z)]=values, but may be in any dimension, so the code can't be hard-coded for 3.
I need to find whether there are other values within a certain radius of a new coordinate, and I ideally need to do it without importing any external packages such as numpy.
My initial thought was to split the input into a cube and check that no points match, but obviously that is limited to integer coordinates and grows quickly (a radius of 5 would require 729x the processing), and since my initial code already takes at least a minute for relatively small values, I can't really afford this.
I heard finding the nearest neighbour may be the best way, and ideally cutting the keys down to a range of +- a certain amount would be good, but I don't know how you'd do that when there's more than one point being used. Here's how I'd do it with my current knowledge:
import math

dimensions = 3
minimumDistance = 0.9
# example dictionary + input
dictionary = {}
dictionary[(0, 0, 0)] = []
dictionary[(0, 0, 1)] = []
keyToAdd = [0, 1, 1]
closestMatch = float("inf")
tooClose = False
for key in dictionary:
    # the keys are already tuples, so Pythagoras can use them directly
    distanceToPoint = math.sqrt(sum((key[i] - keyToAdd[i]) ** 2
                                    for i in range(dimensions)))
    # if you want the overall closest match
    if distanceToPoint < closestMatch:
        closestMatch = distanceToPoint
    # if you want to just check it's not within that radius
    if distanceToPoint < minimumDistance:
        tooClose = True
        break
However, performing calculations this way may still run very slow (it must do this to millions of values). I've searched the problem, but most people seem to have simpler sets of data to do this to. If anyone can offer any tips I'd be grateful.
You say you need to determine IF there are any keys within a given radius of a particular point. Thus, you only need to scan the keys, computing the distance of each to the point, until you find one within the specified radius. (And if you compare against the square of the radius, you can avoid the square roots needed for the actual distances.)
One optimization would be to pre-filter or sort the keys by their Manhattan distance from the point (the sum of the absolute component offsets), which is cheap to compute. The Euclidean distance never exceeds the Manhattan distance, so any key whose Manhattan distance is already within the radius is guaranteed to be within the Euclidean radius too. This avoids some of the more expensive calculations (and you certainly don't need any trigonometry).
If, as you suggest later in the question, you need to handle multiple points, you can obviously process each individually, or you could find the center of those points and sort based on that.
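A minimal sketch of that scan, using the squared-radius comparison and an early exit; the function name and the toy dictionary are just for illustration:
def too_close(keys, point, radius):
    """Return True if any key lies within `radius` of `point`.
    Works for any dimensionality and compares squared distances,
    so no square roots are needed; exits on the first hit."""
    r_squared = radius * radius
    for key in keys:
        dist_squared = sum((k - p) ** 2 for k, p in zip(key, point))
        if dist_squared < r_squared:
            return True
    return False

# usage with the dictionary from the question
dictionary = {(0, 0, 0): [], (0, 0, 1): []}
print(too_close(dictionary.keys(), (0, 1, 1), 0.9))   # False: the nearest key is 1.0 away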

Mapping arrays with same values but different orders

I have two arrays of coordinates from two separate files from a CFD calculation. One is a mesh file which contains the connectivity information and the other is the results file.
My problem is that the coordinates from each file are not in the same order. What I would like to be able to do is order ALL the arrays from the results file to be in the same order as the mesh file.
My idea would be to find the matching values of xyz coordinates and create a mapping such that the rest of the result arrays can be ordered.
I was thinking something like:
mapping = np.empty(len(co_mesh))
for i, coord in enumerate(co_mesh):
    for j in range(len(co_res)):
        if (coord[0] == co_res[j, 0]) and (coord[1] == co_res[j, 1]) and (coord[2] == co_res[j, 2]):
            mapping[i] = j
where co_mesh, co_res are arrays containing the x,y,z coords.
The problem is that I suspect this loop will take a long time. At the moment I'm only looping over around 70000 points but in future this could increase to 1 million or more.
Is there a faster way to write this in Python?
I'm using Python 2.6.5.
Ben
For those who are interested, this is what I am currently using:
mesh_coords = zip(xm_list,ym_list,zm_list,range(len(x_po)))
res_coords = zip(xr_list,yr_list,zr_list,range(len(x)))
mesh_coords = sorted(mesh_coords , key = lambda x:(x[0],x[1],x[2]))
res_coords = sorted(res_coords , key = lambda x:(x[0],x[1],x[2]))
mapping = zip(np.array(listym)[:,-1],np.array(listyr)[:,-1])
mapping = sorted(mapping , key = lambda x:(x[0]))
How about sorting the coordinate vectors in both files along x, then y, then z?
You can do this efficiently and fast if you use numpy arrays for the vectors.
Update:
If you don't have the node ids of the nodes in the result mesh, but the coordinates are the same, do the following:
Add a running index as an extra piece of information to the vectors of both files. Sort both sets by x, y, z; after sorting, identical coordinates line up row for row. Attach the (now shuffled) index column of the mesh to the sorted result coordinates and sort the result array back along that column: the result data then ends up in exactly the same order as the original mesh.
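Roughly, with numpy that could look like the sketch below; co_mesh and co_res here are small made-up arrays, and the coordinates are assumed to match exactly (no floating-point tolerance):
import numpy as np

# small made-up arrays; co_mesh and co_res contain the same points
# in different orders, shape (n_points, 3)
co_mesh = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
co_res = np.array([[0., 1., 0.], [0., 0., 0.], [1., 0., 0.]])

# sort both by (x, y, z); lexsort treats its last key as the primary one
mesh_order = np.lexsort((co_mesh[:, 2], co_mesh[:, 1], co_mesh[:, 0]))
res_order = np.lexsort((co_res[:, 2], co_res[:, 1], co_res[:, 0]))

# mapping[i] is the row of co_res that matches row i of co_mesh
mapping = np.empty(len(co_mesh), dtype=int)
mapping[mesh_order] = res_order

assert np.array_equal(co_mesh, co_res[mapping])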

Most Efficient Way to Automate Grouping of List Entries

Background: I have a very large list of 3D cartesian coordinates, and I need to process it to group the coordinates by their Z coordinate (i.e. all coordinates lying in the same plane). Currently I manually create the groups with one loop per Z coordinate, but now that there are dozens of possible Z values (previously I was handling only 2-3 planes) this has become impractical. I know how to group lists based on like elements, of course, but I am looking for a way to automate this for n possible values of Z.
Question: What is the most efficient way to automate grouping list elements by Z coordinate and then create a separate list for each plane?
Code Snippet:
I'm just using a simple list comprehension to group individual planes:
newlist=[x for x in coordinates_xyz if insert_possible_Z in x]
I'm looking for it to automatically make a new unique list for every Z plane in the data set.
Data Format:
((x1,y1,0), (x2,y2,0), ... (xn,yn,0), (xn+1,yn+1,50), (xn+2,yn+2,50), ... (x2n+1,y2n+1,100), (x2n+2,y2n+2,100), ...) etc.
I want to automatically get all coordinates where Z=0, Z=50, Z=100 and so on. Note that the Z values (increments of 50) are an example only; the actual data can have any value.
Notes: My data is imported either from a file or generated in lists by a separate module. This is necessary for interfacing with another program (that I have not written).
The most efficient way to group elements by Z and make a list of them so grouped is to not make a list.
itertools.groupby does the grouping you want without the overhead of creating new lists.
Python generators take a little getting used to when you aren't familiar with the general mechanism. The official generator documentation is a good starting point for learning why they are useful.
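A minimal sketch of that approach, with a made-up coordinates_xyz; note that groupby only merges consecutive items with equal keys, so the data is sorted by Z first:
from itertools import groupby
from operator import itemgetter

coordinates_xyz = [(1, 2, 0), (3, 4, 50), (5, 6, 0), (7, 8, 100)]   # hypothetical data

by_z = sorted(coordinates_xyz, key=itemgetter(2))
for z, plane in groupby(by_z, key=itemgetter(2)):
    # `plane` is a lazy iterator over the points in this Z plane;
    # materialise it with list() only if you really need a list
    print(z, list(plane))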
If I am interpreting this correctly, you have a set of coordinates C = (X,Y,Z) with a discrete number of Z values. If this is the case, why not use a dictionary to associate a list of the coordinates with the associated Z value as a key?
Your data structure would look something like:
z_ordered = {}
z_ordered[3] = [(x1,y1,z1),(x2,y2,z2),(x3,y3,z3)]
Where each list associated with a key has the same Z-value.
Of course, if your Z-values are continuous, you may need to modify this, say by making the key only the whole number associated with a Z-value, so you are binning in increments of 1.
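Building that dictionary is a single pass over the data; a small sketch with made-up points, using collections.defaultdict so missing keys start out as empty lists:
from collections import defaultdict

coordinates_xyz = [(1, 2, 0), (3, 4, 50), (5, 6, 0), (7, 8, 100)]   # hypothetical data

z_ordered = defaultdict(list)
for point in coordinates_xyz:
    z_ordered[point[2]].append(point)   # key on the Z value

# z_ordered[0] now holds every point in the Z = 0 plane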
So this is the simple solution I came up with:
groups = []
groups[:] = []
No_Planes = ...  # number of planes
dz = ...         # Z spacing variable here
for i in range(No_Planes):
    newlist = [x for x in coordinates_xyz if i * dz in x]
    groups.append(newlist)
This lets me manipulate any plane within my data set simply with groups[i]. I can also adjust the spacing. It is also an extension of my existing code; as I realised after reading @msw's response about itertools, looping through my current method was staring me in the face, and far simpler than I imagined!

GPS co-ordinate search -- R-trees

I have a list of lists of the form
[[x1,...,x8], [x1,...,x8], ..., [x1,...,x8]]. The number of lists in that list can go up to a million. Each inner list holds 4 GPS co-ordinates giving the four corner points of a segment (each segment is assumed to be a rectangle).
Problem: Given a new point, I need to determine which segment the point falls in, and create a new one if it falls in none of them. I am not uploading the data into MySQL for now; it comes in as a simple text file, and I read the co-ordinates from the text file for any given car.
What I tried: I am thinking of using R-trees to find all points near the given point (near == 200 metres maximum). But even among R-trees there seem to be too many options: R, R*, Hilbert.
Q1. Which one should be opted for?
Q2. Is there a better option than R-trees? Can something be done by searching faster within the list?
Thanks a lot.
[ {a1:[........]},{a2:[.......]},{a3:[.........]},.... ,{a20:[.....]}] .
Isn't the problem "find whether a given point falls within a certain rectangle in 2D space"?
That could be separated dimensionally, couldn't it? Give each rectangle an ID, then separate it into lists of one-dimensional ranges, (id, x0, x1) and (id, y0, y1), and find all the ranges in each dimension that the point falls in. (I'm fairly sure there are very efficient algorithms for this. Heck, you could even leverage, say, sqlite for it.) Then just intersect the ID sets you get and you should find all rectangles the point falls in, if any. (Of course you can exit early if either of the single-dimensional queries returns no result.)
Not sure if this'd be faster or smarter than R-trees or other spatial indexes though. Hope this helps anyway.
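A toy sketch of that idea, assuming axis-aligned rectangles stored as (id, x0, x1, y0, y1); a real implementation would keep the ranges sorted, in an interval tree, or in sqlite rather than scanning them linearly:
rectangles = [(1, 0.0, 2.0, 0.0, 1.0), (2, 1.0, 3.0, 0.5, 2.0)]   # hypothetical data

x_ranges = [(rid, x0, x1) for rid, x0, x1, _, _ in rectangles]
y_ranges = [(rid, y0, y1) for rid, _, _, y0, y1 in rectangles]

def ids_containing(ranges, value):
    """IDs of all ranges [lo, hi] that contain `value`."""
    return {rid for rid, lo, hi in ranges if lo <= value <= hi}

px, py = 1.5, 0.75
hits = ids_containing(x_ranges, px) & ids_containing(y_ranges, py)
print(hits)   # {1, 2}: both rectangles contain the point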
import random as ra

# my_data will hold tuples of gps readings
# under the key of (row, col); knowing that the size of
# the row and col is 10, it will give an overall grid coverage.
# Another dict could translate row/col coordinates into some
# more useful region names
my_data = {}

def get_region(x, y, region_size=10):
    """Build a tuple of row/col based on
    the values provided and the region square dimension.
    It's for demonstration only and it uses a rather naive calculation:
    coordinate / grid cell size"""
    row = int(x / region_size)
    col = int(y / region_size)
    return (row, col)

# make some examples and build my_data
for loop in range(10000):
    # simulate some readings
    x = ra.choice(range(100))
    y = ra.choice(range(100))
    my_coord = get_region(x, y)
    if my_data.get(my_coord):
        my_data[my_coord].append((x, y))
    else:
        my_data[my_coord] = [(x, y)]

print(my_data)
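To answer the lookup half of the question with this grid, one possible sketch (built on the my_data and get_region above) checks the cell a new point falls in plus its eight neighbours, so readings near a cell border are not missed:
def nearby_points(x, y, data, region_size=10):
    """Return all stored readings in the cell containing (x, y)
    and in the eight surrounding cells."""
    row, col = get_region(x, y, region_size)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            found.extend(data.get((row + dr, col + dc), []))
    return found

print(nearby_points(42, 17, my_data))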
