Why is my dict being overwritten in this loop in python?

I have a dict, coords_dict, in a strange format, which is currently being used to store a set of Cartesian coordinate points (x, y, z). The structure of the dict (which is unfortunately out of my control) is as follows.
The keys of the dict are the z values of a series of planes, and each value is a single-element list whose only element is a list of lists containing the coordinate points. For example, two entries in the dict could be specified as:
coords_dict['3.5']=[[[1.62,2.22,3.50],[4.54,5.24,3.50]]]
coords_dict['5.0']=[[[0.33,6.74,5.00],[2.54,12.64,5.00]]]
So, I now want to apply a translational shift to all coordinate points in this dict by the shift vector [-1,-1,-1], i.e. I want all x, y, and z coordinates to be 1 less than they were before (rounded to 2 decimal places). I want to assign the result of this translation to a new dictionary, coords_dict_translated, while also updating the dict keys to match the new z locations of all points.
My attempt at a solution is below:
import numpy as np
shift_vector=[-1,-1,-1]
coords_dict_translated={}
for key,plane in coords_dict.items(): #iterate over dictionary, k are keys representing each plane
    key=str(float(key)+shift_vector[2]) #the new key should match the z location
    #print(key)
    for point_index in range(0,len(plane[0])): #loop over points in this plane
        plane[0][point_index]=list(np.around(np.array(plane[0][point_index])
                                  +np.array(shift_vector),decimals=2)) #add shift vector to all points
    coords_dict_translated[key]=plane
However, I notice that when I do this, the original values of coords_dict also change. I want coords_dict to stay the same and to get back a completely new, entirely separate dict. I am not quite sure where the issue lies; I have also tried for key,plane in list(coords_dict.items()): but this did not work. Why does this loop change the values of the original dictionary?

When you iterate over the dictionary in the for loop, plane is a reference to the very same list object that is stored in coords_dict, not a copy of it:
for key,plane in coords_dict.items(): #iterate over dictionary, k are keys representing each plane
So assigning to plane[0][point_index] modifies the original dict. If you don't want to change the original items, make a copy of plane and modify the copy instead:
import copy
for key,plane in coords_dict.items():
    key=str(float(key)+shift_vector[2]) #the new key should match the z location
    #print(key)
    c = copy.deepcopy(plane) #independent copy of every nested list in this plane
    for point_index in range(0,len(plane[0])): #loop over points in this plane
        c[0][point_index]=list(np.around(np.array(plane[0][point_index])
                               +np.array(shift_vector),decimals=2)) #add shift vector to all points
    coords_dict_translated[key] = c
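Another option is to avoid mutation altogether and build brand-new lists, in which case no copy is needed. The sketch below assumes the exact layout from the question (each value is a one-element list wrapping a list of [x, y, z] points); the function name translate_planes is hypothetical, and it uses the built-in round in place of np.around.

```python
def translate_planes(coords_dict, shift_vector, decimals=2):
    """Return a new dict of shifted planes; coords_dict itself is never mutated."""
    dx, dy, dz = shift_vector
    translated = {}
    for key, plane in coords_dict.items():
        new_key = str(float(key) + dz)  # new key matches the shifted z location
        # Build fresh inner lists instead of assigning into plane[0],
        # so nothing is shared with the original dict.
        new_points = [
            [round(x + dx, decimals), round(y + dy, decimals), round(z + dz, decimals)]
            for x, y, z in plane[0]
        ]
        translated[new_key] = [new_points]  # keep the single-element-list wrapper
    return translated
```

For example, translate_planes(coords_dict, [-1, -1, -1]) turns the '3.5' entry into a '2.5' entry whose first point is [0.62, 1.22, 2.5], while coords_dict keeps its original values.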

The most likely issue here is that you have a list that is being referenced from two different variables. This can happen even when you use .copy(), because .copy() only makes a shallow copy and you have a nested structure here.
If this is the problem, you can probably overcome it by making sure you take a (deep) copy of the lists you want to update independently. copy.deepcopy recursively copies lists within lists, etc., so that no lower-level list ends up referenced from two places.
(comment made into answer).
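A small demonstration of the difference between a shallow and a deep copy, using one entry shaped like the question's data (the values written in are arbitrary):

```python
import copy

original = {'3.5': [[[1.62, 2.22, 3.50]]]}

shallow = original.copy()         # new outer dict, but the same inner lists
deep = copy.deepcopy(original)    # new dict and new lists at every level

shallow['3.5'][0][0][0] = 99.0    # writes into a list that original also holds
deep['3.5'][0][0][1] = -7.0       # writes into a private copy

print(original['3.5'][0][0])               # [99.0, 2.22, 3.5]
print(shallow['3.5'] is original['3.5'])   # True
print(deep['3.5'] is original['3.5'])      # False
```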

Related

Data Structure for fast insertion and random access in already sorted data

p = random_point(a,b)
#random_point() returns a tuple/named-tuple (x,y)
#0<x<a 0<y<b
if centers.validate(p):
    centers.insert(p)
#centers is the data structure to store points
In the centers data structure, all x and y coordinates are stored in two separate sorted (ascending) lists, one for x and the other for y. Each node in x points to the corresponding y, and vice versa, so that the two lists can be sorted separately and still keep the pair property: centers.get_x_of(y) and centers.get_y_of(x).
Properties that I require in the data structure:
Fast insertion into already sorted data (preferably O(log n))
Random access
Sorting x and y separately, without losing the pair property
Initially I thought of using simple lists, with binary search to get the index for inserting each new element. But I found that this can be improved using self-balancing trees like AVL trees or B-trees. I could make two trees, one for x and one for y, with each node having an additional pointer from the x-tree node to the corresponding y-tree node.
But I don't know how to build random-access functionality into these trees. The function centers.validate() tries to insert x and y and runs some checks on the neighboring elements, which requires random access:
def validate(p):
    indices = get_index(p)
    #returns a named tuple of indices to insert x and y, Eg: (3,7)
    condition1 = func(x_list[indices.x-1], p.x) and func(x_list[indices.x+1], p.x)
    condition2 = func(y_list[indices.y-1], p.y) and func(y_list[indices.y+1], p.y)
    #func is some mathematical condition on neighboring elements of x and y
    return condition1 and condition2
In the above function I need to access the neighboring elements of x and y in the data structure. I think implementing this with trees would complicate it. Is there any combination of data structures that can achieve this? I am writing this in Python (if that helps).

Use a class with two dicts that map each value to its partner in the other dict (one maps x to its y, the other maps y to its x). For each dict, also maintain a list giving the current sort order of that dict's values, and use it whenever you need the elements in order. Insertion then needs a binary (or other efficient) search per dict; in practice it would use that dict's order list to find each midpoint key and then compare against the value stored under that key.
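A related way to keep the pair property without pointers is to store whole pairs in both sorted lists: (x, y) tuples ordered by x and (y, x) tuples ordered by y. The sketch below does this with the standard bisect module; the class name Centers and its methods are hypothetical, modelled on the names in the question. The bisect search is O(log n), but inserting into a Python list still shifts elements, so each insert is O(n) overall; if that becomes the bottleneck, the third-party sortedcontainers package provides a SortedList that supports both fast insertion and indexing.

```python
from bisect import bisect_left, insort


class Centers:
    """Hypothetical sketch: two sorted lists that keep (x, y) pairs together."""

    def __init__(self):
        self.by_x = []  # sorted list of (x, y)
        self.by_y = []  # sorted list of (y, x)

    def insert(self, p):
        x, y = p
        insort(self.by_x, (x, y))
        insort(self.by_y, (y, x))

    def get_y_of(self, x):
        # (x,) sorts before every (x, y), so this finds the first pair with this x
        i = bisect_left(self.by_x, (x,))
        if i < len(self.by_x) and self.by_x[i][0] == x:
            return self.by_x[i][1]
        raise KeyError(x)

    def neighbours(self, p):
        """Entries just below/above p in x order, then in y order (None at an edge).

        by_x entries are (x, y); by_y entries are (y, x).
        """
        x, y = p
        i = bisect_left(self.by_x, (x, y))
        j = bisect_left(self.by_y, (y, x))
        x_left = self.by_x[i - 1] if i > 0 else None
        x_right = self.by_x[i] if i < len(self.by_x) else None
        y_left = self.by_y[j - 1] if j > 0 else None
        y_right = self.by_y[j] if j < len(self.by_y) else None
        return x_left, x_right, y_left, y_right
```

A validate step could call neighbours(p), apply its condition to the returned entries, and only then call insert(p).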

Change next list element during iteration?

Imagine you have a list of points in 2D space. I am trying to find symmetric points.
To do that, I iterate over my list of points and apply symmetry operations. Suppose I apply one of these operations to the first point and the result is equal to another point in the list. These two points are symmetric.
What I want is to remove this other point from the list I am iterating over, so that my iteration variable, say "i", never takes this value, because I already know it is symmetric with the first point.
I have seen similar posts, but they remove a value in the list that they have already visited. What I want is to remove subsequent values.

Add every point that turns out to be symmetric to a set. Since a set only keeps unique elements and membership lookup is O(1) on average, you can skip already-known points with an if point not in s condition:
if point not in s:
    #test for symmetry
    if symmetric:
        s.add(point)
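A runnable version of the set idea might look like the sketch below. mirror_x is a hypothetical symmetry operation (reflection across the y-axis) standing in for yours, and points are assumed to be hashable tuples. Rather than deleting from the list while iterating, it records each kept point's symmetric partner in a set and skips any later point that is already in that set.

```python
def mirror_x(point):
    """Hypothetical symmetry operation: reflect across the y-axis."""
    x, y = point
    return (-x, y)


def unique_under_symmetry(points, op=mirror_x):
    partners = set()  # images of points kept so far
    kept = []
    for p in points:
        if p in partners:
            continue  # already known to be symmetric to an earlier point
        kept.append(p)
        partners.add(op(p))
    return kept
```

For example, unique_under_symmetry([(1, 2), (3, 4), (-1, 2), (5, 0), (-3, 4)]) keeps (1, 2), (3, 4) and (5, 0).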

In general it is a bad idea to remove values from a list you are iterating over. There are, however, other ways to skip the symmetric points. For example, you can check for each point whether you have seen a symmetric one before:
for i, point in enumerate(points):
    if symmetric(point) not in points[:i]:
        # Do whatever you want to do
        pass
Here symmetric produces a point according to your symmetry operation. If your symmetry operation connects more than two points, you can do:
for i, point in enumerate(points):
    for sympoint in symmetric(point):
        if sympoint in points[:i]:
            break
    else:
        # Do whatever you want to do
        pass
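Each points[:i] check builds a new slice and scans it, so the loop is quadratic. A sketch of the same logic with a set of already-seen points, which gives O(1) average membership tests, is below; the three symmetry operations in symmetric are hypothetical placeholders for yours.

```python
def symmetric(point):
    """Hypothetical symmetry operations: mirror in x, mirror in y, rotate by 180 degrees."""
    x, y = point
    return [(-x, y), (x, -y), (-x, -y)]


def representatives(points, images=symmetric):
    """Keep a point unless one of its symmetric images appeared earlier in the list."""
    seen = set()  # plays the role of points[:i]
    kept = []
    for point in points:
        if not any(s in seen for s in images(point)):
            kept.append(point)
        seen.add(point)
    return kept
```

As in the for/else version, a point is compared against every earlier point in the list, not only against the points that were kept.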

Mapping arrays with same values but different orders

I have two arrays of coordinates from two separate files from a CFD calculation. One is a mesh file which contains the connectivity information and the other is the results file.
My problem is that the coordinates from each file are not in the same order. What I would like to be able to do is order ALL the arrays from the results file to be in the same order as the mesh file.
My idea would be to find the matching values of xyz coordinates and create a mapping such that the rest of the result arrays can be ordered.
I was thinking something like:
mapping = np.empty(len(co_mesh))
for i,coord in enumerate(co_mesh):
    for j in range(len(co_res)):
        if (coord[0]==co_res[j,0]) and (coord[1]==co_res[j,1]) and (coord[2]==co_res[j,2]):
            mapping[i] = j
where co_mesh, co_res are arrays containing the x,y,z coords.
The problem is that I suspect this loop will take a long time. At the moment I'm only looping over around 70000 points but in future this could increase to 1 million or more.
Is there a faster way to write this in Python?
I'm using Python 2.6.5.
Ben
For those who are interested, this is what I am currently using:
mesh_coords = zip(xm_list,ym_list,zm_list,range(len(x_po)))
res_coords = zip(xr_list,yr_list,zr_list,range(len(x)))
mesh_coords = sorted(mesh_coords , key = lambda x:(x[0],x[1],x[2]))
res_coords = sorted(res_coords , key = lambda x:(x[0],x[1],x[2]))
mapping = zip(np.array(listym)[:,-1],np.array(listyr)[:,-1])
mapping = sorted(mapping , key = lambda x:(x[0]))

How about sorting the coordinate vectors in both files by x, then by y, and then by z?
You can do this efficiently if you use numpy arrays for the vectors.
Update:
If you don't have the node IDs of the nodes in the results mesh, but the coordinates are the same, do the following:
Add a numbering to your vectors as additional information. Sort both meshes by x, y, z. Then attach the (now unsorted) numbering of your mesh to your comesh and sort the comesh by that numbering. The comesh is now in exactly the same order as the original mesh.
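The sort-based idea can be written with numpy.lexsort, which avoids the nested Python loop. This is a sketch under two assumptions: both arrays hold exactly the same set of (x, y, z) rows, and equal coordinates compare exactly equal (the same assumption the == checks in the question make). The function name build_mapping is hypothetical; it returns the same kind of mapping as the question's loop, i.e. co_mesh[i] equals co_res[mapping[i]].

```python
import numpy as np


def build_mapping(co_mesh, co_res):
    """Return mapping such that co_res[mapping] has the same row order as co_mesh."""
    # np.lexsort uses the LAST key as the primary key, so pass (z, y, x)
    # to sort by x, then y, then z.
    mesh_order = np.lexsort((co_mesh[:, 2], co_mesh[:, 1], co_mesh[:, 0]))
    res_order = np.lexsort((co_res[:, 2], co_res[:, 1], co_res[:, 0]))
    # The k-th smallest mesh row equals the k-th smallest result row.
    mapping = np.empty(len(co_mesh), dtype=np.intp)
    mapping[mesh_order] = res_order
    return mapping
```

Any other results array with the same row order as co_res can then be reordered with fancy indexing, e.g. pressure_res[mapping] for a hypothetical pressure_res array. The cost is dominated by the two sorts, O(n log n), rather than the O(n^2) double loop.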

Using a dictionary to index parallel arrays?

I have 4 parallel arrays based on a table representing attributes of a map. Each array has approx. 500 values, but all have the same number of values.
The arrays are:
start = location of the endpoint with the smaller flow accumulation,
end = location of the other endpoint (with the larger flow accumulation),
length = segment length, and
shape = actual shape, oriented to run from start to end.
I am attempting to create a data structure on which I can use a recursive function to determine the start and end points every 2000 m along the length.
The following question and answer describe what I am attempting to accomplish:
https://gis.stackexchange.com/questions/87649/select-points-approx-2000-metres-from-another-point-along-a-river
How do I store these 4 parallel arrays in a dictionary keyed by start?
I am new to writing functions, dictionaries and using arrays in dictionaries. I am attempting to do this task in Python.

I think this is what you mean:
d = {}
for i in range(len(start)):
    d[start[i]] = (shape[i],length[i],end[i])
So now d[some_start_value] will hold the corresponding shape, length, and end values.

If you want to do things a little more Pythonically, you can use enumerate:
d = {}
for (i,st) in enumerate(start):
    d[st] = (shape[i],length[i],end[i])
or, even better, zip:
d = {}
for (st,sh,le,en) in zip(start,shape,length,end):
    d[st] = (sh,le,en)
Note that you can leave out the parentheses around the first part of the for loops (i.e. between the for and in keywords). I used them solely for readability.
As with WeaselFox's answer, d[some_start_value] will now hold the corresponding shape, length and end values.
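The zip loop can also be written as a single dict comprehension (available since Python 2.7). The function name index_by_start is only for illustration. Note that if two segments share the same start value, the later one silently overwrites the earlier one, which is worth checking for with real data.

```python
def index_by_start(start, shape, length, end):
    """Map each start value to its (shape, length, end) tuple."""
    return {st: (sh, le, en) for st, sh, le, en in zip(start, shape, length, end)}
```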

In addition to the above answers, I would recommend using namedtuple to simplify accesses:
from collections import namedtuple
# This creates a namedtuple called GISData. Name of the object and name in the first argument
# should be the same.
GISData = namedtuple('GISData', 'start shape length end')
# zip creates 1 list of 4-tuples from 4 single lists
# There are other ways to write this; this is just the shortest for me.
# Note that if you need this ordered, you should use an OrderedDict,
# which is in the collections module in python 2.7+, or you can find
# backported versions for python 2.6+. In those, the keys preserve ordering,
# so can still be searched as a list, which is useful if you need to find e.g.
# 479, which is not in the dictionary, but 400 and 500 are and you have to interpolate etc.
GISDict = dict((x[0], GISData(*x)) for x in zip(start, shape, length, end))
# The dictionary for any given start value
# Access the 4 individual pieces by name, or by index
GISDict[start_lookup].shape
etc.
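Once the segments are keyed by start, the downstream walk the question is after can follow end to the next start. The sketch below is iterative rather than recursive and rests on stated assumptions: each segment's end equals the start key of the next segment downstream, and flow accumulation strictly increases downstream, so the walk cannot loop. The helper names downstream_path and nearest_every are hypothetical. It only picks the existing segment endpoints closest to each multiple of 2000 m; exact positions would need interpolation along the shape geometry.

```python
from collections import namedtuple

# Same fields as the GISData namedtuple in the answer above
GISData = namedtuple('GISData', 'start shape length end')


def build_index(start, shape, length, end):
    return {st: GISData(st, sh, le, en) for st, sh, le, en in zip(start, shape, length, end)}


def downstream_path(index, first_start):
    """Return [(node, cumulative_distance), ...] from first_start to the last endpoint."""
    path = []
    distance = 0.0
    node = first_start
    while node in index:  # stops at an endpoint that starts no segment (e.g. the outlet)
        path.append((node, distance))
        segment = index[node]
        distance += segment.length
        node = segment.end
    path.append((node, distance))
    return path


def nearest_every(path, spacing=2000.0):
    """For each multiple of spacing along the path, pick the closest node."""
    total = path[-1][1]
    picks = []
    k = 1
    while k * spacing <= total:
        target = k * spacing
        picks.append(min(path, key=lambda item: abs(item[1] - target)))
        k += 1
    return picks
```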

Most Efficient Way to Automate Grouping of List Entries

Background: I have a very large list of 3D Cartesian coordinates, and I need to process this list to group the coordinates by their Z coordinate (i.e. all coordinates in the same plane). Currently I create the groups manually from the list using a loop for each Z coordinate, but now that there can be dozens of possible Z coordinates (previously I was handling only 2-3 planes), this becomes impractical. I know how to group lists based on like elements, of course, but I am looking for a way to automate this for n possible values of Z.
Question: What's the most efficient way to automate grouping list elements with the same Z coordinate and then create a separate list for each plane?
Code Snippet:
I'm just using a simple list comprehension to group individual planes:
newlist=[x for x in coordinates_xyz if insert_possible_Z in x]
I'm looking for it to automatically make a new unique list for every Z plane in the data set.
Data Format:
((x1,y1,0), (x2,y2,0), ... (xn,yn,0), (xn+1,yn+1,50), (xn+2,yn+2,50), ... (x2n+1,y2n+1,100), (x2n+2,y2n+2,100), ...) etc. I want to automatically get all coordinates where Z=0, Z=50, Z=100, etc. Note that the Z values (increments of 50) are only an example; the actual data can have any value.
Notes: My data is imported either from a file or generated in lists by a separate module. This is necessary for the interface with another program (that I have not written).

The most efficient way to group elements by Z and make a list of them so grouped is to not make a list.
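itertools.groupby does the grouping you want without the overhead of creating new lists. Note that it only groups consecutive items with equal keys, so the input must already be sorted by Z.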
Python generators take a little getting used to when you aren't familiar with the general mechanism. The official generator documentation is a good starting point for learning why they are useful.
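A minimal groupby sketch, assuming each coordinate is a tuple with Z at index 2 (the function name iter_planes is hypothetical). Sorting materializes one sorted list, but each plane is then produced lazily as an iterator. Each group iterator must be consumed before moving on to the next plane, because advancing groupby invalidates the previous group.

```python
from itertools import groupby
from operator import itemgetter


def iter_planes(coords):
    """Yield (z, iterator over the points in that plane), in ascending z order."""
    by_z = itemgetter(2)
    # groupby only merges consecutive items with equal keys, so sort by z first
    return groupby(sorted(coords, key=by_z), key=by_z)
```

Typical use is for z, points in iter_planes(coordinates_xyz): followed by whatever per-plane processing you need.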

If I am interpreting this correctly, you have a set of coordinates C = (X,Y,Z) with a discrete number of Z values. If this is the case, why not use a dictionary to associate a list of the coordinates with the associated Z value as a key?
Your data structure would look something like:
z_ordered = {}
z_ordered[3] = [(x1,y1,z1),(x2,y2,z2),(x3,y3,z3)]
Where each list associated with a key has the same Z-value.
Of course, if your Z-values are continuous, you may need to modify this, say by making the key only the whole number associated with a Z-value, so you are binning in increments of 1.
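The dictionary can be built in a single pass with collections.defaultdict, without knowing the Z values in advance. In this sketch, group_by_z and floor_z are hypothetical names; the optional key function is where binning fits in, and floor_z shows bins of width 1 as described above.

```python
import math
from collections import defaultdict


def group_by_z(coords, key=lambda p: p[2]):
    """Map each Z key to the list of points in that plane (one pass, no sorting)."""
    groups = defaultdict(list)
    for p in coords:
        groups[key(p)].append(p)
    return dict(groups)


def floor_z(p):
    """Example binning key: bins of width 1, so z = 3.2 and z = 3.7 both map to 3."""
    return math.floor(p[2])
```

Exact float Z values work as keys only if points in the same plane carry bit-identical Z values; otherwise a binning key such as floor_z (or rounding) is the safer choice.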

So this is the simple solution I came up with:
groups=[]
groups[:]=[]
No_Planes=#Number of planes
dz=#Z spacing variable here
for i in range(No_Planes):
    newlist=[x for x in coordinates_xyz if i*dz in x]
    groups.append(newlist)
This lets me manipulate any plane within my data set simply with groups[i]. I can also change the spacing. This is also an extension of my existing code: after reading msw's response about itertools, I realised that looping through my current method was staring me in the face, and far simpler than I imagined!
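One caveat with i*dz in x: it is true when any coordinate equals the plane value, so a point such as (50.0, 1.0, 0.0) would also land in the Z=50 plane. A sketch that compares only the Z component, with a small tolerance for floating-point error, is below; the function name planes_by_spacing and the tolerance value are assumptions.

```python
def planes_by_spacing(coords, n_planes, dz, tol=1e-9):
    """Return one list per plane z = i*dz, matching on the Z component only."""
    return [
        [p for p in coords if abs(p[2] - i * dz) <= tol]
        for i in range(n_planes)
    ]
```

Like the original loop, this scans the full list once per plane; for many planes, a single-pass grouping is cheaper.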
