Python array is getting changed - python

My function takes the points of a polyline and removes the redundant intermediate points along any straight line segment.
The points fed in are as follows:
pts=[['639.625', '-180.719'], ['629.625', '-180.719'], ['619.625', '-180.719'], ['617.312', '-180.719'], ['610.867', '-182.001'], ['605.402', '-185.652'], ['601.751', '-191.117'], ['600.469', '-197.562'], ['600.469', '-207.562'], ['600.469', '-208.273']]
pta=[None]*2
ptb=[None]*2
ptc=[None]*2
simplepts=[]
for pt in pts:
    if pta[0]==None:
        simplepts.append(pt)
        pta[:]=pt
        continue
    if ptb[0]==None:
        ptb[:]=pt
        continue
    if ptb==pta:
        ptb[:]=pt
        continue
    ptc[:]=pt
    print simplepts#<--[['639.625', '-180.719'], ['605.402', '-185.652']]
    # we check if a, b and c are on a straight line
    # if they are, then b becomes c and the next point is allocated to c.
    # if they are not, then a becomes b and the next point is allocated to c
    if testforStraightline(pta,ptb,ptc):
        ptb[:]=ptc # if it is straight
    else:
        simplepts.append(ptb)
        print simplepts#<--[['639.625', '-180.719'], ['617.312', '-180.719']]
        pta[:]=ptb # if it's not straight
If the section is not straight, then the ptb is appended to the simplepts array, which is now (correctly) [['639.625', '-180.719'], ['617.312', '-180.719']]
However, on the next pass the simplepts array has changed to [['639.625', '-180.719'], ['605.402', '-185.652']] which is baffling.
I presume that the points in my array are being held by reference only and changing other values updates the values in the array.
How do I make sure that my array values retain the values as they are assigned?
Thank you.

You are appending the list ptb itself to simplepts and then modifying it in place, so the entry already stored in simplepts changes too. You may be able to improve the design, but a quick fix within the current design is to append a copy:
import copy
simplepts.append(copy.deepcopy(ptb))
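To make the aliasing visible, here is a minimal sketch using two of the points from the question. The names simplepts_fixed and ptb_copy_source are made up for the illustration. Because each point is a flat list of strings, a shallow copy via list(ptb) is assumed to be enough here; copy.deepcopy does the same job and also covers nested lists.

```python
# Appending the list object itself: simplepts keeps a reference to ptb.
simplepts = []
ptb = ['617.312', '-180.719']
simplepts.append(ptb)
ptb[:] = ['605.402', '-185.652']   # in-place update, also seen through simplepts

# Appending a copy: later in-place updates no longer affect the stored point.
simplepts_fixed = []
ptb_copy_source = ['617.312', '-180.719']
simplepts_fixed.append(list(ptb_copy_source))   # shallow copy of the point
ptb_copy_source[:] = ['605.402', '-185.652']

print(simplepts)        # the stored point has changed
print(simplepts_fixed)  # the stored point is unchanged
```

The slice assignment ptb[:] = ... rewrites the contents of the existing list instead of creating a new one, which is why every reference to that list sees the new values.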

Related

Optimizing python function for Numba

The following is the Python function that I am trying to rewrite for Numba:
def freeoh_count_nojit(coord=np.array([[]]),\
                       molInterfaceIndex=np.array([]),\
                       hNeighbourList=np.array([]),\
                       topol=np.array([[]]),\
                       cos_HAngle=0.0,\
                       cellsize=np.array([]),\
                       is_orig_def=False,is_new_def=True):
    labelArray=[]
    freeOHcosDA=[]; freeOHcosDAA=[]
    mol1Coord=np.zeros((3,3),dtype=float)
    labelArray=np.empty(molInterfaceIndex.shape[0], dtype="U10")
    for i in range(molInterfaceIndex.shape[0]): # loop over selected molecules
        mol2CoordList=[]; timesave=[]
        mol1Coord=np.array([coord[k] for k in topol[molInterfaceIndex[i]]]) # extract center molecule
        gen = np.array([index for index in hNeighbourList[i] if index!=-1]) # remove padding
        for j in range(gen.shape[0]):
            mol2CoordList.append([coord[k] for k in topol[gen[j]]]) # extract neighbors
        mol2Coord=np.array(mol2CoordList).reshape(-1,3)
        if is_orig_def:
            acceptor,donor,cosAngle=interface_hbonding_orig(mol1Coord,mol2Coord,cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor))+"A"*np.clip(np.array([np.sum(acceptor)]),1,2)[0]
        elif is_new_def:
            acceptor,donor,cosAngle=interface_hbonding_new(mol1Coord,mol2Coord,cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor))+"A"*np.sum(acceptor)
        if labelArray[i] in "DA":
            freeOHcosDA.append(cosAngle)
        elif labelArray[i] in "DAA":
            freeOHcosDAA.append(cosAngle)
    freeOHcos=freeOHcosDA+freeOHcosDAA
    return labelArray, freeOHcos
The function takes a frame of coordinates of the simulated molecules. The code selects a central molecule from an index list, molInterfaceIndex, and extracts the coordinates of its neighbouring molecules from a pre-generated neighbour list (generated with scipy.spatial.KDTree, hence it cannot be built inside a jitted function). The central molecule and its neighbours are sent to a jitted function, which returns two two-element arrays (e.g. [1,1]) and a scalar; these are then used to label the central molecule.
My attempt at rewriting the above python function is below:
#njit(cache=True,parallel=True)
def freeoh_count_jit(coord=np.array([[]]),\
                     molInterfaceIndex=np.array([]),\
                     hNeighbourList=np.array([]),\
                     topol=np.array([[]]),\
                     cos_HAngle=0.0,\
                     cellsize=np.array([]),\
                     is_orig_def=False,is_new_def=True):
    NAtomsMol=3 #No. of atoms in a molecule
    _M=molInterfaceIndex.shape[0]
    _N=hNeighbourList.shape[1]
    mol1Coord=np.zeros((NAtomsMol,3),dtype=np.float64)
    mol2Coord=np.zeros((_N*NAtomsMol,3),dtype=np.float64)
    acceptor=np.zeros((_M,2),dtype=int)
    donor=np.zeros((_M,2),dtype=int)
    cosAngle=np.zeros(_M,dtype=np.float64)
    gen=np.zeros(_M,dtype=int)
    freeOHMask = np.zeros(_M, dtype=int) == 0
    labelArray=np.empty(_M, dtype="U10")
    for i in range(_M): # loop over selected molecules
        for index,j in enumerate(topol[molInterfaceIndex[i]]):
            mol1Coord[index]=coord[j] # extract center molecule
        for indexJ,j in enumerate(hNeighbourList[i]):
            for indexK,k in enumerate(topol[j]):
                mol2Coord[indexK+topol[j].shape[0]*indexJ]=coord[k] # extract neighbors
        gen[i] = len(np.array([index for index in hNeighbourList[i] if index!=-1]))*NAtomsMol # get actual number of neighbor atoms
        if is_orig_def:
            acceptor[i],donor[i],cosAngle[i]=interface_hbonding_orig(mol1Coord,mol2Coord[:gen[i]],cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor[i]))+"A"*np.clip(np.array([np.sum(acceptor[i])]),1,2)[0]
        elif is_new_def:
            acceptor[i],donor[i],cosAngle[i]=interface_hbonding_new(mol1Coord,mol2Coord[:gen[i]],cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor[i]))+"A"*np.sum(acceptor[i])
    freeOHMask[np.where(cosAngle > 1.0)] = False
    return acceptor, donor, labelArray, freeOHMask
The main issue is that the jitted function seems to give incorrect results when numba.prange is used for the outer loop. Also, the execution time of the function increases with each call, which is confusing. The functions interface_hbonding_orig() and interface_hbonding_new() are already jitted, so I think they are out of scope here. A bigger question is whether I even need to jit this function at all, since the most time-consuming part is supposed to be the array selection in the first few lines of the outer loop. Any suggestions for rewriting this function, or for the algorithm design, would be really helpful.
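Since the array selection is described as the expensive step, one option that may be worth trying before jitting is NumPy fancy indexing, which gathers all neighbour atoms in a single call. The following is a minimal sketch, not a drop-in replacement: the helper name gather_neighbour_coords and the toy arrays are assumptions, and it presumes that topol is a 2D integer array of atom indices and that -1 is the only padding value in the neighbour list.

```python
import numpy as np

def gather_neighbour_coords(coord, topol, neighbours):
    """Return the coordinates of all atoms of the valid neighbours, shape (n_atoms, 3)."""
    valid = neighbours[neighbours != -1]        # drop the -1 padding
    return coord[topol[valid]].reshape(-1, 3)   # (n_mol, atoms_per_mol, 3) -> (n_atoms, 3)

# Toy data (made up): 6 atoms, 2 molecules of 3 atoms each.
coord = np.arange(18, dtype=float).reshape(6, 3)
topol = np.array([[0, 1, 2], [3, 4, 5]])
neighbours = np.array([1, 0, -1])               # one padded slot

mol2Coord = gather_neighbour_coords(coord, topol, neighbours)

# Same result built the way the original loop does it, for comparison.
loop_version = np.array(
    [[coord[k] for k in topol[j]] for j in neighbours if j != -1]
).reshape(-1, 3)
print(np.array_equal(mol2Coord, loop_version))
```

Whether this is faster than a jitted loop depends on the number of molecules and neighbours, so it would need to be timed on real data.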

Create an (n,1) array from int values in a for loop

I simply want to create an (n,1) array from the int values (dist) computed in my for loop.
For now, I only have a succession of int values, since I'm printing the "dist" value in each iteration.
How do I incorporate each dist into an array (in this case only an n-vector), so that array[i][0] is the dist value from the ith iteration of the for loop?
I know it must be really simple; I'm only starting out with NumPy. I've tried insert and append, but they don't seem to work. The distmat I initialize at the beginning is never actually filled...
Here is the code for now:
distmat=np.zeros((len(CoordNodesRad[:,0]),1),int)
lat = CoordNodesRad[:,0]
lng = CoordNodesRad[:,1]
for i in range(len(CoordNodesRad[:,0])):
    dist = distanceGPS(depot[0], depot[1], lat[i], lng[i])
    #print(int(dist))
    #distmat=np.append(distmat,dist,axis=0)
    print(dist)
    distmat=np.insert(dist,i,dist)
    print(distmat)
Thanks for the help.
Eventually I used a simple list and appended my "dist" values to it.
If someone knows something quicker than going through a list and then creating the array, I would be interested.
Here is my new code (the commented lines don't matter):
disttodepot=[]
lat = CoordNodesRad[:,0]
lng = CoordNodesRad[:,1]
for i in range(len(CoordNodesRad[:,0])):
    # for j in CoordNodesRad:
    #     dist = np.append(dist, [[i, j]], axis=0)
    dist = distanceGPS(depot[0], depot[1], lat[i], lng[i])
    #print(int(dist))
    #distmat=np.append(distmat,dist,axis=0)
    #print(dist)
    disttodepot.append(int(dist))
print(disttodepot)
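For the (n,1) shape asked about, two common options are to fill a preallocated array by index, or to build the list and reshape it at the end. The sketch below shows both. The function distance_stub is a made-up stand-in for distanceGPS (whose definition is not shown), and the coordinates are invented so the example runs on its own.

```python
import numpy as np

def distance_stub(lat0, lng0, lat, lng):
    # Hypothetical placeholder for distanceGPS: simple Manhattan distance.
    return abs(lat - lat0) + abs(lng - lng0)

coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # invented (lat, lng) rows
depot = (0.0, 0.0)
n = coords.shape[0]

# Option 1: preallocate an (n, 1) array and assign into row i, column 0.
distmat = np.zeros((n, 1), dtype=int)
for i in range(n):
    distmat[i, 0] = int(distance_stub(depot[0], depot[1], coords[i, 0], coords[i, 1]))

# Option 2: collect in a list, then convert and reshape to a column.
disttodepot = [int(distance_stub(depot[0], depot[1], lat, lng)) for lat, lng in coords]
from_list = np.array(disttodepot).reshape(-1, 1)

print(distmat)
print(from_list.shape)
```

Note that np.insert and np.append return new arrays instead of modifying their argument, so calling them in a loop copies the data on every iteration; indexing into a preallocated array avoids that.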

Convert Matlab to Python

I'm converting MATLAB code to Python, and I'm unsure about the following line of code:
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
the whole code is this:
BD_teste = [];
por_treino = 0;
for l = 1:k
    quant_elementos_t = int64((length(grupos.(['g',int2str(l)]).('elementos')) * por_treino)/100);
    for element_c = 1 : quant_elementos_t
        ind_element = randi([1 length(grupos.(['g',int2str(l)]).('elementos'))]);
        BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
        grupos.(['g',int2str(l)]).('elementos')(ind_element,:) = [];
    end
end
The expression below accesses a field of a struct. In my Python conversion I used a list containing a dictionary, which in turn holds the list 'elementos':
grupos.(['g',int2str(l)]).('elementos')
So my question is about the line I quoted above: what does it do, how does it work, and how would I write it in Python?
Thank you very much in advance.
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
This is a rather dense line. Let's break it down into pieces:
int2str(l) returns the number l as a char array (l runs from 1 to k).
['g',int2str(l)] concatenates this into the char array 'g1', then 'g2' and so on as l increases.
grupos.(['g',int2str(l)]) returns the value of the field named g1, g2 and so on of the struct grupos.
grupos.(['g',int2str(l)]).('elementos') assumes that grupos.(['g',int2str(l)]) is itself a struct, and returns the value of its field named 'elementos'.
grupos.(['g',int2str(l)]).('elementos')(ind_element,:) assumes that the 'elementos' field is a matrix, and returns a row vector containing the ind_element-th row of that matrix.
grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l appends the value of l (the loop index, not the number one) to the row vector obtained before.
[BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l] appends the row vector [grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l] to the bottom of the matrix BD_teste, creating a new matrix.
Finally:
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l]; assigns the resulting matrix to the variable BD_teste, overwriting its previous value. Effectively, this just appends the new row, but because the whole matrix is rebuilt and reassigned each time, it is not very efficient.
It would be better to append with:
BD_teste(end+1,:) = [grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
How you rewrite this in Python is a different story, and depends mostly on how you define the variable grupos.
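As one possible reading of the breakdown above, here is a rough Python sketch. It assumes grupos is a dictionary keyed by 'g1', 'g2', ... whose values are dictionaries holding a list of rows under 'elementos'; that layout, the function name split_test_rows, and the sample data are all assumptions, not something given in the question. MATLAB's length returns the largest dimension of a matrix, while len here counts rows, and MATLAB's int64 rounds halves away from zero while Python's round does not, so the counts may differ in edge cases.

```python
import random
import numpy as np

def split_test_rows(grupos, k, por_treino, rng=random):
    """Move a random por_treino percent of each group's rows into a test matrix."""
    bd_teste = []
    for l in range(1, k + 1):
        elementos = grupos[f"g{l}"]["elementos"]         # grupos.(['g',int2str(l)]).('elementos')
        quant = int(round(len(elementos) * por_treino / 100))
        for _ in range(quant):
            ind = rng.randrange(len(elementos))          # randi, but 0-based
            row = elementos.pop(ind)                     # read the row and delete it from the group
            bd_teste.append(list(row) + [l])             # append the group index l as last column
    return np.array(bd_teste)

# Invented sample data: two groups with two-column rows.
grupos = {
    "g1": {"elementos": [[1, 2], [3, 4], [5, 6], [7, 8]]},
    "g2": {"elementos": [[9, 10], [11, 12]]},
}
BD_teste = split_test_rows(grupos, k=2, por_treino=50, rng=random.Random(0))
print(BD_teste)
```

Collecting rows in a Python list and converting once at the end avoids rebuilding the matrix on every iteration, which is the inefficiency pointed out for the MATLAB version.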

Remove elements from array of arrays

I have an array of arrays from which I want to remove specific elements according to a logical condition.
I have an array of arrays such that galaxies = ([[z1,ra1,dec1,distance1],[z2,ra2,dec2,distance2]...]) and I want to remove all elements whose distance term is greater than 1. I've tried to write "from galaxies[i], remove all galaxies such that galaxies[i][4]>1".
My code right now is:
galaxies_in_cluster = []
for i in range(len(galaxies)):
    galacticcluster = galaxies[~(galaxies[i][4]<=1)]
    galaxies_in_cluster.append(galacticcluster)
where
galaxies = [array([1.75000000e-01, 2.43794800e+02, 5.63820000e+01, 6.80000000e+00,
7.07290131e-02]),
array([1.75000000e-01, 2.40898000e+02, 5.15900000e+01, 7.10000000e+00,
5.60800387e+00]),
array([1.80000000e-01, 2.43792000e+02, 5.63990000e+01, 6.50000000e+00,
5.00059297e+02]),
array([1.75000000e-01, 2.43805000e+02, 5.62190000e+01, 7.80000000e+00,
2.16588562e-01])]
I want it to return
galaxies_in_cluster = [array([1.75000000e-01, 2.43794800e+02, 5.63820000e+01, 6.80000000e+00,
7.07290131e-02]), array([1.75000000e-01, 2.43805000e+02, 5.62190000e+01, 7.80000000e+00,
2.16588562e-01])]
(basically eliminating the second and third entries), but it's returning the first and second entries twice, which doesn't make sense to me, especially since the second entry's distance term is greater than 1.
Any help would be much appreciated.
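A likely explanation is that ~(galaxies[i][4]<=1) evaluates to a single boolean, which is then used to index the list as position 0 or 1 instead of acting as a filter. Two ways to express the intended filter are sketched below, using the data from the question; the names stacked, mask and in_cluster_2d are made up for the example.

```python
import numpy as np

galaxies = [
    np.array([1.75000000e-01, 2.43794800e+02, 5.63820000e+01, 6.80000000e+00, 7.07290131e-02]),
    np.array([1.75000000e-01, 2.40898000e+02, 5.15900000e+01, 7.10000000e+00, 5.60800387e+00]),
    np.array([1.80000000e-01, 2.43792000e+02, 5.63990000e+01, 6.50000000e+00, 5.00059297e+02]),
    np.array([1.75000000e-01, 2.43805000e+02, 5.62190000e+01, 7.80000000e+00, 2.16588562e-01]),
]

# Option 1: keep the list of arrays and filter with a comprehension.
galaxies_in_cluster = [g for g in galaxies if g[4] <= 1]

# Option 2: stack into one 2D array and select rows with a boolean mask.
stacked = np.array(galaxies)
mask = stacked[:, 4] <= 1          # one True/False per galaxy
in_cluster_2d = stacked[mask]

print(len(galaxies_in_cluster))
print(in_cluster_2d.shape)
```

The mask version applies the condition to the whole column at once, which tends to be the more convenient form if the data is going to stay in NumPy.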

checking for nan's in 2d numpy arrays

I am working on a small piece of code that starts with an interpolated surface I made previously. The interpolation filled in gaps in the surface with nan's. Part of my processing involves looking at a local window around a particular point, and calculating some measures using the local surface. Ideally, this code should only do any calculations if the entire local surface contains no nan values. The code iterates through the original large surface and checks whether the local window about a point has a nan.
I know this is not the most efficient way to go about it, but time-efficiency is not something I have to worry about.
Here is what I have so far:
for i in range(startx,endx):
    imin = i - half_tile
    imax = i + half_tile +1
    for j in range(starty,endy):
        jmin = j - half_tile
        jmax = j + half_tile +1
        #Test the local surface for nan's
        z = surface[imin:imax,jmin:jmax]
        Test = np.isnan(sum(z))
        #conditional statement
        if Test:
            print 'We have a nan'
            #set measures I want to calculate to zero
        else:
            print 'We have a complete window'
            #do a set of calculations
The variable surface is the interpolated surface I created originally. The half_tile variable defines the size of the local window I want to use. The startx, endx, starty and endy variables define the region of the original surface to iterate through.
Where I am running into issues is that my conditional statement doesn't seem to be working. It will tell me that the local window that I am evaluating doesn't have any nan's in it, but then the rest of my code (which I didn't show here) will not work because it says there are nan's in the array.
An example of this might be:
[[ 7.07494104 7.04592032 7.01689961 6.98787889 6.95885817 6.92983745
6.90081674 6.87179602 6.8427753 6.81375458 6.78473387 6.75571315
6.72669243]
[ 7.10077447 7.07175376 7.04273304 7.01371232 6.98469161 6.95567089
6.92665017 6.89762945 6.86860874 6.83958802 6.8105673 6.78154658
6.75252587]
[ 7.12660791 7.09758719 7.06856647 7.03954576 7.01052504 6.98150432
6.9524836 6.92346289 6.89444217 6.86542145 6.83640073 6.80738002
6.7783593 ]
[ 7.15244134 7.12342063 7.09439991 7.06537919 7.03635847 7.00733776
6.97831704 6.94929632 6.9202148 6.89105825 6.86190169 6.83274514
6.80358859]
[ 7.17804068 7.14888413 7.11972758 7.09057103 7.06141448 7.03225793
7.00310137 6.97394482 6.94478827 6.91563172 6.88647517 6.85731862
nan]]
Here is an example of the local window that my code is evaluating. In my code this would be z. The entire array has good values except for the last value which is a nan.
The "checking" function in my code is not picking up that there is a nan in the array. The conditional statement is returning False when it should be True to indicate that a nan is present. Am I missing anything fundamental in the way I am checking the array, or are my methods just totally wrong?
isnan() returns an array with True or False for each element of the input array, so you need np.any() in addition to isnan() to reduce it to a single truth value. See the example below:
import numpy as np
a = np.array([[1,2,3,4],[1,2,3,np.nan]])
print(np.isnan(a))
print(np.any(np.isnan(a)))
This results in
[[False False False False]
 [False False False  True]]
True
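One way to apply this to the windowed loop from the question is sketched below. The helper name window_is_complete and the small test surface are made up for illustration. Note that Python's built-in sum(z) on a 2D array adds the rows together and returns one value per column instead of a single number, which may be part of why the original test did not behave like a simple True/False check.

```python
import numpy as np

def window_is_complete(surface, i, j, half_tile):
    """Return True if the local window centred on (i, j) contains no NaN."""
    z = surface[i - half_tile:i + half_tile + 1,
                j - half_tile:j + half_tile + 1]
    return not np.isnan(z).any()

# Invented 5x5 surface with a single NaN in the bottom-right corner.
surface = np.arange(25, dtype=float).reshape(5, 5)
surface[4, 4] = np.nan

half_tile = 1
clean = window_is_complete(surface, 1, 1, half_tile)    # rows 0-2, cols 0-2
has_nan = window_is_complete(surface, 3, 3, half_tile)  # rows 2-4, cols 2-4, includes the NaN

print(clean, has_nan)
```

The loop bounds should keep i - half_tile and j - half_tile non-negative, since negative slice starts wrap around to the other end of the array.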
