Pulling data from a numpy array - python

I have a numpy ndarray that I made using numpy.loadtxt. I want to pull an entire row from it based on a condition in the third column. Something like: if array[2][i] meets my condition, then get array[0][i] and array[1][i] as well. I'm new to Python and all of the numpy features, so I'm looking for the best way to do this. Ideally, I'd like to pull 2 rows at a time, but I won't always have an even number of rows, so I imagine that is a problem.
import numpy as np
'''
Created on Jan 27, 2013
#author:
'''
class Volume:
    f = '/Users/Documents/workspace/findMinMax/crapc.txt'
    m = np.loadtxt(f, unpack=True, usecols=(1, 2, 3), ndmin=2)
    maxZ = max(m[2])
    minZ = min(m[2])
    print("Maximum Z value: " + str(maxZ))
    print("Minimum Z value: " + str(minZ))
    zIncrement = .5
    steps = maxZ / zIncrement
    currentStep = .5
    b = []
    for i in m[2]:  # here is my problem
        while currentStep < steps:
            if m[2][i] < currentStep and m[2][i] > currentStep - zIncrement:
                b.append(m[2][i])
            if len(b) < 2:
                currentStep + zIncrement
    print(b)
Here is some Java code I wrote that shows the general idea of what I want:
while (e < a.length - 1) {
    for (int i = 0; i < a.length - 1; i++) {
        if (a[i][2] < stepSize && a[i][2] > stepSize - 2) {
            x.add(a[i][0]);
            y.add(a[i][1]);
            z.add(a[i][2]);
        }
        if (x.size() < 1) {
            stepSize += 1;
        }
    }
}

First of all, you probably don't want to put your code in that class definition...
import numpy as np

def main():
    m = np.random.random((3, 4))
    mask = (m[2] > 0.5) & (m[2] < 0.8)  # put your conditions here
                                        # instead of 0.5 and 0.8 you can use
                                        # an array if you like
    m[:, mask]

if __name__ == '__main__':
    main()
mask is a boolean array, m[:, mask] is the array you want
m[2] is the third row of m. If you type m[2] + 2 you get a new array with the old values + 2. m[2] > 0.5 creates an array with boolean values. It is best to try this stuff out with ipython (www.ipython.org)
In the expression m[:, mask] the : means "take all rows", mask describes which columns should be included.
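For example, with a small hand-made array (illustrative values only, not the poster's data), you can see exactly which columns the mask keeps:

import numpy as np

m = np.array([[10., 20., 30., 40.],   # first row
              [1.1, 2.2, 3.3, 4.4],   # second row
              [0.2, 0.6, 0.9, 0.7]])  # third row, used for the condition

mask = (m[2] > 0.5) & (m[2] < 0.8)    # array([False, True, False, True])
print(m[:, mask])                     # columns 1 and 3 of every row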
Update
Next try :-)
for i in range(0, len(m), 2):
    two_rows = m[i:i+2]
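Slicing past the end is safe in numpy, so an odd number of rows simply means the last chunk holds a single row. A quick sketch with a made-up 3-row array:

import numpy as np

m = np.arange(12).reshape(3, 4)  # 3 rows, i.e. an odd count
for i in range(0, len(m), 2):
    two_rows = m[i:i+2]          # the last slice is just one row
    print(two_rows.shape)        # prints (2, 4), then (1, 4)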

If you can write your condition as a simple function
def condition(value):
    # return True or False depending on value
then you could select your subarrays like this:
cond = condition(a[2])
subarray0 = a[0,cond]
subarray1 = a[1,cond]
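As a sketch only, here is what that could look like with a hypothetical condition (the thresholds are made up, not from the question):

import numpy as np

def condition(value):
    # element-wise: True where the value lies strictly between 0.5 and 1.0
    return (value > 0.5) & (value < 1.0)

a = np.random.random((3, 10))  # stand-in for the loaded 3-row array
cond = condition(a[2])         # boolean array computed from the third row
subarray0 = a[0, cond]
subarray1 = a[1, cond]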


Extracting contiguous rows in an array

The following code in python
df['tag'] = df['Value'] < 1.0
df['mask'] = np.where(df['tag'],1,0)
first = df.index[df['tag'] & ~ df['tag'].shift(1).fillna(False)]
last = df.index[df['tag'] & ~ df['tag'].shift(-1).fillna(False)]
pr = [(i, j) for i, j in zip(first, last) if j > i + 1]
returns a list, pr, containing tuples that mark the first and last rows of each contiguous run whose Value is less than 1.0. I have tried to translate this to Julia, so far only partially, as follows:
df[:tag]=df[:Value] .< 1.0
df[:mask]=zeros(length(df[:tag]))
df[:mask][df[:tag].==true] .= 1
df[:mask][df[:tag].==false] .= 0
How can I replicate the values for first, last, pr in Julia?
I will give you two possible approaches to this problem. The first is faster, but requires a bit more code. The second is slower, but shorter.
function getblocks1(vs)
    blocks = Tuple{Int, Int}[]
    inblock, start = false, 0
    for (i, v) in enumerate(vs)
        if inblock
            if v >= 1.0
                push!(blocks, (start, i - 1))
                inblock = false
            end
        else
            if v < 1.0
                start = i
                inblock = true
            end
        end
    end
    inblock && push!(blocks, (start, length(vs)))
    blocks
end
function getblocks2(vs)
    t = [false; vs .< 1.0; false]
    dt = diff(t)
    f = findall(==(1), dt)
    l = findall(==(-1), dt) .- 1
    collect(zip(f, l))
end
The crucial thing to know is that getblocks1 will be fast in Julia because loops in Julia are fast; the function minimizes allocations and does everything in a single pass. The second implementation is more Python-like, but it allocates more and makes several passes through the whole vector.
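For comparison, the same diff trick can be written with numpy against the original pandas column; this is my own illustration of the mapping (getblocks_np and its threshold argument are not part of the answer):

import numpy as np

def getblocks_np(values, threshold=1.0):
    # pad with False on both sides so every block has a clear start and end
    t = np.concatenate(([False], values < threshold, [False]))
    dt = np.diff(t.astype(int))
    first = np.flatnonzero(dt == 1)        # 0-based block starts
    last = np.flatnonzero(dt == -1) - 1    # 0-based block ends (inclusive)
    return list(zip(first, last))

# e.g. getblocks_np(df['Value'].to_numpy()) gives the (first, last) pairs,
# before the `j > i + 1` filter used to build pr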

How can I implement this point in polygon code in Python?

So, for my Computer Graphics class I was tasked with writing a polygon filler; my software renderer is currently written in Python. Right now, I want to test this pointInPolygon code I found at How can I determine whether a 2D Point is within a Polygon?, so that later I can base my own method on it.
The code is:
int pnpoly(int nvert, float *vertx, float *verty, float testx, float testy)
{
    int i, j, c = 0;
    for (i = 0, j = nvert-1; i < nvert; j = i++) {
        if ( ((verty[i]>testy) != (verty[j]>testy)) &&
             (testx < (vertx[j]-vertx[i]) * (testy-verty[i]) / (verty[j]-verty[i]) + vertx[i]) )
            c = !c;
    }
    return c;
}
And my attempt to recreate it in Python is as follows:
def pointInPolygon(self, nvert, vertx, verty, testx, testy):
    c = 0
    j = nvert-1
    for i in range(nvert):
        if (((verty[i]>testy) != (verty[j]>testy)) and (testx < (vertx[j]-vertx[i]) * (testy-verty[i]) / (verty[j]-verty[i] + vertx[i]))):
            c = not c
        j += 1
    return c
But this will obviously raise an index-out-of-range error on the second iteration, because j becomes nvert, and it will crash.
Thanks in advance.
You're reading the tricky C code incorrectly. The point of j = i++ is to both increment i by one and assign the old value to j. Similar python code would do j = i at the end of the loop:
j = nvert - 1
for i in range(nvert):
    ...
    j = i
The idea is that for nvert == 3, the values would go
j | i
---+---
2 | 0
0 | 1
1 | 2
Another way to achieve this is to note that j equals (i - 1) % nvert:
for i in range(nvert):
    j = (i - 1) % nvert
    ...
i.e. it is lagging one behind, and the indices form a ring (like the vertices do)
More pythonic code would use itertools and iterate over the coordinates themselves. You'd have a list of pairs (tuples) called vertices, and two iterators, one of which is one vertex ahead of the other and cycles back to the beginning thanks to itertools.cycle; something like:
# make one iterator that goes one ahead and wraps around at the end
next_ones = itertools.cycle(vertices)
next(next_ones)
for ((x1, y1), (x2, y2)) in zip(vertices, next_ones):
    # unchecked...
    if (((y1 > testy) != (y2 > testy))
            and (testx < (x2 - x1) * (testy - y1) / (y2 - y1) + x1)):
        c = not c
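Putting those pieces together, a self-contained sketch of the whole routine (my own consolidation of the ideas above, not code from the answer):

import itertools

def point_in_polygon(vertices, testx, testy):
    # vertices: list of (x, y) tuples; returns True if the test point is inside
    c = False
    next_ones = itertools.cycle(vertices)
    next(next_ones)  # shift by one, so zip yields consecutive edges with wrap-around
    for (x1, y1), (x2, y2) in zip(vertices, next_ones):
        if ((y1 > testy) != (y2 > testy)) and \
           (testx < (x2 - x1) * (testy - y1) / (y2 - y1) + x1):
            c = not c
    return c

# example: the centre of the unit square is inside
print(point_in_polygon([(0, 0), (1, 0), (1, 1), (0, 1)], 0.5, 0.5))  # True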

Strange behaviour of simple pycuda kernel

I'm quite new to cuda and pycuda.
I need a kernel that creates a matrix (of dimension n x d) out of an array (1 x d), by simply "repeating" the same array n times:
for example, suppose we have n = 4 and d = 3, then if the array is [1 2 3]
the result of my kernel should be:
[1 2 3
1 2 3
1 2 3
1 2 3]
(a matrix 4x3).
Basically, it's the same as doing numpy.tile(array, (n, 1))
I've written the code below:
kernel_code_template = """
__global__ void TileKernel(float *in, float *out)
{
    // Each thread computes one element of out
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (y > %(n)s || x > %(d)s) return;
    out[y * %(d)s + x] = in[x];
}
"""
d = 64
n = 512
blockSizex = 16
blockSizey = 16
gridSizex = (d + blockSizex - 1) / blockSizex
gridSizey = (n + blockSizey - 1) / blockSizey
# get the kernel code from the template
kernel_code = kernel_code_template % {
    'd': d,
    'n': n
}
mod = SourceModule(kernel_code)
TileKernel = mod.get_function("TileKernel")
vec_cpu = np.arange(d).astype(np.float32) # just as an example
vec_gpu = gpuarray.to_gpu(vec_cpu)
out_gpu = gpuarray.empty((n, d), np.float32)
TileKernel.prepare("PP")
TileKernel.prepared_call((gridSizex, gridSizey), (blockSizex, blockSizey, 1), vec_gpu.gpudata, out_gpu.gpudata)
out_cpu = out_gpu.get()
Now, if I run this code with d equal to a power of 2 that is >= 16, I get the right result (just like numpy.tile(vec_cpu, (n, 1))); but if I set d to anything else (say, 88), every element of the output matrix has the correct value except in the first column: some entries are right, but others hold an apparently random value. That wrong value is the same for every wrong element within a run, yet it changes from run to run, and which entries of the first column are wrong also changes every run.
Example:
[0 1 2
0 1 2
6 1 2
0 1 2
6 1 2
...]
I really can't figure out what is causing this problem, but maybe it's just something simple that I'm missing...
Any help will be appreciated, thanks in advance!
The bounds checking within your kernel code is incorrect. This
if (y > n || x > d) return;
out[y * d + x] = in[x];
should be:
if (y >= n || x >= d) return;
out[y * d + x] = in[x];
or better still:
if ((y < n) && (x < d))
    out[y * d + x] = in[x];
All valid indexing in the array lies in 0 <= x < d and 0 <= y < n. By allowing x = d you have undefined behaviour, allowing the first entry in the next row of the output array to be overwritten with an unknown value. This explains why the results were sometimes correct and other times not.
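For what it's worth, here is a minimal self-contained sketch of the corrected version (assuming pycuda and a CUDA device are available; the substantive change is the >= bounds check, plus integer division // for the grid size so it also runs under Python 3):

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

kernel_code_template = """
__global__ void TileKernel(float *in, float *out)
{
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= %(n)s || x >= %(d)s) return;
    out[y * %(d)s + x] = in[x];
}
"""

d, n = 88, 512
blockSizex = blockSizey = 16
gridSizex = (d + blockSizex - 1) // blockSizex
gridSizey = (n + blockSizey - 1) // blockSizey

mod = SourceModule(kernel_code_template % {'d': d, 'n': n})
TileKernel = mod.get_function("TileKernel")

vec_cpu = np.arange(d).astype(np.float32)
vec_gpu = gpuarray.to_gpu(vec_cpu)
out_gpu = gpuarray.empty((n, d), np.float32)

TileKernel(vec_gpu.gpudata, out_gpu.gpudata,
           block=(blockSizex, blockSizey, 1),
           grid=(gridSizex, gridSizey))

# compare against numpy.tile on the host; expected: True for any d
print(np.allclose(out_gpu.get(), np.tile(vec_cpu, (n, 1))))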

Vectorizing outer and inner loop when these contain calculations and deletes

I've been checking out how to vectorize an outer and inner for loop. These have some calculations and also a delete inside them, which seems to make it much less straightforward.
How would this best be vectorized?
import numpy as np

flattenedArray = np.ndarray.tolist(someNumpyArray)
#flattenedArray is a python list of lists.
c = flattenedArray[:]
for a in range(len(flattenedArray)):
    for b in range(a+1, len(flattenedArray)):
        if a == b:
            continue
        i0 = flattenedArray[a][0]
        j0 = flattenedArray[a][1]
        z0 = flattenedArray[a][2]
        i1 = flattenedArray[b][0]
        i2 = flattenedArray[b][1]
        z1 = flattenedArray[b][2]
        if ((np.square(z0-z1)) <= (np.square(i0-i1) + (np.square(j0-j2)))):
            if (np.square(i0-i1) + (np.square(j0-j1))) <= (np.square(z0+z1)):
                c.remove(flattenedArray[b])
#MSeifert is, of course, as so often right. So the following full vectorisation is only to show "how it's done"
import numpy as np

N = 4
data = np.random.random((N, 3))

# vectorised code
j, i = np.tril_indices(N, -1)  # chose tril over triu to have contiguous columns
                               # useful later
sqsum = np.square(data[i,0]-data[j,0]) + np.square(data[i,1]-data[j,1])
cond = np.square(data[i, 2] + data[j, 2]) >= sqsum
cond &= np.square(data[i, 2] - data[j, 2]) <= sqsum
# because equal 'b's are grouped together we can use reduceat:
cond = np.r_[False, np.logical_or.reduceat(
    cond, np.add.accumulate(np.arange(N-1)))]
left = data[~cond, :]

# original code (modified to make it run)
flattenedArray = np.ndarray.tolist(data)
#flattenedArray is a python list of lists.
c = flattenedArray[:]
for a in range(len(flattenedArray)):
    for b in range(a+1, len(flattenedArray)):
        if a == b:
            continue
        i0 = flattenedArray[a][0]
        j0 = flattenedArray[a][1]
        z0 = flattenedArray[a][2]
        i1 = flattenedArray[b][0]
        j1 = flattenedArray[b][1]
        z1 = flattenedArray[b][2]
        if ((np.square(z0-z1)) <= (np.square(i0-i1) + (np.square(j0-j1)))):
            if (np.square(i0-i1) + (np.square(j0-j1))) <= (np.square(z0+z1)):
                try:
                    c.remove(flattenedArray[b])
                except:
                    pass

# check they are the same
print(np.alltrue(c == left))
Vectorizing the inner loop isn't much of a problem if you work with a mask:
import numpy as np

# I'm using a random array
flattenedArray = np.random.randint(0, 100, (10, 3))

mask = np.zeros(flattenedArray.shape[0], bool)
for idx, row in enumerate(flattenedArray):
    # Calculate the broadcasted elementwise addition/subtraction of this row
    # with all following
    added_squared = np.square(row[None, :] + flattenedArray[idx+1:])
    subtracted_squared = np.square(row[None, :] - flattenedArray[idx+1:])
    # Check the conditions
    col1_col2_added = subtracted_squared[:, 0] + subtracted_squared[:, 1]
    cond1 = subtracted_squared[:, 2] <= col1_col2_added
    cond2 = col1_col2_added <= added_squared[:, 2]
    # Update the mask
    mask[idx+1:] |= cond1 & cond2

# Apply the mask
flattenedArray[mask]
If you also want to vectorize the outer loop, you have to do it by broadcasting; that, however, uses O(n**2) memory instead of O(n). Given that the critical inner loop is already vectorized, there won't be much additional speedup from vectorizing the outer loop.
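For completeness, a fully broadcast sketch of that O(n**2)-memory variant (my own illustration, not code from either answer):

import numpy as np

arr = np.random.randint(0, 100, (10, 3))

# all pairwise squared differences and sums at once: shape (n, n, 3)
diff_sq = np.square(arr[:, None, :] - arr[None, :, :])
add_sq = np.square(arr[:, None, :] + arr[None, :, :])

planar = diff_sq[..., 0] + diff_sq[..., 1]  # (i0-i1)**2 + (j0-j1)**2
cond = (diff_sq[..., 2] <= planar) & (planar <= add_sq[..., 2])

# only pairs with a < b count, i.e. the strict upper triangle
cond &= np.triu(np.ones(cond.shape, dtype=bool), k=1)

# row b is dropped if the condition holds against any earlier row a
mask = cond.any(axis=0)
kept = arr[~mask]  # analogous to `left` in the first answer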

In numpy, Python, how to conditionally rewrite part of an array, when the values I want to set are in an array of a different size?

Let's say I have three arrays:
A[size1] of {0..size1}
B[size2] of {0..size1}
C[size2] of boolean
What I want:
for (int e = 0; e < size2; ++e) :
if C[e] == some_condition, then B[e] = A[B[e]]
Since Python is slow, I have to implement it via numpy arithmetic on arrays. How can I do that?
Example:
A = np.array([np.random.randint(0,n,size1), np.random.randint(0,size1,size1)])
B = np.random.randint(0,size1,size2)
C = np.random.randint(0,n,size2)

#that's the part I want to do in numpy:
for i in range(size2):
    if (C[i] > A[0][B[i]]):
        B[i] = A[1][B[i]]
You could simply use boolean-indexing -
mask = C > A[0][B] # Create mask to select valid ones from B
B[mask] = A[1][B[mask]] # Use mask to select and assign values
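A minimal runnable sketch with made-up sizes, checking the vectorized version against the original loop:

import numpy as np

n, size1, size2 = 10, 8, 12
A = np.array([np.random.randint(0, n, size1), np.random.randint(0, size1, size1)])
B = np.random.randint(0, size1, size2)
C = np.random.randint(0, n, size2)

# loop version for reference
B_loop = B.copy()
for i in range(size2):
    if C[i] > A[0][B_loop[i]]:
        B_loop[i] = A[1][B_loop[i]]

# vectorized version: the right-hand side is evaluated before the assignment,
# so it uses the original B values, just like the loop does
B_vec = B.copy()
mask = C > A[0][B_vec]
B_vec[mask] = A[1][B_vec[mask]]

print(np.array_equal(B_loop, B_vec))  # True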
