Finding relative maximums of a 2-D numpy array

Finding relative maximums of a 2-D numpy array - python

I have a 2-D numpy array that can be subdivided into 64 boxes (think of a chessboard).
The goal is a function that returns the position and value of the maximum in each box. Something like:
FindRefs(array) --> [(argmaxX00, argmaxY00, Max00), ...,(argmaxX63, argmaxY63, Max63)]
where argmaxXnn and argmaxYnn are the indexes of the whole array (not of the box), and Maxnn is the max value in each box. In other words,
Maxnn = array[argmaxYnn,argmaxYnn]
I've tryed the obvious "nested-for" solution:
def FindRefs(array):
Height, Width = array.shape
plumx = []
plumy = []
lum = []
w = int(Width/8)
h = int(Height/8)
for n in range(0,8): # recorrer boxes
x0 = n*w
x1 = (n+1)*w
for m in range(0,8):
y0 = m*h
y1 = (m+1)*h
subflatind = a[y0:y1,x0:x1].argmax() # flatten index of box
y, x = np.unravel_index(subflatind, (h, w))
X = x0 + x
Y = y0 + y
lum.append(a[Y,X])
plumx.append(X)
plumy.append(Y)
refs = []
for pt in range(0,len(plumx)):
ptx = plumx[pt]
pty = plumy[pt]
refs.append((ptx,pty,lum[pt]))
return refs
It works, but is neither elegant nor eficient.
So I've tryed this more pythonic version:
def FindRefs(a):
box = [(n*w,m*h) for n in range(0,8) for m in range(0,8)]
flatinds = [a[b[1]:h+b[1],b[0]:w+b[0]].argmax() for b in box]
unravels = np.unravel_index(flatinds, (h, w))
ur = [(unravels[1][n],unravels[0][n]) for n in range(0,len(box))]
absinds = [map(sum,zip(box[n],ur[n])) for n in range(0,len(box))]
refs = [(absinds[n][0],absinds[n][1],a[absinds[n][1],absinds[n][0]]) for n in range(0,len(box))]
return refs
It works fine but, to my surprise, is not more efficient than the previous version!
The question is: Is there a more clever way to do the task?
Note that efficiency matters, as I have many large arrays for processing.
Any clue is welcome. :)

Try this:
from numpy.lib.stride_tricks import as_strided as ast
import numpy as np
def FindRefs3(a):
box = tuple(x/8 for x in a.shape)
z=ast(a, \
shape=(8,8)+box, \
strides=(a.strides[0]*box[0],a.strides[1]*box[1])+a.strides)
v3 = np.max(z,axis=-1)
i3r = np.argmax(z,axis=-1)
v2 = np.max(v3,axis=-1)
i2 = np.argmax(v3,axis=-1)
i2x = np.indices(i2.shape)
i3 = i3r[np.ix_(*[np.arange(x) for x in i2.shape])+(i2,)]
i3x = np.indices(i3.shape)
ix0 = i2x[0]*box[0]+i2
ix1 = i3x[1]*box[1]+i3
return zip(np.ravel(ix0),np.ravel(ix1),np.ravel(v2))
Note that your first FindRefs reverses indices, so that for a tuple (i1,i2,v), a[i1,i2] won't return the right value, whereas a[i2,i1] will.
So here's what the code does:
It first calculates the dimensions that each box needs to have (box) given the size of your array. Note that this doesn't do any checking: you need to have an array that can be divided evenly into an 8 by 8 grid.
Then z with ast is the messiest bit. It takes the 2d array, and turns it into a 4d array. The 4d array has dimensions (8,8,box[0],box[1]), so it lets you choose which box you want (the first two axes) and then what position you want in the box (the next two). This lets us deal with all the boxes at once by doing operations on the last two axes.
v3 gives us the maximum values along the last axis: in other words, it contains the maximum of each column in each box. i3r contains the index of which row in the box contained that max value.
v2 takes the maximum of v3 along its own last axis, which is now dealing with rows in the box: it takes the column maxes, and finds the maximum of them, so that v2 is a 2d array containing the maximum value of each box. If all you wanted were the maximums, this is all you'd need.
i2 is the index of the column in the box that holds the maximum value.
Now we need to get the index of the row in the box... that's trickier. i3r contains the row index of the max of each column in the box, but we want the row for the specific column that's specified in i2. We do this by choosing an element from i3r using i2, which gives us i3.
At this point, i2 and i3 are 8 by 8 arrays containing the row and column indexes of the maximums relative to each box. We want the absolute indexes. So we create i2x and i3x (actually, this is pointless; we could just create one, as they are the same), which are just arrays of what the indexes for i2 and i3 are (0,1,2,...,8 etc in one dimension, and so on). We then multiply these by the box sizes, and add the relative max indexes, to get the absolute max indexes.
We then combine these to get the same output that you had. Note that if you keep them as arrays, though, instead of making tuples, it's much faster.

Related

How to split data matrix into 3 parts

I have the following data x-y-data block x in [390, 1669] and y in [377, 751]. I have the value of the central row y=597. I want to split my data into 3 parts, a central (row 397), an upper (rows 598-751) and a lower part (rows 377-596). I want to automise the function to find the upper bounds for y_max and y_min and don't put them in manually.
def split_data(y_cent_index, data):
'''splitting data matrix into upper, central and lower part'''
center_row = data[y_cent_index,:]
y_max_index =
y_min_index = ?
upper_rows = data[y_cent_index+i,:]
lower_rows = data[y_cent_index-i,:]
return center_rows, upper_rows, lower_rows

If I understand correctly, you want to finde the indices of the biggest and the smallest element in your data?
The you would need to use the function argmax in numpy (argmin respectively)
y_max_index = np.unravel_index(np.argmax(data, axis=None), data.shape)
y_min_index = np.unravel_index(np.argmin(data, axis=None), data.shape)
This would return tuples for the indices, if you just want to get the y part of the index, just access the second value of the tuples.

(Python) Equally Spacing Trues in a Series of Booleans

I am trying to make a custom PWM script to work with my Charlieplexed series of LEDs. However, I am struggling to make certain intensity values look smooth with no flashing. Different brightness values will have the LED on for a different amount of ticks. In order to make it feel smooth, I need to optimize the spacing of on and off ticks for the LEDs but I can quite figure out how to do so.
If you have a pattern that has x amount of booleans and n amount of them are true, how would you go about equally spacing out the trues as much as possible?
Here is an example of what I am looking for:
x = 10, n = 7
Desired result: 1,1,1,0,1,1,0,1,1,0
x = 10, n = 4
Desired result: 1,0,1,0,0,1,0,1,0,0

You can use the linspace function of numpy for this.
import numpy as np
x = 10
n = 7
# Gets the indices that should be true
def get_spaced_indices(array_length, num_trues):
array = np.arange(array_length) # Create array of "indices" 0 to x
indices = np.round(np.linspace(0, len(array) - 1, num_trues)).astype(int) # Evenly space the indices
values = array[indices]
return values
true_indices = get_spaced_indices(x, n)
print("true_indices:", true_indices)
result = [0] * x # Initialize your whole result to "false"
for i in range(x):
if i in true_indices: # Set evenly-spaced values to true in your result based on indices from get_spaced_indices
result[i] = 1
print("result:", result)

Using Mann Kendall in python with a lot of data

I have a set of 46 years worth of rainfall data. It's in the form of 46 numpy arrays each with a shape of 145, 192, so each year is a different array of maximum rainfall data at each lat and lon coordinate in the given model.
I need to create a global map of tau values by doing an M-K test (Mann-Kendall) for each coordinate over the 46 years.
I'm still learning python, so I've been having trouble finding a way to go through all the data in a simple way that doesn't involve me making 27840 new arrays for each coordinate.
So far I've looked into how to use scipy.stats.kendalltau and using the definition from here: https://github.com/mps9506/Mann-Kendall-Trend
EDIT:
To clarify and add a little more detail, I need to perform a test on for each coordinate and not just each file individually. For example, for the first M-K test, I would want my x=46 and I would want y=data1[0,0],data2[0,0],data3[0,0]...data46[0,0]. Then to repeat this process for every single coordinate in each array. In total the M-K test would be done 27840 times and leave me with 27840 tau values that I can then plot on a global map.
EDIT 2:
I'm now running into a different problem. Going off of the suggested code, I have the following:
for i in range(145):
for j in range(192):
out[i,j] = mk_test(yrmax[:,i,j],alpha=0.05)
print out
I used numpy.stack to stack all 46 arrays into a single array (yrmax) with shape: (46L, 145L, 192L) I've tested it out and it calculates p and tau correctly if I change the code from out[i,j] to just out. However, doing this messes up the for loop so it only takes the results from the last coordinate in stead of all of them. And if I leave the code as it is above, I get the error: TypeError: list indices must be integers, not tuple
My first guess was that it has to do with mk_test and how the information is supposed to be returned in the definition. So I've tried altering the code from the link above to change how the data is returned, but I keep getting errors relating back to tuples. So now I'm not sure where it's going wrong and how to fix it.
EDIT 3:
One more clarification I thought I should add. I've already modified the definition in the link so it returns only the two number values I want for creating maps, p and z.

I don't think this is as big an ask as you may imagine. From your description it sounds like you don't actually want the scipy kendalltau, but the function in the repository you posted. Here is a little example I set up:
from time import time
import numpy as np
from mk_test import mk_test
data = np.array([np.random.rand(145, 192) for _ in range(46)])
mk_res = np.empty((145, 192), dtype=object)
start = time()
for i in range(145):
for j in range(192):
out[i, j] = mk_test(data[:, i, j], alpha=0.05)
print(f'Elapsed Time: {time() - start} s')
Elapsed Time: 35.21990394592285 s
My system is a MacBook Pro 2.7 GHz Intel Core I7 with 16 GB Ram so nothing special.
Each entry in the mk_res array (shape 145, 192) corresponds to one of your coordinate points and contains an entry like so:
array(['no trend', 'False', '0.894546014835', '0.132554125342'], dtype='<U14')
One thing that might be useful would be to modify the code in mk_test.py to return all numerical values. So instead of 'no trend'/'positive'/'negative' you could return 0/1/-1, and 1/0 for True/False and then you wouldn't have to worry about the whole object array type. I don't know what kind of analysis you might want to do downstream but I imagine that would preemptively circumvent any headaches.

Thanks to the answers provided and some work I was able to work out a solution that I'll provide here for anyone else that needs to use the Mann-Kendall test for data analysis.
The first thing I needed to do was flatten the original array I had into a 1D array. I know there is probably an easier way to go about doing this, but I ultimately used the following code based on code Grr suggested using.
`x = 46
out1 = np.empty(x)
out = np.empty((0))
for i in range(146):
for j in range(193):
out1 = yrmax[:,i,j]
out = np.append(out, out1, axis=0) `
Then I reshaped the resulting array (out) as follows:
out2 = np.reshape(out,(27840,46))
I did this so my data would be in a format compatible with scipy.stats.kendalltau 27840 is the total number of values I have at every coordinate that will be on my map (i.e. it's just 145*192) and the 46 is the number of years the data spans.
I then used the following loop I modified from Grr's code to find Kendall-tau and it's respective p-value at each latitude and longitude over the 46 year period.
`x = range(46)
y = np.zeros((0))
for j in range(27840):
b = sc.stats.kendalltau(x,out2[j,:])
y = np.append(y, b, axis=0)`
Finally, I reshaped the data one for time as shown:newdata = np.reshape(y,(145,192,2)) so the final array is in a suitable format to be used to create a global map of both tau and p-values.
Thanks everyone for the assistance!

Depending on your situation, it might just be easiest to make the arrays.
You won't really need them all in memory at once (not that it sounds like a terrible amount of data). Something like this only has to deal with one "copied out" coordinate trend at once:
SIZE = (145,192)
year_matrices = load_years() # list of one 145x192 arrays per year
result_matrix = numpy.zeros(SIZE)
for x in range(SIZE[0]):
for y in range(SIZE[1]):
coord_trend = map(lambda d: d[x][y], year_matrices)
result_matrix[x][y] = analyze_trend(coord_trend)
print result_matrix
Now, there are things like itertools.izip that could help you if you really want to avoid actually copying the data.
Here's a concrete example of how Python's "zip" might works with data like yours (although as if you'd used ndarray.flatten on each year):
year_arrays = [
['y0_coord0_val', 'y0_coord1_val', 'y0_coord2_val', 'y0_coord2_val'],
['y1_coord0_val', 'y1_coord1_val', 'y1_coord2_val', 'y1_coord2_val'],
['y2_coord0_val', 'y2_coord1_val', 'y2_coord2_val', 'y2_coord2_val'],
]
assert len(year_arrays) == 3
assert len(year_arrays[0]) == 4
coord_arrays = zip(*year_arrays) # i.e. `zip(year_arrays[0], year_arrays[1], year_arrays[2])`
# original data is essentially transposed
assert len(coord_arrays) == 4
assert len(coord_arrays[0]) == 3
assert coord_arrays[0] == ('y0_coord0_val', 'y1_coord0_val', 'y2_coord0_val', 'y3_coord0_val')
assert coord_arrays[1] == ('y0_coord1_val', 'y1_coord1_val', 'y2_coord1_val', 'y3_coord1_val')
assert coord_arrays[2] == ('y0_coord2_val', 'y1_coord2_val', 'y2_coord2_val', 'y3_coord2_val')
assert coord_arrays[3] == ('y0_coord2_val', 'y1_coord2_val', 'y2_coord2_val', 'y3_coord2_val')
flat_result = map(analyze_trend, coord_arrays)
The example above still copies the data (and all at once, rather than a coordinate at a time!) but hopefully shows what's going on.
Now, if you replace zip with itertools.izip and map with itertools.map then the copies needn't occur — itertools wraps the original arrays and keeps track of where it should be fetching values from internally.
There's a catch, though: to take advantage itertools you to access the data only sequentially (i.e. through iteration). In your case, it looks like the code at https://github.com/mps9506/Mann-Kendall-Trend/blob/master/mk_test.py might not be compatible with that. (I haven't reviewed the algorithm itself to see if it could be.)
Also please note that in the example I've glossed over the numpy ndarray stuff and just show flat coordinate arrays. It looks like numpy has some of it's own options for handling this instead of itertools, e.g. this answer says "Taking the transpose of an array does not make a copy". Your question was somewhat general, so I've tried to give some general tips as to ways one might deal with larger data in Python.

I ran into the same task and have managed to come up with a vectorized solution using numpy and scipy.
The formula are the same as in this page: https://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm.
The trickiest part is to work out the adjustment for the tied values. I modified the code as in this answer to compute the number of tied values for each record, in a vectorized manner.
Below are the 2 functions:
import copy
import numpy as np
from scipy.stats import norm
def countTies(x):
'''Count number of ties in rows of a 2D matrix
Args:
x (ndarray): 2d matrix.
Returns:
result (ndarray): 2d matrix with same shape as <x>. In each
row, the number of ties are inserted at (not really) arbitary
locations.
The locations of tie numbers in are not important, since
they will be subsequently put into a formula of sum(t*(t-1)*(2t+5)).
Inspired by: https://stackoverflow.com/a/24892274/2005415.
'''
if np.ndim(x) != 2:
raise Exception("<x> should be 2D.")
m, n = x.shape
pad0 = np.zeros([m, 1]).astype('int')
x = copy.deepcopy(x)
x.sort(axis=1)
diff = np.diff(x, axis=1)
cated = np.concatenate([pad0, np.where(diff==0, 1, 0), pad0], axis=1)
absdiff = np.abs(np.diff(cated, axis=1))
rows, cols = np.where(absdiff==1)
rows = rows.reshape(-1, 2)[:, 0]
cols = cols.reshape(-1, 2)
counts = np.diff(cols, axis=1)+1
result = np.zeros(x.shape).astype('int')
result[rows, cols[:,1]] = counts.flatten()
return result
def MannKendallTrend2D(data, tails=2, axis=0, verbose=True):
'''Vectorized Mann-Kendall tests on 2D matrix rows/columns
Args:
data (ndarray): 2d array with shape (m, n).
Keyword Args:
tails (int): 1 for 1-tail, 2 for 2-tail test.
axis (int): 0: test trend in each column. 1: test trend in each
row.
Returns:
z (ndarray): If <axis> = 0, 1d array with length <n>, standard scores
corresponding to data in each row in <x>.
If <axis> = 1, 1d array with length <m>, standard scores
corresponding to data in each column in <x>.
p (ndarray): p-values corresponding to <z>.
'''
if np.ndim(data) != 2:
raise Exception("<data> should be 2D.")
# alway put records in rows and do M-K test on each row
if axis == 0:
data = data.T
m, n = data.shape
mask = np.triu(np.ones([n, n])).astype('int')
mask = np.repeat(mask[None,...], m, axis=0)
s = np.sign(data[:,None,:]-data[:,:,None]).astype('int')
s = (s * mask).sum(axis=(1,2))
#--------------------Count ties--------------------
counts = countTies(data)
tt = counts * (counts - 1) * (2*counts + 5)
tt = tt.sum(axis=1)
#-----------------Sample Gaussian-----------------
var = (n * (n-1) * (2*n+5) - tt) / 18.
eps = 1e-8 # avoid dividing 0
z = (s - np.sign(s)) / (np.sqrt(var) + eps)
p = norm.cdf(z)
p = np.where(p>0.5, 1-p, p)
if tails==2:
p=p*2
return z, p
I assume your data come in the layout of (time, latitude, longitude), and you are examining the temporal trend for each lat/lon cell.
To simulate this task, I synthesized a sample data array of shape (50, 145, 192). The 50 time points are taken from Example 5.9 of the book Wilks 2011, Statistical methods in the atmospheric sciences. And then I simply duplicated the same time series 27840 times to make it (50, 145, 192).
Below is the computation:
x = np.array([0.44,1.18,2.69,2.08,3.66,1.72,2.82,0.72,1.46,1.30,1.35,0.54,\
2.74,1.13,2.50,1.72,2.27,2.82,1.98,2.44,2.53,2.00,1.12,2.13,1.36,\
4.9,2.94,1.75,1.69,1.88,1.31,1.76,2.17,2.38,1.16,1.39,1.36,\
1.03,1.11,1.35,1.44,1.84,1.69,3.,1.36,6.37,4.55,0.52,0.87,1.51])
# create a big cube with shape: (T, Y, X)
arr = np.zeros([len(x), 145, 192])
for i in range(arr.shape[1]):
for j in range(arr.shape[2]):
arr[:, i, j] = x
print(arr.shape)
# re-arrange into tabular layout: (Y*X, T)
arr = np.transpose(arr, [1, 2, 0])
arr = arr.reshape(-1, len(x))
print(arr.shape)
import time
t1 = time.time()
z, p = MannKendallTrend2D(arr, tails=2, axis=1)
p = p.reshape(145, 192)
t2 = time.time()
print('time =', t2-t1)
The p-value for that sample time series is 0.63341565, which I have validated against the pymannkendall module result. Since arr contains merely duplicated copies of x, the resultant p is a 2d array of size (145, 192), with all 0.63341565.
And it took me only 1.28 seconds to compute that.

Easiest way to select the subset of 2-d array by 2 corners

With an global distributed variable in the shape of (len(latitude), len(longitude)), I want to get an subset corresponding to the area of interest.
The specific area is defined by 2 corners(left-low latitude/longitude, right-top latitude/longitude). So, this array is what I have now:
VALUE is a 2-d array representing the global distribution
Lon is a 1-d array from (-180.0, 179.875) owning 2880 elements
Lat is a 1-d array from (-90, 89.875) owning 1440 elements.
llcrnrlat, urcrnrlat, llcrnrlon, urcrnrlon = 15, 50, 90, 150
Noticing that the llcrnrlat etc may not contain in the Lon or Lat, I can't use
VALUE_SELECT = VALUE[np.where(Lat ==llcrnrlat):np.where(Lat ==urcrnrlat),
np.where(Lon ==urcrnrlat):np.where(Lon ==urcrnrlat)]
So, my attempt is to loop the Lat and Lon aiming to find the indice for nearest value.
def find_nearest(array,value): ## This function was clipped from website
idx = (np.abs(array-value)).argmin()
return array[idx]
llcrnrlon,urcrnrlon = 90,150
llcrnrlat, urcrnrlat = 15, 50
nx_st = np.where(lon == (find_nearest(lon,llcrnrlon )))[0]
nx_en = np.where(lon == (find_nearest(lon,urcrnrlon )))[0]
ny_st = np.where(lat == (find_nearest(lat,llcrnrlat )))[0]
ny_en = np.where(lat == (find_nearest(lat,urcrnrlat )))[0]
lon_select,lat_select = lon[nx_st:nx_en+1], lat[ny_st:ny_en+1]
value_select =VALUE[ny_st:ny_en+1,nx_st:nx_en+1]
After execute these subroutine, there is an warning right here:
/Users/anaconda/lib/python2.7/site-packages/ipykernel/main.py:1: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
if name == 'main':
How to avoid this warning or potential error?
Is there any easier way to get an subset of 2-d array based on my case?

That find_nearest + np.where is a whole lot of computational work that is completely unneeded if the values are evenly distributed. Do you really understand what that code is doing? Read up on each of those functions. You're doing a subtraction on every value in an array, and then finding the index of the minimum of that offset. Then you look up the value at that minimum. Then with the value, you again look through every value in the array and test if it matches, creating a new array of True/False values. Then you sort through that array and find the index of what was True. The same index you already found with argmin()
Each lat/lon is split into 8 sections. So you just need to multiply by 8, right? Then make it an integer, which automatically applies a floor()
function lat_conv(y):
return int((y + 90) * 8)
function lon_conv(x):
return int((x + 180) * 8)
value_select = VALUES[lon_conv(start_lon):lon_conv(stop_lon),
lat_conv(start_lat):lat_conv(stop_lat)]

plot a huge amount of data points

I have encountered a strange problem: when I store a huge amount of data points from a nonlinear equation to 3 arrays (x, y ,and z) and then tried to plot them in a 2D graph (theta-phi plot, hence its 2D).
I tried to eliminate points needed to be plotted by sampling points from every 20 data points, since the z-data is approximately periodic. I picked those points with z value just above zero to make sure I picked one point for every period.
The problem arises when I tried to do the above. I got only a very limited number of points on the graph, approximately 152 points, regardless of how I changed my initial number of data points (as long as it surpassed a certain number of course).
I suspect that it might be some command I use wrongly or the capacity of array is smaller then I expected (seems unlikely), could anyone help me find out where is the problem?
def drawstaticplot(m,n, d_n, n_o):
counter=0
for i in range(0,m):
n=vector.rungekutta1(n, d_n)
d_n=vector.rungekutta2(n, d_n, i)
x1 = n[0]
y1 = n[1]
z1 = n[2]
if i%20==0:
xarray.append(x1)
yarray.append(y1)
zarray.append(z1)
for j in range(0,(m/20)-20):
if (((zarray[j]-n_o)>0) and ((zarray[j+1]-n_o)<0)):
counter= counter +1
print zarray[j]-n_o,counter
plotthetaphi(xarray[j],yarray[j],zarray[j])
def plotthetaphi(x,y,z):
phi= math.acos(z/math.sqrt(x**2+y**2+z**2))
theta = math.acos(x/math.sqrt(x**2 + y**2))
plot(theta, phi,'.',color='red')
Besides, I tried to apply the code in the following SO question to my code, I want a very similar result except that my data points are not randomly generated.

Shiuan,
I am still investigating your problem, how ever a few notes:
Instead of looping and appending to an array you could do:
select every nth element:
# inside IPython console:
[2]: a=np.arange(0,10)
In [3]: a[::2] # here we select every 2nd element.
Out[3]: array([0, 2, 4, 6, 8])
so instead of calcultating runga-kutta on all elements of m:
new_m = m[::20] # select every element of m.
now call your function like this:
def drawstaticplot(new_m,n, d_n, n_o):
n=vector.rungekutta1(n, d_n)
d_n=vector.rungekutta2(n, d_n, i)
x1 = n[0]
y1 = n[1]
z1 = n[2]
xarray.append(x1)
yarray.append(y1)
zarray.append(z1)
...
about appending, and iterating over large data sets:
append in general is slow, because it copies the whole array and then
stacks the new element. Instead, you already know the size of n, so you could do:
def drawstaticplot(new_m,n, d_n, n_o):
# create the storage based on n,
# notice i assumed that rungekutta, returns n the size of new_m,
# but you can change it.
x,y,z = np.zeros(n.shape[0]),np.zeros(n.shape[0]), np.zeros(n.shape[0])
for idx, itme in enumerate(new_m): # notice the function enumerate, make it your friend!
n=vector.rungekutta1(n, d_n)
d_n=vector.rungekutta2(n, d_n, ite,)
x1 = n[0]
y1 = n[1]
z1 = n[2]
#if i%20==0: # we don't need to check for the 20th element, m is already filtered...
xarray[idx] = n[0]
yarray[idx] = n[1]
zarray[idx] = n[2]
# is the second loop necessary?
if (((zarray[idx]-n_o)>0) and ((zarray[j+1]-n_o)<0)):
print zarray[idx]-n_o,counter
plotthetaphi(xarray[idx],yarray[idx],zarray[idx])

You can use the approach suggested here:
Efficiently create a density plot for high-density regions, points for sparse regions
e.g. histogram where you have too many points and points where the density is low.
Or also you can use rasterized flag for matplotlib, which speeds up matplotlib.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.