Not plotting 'zero' in matplotlib, or changing zero to None [Python]

I have the code below and I would like to convert all zeros in the data to None (as I do not want to plot that data in matplotlib). However, the code is not working and 0.0 is still being printed:
sd_rel_track_sum = np.sum(sd_rel_track, axis=1)
for i in sd_rel_track_sum:
    print i
    if i == 0:
        i = None
return sd_rel_track_sum
Can anyone think of a solution to this? Or just an answer for how I can convert all 0 to None, or for how to just not plot the zero values in matplotlib.

Why not use numpy for this?
>>> values = np.array([3, 5, 0, 3, 5, 1, 4, 0, 9], dtype=np.double)
>>> values[ values==0 ] = np.nan
>>> values
array([ 3., 5., nan, 3., 5., 1., 4., nan, 9.])
It should be noted that values cannot be an integer-typed array, since NaN can only be stored in a floating-point array.
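If your data does start out as an integer array, casting it first keeps the approach workable; a minimal sketch:
>>> int_values = np.array([3, 5, 0, 3, 5, 1, 4, 0, 9])
>>> values = int_values.astype(np.double)  # NaN needs a floating-point dtype
>>> values[values == 0] = np.nan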

Using numpy is of course the better choice, unless you have any good reasons not to use it ;) For that, see Daniel's answer.
If you want to have a bare Python solution, you might do something like this:
values = [3, 5, 0, 3, 5, 1, 4, 0, 9]
def zero_to_nan(values):
    """Replace every 0 with 'nan' and return a copy."""
    return [float('nan') if x == 0 else x for x in values]
print(zero_to_nan(values))
gives you:
[3, 5, nan, 3, 5, 1, 4, nan, 9]
Matplotlib won't plot nan (not a number) values.
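As a quick illustration of that last point, a minimal sketch using the zero_to_nan helper above (the plotted line simply has gaps where the NaNs are):
import matplotlib.pyplot as plt

values = [3, 5, 0, 3, 5, 1, 4, 0, 9]
plt.plot(zero_to_nan(values), marker='o')  # gaps appear at the nan positions
plt.show()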

Related

Python: Mapping values to other values without gap

I have the following question. Is there some kind of method in numpy or scipy which I can use to get from a given unsorted array like this
a = np.array([0,0,1,1,4,4,4,4,5,1891,7]) #could be any number here
to something where the numbers are interpolated/mapped, there is no gap between the values, and they are in the same order as before?
[0,0,1,1,2,2,2,2,3,5,4]
EDIT
Is it furthermore possible to swap/shuffle the numbers after the mapping, so that
[0,0,1,1,2,2,2,2,3,5,4]
become something like:
[0,0,3,3,5,5,5,5,4,1,2]
Edit: I'm not sure what the etiquette is here (should this be a separate answer?), but this is actually directly obtainable from np.unique.
>>> u, indices = np.unique(a, return_inverse=True)
>>> indices
array([0, 0, 1, 1, 2, 2, 2, 2, 3, 5, 4])
Original answer: This isn't too hard to do in plain python by building a dictionary of what index each value of the array would map to:
x = np.sort(np.unique(a))
index_dict = {j: i for i, j in enumerate(x)}
[index_dict[i] for i in a]
Seems you need to rank (dense) your array, in which case use scipy.stats.rankdata:
from scipy.stats import rankdata
rankdata(a, 'dense')-1
# array([ 0., 0., 1., 1., 2., 2., 2., 2., 3., 5., 4.])
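For the EDIT part of the question, one possibility is to push the dense ranks through a random permutation, so equal values still share a label; a sketch, assuming any consistent random relabelling is acceptable:
>>> u, dense = np.unique(a, return_inverse=True)
>>> perm = np.random.permutation(len(u))  # random new label for each dense rank
>>> perm[dense]  # one possible outcome:
array([0, 0, 3, 3, 5, 5, 5, 5, 4, 1, 2])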

SciPy sparse matrix (COO,CSR): Clear row

For creating a scipy sparse matrix, I have an array of row and column indices I and J along with a data array V. I use those to construct a matrix in COO format and then convert it to CSR:
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()
I have a set of row indices for which the only entry should be a 1.0 on the diagonal. So far, I go through I, find all indices that need wiping, and do just that:
def find(lst, a):
    # From <http://stackoverflow.com/a/16685428/353337>
    return [i for i, x in enumerate(lst) if x in a]

# wipe_rows = [1, 55, 32, ...]  # something something
indices = find(I, wipe_rows)  # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()

# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))

# construct matrix via coo
That works alright, but find tends to take a while.
Any hints on how to speed this up? (Perhaps wiping the rows in COO or CSR format is a better idea.)
If you intend to clear multiple rows at once, this
def _wipe_rows_csr(matrix, rows):
    assert isinstance(matrix, sparse.csr_matrix)
    # Zero out the data of each wiped row
    for i in rows:
        matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0
    # Set the diagonal
    d = matrix.diagonal()
    d[rows] = 1.0
    matrix.setdiag(d)
    return
is by far the fastest method. It doesn't actually remove the rows; it sets all their entries to zero and then fiddles with the diagonal.
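For illustration, a small usage sketch (assuming numpy is imported as np and scipy.sparse as sparse):
A = sparse.csr_matrix(np.arange(16.).reshape(4, 4))
_wipe_rows_csr(A, [1, 2])  # rows 1 and 2 become pure diagonal rows
print(A.toarray())
# [[  0.   1.   2.   3.]
#  [  0.   1.   0.   0.]
#  [  0.   0.   1.   0.]
#  [ 12.  13.  14.  15.]]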
If the entries actually have to be removed, some array manipulation is needed. This can be quite costly, but if speed is no issue: this
def _wipe_row_csr(A, i):
    '''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
    '''
    assert isinstance(A, sparse.csr_matrix)
    n = A.indptr[i+1] - A.indptr[i]
    assert n > 0

    A.data[A.indptr[i]+1:-n+1] = A.data[A.indptr[i+1]:]
    A.data[A.indptr[i]] = 1.0
    A.data = A.data[:-n+1]

    A.indices[A.indptr[i]+1:-n+1] = A.indices[A.indptr[i+1]:]
    A.indices[A.indptr[i]] = i
    A.indices = A.indices[:-n+1]

    A.indptr[i+1:] -= n-1
    return
replaces a given row i of the matrix by the entry 1.0 on the diagonal.
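Again purely for illustration, a small sketch of calling it (assuming the same imports; note the function requires row i to have at least one stored entry):
A = sparse.csr_matrix(np.arange(1., 17.).reshape(4, 4))
_wipe_row_csr(A, 2)  # row 2 now holds only a 1.0 at (2, 2); A.nnz drops from 16 to 13
print(A.toarray())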
np.in1d should be a faster way of finding the indices:
In [322]: I # from a np.arange(12).reshape(4,3) matrix
Out[322]: array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int32)
In [323]: indices=[i for i, x in enumerate(I) if x in [1,2]]
In [324]: indices
Out[324]: [2, 3, 4, 5, 6, 7]
In [325]: ind1=np.in1d(I,[1,2])
In [326]: ind1
Out[326]:
array([False, False,  True,  True,  True,  True,  True,  True, False,
       False, False], dtype=bool)
In [327]: np.where(ind1) # same as indices
Out[327]: (array([2, 3, 4, 5, 6, 7], dtype=int32),)
In [328]: I[~ind1] # same as the delete
Out[328]: array([0, 0, 3, 3, 3], dtype=int32)
Direct manipulation of the coo inputs like this is often a good way. But another is to take advantage of the csr math abilities: you should be able to construct a diagonal matrix that zeros out the correct rows, and then add the ones back in.
Here's what I have in mind:
In [357]: A=np.arange(16).reshape(4,4)
In [358]: M=sparse.coo_matrix(A)
In [359]: M.A
Out[359]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [360]: d1=sparse.diags([(1,0,0,1)],[0],(4,4))
In [361]: d2=sparse.diags([(0,1,1,0)],[0],(4,4))
In [362]: (d1*M+d2).A
Out[362]:
array([[  0.,   1.,   2.,   3.],
       [  0.,   1.,   0.,   0.],
       [  0.,   0.,   1.,   0.],
       [ 12.,  13.,  14.,  15.]])
In [376]: x=np.ones((4,),bool);x[[1,2]]=False
In [378]: d1=sparse.diags([x],[0],(4,4),dtype=int)
In [379]: d2=sparse.diags([~x],[0],(4,4),dtype=int)
Doing this with lil format looks easy:
In [592]: wipe=[1,2]            # the rows to clear
In [593]: Ml=M.tolil()
In [594]: Ml.data[wipe]=[[1]]*len(wipe)
In [595]: Ml.rows[wipe]=[[i] for i in wipe]
In [596]: Ml.A
Out[596]:
array([[ 0,  1,  2,  3],
       [ 0,  1,  0,  0],
       [ 0,  0,  1,  0],
       [12, 13, 14, 15]], dtype=int32)
It's sort of what you are doing with the csr format, but it's easy to replace each row's list with the appropriate [1] and [i] lists. Conversion times (tolil etc.) can hurt run times, though.

Ranking a List of Numbers

I have a list:
somelist = [500, 600, 200, 1000]
I want to generate the rank order of that list:
rankorderofsomelist = [3, 2, 4, 1]
There are some complex solutions, but does anyone have any simple methods?
Since you've tagged this question scipy, you could use scipy.stats.rankdata:
>>> from scipy.stats import rankdata
>>> rankdata(somelist)
array([ 2., 3., 1., 4.])
>>> len(somelist) - rankdata(somelist)
array([ 2., 1., 3., 0.])
>>> len(somelist) - rankdata(somelist) + 1
array([ 3., 2., 4., 1.])
The real advantage is that you can specify how you want the corner cases to be treated:
>>> rankdata([0,1,1,2])
array([ 1. , 2.5, 2.5, 4. ])
>>> rankdata([0,1,1,2], method='min')
array([ 1, 2, 2, 4])
>>> rankdata([0,1,1,2], method='dense')
array([ 1, 2, 2, 3])
Simplest I can think of:
rankorder = sorted(range(len(thelist)), key=thelist.__getitem__)
This will, of course, produce [2, 0, 1, 3], because Python indexing is always zero-based; if for some absolutely weird reason you need to add one to each index you can of course easily do so:
rankorder_weird = [1+x for x in rankorder]
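If what you actually want are the question's descending, one-based ranks rather than the argsort, you can invert the permutation; a small sketch:
thelist = [500, 600, 200, 1000]
rankorder = sorted(range(len(thelist)), key=thelist.__getitem__)  # [2, 0, 1, 3]
ranks = [0] * len(thelist)
for rank, index in enumerate(rankorder):
    ranks[index] = len(thelist) - rank  # flip to descending, one-based
print(ranks)  # [3, 2, 4, 1]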
Try this one-liner:
rankorderofsomelist = [sorted(somelist).index(x) for x in somelist]
Note that for a list with multiple entries of the same value, all of those entries receive the same rank (the position of the value's first occurrence in the sorted list). Also note that Pythonic sorting is ascending (smallest to largest) and zero-based, so you may have to apply a final pass over the list to increment the ranks, reverse them, etc.
You can include that pass in the one-liner. To yield your desired result, just use:
rankorderofsomelist = [len(somelist)-(sorted(somelist).index(x)) for x in somelist]

How to calculate the mean of a stack of arrays?

My stack is something like this:
array([[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]]])
I want this result:
array([[ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5]])
I have updated my question; I think it's clearer now.
Well, first, you don't have a stack of 2D arrays, you have three separate variables.
Fortunately, most functions in NumPy take an array_like argument. And the tuple (a, b, c) is "array-like" enough—it'll be converted into the 3D array that you should have had in the first place.
Anyway, the obvious function to take the mean is np.mean. As the docs say:
The average is taken over the flattened array by default, otherwise over the specified axis.
So just specify the axis you want—the newly-created axis 0.
np.mean((a,b,c), axis=0)
In your updated question, you now have a single 2x3x3 array, a, instead of three 2x2 arrays, a, b, and c, and you want the mean across the first axis (the one with dimension 2). This is the same thing, but slightly easier:
np.mean(a, axis=0)
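Applied to the stack from the question, this gives exactly the requested result:
>>> stack = np.array([[[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9]],
...                   [[2, 2, 2],
...                    [2, 2, 2],
...                    [2, 2, 2]]])
>>> np.mean(stack, axis=0)
array([[ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5]])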
Of course the mean of 4, 7, and 3 is 4.666666666666667, not 4. In your updated question, that seems to be what you want; in your original question… I'm not sure if you wanted to truncate or round, or if you wanted the median or something else rather than the mean, or anything else, but those are all easy (add dtype=np.int64 to the call, call .round() on the result, call np.median instead of np.mean, etc.).
>>> a = np.array([[1,2],[3,4]])
>>> b = np.array([[1,5],[6,7]])
>>> c = np.array([[1,8],[8,3]])
>>> np.mean((a,b,c), axis=0)
array([[ 1.        ,  5.        ],
       [ 5.66666667,  4.66666667]])
As per your output it seems you are looking for median rather than mean.
>>> np.median((a,b,c), axis=0)
array([[ 1.,  5.],
       [ 6.,  4.]])

Does KNeighborsClassifier compare lists with different sizes?

I have to use scikit-learn's KNeighborsClassifier to compare time series using a user-defined function in Python.
knn = KNeighborsClassifier(n_neighbors=1,weights='distance',metric='pyfunc',func=dtw_dist)
The problem is that KNeighborsClassifier doesn't seem to support my training data: the time series are lists of different sizes. KNeighborsClassifier gives me this error message when I try to use the fit method (knn.fit(X, Y)):
ValueError: data type not understood
It seems KNeighborsClassifier only supports same-size training sets (only time series with the same length would be accepted, but that is not my case), but my teacher told me to use KNeighborsClassifier. So I don't know what to do...
Any ideas?
Two (or one...) options as far as I can tell:

1. Precompute the distances (not directly supported by KNeighborsClassifier, it seems; other clustering algorithms do, e.g. Spectral Clustering).
2. Convert your data to be square using NaNs, and handle these accordingly in your custom distance function.
'Square' your data using NaNs
So, option 2 it is.
Say we have the following data, where every row represents a time series:
import numpy as np

series = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1],
    [1, 2, 3, 4, 5, 6, 7, 8]
]
We simply make the data square by adding nans:
def make_square(jagged):
    # Careful: this mutates the series list of lists
    max_cols = max(map(len, jagged))
    for row in jagged:
        row.extend([None] * (max_cols - len(row)))
    return np.array(jagged, dtype=np.float)

make_square(series)
array([[  1.,   2.,   3.,   4.,  nan,  nan,  nan,  nan],
       [  1.,   2.,   3.,  nan,  nan,  nan,  nan,  nan],
       [  1.,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.]])
Now the data 'fits' into the algorithm. You just have to adapt your distance function to account for the NaNs.
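How exactly to treat the NaNs depends on your metric; as a purely hypothetical illustration (not DTW), a distance that only compares positions where both series have real values could look like this:
def nan_aware_dist(row1, row2):
    # Hypothetical helper: ignore positions where either series is NaN.
    mask = ~(np.isnan(row1) | np.isnan(row2))
    return np.linalg.norm(row1[mask] - row2[mask])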
Precompute and use a cache function
Oh we can probably do option 1 too (assuming you have N time series):
1. Precompute the distances into an (N, N) distance matrix D.
2. Create an (N, 1) data matrix that is just the range [0, N) (i.e., the index of each series in the distance matrix).
3. Create a distance function wrapper.
4. Use this wrapper as the distance function.
The wrapper function:
def wrapper(row1, row2):
    # Might have to fiddle a bit here, but I think this retrieves the indices.
    i1, i2 = row1[0], row2[0]
    return D[i1, i2]
OK, I hope it's clear.
Complete example
#!/usr/bin/env python2.7
# encoding: utf-8

from mlpy import dtw_std  # I don't know if you are using this one: it doesn't matter.
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Example data
series = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3],
    [1],
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 5, 6, 7, 8],
    [1, 2, 4, 5, 6, 7, 8],
]

# I don't know... these seemed to make sense to me!
y = np.array([0, 0, 0, 0, 1, 2, 2, 2])

# Compute the distance matrix
N = len(series)
D = np.zeros((N, N))
for i in range(N):
    for j in range(i + 1, N):
        D[i, j] = dtw_std(series[i], series[j])
        D[j, i] = D[i, j]
print D

# Create the fake data matrix: just the indices of the time series
X = np.arange(N).reshape((N, 1))

# Create the wrapper function that returns the correct distance
def wrapper(row1, row2):
    # Cast to int to prevent warnings: sklearn converts our integer indices to floats.
    i1, i2 = int(row1[0]), int(row2[0])
    return D[i1, i2]

# Only the ball_tree algorithm seems to accept a custom function
knn = KNeighborsClassifier(weights='distance', algorithm='ball_tree', metric='pyfunc', func=wrapper)
knn.fit(X, y)
print knn.kneighbors(X[0])
# (array([[ 0.,  0.,  0.,  1.,  6.]]), array([[1, 2, 0, 3, 4]]))
print knn.predict(X)
# [0 0 0 0 1 2 2 2]
