I'm trying to iterate through a two-dimensional array in Python and compare items in the array to ints, but I keep running into errors whenever I try. I'm using numpy and pandas.
My dataset is created as follows:
filename = "C:/Users/User/My Documents/JoeTest.csv"
datas = pandas.read_csv(filename)
dataset = datas.values
Then, I attempt to go through the data, grabbing certain elements of it.
def model_building(data):
    global blackKings
    flag = 0
    blackKings.append(data[0][1])
    for i in data:
        if data[i][39] == 1:
            if data[i][40] == 1:
                values.append(1)
            else:
                values.append(-1)
        else:
            if data[i][40] == 1:
                values.append(-1)
            else:
                values.append(1)
        for j in blackKings:
            if blackKings[j] != data[i][1]:
                flag = 1
        if flag == 1:
            blackKings.append(data[i][1])
            flag = 0
However, doing so leaves me with a ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I don't want to use either of these, as I'm looking to compare the actual value of that one specific instance. Is there another way around this problem?
You need to tell us something about this: dataset = datas.values
It's probably a 2d array, since it comes from loading a csv. But what shape and dtype? Maybe even show a sample of the array.
Is that the data argument in the function?
What are blackKings and values? You treat them like lists (with append).
for i in data:
    if data[i][39] == 1:
This doesn't make sense. With for i in data, if data is 2d, i is the first row, then the second row, etc. If you want i to be an index, you use something like
for i in range(data.shape[0]):
2d array indexing is normally done with data[i,39].
But in your case data[i][39] is probably an array.
Anytime you use an array in an if statement, you'll get this ValueError, because there are multiple values.
If i were a proper index, then data[i,39] would be a single value.
To illustrate:
In [41]: data=np.random.randint(0,4,(4,4))
In [42]: data
Out[42]:
array([[0, 3, 3, 2],
       [2, 1, 0, 2],
       [3, 2, 3, 1],
       [1, 3, 3, 3]])
In [43]: for i in data:
    ...:     print('i', i)
    ...:     print('data[i]', data[i].shape)
    ...:
i [0 3 3 2]        # 1st row
data[i] (4, 4)
i [2 1 0 2]        # 2nd row
data[i] (4, 4)
...
Here i is a 4 element array; using it as an index in data[i] actually produces a (4, 4) array; it isn't selecting one value, but rather 4 whole rows, one per element of i.
Instead you need to iterate in one of these ways:
In [46]: for row in data:
    ...:     if row[3]==1:
    ...:         print(row)
[3 2 3 1]
In [47]: for i in range(data.shape[0]):
    ...:     if data[i,3]==1:
    ...:         print(data[i])
[3 2 3 1]
To debug a problem like this you need to look at intermediate values, and especially their shapes. Don't just assume. Check!
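For example, a quick check along these lines (assuming dataset is the array loaded from the csv earlier) already answers most of the questions above:
print(type(dataset), dataset.shape, dataset.dtype)
print(dataset[:3])  # peek at the first few rows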
I'm going to attempt to rewrite your function:
def model_building(data):
    global blackKings
    blackKings.append(data[0, 1])

    # Your nested if statements were performing an xor.
    # This is a vectorized version of the same thing.
    values = np.logical_xor(*(data.T[[39, 40]] == 1)) * -2 + 1

    # not sure where `values` is defined. If you really wanted to
    # append to it, you can do
    # values = np.append(values, np.logical_xor(*(data.T[[39, 40]] == 1)) * -2 + 1)

    # Your blackKings / flag logic can be reduced
    mask = (blackKings[:, None] != data[:, 1]).all(1)
    blackKings = np.append(blackKings, data[:, 1][mask])
This may not be perfect because it is difficult to parse your logic considering you are missing some pieces. But hopefully you can adopt some of what I've included here and improve your code.
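To see what the vectorized xor line is doing, here is a small demonstration on a toy array, where columns 0 and 1 stand in for your columns 39 and 40:
import numpy as np
toy = np.array([[1, 1],
                [1, 0],
                [0, 1],
                [0, 0]])
# exactly one of the two columns equal to 1 -> -1, otherwise -> 1
print(np.logical_xor(*(toy.T[[0, 1]] == 1)) * -2 + 1)
# [ 1 -1 -1  1]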
Related
I'm trying to do a sorting algorithm and I need to do something like this:
arr = numpy.array([2,4,5,6])
#some function here or something
array([5,2,4,6])
#element "5" moved from position 3 to 1, and all of the others moved up one position
What I mean is I want to change an element's position (index) and move all the other elements up one position. Is this possible?
You could use numpy.roll() with a subset assignment:
arr = numpy.array([2,4,5,6])
arr[:3] = numpy.roll(arr[:3],1)
print(arr)
[5 2 4 6]
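If the element's position is not always known to be 2, the same idea generalizes: roll just the slice up to and including that position. A small sketch, where indx is assumed to be the element's current index:
indx = 2
arr = numpy.array([2,4,5,6])
arr[:indx+1] = numpy.roll(arr[:indx+1], 1)
print(arr)
[5 2 4 6]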
If you know the position/index of the element to be shifted, then you could do:
indx = 2
np.r_[arr[indx], np.delete(arr, indx)]
Out[]: array([5, 2, 4, 6])
You could do this in place, without first creating an intermediate array of n-1 elements and then creating another final array by concatenation. Instead, you could try this:
idx = 2
tmp = arr[idx]
arr[1:idx+1] = arr[0:idx]
arr[0] = tmp
So there is more than one way of doing this, and the choice depends on your algorithm's constraints.
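For instance, the in-place variant can be wrapped in a small helper (a sketch; the name move_to_front is only illustrative):
import numpy as np

def move_to_front(arr, idx):
    # shift arr[:idx] right by one and put arr[idx] at the front, in place
    tmp = arr[idx]
    arr[1:idx + 1] = arr[:idx]
    arr[0] = tmp

arr = np.array([2, 4, 5, 6])
move_to_front(arr, 2)
print(arr)
[5 2 4 6]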
I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of the values as well, so I can calculate the new value based on its neighbours. enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is a very ugly method IMO, and I think there has to be a better way to do this. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code picks every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If some index does not have any unmasked value close enough, it will just leave it as masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                  constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
    data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
    mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
The originally posted code has a couple of errors. First, it both reads and writes values of y_filtered in the loop, so the results at later indices are affected by the previous iterations; this could be fixed by reading from a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], so that the window is symmetric and always starts at zero or later.
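A minimal corrected version of that loop, reading from a copy and clamping the window (the sample data here just stands in for the question's y_filtered):
import numpy as np
import numpy.ma as ma

y_filtered = ma.array(np.arange(10, dtype=float),
                      mask=[0, 0, 0, 0, 1, 1, 0, 0, 0, 0])
y_source = y_filtered.copy()  # read from the copy, write to the original
for sl in ma.clump_masked(y_filtered):
    for i in range(sl.start, sl.stop):
        # symmetric 5-element window, clamped at the left edge
        y_filtered[i] = ma.average(y_source[max(i - 2, 0):i + 3])
print(y_filtered)  # the gaps at indices 4 and 5 are filled from their neighbours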
You could do this:
from itertools import chain
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
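Note that this still reads and writes y_filtered in place, so values filled early in a clump feed into the averages of later indices; if that is not what you want, read from a copy as described above.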
I want to make a function which takes an n-dimensional array, the axis and the column index, and returns the (n-1)-dimensional array left after removing all the other columns along that axis.
Here is the code I am using now:
a = np.arange(6).reshape((2, 3)) # the n-dimensional array
axisApplied = 1
colToKeep = 0
colsToDelete = np.delete(np.arange(a.shape[axisApplied]), colToKeep)
a = np.squeeze(np.delete(a, colsToDelete, axisApplied), axis=axisApplied)
print(a)
# [0 3]
Note that I have to manually calculate the n-1 indices (the complement of the specific column index) to use np.delete(), and I am wondering whether there is a more convenient way to achieve my goal, e.g. specifying which column to keep directly.
Thank you for reading; any suggestions are welcome.
In [1]: arr = np.arange(6).reshape(2,3)
In [2]: arr
Out[2]:
array([[0, 1, 2],
       [3, 4, 5]])
Simple indexing:
In [3]: arr[:,0]
Out[3]: array([0, 3])
Or if you need to use the general axis parameter, try take:
In [4]: np.take(arr,0,axis=1)
Out[4]: array([0, 3])
Picking one element, or a list of elements, along an axis is a lot easier than deleting some. Look at the code for np.delete.
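If you do want to keep a general axis parameter without np.take, one common pattern (a sketch, not the only way; keep_column is just an illustrative name) is to build an index tuple that slices every axis except the chosen one:
import numpy as np

def keep_column(arr, axis, col):
    # select index `col` along `axis`; the indexed axis is dropped automatically
    idx = [slice(None)] * arr.ndim
    idx[axis] = col
    return arr[tuple(idx)]

arr = np.arange(6).reshape(2, 3)
print(keep_column(arr, 1, 0))
[0 3]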
I have a question regarding python and selecting elements within a range.
If I have an n x m matrix with n rows and m columns, I have a defined range for each column (so I have m pairs of min and max values).
Now I want to select those rows, where all values are within the range.
Looking at the following example:
input = matrix([[1, 2], [3, 4],[5,6],[1,8]])
boundaries = matrix([[2,1],[8,5]])
#Note:
#col1min = 2
#col1max = 8
#col2min = 1
#col2max = 5
print(input)
desired_result = matrix([[3, 4]])
print(desired_result)
Here, 3 rows were discarded because they contained values beyond the boundaries.
While I was able to get values within one range for a given array, I did not manage to solve this problem efficiently.
Thank you for your help.
I believe there is a more elegant solution, but I came up with this:
def foo(data, boundaries):
    zipped_bounds = list(zip(*boundaries))
    output = []
    for item in data:
        for index, bound in enumerate(zipped_bounds):
            if not (bound[0] <= item[index] <= bound[1]):
                break
        else:
            output.append(item)
    return output
data = [[1, 2], [3, 4], [5, 6], [1, 8]]
boundaries = [[2, 1], [8, 5]]
foo(data, boundaries)
Output:
[[3, 4]]
I know there is no checking or exception raising if the array sizes don't match; I leave that to the OP to implement.
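Since the question asks about efficiency, a vectorized NumPy sketch of the same test may be worth considering (it assumes the question's layout: the first bounds row holds the per-column minimums, the second the maximums):
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6], [1, 8]])
boundaries = np.array([[2, 1], [8, 5]])

# a row survives only if every column lies within its [min, max] range
mask = ((data >= boundaries[0]) & (data <= boundaries[1])).all(axis=1)
print(data[mask])
[[3 4]]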
Your example data syntax, matrix([[],..]), is not correct, so it needs to be restructured like this:
matrix = [[1, 2], [3, 4],[5,6],[1,8]]
bounds = [[2,1],[8,5]]
I'm not sure exactly what you mean by "efficient", but this solution is readable, computationally efficient, and modular:
# Test columns in row against column bounds or first bounds
def row_in_bounds(row, bounds):
    for ci, colVal in enumerate(row):
        bi = ci if len(bounds[0]) >= ci + 1 else 0
        if not bounds[1][bi] >= colVal >= bounds[0][bi]:
            return False
    return True
# Use a list comprehension to apply test to n rows
print([r for r in matrix if row_in_bounds(r, bounds)])
[[3, 4]]
First we create a reusable test function for rows that accepts a list of bounds lists (tuples are probably more appropriate, but I stuck with lists as per your specification).
Then we apply the test to your matrix of n rows with a list comprehension. If a column index exceeds the number of bounds columns provided, the first set of bounds is used as a fallback.
Keeping the row iterator out of the row parser function allows you to do things like get min/max from the filtered elements as required. This way you will not need to define a new function for every manipulation of the data required.
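For example, reusing matrix, bounds and row_in_bounds from above to compute column maximums of the surviving rows (just as an illustration):
filtered = [r for r in matrix if row_in_bounds(r, bounds)]
if filtered:
    print([max(col) for col in zip(*filtered)])
[3, 4]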
I have a 2D numpy array that I need to take the max of along a specific axis. I then need to later know which indexes were selected for this operation as a mask for another operation which is only done on those same indexes but on another array of the same shape.
Right now I'm doing it using 2d array indexing, but it's slow and kind of convoluted, particularly the mgrid hack to generate the row indexes. It's just [0, 1] for this example, but I need the robustness to work with arbitrary shapes.
a = np.array([[0,0,5],[0,0,5]])
b = np.array([[1,1,1],[1,1,1]])
columnIndexes = np.argmax(a,axis=1)
rowIndexes = np.mgrid[0:a.shape[0], 0:columnIndexes.size - 1][0].flatten()
b[rowIndexes,columnIndexes] = b[rowIndexes,columnIndexes]+1
b should now be array([[1,1,2],[1,1,2]]), since the operation was performed on b only at the indexes of the max along the columns of a.
Anyone know a better way? Preferably using just boolean masking arrays so that I can port this code to run on a GPU without too much hassle. Thanks!
I will suggest an answer but with slightly different data.
c = np.array([[0,1,1],[2,1,0]]) # note that this data has dupes for max in row 1
d = np.array([[0,10,10],[20,10,0]]) # data to be changed
c_argmax = np.argmax(c,axis=1)[:,np.newaxis]
b_map1 = c_argmax == np.arange(c.shape[1])
# now use the bool map as you described
d[b_map1] += 1
d
[out]
array([[ 0, 11, 10],
       [21, 10,  0]])
Note that I created an original with a duplicate of the largest number. The above works with argmax as you requested, but you might have wanted to increment all max values, as in:
c_max = np.max(c,axis=1)[:,np.newaxis]
b_map2 = c_max == c
d[b_map2] += 1
d
[out]
array([[ 0, 12, 11],
       [22, 10,  0]])