I'm trying to create an array based on values from another data frame in Python. I want it to fill the array as follows:
If x >= 3 in the dataframe, then it puts a 0 in the array.
If x < 3 in the dataframe, then it puts a 1 in the array.
If x == 0 in the dataframe, then it puts a 0 in the array.
Below is the code I have so far, but the result is coming out as just [0]:
array = np.array([])
for x in df["disc"]:
    for y in array:
        if x >= 3:
            y = 0
        elif x < 3:
            y = 1
        else:
            y = 0
Any help would be much appreciated thank you.
When working with numpy arrays, it is more efficient if you can avoid using explicit loops in Python at all. (The actual looping takes place inside compiled C code.)
disc = df["disc"]
# make an array containing 0 where disc >= 3, elsewhere 1
array = np.where(disc >= 3, 0, 1)
# now set it equal to 0 in any places where disc == 0
array[disc == 0] = 0
It could also be done in a single statement (other than the initial assignment of disc) using:
array = np.where((disc >= 3) | (disc == 0), 0, 1)
Here the | does an element-by-element "or" test on the boolean arrays. (It has higher precedence than comparison operators, so the parentheses around the comparisons are needed.)
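For example, with a small made-up column, just to show the expected output:

import numpy as np
import pandas as pd

df = pd.DataFrame({"disc": [0, 1, 2, 3, 5, 0]})   # sample values for illustration
disc = df["disc"]

array = np.where((disc >= 3) | (disc == 0), 0, 1)
print(array)   # [0 1 1 0 0 0]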
This is a simple problem, and there are many ways to solve it. I think the easiest is to use a list and append values according to the conditions:
array = []
for x in df["disc"]:
    if x >= 3:
        array.append(0)
    elif x == 0:
        # handle the x == 0 rule before the general x < 3 case, so zeros get a 0
        array.append(0)
    else:
        array.append(1)
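Since the question ultimately wants a NumPy array, one conversion after the loop is enough. A minimal self-contained sketch (the sample column values here are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({"disc": [0, 4, 2, 3, 1]})   # made-up values for illustration

values = []
for x in df["disc"]:
    values.append(0 if x >= 3 or x == 0 else 1)

array = np.asarray(values)   # convert the Python list to a NumPy array once, at the end
print(array)                 # [0 0 1 0 1]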
Your code doesn't seem to be doing anything to the array, as you are trying to modify the variable y rather than the array itself; y doesn't reference the array, it just holds the values it finds. The inner loop also doesn't do anything, because the array is empty - it's looping over 0 elements. What you need, rather than another for loop, is to simply append to the array.
With a list, you would use the .append() method to add an element, however as you appear to be using numpy, you'd want to use the append(arr, values) function it provides, like so:
array = np.array([])
for x in df["disc"]:
    if x >= 3:
        array = np.append(array, 0)
    elif x < 3:
        array = np.append(array, 1)
    else:
        array = np.append(array, 0)
I'll also note that these conditions can be simplified to combine the two branches which append a 0. Namely, if x < 3 and x is not 0, then append a 1; otherwise append a 0. Thus, the code can be rewritten as follows.
array = np.array([])
for x in df["disc"]:
    if x < 3 and x != 0:
        array = np.append(array, 1)
    else:
        array = np.append(array, 0)
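One thing to keep in mind: np.append copies the entire array on every call, so this loop is quadratic in the number of rows. For long columns it is usually faster to build the result in one pass, e.g. with a comprehension that applies the same simplified condition:

array = np.array([1 if x < 3 and x != 0 else 0 for x in df["disc"]])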
I need to determine whether all values of a certain range are used in the array, and at the same time that there are no values in the array that fall outside the range. For example, the range is [1,5] and the array is [1,2,3,4,5] - everything is correct. Or the range is [1,5] and the array is [1,2,1,2,3,3,4,5] - also correct.
But if the range is [1,5] and the array is [0,2,2,3,3,4,5], it is already incorrect, since 0 is not in the range and a 1 is missing from the array.
I did this, but it's slow with big values and terrible:
def func(segment, arr):
    segment_arr = []
    for i in range(segment[0], segment[1] + 1):
        segment_arr.append(i)
    arr_corr = True
    if min(arr) != min(segment_arr) or max(arr) != max(segment_arr):
        arr_corr = False
    else:
        for i in range(len(arr)):
            if arr[i] in segment_arr:
                for a in range(len(segment_arr)):
                    if segment_arr[a] in arr:
                        continue
                    else:
                        arr_corr = False
            else:
                arr_corr = False
    return arr_corr
To test if every member of a list/array/range x is in another list/array y:
all(e in y for e in x)
To test if only members of a list/array/range x are in another list/array y:
all(e in x for e in y)
The speed of these operations depends on the container type. all short-circuits, so it will be faster when the test fails early. in is very fast on a set but can be slow on lists. Creating a set takes time of its own, which can eat into the gains from doing in on a set. If you are working with numpy arrays, numpy's set routines (np.intersect1d, np.unique) are usually faster.
This should do what you're asking. If it is too slow, you will need to optimize the types. At that point, you will probably need to edit the question to give some clear examples of when it is too slow and what your constraints are:
def func(segment, arr):
    # expand the [start, end] pair into the actual range of allowed values
    rng = range(segment[0], segment[1] + 1)
    return all(e in rng for e in arr) and all(e in arr for e in rng)
def func(segment, arr):
    return set(range(segment[0], segment[1] + 1)) == set(arr)
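If arr is already a numpy array, the same two-way check can be sketched with the numpy set routines mentioned above, using np.unique and np.array_equal:

import numpy as np

def func(segment, arr):
    # the unique sorted values of arr must be exactly the values of the range
    return np.array_equal(np.unique(arr), np.arange(segment[0], segment[1] + 1))

print(func([1, 5], [1, 2, 1, 2, 3, 3, 4, 5]))   # True
print(func([1, 5], [0, 2, 2, 3, 3, 4, 5]))      # False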
Build a set from the values that fall inside the allowed range.
If the resulting set has "full" length and nothing was filtered out as out-of-range, everything is correct.
left, right = [1, 5]
values = [0, 2, 3, 4, 5]
in_range = {i for i in values if left <= i <= right}
values_are_correct = (
    len(in_range) == (right - left) + 1      # every value of the range appears
    and len(in_range) == len(set(values))    # and no value falls outside the range
)
I am having problems with while loops in python right now:
while j < len(firstx):
    trainingset[j][0] = firstx[j]
    trainingset[j][1] = firsty[j]
    trainingset[j][2] = 1  # top
    print(trainingset)
    j += 1
print("j is " + str(j))
i = 0
while i < len(firsty):
    trainingset[j + i][0] = twox[i]
    trainingset[j + i][1] = twoy[i]
    trainingset[j + i][2] = 0  # bottom
    i += 1
Here trainingset = [[0,0,0]]*points*2, where points is a number, and firstx, firsty, twox, and twoy are all numpy arrays.
I want the training set to have 2*points entries, going from [firstx[0], firsty[0], 1] all the way through [twox[points-1], twoy[points-1], 0].
After some debugging, I am realizing that for each iteration, **every single value** in the training set is being changed, so that when j = 0 all the training set values are replaced with firstx[0], firsty[0], and 1.
What am I doing wrong?
In this case, I would recommend a for loop instead of a while loop; for loops are great for when you want the index to increment and you know what the last increment value should be, which is true here.
I've had to make some assumptions about the shape of your arrays based on your code. I'm assuming that:
firstx, firsty, twox, and twoy are 1D NumPy arrays with either shape (length,) or (length, 1).
trainingset is a 2D NumPy array with at least len(firstx) + len(firsty) rows and at least 3 columns.
j starts at 0 before the while loop begins.
Given these assumptions, here's some code that gives you the output you want:
len_firstx = len(firstx)

# Replace each row with index j with the desired values
for j in range(len(firstx)):
    trainingset[j][0:3] = [firstx[j], firsty[j], 1]

# Because i starts at 0, len_firstx needs to be added to the trainingset row index
for i in range(len(firsty)):
    trainingset[i + len_firstx][0:3] = [twox[i], twoy[i], 0]
Let me know if you have any questions.
EDIT: Alright, looks like the above doesn't work correctly on rare occasions, not sure why, so if it's being fickle, you can change trainingset[j][0:3] and trainingset[i + len_firstx][0:3] to trainingset[j, 0:3] and trainingset[i + len_firstx, 0:3].
I think it has something to do with the shape of the trainingset array, but I'm not quite sure. (One thing to check: if trainingset is still the plain Python list [[0,0,0]]*points*2 from the question, that expression repeats the same inner list, so writing to one row writes to every row; converting it to a NumPy array first avoids that entirely.)
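As a quick standalone illustration of that aliasing (made-up sizes, not the asker's data):

rows = [[0, 0, 0]] * 3      # three references to the SAME inner list
rows[0][2] = 1
print(rows)                 # [[0, 0, 1], [0, 0, 1], [0, 0, 1]]

import numpy as np
arr = np.zeros((3, 3), dtype=int)   # a real 2D array has independent rows
arr[0, 2] = 1
print(arr[1])               # [0 0 0]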
EDIT 2: There's also a more Pythonic way to do what you want instead of using loops. It standardizes the shapes of the four arrays assumed to be 1D (firstx, twox, etc. -- also, if you could let me know exactly what shape these arrays have, that would be super helpful and I could simplify the code!) and then makes the appropriate rows and columns in trainingset have the corresponding values.
# Function to reshape the 1D arrays.
# Works for shapes (length,), (length, 1), and (1, length).
def reshape_1d_array(arr):
    shape = arr.shape
    if len(shape) == 1:
        return arr[:, None]
    elif shape[0] >= shape[1]:
        return arr
    else:
        return arr.T

# Reshape the 1D arrays
firstx = reshape_1d_array(firstx)
firsty = reshape_1d_array(firsty)
twox = reshape_1d_array(twox)
twoy = reshape_1d_array(twoy)

len_firstx = len(firstx)

# The following 2 lines do what the loops did, but in 1 step.
arr1 = np.concatenate((firstx, firsty[0:len_firstx], np.array([[1]*len_firstx]).T), axis=1)
arr2 = np.concatenate((twox, twoy[0:len_firstx], np.array([[0]*len_firstx]).T), axis=1)

# Now put arr1 and arr2 where they're supposed to go in trainingset.
trainingset[:len_firstx, 0:3] = arr1
trainingset[len_firstx:len_firstx + len(firsty), 0:3] = arr2
This gives the same result as the two for loops I wrote before, but is faster if firstx has more than ~50 elements.
I have an array of coordinates, and I would like to split the array into two arrays dependent on the Y value when there is a large gap in the Y value. This post: Split an array dependent on the array values in Python does it dependent on the x value, and the method I use is like this:
array = [[1,5],[3,5],[6,7],[8,7],[25,25],[26,50],.....]
n = len(array)
for i in range(n-1):
    if abs(array[i][0] - array[i+1][0]) >= 10:
        arr1 = array[:i+1]
        arr2 = array[i+1:]
I figured that when I want to split it dependent on the Y value I could just change:
if abs(array[i][0] - array[i+1][0]) to if abs(array[0][i] - array[0][i+1])
This does not work and I get IndexError: list index out of range.
I'm quite new to coding and I'm wondering why this does not work for finding gap in Y value when it works for finding the gap in the X value?
Also, how should I go about splitting the array depending on the Y value?
Any help is much appreciated!
You have to index the second element of each coordinate pair instead. (Your attempt array[0][i] indexes into the first pair, [1,5], which only has two elements, so as soon as i reaches 2 you get the IndexError.) Switch to this:
array = [[1,5],[3,5],[6,7],[8,7],[25,25],[26,50]]
n = len(array)
for i in range(n-1):
    if abs(array[i][1] - array[i+1][1]) >= 10:
        arr1 = array[:i+1]
        arr2 = array[i+1:]
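For example, with the coordinates from the question, and a break added so the split happens at the first large Y gap rather than the last one found:

array = [[1, 5], [3, 5], [6, 7], [8, 7], [25, 25], [26, 50]]
n = len(array)
for i in range(n - 1):
    if abs(array[i][1] - array[i + 1][1]) >= 10:
        arr1 = array[:i + 1]
        arr2 = array[i + 1:]
        break

print(arr1)   # [[1, 5], [3, 5], [6, 7], [8, 7]]
print(arr2)   # [[25, 25], [26, 50]]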
I have a numpy pixel array (either 0 or 255). Using np.where, I pull the tuples of indices where it is > 0. I now want to use those tuples to add 1 to a separate 2D numpy array. Is the best way just a for loop as shown below, or is there a better numpy-like way?
changedtuples = np.where(imageasarray > 0)
# If there's too much movement change nothing.
if frame_size[0]*frame_size[1]/2 < changedtuples[0].size:
    print "No change- too much movement."
elif changedtuples[0].size == 0:
    print "No movement detected."
else:
    for x in xrange(changedtuples[0].size):
        movearray[changedtuples[1][x], changedtuples[0][x]] = movearray[changedtuples[1][x], changedtuples[0][x]] + 1
movearray[imageasarray.T > 0] += 1
np.where is redundant here. You can index an array with a boolean mask, like the one produced by imageasarray.T > 0, to select all array cells where the mask is True. The += then adds 1 to all of those cells. Finally, the .T is a transpose, since your loop swaps the two indices when it increments the cells of movearray.
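A tiny self-contained illustration of that pattern (the 2x2 arrays here are made up):

import numpy as np

imageasarray = np.array([[0, 255],
                         [0,   0]])
movearray = np.zeros((2, 2), dtype=int)

# The one nonzero pixel sits at row 0, column 1 of imageasarray,
# so after the transpose it increments movearray[1, 0]
movearray[imageasarray.T > 0] += 1
print(movearray)
# [[0 0]
#  [1 0]]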
I am working with data from netcdf files, with multi-dimensional variables, read into numpy arrays. I need to scan all values in all dimensions (axes in numpy) and alter some values. But, I don't know in advance the dimension of any given variable. At runtime I can, of course, get the ndims and shapes of the numpy array.
How can I program a loop through all values without knowing the number of dimensions or shapes in advance? If I knew a variable was exactly 2-dimensional, I would do:
shp = myarray.shape
for i in range(shp[0]):
    for j in range(shp[1]):
        do_something(myarray[i][j])
You should look into ravel, nditer and ndindex.
# For the simple case
for value in np.nditer(a):
    do_something_with(value)

# This is similar to above
for value in a.ravel():
    do_something_with(value)

# Or if you need the index
for idx in np.ndindex(a.shape):
    a[idx] = do_something_with(a[idx])
On an unrelated note, numpy arrays are indexed a[i, j] instead of a[i][j]. In python a[i, j] is equivalent to indexing with a tuple, ie a[(i, j)].
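One caveat worth adding, since the goal here is to alter values: by default np.nditer yields read-only views, so in-place modification needs op_flags. A minimal sketch:

import numpy as np

a = np.arange(6.0).reshape(2, 3)

# Request write access so assignments propagate back into `a`
with np.nditer(a, op_flags=['readwrite']) as it:
    for value in it:
        value[...] = value * 2

print(a)   # [[ 0.  2.  4.]
           #  [ 6.  8. 10.]]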
You can use the flat property of numpy arrays, which returns an iterator over all values (no matter the shape).
For instance:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> for x in A.flat:
... print x
1
2
3
4
5
6
You can also set the values in the same order they're returned, e.g. like this:
>>> A.flat[:] = [x / 2 if x % 2 == 0 else x for x in A.flat]
>>> A
array([[1, 1, 3],
[2, 5, 3]])
The order in which flat returns the elements is actually documented: it iterates in row-major (C-style) index order, regardless of how the array is laid out in memory, so the example above will always print 1 through 6.
And this will work for any dimension.
**Edit**
To expand on the ordering point: because flat follows index order rather than memory order, expressions like row1 = A.flat[:N] behave consistently; just keep in mind that for a Fortran-ordered array this index order is not the order of the underlying buffer.
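A quick check of that iteration order on a Fortran-ordered array (a small made-up example):

import numpy as np

A = np.asfortranarray(np.array([[1, 2, 3], [4, 5, 6]]))
print(A.flat[:])   # [1 2 3 4 5 6] -- row-major index order,
                   # even though the data is stored column-major in memory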
This might be the easiest with recursion:
a = numpy.array(range(30)).reshape(5, 3, 2)

def recursive_do_something(array):
    if len(array.shape) == 1:
        for obj in array:
            do_something(obj)
    else:
        for subarray in array:
            recursive_do_something(subarray)

recursive_do_something(a)
In case you want the indices:
a = numpy.array(range(30)).reshape(5, 3, 2)

def do_something(x, indices):
    print(indices, x)

def recursive_do_something(array, indices=None):
    indices = indices or []
    if len(array.shape) == 1:
        # include the position along the last axis so the full index is reported
        for k, obj in enumerate(array):
            do_something(obj, indices + [k])
    else:
        for i, subarray in enumerate(array):
            recursive_do_something(subarray, indices + [i])

recursive_do_something(a)
Look into Python's itertools module.
Python 2: http://docs.python.org/2/library/itertools.html#itertools.product
Python 3: http://docs.python.org/3.3/library/itertools.html#itertools.product
This will allow you to do something along the lines of:
for idx in product(*(range(s) for s in shp)):
    do_something(myarray[idx])
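A concrete, runnable version of that idea (the array contents are made up):

import numpy as np
from itertools import product

myarray = np.arange(6).reshape(2, 3)

# product over one range per axis yields every index tuple
for idx in product(*(range(s) for s in myarray.shape)):
    print(idx, myarray[idx])
# (0, 0) 0
# (0, 1) 1
# ...
# (1, 2) 5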