Creating a 2D python array to store data

I'm looking for a way to store the results of this code in a 2D array in Python. I've tried making a 1D array and then turning it into a 2D array, but my code is still cumbersome and not working. The gap between 4 and 6 is not a typo. Any help would be greatly appreciated.
recno1inds11 = nonzero(data11[:,1]==no1)[0]
recno2inds11 = nonzero(data11[:,1]==no2)[0]
recno3inds11 = nonzero(data11[:,1]==no3)[0]
recno4inds11 = nonzero(data11[:,1]==no4)[0]
recno6inds11 = nonzero(data11[:,1]==no6)[0]
recno7inds11 = nonzero(data11[:,1]==no7)[0]
recno8inds11 = nonzero(data11[:,1]==no8)[0]
recno9inds11 = nonzero(data11[:,1]==no9)[0]
recno10inds11 = nonzero(data11[:,1]==no10)[0]
recno11inds11 = nonzero(data11[:,1]==no11)[0]
recno12inds11 = nonzero(data11[:,1]==no12)[0]
recno13inds11 = nonzero(data11[:,1]==no13)[0]
recno14inds11 = nonzero(data11[:,1]==no14)[0]
recno15inds11 = nonzero(data11[:,1]==no15)[0]
recno16inds11 = nonzero(data11[:,1]==no16)[0]
recno17inds11 = nonzero(data11[:,1]==no17)[0]
recno18inds11 = nonzero(data11[:,1]==no18)[0]
recno19inds11 = nonzero(data11[:,1]==no19)[0]
recno20inds11 = nonzero(data11[:,1]==no20)[0]
recno21inds11 = nonzero(data11[:,1]==no21)[0]
recno22inds11 = nonzero(data11[:,1]==no22)[0]
recno23inds11 = nonzero(data11[:,1]==no23)[0]
recno24inds11 = nonzero(data11[:,1]==no24)[0]
recno25inds11 = nonzero(data11[:,1]==no25)[0]
recno26inds11 = nonzero(data11[:,1]==no26)[0]
recno27inds11 = nonzero(data11[:,1]==no27)[0]
recno28inds11 = nonzero(data11[:,1]==no28)[0]
recno29inds11 = nonzero(data11[:,1]==no29)[0]
recno30inds11 = nonzero(data11[:,1]==no30)[0]

Normally, you don't want to have 30 separate variables like this; you want to have an array of 30 values.
And if you had that, this would be a one-liner: you just need to reshape the right-hand array so its values lie along a second axis, then use the == operator and let broadcasting do the rest.
>>> data11 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> data11[:,1]
array([2, 5, 8])
>>> no1to5 = np.array([1, 2, 3, 4, 5])
>>> data11[:,1] == no1to5.reshape((5,1))
array([[False, False, False],
       [ True, False, False],
       [False, False, False],
       [False, False, False],
       [False,  True, False]], dtype=bool)
Of course you can also apply nonzero, grab the first axis, or whatever else you want to do; you can vectorize it as long as you have a vector in the first place, instead of a big collection of separate values that are related only by the meta-information in the variable names you happen to have bound them to.
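As a concrete illustration, here is a minimal sketch of how the block above could collapse into a few lines, assuming the no1…no30 values are first gathered into a single array (the small data11 and nos below are made-up stand-ins):
import numpy as np

# Made-up stand-ins for the question's data11 and its no1..no30 lookup values.
data11 = np.array([[10, 7], [11, 3], [12, 7], [13, 5]])
nos = np.array([7, 3, 5])  # in the real code this would hold no1, no2, ..., no30

# Broadcast column 1 against every lookup value at once -> shape (len(nos), len(data11)).
matches = data11[:, 1] == nos.reshape(-1, 1)

# One array of row indices per value, replacing recno1inds11, recno2inds11, ...
recinds11 = [np.nonzero(row)[0] for row in matches]
# [array([0, 2]), array([1]), array([3])]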

Related

Find intervals of true values in vector

I am looking for a quick way to find the start and end indices of each "block" of consecutive trues in a Vector.
Either Julia or Python would do the job for me. I'll write my example in Julia syntax:
Say I have a vector
a = [false, true, true, true, false, true, false, true, true, false]
what I want to get is something like this (with 1-based indexing):
[[2, 4], [6, 6], [8, 9]]
The exact form/type of the returned value does not matter, I am mostly looking for a quick and syntactically easy solution. Single trues surrounded by falses should also be detected, as given in my example.
My use-case with this is that I want to find intervals in a Vector of data where the values are below a certain threshold. So I get a boolean array from my data where this is true. Ultimately I want to shade these intervals in a plot, for which I need the start and end indices of each interval.
function intervals(a)
    jumps = diff([false; a; false])
    zip(findall(jumps .== 1), findall(jumps .== -1) .- 1)
end
Quick in terms of keystrokes, maybe not in performance or readability :)
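For readers on the Python side, a rough NumPy translation of the same pad-and-diff idea (a sketch, using 0-based indices rather than the 1-based ones in the question):
import numpy as np

a = np.array([False, True, True, True, False, True, False, True, True, False])

# Pad with False on both ends, then diff: +1 marks a run start, -1 marks one past its end.
jumps = np.diff(np.concatenate(([False], a, [False])).astype(int))
starts = np.nonzero(jumps == 1)[0]       # 0-based start indices
ends = np.nonzero(jumps == -1)[0] - 1    # 0-based end indices (inclusive)
intervals = list(zip(starts.tolist(), ends.tolist()))
# [(1, 3), (5, 5), (7, 8)]  -- the question's [[2, 4], [6, 6], [8, 9]], shifted to 0-based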
My use-case with this is that I want to find intervals in a Vector of data where the values are below a certain threshold.
Let's say your vector is v and your threshold is 7:
julia> println(v); threshold
[9, 6, 1, 9, 5, 9, 4, 5, 6, 1]
7
You can use findall to get the indices where the value is below the threshold, and get the boundaries from that:
julia> let start = 1, f = findall(<(threshold), v), intervals = Tuple{Int, Int}[]
           for i in Iterators.drop(eachindex(f), 1)
               if f[i] - f[i - 1] > 1
                   push!(intervals, (f[start], f[i - 1]))
                   start = i
               end
           end
           push!(intervals, (f[start], last(f)))
       end
3-element Vector{Tuple{Int64, Int64}}:
 (2, 3)
 (5, 5)
 (7, 10)
Here's a version that avoids running findall first, and is a bit faster as a consequence:
function intervals(v)
    ints = UnitRange{Int}[]
    i = firstindex(v)
    while i <= lastindex(v)
        j = findnext(v, i)        # find next true
        isnothing(j) && break
        k = findnext(!, v, j+1)   # find next false
        isnothing(k) && (k = lastindex(v)+1)
        push!(ints, j:k-1)
        i = k+1
    end
    return ints
end
It also returns a vector of UnitRanges, since that seemed a bit more natural to me.
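For comparison, a rough Python sketch of the same scan-for-next-true / scan-for-next-false idea (an assumed translation, 0-based, collecting half-open ranges):
def intervals(v):
    ints = []
    i, n = 0, len(v)
    while i < n:
        while i < n and not v[i]:   # find the next True
            i += 1
        if i == n:
            break
        j = i
        while j < n and v[j]:       # find the next False
            j += 1
        ints.append(range(i, j))    # half-open, like the Julia j:k-1 but 0-based
        i = j + 1
    return ints

a = [False, True, True, True, False, True, False, True, True, False]
print(intervals(a))
# [range(1, 4), range(5, 6), range(7, 9)]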
try this:
a = [False, True, True, True, False, True, False, True, True, False]
index = 0
foundTrue = False
booleanList = []
sublist = []
for i in a:
    index += 1
    if foundTrue:
        if i == False:
            foundTrue = False
            sublist.append(index - 1)
            booleanList.append(sublist)
            sublist = []
    else:
        if i == True:
            foundTrue = True
            sublist.append(index)
print(booleanList)
output should be: [[2, 4], [6, 6], [8, 9]]
This iterates over the list a and, when it finds a True, sets a flag (foundTrue) and stores its index in sublist. Then, with the flag set, if it finds a False, we store the index just before that False in sublist, append sublist to booleanList and reset sublist.
This is not the shortest, but it is very fast and doesn't use any find functions.
function find_intervals(v)
    i = 0
    res = Tuple{Int64, Int64}[]
    while (i += 1) <= length(v)
        v[i] || continue
        s = f = i
        while i < length(v) && v[i += 1]
            f = i
        end
        push!(res, (s, f))
    end
    res
end
For a = [false, true, true, true, false, true, false, true, true, false], it gives:
find_intervals(a)
3-element Vector{Tuple{Int64, Int64}}:
 (2, 4)
 (6, 6)
 (8, 9)

Numpy 2D indexing of a 1D array with known min, max indices

I have a 1D numpy array of False booleans, and a 2D numpy array containing the min,max indices of values in the first array to change to True.
An example:
my_data = numpy.zeros((10,), dtype=bool)
inds2true = numpy.array([[1, 3], [8, 9]])
And I want the following result:
out = numpy.array([False, True, True, True, False, False, False, False, True, True])
How is this possible in Python with Numpy?
Edit: I would like this to be performed in one step (i.e. no looping).
There's one rule-breaking hack:
>>> my_data[inds2true] = True
>>> my_data = np.cumsum(my_data) % 2 == 1
>>> my_data
array([False,  True,  True, False, False, False, False, False,  True, False])
The most common practice is to change the indices within np.arange(1, 3) and np.arange(8, 9), i.e. not including 3 or 9. If you still want to include them, do in addition: my_data[inds2true[:, 1]] = True
If you're looking for other options to do it in one go, the most probably it will include np.cumsum tricks.
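For what it's worth, here is a small sketch of one such cumsum trick that also includes the end indices, by toggling just after each interval (my own variant, not the code from the answer above):
import numpy as np

my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])

toggles = np.zeros(len(my_data) + 1, dtype=int)  # one extra slot for end+1 of the last interval
toggles[inds2true[:, 0]] += 1                    # switch on at each start
toggles[inds2true[:, 1] + 1] -= 1                # switch off just after each end
my_data = np.cumsum(toggles)[:-1] > 0
# array([False,  True,  True,  True, False, False, False, False,  True,  True])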
import numpy as np

my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])

indices = []
for ix_range in inds2true:
    indices += list(range(ix_range[0], ix_range[1] + 1))

my_data[indices] = True

Create slice mask in pytorch?

Is there a way to specify a mask based on a slice operation?
For example
A = torch.arange(6).view((2,3))
# A = [[0,1,2], [3,4,5]]
mask_slice = torch.mask_slice(A[:,1:])
# mask_slice = [[0,1,1],[0,1,1]]
You can do something like this (if I got your question right):
mask_slice = torch.zeros(A.shape, dtype=torch.bool)
mask_slice[:, 1:] = True
# tensor([[False,  True,  True],
#         [False,  True,  True]])
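An end-to-end sketch of the same idea, assuming a reasonably recent PyTorch, including using the mask to pull out the sliced values:
import torch

A = torch.arange(6).view(2, 3)                       # [[0, 1, 2], [3, 4, 5]]
mask_slice = torch.zeros_like(A, dtype=torch.bool)
mask_slice[:, 1:] = True                             # everything except the first column
selected = A[mask_slice]                             # tensor([1, 2, 4, 5])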

Replace values in specific columns of a numpy array

I have an N x M numpy array (matrix). Here is an example with a 3 x 6 array:
x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
I'd like to scan all the columns of x and replace the values of each column if they are equal to a specific value.
This code, for example, aims to replace with 100 all the negative values whose absolute value equals their column number:
for i in range(1,6):
    x[:,i == -(i)] = 100
This code produces this warning:
DeprecationWarning: using a boolean instead of an integer will result in an error in the future
I'm using numpy 1.8.2. How can I avoid this warning without downgrading numpy?
I don't follow what your code is trying to do: the expression i == -(i) evaluates to a plain boolean, so the indexing becomes something like this:
x[:, True]
x[:, False]
I don't think this is what you want. You should try something like this:
for i in range(1, 6):
    mask = x[:, i] == -i
    x[:, i][mask] = 100
Create a mask over the whole column, and use that to change the values.
Even without the warning, the code you have there will not do what you want. i is the loop index and will equal minus itself only if i == 0, which never happens here. Your test will always return False, which is cast to 0. In other words, your code will replace the first element of each row (the first column) with 100.
To get this to work I would do
for i in range(1, 6):
    col = x[:,i]
    col[col == -i] = 100
Notice that the boolean mask is built from the array itself (col == -i), and that the conventional indexing (selecting the column) is kept separate from the masking.
If you are worried about the warning spewing out text, then ignore it as a Warning/Exception:
import numpy
import warnings

warnings.simplefilter('default')  # this enables DeprecationWarnings to be thrown

x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # and this ignores them
    for i in range(1,6):
        x[:,i == -(i)] = 100

print(x)  # just to show that you are actually changing the content
As you can see in the comments, some people are not getting the DeprecationWarning. That is probably because Python has suppressed developer-only warnings by default since 2.7.
As others have said, your loop isn't doing what you think it is doing. I would propose you change your code to use numpy's fancy indexing.
# First, create the "test values" (column index):
>>> test_values = numpy.arange(6)
# test_values is array([0, 1, 2, 3, 4, 5])
#
# Now, we want to check which columns have value == -test_values:
#
>>> mask = (x == -test_values) & (x < 0)
# mask is True wherever a value in the i-th column of x is negative i
>>> mask
array([[False, False, False, False, False, False],
       [False,  True, False, False,  True,  True],
       [False,  True,  True,  True, False, False]], dtype=bool)
#
# Now, set those values to 100
>>> x[mask] = 100
>>> x
array([[  0,   1,   2,   3,   4,   5],
       [  0, 100,   2,   3, 100, 100],
       [  0, 100, 100, 100,   4,   5]])

Determine sum of numpy array while excluding certain values

I would like to determine the sum of a two-dimensional numpy array. However, I want to exclude elements with a certain value from this summation. What is the most efficient way to do this?
For example, here I initialize a two dimensional numpy array of 1s and replace several of them by 2:
import numpy
data_set = numpy.ones((10, 10))
data_set[4][4] = 2
data_set[5][5] = 2
data_set[6][6] = 2
How can I sum over the elements in my two dimensional array while excluding all of the 2s? Note that with the 10 by 10 array the correct answer should be 97 as I replaced three elements with the value 2.
I know I can do this with nested for loops. For example:
elements = []
for idx_x in range(data_set.shape[0]):
    for idx_y in range(data_set.shape[1]):
        if data_set[idx_x][idx_y] != 2:
            elements.append(data_set[idx_x][idx_y])

data_set_sum = numpy.sum(elements)
However on my actual data (which is very large) this is too slow. What is the correct way of doing this?
Use numpy's capability of indexing with boolean arrays. In the below example data_set!=2 evaluates to a boolean array which is True whenever the element is not 2 (and has the correct shape). So data_set[data_set!=2] is a fast and convenient way to get an array which doesn't contain a certain value. Of course, the boolean expression can be more complex.
In [1]: import numpy as np
In [2]: data_set = np.ones((10, 10))
In [4]: data_set[4,4] = 2
In [5]: data_set[5,5] = 2
In [6]: data_set[6,6] = 2
In [7]: data_set[data_set != 2].sum()
Out[7]: 97.0
In [8]: data_set != 2
Out[8]:
array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,  True],
       ...
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,  True]], dtype=bool)
Without numpy, the solution is not much more complex:
x = [1,2,3,4,5,6,7]
sum(y for y in x if y != 7)
# 21
Works for a list of excluded values too:
# set is faster for resolving `in`
exl = set([1,2,3])
sum(y for y in x if y not in exl)
# 22
Using np.sum's where= argument, we avoid the need for array copying that would otherwise be triggered by the advanced array indexing:
>>> import numpy as np
>>> data_set = np.ones((10,10))
>>> data_set[(4,5,6),(4,5,6)] = 2
>>> np.sum(data_set, where=data_set != 2)
97.0
>>> data_set.sum(where=data_set != 2)
97.0
https://numpy.org/doc/stable/reference/generated/numpy.sum.html
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing
How about this approach, which makes use of numpy's boolean indexing?
We simply set all the values that meet the specification to zero before taking the sum; that way we don't change the shape of the array, as we would if we filtered them out.
The other benefit is that we can still sum along an axis after the filter is applied.
import numpy

data_set = numpy.ones((10, 10))
data_set[4][4] = 2
data_set[5][5] = 2
data_set[6][6] = 2

print("Sum", data_set.sum())

another_set = numpy.array(data_set)  # Take a copy, we'll need that later

data_set[data_set == 2] = 0  # Set all the values that are 2 to zero
print("Filtered sum", data_set.sum())
print("Along axis", data_set.sum(0), data_set.sum(1))
Equally we could use any other boolean to set the data we wish to exclude from the sum.
another_set[(another_set > 1) & (another_set < 3)] = 0
print("Another filtered sum", another_set.sum())
