Numpy: copying numpy array at specific indexes - python

I have arrays like
arr1['a'] = np.array([1, 1, 1])
arr1['b'] = np.array([1, 1, 1])
arr1['c'] = np.array([1, 1, 1])
b_index = [0, 2, 5]
arr2['a'] = np.array([2, 2, 2, 2, 2, 2])
arr2['b'] = np.array([2, 2, 2, 2, 2, 2])
arr2['c'] = np.array([2, 2, 2, 2, 2, 2])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
b_index is the list of indexes.
I want to copy from arr1 to arr2 at indexes in b_index.
So the result should be something like:
arr2['a'] = np.array([1, 2, 1, 2, 2, 1])
arr2['b'] = np.array([1, 2, 1, 2, 2, 1])
arr2['c'] = np.array([1, 2, 1, 2, 2, 1])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
I can obviously do this with loops, but I'm not sure that is the right way.
We are talking about 100 columns ('a', 'b', 'c', ...) and around 1 million rows.

One solution, which might not be optimal, is to use advanced array indexing:
In [1]: arr = np.ones((5, 3))
In [2]: arr2 = np.full((5, 5), 2)
In [3]: arr2[:, [1, 2, 4]] = arr
In [4]: arr2
Out[4]:
array([[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1]])
Does it help?
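The answer above selects columns of one 2-D array, while the question indexes rows and stores each column under a key. Below is a minimal sketch for the question's layout. It assumes `arr1` and `arr2` are plain dicts of 1-D NumPy arrays, since the question doesn't say whether they are dicts, structured arrays or DataFrames. It loops once over the ~100 columns, and fancy indexing handles all the selected rows in each assignment.

```python
import numpy as np

# Assumption: the "arrays" in the question are dicts of 1-D columns.
arr1 = {k: np.array([1, 1, 1]) for k in "abc"}
arr2 = {k: np.array([2, 2, 2, 2, 2, 2]) for k in "abcf"}
b_index = [0, 2, 5]

# One Python-level loop over the columns; each assignment is a single
# vectorized fancy-index write over the selected rows.
for key, values in arr1.items():
    arr2[key][b_index] = values

print(arr2["a"])  # [1 2 1 2 2 1]
print(arr2["f"])  # [2 2 2 2 2 2]

# The same copy with one 2-D array (rows x columns, a/b/c/f = columns 0..3):
A1 = np.ones((3, 3), dtype=int)
A2 = np.full((6, 4), 2)
A2[b_index, :3] = A1  # rows from b_index, first three columns
print(A2[:, 0])  # [1 2 1 2 2 1]
```

With a single 2-D array the whole copy is one statement. Whether that beats the per-column loop at your sizes is worth measuring.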

Related

What is the opposite of torch.unique_consecutive?

How can I efficiently revert torch.unique_consecutive? I.e.:
x = torch.tensor([1, 1, 2, 2, 3, 1, 1, 2])
output, counts = torch.unique_consecutive(x, return_counts=True)
y = torch.SOMETHING(output,counts) #y equals x
Use torch.repeat_interleave for this:
import torch
x = torch.tensor([1, 1, 2, 2, 3, 1, 1, 2])
output, counts = torch.unique_consecutive(x, return_counts=True)
y = torch.repeat_interleave(output, counts)  # repeat each value by its run length
# y: tensor([1, 1, 2, 2, 3, 1, 1, 2])
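The same round trip can also be sketched in NumPy, which has no `unique_consecutive`. This is an illustration, not part of the torch API. Run starts are found by comparing each element with its left neighbour, and the runs are rebuilt with np.repeat, which plays the role of `repeat_interleave` for 1-D input.

```python
import numpy as np

x = np.array([1, 1, 2, 2, 3, 1, 1, 2])

# Run-length encode: a new run starts wherever the value changes.
starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
values = x[starts]
counts = np.diff(np.r_[starts, len(x)])

# Decode: repeat each value by its run length.
y = np.repeat(values, counts)
print(values, counts)        # [1 2 3 1 2] [2 2 1 2 1]
print(np.array_equal(x, y))  # True
```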

Numpy unique function

I have a quick question about the numpy unique function. I want to return the unique values of each row:
import numpy as np
a = np.array([[3, 2, 3, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 2, 3, 1, 1]]) # a.shape is (3,16)
np.unique(a)
array([1, 2, 3]) # not what I want
np.unique(a,axis=1)
array([[1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3],
[2, 3, 1, 1, 2, 2, 3, 1, 2, 2, 3],
[2, 3, 2, 3, 2, 3, 2, 1, 1, 2, 3]]) # also not what I want, and I'm not even sure what it's doing
np.apply_along_axis(np.unique,1,a)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]) # this is what I want
The problem is that I also want to use other features of np.unique, such as returning index values. Can anyone help me get np.unique to do this by itself?
You can loop over rows and collect unique values:
import numpy as np
a = np.array([[3, 2, 3, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 2, 3, 1, 1]])
arr = np.empty((0, 3), int)  # assumes every row has exactly 3 unique values
for row in a:
    arr = np.append(arr, [np.unique(row)], axis=0)  # unique values of this row
Output:
[[1 2 3]
[1 2 3]
[1 2 3]]
NumPy cannot return a matrix whose rows have different lengths. Your example has exactly 3 distinct values per row, which is why np.apply_along_axis works. If one row contained a 4, or only 1s and 2s, it would fail.
To obtain what you are looking for you will need to use a normal Python list as the result. You can build it using a list comprehension:
import numpy as np
a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])
r = [np.unique(row) for row in a]
print(r)
# [array([1, 2]), array([1, 2, 3]), array([1, 2, 3, 4])]
r = [np.unique(row, return_index=True) for row in a]
print(r)
# [(array([1, 2]), array([0, 1])),
# (array([1, 2, 3]), array([11, 1, 0])),
# (array([1, 2, 3, 4]), array([14, 3, 0, 6]))]
One thing you could do is build a mask of the values that are the first of their kind on each row. This can be done using numpy.
Here's one way to do it (hopefully, numpy experts could suggest something less convoluted):
np.sum(np.cumsum(np.cumsum(a==np.unique(a)[:,None,None],axis=2),axis=2)==1,axis=0)
array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
Such a mask offers many processing options, such as finding the indices of the first occurrence on each row (using np.argwhere), erasing or assigning first or subsequent occurrences, and more.
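The cumsum expression allocates a boolean array of shape (unique values × rows × columns), which can get large. The same first-occurrence mask can instead be built row by row with a stable sort. This is a different technique from the one-liner above. The sketch assumes `a` is the 3×16 array from the previous answer and that NumPy is 1.15 or later (for np.take_along_axis and np.put_along_axis).

```python
import numpy as np

a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
              [3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
              [3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])

# Stable sort each row so equal values keep their original column order.
order = np.argsort(a, axis=1, kind="stable")
s = np.take_along_axis(a, order, axis=1)

# In sorted order, a value is "first" where it differs from its left neighbour.
is_first_sorted = np.ones_like(s, dtype=bool)
is_first_sorted[:, 1:] = s[:, 1:] != s[:, :-1]

# Scatter the flags back to the original column positions.
mask = np.zeros_like(a, dtype=bool)
np.put_along_axis(mask, order, is_first_sorted, axis=1)

# (row, column) pairs of each first occurrence, via np.argwhere as suggested above.
print(np.argwhere(mask))
```

Memory stays proportional to the size of `a`, at the cost of one sort per row.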

How do I make copies of an array in python based on conditions in the array?

I have an array:
n = np.array([[1, 12, 1, 3],
[1, 1, 12, 0]])
and would like to duplicate it so that each double-digit number splits the array into two otherwise identical copies: the first copy gets the number's first digit and the second copy gets its second digit. In the example above there are two double-digit numbers, so I would get 4 copies of the matrix. I can assume every entry is either a single-digit or a double-digit number.
n1 = [1, 1, 1, 3], [1, 1, 1, 0]
n2 = [1, 1, 1, 3], [1, 1, 2, 0]
n3 = [1, 2, 1, 3], [1, 1, 1, 0]
n4 = [1, 2, 1, 3], [1, 1, 2, 0]
Approach 1: itertools.product
>>> import numpy as np
>>> from itertools import product
>>> from pprint import pprint
>>>
>>> n = np.array([[1, 12, 1, 3],
... [1, 1, 12, 0]])
>>>
>>> pprint([np.reshape(nn, n.shape).astype(int) for nn in product(*map(str, n.ravel()))])
[array([[1, 1, 1, 3],
[1, 1, 1, 0]]),
array([[1, 1, 1, 3],
[1, 1, 2, 0]]),
array([[1, 2, 1, 3],
[1, 1, 1, 0]]),
array([[1, 2, 1, 3],
[1, 1, 2, 0]])]
Note that this also happens to work for longer numbers:
>>> n = np.array([462, 3, 15, 1, 0])
>>> pprint([np.reshape(nn, n.shape).astype(int) for nn in product(*map(str, n.ravel()))])
[array([4, 3, 1, 1, 0]),
array([4, 3, 5, 1, 0]),
array([6, 3, 1, 1, 0]),
array([6, 3, 5, 1, 0]),
array([2, 3, 1, 1, 0]),
array([2, 3, 5, 1, 0])]
Approach 2: np.meshgrid
>>> import numpy as np
>>>
>>> n = np.array([[1, 12, 1, 3],
... [1, 1, 12, 0]])
>>>
>>> te = np.where(n>=10)
>>> dims = tuple(np.log10(n[te]).astype(int) + 1)
>>>
>>> out = np.empty(dims + n.shape, dtype=n.dtype)
>>> out[...] = n
>>> out[(Ellipsis,) + te] = np.moveaxis(np.meshgrid(*(s//10**np.arange(i)[::-1]%10 for i, s in zip(dims, n[te])), indexing='ij'), 0, -1)
>>>
>>> out
array([[[[1, 1, 1, 3],
[1, 1, 1, 0]],
[[1, 1, 1, 3],
[1, 1, 2, 0]]],
[[[1, 2, 1, 3],
[1, 1, 1, 0]],
[[1, 2, 1, 3],
[1, 1, 2, 0]]]])
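The meshgrid result has one leading axis per multi-digit entry, so the copies are indexed as out[i, j] rather than listed as n1 ... n4. This sketch repeats the construction above unchanged and then collapses the leading axes, so the copies come out in the question's n1, n2, n3, n4 order.

```python
import numpy as np

n = np.array([[1, 12, 1, 3],
              [1, 1, 12, 0]])

# Same construction as above: one leading axis per multi-digit entry.
te = np.where(n >= 10)
dims = tuple(np.log10(n[te]).astype(int) + 1)
out = np.empty(dims + n.shape, dtype=n.dtype)
out[...] = n
digits = (s // 10**np.arange(i)[::-1] % 10 for i, s in zip(dims, n[te]))
out[(Ellipsis,) + te] = np.moveaxis(np.meshgrid(*digits, indexing='ij'), 0, -1)

# Collapse the leading axes into one "copy" axis.
copies = out.reshape((-1,) + n.shape)
print(copies.shape)  # (4, 2, 4)
```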

transform an array of array to an array of numbers

I have an array of values and an array of repeat counts:
>>> x=np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> n=np.random.randint(1,3,5)
>>> n
array([2, 1, 1, 2, 2])
And I do
>>> y=np.array([np.repeat(x[i],n[i]) for i in range(5)])
>>> y
array([array([0, 0]), array([1]), array([2]), array([3, 3]), array([4, 4])], dtype=object)
But I want my result to be array([0, 0, 1, 2, 3, 3, 4, 4]).
How can I do it?
I think this is simpler than you're making it (see the np.repeat docs):
>>> x = np.arange(5)
>>> y = np.array([2, 1, 1, 2, 2])
>>> np.repeat(x,y)
array([0, 0, 1, 2, 3, 3, 4, 4])
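You might already have variable-length pieces from some other computation. In that case, keep them in a plain list and join them with np.concatenate rather than wrapping them in np.array(...). Recent NumPy versions refuse to build a ragged array unless dtype=object is given explicitly.

```python
import numpy as np

x = np.arange(5)
n = np.array([2, 1, 1, 2, 2])

# A plain list of ragged pieces, joined end to end.
pieces = [np.repeat(x[i], n[i]) for i in range(len(x))]
flat = np.concatenate(pieces)
print(flat)                                   # [0 0 1 2 3 3 4 4]
print(np.array_equal(flat, np.repeat(x, n)))  # True
```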

24-bit audio sample into int32 array (little-endian WAV file)

I read a 24-bit mono audio .wav file into an array of type <i4 (<i3 doesn't exist):
data = numpy.fromfile(fid, dtype='<i4', count=size//3)
How do I get the audio samples properly? Should I swap the byte order or something like that, and if so, how?
Here is a solution for reading 24-bit files (thanks to Warren Weckesser's gist https://gist.github.com/WarrenWeckesser/7461781):
data = numpy.fromfile(fid, dtype='u1', count=size)  # first read the raw bytes one by one
a = numpy.empty((len(data) // 3, 4), dtype='u1')  # one 4-byte row per sample
a[:, :3] = data.reshape((-1, 3))  # copy the 3 little-endian bytes of each sample
a[:, 3:] = (a[:, 3 - 1:3] >> 7) * 255  # sign-extend: 0xFF if the high byte's top bit is set
data = a.view('<i4').reshape(a.shape[:-1])  # reinterpret each row as a little-endian int32
This can be inserted directly into def _read_data_chunk(fid, noc, bits): (scipy\io\wavfile.py).
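The sign-extension step can be checked without a WAV file by decoding bytes held in memory. In this sketch, `int24le_to_int32` is a hypothetical helper name, and np.frombuffer replaces numpy.fromfile so the example is self-contained. The byte handling is the same as in the answer above.

```python
import numpy as np

def int24le_to_int32(raw: bytes) -> np.ndarray:
    """Hypothetical helper: decode packed little-endian signed 24-bit samples."""
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    a = np.empty((len(b), 4), dtype=np.uint8)
    a[:, :3] = b
    # Sign-extend: the added top byte is 0xFF when the sample's high bit is set.
    a[:, 3] = (b[:, 2] >> 7) * 255
    return a.view('<i4').ravel()

# Three samples: 1, -1 and -8388608 (the most negative 24-bit value).
raw = bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x80])
print(int24le_to_int32(raw).tolist())  # [1, -1, -8388608]
```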
You can convert the data into a numpy array of uint8, then add a zero byte to each sample using reshape and hstack:
In [1]: import numpy as np
I'm using a generated sequence here as an example.
In [2]: a = np.array([1,2,3]*10, dtype=np.uint8)
In [3]: a
Out[3]:
array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2,
3, 1, 2, 3, 1, 2, 3], dtype=uint8)
In [4]: a = a.reshape((-1,3))
Reshape allows you to group the samples:
In [5]: a
Out[5]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]], dtype=uint8)
Make the zeros that have to be added.
In [6]: b = np.zeros(10, dtype=np.uint8).reshape((-1,1))
In [7]: b
Out[7]:
array([[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0]], dtype=uint8)
Now we add the zeros. WAV sample data is little-endian, so the added zero byte goes at the front, where it becomes the least significant byte of each 32-bit sample. This scales the data by 256 and puts each sample's sign bit in the int32 sign bit.
(I hope I got this endianness right. If the result sounds very faint, I got it wrong and you need to use (a, b) instead of (b, a).)
In [8]: c = np.hstack((b, a))
In [9]: c
Out[9]:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]], dtype=uint8)
Reshape it back.
In [10]: c.reshape((1,-1))
Out[10]:
array([[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1,
2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]], dtype=uint8)
Convert to bytes:
In [11]: bytearray(c.reshape((1,-1)))
bytearray(b'\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03\x00\x01\x02\x03')
Now you have 4-byte samples.
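To confirm that the zero-at-the-front layout gives scaled int32 samples, the padded rows can be viewed as little-endian int32 and shifted back. This is a sketch on three made-up samples (1, -1 and the most negative 24-bit value). It assumes you want the original 24-bit values rather than the ×256 scaled ones.

```python
import numpy as np

raw = bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x80])
b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)

# Zero byte in front (least significant position), as with np.hstack((b, a)) above.
c = np.hstack((np.zeros((len(b), 1), dtype=np.uint8), b))
scaled = c.view('<i4').ravel()  # each sample multiplied by 256
samples = scaled >> 8           # arithmetic right shift undoes the scaling
print(scaled.tolist())   # [256, -256, -2147483648]
print(samples.tolist())  # [1, -1, -8388608]
```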
