Reducing size of a character array in Numpy

Given a character array:
In [21]: x = np.array(['a ','bb ','cccc '])
One can remove the whitespace using:
In [22]: np.char.strip(x)
Out[22]:
array(['a', 'bb', 'cccc'],
dtype='|S8')
but is there a way to also shrink the width of the column to the minimum required size, in the above case |S4?

Do you just want to change the data type?
import numpy as NP
a = NP.array(["a", "bb", "ccc"])
a
# returns array(['a', 'bb', 'ccc'], dtype='|S3')
a = NP.array(a, dtype="|S8") # change dtype
# returns array(['a', 'bb', 'ccc'], dtype='|S8')
a = NP.array(a, dtype="|S3") # change it back
# returns array(['a', 'bb', 'ccc'], dtype='|S3')

>>> x = np.array(['a ','bb ','cccc '])
>>> x = np.array([s.strip() for s in x])
>>> x
array(['a', 'bb', 'cccc'],
dtype='|S4')
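For large arrays, the width can also be shrunk without a Python-level loop by casting to a narrower string dtype. The following is a minimal sketch, assuming a Unicode (U) array as produced under Python 3; the input strings and the variable names are illustrative, not taken from the question.

```python
import numpy as np

# Trailing spaces of varying length (illustrative input).
x = np.array(['a   ', 'bb  ', 'cccc    '])

stripped = np.char.strip(x)

# Longest remaining string; at least 1 so the dtype width stays valid.
width = max(int(np.char.str_len(stripped).max()), 1)

# Casting to a narrower string dtype copies the data into a smaller buffer.
shrunk = stripped.astype(f'U{width}')

print(shrunk.dtype)
```

The cast makes a copy, so the original wider array can be released once it is no longer referenced.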

Related

transform 1D array of list to 2D array

For example
import pandas as pd
d1 = pd.Series(['a b', 'c d'])
t1 = d1.str.split()
a1 = t1.values
where a1 would be
array([list(['a', 'b']), list(['c', 'd'])], dtype=object)
How can I transform it into
array([['a', 'b'],
['c', 'd']], dtype='<U1')

Use np.stack on t1:
In [186]: np.stack(t1)
Out[186]:
array([['a', 'b'],
['c', 'd']], dtype='<U1')
Or call np.array on t1.tolist():
In [187]: np.array(t1.tolist())
Out[187]:
array([['a', 'b'],
['c', 'd']], dtype='<U1')
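The same two approaches can be tried without pandas. This is a small sketch that builds the object array by hand (the construction of a1 is an assumption standing in for t1.values) and checks that both conversions agree.

```python
import numpy as np

# Build a 1D object array whose elements are lists, like t1.values in the question.
a1 = np.empty(2, dtype=object)
a1[0] = ['a', 'b']
a1[1] = ['c', 'd']

stacked = np.stack(a1)            # stacks the inner lists along a new first axis
via_list = np.array(a1.tolist())  # converts to nested lists first, then to an array

print(stacked.shape, stacked.dtype)
```

Both calls require the inner lists to have equal length; ragged input would not produce a regular 2D array.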

Creating array with single structured element containing an array

I have a dtype like this:
>>> dt = np.dtype([('x', object, 3)])
>>> dt
dtype([('x', 'O', (3,))])
One field named 'x', containing three pointers. I would like to construct an array with a single element of this type:
>>> a = np.array([(['a', 'b', 'c'])], dtype=dt)
>>> b = np.array([(np.array(['a', 'b', 'c'], dtype=object))], dtype=dt)
>>> c = np.array((['a', 'b', 'c']), dtype=dt)
>>> d = np.array(['a', 'b', 'c'], dtype=dt)
>>> e = np.array([([['a', 'b', 'c']])], dtype=dt)
All five of these statements yield the same incorrect result:
array([[(['a', 'a', 'a'],), (['b', 'b', 'b'],), (['c', 'c', 'c'],)]],
dtype=[('x', 'O', (3,))])
If I try to drop the inner list/array, I get an error:
>>> f = np.array([('a', 'b', 'c')], dtype=dt)
ValueError: could not assign tuple of length 3 to structure with 1 fields.
Same error happens for
>>> g = np.array(('a', 'b', 'c'), dtype=dt)
I've run out of possible combinations to try. The result I am looking for is
array([(['a', 'b', 'c'],)], dtype=[('x', 'O', (3,))])
How do I create an array that has one element of the specified dtype?
So far, the only approach that I've found is manual assignment:
z = np.empty(1, dtype=dt)
z['x'][0, :] = ['a', 'b', 'c']
OR
z[0]['x'] = ['a', 'b', 'c']
This seems like an unnecessary workaround for something that np.array ought to be able to handle out of the box.

In [44]: dt = np.dtype([('x', object, 3)]) # corrected
In [45]: dt
Out[45]: dtype([('x', 'O', (3,))])
In [46]: np.empty(3, dt)
Out[46]:
array([([None, None, None],), ([None, None, None],),
([None, None, None],)], dtype=[('x', 'O', (3,))])
In [47]: np.array([(['a','b','c'],)], dt)
Out[47]: array([(['a', 'b', 'c'],)], dtype=[('x', 'O', (3,))])
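The input must mirror the displayed output: a list of records, where each record is a tuple with one entry per field. Note the trailing comma in (['a','b','c'],), which makes it a one-element tuple rather than a parenthesized list.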
In [48]: arr = np.empty(3, dt)
In [49]: arr['x']
Out[49]:
array([[None, None, None],
[None, None, None],
[None, None, None]], dtype=object)
In [50]: arr['x'][0]
Out[50]: array([None, None, None], dtype=object)
In [51]: arr['x'][0] = ['a','b','c']
In [52]: arr
Out[52]:
array([(['a', 'b', 'c'],), ([None, None, None],), ([None, None, None],)],
dtype=[('x', 'O', (3,))])
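Putting the pieces together for the single-element case asked about, here is a minimal sketch. The names record, single and z are illustrative; the dtype is the one from the question.

```python
import numpy as np

dt = np.dtype([('x', object, 3)])

# One record = one tuple; the tuple holds one entry per field.
# The trailing comma makes (['a', 'b', 'c'],) a 1-tuple rather than a plain list.
record = (['a', 'b', 'c'],)
single = np.array([record], dtype=dt)

# Equivalent result via allocation followed by assignment.
z = np.empty(1, dtype=dt)
z['x'][0] = ['a', 'b', 'c']

print(single)
```

The five attempts in the question all pass a bare list where a tuple is expected, which is why NumPy treats the strings as separate records.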

numpy, merge two array of different shape

For two arrays a and b,
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
I want to generate the following array
c = np.array([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']])
Is there a way to do this efficiently?

You need:
import numpy as np
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
print(np.array(list(zip(np.concatenate(a), b))))
Output (the integers are converted to strings, because a NumPy array has a single dtype):
[['1' 'a']
 ['2' 'b']
 ['3' 'c']
 ['4' 'd']]
Alternate Solution
print(np.stack((np.concatenate(a), b), axis=1))

Solution
>>> import numpy as np
>>> a = np.array([[1],[2],[3],[4]])
>>> b = np.array(['a', 'b', 'c', 'd'])
# a has shape (4, 1), so a[i] is a length-1 array rather than a scalar
>>> np.array([[a[i],b[i]] for i in range(a.shape[0])])
array([[array([1]), 'a'],
[array([2]), 'b'],
[array([3]), 'c'],
[array([4]), 'd']], dtype=object)
# Index the scalar with a[i][0] to get a plain 2D array of strings
>>> np.array([[a[i][0],b[i]] for i in range(a.shape[0])])
array([['1', 'a'],
['2', 'b'],
['3', 'c'],
['4', 'd']], dtype='<U11')
Note:
You may want to reshape your 'a' array, since it has an unnecessary second axis.
>>> a.shape
(4, 1)
>>> a
array([[1],
[2],
[3],
[4]])
Flattening it to one dimension makes it easier to work with:
>>> a.reshape(4)
array([1, 2, 3, 4])

You can do:
c = np.vstack((a.flatten(), b)).T
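Since a NumPy array has a single dtype, the merged result is either all strings or an object array. The sketch below shows both options side by side; the names c_str and c_obj are illustrative.

```python
import numpy as np

a = np.array([[1], [2], [3], [4]])
b = np.array(['a', 'b', 'c', 'd'])

# Option 1: a homogeneous string array; the integers are cast to str explicitly.
c_str = np.column_stack((a.ravel().astype(str), b))

# Option 2: an object array that keeps the integers as integers.
c_obj = np.empty((len(b), 2), dtype=object)
c_obj[:, 0] = a.ravel()
c_obj[:, 1] = b

print(c_str)
print(c_obj)
```

The object array preserves the original types but loses most of NumPy's vectorized speed, so the string version (or two separate arrays) may be preferable for large data.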

Permute string of Kronecker products

A function I am writing will receive as input a matrix H=A x B x I x I, where each matrix is square and of dimension d, the cross refers to the Kronecker product np.kron and I is the identity np.eye(d). Thus
I = np.eye(d)
H = np.kron(A, B)
H = np.kron(H, I)
H = np.kron(H, I)
Given H and the above form, but without knowledge of A and B, I would like to construct G = I x A x I x B, i.e. the result of
G = np.kron(I, A)
G = np.kron(G, I)
G = np.kron(G, B)
It should be possible to do this by applying some permutation to H. How do I implement that permutation?

Transposing with (2,0,3,1,6,4,7,5) (after expanding to 8 axes) appears to do it:
>>> from functools import reduce
>>> import numpy as np
>>>
>>> A = np.random.randint(0,10,(10,10))
>>> B = np.random.randint(0,10,(10,10))
>>> I = np.identity(10, int)
>>> H = reduce(np.kron, (A,B,I,I)) # the given matrix
>>> G = reduce(np.kron, (I,A,I,B)) # the desired result
>>>
>>> (H.reshape(*8*(10,)).transpose(2,0,3,1,6,4,7,5).reshape(10**4,10**4) == G).all()
True
Explanation: Let's look at a minimal example to understand how the Kronecker product relates to reshaping and axis shuffling.
Two 1D factors:
>>> A, B = np.arange(1,5), np.array(list("abcd"), dtype=object)
>>> np.kron(A, B)
array(['a', 'b', 'c', 'd', 'aa', 'bb', 'cc', 'dd', 'aaa', 'bbb', 'ccc',
'ddd', 'aaaa', 'bbbb', 'cccc', 'dddd'], dtype=object)
We can observe that the arrangement is row-major-ish, so if we reshape we actually get the outer product:
>>> np.kron(A, B).reshape(4, 4)
array([['a', 'b', 'c', 'd'],
['aa', 'bb', 'cc', 'dd'],
['aaa', 'bbb', 'ccc', 'ddd'],
['aaaa', 'bbbb', 'cccc', 'dddd']], dtype=object)
>>> np.outer(A, B)
array([['a', 'b', 'c', 'd'],
['aa', 'bb', 'cc', 'dd'],
['aaa', 'bbb', 'ccc', 'ddd'],
['aaaa', 'bbbb', 'cccc', 'dddd']], dtype=object)
If we do the same with factors swapped we get the transpose:
>>> np.kron(B, A).reshape(4, 4)
array([['a', 'aa', 'aaa', 'aaaa'],
['b', 'bb', 'bbb', 'bbbb'],
['c', 'cc', 'ccc', 'cccc'],
['d', 'dd', 'ddd', 'dddd']], dtype=object)
With 2D factors things are similar:
>>> A2, B2 = A.reshape(2,2), B.reshape(2,2)
>>>
>>> np.kron(A2, B2)
array([['a', 'b', 'aa', 'bb'],
['c', 'd', 'cc', 'dd'],
['aaa', 'bbb', 'aaaa', 'bbbb'],
['ccc', 'ddd', 'cccc', 'dddd']], dtype=object)
>>> np.kron(A2, B2).reshape(2,2,2,2)
array([[[['a', 'b'],
['aa', 'bb']],
[['c', 'd'],
['cc', 'dd']]],
[[['aaa', 'bbb'],
['aaaa', 'bbbb']],
[['ccc', 'ddd'],
['cccc', 'dddd']]]], dtype=object)
But there is a minor complication in that the corresponding outer product has axes arranged differently:
>>> np.multiply.outer(A2, B2)
array([[[['a', 'b'],
['c', 'd']],
[['aa', 'bb'],
['cc', 'dd']]],
[[['aaa', 'bbb'],
['ccc', 'ddd']],
[['aaaa', 'bbbb'],
['cccc', 'dddd']]]], dtype=object)
We need to swap middle axes to get the same result.
>>> np.multiply.outer(A2, B2).swapaxes(1,2)
array([[[['a', 'b'],
['aa', 'bb']],
[['c', 'd'],
['cc', 'dd']]],
[[['aaa', 'bbb'],
['aaaa', 'bbbb']],
[['ccc', 'ddd'],
['cccc', 'dddd']]]], dtype=object)
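So to go from a Kronecker product to the one with the factors swapped, we can chain three steps (each permutation below is cumulative, i.e. relative to the original axes):
1. Swap the middle axes to obtain the outer product: (0,2,1,3).
2. Swap the factors, which exchanges the first two axes with the last two: (1,3,0,2).
3. Swap the middle axes again to return to the Kronecker layout: (1,0,3,2).
The total axis permutation is therefore (1,0,3,2):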
>>> np.all(np.kron(A2, B2).reshape(2,2,2,2).transpose(1,0,3,2).reshape(4,4) == np.kron(B2, A2))
True
Using the same principles leads to the recipe for the four factor original question.
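The four-factor recipe can be checked cheaply with a small dimension. This is a sketch assuming d = 3 and random integer factors; the variable names follow the question (H is the given matrix, G the desired one), and G_expected exists only for the comparison.

```python
from functools import reduce
import numpy as np

d = 3
rng = np.random.default_rng(0)
A = rng.integers(0, 10, (d, d))
B = rng.integers(0, 10, (d, d))
I = np.eye(d, dtype=int)

H = reduce(np.kron, (A, B, I, I))            # the given matrix
G_expected = reduce(np.kron, (I, A, I, B))   # built directly, for comparison only

# Expand to 8 axes (4 row axes, 4 column axes), permute, and collapse again.
G = H.reshape((d,) * 8).transpose(2, 0, 3, 1, 6, 4, 7, 5).reshape(d**4, d**4)

print(np.array_equal(G, G_expected))
```

Note that the permutation never needs A or B themselves, only H and the dimension d.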

This answer expands on Paul Panzer's correct answer to show how to solve similar problems more generally.
Suppose we wish to map a matrix string reduce(kron, ABCD) into, for example, reduce(kron, CADB), where each matrix is d x d. Both strings are thus (d**4, d**4) matrices. Alternatively, they can be viewed as arrays of shape [d,]*8.
The way np.kron arranges data means that the index ordering of ABCD corresponds to that of its constituents as follows: D_0 C_0 B_0 A_0 D_1 C_1 B_1 A_1, where, for example, D_0 (D_1) is the fastest (slowest) varying index in D. For CADB the index ordering is instead B_0 D_0 A_0 C_0 B_1 D_1 A_1 C_1; you just read the string backwards, once for the faster and once for the slower indices. Numbering the axes of ABCD from slowest to fastest as A_1=0, B_1=1, C_1=2, D_1=3, A_0=4, B_0=5, C_0=6, D_0=7 and reading off where each axis of CADB comes from gives the permutation (2,0,3,1,6,4,7,5).
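The rule above can be wrapped in a small helper. The following is a sketch, not a NumPy API: kron_axes and permute_kron are hypothetical names, and all factors are assumed to be square with the same dimension d.

```python
from functools import reduce
import numpy as np

def kron_axes(order):
    """Axis permutation that reorders the factors of a Kronecker string.

    order[k] is the position, in the original string, of the factor that
    should end up at position k (hypothetical helper, not a NumPy function).
    """
    n = len(order)
    # Row axes come first (0..n-1), column axes second (n..2n-1).
    return tuple(order) + tuple(n + k for k in order)

def permute_kron(M, d, order):
    """Reorder the factors of M = kron(F0, F1, ..., F_{n-1}), each d x d."""
    n = len(order)
    axes = kron_axes(order)
    return M.reshape((d,) * (2 * n)).transpose(axes).reshape(d**n, d**n)

d = 2
rng = np.random.default_rng(1)
A, B, C, D = (rng.integers(0, 10, (d, d)) for _ in range(4))

ABCD = reduce(np.kron, (A, B, C, D))
# C, A, D, B are taken from positions 2, 0, 3, 1 of the original string.
CADB = permute_kron(ABCD, d, (2, 0, 3, 1))

print(kron_axes((2, 0, 3, 1)))
```

For the original question, the same helper would be called with the order (2, 0, 3, 1) on H = A x B x I x I, since the two identity factors are interchangeable.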

How to delete numpy nan from a list of strings in Python?

I have a list of strings
x = ['A', 'B', nan, 'D']
and want to remove the nan.
I tried:
x = x[~numpy.isnan(x)]
But that only works if it contains numbers. How do we solve this for strings in Python 3+?

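If you have a NumPy array, np.nan has already been converted to the string 'nan', so you can simply compare against that string. If you have a list, you can test identity with is against np.nan. This works as long as the NaN in the list is the np.nan object itself; a NaN created some other way, such as float('nan'), is a different object and would not be caught.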
In [25]: x = np.array(['A', 'B', np.nan, 'D'])
In [26]: x
Out[26]:
array(['A', 'B', 'nan', 'D'],
dtype='<U3')
In [27]: x[x != 'nan']
Out[27]:
array(['A', 'B', 'D'],
dtype='<U3')
In [28]: x = ['A', 'B', np.nan, 'D']
In [30]: [i for i in x if i is not np.nan]
Out[30]: ['A', 'B', 'D']
Or as a functional approach in case you have a python list:
In [34]: from operator import is_not
In [35]: from functools import partial
In [37]: f = partial(is_not, np.nan)
In [38]: x = ['A', 'B', np.nan, 'D']
In [39]: list(filter(f, x))
Out[39]: ['A', 'B', 'D']

You can use math.isnan and a good old list comprehension.
Because math.isnan raises a TypeError when given a string, check for float first:
import math
x = [y for y in x if not (isinstance(y, float) and math.isnan(y))]

You may want to avoid mixing np.nan with strings and use None instead. If you do have nan, you could do this:
import numpy as np
[i for i in x if i is not np.nan]
# ['A', 'B', 'D']

You could also try this:
[s for s in x if str(s) != 'nan']
Or, convert everything to str at the beginning:
[s for s in map(str, x) if s != 'nan']
Both approaches yield ['A', 'B', 'D']. Note that they would also drop a genuine string 'nan' if the list contained one.
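The answers above differ in which NaN values they catch. A variant that removes any float NaN, regardless of how it was created, while keeping a literal 'nan' string, could look like the sketch below; drop_nan is a hypothetical helper name.

```python
import math

def drop_nan(items):
    """Return items without float NaN values (hypothetical helper)."""
    # isinstance guards math.isnan, which raises TypeError for strings.
    return [v for v in items if not (isinstance(v, float) and math.isnan(v))]

x = ['A', 'B', float('nan'), 'D', 'nan']
cleaned = drop_nan(x)
print(cleaned)   # ['A', 'B', 'D', 'nan']
```

Because np.nan is an ordinary Python float, the same check also removes it without importing NumPy.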
