Creating array with single structured element containing an array - python

I have a dtype like this:
>>> dt = np.dtype([('x', object, 3)])
>>> dt
dtype([('x', 'O', (3,))])
One field named 'x', containing three pointers. I would like to construct an array with a single element of this type:
>>> a = np.array([(['a', 'b', 'c'])], dtype=dt)
>>> b = np.array([(np.array(['a', 'b', 'c'], dtype=object))], dtype=dt)
>>> c = np.array((['a', 'b', 'c']), dtype=dt)
>>> d = np.array(['a', 'b', 'c'], dtype=dt)
>>> e = np.array([([['a', 'b', 'c']])], dtype=dt)
All five of these statements yield the same incorrect result:
array([[(['a', 'a', 'a'],), (['b', 'b', 'b'],), (['c', 'c', 'c'],)]],
      dtype=[('x', 'O', (3,))])
If I try to drop the inner list/array, I get an error:
>>> f = np.array([('a', 'b', 'c')], dtype=dt)
ValueError: could not assign tuple of length 3 to structure with 1 fields.
Same error happens for
>>> g = np.array(('a', 'b', 'c'), dtype=dt)
I've run out of possible combinations to try. The result I am looking for is
array([(['a', 'b', 'c'],)], dtype=[('x', 'O', (3,))])
How do I create an array that has one element of the specified dtype?
So far, the only approach that I've found is manual assignment:
z = np.empty(1, dtype=dt)
z['x'][0, :] = ['a', 'b', 'c']
OR
z[0]['x'] = ['a', 'b', 'c']
This seems like an unnecessary workaround for something that np.array ought to be able to handle out of the box.

In [44]: dt = np.dtype([('x', object, 3)]) # corrected
In [45]: dt
Out[45]: dtype([('x', 'O', (3,))])
In [46]: np.empty(3, dt)
Out[46]:
array([([None, None, None],), ([None, None, None],),
       ([None, None, None],)], dtype=[('x', 'O', (3,))])
In [47]: np.array([(['a','b','c'],)], dt)
Out[47]: array([(['a', 'b', 'c'],)], dtype=[('x', 'O', (3,))])
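Input formatting should match output formatting: each record must be a tuple, so the field value needs the trailing comma, as in (['a', 'b', 'c'],). Without that comma the parentheses are only grouping, and np.array sees a plain nested list.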
In [48]: arr = np.empty(3, dt)
In [49]: arr['x']
Out[49]:
array([[None, None, None],
       [None, None, None],
       [None, None, None]], dtype=object)
In [50]: arr['x'][0]
Out[50]: array([None, None, None], dtype=object)
In [51]: arr['x'][0] = ['a','b','c']
In [52]: arr
Out[52]:
array([(['a', 'b', 'c'],), ([None, None, None],), ([None, None, None],)],
      dtype=[('x', 'O', (3,))])
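A minimal sketch of the tuple rule, assuming the same dtype as in the question. The names single, rows and many are made up for illustration. Each record is written as a one-element tuple whose only item is the three-element list:

```python
import numpy as np

dt = np.dtype([('x', object, 3)])

# One record: a 1-tuple (note the trailing comma) holding the 3-item field.
single = np.array([(['a', 'b', 'c'],)], dtype=dt)

# Several records: wrap every row in its own 1-tuple.
rows = [['a', 'b', 'c'], ['d', 'e', 'f']]
many = np.array([(row,) for row in rows], dtype=dt)

print(single.shape, single['x'].shape)
print(many['x'])
```

The list comprehension is only one way to build the tuples; any iterable of 1-tuples should behave the same.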

Related

transform 1D array of list to 2D array

For example
import pandas as pd
d1 = pd.Series(['a b', 'c d'])
t1 = d1.str.split()
a1 = t1.values
where a1 would be
array([list(['a', 'b']), list(['c', 'd'])], dtype=object)
how to transform it to
array([['a', 'b'],
       ['c', 'd']], dtype='<U1')
Use np.stack on t1:
In [186]: np.stack(t1)
Out[186]:
array([['a', 'b'],
       ['c', 'd']], dtype='<U1')
Or use np.array on t1.tolist():
In [187]: np.array(t1.tolist())
Out[187]:
array([['a', 'b'],
       ['c', 'd']], dtype='<U1')
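The same two conversions can be tried without pandas. This is a sketch that assumes the input is a 1-D object array whose elements are equal-length lists, as a1 is in the question; the array is built by hand here so the example is self-contained:

```python
import numpy as np

# Build a 1-D object array of lists, mimicking t1.values from the question.
a1 = np.empty(2, dtype=object)
a1[0] = ['a', 'b']
a1[1] = ['c', 'd']

stacked = np.stack(a1)            # each list becomes one row
via_list = np.array(a1.tolist())  # plain nested list, then a regular array

print(stacked.shape, stacked.dtype.kind)
```

Both approaches rely on every inner list having the same length.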

Dataframe column names np array has \n after one of the column names

I am trying to sum the values of select columns from a list of columns and store it in a new column. However, I keep getting
raise KeyError('%s not in index' % objarr[mask])
KeyError: "['a' 'b' 'c' 'd' 'e'\n 'f'] not in index"
This is the piece of code where I am trying to achieve this :
cols = df.columns.values
colIndex = np.argwhere('Person')
selectCols = np.delete(cols, colIndex)
df['total counts'] = df[selectCols].sum(axis=1)
First I'm not sure how the \n is present after column e and secondly I don't know what's causing this KeyError. Please help!
In [290]: np.argwhere('Person')
Out[290]: array([], shape=(1, 0), dtype=int64)
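That is an empty index array, so np.delete removes nothing and it doesn't make sense to use it there. The \n in the error message is most likely just line wrapping in the printed array, not part of a column name.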
Show cols
===
In [301]: cols = np.array(['Person', 'a', 'b', 'c', 'd', 'e', 'f'])
In [302]: idx = np.argwhere('Person')
In [303]: idx
Out[303]: array([], shape=(1, 0), dtype=int64)
In [304]: np.delete(cols, idx)
Out[304]: array(['Person', 'a', 'b', 'c', 'd', 'e', 'f'], dtype='<U6')
Alternatively, we can find where cols is equal to 'Person':
In [305]: idx = np.argwhere(cols=='Person')
In [306]: idx
Out[306]: array([[0]])
In [307]: np.delete(cols, idx)
Out[307]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype='<U6')
Or work with a list version of cols:
In [313]: alist = cols.tolist()
In [314]: alist
Out[314]: ['Person', 'a', 'b', 'c', 'd', 'e', 'f']
In [315]: alist.remove('Person')
In [316]: alist
Out[316]: ['a', 'b', 'c', 'd', 'e', 'f']
===
Replicating more of your case:
In [317]: df = pd.DataFrame(np.ones((2,4),int), columns=['a','b','c','d'])
In [318]: df
Out[318]:
   a  b  c  d
0  1  1  1  1
1  1  1  1  1
In [319]: cols = df.columns.values
In [320]: cols
Out[320]: array(['a', 'b', 'c', 'd'], dtype=object)
In [322]: np.argwhere(cols=='a')
Out[322]: array([[0]])
In [323]: np.delete(cols, _)
Out[323]: array(['b', 'c', 'd'], dtype=object)
In [324]: df[_]
Out[324]:
   b  c  d
0  1  1  1
1  1  1  1
In [326]: df[_323].sum(axis=1)
Out[326]:
0    3
1    3
dtype: int64
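A boolean mask is another way to drop one name without computing an index first. This is only a sketch on a stand-in array of column names; select_cols is a made-up name, and the column list is the one used in the session above rather than the asker's real data:

```python
import numpy as np

cols = np.array(['Person', 'a', 'b', 'c', 'd', 'e', 'f'])

# Comparing the array (not a bare string) gives one True/False per column.
mask = cols != 'Person'
select_cols = cols[mask]

print(select_cols)
```

The resulting array can then be used as the column selection, in the same way as the output of np.delete above.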

numpy, merge two array of different shape

For two arrays a and b,
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
I want to generate the following array
c = np.array([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']])
Is there a way to do this efficiently ?
You need:
import numpy as np
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
print(np.array(list(zip(np.concatenate(a), b))))
Output (the integers are converted to strings, because a regular array holds a single dtype):
[['1' 'a']
 ['2' 'b']
 ['3' 'c']
 ['4' 'd']]
Alternate Solution
print(np.stack((np.concatenate(a), b), axis=1))
Solution
>>> import numpy as np
>>> a = np.array([[1],[2],[3],[4]])
>>> b = np.array(['a', 'b', 'c', 'd'])
# a has shape (4, 1), so a[i] is a 1-element array and the result is an object array
>>> np.array([[a[i],b[i]] for i in range(a.shape[0])])
array([[array([1]), 'a'],
       [array([2]), 'b'],
       [array([3]), 'c'],
       [array([4]), 'd']], dtype=object)
# You want this
>>> np.array([[a[i][0],b[i]] for i in range(a.shape[0])])
array([['1', 'a'],
       ['2', 'b'],
       ['3', 'c'],
       ['4', 'd']], dtype='<U11')
>>>
Note:
You may want to reshape your 'a' array.
>>> a.shape
(4, 1)
>>> a
array([[1],
       [2],
       [3],
       [4]])
Reshape like this for easier use, for next time...
>>> a.reshape(4)
array([1, 2, 3, 4])
You can do:
c = np.vstack((a.flatten(), b)).T
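All of the answers above end up with string values, since a regular NumPy array has one dtype. The sketch below contrasts that with an object array, which keeps the integers as integers. It assumes the a and b from the question; c_str and c_obj are made-up names:

```python
import numpy as np

a = np.array([[1], [2], [3], [4]])
b = np.array(['a', 'b', 'c', 'd'])

# Option 1: make the conversion to strings explicit, then pair the columns.
c_str = np.column_stack((a.ravel().astype(str), b))

# Option 2: an object array keeps the numbers and the strings as they are.
c_obj = np.empty((len(b), 2), dtype=object)
c_obj[:, 0] = a.ravel()
c_obj[:, 1] = b

print(c_str)
print(c_obj)
```

Object arrays lose most of the speed advantages of NumPy, so the string version may still be preferable for large inputs.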

The order of list is not as expected

I have a list node_list.
In [1]: node_list
Out[1]: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1, 2, 3, 4, 5, 6]
I add nodes to a NetworkX graph G from the node_list
In [2]: import networkx as nx
In [3]: G = nx.Graph()
In [4]: G.add_nodes_from(node_list)
But when I get the list of the nodes, the pattern is changed!
In [5]: list(G.nodes())
Out[5]: ['a', 1, 'c', 'b', 'e', 'd', 'g', 'f', 'i', 'h', 'j', 2, 3, 4, 6, 5]
I want the Out[5] to be in the same pattern as the node_list but that didn't happen. How can it be done?
From the docstring of NetworkX.Graph :
Examples
--------
Create a graph object that tracks the order nodes are added.
>>> from collections import OrderedDict
>>> class OrderedNodeGraph(nx.Graph):
...     node_dict_factory=OrderedDict
>>> G=OrderedNodeGraph()
>>> G.add_nodes_from( (2,1) )
>>> G.nodes()
[2, 1]
>>> G.add_edges_from( ((2,2), (2,1), (1,1)) )
>>> G.edges()
[(2, 1), (2, 2), (1, 1)]

append list of values to sublists

How do you append each item of one list to each sublist of another list?
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
Result should be:
[['a','b','c',1],['d','e','f',2],['g','h','i',3]]
Keep in mind that I want to do this to a very large list, so efficiency and speed is important.
I've tried:
for sublist,value in a,b:
    sublist.append(value)
it returns 'ValueError: too many values to unpack'
Perhaps a listindex or a listiterator could work, but not sure how to apply here
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
for ele_a, ele_b in zip(a, b):
    ele_a.append(ele_b)
Result:
>>> a
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
Your original loop did not work because a,b creates a tuple holding the two lists; it does not pair up their elements.
>>> z = a, b
>>> type(z)
<class 'tuple'>
>>> z
([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']], [1, 2, 3])
>>> len(z[0])
3
>>> for ele in z:
...     print(ele)
...
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
[1, 2, 3]
In your original code, you are unpacking a list of 3 elements into two values, hence the 'ValueError: too many values to unpack'.
>>> list(zip(a, b))  # using zip gives you what you want
[(['a', 'b', 'c'], 1), (['d', 'e', 'f'], 2), (['g', 'h', 'i'], 3)]
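The failure and the fix can be reproduced side by side. This is a small sketch in Python 3, assuming the a and b from the question; error_message is a made-up name used only to capture the exception text:

```python
a = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
b = [1, 2, 3]

# Iterating over the tuple (a, b) yields a itself first: three sublists
# that cannot be unpacked into the two names sublist and value.
try:
    for sublist, value in (a, b):
        sublist.append(value)
except ValueError as exc:
    error_message = str(exc)

# zip pairs the i-th sublist with the i-th value, so unpacking works.
for sublist, value in zip(a, b):
    sublist.append(value)

print(error_message)
print(a)
```

The exception is raised before anything is appended, so the first loop leaves a unchanged.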
Here is a simple solution:
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
for i in range(len(a)):
    a[i].append(b[i])
print(a)
One option, using list comprehension:
a = [a[i] + [b[i]] for i in range(len(a))]
Just loop through the sublists, adding one item at a time:
for i in range(len(listA)):
    listA[i].append(listB[i])
You can do:
>>> a = [['a','b','c'],['d','e','f'],['g','h','i']]
>>> b = [1,2,3]
>>> [l1+[l2] for l1, l2 in zip(a,b)]
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
You can also abuse a side effect of list comprehensions to get this done in place:
>>> [l1.append(l2) for l1, l2 in zip(a,b)]
[None, None, None]
>>> a
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
