Creating array with single structured element containing an array

I have a dtype like this:
>>> dt = np.dtype([('x', object, 3)])
>>> dt
dtype([('x', 'O', (3,))])
One field named 'x', containing three pointers. I would like to construct an array with a single element of this type:
>>> a = np.array([(['a', 'b', 'c'])], dtype=dt)
>>> b = np.array([(np.array(['a', 'b', 'c'], dtype=object))], dtype=dt)
>>> c = np.array((['a', 'b', 'c']), dtype=dt)
>>> d = np.array(['a', 'b', 'c'], dtype=dt)
>>> e = np.array([([['a', 'b', 'c']])], dtype=dt)
All five of these statements yield the same incorrect result:
array([[(['a', 'a', 'a'],), (['b', 'b', 'b'],), (['c', 'c', 'c'],)]],
      dtype=[('x', 'O', (3,))])
If I try to drop the inner list/array, I get an error:
>>> f = np.array([('a', 'b', 'c')], dtype=dt)
ValueError: could not assign tuple of length 3 to structure with 1 fields.
Same error happens for
>>> g = np.array(('a', 'b', 'c'), dtype=dt)
I've run out of possible combinations to try. The result I am looking for is
array([(['a', 'b', 'c'],)], dtype=[('x', 'O', (3,))])
How do I create an array that has one element of the specified dtype?
So far, the only approach that I've found is manual assignment:
z = np.empty(1, dtype=dt)
z['x'][0, :] = ['a', 'b', 'c']
OR
z[0]['x'] = ['a', 'b', 'c']
This seems like an unnecessary workaround for something that np.array ought to be able to handle out of the box.

In [44]: dt = np.dtype([('x', object, 3)]) # corrected
In [45]: dt
Out[45]: dtype([('x', 'O', (3,))])
In [46]: np.empty(3, dt)
Out[46]:
array([([None, None, None],), ([None, None, None],),
       ([None, None, None],)], dtype=[('x', 'O', (3,))])
In [47]: np.array([(['a','b','c'],)], dt)
Out[47]: array([(['a', 'b', 'c'],)], dtype=[('x', 'O', (3,))])
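Input formatting should match output formatting: each record is a tuple with one entry per field, so a single-field record needs the trailing comma, as in (['a','b','c'],). Without the comma the parentheses only group the list and no tuple is created.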
In [48]: arr = np.empty(3, dt)
In [49]: arr['x']
Out[49]:
array([[None, None, None],
       [None, None, None],
       [None, None, None]], dtype=object)
In [50]: arr['x'][0]
Out[50]: array([None, None, None], dtype=object)
In [51]: arr['x'][0] = ['a','b','c']
In [52]: arr
Out[52]:
array([(['a', 'b', 'c'],), ([None, None, None],), ([None, None, None],)],
      dtype=[('x', 'O', (3,))])
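A minimal sketch of the same idea, assuming a recent NumPy: the record is written as a one-element tuple, and the result is compared with the allocate-then-assign approach from the question. The names `one` and `via_empty` are only illustrative.

```python
import numpy as np

dt = np.dtype([('x', object, (3,))])

# One record = one tuple; its single field holds the 3-element list.
# Note the trailing comma that makes the parentheses a tuple.
one = np.array([(['a', 'b', 'c'],)], dtype=dt)

# Equivalent construction: allocate first, then fill the field.
via_empty = np.empty(1, dtype=dt)
via_empty['x'][0] = ['a', 'b', 'c']

print(one.shape, one['x'].shape)
```

Both arrays end up with one record whose 'x' field is a length-3 object subarray.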

Related

transform 1D array of list to 2D array

For example
import pandas as pd
d1 = pd.Series(['a b', 'c d'])
t1 = d1.str.split()
a1 = t1.values
where a1 is
array([list(['a', 'b']), list(['c', 'd'])], dtype=object)
How can I transform it into
array([['a', 'b'],
       ['c', 'd']], dtype='<U1')

Use np.stack on t1:
In [186]: np.stack(t1)
Out[186]:
array([['a', 'b'],
       ['c', 'd']], dtype='<U1')
Or call np.array on t1.tolist():
In [187]: np.array(t1.tolist())
Out[187]:
array([['a', 'b'],
       ['c', 'd']], dtype='<U1')
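For reference, a sketch that reproduces the conversion without pandas. It assumes every inner list has the same length, since otherwise the rows cannot form a rectangular array. The variable names are illustrative.

```python
import numpy as np

# Build a 1-D object array whose elements are lists, like t1.values.
a1 = np.empty(2, dtype=object)
a1[0] = ['a', 'b']
a1[1] = ['c', 'd']

# Either call turns the array of lists into a regular 2-D array.
stacked = np.stack(a1)
from_list = np.array(a1.tolist())

print(stacked.shape, stacked.dtype)
```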

Dataframe column names np array has \n after one of the column names

I am trying to sum the values of select columns from a list of columns and store it in a new column. However, I keep getting
raise KeyError('%s not in index' % objarr[mask])
KeyError: "['a' 'b' 'c' 'd' 'e'\n 'f'] not in index"
This is the code where I am trying to do this:
cols = df.columns.values
colIndex = np.argwhere('Person')
selectCols = np.delete(cols, colIndex)
df['total counts'] = df[selectCols].sum(axis=1)
First, I'm not sure why the \n appears after column e, and second, I don't know what's causing this KeyError. Please help!

In [290]: np.argwhere('Person')
Out[290]: array([], shape=(1, 0), dtype=int64)
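It doesn't make sense to use that in np.delete: np.argwhere('Person') looks only at the string itself, not at your column names, so it yields no usable index and np.delete removes nothing.
Please show what cols contains.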
===
In [301]: cols = np.array(['Person', 'a', 'b', 'c', 'd', 'e', 'f'])
In [302]: idx = np.argwhere('Person')
In [303]: idx
Out[303]: array([], shape=(1, 0), dtype=int64)
In [304]: np.delete(cols, idx)
Out[304]: array(['Person', 'a', 'b', 'c', 'd', 'e', 'f'], dtype='<U6')
Instead, find where cols is equal to 'Person':
In [305]: idx = np.argwhere(cols=='Person')
In [306]: idx
Out[306]: array([[0]])
In [307]: np.delete(cols, idx)
Out[307]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype='<U6')
Or work with a list version of cols:
In [313]: alist = cols.tolist()
In [314]: alist
Out[314]: ['Person', 'a', 'b', 'c', 'd', 'e', 'f']
In [315]: alist.remove('Person')
In [316]: alist
Out[316]: ['a', 'b', 'c', 'd', 'e', 'f']
===
Replicating more of your case:
In [317]: df = pd.DataFrame(np.ones((2,4),int), columns=['a','b','c','d'])
In [318]: df
Out[318]:
   a  b  c  d
0  1  1  1  1
1  1  1  1  1
In [319]: cols = df.columns.values
In [320]: cols
Out[320]: array(['a', 'b', 'c', 'd'], dtype=object)
In [322]: np.argwhere(cols=='a')
Out[322]: array([[0]])
In [323]: np.delete(cols, _)
Out[323]: array(['b', 'c', 'd'], dtype=object)
In [324]: df[_]
Out[324]:
   b  c  d
0  1  1  1
1  1  1  1
In [326]: df[_323].sum(axis=1)
Out[326]:
0 3
1 3
dtype: int64
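Putting the fix together, here is a small NumPy-only sketch of selecting every column name except one. It assumes the names live in an array like df.columns.values; the sample names are made up. A boolean mask gives the same result without the index bookkeeping of np.argwhere plus np.delete.

```python
import numpy as np

# Stand-in for df.columns.values (hypothetical column names).
cols = np.array(['Person', 'a', 'b', 'c'], dtype=object)

# Compare against the array, not against the bare string.
idx = np.argwhere(cols == 'Person').ravel()
select_cols = np.delete(cols, idx)

# Same selection with a boolean mask.
select_mask = cols[cols != 'Person']

print(select_cols, select_mask)
```

Either result can then be used to index the DataFrame, as in df[select_cols].sum(axis=1).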

numpy, merge two array of different shape

For two arrays a and b,
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
I want to generate the following array
c = np.array([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']])
Is there a way to do this efficiently?

You need:
import numpy as np
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
print(np.array(list(zip(np.concatenate(a), b))))
Output:
[['1' 'a']
 ['2' 'b']
 ['3' 'c']
 ['4' 'd']]
Note that the integers are converted to strings, because a NumPy array holds a single dtype.
Alternate solution:
print(np.stack((np.concatenate(a), b), axis=1))

Solution
>>> import numpy as np
>>> a = np.array([[1],[2],[3],[4]])
>>> b = np.array(['a', 'b', 'c', 'd'])
# a has shape (4, 1), so each a[i] is a one-element array and the result looks odd
>>> np.array([[a[i],b[i]] for i in range(a.shape[0])])
array([[array([1]), 'a'],
       [array([2]), 'b'],
       [array([3]), 'c'],
       [array([4]), 'd']], dtype=object)
# Index the inner element to get what you want
>>> np.array([[a[i][0],b[i]] for i in range(a.shape[0])])
array([['1', 'a'],
       ['2', 'b'],
       ['3', 'c'],
       ['4', 'd']], dtype='<U11')
Note:
Consider reshaping your 'a' array to one dimension.
>>> a.shape
(4, 1)
>>> a
array([[1],
       [2],
       [3],
       [4]])
A flat array is easier to work with:
>>> a.reshape(4)
array([1, 2, 3, 4])

You can do:
c = np.vstack((a.flatten(), b)).T
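One caveat applies to all of these approaches: a regular NumPy array has a single dtype, so pairing integers with strings either converts the integers to strings or requires dtype=object. A hedged sketch of both variants, with illustrative variable names:

```python
import numpy as np

a = np.array([[1], [2], [3], [4]])
b = np.array(['a', 'b', 'c', 'd'])

# Variant 1: convert the integers explicitly, so every entry is a string.
as_str = np.column_stack((a.ravel().astype(str), b))

# Variant 2: keep the integers by filling an object array column by column.
as_obj = np.empty((len(b), 2), dtype=object)
as_obj[:, 0] = a.ravel()
as_obj[:, 1] = b

print(as_str.tolist())
print(as_obj.tolist())
```

The object array keeps the original types but loses most of NumPy's vectorized speed, so the string variant (or two separate arrays) is usually the better fit for large data.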

The order of list is not as expected

I have a list node_list.
In [1]: node_list
Out[1]: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1, 2, 3, 4, 5, 6]
I add nodes to a NetworkX graph G from the node_list
In [2]: import networkx as nx
In [3]: G = nx.Graph()
In [4]: G.add_nodes_from(node_list)
But when I get the list of the nodes, the pattern is changed!
In [5]: list(G.nodes())
Out[5]: ['a', 1, 'c', 'b', 'e', 'd', 'g', 'f', 'i', 'h', 'j', 2, 3, 4, 6, 5]
I want Out[5] to be in the same order as node_list, but it isn't. How can this be done?

From the docstring of networkx.Graph:
Examples
--------
Create a graph object that tracks the order nodes are added.
>>> from collections import OrderedDict
>>> class OrderedNodeGraph(nx.Graph):
...     node_dict_factory = OrderedDict
>>> G=OrderedNodeGraph()
>>> G.add_nodes_from( (2,1) )
>>> G.nodes()
[2, 1]
>>> G.add_edges_from( ((2,2), (2,1), (1,1)) )
>>> G.edges()
[(2, 1), (2, 2), (1, 1)]

append list of values to sublists

How do you append each item of one list to each sublist of another list?
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
Result should be:
[['a','b','c',1],['d','e','f',2],['g','h','i',3]]
Keep in mind that I want to do this to a very large list, so efficiency and speed is important.
I've tried:
for sublist,value in a,b:
    sublist.append(value)
It raises 'ValueError: too many values to unpack'.
Perhaps a listindex or a listiterator could work, but I'm not sure how to apply them here.

a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
for ele_a, ele_b in zip(a, b):
    ele_a.append(ele_b)
Result:
>>> a
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
Your original loop did not work because a,b creates a tuple of the two lists, and the for loop iterates over that tuple rather than over pairs.
>>> z = a,b
>>> type(z)
<class 'tuple'>
>>> z
([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']], [1, 2, 3])
>>> len(z[0])
3
>>> for ele in z:
...     print(ele)
...
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
[1, 2, 3]
# The first item is a list of 3 elements, which cannot be unpacked
# into the two names sublist and value, hence
# 'ValueError: too many values to unpack'.
>>> list(zip(a,b)) # zip pairs each sublist with its value
[(['a', 'b', 'c'], 1), (['d', 'e', 'f'], 2), (['g', 'h', 'i'], 3)]

Here is a simple solution:
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
for i in range(len(a)):
    a[i].append(b[i])
print(a)

One option, using list comprehension:
a = [a[i] + [b[i]] for i in range(len(a))]

Just loop through the sublists, appending one item to each:
for i in range(len(listA)):
    listA[i].append(listB[i])

You can do:
>>> a = [['a','b','c'],['d','e','f'],['g','h','i']]
>>> b = [1,2,3]
>>> [l1+[l2] for l1, l2 in zip(a,b)]
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
You can also abuse a side effect of list comprehensions to get this done in place:
>>> [l1.append(l2) for l1, l2 in zip(a,b)]
[None, None, None]
>>> a
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
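For completeness, a sketch of the in-place zip loop wrapped in a function that first checks that both lists have the same length, since zip silently stops at the shorter input. The function name `append_each` is hypothetical.

```python
def append_each(sublists, values):
    """Append values[i] to sublists[i] in place and return sublists."""
    if len(sublists) != len(values):
        raise ValueError("sublists and values must have the same length")
    for sublist, value in zip(sublists, values):
        sublist.append(value)
    return sublists


a = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
b = [1, 2, 3]
append_each(a, b)
print(a)
```

A plain loop like this is generally preferable to a list comprehension used only for its side effects, because it does not build a throwaway list of None values.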
