Repeat values of an array on both the axes - python

Say I have this array:
array = np.array([[1,2,3],[4,5,6],[7,8,9]])
Returns:
123
456
789
How should I go about getting it to return something like this?
111222333
111222333
111222333
444555666
444555666
444555666
777888999
777888999
777888999

You'd have to use np.repeat twice here.
np.repeat(np.repeat(array, 3, axis=1), 3, axis=0)
# [[1 1 1 2 2 2 3 3 3]
# [1 1 1 2 2 2 3 3 3]
# [1 1 1 2 2 2 3 3 3]
# [4 4 4 5 5 5 6 6 6]
# [4 4 4 5 5 5 6 6 6]
# [4 4 4 5 5 5 6 6 6]
# [7 7 7 8 8 8 9 9 9]
# [7 7 7 8 8 8 9 9 9]
# [7 7 7 8 8 8 9 9 9]]
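A minimal sketch that wraps the nested np.repeat call in a reusable function. The name repeat_2d and its rows/cols parameters are hypothetical, chosen only for illustration. Repeating along axis=1 first stretches each element horizontally; repeating the result along axis=0 then stretches each row vertically.

```python
import numpy as np


def repeat_2d(arr, rows, cols):
    """Repeat every element `rows` times down and `cols` times across (hypothetical helper)."""
    # axis=1 stretches each element horizontally, axis=0 then stretches each row vertically
    return np.repeat(np.repeat(arr, cols, axis=1), rows, axis=0)


array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result = repeat_2d(array, 3, 3)
print(result.shape)  # (9, 9)
print(result[0])     # [1 1 1 2 2 2 3 3 3]
```

The two factors are independent, so repeat_2d(array, 2, 1) would only double the rows.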

Just for fun (the nested repeat will be more efficient), you could also use einsum on the input array and an array of ones with extra dimensions. This creates a multidimensional array whose dimensions are already in the right order to be reshaped into the expected 2D shape:
np.einsum('ij,ikjl->ikjl', array, np.ones((3,3,3,3), dtype=array.dtype)).reshape(9,9)
The generic method:
i,j = array.shape
k = 3 # number of times each row is repeated
l = 3 # number of times each column is repeated
np.einsum('ij,ikjl->ikjl', array, np.ones((i,k,j,l), dtype=array.dtype)).reshape(i*k,j*l)
Output:
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[4, 4, 4, 5, 5, 5, 6, 6, 6],
[4, 4, 4, 5, 5, 5, 6, 6, 6],
[4, 4, 4, 5, 5, 5, 6, 6, 6],
[7, 7, 7, 8, 8, 8, 9, 9, 9],
[7, 7, 7, 8, 8, 8, 9, 9, 9],
[7, 7, 7, 8, 8, 8, 9, 9, 9]])
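A sketch of the generic einsum version as a function, checked against the nested np.repeat; einsum_repeat is a hypothetical name. Passing dtype=arr.dtype to np.ones keeps an integer input as integers (np.ones defaults to float, which would otherwise turn the result into 1., 1., ...).

```python
import numpy as np


def einsum_repeat(arr, k, l):
    """Sketch of the einsum approach: out[i*k + kk, j*l + ll] == arr[i, j]."""
    i, j = arr.shape
    # dtype=arr.dtype keeps an integer input as integers (np.ones defaults to float)
    ones = np.ones((i, k, j, l), dtype=arr.dtype)
    # 'ij,ikjl->ikjl' broadcasts arr[i, j] over the two extra axes k and l
    return np.einsum('ij,ikjl->ikjl', arr, ones).reshape(i * k, j * l)


array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
nested = np.repeat(np.repeat(array, 3, axis=1), 3, axis=0)
print(np.array_equal(einsum_repeat(array, 3, 3), nested))  # True
print(einsum_repeat(array, 2, 4).shape)  # (6, 12)
```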
What is nice about this method, however, is that it is easy to change the index order to obtain other patterns, or to work with higher dimensions.
Example with other patterns:
>>> np.einsum('ij,iklj->iklj', array, np.ones((3,3,3,3), dtype=array.dtype)).reshape(9,9)
array([[1, 2, 3, 1, 2, 3, 1, 2, 3],
[1, 2, 3, 1, 2, 3, 1, 2, 3],
[1, 2, 3, 1, 2, 3, 1, 2, 3],
[4, 5, 6, 4, 5, 6, 4, 5, 6],
[4, 5, 6, 4, 5, 6, 4, 5, 6],
[4, 5, 6, 4, 5, 6, 4, 5, 6],
[7, 8, 9, 7, 8, 9, 7, 8, 9],
[7, 8, 9, 7, 8, 9, 7, 8, 9],
[7, 8, 9, 7, 8, 9, 7, 8, 9]])
>>> np.einsum('ij,kjil->kjil', array, np.ones((3,3,3,3), dtype=array.dtype)).reshape(9,9)
array([[1, 1, 1, 4, 4, 4, 7, 7, 7],
[2, 2, 2, 5, 5, 5, 8, 8, 8],
[3, 3, 3, 6, 6, 6, 9, 9, 9],
[1, 1, 1, 4, 4, 4, 7, 7, 7],
[2, 2, 2, 5, 5, 5, 8, 8, 8],
[3, 3, 3, 6, 6, 6, 9, 9, 9],
[1, 1, 1, 4, 4, 4, 7, 7, 7],
[2, 2, 2, 5, 5, 5, 8, 8, 8],
[3, 3, 3, 6, 6, 6, 9, 9, 9]])
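As a cross-check (this is not the answerer's method), the two alternative einsum patterns above can also be written with np.tile and np.repeat. The sketch assumes the same 3x3 array and a factor of 3, and verifies that both formulations agree.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
ones = np.ones((3, 3, 3, 3), dtype=array.dtype)

# Pattern 1: each row repeated 3 times down, its values tiled 3 times across
p1_einsum = np.einsum('ij,iklj->iklj', array, ones).reshape(9, 9)
p1_tile = np.tile(np.repeat(array, 3, axis=0), (1, 3))

# Pattern 2: each column of `array` becomes a row with every value repeated 3 times,
# and that 3-row block is stacked 3 times down
p2_einsum = np.einsum('ij,kjil->kjil', array, ones).reshape(9, 9)
p2_tile = np.tile(np.repeat(array.T, 3, axis=1), (3, 1))

print(np.array_equal(p1_einsum, p1_tile), np.array_equal(p2_einsum, p2_tile))  # True True
```

The einsum form keeps every variant as a single expression that differs only in index order, which is the flexibility described above.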

Related

How to Make my Merge output Horizontally instead of Vertically in python

I have this Python 3 code that merges my sub-lists into a single list:
l=[[4, 5, 6], [10], [1, 2, 3], [10], [1, 2, 3], [10], [4, 5, 6], [1, 2, 3], [4, 5, 6], [4, 5, 6], [7, 8, 9], [1, 2, 3], [7, 8, 9], [1, 2, 3], [4, 5, 6], [7, 8, 9], [4, 5, 6], [10], [7, 8, 9], [7, 8, 9]]
import itertools
merged = list(itertools.chain(*l))
from collections.abc import Iterable  # in Python 3.10+ Iterable can no longer be imported from collections
def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x
merged
The Undesired Shape of the Output
The output contains what I want, but its shape is not what I want:
it comes out vertically, as shown below:
[4,
5,
6,
10,
1,
2,
3,
10,
1,
2,
3,
10,
4,
5,
6,
1,
2,
3,
4,
5,
6,
4,
5,
6,
7,
8,
9,
1,
2,
3,
7,
8,
9,
1,
2,
3,
4,
5,
6,
7,
8,
9,
4,
5,
6,
10,
7,
8,
9,
7,
8,
9]
The Desirable Shape of the Output as I Would Want It
I would rather have the output come out horizontally, as shown below:
[4, 5, 6, 10, 1, 2, 3, 10, 1, 2, 3, 10, 4, 5, 6, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 10, 7, 8, 9, 7, 8, 9]
Please help me out; I don't mind if the solution is different from my code.
Just use a list comprehension instead:
l=[[4, 5, 6], [10], [1, 2, 3], [10], [1, 2, 3], [10], [4, 5, 6], [1, 2, 3], [4, 5, 6], [4, 5, 6], [7, 8, 9], [1, 2, 3], [7, 8, 9], [1, 2, 3], [4, 5, 6], [7, 8, 9], [4, 5, 6], [10], [7, 8, 9], [7, 8, 9]]
new_l=[j for i in l for j in i]
print(new_l)
Output:
C:\Users\Desktop>py x.py
[4, 5, 6, 10, 1, 2, 3, 10, 1, 2, 3, 10, 4, 5, 6, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 10, 7, 8, 9, 7, 8, 9]
You can do it like this:
In [5]: for x in merged:
...: print(x, end=' ')
...:
4 5 6 10 1 2 3 10 1 2 3 10 4 5 6 1 2 3 4 5 6 4 5 6 7 8 9 1 2 3 7 8 9 1 2 3 4 5 6 7 8 9 4 5 6 10 7 8 9 7 8 9
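The list itself has no orientation; only its display changes. Assuming the vertical output came from an IPython/Jupyter cell that ends with the bare name merged (as in the question), IPython's pretty-printer puts one item per line when a list is too long for one line, while print() writes the plain repr on a single line. A minimal sketch of both ways to get one line, using a shortened version of the question's list:

```python
import itertools

l = [[4, 5, 6], [10], [1, 2, 3], [10], [1, 2, 3]]  # shortened version of the question's list
merged = list(itertools.chain(*l))

# print() uses the plain repr, which is always a single line
print(merged)  # [4, 5, 6, 10, 1, 2, 3, 10, 1, 2, 3]

# Or build the one-line string yourself, e.g. space-separated
line = ' '.join(map(str, merged))
print(line)  # 4 5 6 10 1 2 3 10 1 2 3
```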
import itertools
import operator
from functools import reduce  # reduce is no longer a builtin in Python 3
list1=[[4, 5, 6], [10], [1, 2, 3], [10], [1, 2, 3], [10], [4, 5, 6], [1, 2, 3], [4, 5, 6], [4, 5, 6], [7, 8, 9], [1, 2, 3], [7, 8, 9], [1, 2, 3], [4, 5, 6], [7, 8, 9], [4, 5, 6], [10], [7, 8, 9], [7, 8, 9]]
dd = [item for sublist in list1 for item in sublist]
print(dd) # method 1
out1 = reduce(operator.concat,list1)
print(out1) # method 2
merged1 = list(itertools.chain.from_iterable(list1))
print(merged1) # method 3
merged2 = list(itertools.chain(*list1))
print(merged2) # method 4
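The recursive flatten generator from the question also produces the same flat list, and unlike the four methods above it handles arbitrarily deep nesting. A sketch of it in Python 3, where Iterable must come from collections.abc (the alias in collections was removed in Python 3.10); yield from replaces the inner loop:

```python
from collections.abc import Iterable


def flatten(items):
    """Yield items from any nested iterable, treating str/bytes as atoms."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)  # recurse into nested iterables
        else:
            yield x


list1 = [[4, 5, 6], [10], [1, 2, 3]]
print(list(flatten(list1)))                   # [4, 5, 6, 10, 1, 2, 3]
print(list(flatten([1, [2, [3, ["ab"]]]])))  # [1, 2, 3, 'ab']
```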

How do I find the maximum value in an array within a dataframe column?

I have a dataframe (df) that looks like this:
a b
loc.1 [1, 2, 3, 4, 7, 5, 6]
loc.2 [3, 4, 3, 7, 7, 8, 6]
loc.3 [1, 4, 3, 1, 7, 8, 6]
...
I want to find the maximum of the array in column b and append this to the original data frame. My thought was something like this:
for line in df:
    split = map(float, b.split(','))
    count_max = max(split)
    print(count_max)
Ideal output should be:
a b max_val
loc.1 [1, 2, 3, 4, 7, 5, 6] 7
loc.2 [3, 4, 3, 7, 7, 8, 6] 8
loc.3 [1, 4, 3, 1, 7, 8, 6] 8
...
But this does not work, as I cannot use b.split as it is not defined...
If you are working with lists without NaNs, the best option is max in a list comprehension or with map:
a['max'] = [max(x) for x in a['b']]
a['max'] = list(map(max, a['b']))
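Pure pandas solution (pass index=a.index so the new frame aligns with a; without it the result is indexed 0..n-1 and the assignment produces only NaN):
a['max'] = pd.DataFrame(a['b'].values.tolist(), index=a.index).max(axis=1)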
Sample:
import numpy as np
import pandas as pd
array = {'loc.1': np.array([ 1,2,3,4,7,5,6]),
'loc.2': np.array([ 3,4,3,7,7,8,6]),
'loc.3': np.array([ 1,4,3,1,7,8,6])}
L = [(k, v) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b']).set_index('a')
a['max'] = [max(x) for x in a['b']]
print (a)
b max
a
loc.1 [1, 2, 3, 4, 7, 5, 6] 7
loc.2 [3, 4, 3, 7, 7, 8, 6] 8
loc.3 [1, 4, 3, 1, 7, 8, 6] 8
EDIT:
You can also get max in list comprehension:
L = [(k, v, max(v)) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b', 'max']).set_index('a')
print (a)
b max
a
loc.1 [1, 2, 3, 4, 7, 5, 6] 7
loc.2 [3, 4, 3, 7, 7, 8, 6] 8
loc.3 [1, 4, 3, 1, 7, 8, 6] 8
Try this:
df["max_val"] = df["b"].apply(max)
You can use numpy arrays for a vectorised calculation, provided all the lists have the same length:
df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
'b': [[1, 2, 3, 4, 7, 5, 6],
[3, 4, 3, 7, 7, 8, 6],
[1, 4, 3, 1, 7, 8, 6]]})
df['maxval'] = np.array(df['b'].values.tolist()).max(axis=1)
print(df)
# a b maxval
# 0 loc.1 [1, 2, 3, 4, 7, 5, 6] 7
# 1 loc.2 [3, 4, 3, 7, 7, 8, 6] 8
# 2 loc.3 [1, 4, 3, 1, 7, 8, 6] 8
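A self-contained sketch comparing the approaches above on the question's data, assuming column b holds plain lists without NaN and that the loc.* labels are the index. The list comprehension works for lists of any length; the two vectorised variants need equal-length lists, and the pandas one needs index= to line up with the loc.* labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]},
                  index=['loc.1', 'loc.2', 'loc.3'])

# Works for lists of any length
df['max_val'] = [max(x) for x in df['b']]

# Vectorised alternatives need equal-length lists; pass index= so the new
# frame lines up with df's 'loc.*' labels instead of a 0..n-1 RangeIndex
as_frame = pd.DataFrame(df['b'].tolist(), index=df.index)
df['max_pd'] = as_frame.max(axis=1)
df['max_np'] = np.array(df['b'].tolist()).max(axis=1)

print(df[['max_val', 'max_pd', 'max_np']])
```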

Concatenating columns of lists containing NaNs in a dataframe

I have a pandas df with two columns containing either lists or NaN values. No row has NaN in both columns. I want to create a third column that merges the values of the other two columns as follows:
if row df.a is NaN -> df.c = df.b
if row df.b is NaN -> df.c = df.a
else df.c = df.a + df.b
Input:
df
a b
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Output:
df.c
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
I tried to use this nested condition with apply:
df['c'] = df.apply(lambda x: x.a if x.b is float else (x.b if x.a is float else (x['a'] + x['b'])), axis = 1)
but it gives me this error:
TypeError: ('can only concatenate list (not "float") to list', u'occurred at index 0').
I am using if x is float (and it's actually working) because it is the only way I found to tell a list apart from a NaN value.
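The check x.b is float compares the cell's value with the type object float itself, so it is always False; NaN is an instance of float, which isinstance detects. That is why the lambda falls through to x['a'] + x['b'] at index 0 and raises the error. A sketch of the same nested condition using isinstance, on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[0, 1, 2], np.nan, [0, 1, 2]],
                   'b': [np.nan, [5, 6], [5, 6]]})

nan = float('nan')
print(nan is float, isinstance(nan, float))  # False True

# NaN is a float instance, so isinstance (not "is") tells it apart from a list
df['c'] = df.apply(
    lambda x: x.a if isinstance(x.b, float)
    else (x.b if isinstance(x.a, float) else x.a + x.b),
    axis=1,
)
print(df['c'].tolist())  # [[0, 1, 2], [5, 6], [0, 1, 2, 5, 6]]
```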
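With the legacy implementation of pd.DataFrame.stack, null values are dropped by default (pandas 2.1 introduced a new implementation, opted into with future_stack=True, that no longer drops them, so check your version). We can then group by the first level of the index and concatenate the lists together with sum: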
df.stack().groupby(level=0).sum()
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
dtype: object
We can then add it to a copy of the dataframe with assign:
df.assign(c=df.stack().groupby(level=0).sum())
Or add it as a new column in place:
df['c'] = df.stack().groupby(level=0).sum()
You can convert the NaNs to empty lists (x != x is True only for NaN) and then apply np.sum:
In [718]: df['c'] = df[['a', 'b']].applymap(lambda x: [] if x != x else x).apply(np.sum, axis=1); df['c']
Out[718]:
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, ...
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, ...
9 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Name: c, dtype: object
This works for any number of columns that have list/NaN contents. (In pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map.)
You can use fillna to replace the NaNs with empty lists first:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [[0, 1, 2], np.nan, [0, 1, 2]],
'b':[np.nan,[0, 1, 2],[ 5, 6, 7, 8, 9]]})
print (df)
# one empty list per row, aligned on df's index
s = pd.Series([[] for _ in range(len(df))], index=df.index)
df['c'] = df['a'].fillna(s) + df['b'].fillna(s)
print (df)
a b c
0 [0, 1, 2] NaN [0, 1, 2]
1 NaN [0, 1, 2] [0, 1, 2]
2 [0, 1, 2] [5, 6, 7, 8, 9] [0, 1, 2, 5, 6, 7, 8, 9]

Two Combination Lists from One List

I am a Python beginner. I am trying to get two combination lists from one list.
For example, I have a list:
c = [1, 2, 3, 4]
I want to get every possible combination that uses all four items to fill two lists. There are going to be ((2^4)/2)-1 = 7 possibilities.
c1 = [1] c2 = [2, 3, 4]
c1 = [2] c2 = [1, 3, 4]
c1 = [3] c2 = [1, 2, 4]
c1 = [4] c2 = [1, 2, 3]
c1 = [1, 2] c2 = [3, 4]
c1 = [1, 3] c2 = [2, 4]
c1 = [1, 4] c2 = [2, 3]
The usual tool for this kind of task is itertools, but itertools.combinations does not let me choose the number of lists produced.
It only lets me choose how many items each combination should have.
For example, if I try the following:
print(list(itertools.combinations(c, 2)))
I only get an outcome like this:
[(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
I searched pretty hard to find this, but I couldn't find anything.
Update
Oh, my poor English has caused a lot of confusion! I have completely changed my example. I wanted to distribute the 4 items between 2 lists. Sorry for the confusion!
I'm unsure as to what your understanding of 10 choose 2 is. The output you receive from list(itertools.combinations(c, 2)) is what is mathematically defined as 10C2.
EDIT
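From the edit to your question, it appears that you want an entirely different kind of combination. The number of outcomes would still not be 45, but instead: 10C1 + 10C2 + 10C3 + 10C4 + 10C5 (strictly, every 5/5 split is counted twice in 10C5, since the complement of a 5-item c1 is itself a 5-item c1).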
I expect the following should help you move forward:
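import itertools
c = list(range(1, 11))  # the ten-item list from the original (pre-edit) question
for i in range(1, 6):
    for c1 in itertools.combinations(c, i):
        c1 = set(c1)
        c2 = set(c) - c1
        print(list(c1), list(c2))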
The above code was inspired by this (deleted) answer by CSZ.
The output received when range(1, 3) is used:
[1] [2, 3, 4, 5, 6, 7, 8, 9, 10]
[2] [1, 3, 4, 5, 6, 7, 8, 9, 10]
[3] [1, 2, 4, 5, 6, 7, 8, 9, 10]
[4] [1, 2, 3, 5, 6, 7, 8, 9, 10]
[5] [1, 2, 3, 4, 6, 7, 8, 9, 10]
[6] [1, 2, 3, 4, 5, 7, 8, 9, 10]
[7] [1, 2, 3, 4, 5, 6, 8, 9, 10]
[8] [1, 2, 3, 4, 5, 6, 7, 9, 10]
[9] [1, 2, 3, 4, 5, 6, 7, 8, 10]
[10] [1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2] [3, 4, 5, 6, 7, 8, 9, 10]
[1, 3] [2, 4, 5, 6, 7, 8, 9, 10]
[1, 4] [2, 3, 5, 6, 7, 8, 9, 10]
[1, 5] [2, 3, 4, 6, 7, 8, 9, 10]
[1, 6] [2, 3, 4, 5, 7, 8, 9, 10]
[1, 7] [2, 3, 4, 5, 6, 8, 9, 10]
[8, 1] [2, 3, 4, 5, 6, 7, 9, 10]
[1, 9] [2, 3, 4, 5, 6, 7, 8, 10]
[1, 10] [2, 3, 4, 5, 6, 7, 8, 9]
[2, 3] [1, 4, 5, 6, 7, 8, 9, 10]
[2, 4] [1, 3, 5, 6, 7, 8, 9, 10]
[2, 5] [1, 3, 4, 6, 7, 8, 9, 10]
[2, 6] [1, 3, 4, 5, 7, 8, 9, 10]
[2, 7] [1, 3, 4, 5, 6, 8, 9, 10]
[8, 2] [1, 3, 4, 5, 6, 7, 9, 10]
[9, 2] [1, 3, 4, 5, 6, 7, 8, 10]
[2, 10] [1, 3, 4, 5, 6, 7, 8, 9]
[3, 4] [1, 2, 5, 6, 7, 8, 9, 10]
[3, 5] [1, 2, 4, 6, 7, 8, 9, 10]
[3, 6] [1, 2, 4, 5, 7, 8, 9, 10]
[3, 7] [1, 2, 4, 5, 6, 8, 9, 10]
[8, 3] [1, 2, 4, 5, 6, 7, 9, 10]
[9, 3] [1, 2, 4, 5, 6, 7, 8, 10]
[10, 3] [1, 2, 4, 5, 6, 7, 8, 9]
[4, 5] [1, 2, 3, 6, 7, 8, 9, 10]
[4, 6] [1, 2, 3, 5, 7, 8, 9, 10]
[4, 7] [1, 2, 3, 5, 6, 8, 9, 10]
[8, 4] [1, 2, 3, 5, 6, 7, 9, 10]
[9, 4] [1, 2, 3, 5, 6, 7, 8, 10]
[10, 4] [1, 2, 3, 5, 6, 7, 8, 9]
[5, 6] [1, 2, 3, 4, 7, 8, 9, 10]
[5, 7] [1, 2, 3, 4, 6, 8, 9, 10]
[8, 5] [1, 2, 3, 4, 6, 7, 9, 10]
[9, 5] [1, 2, 3, 4, 6, 7, 8, 10]
[10, 5] [1, 2, 3, 4, 6, 7, 8, 9]
[6, 7] [1, 2, 3, 4, 5, 8, 9, 10]
[8, 6] [1, 2, 3, 4, 5, 7, 9, 10]
[9, 6] [1, 2, 3, 4, 5, 7, 8, 10]
[10, 6] [1, 2, 3, 4, 5, 7, 8, 9]
[8, 7] [1, 2, 3, 4, 5, 6, 9, 10]
[9, 7] [1, 2, 3, 4, 5, 6, 8, 10]
[10, 7] [1, 2, 3, 4, 5, 6, 8, 9]
[8, 9] [1, 2, 3, 4, 5, 6, 7, 10]
[8, 10] [1, 2, 3, 4, 5, 6, 7, 9]
[9, 10] [1, 2, 3, 4, 5, 6, 7, 8]
l = [1,2,3,4, 5, 6, 7, 8]
print([[l[:i], l[i:]] for i in range(1, len(l))])
If you want all contiguous sub-lists (slices, rather than all combinations), you can do it like this:
print([l[i:i+n] for i in range(len(l)) for n in range(1, len(l)-i+1)])
or
itertools.combinations
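None of the code above produces exactly the seven splits listed in the updated question. One way to do that (a sketch, not taken from any answer; two_way_splits is a hypothetical name, and it assumes the items are distinct) is to take combinations of size 1 up to n//2 as c1 and skip the duplicate half of the equal-size splits. For n items this yields 2^(n-1) - 1 splits, which matches the question's ((2^4)/2)-1 = 7.

```python
from itertools import combinations


def two_way_splits(items):
    """Yield every split of `items` into two non-empty lists (hypothetical helper).

    Assumes the items are distinct; each unordered split is produced once.
    """
    n = len(items)
    for r in range(1, n // 2 + 1):
        for c1 in combinations(items, r):
            # With equal halves, c1 and its complement would both show up as c1;
            # keep only the version whose c1 contains the first item.
            if 2 * r == n and items[0] not in c1:
                continue
            c2 = [x for x in items if x not in c1]
            yield list(c1), c2


for c1, c2 in two_way_splits([1, 2, 3, 4]):
    print('c1 =', c1, 'c2 =', c2)
```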

Add numpy array as column to Pandas data frame

I have a Pandas data frame object of shape (X,Y) that looks like this:
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
and a SciPy sparse matrix (CSC) of shape (X,Z) that looks something like this
[[0, 1, 0],
[0, 0, 1],
[1, 0, 0]]
How can I add the content from the matrix to the data frame in a new named column such that the data frame will end up like this:
[[1, 2, 3, [0, 1, 0]],
[4, 5, 6, [0, 0, 1]],
[7, 8, 9, [1, 0, 0]]]
Notice the data frame now has shape (X, Y+1) and rows from the matrix are elements in the data frame.
import numpy as np
import pandas as pd
import scipy.sparse as sparse
df = pd.DataFrame(np.arange(1,10).reshape(3,3))
arr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))
df['newcol'] = arr.toarray().tolist()
print(df)
yields
0 1 2 newcol
0 1 2 3 [0, 1, 0]
1 4 5 6 [0, 0, 1]
2 7 8 9 [1, 0, 0]
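A sketch of the same idea starting from a CSC matrix (as in the question) and storing each dense row as a NumPy array instead of a Python list, then stacking the cells back into a 2D array. It assumes the matrix is small enough to densify with toarray().

```python
import numpy as np
import pandas as pd
import scipy.sparse as sparse

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3))
arr = sparse.csc_matrix(([1, 1, 1], ([0, 1, 2], [1, 2, 0])), shape=(3, 3))

# One cell per row of the dense matrix; list() splits the 2D array into 1D rows
df['newcol'] = list(arr.toarray())

# Stack the cells back into a single (X, Z) array
back = np.vstack(df['newcol'].tolist())
print(back.shape)  # (3, 3)
```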
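Consider using a higher-dimensional data structure (a Panel) rather than storing an array in your column. (Note: pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this only works with old pandas versions; a MultiIndex DataFrame or xarray is the suggested replacement.)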
In [11]: p = pd.Panel({'df': df, 'csc': csc})
In [12]: p.df
Out[12]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
In [13]: p.csc
Out[13]:
0 1 2
0 0 1 0
1 0 0 1
2 1 0 0
Look at cross-sections, etc.:
In [14]: p.xs(0)
Out[14]:
csc df
0 0 1
1 1 2
2 0 3
See the docs for more on Panels.
df = pd.DataFrame(np.arange(1,10).reshape(3,3))
df['newcol'] = pd.Series(list(your_2d_numpy_array))  # list() gives one 1D row per cell; pd.Series rejects a 2D array
You can add a numpy array to a DataFrame and retrieve it again like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'b':range(10)}) # target dataframe
a = np.random.normal(size=(10,2)) # numpy array
df['a']=a.tolist() # save array
np.array(df['a'].tolist()) # retrieve array
This builds on the previous answer, which confused me because of the sparse part; this version works well for a non-sparse numpy array.
Here is another example:
import numpy as np
import pandas as pd
""" This just creates a list of touples, and each element of the touple is an array"""
a = [ (np.random.randint(1,10,10), np.array([0,1,2,3,4,5,6,7,8,9])) for i in
range(0,10) ]
""" Panda DataFrame will allocate each of the arrays , contained as a touple
element , as column"""
df = pd.DataFrame(data =a,columns=['random_num','sequential_num'])
The general trick is to lay out the data in the form a = [ (array_11, array_12,...,array_1n),...,(array_m1,array_m2,...,array_mn) ]; the pandas DataFrame will then arrange it into n columns of arrays. Lists of arrays can of course be used instead of tuples; in that case the form would be:
a = [ [array_11, array_12,...,array_1n],...,[array_m1,array_m2,...,array_mn] ]
This is the output if you print(df) from the code above:
random_num sequential_num
0 [7, 9, 2, 2, 5, 3, 5, 3, 1, 4] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [8, 7, 9, 8, 1, 2, 2, 6, 6, 3] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [3, 4, 1, 2, 2, 1, 4, 2, 6, 1] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [3, 1, 1, 1, 6, 2, 8, 6, 7, 9] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [4, 2, 8, 5, 4, 1, 2, 2, 3, 3] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [3, 2, 7, 4, 1, 5, 1, 4, 6, 3] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [5, 7, 3, 9, 7, 8, 4, 1, 3, 1] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [7, 4, 7, 6, 2, 6, 3, 2, 5, 6] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [3, 1, 6, 3, 2, 1, 5, 2, 2, 9] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
9 [7, 2, 3, 9, 5, 5, 8, 6, 9, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Another variation of the example above:
b = [ (i,"text",[14, 5,], np.array([0,1,2,3,4,5,6,7,8,9])) for i in
range(0,10) ]
df = pd.DataFrame(data=b,columns=['Number','Text','2Elemnt_array','10Element_array'])
Output of df:
Number Text 2Elemnt_array 10Element_array
0 0 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 1 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 2 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 3 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 4 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 5 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 6 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 7 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 8 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
9 9 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
If you want to add another column of arrays, then:
df['3Element_array'] = [[1, 2, 3] for _ in range(len(df))]
The final output of df will be:
Number Text 2Elemnt_array 10Element_array 3Element_array
0 0 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
1 1 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
2 2 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
3 3 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
4 4 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
5 5 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
6 6 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
7 7 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
8 8 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
9 9 text [14, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3]
