Creating dataframe from list of arrays (varying shapes) - python

I am trying to convert a list of arrays of varying shapes to a dataframe.
import numpy as np
import pandas as pd
data = [np.array([[1, 2], [1, 3], [1, 1]]),
np.array([[1, 2, 3], [3, 1, 2], [3, 2, 1]])]
names = ['A', 'B']
df = pd.DataFrame(data=data, columns=names)
df
However, this raises an error:
ValueError: Shape of passed values is (2, 1), indices imply (2, 2)
I then tried:
df = pd.DataFrame(np.array([None, *data], dtype=object)[1:]).T
df
0 1
0 [[1, 2], [1, 3], [1, 1]] [[1, 2, 3], [3, 1, 2], [3, 2, 1]]
That is not the output I want.
I want each inner list in a separate row, like this:
A B
0 [1, 2] [1, 2, 3]
1 [1, 3] [3, 1, 2]
2 [1, 1] [3, 2, 1]
Not sure how to proceed.

Try:
pd.DataFrame(dict((k,list(v)) for k,v in zip(names, data)))
Output:
A B
0 [1, 2] [1, 2, 3]
1 [1, 3] [3, 1, 2]
2 [1, 1] [3, 2, 1]

Alternatively, use concat: build one Series per array and join them column-wise:
out = pd.concat([pd.Series(list(x)) for x in data], keys=names, axis=1)
A B
0 [1, 2] [1, 2, 3]
1 [1, 3] [3, 1, 2]
2 [1, 1] [3, 2, 1]

This is what worked for me. Instead of passing the data as nested lists, I passed a dictionary that maps each column name to its values, so pandas did not spread the inner lists across three columns:
data = [np.array([[1, 2],
                  [1, 3],
                  [1, 1]]),
        np.array([[1, 2, 3],
                  [3, 1, 2],
                  [3, 2, 1]])]
names = ['A', 'B']
pd.DataFrame({name: l.tolist() for name, l in zip(names, data)})
Out[5]:
A B
0 [1, 2] [1, 2, 3]
1 [1, 3] [3, 1, 2]
2 [1, 1] [3, 2, 1]
The wrong way:
pd.DataFrame(data)
>>>
0
0 [[1, 2], [1, 3], [1, 1]]
1 [[1, 2, 3], [3, 1, 2], [3, 2, 1]]
# or
pd.DataFrame([l.tolist() for l in data])
>>>
0 1 2
0 [1, 2] [1, 3] [1, 1]
1 [1, 2, 3] [3, 1, 2] [3, 2, 1]

Related

How to make a recursive function to generate the combination of numbers eg. for n=3, (1,1,1),(1,1,2) and so on?

def generate(n):
    t=[]
    lol=[[] for i in range(n**n)]
    helper(n,t,lol)
    return(lol)
def helper(n,t,lol):
    global j
    if len(t)==n:
        lol[j]=lol[j]+t
        j += 1
        return
    for i in range(1,n+1):
        print(i)
        t.append(i)
        helper(n,t,lol)
        t.pop()
j=0
print(generate(2))
print(generate(3))
For n=2, I get the expected answer.
But for n=3, it raises an IndexError:
in helper
lol[j]=lol[j]+t
IndexError: list index out of range
Try this code:
def generate(n):
    def helper(m, n, s):
        if m==0:
            print(s)
        else:
            for x in range(1,n+1):
                helper(m-1, n, s+[x])
    assert n>=1
    helper(n, n, [])
Examples:
>>> generate(1)
[1]
>>> generate(2)
[1, 1]
[1, 2]
[2, 1]
[2, 2]
>>> generate(3)
[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 1]
[1, 2, 2]
[1, 2, 3]
[1, 3, 1]
[1, 3, 2]
[1, 3, 3]
[2, 1, 1]
[2, 1, 2]
[2, 1, 3]
[2, 2, 1]
[2, 2, 2]
[2, 2, 3]
[2, 3, 1]
[2, 3, 2]
[2, 3, 3]
[3, 1, 1]
[3, 1, 2]
[3, 1, 3]
[3, 2, 1]
[3, 2, 2]
[3, 2, 3]
[3, 3, 1]
[3, 3, 2]
[3, 3, 3]
>>>
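The IndexError in the question comes from the global counter j: it is never reset, so after generate(2) it is already 4 when generate(3) starts, and it runs past the end of the 27-slot list. A minimal sketch that avoids the global by collecting results in a local list follows. The name generate_all is hypothetical, chosen here only to distinguish it from the versions above.

```python
def generate_all(n):
    """Return every length-n list over 1..n, in lexicographic order."""
    results = []

    def helper(prefix):
        # A full-length prefix is one finished combination.
        if len(prefix) == n:
            results.append(list(prefix))  # store a copy, prefix keeps changing
            return
        for value in range(1, n + 1):
            prefix.append(value)
            helper(prefix)
            prefix.pop()  # backtrack before trying the next value

    helper([])
    return results


print(generate_all(2))       # [[1, 1], [1, 2], [2, 1], [2, 2]]
print(len(generate_all(3)))  # 27
```

Because all state lives inside the function, calling it repeatedly with different n is safe.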

Get list for duplicates on an other list python

I need help deriving one list from another.
Input:
[[1, 1], [1, 1], [2, 2], [1, 1], [1, 1], [2, 2], [3, 3], [4, 4]]
Desired output:
[0, 0, 1, 0, 0, 1, 2, 3]
I tried to use enumerate but failed. Any suggestions?
Edit: Each time I encounter a new element in the list, I assign it the next number (starting from 0 and increasing by 1 for every new element). If I see the same element again later, I reuse its number. So [1, 1] --> 0 because it is the first element encountered, [2, 2] --> 1, and so on.
OK, I found a solution.
One caveat first: my example is misleading, because the input can also contain elements such as [1, 2].
The solution I found is:
line = [[1, 1], [1, 1], [2, 2], [1, 1], [2, 1], [2, 2], [3, 3], [4, 4]]
p = []
line_not = []
k = 0
for i in range (len(line)):
    if line[i] in line[:i]:
        p.append(line_not[:k].index(line[i]))
    else:
        p.append(k)
        line_not.append(line[i])
        k+=1
The output is:
[0, 0, 1, 0, 2, 1, 3, 4]
If you have a better solution, tell me!
Try building a map; this works:
inp=[[1, 1], [1, 1], [2, 2], [1, 1], [1, 1], [2, 2], [3, 3], [4, 4]]
out = [0, 0, 1, 0, 0, 1, 2, 3]
mymap={inp[0][0]:0}
output = [0]
k_count=1
for i in inp[1:]:
    if i[0] in mymap.keys():
        output.append(mymap[i[0]])
    else:
        mymap[i[0]] = k_count
        output.append(mymap[i[0]])
        k_count+=1
and then output == [0, 0, 1, 0, 0, 1, 2, 3]
First build a dictionary that associates each unique element with a number:
>>> x = [[1, 1], [1, 1], [2, 2], [1, 1], [1, 1], [2, 2], [3, 3], [4, 4]]
>>> d = {}
>>> for [i, _] in x:
...     if i not in d:
...         d[i] = len(d)
...
Then you can easily build your output list by doing lookups in that dictionary:
>>> [d[i] for [i, _] in x]
[0, 0, 1, 0, 0, 1, 2, 3]
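The dictionary lookups above key on the first entry of each pair only, so [2, 1] and [2, 2] would get the same number. Since the asker notes that pairs such as [1, 2] can occur, here is a minimal sketch that keys on the whole pair instead, converted to a tuple because lists are not hashable. The function name label_first_seen is hypothetical.

```python
def label_first_seen(items):
    """Number each distinct item by the order in which it first appears."""
    labels = {}
    result = []
    for item in items:
        key = tuple(item)  # lists cannot be dict keys, tuples can
        if key not in labels:
            labels[key] = len(labels)  # next unused number
        result.append(labels[key])
    return result


line = [[1, 1], [1, 1], [2, 2], [1, 1], [2, 1], [2, 2], [3, 3], [4, 4]]
print(label_first_seen(line))  # [0, 0, 1, 0, 2, 1, 3, 4]
```

Each lookup is a constant-time dictionary access, so the whole pass is linear in the length of the input, unlike the repeated list scans in the asker's own solution.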
This works for your current example, but it is not a general solution: it relies on the first entries first appearing in the order 1, 2, 3, and so on. Without more context it is hard to tell what you are trying to achieve, so use it with care:
import numpy as np
inp = [[1, 1], [1, 1], [2, 2], [1, 1], [1, 1], [2, 2], [3, 3], [4, 4]]
out = np.array([i[0] for i in inp]) - 1
print(out) # result: [0 0 1 0 0 1 2 3]

python: assign and read out the values of a 2-d list [duplicate]

This question already has answers here:
List of lists changes reflected across sublists unexpectedly
(17 answers)
Closed 4 years ago.
Here is a snippet of python code:
a = [[0]*3]*4
for i in range(4):
    for j in range(3):
        a[i][j] = i+j
        print(a[i][j])
print(a)
However, the two prints produce different output.
The first prints what I want, but the second prints the same values for all 4 sublists.
It seems to be a shallow-copy problem, but I really don't understand how and why it happens.
Update:
After solving this, I ran into another puzzle:
a = [[0]*3]*4
for i in range(4):
    a[i] = [i, 2*i, 3*i]
This time the result is what I want, which confuses me again.
Can anyone explain the difference?
a = [[0]*3]*4
for i in range(4):
    for j in range(3):
        a[i][j] = i+j
        print(a)
        print(a[i][j])  # show the execution at every step
print(a)
At each step, the same column is updated with the same value in every sublist.
The output is:
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
0
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
1
[[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
2
[[1, 1, 2], [1, 1, 2], [1, 1, 2], [1, 1, 2]]
1
[[1, 2, 2], [1, 2, 2], [1, 2, 2], [1, 2, 2]]
2
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
3
[[2, 2, 3], [2, 2, 3], [2, 2, 3], [2, 2, 3]]
2
[[2, 3, 3], [2, 3, 3], [2, 3, 3], [2, 3, 3]]
3
[[2, 3, 4], [2, 3, 4], [2, 3, 4], [2, 3, 4]]
4
[[3, 3, 4], [3, 3, 4], [3, 3, 4], [3, 3, 4]]
3
[[3, 4, 4], [3, 4, 4], [3, 4, 4], [3, 4, 4]]
4
[[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
5
[[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
The * operator applied to a list and an int does not copy the elements; it repeats references to them, so all four sublists are the same list object. You're on the right track.
Initialise like this instead:
a = [[0] * 3 for _ in range(4)]
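The difference raised in the question's update can be seen directly: a[i][j] = ... mutates the one shared inner list, while a[i] = [...] rebinds that slot of the outer list to a brand-new list and leaves the shared one alone. A small sketch follows; the variable names shared and independent are illustrative only.

```python
# Outer multiplication repeats a reference to a single inner list.
shared = [[0] * 3] * 4
print(all(row is shared[0] for row in shared))  # True

# Item assignment through any row mutates that single list.
shared[0][1] = 9
print(shared)  # [[0, 9, 0], [0, 9, 0], [0, 9, 0], [0, 9, 0]]

# Assigning to shared[2] replaces the reference in slot 2 with a new list.
shared[2] = [7, 7, 7]
print(shared)  # [[0, 9, 0], [0, 9, 0], [7, 7, 7], [0, 9, 0]]

# A comprehension evaluates [0] * 3 once per row, giving separate lists.
independent = [[0] * 3 for _ in range(4)]
independent[0][1] = 9
print(independent)  # [[0, 9, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```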

filter numpy array by another array of different shape

Given:
import numpy as np
a = np.array([[0, 1], [2, 2], [4, 2]])
b = np.array([[0, 1, 2, 3], [2, 2, 3, 4], [4, 2, 3, 3], [2, 3, 3, 3]])
My current solution is:
for (i,j) in zip(a[:, 0], a[:, 1]):
    print(b[np.logical_and( a[:, 0] == i, a[:, 1] == j)])
Result should be
[[0 1 2 3]]
[[2 2 3 4]]
[[4 2 3 3]]
Is there a way to solve this without the for loop?
You can use NumPy broadcasting for a vectorized solution, like so -
# 2D mask corresponding to all iterations of :
# "np.logical_and( a[:, 0] == i, a[:, 1] == j)"
mask = (a[:,None,0] == a[:,0]) & (a[:,None,1] == a[:,1])
# Use column indices of valid ones for indexing into b for final output
_,C_idx = np.where(mask)
out = b[C_idx]
Sample run -
In [67]: # Modified generic case
    ...: a = np.array([[0, 1], [3, 2], [3, 2]])
    ...: b = np.array([[0, 1, 2, 3], [2, 2, 3, 4], [4, 2, 3, 3], [2, 3, 3, 3]])
    ...:
    ...: for (i,j) in zip(a[:, 0], a[:, 1]):
    ...:     print b[np.logical_and( a[:, 0] == i, a[:, 1] == j)]
    ...:
[[0 1 2 3]]
[[2 2 3 4]
 [4 2 3 3]]
[[2 2 3 4]
 [4 2 3 3]]
In [68]: mask = (a[:,None,0] == a[:,0]) & (a[:,None,1] == a[:,1])
    ...: _,C_idx = np.where(mask)
    ...: out = b[C_idx]
    ...:
In [69]: out
Out[69]:
array([[0, 1, 2, 3],
       [2, 2, 3, 4],
       [4, 2, 3, 3],
       [2, 2, 3, 4],
       [4, 2, 3, 3]])
Given:
a = np.array(([0, 1], [2, 2], [4, 2]))
b = np.array(([0, 1, 2, 3], [2, 2, 3, 4], [4, 2, 3, 3], [2, 3, 3, 3]))
Calculate:
# np.isin is used here in place of np.in1d, which newer NumPy releases deprecate
temp = np.isin(b[:,0], a[:,0]) & np.isin(b[:,1], a[:,1])
result = b[temp]
print('result:', result)
Output:
result: [[0 1 2 3]
 [2 2 3 4]
 [4 2 3 3]]
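One caveat about the column-wise membership test above: it checks each column on its own, so a row of b whose first two entries are, say, [0, 2] would pass even though [0, 2] is not a row of a. Assuming the goal is to keep rows of b whose first two columns match some row of a as a pair, a broadcasting sketch could look like this. The extra row [0, 2, 9, 9] is added here purely to illustrate the difference and is not part of the original data.

```python
import numpy as np

a = np.array([[0, 1], [2, 2], [4, 2]])
b = np.array([[0, 1, 2, 3],
              [2, 2, 3, 4],
              [4, 2, 3, 3],
              [2, 3, 3, 3],
              [0, 2, 9, 9]])  # illustrative extra row: [0, 2] is not in a

# pair_equal[i, k] is True when b[i, :2] equals a[k] in both positions.
pair_equal = (b[:, None, :2] == a[None, :, :]).all(axis=2)
row_mask = pair_equal.any(axis=1)  # b[i, :2] matches at least one row of a
result = b[row_mask]
print(result)

# The column-wise test, for comparison, also accepts the [0, 2, 9, 9] row.
columnwise = np.isin(b[:, 0], a[:, 0]) & np.isin(b[:, 1], a[:, 1])
print(columnwise)
```

The intermediate array has shape (len(b), len(a), 2), so memory grows with the product of the two lengths; for large inputs a different approach may be preferable.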

Transposing a nxn Matrix in Python using only for/while loops

I wanted to know why the solution I have written doesn't work:
def transpose(sudoku):
    n = len(sudoku)
    l_tr = [0]*n
    k = 0
    tr_sudoku = [0]*n
    while k < n:
        tr_sudoku[k] = l_tr
        k = k+1
    i = 0
    for i in range(len(sudoku)):
        j = 0
        for j in range(len(sudoku)):
            tr_sudoku[i][j] = sudoku[j][i]
            print(j, i, tr_sudoku, sudoku[i][j])
        print(tr_sudoku)
    return tr_sudoku
correct = [[1,2,3],[2,3,1],[3,1,2]]
print(transpose(correct))
It outputs the following incorrect solution:
0 0 [[1, 0, 0], [1, 0, 0], [1, 0, 0]] 1
1 0 [[1, 2, 0], [1, 2, 0], [1, 2, 0]] 2
2 0 [[1, 2, 3], [1, 2, 3], [1, 2, 3]] 3
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
0 1 [[2, 2, 3], [2, 2, 3], [2, 2, 3]] 2
1 1 [[2, 3, 3], [2, 3, 3], [2, 3, 3]] 3
2 1 [[2, 3, 1], [2, 3, 1], [2, 3, 1]] 1
[[2, 3, 1], [2, 3, 1], [2, 3, 1]]
0 2 [[3, 3, 1], [3, 3, 1], [3, 3, 1]] 3
1 2 [[3, 1, 1], [3, 1, 1], [3, 1, 1]] 1
2 2 [[3, 1, 2], [3, 1, 2], [3, 1, 2]] 2
[[3, 1, 2], [3, 1, 2], [3, 1, 2]]
[[3, 1, 2], [3, 1, 2], [3, 1, 2]]
Help would be appreciated! Thanks.
The ideal correct solution to:
correct = [[1,2,4],[2,3,4],[3,4,2]]
would be:
tr_correct = [[1,2,3],[2,3,4],[4,4,2]]
You can easily transpose with zip:
def transpose(sudoku):
return list(map(list, zip(*sudoku)))
Example output:
>>> correct = [[1,2,3],[2,3,1],[3,1,2]]
>>> transpose(correct)
[[1, 2, 3], [2, 3, 1], [3, 1, 2]]
The easiest "manual" way is to switch rows and columns:
def transpose_manually(sudoku):
    # new grid with swapped dimensions; each row is a separate list
    output = [[None] * len(sudoku) for _ in range(len(sudoku[0]))]
    for r in range(len(sudoku)): # each row
        for c in range(len(sudoku[0])): # each column
            output[c][r] = sudoku[r][c] # switch
    return output
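As for why the attempt in the question fails: the while loop stores the same l_tr list in every slot of tr_sudoku, so all rows are one object and each write shows up in every row. A minimal sketch of that attempt with only this changed, still using only for/while loops, is below. The name transpose_fixed is hypothetical.

```python
def transpose_fixed(sudoku):
    n = len(sudoku)
    tr_sudoku = []
    k = 0
    while k < n:
        tr_sudoku.append([0] * n)  # a fresh row on every iteration
        k = k + 1
    for i in range(n):
        for j in range(n):
            tr_sudoku[i][j] = sudoku[j][i]
    return tr_sudoku


correct = [[1, 2, 4], [2, 3, 4], [3, 4, 2]]
print(transpose_fixed(correct))  # [[1, 2, 3], [2, 3, 4], [4, 4, 2]]
```

Like the original attempt, this assumes a square n x n grid; the input list is left unmodified.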
