I have 3 lists, such as below:
list1 = [1, 2]
list2 = ['x', 'y']
list3 = ['i', 'j', 'l']
How can I multiply them (take their Cartesian product) and save the result into a pandas DataFrame like the following:
list1 list2 list3
1 x i
1 x j
1 x l
1 y i
1 y j
1 y l
2 x i
2 x j
2 x l
2 y i
2 y j
2 y l
I couldn't find any similar question on Stack Overflow.
You can use itertools.product:
import itertools
import pandas as pd
df_new = pd.DataFrame(list(itertools.product(list1, list2, list3)),
                      columns=['list1', 'list2', 'list3'])
print(df_new)
list1 list2 list3
0 1 x i
1 1 x j
2 1 x l
3 1 y i
4 1 y j
5 1 y l
6 2 x i
7 2 x j
8 2 x l
9 2 y i
10 2 y j
11 2 y l
A hack using pandas alone:
pd.MultiIndex.from_product([list1,list2,list3],names=['list1','list2','list3']).to_frame().reset_index(drop=True)
Out[196]:
list1 list2 list3
0 1 x i
1 1 x j
2 1 x l
3 1 y i
4 1 y j
5 1 y l
6 2 x i
7 2 x j
8 2 x l
9 2 y i
10 2 y j
11 2 y l
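For anyone who wants to copy and paste, here is a minimal self-contained sketch of the itertools.product approach. The helper name cartesian_frame is made up for illustration, and the list values are quoted as strings, which is an assumption about what the question meant by x, y, i, j and l.

```python
import itertools

import pandas as pd


def cartesian_frame(columns):
    """Build a DataFrame holding the Cartesian product of the given lists.

    `columns` maps each column name to its list of values
    (hypothetical helper, not part of pandas).
    """
    names = list(columns)
    rows = itertools.product(*(columns[name] for name in names))
    return pd.DataFrame(list(rows), columns=names)


df_new = cartesian_frame({
    'list1': [1, 2],
    'list2': ['x', 'y'],       # assumed to be strings
    'list3': ['i', 'j', 'l'],  # assumed to be strings
})
print(df_new)
```

The row count is the product of the list lengths (2 * 2 * 3 here), so this grows quickly with more or longer lists.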
I want to split Columns_A and Columns_B below into 3 pairs of columns.
The approach I am thinking of (but I have no idea how to write it in Python):
Break down Columns_A and Columns_B into 3 columns
Merge pass_one, pass_two and pass_three
Append Columns_C and Columns_D to the longest of the lists
Original data (I changed it to a list of lists):
Columns_A  Columns_B  Columns_C  Columns_D
1          A          X          Y
1          A          X          Y
1          A          X          Y
2          B          X          Y
2          B          X          Y
3          C          X          Y
3          C          X          Y
3          C          X          Y
3          C          X          Y
11         D          Z          Q
12         E          Z          Q
12         E          Z          Q
12         E          Z          Q
13         F          Z          Q
13         F          Z          Q
What I would like to create:
Columns_A_1  Columns_B_1  Columns_A_2  Columns_B_2  Columns_A_3  Columns_B_3  Columns_C  Columns_D
1            A            2            B            3            C            X          Y
1            A            2            B            3            C            X          Y
1            A            Blank        Blank        3            C            X          Y
Blank        Blank        Blank        Blank        3            C            X          Y
11           D            12           E            13           F            Z          Q
Blank        Blank        12           E            13           F            Z          Q
Blank        Blank        12           E            Blank        Blank        Z          Q
Code that I tried but that didn't work (no error, but pass_two and pass_three come out empty):
# (1) break down Columns_A and Columns_B into 3 columns
!pip install pandas
import pandas as pd
dic = {'Column_A': ["1","1","1","2","2","3","3","3","3","11","12","12","12","13","13"],
       'Column_B': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E', 'E', 'E', 'F', 'F'],
       'Column_C': ['X'] * 9 + ['Z'] * 6,
       'Column_D': ['Y'] * 9 + ['Q'] * 6,}
df = pd.DataFrame(dic)
list_data = df.values.tolist()
pass_one = []    # Columns_A_1 and Columns_B_1
pass_two = []    # Columns_A_2 and Columns_B_2
pass_three = []  # Columns_A_3 and Columns_B_3
for row in list_data:
    Columns_A = row[0]
    Columns_B = row[1]
    Columns_C = row[2]
    Columns_D = row[3]
    list_one = [Columns_A, Columns_B]  # I would like to append these data sets
    if Columns_C in Columns_C and Columns_A not in Columns_A:
        pass_two.append(list_one)
    if Columns_C in Columns_C and Columns_A not in Columns_A:
        pass_three.append(list_one)
    else:
        pass_one.append(list_one)
Once Columns_A and Columns_B are separated into 3 lists of lists:
I would like to merge pass_one, pass_two and pass_three.
Finally, append Columns_C and Columns_D to the longest of the lists.
Does anyone have any ideas how to do this?
This is not a complete answer, but perhaps it'll get you one step further. I assumed your sort criterion was Column_A mod 10:
# create the column we can group by; column A integers mod 10
df['Column_A_sort'] = df['Column_A'].astype(int) % 10
# group by that value
g = df.groupby('Column_A_sort')
g.agg(list)
Iterating over the groups:
for i in g.groups:
    print(g.get_group(i))
prints:
Column_A Column_B Column_C Column_D Column_A_sort
0 1 A X Y 1
1 1 A X Y 1
2 1 A X Y 1
9 11 D Z Q 1
Column_A Column_B Column_C Column_D Column_A_sort
3 2 B X Y 2
4 2 B X Y 2
10 12 E Z Q 2
11 12 E Z Q 2
12 12 E Z Q 2
Column_A Column_B Column_C Column_D Column_A_sort
5 3 C X Y 3
6 3 C X Y 3
7 3 C X Y 3
8 3 C X Y 3
13 13 F Z Q 3
14 13 F Z Q 3
As ignoring_gravity suggests, in order to go further, it'd be helpful to understand exactly your criteria for sorting and recombining the columns.
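To connect this back to the three lists in the question, here is a rough sketch that fills pass_one, pass_two and pass_three from the groups. It rests entirely on the same guess as above (bucketing by Column_A mod 10), which the question does not confirm, so treat it as a starting point only.

```python
import pandas as pd

df = pd.DataFrame({
    'Column_A': ["1", "1", "1", "2", "2", "3", "3", "3", "3",
                 "11", "12", "12", "12", "13", "13"],
    'Column_B': list('AAABBCCCCDEEEFF'),
})

# ASSUMPTION: rows are bucketed by Column_A mod 10 (1, 2 or 3).
bucket = df['Column_A'].astype(int) % 10

# One list of [Column_A, Column_B] pairs per bucket, in original row order.
passes = {
    int(key): group[['Column_A', 'Column_B']].values.tolist()
    for key, group in df.groupby(bucket)
}
pass_one, pass_two, pass_three = passes[1], passes[2], passes[3]

print(len(pass_one), len(pass_two), len(pass_three))
```

How the three lists should then be lined up row by row (and where the blanks go) still depends on the criteria asked about above.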
I have a dataframe like this
import pandas as pd
df1 = pd.DataFrame({
    'key': list('AAABBC'),
    'prop1': list('xyzuuy'),
    'prop2': list('mnbnbb')
})
key prop1 prop2
0 A x m
1 A y n
2 A z b
3 B u n
4 B u b
5 C y b
and a dictionary like this (user input):
d = {
    'A': 2,
    'B': 1,
    'C': 3,
}
The keys of d refer to entries in column key in df1; the values indicate how often the rows of df1 that belong to the respective key should be present: 1 means that nothing has to be done, 2 means all rows should be copied once, 3 means they should be copied twice.
For the example above, the expected output looks as follows:
key prop1 prop2
0 A x m
1 A y n
2 A z b
3 B u n
4 B u b
5 C y b
6 A x m # <-- copied, copy 1
7 A y n # <-- copied, copy 1
8 A z b # <-- copied, copy 1
9 C y b # <-- copied, copy 1
10 C y b # <-- copied, copy 2
So, the rows that belong to A have been copied once and added to df1, nothing had to be done about the rows that belong to B, and the rows that belong to C have been copied twice and were also added to df1.
I currently implement this as follows:
dfs_to_add = []
for el, val in d.items():
    if val > 1:
        _temp_df = pd.concat(
            [df1[df1['key'] == el]] * (val-1)
        )
        dfs_to_add.append(_temp_df)
df_to_add = pd.concat(dfs_to_add)
df_final = pd.concat([df1, df_to_add]).reset_index(drop=True)
which gives me the desired output.
The code is rather ugly; does anyone see a more straightforward option to get to the same output?
The order is important, so in case of A, I would need
0 A x m
1 A y n
2 A z b
0 A x m
1 A y n
2 A z b
and not
0 A x m
0 A x m
1 A y n
1 A y n
2 A z b
2 A z b
We can use concat + groupby:
df=pd.concat([ pd.concat([y]*d.get(x)) for x , y in df1.groupby('key')])
key prop1 prop2
0 A x m
1 A y n
2 A z b
0 A x m
1 A y n
2 A z b
3 B u n
4 B u b
5 C y b
5 C y b
5 C y b
One way using Index.repeat with loc[] and series.map:
m = df1.set_index('key',append=True)
out = m.loc[m.index.repeat(df1['key'].map(d))].reset_index('key')
print(out)
key prop1 prop2
0 A x m
0 A x m
1 A y n
1 A y n
2 A z b
2 A z b
3 B u n
4 B u b
5 C y b
5 C y b
5 C y b
You can try repeat:
df1.loc[df1.index.repeat(df1['key'].map(d))]
Output:
key prop1 prop2
0 A x m
0 A x m
1 A y n
1 A y n
2 A z b
2 A z b
3 B u n
4 B u b
5 C y b
5 C y b
5 C y b
If order is not important, use one of the other solutions.
If order is important, get the indices of the rows to repeat, select them with loc, and add them to the original:
idx = [x for k, v in d.items() for x in df1.index[df1['key'] == k].repeat(v-1)]
df = pd.concat([df1, df1.loc[idx]], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
print (df)
key prop1 prop2
0 A x m
1 A y n
2 A z b
3 B u n
4 B u b
5 C y b
6 A x m
7 A y n
8 A z b
9 C y b
10 C y b
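One caveat with repeating indices: if a key owns several rows and its count is 3 or more, Index.repeat would give 0, 0, 1, 1, ... rather than whole blocks. A minimal sketch that appends whole blocks instead, reusing df1 and d from the question (df_final mirrors the name in the question's own code, extra_blocks is made up here):

```python
import pandas as pd

df1 = pd.DataFrame({
    'key': list('AAABBC'),
    'prop1': list('xyzuuy'),
    'prop2': list('mnbnbb'),
})
d = {'A': 2, 'B': 1, 'C': 3}

# For each key, take its rows as one block and add that block (count - 1) times.
extra_blocks = [
    df1[df1['key'] == key]
    for key, count in d.items()
    for _ in range(count - 1)
]

# Original rows first, then the copied blocks in the order of d's keys.
df_final = pd.concat([df1] + extra_blocks, ignore_index=True)
print(df_final)
```

For the example data this gives the 11 rows from the question, with the A block copied as a unit after the original rows.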
Using DataFrame.merge and np.repeat:
import numpy as np
df = df1.merge(
    pd.Series(np.repeat(list(d.keys()), list(d.values())), name='key'), on='key')
Result:
# print(df)
key prop1 prop2
0 A x m
1 A x m
2 A y n
3 A y n
4 A z b
5 A z b
6 B u n
7 B u b
8 C y b
9 C y b
10 C y b
Here's my pseudo code
source
a b c d e
0 x x x x x
1 x x x x x
2 x x x x x
3 x x x x x
4 x x x x x
5 x x x x x
And then I have a lookup dataframe
lookup
a b c
0 1 2 3
Is there any function that would behave something like this - pd.source.overlay(lookup[2,c]) - producing an "overlay" at a specific position?
a b c d e
0 x x x x x
1 x x x x x
2 x x 1 2 3
3 x x x x x
4 x x x x x
5 x x x x x
Like this:
In [898]: df.iloc[2, -3:] = lu.values
In [899]: df
Out[899]:
a b c d e
0 x x x x x
1 x x x x x
2 x x 1 2 3
3 x x x x x
4 x x x x x
5 x x x x x
First we get the index, then assign the value:
df.values[2,2:]=lu.values
df
a b c d e
0 x x x x x
1 x x x x x
2 x x 1 2 3
3 x x x x x
4 x x x x x
5 x x x x x
col='c'
df.values[2,df.columns.get_indexer([col])[0]:]=lu.values
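As far as I know pandas has no built-in overlay method, but the position-based assignment above is easy to wrap. Here is a minimal sketch with a made-up helper called overlay that works on a copy and finds the start column by label. The frames are rebuilt with dtype=object, which is an assumption so that integers can be written into the string columns.

```python
import pandas as pd

source = pd.DataFrame([['x'] * 5 for _ in range(6)],
                      columns=list('abcde'), dtype=object)
lookup = pd.DataFrame([[1, 2, 3]], columns=list('abc'))


def overlay(target, patch, row, start_col):
    """Return a copy of `target` with the first row of `patch` written
    into row `row`, starting at column label `start_col`.

    Hypothetical helper for illustration, not a pandas API.
    """
    out = target.copy()
    start = out.columns.get_loc(start_col)   # label -> integer position
    stop = start + patch.shape[1]
    out.iloc[row, start:stop] = patch.iloc[0].tolist()
    return out


result = overlay(source, lookup, row=2, start_col='c')
print(result)
```

Working on a copy leaves source untouched; drop the copy() call if modifying in place is what you want.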
Setup
import numpy as np
import pandas as pd
from string import ascii_uppercase
df = pd.DataFrame(np.array(list(ascii_uppercase[:25])).reshape(5, 5))
df
0 1 2 3 4
0 A B C D E
1 F G H I J
2 K L M N O
3 P Q R S T
4 U V W X Y
Question
How do I concatenate the strings along the off diagonals?
Expected Result
0 A
1 FB
2 KGC
3 PLHD
4 UQMIE
5 VRNJ
6 WSO
7 XT
8 Y
dtype: object
What I Tried
df.unstack().groupby(sum).sum()
This works fine. But @Zero's answer is far faster.
You could do
In [1766]: arr = df.values[::-1, :] # or np.flipud(df.values)
In [1767]: N = arr.shape[0]
In [1768]: [''.join(arr.diagonal(i)) for i in range(-N+1, N)]
Out[1768]: ['A', 'FB', 'KGC', 'PLHD', 'UQMIE', 'VRNJ', 'WSO', 'XT', 'Y']
In [1769]: pd.Series([''.join(arr.diagonal(i)) for i in range(-N+1, N)])
Out[1769]:
0 A
1 FB
2 KGC
3 PLHD
4 UQMIE
5 VRNJ
6 WSO
7 XT
8 Y
dtype: object
You may also do arr.diagonal(i).sum() but ''.join is more explicit.
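Put together as a script rather than an IPython session, the same idea looks like the sketch below. The function name off_diagonal_strings is made up for illustration; the technique is the flip plus ndarray.diagonal shown above.

```python
from string import ascii_uppercase

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(list(ascii_uppercase[:25])).reshape(5, 5))


def off_diagonal_strings(frame):
    """Join the strings along each off diagonal, starting at the top-left cell."""
    arr = frame.to_numpy()[::-1, :]      # flip rows so off diagonals become diagonals
    n_rows, n_cols = arr.shape
    return pd.Series(
        [''.join(arr.diagonal(k)) for k in range(-n_rows + 1, n_cols)]
    )


result = off_diagonal_strings(df)
print(result)
```

The offsets run from -(rows - 1) to cols - 1, so this also covers non-square frames.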
I wrote this code to print out the multiplication table from 1 to 9, but it prints it out without a new line between the different tables. Does anyone know how to fix this?
for i in range(1, 10):
    for j in range(1, 10):
        k = i * j
        print(i,"x",j, "=", k)
The result is this:
1 x 1 = 1
1 x 2 = 2
1 x 3 = 3
1 x 4 = 4
1 x 5 = 5
1 x 6 = 6
1 x 7 = 7
1 x 8 = 8
1 x 9 = 9
2 x 1 = 2
2 x 2 = 4
2 x 3 = 6
2 x 4 = 8
2 x 5 = 10
2 x 6 = 12
2 x 7 = 14
2 x 8 = 16
2 x 9 = 18
Print an empty line after each iteration of your outer for i in range(1, 10): loop. This separates the tables by the number being multiplied:
for i in range(1, 4):
    for j in range(1, 4):
        k = i * j
        print(i,"x",j, "=", k)
    print()
>> 1 x 1 = 1
>> 1 x 2 = 2
>> 1 x 3 = 3
>>
>> 2 x 1 = 2
>> 2 x 2 = 4
>> 2 x 3 = 6
>>
>> 3 x 1 = 3
>> 3 x 2 = 6
>> 3 x 3 = 9
for i in range(1, 10):
    for j in range(1, 10):
        k = i * j
        print(i,"x",j, "=", k, end='\n')
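Note that end='\n' is already print's default, so on its own it does not add a blank line. If you would rather build the text first and print it once, here is a minimal sketch. The function name multiplication_tables is made up for illustration; it joins the lines of each table with a newline and the tables with an empty line.

```python
def multiplication_tables(upto):
    """Return the tables for 1..upto as one string, with a blank line between tables."""
    blocks = []
    for i in range(1, upto + 1):
        lines = [f"{i} x {j} = {i * j}" for j in range(1, upto + 1)]
        blocks.append("\n".join(lines))
    # "\n\n" puts exactly one empty line between consecutive tables.
    return "\n\n".join(blocks)


text = multiplication_tables(3)
print(text)
```

Unlike calling print() inside the loop, this leaves no extra blank line after the last table.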