Create expanded/permuted dataframe from several lists - python

I'm very open to changing the title of the question if there's a clearer way to ask this.
I want to convert several lists into repeated columns of a dataframe. Somehow, between itertools and np.tile, I wasn't able to get the behavior I wanted.
Input:
list_1 = [1, 2]
list_2 = [a, b]
list_3 = [A, B]
Output:
col1 col2 col3
1 a A
1 a B
1 b A
1 b B
2 a A
2 a B
2 b A
2 b B

itertools.product is I think what you're looking for:
>>> pd.DataFrame(itertools.product(list_1, list_2, list_3))
0 1 2
0 1 a A
1 1 a B
2 1 b A
3 1 b B
4 2 a A
5 2 a B
6 2 b A
7 2 b B

Not sure how efficient this would be with very large lists, but it is a possible approach to your problem.
list_1 = [1, 2]
list_2 = ['a', 'b']
list_3 = ['A', 'B']
indices = []
values = []
for i in list_1:
for m in list_2:
for n in list_3:
indices.append(i)
values.append([m,n])
df = pd.DataFrame(data=values, index=indices)
print(df)
Output:
0 1
1 a A
1 a B
1 b A
1 b B
2 a A
2 a B
2 b A
2 b B

Related

How can I create a dataframe based on two lists?

I want to know how could I create a dataframe based on two list. I have the following lists:
List_time = [1,2,3]
List_item = [a,b,c]
For every item in list_item, I want another column that agregates every time in list_time:
df = [1 a
1 b
1 c
2 a
2 b
2 c
3 a
3 b
3 c]
Sorry if it's a very basic question, I'm exhausted right now. Thanks
Use itertools.product
from itertools import product
df = pd.DataFrame(product(List_time, List_item))
Try this;
List_time = [1,2,3]
List_item = ["a","b","c"]
n = 3 # times need to repeat
import pandas as pd
df = pd.DataFrame({"List_time":[i for i in List_time for _ in range(n)],
"List_item":List_item*n})
#output of df;
List_time List_item
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
6 3 a
7 3 b
8 3 c
I like to use itertools product function for just this purpose. It will combine lists as cross products and Pandas will ingest this nicely.
import itertools
import pandas as pd
a = [1, 2, 3]
b = ['a', 'b', 'c']
df = pd.DataFrame(data=itertools.product(a, b))
Output:
0 1
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
6 3 a
7 3 b
8 3 c
Edit: I misread the question, my mistake

Join an array to every row in the pandas dataframe

I have a data frame and an array as follows:
df = pd.DataFrame({'x': range(0,5), 'y' : range(1,6)})
s = np.array(['a', 'b', 'c'])
I would like to attach the array to every row of the data frame, such that I got a data frame as follows:
What would be the most efficient way to do this?
Just plain assignment:
# replace the first `s` with your desired column names
df[s] = [s]*len(df)
Try this:
for i in s:
df[i] = i
Output:
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c
You could use pandas.concat:
pd.concat([df, pd.DataFrame(s).T], axis=1).ffill()
output:
x y 0 1 2
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c
You can try using df.loc here.
df.loc[:, s] = s
print(df)
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

How to populate categories in one column and paste the exact value in other column

It has been a long time that I dealt with pandas library. I searched for it but could not come up with an efficient way, which might be a function existed in the library.
Let's say I have the dataframe below:
df1 = pd.DataFrame({'V1':['A','A','B'],
'V2':['B','C','C'],
'Value':[4, 1, 5]})
df1
And I would like to extend this dataset and populate all the combinations of categories and put its corresponding value as exactly the same.
df2 = pd.DataFrame({'V1':['A','B','A', 'C', 'B', 'C'],
'V2':['B','A','C','A','C','B'],
'Value':[4, 4 , 1, 1, 5, 5]})
df2
In other words, in df1, A and B has Value of 4 and I also want to have a row of that B and A has Value of 4 in the second dataframe. It is very similar to melting. I also do not want to use a for loop. I am looking for a more efficient way.
Use:
df = pd.concat([df1, df1.rename(columns={'V2':'V1', 'V1':'V2'})]).sort_index().reset_index(drop=True)
Output:
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5
Or np.vstack:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns)
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5
>>>
For correct order:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index()
V1 V2 Value
0 A B 4
0 B A 4
1 A C 1
1 C A 1
2 B C 5
2 C B 5
>>>
And index reset:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index().reset_index(drop=True)
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5
>>>
You can use methods assign and append:
df1.append(df1.assign(V1=df1.V2, V2=df1.V1), ignore_index=True)
Output:
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5

Create DataFrame from multiple lists?

I have two lists
list1=['a','b','c']
list2=[1,2]
I want my dataframe output to look like:
col1 col2
a 1
a 2
b 1
b 2
c 1
c 2
How can this be done?
Use itertools.product:
import itertools
list1 = ['a','b','c']
list2 = [1,2]
df = pd.DataFrame(itertools.product(list1, list2), columns=['col1', 'col2'])
print(df)
Output:
col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
If you don't want to explicitly import itertools, pd.MultiIndex has a from_product method that you might piggyback on:
list1 = ['a','b','c']
list2 = [1, 2]
pd.DataFrame(pd.MultiIndex.from_product((list1, list2)).to_list(), columns=['col1', 'col2'])
col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2

Append values corresponding to a unique label as a list of values in pandas dataframe

I have got a dataframe:
df = A B
0 a 1
1 b 2
2 a 3
3 d 4
I want to update it like:
df = A B
0 a [1, 3]
1 b [2]
2 d [4]
You can groupby column A and convert the grouped elements in B to lists with apply:
df.groupby('A').B.apply(list).reset_index()
A B
0 a [1, 3]
1 b [2]
2 d [4]

Categories

Resources