Create expanded/permuted dataframe from several lists


I'm very open to changing the title of the question if there's a clearer way to ask this.
I want to convert several lists into repeated columns of a dataframe. Somehow, between itertools and np.tile, I wasn't able to get the behavior I wanted.
Input:
list_1 = [1, 2]
list_2 = ['a', 'b']
list_3 = ['A', 'B']
Output:
col1 col2 col3
1 a A
1 a B
1 b A
1 b B
2 a A
2 a B
2 b A
2 b B

I think itertools.product is what you're looking for (with itertools imported and pandas imported as pd):
>>> pd.DataFrame(itertools.product(list_1, list_2, list_3))
   0  1  2
0  1  a  A
1  1  a  B
2  1  b  A
3  1  b  B
4  2  a  A
5  2  a  B
6  2  b  A
7  2  b  B
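
A self-contained version of the above, as a minimal sketch. The column names col1, col2 and col3 are taken from the desired output in the question, and the list values are assumed to be strings.

```python
import itertools

import pandas as pd

list_1 = [1, 2]
list_2 = ["a", "b"]
list_3 = ["A", "B"]

# product() yields one tuple per combination, varying the last list fastest
df = pd.DataFrame(
    list(itertools.product(list_1, list_2, list_3)),
    columns=["col1", "col2", "col3"],
)
print(df)
```

Passing columns= avoids the default 0, 1, 2 labels shown in the output above.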

Not sure how efficient this would be with very large lists, but it is a possible approach to your problem.
import pandas as pd
list_1 = [1, 2]
list_2 = ['a', 'b']
list_3 = ['A', 'B']
indices = []
values = []
# one row per combination; list_1 supplies the index, the other two lists the columns
for i in list_1:
    for m in list_2:
        for n in list_3:
            indices.append(i)
            values.append([m, n])
df = pd.DataFrame(data=values, index=indices)
print(df)
Output:
   0  1
1  a  A
1  a  B
1  b  A
1  b  B
2  a  A
2  a  B
2  b  A
2  b  B
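
The loop above puts list_1 in the index rather than in a column. If all three lists should end up as ordinary columns, as in the question's desired output, one possible variation is to collect whole rows with a comprehension. This is only a sketch; the column names are assumed from the question.

```python
import pandas as pd

list_1 = [1, 2]
list_2 = ["a", "b"]
list_3 = ["A", "B"]

# Collect one full row per combination instead of splitting index and values
rows = [(i, m, n) for i in list_1 for m in list_2 for n in list_3]
df = pd.DataFrame(rows, columns=["col1", "col2", "col3"])
print(df)
```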

Related

How can I create a dataframe based on two lists?

I want to know how I could create a dataframe based on two lists. I have the following lists:
List_time = [1,2,3]
List_item = ['a','b','c']
I want one column that repeats every time in List_time and another that cycles through the items in List_item, so that each time is paired with each item:
df = [1 a
1 b
1 c
2 a
2 b
2 c
3 a
3 b
3 c]
Sorry if it's a very basic question, I'm exhausted right now. Thanks

Use itertools.product
from itertools import product
import pandas as pd
df = pd.DataFrame(product(List_time, List_item))
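
If you would rather stay inside pandas, a cross join gives the same pairing. This is a different technique from itertools.product, sketched here under the assumption that pandas 1.2 or later is available (that is when how="cross" was added). The column names time and item are made up for illustration.

```python
import pandas as pd

List_time = [1, 2, 3]
List_item = ["a", "b", "c"]

times = pd.DataFrame({"time": List_time})  # hypothetical column name
items = pd.DataFrame({"item": List_item})  # hypothetical column name

# how="cross" pairs every row of `times` with every row of `items`
df = times.merge(items, how="cross")
print(df)
```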

Try this:
import pandas as pd
List_time = [1,2,3]
List_item = ["a","b","c"]
# repeat each time once per item, and the whole item list once per time
df = pd.DataFrame({"List_time":[i for i in List_time for _ in range(len(List_item))],
                   "List_item":List_item*len(List_time)})
#output of df;
   List_time List_item
0          1         a
1          1         b
2          1         c
3          2         a
4          2         b
5          2         c
6          3         a
7          3         b
8          3         c
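
The same repeat-then-tile pattern can also be written with NumPy's np.repeat and np.tile. A rough sketch; the fourth item "d" is added here only to show that the two lists do not need the same length.

```python
import numpy as np
import pandas as pd

List_time = [1, 2, 3]
List_item = ["a", "b", "c", "d"]  # deliberately a different length (assumption)

df = pd.DataFrame({
    # each time value is repeated once per item: 1, 1, 1, 1, 2, 2, ...
    "List_time": np.repeat(List_time, len(List_item)),
    # the whole item list is repeated once per time value: a, b, c, d, a, ...
    "List_item": np.tile(List_item, len(List_time)),
})
print(df)
```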

I like to use the itertools.product function for just this purpose. It computes the Cartesian product of the lists, and pandas ingests the result directly.
import itertools
import pandas as pd
a = [1, 2, 3]
b = ['a', 'b', 'c']
df = pd.DataFrame(data=itertools.product(a, b))
Output:
   0  1
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  2  c
6  3  a
7  3  b
8  3  c
Edit: I misread the question, my mistake

Join an array to every row in the pandas dataframe

I have a data frame and an array as follows:
df = pd.DataFrame({'x': range(0,5), 'y' : range(1,6)})
s = np.array(['a', 'b', 'c'])
I would like to attach the array to every row of the data frame, so that each row gains one extra column per array element.
What would be the most efficient way to do this?

Just plain assignment:
# replace the first `s` with your desired column names
df[s] = [s]*len(df)

Try this:
for i in s:
    df[i] = i
Output:
   x  y  a  b  c
0  0  1  a  b  c
1  1  2  a  b  c
2  2  3  a  b  c
3  3  4  a  b  c
4  4  5  a  b  c

You could use pandas.concat:
pd.concat([df, pd.DataFrame(s).T], axis=1).ffill()
output:
   x  y  0  1  2
0  0  1  a  b  c
1  1  2  a  b  c
2  2  3  a  b  c
3  3  4  a  b  c
4  4  5  a  b  c

You can try using df.loc here.
df.loc[:, s] = s
print(df)
   x  y  a  b  c
0  0  1  a  b  c
1  1  2  a  b  c
2  2  3  a  b  c
3  3  4  a  b  c
4  4  5  a  b  c
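
Another option, if you would rather not modify df in place, is DataFrame.assign, which returns a new frame. A minimal sketch; naming each new column after its value is an assumption taken from the outputs shown in the other answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(0, 5), "y": range(1, 6)})
s = np.array(["a", "b", "c"])

# Each keyword becomes a column name; the scalar value is broadcast to every row
out = df.assign(**{str(v): v for v in s})
print(out)
```

The original df keeps only its x and y columns.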

How to populate categories in one column and paste the exact value in other column

It has been a long time since I last worked with the pandas library. I searched but could not find an efficient way to do this, although there may well be an existing function for it in the library.
Let's say I have the dataframe below:
df1 = pd.DataFrame({'V1':['A','A','B'],
'V2':['B','C','C'],
'Value':[4, 1, 5]})
df1
I would like to extend this dataset so that it contains both orderings of every category pair, each with the same corresponding value:
df2 = pd.DataFrame({'V1':['A','B','A', 'C', 'B', 'C'],
'V2':['B','A','C','A','C','B'],
'Value':[4, 4 , 1, 1, 5, 5]})
df2
In other words, in df1, A and B have a Value of 4, and I also want a row in the second dataframe where B and A have a Value of 4. It is very similar to melting. I would also like to avoid a for loop; I am looking for a more efficient way.

Use:
df = pd.concat([df1, df1.rename(columns={'V2':'V1', 'V1':'V2'})]).sort_index().reset_index(drop=True)
Output:
  V1 V2  Value
0  A  B      4
1  B  A      4
2  A  C      1
3  C  A      1
4  B  C      5
5  C  B      5
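
Here is the same idea spelled out step by step, as a sketch. The only deliberate change is kind="mergesort" in sort_index: the default sort is not guaranteed to be stable, and a stable sort is what keeps each original row ahead of its swapped copy.

```python
import pandas as pd

df1 = pd.DataFrame({"V1": ["A", "A", "B"],
                    "V2": ["B", "C", "C"],
                    "Value": [4, 1, 5]})

# Swapping the two labels makes concat align the old V2 under V1 and vice versa
swapped = df1.rename(columns={"V1": "V2", "V2": "V1"})

# Both halves share the original index, so a stable sort interleaves the pairs
df2 = (pd.concat([df1, swapped])
         .sort_index(kind="mergesort")
         .reset_index(drop=True))
print(df2)
```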

Or np.vstack:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns)
  V1 V2  Value
0  A  B      4
1  A  C      1
2  B  C      5
3  B  A      4
4  C  A      1
5  C  B      5
For correct order:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index()
  V1 V2  Value
0  A  B      4
0  B  A      4
1  A  C      1
1  C  A      1
2  B  C      5
2  C  B      5
And index reset:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index().reset_index(drop=True)
  V1 V2  Value
0  A  B      4
1  B  A      4
2  A  C      1
3  C  A      1
4  B  C      5
5  C  B      5

You can use assign together with pd.concat (DataFrame.append was removed in pandas 2.0):
pd.concat([df1, df1.assign(V1=df1.V2, V2=df1.V1)], ignore_index=True)
Output:
  V1 V2  Value
0  A  B      4
1  A  C      1
2  B  C      5
3  B  A      4
4  C  A      1
5  C  B      5

Create DataFrame from multiple lists?

I have two lists
list1=['a','b','c']
list2=[1,2]
I want my dataframe output to look like:
col1 col2
a 1
a 2
b 1
b 2
c 1
c 2
How can this be done?

Use itertools.product:
import itertools
import pandas as pd
list1 = ['a','b','c']
list2 = [1,2]
df = pd.DataFrame(itertools.product(list1, list2), columns=['col1', 'col2'])
print(df)
Output:
  col1  col2
0    a     1
1    a     2
2    b     1
3    b     2
4    c     1
5    c     2

If you don't want to explicitly import itertools, pd.MultiIndex has a from_product method that you might piggyback on:
list1 = ['a','b','c']
list2 = [1, 2]
pd.DataFrame(pd.MultiIndex.from_product((list1, list2)).to_list(), columns=['col1', 'col2'])
  col1  col2
0    a     1
1    a     2
2    b     1
3    b     2
4    c     1
5    c     2
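
A slight variation on the same idea, sketched here as one possibility: give the index levels names and convert with MultiIndex.to_frame, which skips the intermediate list of tuples.

```python
import pandas as pd

list1 = ["a", "b", "c"]
list2 = [1, 2]

# names= labels the index levels, which to_frame() then uses as column names
df = pd.MultiIndex.from_product(
    [list1, list2], names=["col1", "col2"]
).to_frame(index=False)
print(df)
```

With index=False the result gets a plain 0..n-1 index instead of keeping the MultiIndex.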

Append values corresponding to a unique label as a list of values in pandas dataframe

I have got a dataframe:
df = A B
0 a 1
1 b 2
2 a 3
3 d 4
I want to update it like:
df = A B
0 a [1, 3]
1 b [2]
2 d [4]

You can groupby column A and convert the grouped elements in B to lists with apply:
df.groupby('A').B.apply(list).reset_index()
   A       B
0  a  [1, 3]
1  b     [2]
2  d     [4]
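
For completeness, a runnable sketch that rebuilds the frame from the question and applies the same groupby. Using agg(list) in place of apply(list) is an interchangeable choice here, not a correction.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a", "d"], "B": [1, 2, 3, 4]})

# Gather the B values of each A group into a list, then restore A as a column
out = df.groupby("A")["B"].agg(list).reset_index()
print(out)
```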
