I want to know how I can create a dataframe from two lists. I have the following lists:
List_time = [1,2,3]
List_item = ['a','b','c']
For every item in List_item, I want another column that pairs it with every time in List_time:
df = [1 a
1 b
1 c
2 a
2 b
2 c
3 a
3 b
3 c]
Sorry if it's a very basic question, I'm exhausted right now. Thanks
Use itertools.product:
from itertools import product
import pandas as pd
df = pd.DataFrame(product(List_time, List_item))
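If you also want readable column labels instead of the default 0 and 1, something like the following should work. It is only a sketch: the column names "time" and "item" are my own choice, not something from the question.

```python
from itertools import product

import pandas as pd

List_time = [1, 2, 3]
List_item = ["a", "b", "c"]

# product yields (time, item) tuples; the first list varies slowest
df = pd.DataFrame(product(List_time, List_item), columns=["time", "item"])
print(df)
```

Swapping the two arguments to product changes which column varies slowest.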
Try this:
import pandas as pd
List_time = [1,2,3]
List_item = ["a","b","c"]
# repeat each time once per item, and cycle through the items once per time
df = pd.DataFrame({"List_time": [i for i in List_time for _ in range(len(List_item))],
                   "List_item": List_item * len(List_time)})
# output of df:
List_time List_item
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
6 3 a
7 3 b
8 3 c
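The same repeat-and-cycle pattern can also be written with NumPy, taking the repeat counts from the list lengths so that it covers lists of different sizes too. This is a sketch under my own assumptions: the two-item List_item below is made up to show the unequal case.

```python
import numpy as np
import pandas as pd

List_time = [1, 2, 3]
List_item = ["a", "b"]  # deliberately a different length (hypothetical input)

df = pd.DataFrame({
    # repeat every time once per item: 1, 1, 2, 2, 3, 3
    "List_time": np.repeat(List_time, len(List_item)),
    # cycle through the items once per time: a, b, a, b, a, b
    "List_item": np.tile(List_item, len(List_time)),
})
print(df)
```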
I like to use the itertools.product function for just this purpose. It combines the lists as a Cartesian product, and pandas ingests the result nicely.
import itertools
import pandas as pd
a = [1, 2, 3]
b = ['a', 'b', 'c']
df = pd.DataFrame(data=itertools.product(a, b))
Output:
0 1
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
6 3 a
7 3 b
8 3 c
Edit: I misread the question, my mistake
Related
I have a dict in python like this:
d = {"a": [1,2,3], "b": [4,5,6]}
I want to transform it into a dataframe like this:
letter number
a 1
a 2
a 3
b 4
b 5
b 6
I have tried this code:
df = pd.DataFrame.from_dict(d, orient = 'index').T
but this gave me:
a b
1 4
2 5
3 6
You can always read your data in as you already have and then .melt it. When passed no id_vars or value_vars, melt stacks every column into rows, with one column holding the original column name and another holding the value:
import pandas as pd
d = {"a": [1,2,3], "b": [4,5,6]}
out = pd.DataFrame(d).melt(var_name='letter', value_name='value')
print(out)
letter value
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
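One caveat: pd.DataFrame(d) needs all the lists to have the same length. If yours may differ, a possible workaround is to wrap each list in a pd.Series first so that pandas pads the short ones with NaN, then drop the padding after melting. This is only a sketch, and the ragged dict below is made up for illustration.

```python
import pandas as pd

d = {"a": [1, 2, 3], "b": [4, 5]}  # hypothetical input with unequal lengths

out = (
    pd.DataFrame({k: pd.Series(v) for k, v in d.items()})  # short columns are padded with NaN
    .melt(var_name="letter", value_name="number")
    .dropna(subset=["number"])   # remove the padding rows
    .astype({"number": int})     # the NaN padding forced floats; convert back
    .reset_index(drop=True)
)
print(out)
```

Note that this would also drop any genuine NaN values in the data, so it only fits when the lists contain no missing values of their own.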
To use 'letter' and 'number' as column labels you could use:
a2 = [[key, val] for key, x in d.items() for val in x]
dict2 = pd.DataFrame(a2, columns = ['letter', 'number'])
which gives
letter number
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
Yet another possible solution:
(pd.Series(d, index=d.keys(), name='numbers')
.rename_axis('letters').reset_index()
.explode('numbers', ignore_index=True))
Output:
letters numbers
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
This will yield what you want (there might be a simpler way though):
import pandas as pd
my_dict = {"a": [1,2,3], "b": [4,5,6]}
my_list = [[key, val] for key in my_dict for val in my_dict[key] ]
df = pd.DataFrame(my_list, columns=['letter','number'])
df
# Out[106]:
# letter number
# 0 a 1
# 1 a 2
# 2 a 3
# 3 b 4
# 4 b 5
# 5 b 6
I have a list and I want to pair its elements in all combinations, then put the pairs into two columns of a pandas dataframe, like:
df = pd.DataFrame(columns = ["pair1","pair2"])
mylist = ["a", "b", "c"]
for i in mylist:
    for j in mylist:
        df.loc[df.shape[0]] = [i, j]
to output
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c
However, this row-by-row assignment is slow.
Is there a faster method?
For a pandas solution, you could use pd.MultiIndex:
df[['pair1','pair2']] = pd.MultiIndex.from_product([mylist]*2).tolist()
or you could also cross-merge (if you have pandas>=1.2.0):
df = pd.merge(pd.Series(mylist, name='pair1'), pd.Series(mylist, name='pair2'), how='cross')
Output:
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c
You can use itertools.product() to generate the data ahead of time, rather than repeatedly appending to the end of the dataframe:
import pandas as pd
from itertools import product
mylist = ["a", "b", "c"]
df = pd.DataFrame(product(mylist, repeat=2), columns = ["pair1","pair2"])
print(df)
This outputs:
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c
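If you want to check the difference on your own machine, a rough timing sketch could look like this. The function names and the 10-item list are my own choices for illustration, and the numbers will vary with hardware and pandas version, so I am not quoting any.

```python
import timeit
from itertools import product

import pandas as pd

mylist = list("abcdefghij")  # 10 items -> 100 rows (arbitrary size for the test)

def with_loc():
    # the approach from the question: grow the frame one row at a time
    df = pd.DataFrame(columns=["pair1", "pair2"])
    for i in mylist:
        for j in mylist:
            df.loc[df.shape[0]] = [i, j]
    return df

def with_product():
    # build all rows first, then construct the frame once
    return pd.DataFrame(product(mylist, repeat=2), columns=["pair1", "pair2"])

print("loc append:", timeit.timeit(with_loc, number=3))
print("product:   ", timeit.timeit(with_product, number=3))
```

Both functions should return the same rows in the same order, so the comparison is like for like.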
One fast method is expand_grid from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor as jn
others = {'pair1': mylist, 'pair2': mylist}
jn.expand_grid(others = others).droplevel(axis = 1, level = 1)
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c
I'm very open to changing the title of the question if there's a clearer way to ask this.
I want to convert several lists into repeated columns of a dataframe. Somehow, between itertools and np.tile, I wasn't able to get the behavior I wanted.
Input:
list_1 = [1, 2]
list_2 = ['a', 'b']
list_3 = ['A', 'B']
Output:
col1 col2 col3
1 a A
1 a B
1 b A
1 b B
2 a A
2 a B
2 b A
2 b B
I think itertools.product is what you're looking for:
>>> pd.DataFrame(itertools.product(list_1, list_2, list_3))
0 1 2
0 1 a A
1 1 a B
2 1 b A
3 1 b B
4 2 a A
5 2 a B
6 2 b A
7 2 b B
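To get the col1/col2/col3 labels from the question, and to avoid spelling out each list by hand, one option is to keep the lists in a dict and unpack it. This is just a sketch; the dict name `lists` is my own.

```python
import itertools

import pandas as pd

# keys become the column labels, values are the lists to combine
lists = {"col1": [1, 2], "col2": ["a", "b"], "col3": ["A", "B"]}

# unpack the values into product; dicts keep insertion order, so columns line up
df = pd.DataFrame(itertools.product(*lists.values()), columns=list(lists))
print(df)
```

This should work for any number of lists, though the row count is the product of their lengths, so it grows quickly.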
Not sure how efficient this would be with very large lists, but it is a possible approach to your problem.
list_1 = [1, 2]
list_2 = ['a', 'b']
list_3 = ['A', 'B']
indices = []
values = []
for i in list_1:
    for m in list_2:
        for n in list_3:
            indices.append(i)
            values.append([m,n])
df = pd.DataFrame(data=values, index=indices)
print(df)
Output:
0 1
1 a A
1 a B
1 b A
1 b B
2 a A
2 a B
2 b A
2 b B
I have a dictionary like the below
d = {'a':'1,2,3','b':'3,4,5,6'}
I want to create dataframes from it in a loop, such as
a = 1,2,3
b = 3,4,5,6
Creating a single dataframe that can reference dictionary keys such as df['a'] does not work for what I am trying to achieve. Any suggestions?
Try this to get a list of dataframes:
>>> import pandas as pd
>>> import numpy as np
>>> dfs = [pd.DataFrame(np.array(b.split(',')), columns=[a]) for a,b in d.items()]
gives the following output
>>> dfs[0]
a
0 1
1 2
2 3
>>> dfs[1]
b
0 3
1 4
2 5
3 6
To convert your dictionary into a list of DataFrames, run:
lst = [ pd.Series(v.split(','), name=k).to_frame()
for k, v in d.items() ]
Then, for your sample data, lst[0] contains:
a
0 1
1 2
2 3
and lst[1]:
b
0 3
1 4
2 5
3 6
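If you would rather look the frames up by name than by position, a dict of DataFrames may be more convenient than a list. Here is a sketch; the name `frames` and the conversion to int are my own additions, on the assumption that the values are always whole numbers.

```python
import pandas as pd

d = {'a': '1,2,3', 'b': '3,4,5,6'}

# one single-column DataFrame per key, with the strings converted to integers
frames = {
    k: pd.DataFrame({k: [int(x) for x in v.split(',')]})
    for k, v in d.items()
}

print(frames['a'])
print(frames['b'])
```

frames['a'] and frames['b'] are then separate DataFrames rather than columns of one shared frame.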
Hope this helps:
dfs = []
for key, value in d.items():
    # split the comma-separated string so the commas do not end up as rows
    df = pd.DataFrame(value.split(','), columns=[key])
    dfs.append(df)
What I want to achieve is the following in Pandas:
a = [1,2,3,4]
b = ['a', 'b']
Can I create a DataFrame like:
column1 column2
'a' 1
'a' 2
'a' 3
'a' 4
'b' 1
'b' 2
'b' 3
'b' 4
Use itertools.product with DataFrame constructor:
a = [1, 2, 3, 4]
b = ['a', 'b']
from itertools import product
# pandas 0.24.0+
df = pd.DataFrame(product(b, a), columns=['column1', 'column2'])
# older pandas versions
# df = pd.DataFrame(list(product(b, a)), columns=['column1', 'column2'])
print (df)
column1 column2
0 a 1
1 a 2
2 a 3
3 a 4
4 b 1
5 b 2
6 b 3
7 b 4
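A pandas-only alternative, in case you prefer not to import itertools, is to build the combinations as a MultiIndex and turn it into a frame. This is a sketch of that idea, not a claim that it is faster.

```python
import pandas as pd

a = [1, 2, 3, 4]
b = ['a', 'b']

# from_product builds every (letter, number) combination, first list varying slowest
idx = pd.MultiIndex.from_product([b, a], names=['column1', 'column2'])

# index=False gives a plain 0..n-1 index instead of reusing the MultiIndex
df = idx.to_frame(index=False)
print(df)
```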
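Here is another method, in case someone prefers a list comprehension. Note that the rows come out grouped by number rather than by letter; swap the two for clauses to get the order shown in the question.
Full mockup below: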
import pandas as pd
a = [1,2,3,4]
b = ['a', 'b']
df=pd.DataFrame([(y, x) for x in a for y in b], columns=['column1','column2'])
df
result below:
column1 column2
0 a 1
1 b 1
2 a 2
3 b 2
4 a 3
5 b 3
6 a 4
7 b 4