I am trying to convert a dictionary into a dataframe.
import pandas as pd
dict = {'A': [1,2,3], 'B': [1,2,3,4]}
pd.DataFrame.from_dict(dict, orient = 'index').T
Expected:
A B
0 [1,2,3] [1,2,3,4]
But I got this instead:
A B
-----------
0 1 a
1 2 b
2 3 c
3 None d
Try putting the dictionary inside a list ([]):
import pandas as pd
dct = {"A": [1, 2, 3], "B": [1, 2, 3, 4]}
df = pd.DataFrame([dct])
print(df)
Prints:
A B
0 [1, 2, 3] [1, 2, 3, 4]
Note: Avoid built-in names such as dict for variable names. dict is not a reserved word, but assigning to it shadows the built-in type.
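To illustrate why wrapping the dictionary in a list helps, here is a minimal sketch. It assumes the same dct as above; the try/except is only there to show that a bare dict of unequal-length lists is rejected, and the exact error message may vary between pandas versions.

```python
import pandas as pd

dct = {"A": [1, 2, 3], "B": [1, 2, 3, 4]}

# A bare dict of lists: each list becomes a column, so the lengths must match.
try:
    pd.DataFrame(dct)
except ValueError as exc:
    print("bare dict fails:", exc)

# A list holding one dict: the dict is one row and each list is one cell.
df = pd.DataFrame([dct])
print(df.shape)
print(type(df.loc[0, "A"]))
```

Each cell then holds a whole Python list, so the columns have object dtype.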
Related
With pandas, I'm trying to keep only the rows whose list column contains a specified value. I tried Python's in operator, but it does not work. Is there a way to do this without looping?
import pandas as pd
df = pd.DataFrame({'A' : [1,2,3,4], 'B' : [[1, 2, 3], [2, 3], [3], [1, 2, 3]]})
df = 1 in df['B']
A B
0 1 [1, 2, 3]
1 2 [2, 3]
2 3 [3]
3 4 [1, 2, 3]
I'm trying to keep the rows where column B contains 1, so the expected output is:
A B
0 1 [1, 2, 3]
3 4 [1, 2, 3]
but the output is always True.
Any help or explanation is welcome. Thank you!
You need to use a loop/list comprehension:
out = df[[1 in l for l in df['B']]]
A pandas version would be more verbose and less efficient:
out = df[df['B'].explode().eq(1).groupby(level=0).any()]
Output:
A B
0 1 [1, 2, 3]
3 4 [1, 2, 3]
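As a sketch of why the original attempt always returned True, and of an equivalent way to build the mask: `in` on a Series tests the index labels, not the values. The names label_check and mask are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [[1, 2, 3], [2, 3], [3], [1, 2, 3]]})

# `1 in df['B']` checks the index labels (0, 1, 2, 3), not the lists in the column.
label_check = 1 in df['B']
print(label_check)

# Build one boolean per row instead, then use it as a mask.
mask = df['B'].map(lambda values: 1 in values)
out = df[mask]
print(out)
```

Series.map still calls the function once per row, so it is a loop in disguise, much like the list comprehension.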
For example, I have two dataframes:
a = pd.DataFrame({'A':[1,2,3],'B':[6,5,4]})
b = pd.DataFrame({'A':[3,2,1],'B':[4,5,6]})
I want to get a dataframe c that holds the larger value at each position of a and b:
# pseudocode: c.iloc[i, j] = max(a.iloc[i, j], b.iloc[i, j]) for every i, j
c = pd.DataFrame({'A':[3,2,3],'B':[6,5,6]})
I don't want to build c by comparing the values one by one, because the real dataframes in my work are very large.
Is there a ready-made pandas function that can do this? Thanks!
You could use numpy.maximum:
import pandas as pd
import numpy as np
a = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]})
b = pd.DataFrame({'A': [3, 2, 1], 'B': [4, 5, 6]})
c = np.maximum(a, b)
print(c)
Output
A B
0 3 6
1 2 5
2 3 6
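If you would rather stay within pandas, one possible alternative is DataFrame.where. This is a sketch that assumes both frames share the same index and columns and contain no missing values; the names c_where and c_numpy are only illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]})
b = pd.DataFrame({'A': [3, 2, 1], 'B': [4, 5, 6]})

# Keep a's value where it is at least as large as b's, otherwise take b's.
c_where = a.where(a >= b, b)

# The numpy version from the answer, for comparison.
c_numpy = np.maximum(a, b)

print(c_where)
```

Both approaches are vectorised, so neither compares the values in a Python-level loop.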
I have a huge pandas data frame structured like the example below:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'], 'col2': [1, 2, 5, 2, 4, 6]})
df
col1 col2
0 A 1
1 A 2
2 B 5
3 C 2
4 C 4
5 C 6
The task is to build a dictionary with elements in col1 as keys and corresponding elements in col2 as values. For the example above the output should be:
A -> [1, 2]
B -> [5]
C -> [2, 4, 6]
I wrote this solution:
from collections import defaultdict
dd = defaultdict(list)  # list, not set: sets have no append()
for row in df.itertuples():
    dd[row.col1].append(row.col2)
but I wonder whether there is a more idiomatic solution using built-in pandas functions.
Without apply, you can use a dict comprehension over the groups:
{x : y.tolist() for x , y in df.col2.groupby(df.col1)}
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
Use GroupBy.apply with list for Series of lists and then Series.to_dict:
d = df.groupby('col1')['col2'].apply(list).to_dict()
print(d)
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
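As a quick sanity check, the sketch below builds the mapping both with a plain loop and with groupby, then compares them. It uses agg(list) in place of apply(list), which should give the same result here; zip over the two columns is an assumed stand-in for itertuples.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'], 'col2': [1, 2, 5, 2, 4, 6]})

# Plain-Python version: collect the values of each key in a list.
dd = defaultdict(list)
for key, value in zip(df['col1'], df['col2']):
    dd[key].append(value)

# pandas version: aggregate each group into a list, then convert to a dict.
d = df.groupby('col1')['col2'].agg(list).to_dict()

print(d == dict(dd))
```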
Following the Pandas explode() method documentation, one can explode a list in a row into multiple rows:
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', None, [3, 4]], 'B': 1})
df.explode('A')
However, the DataFrame I get from a database contains lists that are stored as strings:
df = pd.DataFrame({'A': ["[1, 2, 3]", 'foo', None, "[3, 4]"], 'B': 1})
df.explode('A')
# Does not fail, but does not explode "[1, 2, 3]"
Outside pandas, I use ast.literal_eval(), but I don't know how to apply it to my column.
How can I cast my 'A' column to lists so that explode('A') works?
You could try Series.apply:
import ast
def f(x):
    # Parse list-like strings; leave anything else (e.g. 'foo', None) unchanged.
    try:
        return ast.literal_eval(x)
    except Exception:
        return x
df['A'] = df['A'].apply(f)
df.explode('A')
A B
0 1 1
0 2 1
0 3 1
1 foo 1
2 None 1
3 3 1
3 4 1
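A slightly stricter variant, sketched below, only tries to parse strings that look like lists and catches only the two exceptions literal_eval typically raises on malformed input. The helper name parse_list and the "starts with [" heuristic are assumptions for illustration, not part of the original answer.

```python
import ast

import pandas as pd


def parse_list(value):
    """Return a parsed list for list-like strings, otherwise the value unchanged."""
    if isinstance(value, str) and value.strip().startswith("["):
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Malformed literal: keep the original string.
            return value
    return value


df = pd.DataFrame({'A': ["[1, 2, 3]", 'foo', None, "[3, 4]"], 'B': 1})
df['A'] = df['A'].map(parse_list)
exploded = df.explode('A')
print(exploded)
```

Skipping non-list strings avoids raising and catching an exception for every ordinary value such as 'foo'.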
I have to duplicate rows that have a certain value in a column and replace the value with another value.
For instance, I have this data:
import pandas as pd
df = pd.DataFrame({'Date': [1, 2, 3, 4], 'B': [1, 2, 3, 2], 'C': ['A','B','C','D']})
Now, I want to duplicate the rows that have 2 in column 'B' and change the 2 to 4 in the copies:
df = pd.DataFrame({'Date': [1, 2, 2, 3, 4, 4], 'B': [1, 2, 4, 3, 2, 4], 'C': ['A','B','B','C','D','D']})
Please help me on this one. Thank you.
On pandas versions before 2.0, you can use append to add the rows where B == 2, selected with a boolean mask and with B reassigned to 4 via assign. DataFrame.append was removed in pandas 2.0. If order matters, you can then sort by C (to reproduce your desired frame):
>>> df.append(df[df.B.eq(2)].assign(B=4)).sort_values('C')
B C Date
0 1 A 1
1 2 B 2
1 4 B 2
2 3 C 3
3 2 D 4
3 4 D 4
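On current pandas, the same idea can be sketched with pd.concat. Sorting on the index with a stable algorithm is a choice made here so that each copy lands directly after its original row; dupes and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Date': [1, 2, 3, 4], 'B': [1, 2, 3, 2], 'C': ['A', 'B', 'C', 'D']})

# Copy the rows where B == 2 and set B to 4 in the copies.
dupes = df[df['B'].eq(2)].assign(B=4)

# Stack originals and copies, then use a stable sort on the index so that
# each copy follows the row it was made from.
out = (
    pd.concat([df, dupes])
    .sort_index(kind='mergesort')
    .reset_index(drop=True)
)
print(out)
```

reset_index(drop=True) is optional; leave it out if you want to keep the duplicated index labels as in the output above.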