How to plot one column in different graphs? - python

I have the following problem. I have a DataFrame like this:
import math
import pandas as pd

f = pd.DataFrame([['Meyer', 2], ['Mueller', 4], ['Radisch', math.nan], ['Meyer', 2], ['Pavlenko', math.nan]])
Is there an elegant way to split the DataFrame into several DataFrames by the first column? For example, I would like one DataFrame where the first column equals 'Mueller' and another one where it equals 'Radisch'.
Thanks in advance,
Erik

You can loop over the unique values of column A and select each subset with boolean indexing:
import numpy as np
import pandas as pd

df = pd.DataFrame([['Meyer', 2], ['Mueller', 4],
                   ['Radisch', np.nan], ['Meyer', 2],
                   ['Pavlenko', np.nan]])
df.columns = list("AB")
print (df)
          A    B
0     Meyer  2.0
1   Mueller  4.0
2   Radisch  NaN
3     Meyer  2.0
4  Pavlenko  NaN
print (df.A.unique())
['Meyer' 'Mueller' 'Radisch' 'Pavlenko']
for x in df.A.unique():
    print(df[df.A == x])
       A    B
0  Meyer  2.0
3  Meyer  2.0
         A    B
1  Mueller  4.0
         A    B
2  Radisch  NaN
          A    B
4  Pavlenko  NaN
Then use a dict comprehension to get a dictionary of DataFrames:
dfs = {x:df[df.A == x].reset_index(drop=True) for x in df.A.unique()}
print (dfs)
{'Meyer': A B
0 Meyer 2.0
1 Meyer 2.0, 'Radisch': A B
0 Radisch NaN, 'Mueller': A B
0 Mueller 4.0, 'Pavlenko': A B
0 Pavlenko NaN}
print (dfs.keys())
dict_keys(['Meyer', 'Radisch', 'Mueller', 'Pavlenko'])
print (dfs['Meyer'])
       A    B
0  Meyer  2.0
1  Meyer  2.0
print (dfs['Pavlenko'])
          A    B
0  Pavlenko  NaN
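A minimal alternative sketch: groupby builds the same dictionary of DataFrames in one pass (using the df defined above; reset_index mirrors the dict comprehension):
# iterate (name, group) pairs instead of filtering with boolean indexing each time
dfs = {name: group.reset_index(drop=True) for name, group in df.groupby("A")}
print (dfs['Mueller'])
         A    B
0  Mueller  4.0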

Related

how to drop NAs only if all elements are NAs in a groupby in pandas

I have a dataframe that looks like this
import pandas as pd
import numpy as np
fff = pd.DataFrame({'group': ['a','a','a','b','b','b','b','c','c'], 'value': [1,2, np.nan, 1,2,3,4, np.nan, np.nan]})
I would like to drop the NAs by group, but only if all values inside the group are NAs. How could I do that?
Expected output:
fff = pd.DataFrame({'group': ['a','a','a','b','b','b','b'], 'value': [1,2, np.nan, 1,2,3,4]})
You can check value for NaN and use groupby().transform('any'):
fff = fff[(~fff['value'].isna()).groupby(fff['group']).transform('any')]
Output:
  group  value
0     a    1.0
1     a    2.0
2     a    NaN
3     b    1.0
4     b    2.0
5     b    3.0
6     b    4.0
Create a boolean Series with isna(), group it on fff['group'], transform with 'all', then exclude the rows where the result is True:
c = fff['value'].isna()
fff[~c.groupby(fff['group']).transform('all')]
  group  value
0     a    1.0
1     a    2.0
2     a    NaN
3     b    1.0
4     b    2.0
5     b    3.0
6     b    4.0
Another option:
fff["cases"] = fff.groupby("group").cumcount()
fff["null"] = fff["value"].isnull()
fff["cases 2"] = fff.groupby(["group","null"]).cumcount()
fff[~((fff["value"].isnull()) & (fff["cases"] == fff["cases 2"]))][["group","value"]]
Output:
  group  value
0     a    1.0
1     a    2.0
2     a    NaN
3     b    1.0
4     b    2.0
5     b    3.0
6     b    4.0
An addition to the answers already provided: keep only the groups that have at least one non-null value, then filter the fff dataframe with the result variable.
result = fff["value"].notna().groupby(fff["group"]).any()
result = result[result].index.tolist()
fff.query("group == @result")
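For completeness, groupby().filter() expresses the rule directly: keep a group only if it contains at least one non-null value. A minimal sketch, assuming the fff DataFrame from the question:
import numpy as np
import pandas as pd

fff = pd.DataFrame({'group': ['a','a','a','b','b','b','b','c','c'],
                    'value': [1, 2, np.nan, 1, 2, 3, 4, np.nan, np.nan]})

# drop group 'c' because every value in it is NaN
print(fff.groupby('group').filter(lambda g: g['value'].notna().any()))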

Can't create jagged dataframe in pandas?

I have a simple dataframe with 2 columns and 2 rows.
I also have a list of 4 numbers.
I want to concatenate this list to the FIRST column of the dataframe, and only the first. So the dataframe will have 6 rows in the first column, and 2 in the second.
I wrote this code:
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
numbers = [5, 6, 7, 8]
for i in range(0, 4):
    df1['A'].loc[i + 2] = numbers[i]
print(df1)
It prints the original dataframe oddly enough. But when I debug and evaluate the expression df1['A'] then it does show the new numbers. What's going on here?
It's not just that it's printing the original df; it also writes the original df to CSV when I use the to_csv method.
It seems you need:
for i in range(0, 4):
    df1.loc[0, i] = numbers[i]
print (df1)
   A  B    0    1    2    3
0  1  2  5.0  6.0  7.0  8.0
1  3  4  NaN  NaN  NaN  NaN
df1 = pd.concat([df1, pd.DataFrame([numbers], index=[0])], axis=1)
print (df1)
   A  B    0    1    2    3
0  1  2  5.0  6.0  7.0  8.0
1  3  4  NaN  NaN  NaN  NaN
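If the goal really is six values in column A and only two in B, note that a DataFrame cannot hold ragged columns, so the shorter column ends up padded with NaN. A minimal sketch of that, assuming df1 and numbers from the question:
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
numbers = [5, 6, 7, 8]

# extend column A with the list and keep B as-is; B gets padded with NaN
a = pd.concat([df1['A'], pd.Series(numbers)], ignore_index=True)
out = pd.concat([a.rename('A'), df1['B']], axis=1)
print(out)
   A    B
0  1  2.0
1  3  4.0
2  5  NaN
3  6  NaN
4  7  NaN
5  8  NaN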

Make sure Column B = a certain value when Column A is Null - Python

I want to make sure that when Column A is NULL (in csv), or NaN (in dataframe), Column B is "Cash".
I've tried this:
check = df[df['A'].isnull()]['B']
check = check.to_string(index=False)
if "Cash" not in check:
    print("Column A Fail")
else:
    print("Column A Pass!")
But it is not working.
Any suggestions?
I also need to make sure that it doesn't treat '0' as NaN.
UPDATE:
my goal is not to assign 'Cash', but rather to make sure that it's
already there as a quality check
In [40]: df
Out[40]:
     A     B
0  NaN     a
1  1.0     b
2  2.0     c
3  NaN  Cash

In [41]: df.query("A != A and B != 'Cash'")
Out[41]:
     A  B
0  NaN  a

or using boolean indexing:

In [42]: df.loc[df.A.isnull() & (df.B != 'Cash')]
Out[42]:
     A  B
0  NaN  a
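If the goal is a single pass/fail message like in the question, the offending rows can drive it directly; a minimal sketch, assuming the df from the session above:
# rows where A is null but B is not 'Cash' are the failures
bad_rows = df.loc[df.A.isnull() & (df.B != 'Cash')]
if bad_rows.empty:
    print("Column A Pass!")
else:
    print("Column A Fail")
    print(bad_rows)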
OLD answer:
Alternative solution:
In [23]: df.B = np.where(df.A.isnull(), 'Cash', df.B)
In [24]: df
Out[24]:
     A     B
0  NaN  Cash
1  1.0     b
2  2.0     c
3  NaN  Cash
another solution:
In [31]: df = df.mask(df.A.isnull(), df.assign(B='Cash'))
In [32]: df
Out[32]:
     A     B
0  NaN  Cash
1  1.0     b
2  2.0     c
3  NaN  Cash
Use loc to assign where A is null.
df.loc[df['A'].isnull(), 'B'] = 'Cash'
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=[np.nan, 1, 2, np.nan],
    B=['a', 'b', 'c', 'd']
))
print(df)
     A  B
0  NaN  a
1  1.0  b
2  2.0  c
3  NaN  d
Then do
df.loc[df['A'].isnull(), 'B'] = 'Cash'
print(df)
     A     B
0  NaN  Cash
1  1.0     b
2  2.0     c
3  NaN  Cash
To check if all B are 'Cash' where A is null:
(df.loc[df.A.isnull(), 'B'] == 'Cash').all()
According to the rules of logic, P => Q is equivalent to (not P) or Q. So
(~df.A.isnull()|(df.B=="Cash")).all()
checks all the rows.

How to do conditional statements in pandas/python with null values

How do I do conditional replacements in pandas?
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
In R, I think this code is very easy to understand:
library(dplyr)
df = df %>%
  mutate(  # mutate means create a new column, for non-R people
    my_new_column = ifelse(is.na(the_2nd_column) == TRUE & is.na(the_3rd_column) == TRUE, 'abc', 'cuz')
  )
How do I do this in pandas? It's probably a dumb question about the syntax, but I have heard np.where is the equivalent of ifelse in R...
df['new_column'] = np.where(np.nan(....help here with a conditional....))
Use np.where like this:
df['new_column'] = np.where(df[1].isnull() & df[2].isnull(), 'abc', 'cuz')
print(df)
or faster with more numpy
df['new_column'] = \
np.where(np.isnan(df[1].values) & np.isnan(df[2].values), 'abc', 'cuz')
     0    1    2 new_column
0  1.0  2.0  3.0        cuz
1  4.0  NaN  NaN        abc
2  NaN  NaN  9.0        cuz
Using np.where
In [279]: df['new'] = np.where(df[[1, 2]].isnull().all(axis=1), 'abc', 'cuz')
In [280]: df
Out[280]:
     0    1    2  new
0  1.0  2.0  3.0  cuz
1  4.0  NaN  NaN  abc
2  NaN  NaN  9.0  cuz
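A small sketch of the same condition written with pandas only, mapping the boolean result of isna().all(axis=1) onto the two labels (the df and the column name new_column are reused from the question):
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
# True where both column 1 and column 2 are NaN, then map the booleans to labels
df['new_column'] = df[[1, 2]].isna().all(axis=1).map({True: 'abc', False: 'cuz'})
print(df)
     0    1    2 new_column
0  1.0  2.0  3.0        cuz
1  4.0  NaN  NaN        abc
2  NaN  NaN  9.0        cuz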

Pandas: Merge two dataframe columns

Consider two dataframes:
import numpy as np
import pandas as pd

df_a = pd.DataFrame([
    ['a', 1],
    ['b', 2],
    ['c', np.nan],
], columns=['name', 'value'])
df_b = pd.DataFrame([
    ['a', 1],
    ['b', np.nan],
    ['c', 3],
    ['d', 4]
], columns=['name', 'value'])
So they look like this:
# df_a
  name  value
0    a      1
1    b      2
2    c    NaN

# df_b
  name  value
0    a      1
1    b    NaN
2    c      3
3    d      4
I want to merge these two dataframes and fill in the NaN values of the value column with the existing values in the other column. In other words, I want this output:
# DESIRED RESULT
  name  value
0    a      1
1    b      2
2    c      3
3    d      4
Sure, I can do this with a custom .map or .apply, but I want a solution that uses merge or the like, not writing a custom merge function. How can this be done?
I think you can use combine_first:
print (df_b.combine_first(df_a))
  name  value
0    a    1.0
1    b    2.0
2    c    3.0
3    d    4.0
Or fillna:
print (df_b.fillna(df_a))
  name  value
0    a    1.0
1    b    2.0
2    c    3.0
3    d    4.0
A solution with update is not as common as combine_first:
df_b.update(df_a)
print (df_b)
  name  value
0    a    1.0
1    b    2.0
2    c    3.0
3    d    4.0
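Since the question explicitly asks for a merge-based approach, here is a minimal sketch: outer-join on name and fill the gaps in one value column from the other (assuming df_a and df_b as defined in the question):
# outer join keeps row 'd' from df_b; suffixes distinguish the two value columns
merged = df_a.merge(df_b, on='name', how='outer', suffixes=('_a', '_b'))
merged['value'] = merged['value_a'].fillna(merged['value_b'])
print(merged[['name', 'value']])
  name  value
0    a    1.0
1    b    2.0
2    c    3.0
3    d    4.0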
