I want to use a value from a specific column in my Pandas dataframe as the Y-axis label. The reason for this is that the label could change depending on the Unit of Measure (UoM) - it could be kg, number of bags etc.
# Create a function using plant and material input to chart planned and actual manufactured quantities
def filter_df(df, plant: str = "", material: str = ""):
    output_df = df.loc[(df['Plant'] == plant) & (df['Material'].str.contains(material))].reset_index()
    return output_df['Planned_Qty_Cumsum'].plot.area(label = 'Planned Quantity'),\
        output_df['Goods_Receipted_Qty_Cumsum'].plot.line(label = 'Delivered Quantity'),\
        plt.title('Planned and Delivered Quantities'),\
        plt.legend(),\
        plt.xlabel('Number of Process Orders'),\
        plt.ylabel(output_df['UoM (of GR)']),\
        plt.show()
#run function
filter_df(df_yield_data_formatted,'*plant*','*material*')
When running the function I get the following error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Yes, you can. As written, though, you are passing every value in that column (a whole Series) to plt.ylabel, which expects a single label. Indicate which row and column you want for the label, using iloc for instance, and it will work.
plt.ylabel(df.iloc[2,1])
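A minimal sketch of the same idea applied to the filtered frame from the question. The sample data and the helper name uom_label are assumptions made for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the filtered output_df built inside filter_df.
output_df = pd.DataFrame({
    "Planned_Qty_Cumsum": [10, 25, 40],
    "Goods_Receipted_Qty_Cumsum": [8, 20, 38],
    "UoM (of GR)": ["KG", "KG", "KG"],
})

def uom_label(frame, column="UoM (of GR)"):
    """Return one string label built from the distinct units in the column."""
    units = frame[column].dropna().unique()
    # Join the distinct units so a mixed selection is still visible on the axis.
    return ", ".join(str(unit) for unit in units)

y_label = uom_label(output_df)  # a plain string, safe to pass to plt.ylabel
```

Inside filter_df, plt.ylabel(y_label) would then replace plt.ylabel(output_df['UoM (of GR)']). If every row is known to share one unit, output_df['UoM (of GR)'].iloc[0] is enough.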
Related
I am doing a conditional multiplication within a data frame, using the following syntax:
if(df_merged1["region_id"]=="EMEA"):
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]-df_merged1["TX_f"]
else:
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]
I want tax to be subtracted only when the region is EMEA, but I get the following error:
ValueError: The truth value of a {type(self).__name__} is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think there is a problem with how I wrote the if condition, but I have no idea how to resolve it.
There is no problem here - df_merged1["region_id"] == "EMEA" returns a pd.Series instance populated with boolean values, not a boolean that can be handled using conditional statements. Pandas is reluctant to automatically run a method that would convert a pd.Series instance to a boolean like pd.Series.any() or pd.Series.all(), hence the error.
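A small sketch of that behaviour, using a made-up Series in place of df_merged1["region_id"]:

```python
import pandas as pd

# Hypothetical region column standing in for df_merged1["region_id"].
region = pd.Series(["EMEA", "APAC", "EMEA"])

mask = region == "EMEA"  # element-wise comparison: a boolean Series

try:
    if mask:  # asks for ONE truth value for the whole Series
        pass
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Explicit reductions say what is actually meant.
any_emea = bool(mask.any())  # at least one row is EMEA
all_emea = bool(mask.all())  # every row is EMEA
```

Neither reduction is what the question needs, though, because the decision has to be made per row rather than once for the whole column.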
To achieve what you have meant to do for reasonably sized dataframes, use pd.DataFrame.apply, axis=1 with a lambda expression and a ternary operator. That way you populate a column ["fcst_gr"] based on value in column ["region_id"] for each individual row:
df_merged1["fcst_gr"] = df_merged1.apply(
    lambda row: row["plan_price_amount"] * (row["Enr"] - row["FM_f"])
    + row["OA_f"]
    - row["TX_f"]
    if row["region_id"] == "EMEA"
    else row["plan_price_amount"] * (row["Enr"] - row["FM_f"]) + row["OA_f"],
    axis=1,
)
For bigger dataframes or more complex scenarios, consider a vectorized solution such as np.where or boolean-mask assignment, which avoids calling a Python function once per row.
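One possible vectorized version, as a sketch: the sample values are invented, and the column names are taken from the question. The shared part of the formula is computed once and the tax is subtracted only where the region is EMEA.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_merged1 = pd.DataFrame({
    "region_id": ["EMEA", "APAC", "EMEA"],
    "plan_price_amount": [10.0, 20.0, 30.0],
    "Enr": [5.0, 4.0, 3.0],
    "FM_f": [1.0, 1.0, 1.0],
    "OA_f": [2.0, 2.0, 2.0],
    "TX_f": [0.5, 0.5, 0.5],
})

# Part of the formula that applies to every region.
base = (
    df_merged1["plan_price_amount"] * (df_merged1["Enr"] - df_merged1["FM_f"])
    + df_merged1["OA_f"]
)

# Row-wise choice: subtract tax for EMEA rows, keep the base value otherwise.
is_emea = df_merged1["region_id"] == "EMEA"
df_merged1["fcst_gr"] = np.where(is_emea, base - df_merged1["TX_f"], base)
```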
I'm trying to compute a new column in a pandas dataframe, based on other columns and a function I created. Instead of using a for loop, I prefer to apply the function to entire dataframe columns.
My code looks like this:
df['po'] = vect.func1(df['gra'],
Se,
df['p_a'],
df['t'],
Tc)
where df['gra'], df['p_a'], and df['t'] are my dataframe columns (parameters), and Se and Tc are other (real-valued) parameters. df['po'] is my new column.
func1 is a function described in my vect package.
This function is:
def func1(g, surf_e, Pa, t, Tco):
    if (t <= Tco):
        pos = (g-(Pa*surf_e*g))
    else:
        pos = 0.0
    return(pos)
When implemented this way, I get an error message that concerns the line: if (t <= Tco):
The error is:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I read the pandas documentation but didn't find the solution. Can anybody explain what the problem is?
I tried to use apply, for example:
df['po'] = df['gra'].apply(vect.func1)
but I don't know how to use apply with multiple columns as parameters.
Thank you in advance.
Use np.where with the required condition, value when the condition is True and the default value.
df['po'] = np.where(
    df['t'] <= Tc,  # Condition
    df['gra'] - (df['p_a'] * Se * df['gra']),  # Value if True
    0  # Value if False
)
EDIT:
Don't forget to import numpy as np.
Also, you get the error because comparing a Series (df['t']) with Tc yields a Series of boolean values rather than the single boolean value that an if condition needs.
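As a sketch of both routes, here is the np.where version next to a row-wise apply that passes several columns to the original scalar function. The sample data and the values of Se and Tc are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def func1(g, surf_e, Pa, t, Tco):
    """Scalar function from the question, unchanged in behaviour."""
    if t <= Tco:
        return g - (Pa * surf_e * g)
    return 0.0

# Hypothetical parameters and data; only the column names come from the question.
Se, Tc = 0.5, 10.0
df = pd.DataFrame({
    "gra": [100.0, 200.0, 300.0],
    "p_a": [0.1, 0.2, 0.3],
    "t": [5.0, 10.0, 15.0],
})

# Vectorized: the condition and both branches are evaluated on whole columns.
df["po"] = np.where(df["t"] <= Tc, df["gra"] - df["p_a"] * Se * df["gra"], 0.0)

# Row-wise alternative: axis=1 hands each row to the lambda, so the scalar
# function receives one value per column. Simpler to write, slower on big frames.
po_apply = df.apply(
    lambda row: func1(row["gra"], Se, row["p_a"], row["t"], Tc), axis=1
)
```

Both give the same column; the np.where form is usually the better choice once the frame gets large.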
I want to have a simple function that would categorize numeric values from existing column into a new column. For some reason when doing it with a function that has multiple arguments "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." is generated...
DataFrame:
l1=[1,2,3,4]
df_=pd.DataFrame(l1, columns=["Nums"])
Code that generates the error:
n1=2
n2=4
def func(x,y,z):
    if (x>=y) & (x<=z):
        return('good')
    else:
        return('bad')
df_['Nums_Cat']=func(df_.Nums, n1, n2)
Please note that I'm trying to do this with a function because it will be applied to multiple columns with many different conditions passed.
In this case I'm trying to convert the numeric values that satisfy the condition into the string "good" and those that don't (else) into the string "bad". So the output should be 'bad, good, good, good' in a new column called Nums_Cat.
You're nearly there. However, an if statement needs a single boolean, and here the comparison produces a whole Series of booleans. To do what you want, map each value of that result to either "good" or "bad".
def func(x, y, z):
    values = (y <= x) & (x <= z)
    return values.map(lambda item: "good" if item else "bad")
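For reference, a self-contained run of that function on the data from the question (the variable name in_range is just a renaming for readability):

```python
import pandas as pd

def func(x, y, z):
    # Element-wise range check: one boolean per row.
    in_range = (y <= x) & (x <= z)
    # Translate each boolean into its category label.
    return in_range.map(lambda item: "good" if item else "bad")

df_ = pd.DataFrame([1, 2, 3, 4], columns=["Nums"])
n1, n2 = 2, 4
df_["Nums_Cat"] = func(df_.Nums, n1, n2)
```

Because the function takes the column and both bounds as arguments, it can be reused for other columns with different limits.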
I have a list of data from a CSV file like this:
I wish to find all members whose values lie within an interval. For example, from the attached dataset, I want to find all warriors whose powerlevels lie between 675000 and 750000.
In the following code, the operators 'and', 'or', '&', '|' are not working and return a ValueError.
strong = df[['name', 'attack', 'defense', 'HP','armour','powerlevel']][df.powerlevel > 675000 & df.powerlevel < 750000]
print(strong)
I get the following error-
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I get around this issue without creating a different dataframe each time?
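You can use loc, with each comparison wrapped in parentheses. & binds more tightly than > and <, so without the parentheses Python evaluates 675000 & df.powerlevel first and then chains the two comparisons with an implicit 'and', which is what raises the ValueError.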
strong = df.loc[(df.powerlevel > 675000) & (df.powerlevel < 750000)]
strong = strong[['name', 'attack', 'defense', 'HP','armour','powerlevel']]
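The row filter and the column selection can also go into a single loc call. A sketch with made-up warriors (only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV contents.
df = pd.DataFrame({
    "name": ["Ares", "Bran", "Cato", "Dara"],
    "attack": [90, 70, 85, 60],
    "defense": [80, 75, 60, 90],
    "HP": [1000, 900, 950, 800],
    "armour": [50, 40, 45, 30],
    "powerlevel": [700000, 650000, 749999, 750000],
})

columns = ["name", "attack", "defense", "HP", "armour", "powerlevel"]

# Each comparison is parenthesized so & combines two boolean Series.
in_range = (df["powerlevel"] > 675000) & (df["powerlevel"] < 750000)

# Rows and columns selected in one step, without an intermediate dataframe.
strong = df.loc[in_range, columns]
```

Both bounds are strict here, so a powerlevel of exactly 750000 is excluded.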
How would you combine selected rows of a dataframe with built-in functions using correct syntax?
The key equation (that has error) is marked with '***' below. There are three aspects of this equation:
(1) Operation is only on selected rows [lo:hi] and column [ColumnName] of a dataframe
(2) Go through NaN entries in this selection and set each of them to a random number as defined by (3)
(3) The random number is defined by library function np.random.randint with
(a) range of values between (avg+std) and (avg-std) with a total of size=null_total[ColumnName] entries to be generated.
(b) The random number is then divided by avg to normalize the value.
avg and std are the mean and standard deviation of all selected row values under [ColumnName], as computed by the built-in dataframe functions .mean and .std, respectively. avg, std and null_total are declared as DataFrame type, although they could be just Series.
def process_Fill_and_Normalize(df,lo,hi,ColumnName):
    avg = pd.DataFrame()
    std = pd.DataFrame()
    null_total = pd.DataFrame()
    avg[ColumnName] = df[ColumnName][lo:hi].mean()
    std[ColumnName] = df[ColumnName][lo:hi].std()
    null_total[ColumnName] = df[ColumnName][lo:hi].isnull().sum()
    ***df[ColumnName][lo:hi][np.isnan(combined[ColumnName][lo:hi])] =
        np.random.randint(avg[ColumnName] - std[ColumnName], avg[ColumnName] +
        std[ColumnName], size=null_total[ColumnName])/avg[ColumnName]
    return df
The error message is as follows:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
    915         raise ValueError("The truth value of a {0} is ambiguous. "
    916                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 917                          .format(self.__class__.__name__))
    918
    919     __bool__ = __nonzero__
Your advice on how to modify the syntax would be very much appreciated.
Many thanks to hpaulj, whose reply advised breaking up the long equation. The index expression is now defined in a separate statement. The following modified code works:
def process_Fill_and_Normalize(df,lo,hi,ColumnName):
    avg = pd.DataFrame()
    std = pd.DataFrame()
    null_total = pd.DataFrame()  # declared as in the first version of the function
    avg[ColumnName] = df[ColumnName][lo:hi].mean()
    std[ColumnName] = df[ColumnName][lo:hi].std()
    null_total[ColumnName] = df[ColumnName][lo:hi].isnull().sum()
    # Index expression for the NaN entries, defined separately
    null_entry_index = np.isnan(combined[ColumnName][lo:hi])
    df[ColumnName][lo:hi][null_entry_index] = (
        np.random.randint(avg[ColumnName] - std[ColumnName],
                          avg[ColumnName] + std[ColumnName],
                          size=null_total[ColumnName]) / avg[ColumnName]
    )
    return df
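For comparison, a sketch of the same fill step written with plain scalars and a single positional assignment. This is not the code from the thread: it assumes that combined and df refer to the same dataframe, that lo and hi are positional row bounds, and that the standard deviation is at least 1 so the integer range is not empty. The function name, the seed parameter and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

def fill_and_normalize(df, lo, hi, column_name, seed=None):
    """Fill NaNs in rows lo:hi of one column with random integers drawn
    from [avg - std, avg + std) and divided by avg."""
    rng = np.random.default_rng(seed)
    col_pos = df.columns.get_loc(column_name)
    window = df.iloc[lo:hi, col_pos]

    avg = window.mean()  # scalars, so no Series ends up in a truth test
    std = window.std()

    nan_mask = window.isna().to_numpy()
    n_missing = int(nan_mask.sum())
    if n_missing == 0:
        return df

    fill = rng.integers(int(avg - std), int(avg + std), size=n_missing) / avg

    # Positions of the NaN rows inside the window, then one direct assignment
    # (no chained indexing, so the write always reaches df itself).
    rows = np.arange(len(df))[lo:hi][nan_mask]
    df.iloc[rows, col_pos] = fill
    return df

# Hypothetical data: the last row lies outside the window and stays NaN.
df = pd.DataFrame({"x": [10.0, np.nan, 20.0, np.nan, 30.0, np.nan]})
df = fill_and_normalize(df, 0, 5, "x", seed=0)
```

Keeping avg, std and the NaN count as scalars sidesteps the ambiguous truth value problem, and the single iloc assignment avoids writing to a temporary copy.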