Create Json File if Dataframe > Column > Criteria matches

I am struggling with the following:
Row1 Row2
A 10
B 10
C 10
D 11
F 12
I have a large dataset and want to create a JSON file depending on whether the value in Row2 meets a criterion. (Row2 has object dtype.)
    if df['Row2'] == '10':
        df.to_json(filelocation)
    else:
        df.to_json(diff_filelocation)
The error I receive is: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I used bool() and still got the same error message. When I tried any(), only the first file was created. I have checked multiple posts, but nothing seems to work.
I have also tried
    if df[df['Row2'] == '10']
or
    if df.loc[(df.Row2=='10')]
but those aren't working either.
I am also confused because something like df[df["Row2"]] works, but not in an if statement.
Thanks in advance.

You need to split df into two segments based on a boolean mask:
    # compare against '10' instead if the column holds strings
    m = df['Row2'].eq(10)
    df[m].to_json(filelocation)
    df[~m].to_json(diff_filelocation)
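A minimal, self-contained sketch of the mask-based split, using the sample rows from the question. It assumes Row2 holds strings (as the object dtype suggests), and it calls to_json() without a path so the JSON comes back as a string instead of being written to disk; in real use you would pass your own file locations.

```python
import pandas as pd

# Sample rows from the question; Row2 is assumed to hold strings.
df = pd.DataFrame({
    "Row1": ["A", "B", "C", "D", "F"],
    "Row2": ["10", "10", "10", "11", "12"],
})

# Element-wise comparison gives one boolean per row (a mask), not a single bool.
mask = df["Row2"].eq("10")

matching = df[mask]    # rows where Row2 == "10"
others = df[~mask]     # all remaining rows

# With no path argument, to_json returns the JSON text as a string.
matching_json = matching.to_json(orient="records")
others_json = others.to_json(orient="records")

print(matching_json)
print(others_json)
```

The mask is never used in an if statement, which is why the ambiguity error does not occur: each row is routed to one of the two frames by indexing.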

Related

How to apply multiple conditions in a dataframe

I have an issue with applying a condition and then assigning a value to the rows that match it.
I'm new to this and just practicing with a dataset.
I need to give a bonus mark to those students who scored above 77 in 'math'.
Any suggestion would be helpful.
I used an if condition to check whether the mark is above 77 and then tried to create a column with the bonus mark for those students.
First I selected the students who scored more than 77 in 'math_score'.
Then, to give these students the bonus, I looped through each student and wrote an if condition, which gives me this error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please help me to achieve this goal, and do correct me too.

You can use np.where():
    import numpy as np
    # total of the three scores, plus 10 bonus marks where math_score > 77
    total = df['math_score'] + df['reading_score'] + df['writing_score']
    df['bonus'] = np.where(df['math_score'] > 77, total + 10, total)
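The following sketch shows both the failing pattern and the np.where fix on a tiny made-up dataset. The scores and the column name writing_score are assumptions for illustration; the question's real data is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for three students.
df = pd.DataFrame({
    "math_score": [70, 78, 90],
    "reading_score": [60, 80, 85],
    "writing_score": [65, 75, 95],
})

# Using a whole column in an if statement raises the ambiguity error.
try:
    if df["math_score"] > 77:
        pass
except ValueError as err:
    message = str(err)
    print(message)

# np.where picks, row by row, between the two candidate values.
total = df["math_score"] + df["reading_score"] + df["writing_score"]
df["bonus"] = np.where(df["math_score"] > 77, total + 10, total)

print(df)
```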

How to apply a function with several dataframe columns as arguments?

I'm trying to compute a new column in a pandas dataframe, based on other columns and a function I created. Instead of using a for loop, I would prefer to apply the function to entire dataframe columns.
My code looks like this:
    df['po'] = vect.func1(df['gra'],
                          Se,
                          df['p_a'],
                          df['t'],
                          Tc)
where df['gra'], df['p_a'], and df['t'] are my dataframe columns (parameters), and Se and Tc are other (real-valued) parameters. df['po'] is my new column.
func1 is a function defined in my vect package:
    def func1(g, surf_e, Pa, t, Tco):
        if (t <= Tco):
            pos = (g-(Pa*surf_e*g))
        else:
            pos = 0.0
        return(pos)
When implemented this way, I get an error message concerning the line if (t <= Tco):
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I read the pandas documentation, but didn't find the solution. Can anybody explain what the problem is?
I tried to use apply, for example:
    df['po'] = df['gra'].apply(vect.func1)
but I don't know how to use apply with multiple columns as parameters.
Thank you in advance.

Use np.where with the required condition, the value when the condition is True, and the default value.
    import numpy as np
    df['po'] = np.where(
        df['t'] <= Tc,                              # Condition
        df['gra'] - (df['p_a'] * Se * df['gra']),   # Value if True
        0                                           # Value if False
    )
You get the error because you are comparing a Series to a value, which yields a Series of boolean values and not the single boolean value that an if condition needs.
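As a sketch of both options the question raises, the block below applies the scalar func1 row by row with apply(axis=1) and compares it with the vectorized np.where form. The values of Se and Tc and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

def func1(g, surf_e, Pa, t, Tco):
    """Scalar version of the function from the question."""
    if t <= Tco:
        return g - (Pa * surf_e * g)
    return 0.0

# Hypothetical parameters and data, not taken from the question.
Se, Tc = 0.5, 20.0
df = pd.DataFrame({
    "gra": [10.0, 20.0, 30.0],
    "p_a": [0.1, 0.2, 0.3],
    "t": [15.0, 25.0, 20.0],
})

# Row-wise: apply with axis=1 hands each row to the lambda, so func1
# only ever sees scalars and its if statement is unambiguous.
df["po_rowwise"] = df.apply(
    lambda row: func1(row["gra"], Se, row["p_a"], row["t"], Tc), axis=1
)

# Vectorized: same result computed on whole columns at once.
df["po"] = np.where(df["t"] <= Tc, df["gra"] - df["p_a"] * Se * df["gra"], 0.0)

print(df)
```

The row-wise form is the direct answer to "apply with multiple columns", but it calls Python code once per row; the vectorized form is usually the better choice on large frames.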

Python Dataframe If Else for every element in Dataframe to create separate list

I'm struggling to find a solution for the following issue.
It's very simple code, and I guess that's why it's not working properly.
Python keeps throwing these error messages at me or gives me an output I did not intend to receive.
I am new to Python, so I'm still learning. I looked up possible solutions online, yet I did not find anything that fits my problem.
I am trying to check (if/else) the value of every single element in my dataframe (it contains only doubles, no strings) and then add that element to either list A1 or list A2. I don't want to modify the dataframe itself.
Approach 1:
    def function103(x):
        if (x > 0):
            A1.append(x)
        else:
            A2.append(x)
        return x
    # Applying the code to the dataframe:
    df.apply(lambda x: function103(x), axis=1)
The dataframe varies in size, at least 1000*3000, so I am looking for a general approach.
Approach 2:
    def function103(x):
        if (x > 0).any():
            A1.append(x)
        else:
            A2.append(x)
        return x
This gives me a strange list for A1/A2:
    [R1 6.951920e-310
    R2 6.951920e-310
    R3 6.951920e-310
    R4 6.951920e-310
    R5 6.951920e-310
    Name: 2010-09-30T00:00:00.000000000, dtype: float64, Return1 6.951920e-310
    ……..]
I don't even think this is considered a list, even though it opens and closes with brackets [].
With approach 1 I always receive this error message:
    The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
And with approach 2 the list I receive doesn't help at all. I only want the list to contain the single elements that meet the if criterion.

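Another approach:
    import numpy as np
    # masking keeps the shape and puts NaN where the condition is False
    x = df[df>0].to_numpy().flatten()
    y = df[df<=0].to_numpy().flatten()
    # drop the NaN placeholders before extending the lists
    A1.extend(x[~np.isnan(x)])
    A2.extend(y[~np.isnan(y)])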
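A runnable variation on the same idea, as a sketch on a small made-up frame: flatten the values first and then mask the flat array, which avoids the NaN placeholders altogether. Note that any NaN already present in the data ends up in neither list, because both comparisons are False for NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame of floats; the real one is much larger.
df = pd.DataFrame({"R1": [1.5, -2.0], "R2": [0.0, 3.0]})

# Flatten to one dimension (row by row); the dataframe itself is untouched.
values = df.to_numpy().ravel()

# Boolean indexing on the flat array keeps only the matching elements.
A1 = values[values > 0].tolist()
A2 = values[values <= 0].tolist()

print(A1)
print(A2)
```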

applymap() can do this for you. It applies the function to every cell in the dataframe.
Instead of
    df.apply(lambda x: function103(x), axis=1)
simply use
    df.applymap(function103)
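A small sketch of the element-wise version, on made-up data. One caveat: in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, so the example picks whichever method the installed version provides. The order in which cells are visited is an implementation detail, so do not rely on the order of the resulting lists.

```python
import pandas as pd

# Hypothetical frame of floats.
df = pd.DataFrame({"R1": [1.5, -2.0], "R2": [0.0, 3.0]})

A1, A2 = [], []

def function103(x):
    # x is a single cell value here, so the comparison yields a plain bool.
    if x > 0:
        A1.append(x)
    else:
        A2.append(x)
    return x

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    df.map(function103)
else:
    df.applymap(function103)

print(sorted(A1))
print(sorted(A2))
```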

Python-pandas: the truth value of a series is ambiguous

I am currently trying to compare values from a JSON file (which I can already work with) to values from a CSV file (which might be the issue). My current code looks like this:
    for data in trades['timestamp']:
        data = pd.to_datetime(data)
        print(data)
        if data == ask_minute['lastUpdated']:
            #....'do something'
Which gives:
    "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
My current print(data) output looks like this:
    2018-10-03 18:03:38.067000
    2018-10-03 18:03:38.109000
    2018-10-03 18:04:28
    2018-10-03 18:04:28.685000
However, I am still unable to compare these timestamps from my CSV file to those of my JSON file. Does someone have an idea?

Let's reduce it to a simpler example. Take for instance the following comparison:
    3 == pd.Series([3,2,4,1])
    0     True
    1    False
    2    False
    3    False
    dtype: bool
The result you get is a Series of booleans, equal in size to the pd.Series on the right-hand side of the expression. What's really happening here is that the integer is broadcast across the Series, and then the values are compared element by element. So when you do:
    if 3 == pd.Series([3,2,4,1]):
        pass
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
you get an error. The problem is that you are comparing a pd.Series with a value, so the result can contain both True and False values, as in the case above. This is ambiguous, since the condition as a whole is neither True nor False.
So you need to aggregate the result further so that the operation yields a single boolean value. For that, use either any or all, depending on whether you want at least one value (any) or all values (all) to satisfy the condition.
    (3 == pd.Series([3,2,4,1])).all()
    # False
or
    (3 == pd.Series([3,2,4,1])).any()
    # True
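Applied to the timestamp comparison from the question, this could look like the sketch below. The two frames are made-up stand-ins for trades and ask_minute, and it assumes the intent is "does this trade timestamp appear anywhere in lastUpdated", which is what any() expresses.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data.
trades = pd.DataFrame({
    "timestamp": ["2018-10-03 18:03:38.067", "2018-10-03 18:04:28.000"],
})
ask_minute = pd.DataFrame({
    "lastUpdated": pd.to_datetime(["2018-10-03 18:04:28", "2018-10-03 18:05:00"]),
})

matches = []
for raw in trades["timestamp"]:
    ts = pd.to_datetime(raw)
    # The comparison yields a boolean Series; any() reduces it to one bool.
    if (ask_minute["lastUpdated"] == ts).any():
        matches.append(ts)

print(matches)
```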

The problem I see is that even if the dataframe holds only one row, the code knows that a dataframe can have many rows, so a column is always a Series. The code doesn't just assume you want the only row that exists; you have to tell it explicitly. The way I solved it was like this:
    if data == ask_minute['lastUpdated'].iloc[0]:
Then the code knows you are selecting the one row that exists.

How to loop through a pandas array?

I read several questions about for loops over a pandas DataFrame, but couldn't work it out for my case:
    px = pd.read_sql()
    for i, row in px.iterrows():
        if x == 1:
            if px['first'] <= S1:
                S1 = px['first']
            if px['second'] > S2:
                previous_value = last_value
                last_value = px['second']
                x = 0
        else:
            .....
This is, of course, only part of the code, to show the looping logic. I expected the rows to be read one by one, so that I could compare the values of each row with those of the previous row, but I get:
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You're accessing the entire column px['first'] from inside the loop that's intended to access only one entry at a time.
To fix your current loop, it might be enough just to change px['first'] to row['first'], and also px['second'] to row['second'].
Better would be to replace this manual looping with equivalent pandas expressions, which will be much faster and more readable. If you post the full code (edit it into the question, not as comments!), we might be able to help.
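To illustrate both suggestions, here is a sketch on made-up data. Since the full loop logic is not shown in the question, it only covers the "track the smallest value of first seen so far" part: once with the corrected row['first'] access, and once with the pandas expression cummin(), which computes the same running minimum without a loop.

```python
import pandas as pd

# Hypothetical data standing in for the result of pd.read_sql(...).
px = pd.DataFrame({"first": [5, 3, 4, 1], "second": [1, 2, 3, 4]})

# Loop version: row['first'] is a single value, so the if is unambiguous.
S1 = float("inf")
running_min_loop = []
for i, row in px.iterrows():
    if row["first"] <= S1:
        S1 = row["first"]
    running_min_loop.append(S1)

# Vectorized version: cumulative minimum over the whole column.
running_min = px["first"].cummin()

print(running_min_loop)
print(running_min.tolist())
```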
