How to skip rows in pandas dataframe iteration - python

So I've created a dataframe called celtics and the last column is called 'Change in W/L%' and is right now filled with all 0s.
I want to calculate the change in Win-Loss Percentage (see 'W/L%' column) if the coach's name in one row of Coaches is different from the name of the coach right underneath that row. I have written this loop to try and execute this program:
i = 0
while i < len(celtics) - 1:
    if (celtics["Coaches"].loc[i].split("("))[0] != (celtics["Coaches"].loc[i + 1].split("("))[0]:
        celtics["Change in W/L%"].loc[i] = celtics["W/L%"].loc[i] - celtics["W/L%"].loc[i + 1]
        i = i + 1
    i = i + 1
So basically, if the name of the coach in Row i is different from the name of the coach in Row i+1, the change in W/L% between the two rows is added to Row i of the Change in W/L% column. However, when I execute the code, the dataframe ends up looking like this.
For example, Row 1 should just have 0 in the Change in W/L% column; instead, it has been replaced by the difference in W/L% between Row 1 and Row 2, even though the coach's name is the same in both Rows. Could anyone help me resolve this issue? Thanks!

Check out the solution to this related question on Stack Overflow:
Skip rows while looping over data frame Pandas
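Beyond the linked answer, the same logic can be vectorized with shift, which sidesteps the index-skipping problem entirely. A sketch using the question's column names on a toy frame (the coach strings and percentages below are invented):

```python
import pandas as pd

# Toy frame using the question's column names; the data itself is invented.
celtics = pd.DataFrame({
    "Coaches": ["Doc Rivers (1)", "Doc Rivers (2)", "Brad Stevens (1)"],
    "W/L%": [0.610, 0.500, 0.421],
})
celtics["Change in W/L%"] = 0.0

# Coach name before the "(" in each row, compared with the row underneath.
name = celtics["Coaches"].str.split("(").str[0]
changed = name != name.shift(-1)
changed.iloc[-1] = False  # the last row has no row beneath it

# Where the coach changes, record W/L% minus the next row's W/L%.
celtics.loc[changed, "Change in W/L%"] = celtics["W/L%"] - celtics["W/L%"].shift(-1)
```

Here only row 1 (the last "Doc Rivers" row) gets a non-zero change, because the coach underneath it differs.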

Related

python counting rows with missing values doesn't work

I don't know why, but the code to count rows with missing values doesn't work.
Can somebody please help?
[screenshot: Excel file showing data]
[screenshot: code in IDE]
In Excel, 156 rows in total have missing values, but I can't get that count in Python using the code below:
(kidney_df.isna().sum(axis=1) > 0).sum()

count = 0
for i in kidney_df.isnull().sum(axis=1):
    if i > 0:
        count = count + 1

kidney_df.isna().sum().sum()
kidney_df is a whole dataframe. Do you want to count each empty cell, or just the empty cells in one column? Based on the formula in your image, it seems you are interested only in column 'Z'. You can specify that by using .iloc[] (index location) or by specifying the column name (not visible in your image) like so:
kidney_df.iloc[:, 26].isnull().sum()
Explanation:
.iloc[]  # index location
:        # select every row (a full slice from the first row to the last)
26       # the column index of column 'Z' in excel
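To make the difference between the two counts concrete, here is a runnable sketch on a small stand-in frame (the real kidney_df isn't shown, so the columns and values below are invented):

```python
import numpy as np
import pandas as pd

# Small stand-in for kidney_df; the real file isn't shown in the question.
kidney_df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "Z": [np.nan, np.nan, 5.0],
})

# Rows that contain at least one missing value (what the Excel count does):
rows_with_na = kidney_df.isna().any(axis=1).sum()

# Missing values in a single column, e.g. 'Z':
na_in_z = kidney_df["Z"].isna().sum()
```

On this toy frame both counts happen to be 2, but on real data they diverge whenever a row has missing values in several columns, or in columns other than 'Z'.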

How do I search a pandas dataframe to get the row with a cell matching a specified value?

I have a dataframe that might look like this:
print(df_selection_names)
name
0 fatty red meat, like prime rib
0 grilled
I have another dataframe, df_everything, with columns called name, suggestion and a lot of other columns. I want to find all the rows in df_everything with a name value matching the name values from df_selection_names so that I can print the values for each name and suggestion pair, e.g., "suggestion1 is suggested for name1", "suggestion2 is suggested for name2", etc.
I've tried several ways to get cell values from a dataframe and searching for values within a row including
# number of items in df_selection_names = df_selection_names.shape[0]
# so, in other words, we are looping through all the items the user selected
for i in range(df_selection_names.shape[0]):
    # get the cell value using the at() function
    # in the 'name' column, row i
    sel = df_selection_names.at[i, 'name']
    # this line finds the row 'sel' in df_everything
    row = df_everything[df_everything['name'] == sel]
but everything I tried gives me ValueErrors. This post leads me to think I may be way off, but I'm feeling pretty confused about everything at this point!
https://pandas.pydata.org/docs/reference/api/pandas.Series.isin.html?highlight=isin#pandas.Series.isin
df_everything[df_everything['name'].isin(df_selection_names["name"])]
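The ValueError is likely because both rows of df_selection_names share the index label 0 (as the printout shows), so .at[0, 'name'] cannot return a single scalar. The isin approach avoids the loop entirely. A runnable sketch, with invented suggestion values since df_everything isn't shown:

```python
import pandas as pd

# Toy frames with the question's column names; suggestion values are invented.
df_selection_names = pd.DataFrame(
    {"name": ["fatty red meat, like prime rib", "grilled"]},
    index=[0, 0],  # duplicate index labels, as in the question's printout
)
df_everything = pd.DataFrame({
    "name": ["grilled", "steamed", "fatty red meat, like prime rib"],
    "suggestion": ["lemon butter", "soy dip", "horseradish"],
})

# Keep only the rows whose name appears among the user's selections.
matches = df_everything[df_everything["name"].isin(df_selection_names["name"])]
for _, row in matches.iterrows():
    print(f"{row['suggestion']} is suggested for {row['name']}")
```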

For Python Pandas, how to implement a "running" check of 2 rows against the previous 2 rows?

[updated with expected outcome]
I'm trying to implement a "running" check where I need the sum and mean of two rows to be more than the previous 2 rows.
Referring to the dataframe (copied into a spreadsheet) below, I'm trying to code a function so that if the mean of those two orange cells is more than that of the blue cells, the function returns True for row 8, under a new column called 'Cond11'. The dataframe here is historical, so all rows are available.
Note that that Rows column is added in the spreadsheet, easier for me to reference the rows here.
I have been using .rolling to refer to the current row + whatever number of rows to refer to, or using shift(1) to refer to the previous row.
df.loc[:, ('Cond9')] = df.n.rolling(4).mean() >= 30
df.loc[:, ('Cond10')] = df.a > df.a.shift(1)
I'm stuck here... how do I compare these 2 rows against the previous 2 rows? Please advise!
The 2nd part of this question: I have another function that checks the latest rows in the dataframe for the same condition above. This function is meant to be used in real-time, when new data is streaming into the dataframe and the function is supposed to check the latest rows only.
Can I check if the following code works to detect the same conditions above?
cond11 = candles.n[-2:-1].sum() > candles.n[-4:-3].sum()
I believe this solves your problem:
df.rolling(4).apply(lambda rows: rows.iloc[0] + rows.iloc[1] < rows.iloc[2] + rows.iloc[3])
(Inside rolling().apply() each window arrives as a Series, so use positional .iloc indexing; plain rows[0] looks up the label 0 and only works for the very first window.)
The first 3 rows will be NaNs but you did not define what you would like to happen there.
As for the second part, to be able to produce this condition live for new data you just have to prepend the last 3 rows of your current data and then apply the same process to it:
pd.concat([df[-3:], df])
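As a concrete sketch of the 2-rows-vs-previous-2-rows check, applied to the question's column n on invented toy numbers:

```python
import pandas as pd

# Toy data for the question's column 'n'; the numbers are invented.
df = pd.DataFrame({"n": [10, 12, 11, 15, 20, 18]})

# True where the sum of the current row and the previous row exceeds the
# sum of the two rows before those. The first three rows have no full
# window, so apply() yields NaN there and .eq(1) turns that into False.
df["Cond11"] = (
    df["n"]
    .rolling(4)
    .apply(lambda w: float(w.iloc[2] + w.iloc[3] > w.iloc[0] + w.iloc[1]))
    .eq(1)
)
```

Summing rows is the same test as comparing means here, since both sides average over two values.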

iterrows() loop is only reading last value and only modifying first row

I have a dataframe test. My goal is to search in the column t1 for specific strings, and if it matches exactly a specific string, put that string in the next column over called t1_selected. Only thing is, I can't get iterrows() to go over the entire dataframe, and to report results in respective rows.
for index, row in test.iterrows():
    if any(['ABCD_T1w_MPR_vNav_passive' in row['t1']]):
        #x = ast.literal_eval(row['t1'])
        test.loc[i, 't1_selected'] = str(['ABCD_T1w_MPR_vNav_passive'])
I am only trying to get ABCD_T1w_MPR_vNav_passive into the 4th row under t1_selected, while all the other rows should have 'not found'. The first entry in t1_selected is from the last row under t1, which I didn't include in the screenshot because the dataframe has over 200 rows.
I tried to initialize an empty list to append output of
import ast
x = ast.literal_eval(row['t1'])
to see if I can put x in there, but the same issue occurred.
Is there anything I am missing?
for index, row in test.iterrows():
    if any(['ABCD_T1w_MPR_vNav_passive' in row['t1']]):
        #x = ast.literal_eval(row['t1'])
        test.loc[index, 't1_selected'] = str(['ABCD_T1w_MPR_vNav_passive'])
Here index is the row label being written to. With i, the target never changed from loop to loop, so every match overwrote the same single row.
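For an exact-substring match like this, the loop can also be dropped entirely in favor of a boolean mask. A sketch on a three-row toy frame (the t1 strings are invented):

```python
import pandas as pd

# Toy frame standing in for `test`; the t1 strings are invented.
test = pd.DataFrame({"t1": ["foo", "bar", "baz ABCD_T1w_MPR_vNav_passive"]})

target = "ABCD_T1w_MPR_vNav_passive"
test["t1_selected"] = "not found"

# Mark every row whose t1 contains the target string, without a Python loop.
mask = test["t1"].str.contains(target, regex=False)
test.loc[mask, "t1_selected"] = str([target])
```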

Python - Appending new columns to Excel File based on row/cell info

I am trying to append new columns to an Excel file, where each cell's value depends on other cells in the same row. I need to add about 8 columns.
Right now, based on one of the cells, let's say serialno, I do a lookup against a JSON URL and pull the relevant column info. But I need to write this info back to that particular row.
So far, all the help I have found shows adding one whole column at a time. Is that the best option, or is there an easier way to add all 8 columns and keep appending row-wise? I want to be careful with any blank information: those cells should stay blank.
I'm a novice at this, mostly learning by adapting available scripts.
Thanks for any direction you can provide.
Here is some code i'm currently using
except IndexError:
    cols = [col for col in df.columns if 'no' in col]
    col_name = cols[0]

for x in df.index:
    n = 9 - len(str(df[col_name][x]))
    num = str(df[col_name][x]).rjust(n + len(str(df[col_name][x])), '0')
    with suppress(KeyError, UnicodeEncodeError):
        main(num)

def main(num):
    for i in jsonData["people"]:
        room_no = jsonData["people"][i]["roomno"]
        title = jsonData["people"][i]["title"]
        fname = jsonData["people"][i]["full_name_ac"]
        tel = jsonData["people"][i]["telephone"]
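One way to keep the row-by-row pattern while filling all 8 columns at once is to pre-create the columns and then assign a whole row slice with .loc. The lookup() helper and the column names below are placeholders standing in for the JSON URL lookup in the question:

```python
import pandas as pd

# Placeholder lookup standing in for the JSON URL fetch in the question.
def lookup(serialno):
    return {"roomno": "101", "title": "Dr", "full_name": "A. Name",
            "telephone": "555-0100"}

df = pd.DataFrame({"serialno": ["000000001", "000000002"]})

new_cols = ["roomno", "title", "full_name", "telephone"]
for c in new_cols:
    df[c] = ""  # pre-create the new columns so missing info stays blank

for idx, serial in df["serialno"].items():
    info = lookup(serial)
    # write all the new columns for this one row in a single assignment
    df.loc[idx, new_cols] = [info.get(c, "") for c in new_cols]
```

Using info.get(c, "") keeps a cell blank whenever the lookup lacks that key, which matches the requirement that blank information stays blank.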
