I'm trying to iterate over my pandas DataFrame and, when two columns hold particular values, change a value in that row. Here is a simplified version of the loop I've been using (I changed the conditions in the if/elif chain because the original used regex and was quite complicated):
pro_cr = ["IgA", "IgG", "IgE"]  # CR's considered productive
rows_changed = 0
prod_to_unk = 0
unk_to_prod = 0
changed_ids = []
for index in df_sample.index:
    # num and color are placeholders for the real (regex-based) conditions
    if num == 1 and color == "red":
        pass
    elif num == 2 and color == "blue":
        prod_to_unk += 1
        changed_ids.append(df_sample.loc[index, "Sequence ID"])
        df_sample.at[index, "Functionality"] = "unknown"
        rows_changed += 1
    elif num == 3 and color == "green":
        unk_to_prod += 1
        changed_ids.append(df_sample.loc[index, "Sequence ID"])
        df_sample.at[index, "Functionality"] = "productive"
        rows_changed += 1
    else:
        pass
print("Number of productive columns changed to unknown: {}".format(prod_to_unk))
print("Number of unknown columns changed to productive: {}".format(unk_to_prod))
print("Total number of rows changed: {}".format(rows_changed))
So the main problem is the changing code:
df_sample.at[index, "Functionality"] = "unknown" # or productive
If I run the code without these assignment lines, it works properly: it finds all the correct locations and tells me how many were changed and what their IDs are, which I can validate against the CSV file.
If I use df_sample["Functionality"][index] = "unknown" (or "productive"), the code runs, but checking the rows that should have changed shows that they were not changed at all.
When I use df.at[row, column] = value I get "AttributeError: 'BlockManager' object has no attribute 'T'"
I have no idea why this is showing up. There are no duplicate columns. Hope this was clear (if not let me know and I'll try to clarify it). Thanks!
To be honest, I've never used df.at - but try using df.loc instead:
df_sample.loc[index, "Functionality"] = "unknown"
You can also use iat, which takes integer positions instead of labels.
Example: df.iat[i, j] for the i-th row and j-th column.
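As an alternative to assigning row by row, the same update can be sketched with boolean masks and a single df.loc assignment per rule. This is only an illustration: the num and color columns and the sample rows are assumptions standing in for the real (regex-based) conditions, which the question does not show.

```python
import pandas as pd

# Hypothetical sample data; "num" and "color" stand in for the real conditions.
df_sample = pd.DataFrame({
    "Sequence ID": ["s1", "s2", "s3", "s4"],
    "num": [1, 2, 3, 2],
    "color": ["red", "blue", "green", "red"],
    "Functionality": ["productive", "productive", "unknown", "productive"],
})

# One boolean mask per rule, evaluated for all rows at once.
to_unknown = (df_sample["num"] == 2) & (df_sample["color"] == "blue")
to_productive = (df_sample["num"] == 3) & (df_sample["color"] == "green")

# Collect the IDs before changing anything, then assign through .loc.
changed_ids = df_sample.loc[to_unknown | to_productive, "Sequence ID"].tolist()
df_sample.loc[to_unknown, "Functionality"] = "unknown"
df_sample.loc[to_productive, "Functionality"] = "productive"

prod_to_unk = int(to_unknown.sum())
unk_to_prod = int(to_productive.sum())
rows_changed = prod_to_unk + unk_to_prod
print("Total number of rows changed: {}".format(rows_changed))
```

Because each assignment is a single .loc call on the frame itself, it avoids the chained df_sample["Functionality"][index] form that left the rows unchanged.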
I've written something to calculate reserves, where the quantity already reserved needs to be taken into account in the next iteration. The only problem is that the parameter isn't used in the next iteration: the output does contain the calculation, but the parameter isn't carried over. Any suggestions on how to solve this? Feel free to point out the muppetness of anything in the code, I'm still learning.
import pandas as pd
df = pd.read_csv('test_file.csv')
final_reserve = []
for i in range(len(df)):
    # If it's the first row or a new sku set the reserved to 0
    if i == 0 or df.loc[i]['SKU'] == df.loc[i-1]['SKU']:
        reserved = 0
        # calculate reserve
        to_reserve = df.loc[i]['ON_HAND'] - (df.loc[i]['SALES']*2) - df.loc[i]['ALREADY_RESERVED']
        final_reserve.append(to_reserve)
        # add to the reserved parameter
        reserved += to_reserve
    else:
        # if it's not the first row or a new sku take the already reserved units into account in the calculation
        to_reserve = df.loc[i]['ON_HAND'] - (df.loc[i]['SALES']*2) - df.loc[i]['ALREADY_RESERVED'] - reserved
        final_reserve.append(to_reserve)
        reserved += to_reserve
df['to_reserve'] = final_reserve
df.head()
As the output below shows, the 2nd row comes out as 500, whereas it should deduct the 300 already reserved on the 1st row.
[image: output reserve]
I just had a quick look at your question, based on the first comment in your for loop:
#If it's the first row or a new sku set the reserved to 0
you want to change the next line to:
if i == 0 or df.loc[i]['SKU'] != df.loc[i-1]['SKU']:
That is, change the second == operator to a != operator.
As it stands now, your code is resetting reserved to 0 if the SKU being evaluated matches the previous one, and not if it is a new SKU.
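Putting the fix together, the loop could look roughly like the sketch below. The column names come from the question; the compute_reserves name and the three sample rows are made up for illustration, and positional .iloc access is used on the assumption that the rows are already ordered by SKU.

```python
import pandas as pd

def compute_reserves(df):
    """Return a copy of df with a to_reserve column (illustrative helper)."""
    final_reserve = []
    reserved = 0
    for i in range(len(df)):
        row = df.iloc[i]
        # Reset the running total on the first row and whenever the SKU changes.
        if i == 0 or row["SKU"] != df.iloc[i - 1]["SKU"]:
            reserved = 0
        to_reserve = row["ON_HAND"] - row["SALES"] * 2 - row["ALREADY_RESERVED"] - reserved
        final_reserve.append(to_reserve)
        # Carry the reserved quantity over to the next row of the same SKU.
        reserved += to_reserve
    out = df.copy()
    out["to_reserve"] = final_reserve
    return out

# Made-up sample rows: two for SKU "A", one for SKU "B".
sample = pd.DataFrame({
    "SKU": ["A", "A", "B"],
    "ON_HAND": [500, 500, 100],
    "SALES": [100, 0, 10],
    "ALREADY_RESERVED": [0, 0, 5],
})
result = compute_reserves(sample)
print(result)
```

Since reserved is reset to 0 before the calculation, both branches of the original if/else reduce to the same formula, which is why the sketch needs only one.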
I am trying to iterate through DataFrame rows and set a new column to either 1 or 0 depending on conditions. Everything up to the if statement works fine, but once the elif section is added it gives me an "index out of bounds" error. Any ideas on how to remedy this?
low = history.low
high = history.high
history['bottom'] = " "
history['top'] = " "
for i in range(len(history)):
    if (low[i] < low[i-1]) and (low[i] < low[i-2]) and (low[i] < low[i+1]) and (low[i] < low[i+2]):
        history['bottom'][i] = 1
    elif (high[i] > high[i-1]) and (high[i] > high[i-2]) and (high[i] > high[i+1]) and (high[i] > high[i+2]):
        history['top'][i] = 1
    else:
        history['top'][i] = 0
        history['bottom'][i] = 0
One of the errors is explained by @Code-Apprentice.
The other one, which I think is the one you are looking for, is in the lines
history['bottom'][i] = 1 and history['top'][i] = 1, where you are trying to change the value at an index that may not be present.
For example, if i = 1, the lines above will raise an error if index 1 is not present (i.e. if they only have index 0).
Instead of assigning by index, you can use .append to add values.
It's because this is trying to access an index that doesn't exist. I would like to see more of the code above to know what history.low and history.high refer to.
But have you gotten any results before the error?
Also, please explain len(history). In your code there is a history dictionary where you have history['bottom'] = " " and history['top']=" ", but at the same time you have low=history.low and high = history.high. What's the difference between these two history objects/variables?
Please show more of your code.
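One likely cause, given only the code shown, is that low[i+1] and low[i+2] (and the high equivalents) reach past the end of the data on the last two rows, while i-1 and i-2 fall before the start on the first two. A minimal sketch that skips those edge rows and writes through a single .iloc call is below; the mark_extremes name and the sample prices are assumptions for illustration.

```python
import pandas as pd

def mark_extremes(history):
    """Flag local lows/highs against two neighbours on each side (illustrative)."""
    out = history.copy()
    out["bottom"] = 0
    out["top"] = 0
    low = out["low"].to_numpy()
    high = out["high"].to_numpy()
    bottom_col = out.columns.get_loc("bottom")
    top_col = out.columns.get_loc("top")
    # Start at 2 and stop 2 before the end so i-2 and i+2 always exist.
    for i in range(2, len(out) - 2):
        if low[i] < min(low[i - 2], low[i - 1], low[i + 1], low[i + 2]):
            out.iloc[i, bottom_col] = 1
        elif high[i] > max(high[i - 2], high[i - 1], high[i + 1], high[i + 2]):
            out.iloc[i, top_col] = 1
    return out

# Made-up sample prices.
history = pd.DataFrame({
    "low": [5, 4, 1, 4, 5, 6, 7],
    "high": [6, 5, 2, 5, 9, 7, 8],
})
marked = mark_extremes(history)
print(marked)
```

Rows that cannot be compared on both sides simply keep the default 0 here; whether that is the right treatment of the edges depends on the use case.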
I want to create a DataFrame to which various users (name, phone number, address...) are continuously being added. I need a function that automatically generates an ID once a new, non-existing user is added to the DataFrame.
The first user should get the ID U000001, the second user the ID U000002 and so on.
What's the best way to do this?
If I'm understanding correctly, the main problem is the leading zeros. i.e. you can't just increment the previous ID, because typecasting '0001' just gives 1 instead of 0001. Please correct me if I'm wrong.
Anyways, here's what I came up with. It's far more verbose than you probably need, but I wanted to make sure my logic was clear.
def foo(previous):
    """
    Takes in string of format 'U#####...'
    Returns incremented value in same format.
    Returns None if previous already maxed out (i.e. 'U9999')
    """
    value_str = previous[1:]    # chop off 'U'
    value_int = int(value_str)  # get integer value
    new_int = value_int + 1     # increment
    new_str = str(new_int)      # turn back into string
    # return None if exceeding character limit on ID
    if len(new_str) > len(value_str):
        print("Past limit")
        return None
    # add leading zeroes
    while len(new_str) < len(value_str):
        new_str = '0' + new_str
    # add 'U' and return
    return 'U' + new_str
Please let me know if I can clarify anything! Here's a script you can use to test it:
# test
current_id = 'U0001'
while True:
    current_id = foo(current_id)
    print(current_id)
    if current_id is None:
        break
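To tie this back to the DataFrame in the question, here is a rough sketch of adding a user only if they are not already present, with the zero padding done by a format specifier instead of a loop. The column names, the add_user name, and the rule that a user is identified by name plus phone number are all assumptions; the ID is derived from the row count, so it only works as long as rows are never deleted.

```python
import pandas as pd

def add_user(users, name, phone):
    """Append the user in place if new and return their ID (illustrative helper)."""
    # Assumption: name and phone together identify an existing user.
    match = users[(users["name"] == name) & (users["phone"] == phone)]
    if not match.empty:
        return match["id"].iloc[0]
    # "06d" pads the number with leading zeros to six digits.
    new_id = f"U{len(users) + 1:06d}"
    users.loc[len(users)] = [new_id, name, phone]
    return new_id

users = pd.DataFrame(columns=["id", "name", "phone"])
first = add_user(users, "Ann", "555-0100")
second = add_user(users, "Bob", "555-0101")
again = add_user(users, "Ann", "555-0100")  # already present, no new row
print(users)
```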
Code snippet:
for row in df.itertuples():
    current_index = df.index.get_loc(row.Index)
    if current_index < max_index - 29:
        if df.iloc[current_index + 30].senkou_span_a == 0.0:
            df.iloc[current_index + 30].senkou_span_a = 6700
        if df.iloc[current_index + 30].senkou_span_b == 0.0:
            df.iloc[current_index + 30].senkou_span_b = 6700.0
The last line, where I am assigning a value via iloc, goes through, but the resulting value is still 0.0. I ran into this before when I was assigning the value to a copy of the df, but that doesn't seem to be the case this time. I have been staring at this all day. I hope it is just something silly.
df is timeseries(DateTimeIndex) financial data. I have verified the correct index exists, and no exceptions are thrown.
There is other code to assign values and those work just fine, omitted that code for the sake of brevity.
EDIT
This line works:
df.iloc[current_index + 30, df.columns.get_loc('senkou_span_b')] = 6700
Why does this one work, and not the original?
I'm not sure exactly what's causing your problem (I'm pretty new to Python), but here's some things that came to mind that might be helpful:
Assuming that the value you're replacing is always 0 or 0.0, maybe try switching from = to += to add instead of assign?
Is your dataframe in tuple format when you attempt to assign the value? Your issue might be that tuples are immutable.
If senkou_span_a refers to a column that you're isolating, maybe try using only iloc to isolate the value, like df.iloc[(current_index + 30), 1] == 0.0
Hope this helped!
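On the question in the edit: df.iloc[pos].senkou_span_b = value happens in two steps. df.iloc[pos] first builds a row Series, and the attribute assignment then targets that temporary object, which may be a copy, so the frame itself can stay unchanged. Passing the row and column positions in one df.iloc[row, col] call writes to the frame directly. A small sketch with made-up data and a hypothetical set_if_zero helper:

```python
import pandas as pd

def set_if_zero(df, pos, column, value):
    """Overwrite df[column] at row position pos when it is 0.0 (illustrative)."""
    col = df.columns.get_loc(column)
    # Row and column are given in one indexer, so the write hits df itself.
    if df.iloc[pos, col] == 0.0:
        df.iloc[pos, col] = value

# Made-up time series data with a DatetimeIndex.
df = pd.DataFrame(
    {"senkou_span_a": [0.0, 5.0, 0.0], "senkou_span_b": [0.0, 0.0, 0.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)
set_if_zero(df, 1, "senkou_span_a", 6700.0)  # left alone, value is 5.0
set_if_zero(df, 1, "senkou_span_b", 6700.0)  # replaced, value was 0.0
print(df)
```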
I have written a python script in ArcGIS that selects features that intersect. It needs to keep repeating until all relevant features are selected. At this point the selection will stop changing. Is it possible to set a loop to keep repeating until the number of selected features is the same as last time it looped? I can get the selected features using the arcpy.GetCount_management() method.
I've set the number of selected features to be a variable:
selectCount = arcpy.GetCount_management("StreamT_StreamO1")
Then this is the
mylist = []
with arcpy.da.SearchCursor("antiRivStart","ORIG_FID") as mycursor:
    for feat in mycursor:
        mylist.append(feat[0])
liststring = str(mylist)
queryIn1 = liststring.replace('[','(')
queryIn2 = queryIn1.replace(']',')')
arcpy.SelectLayerByAttribute_management('StreamT_StreamO1',"ADD_TO_SELECTION",'OBJECTID IN '+ queryIn2 )
arcpy.SelectLayerByLocation_management("antiRivStart","INTERSECT","StreamT_StreamO1","","ADD_TO_SELECTION")
So what I want to do would effectively be:
while selectcount == previousselectcount:
    do stuff
but I don't know how the while loop is supposed to be constructed.
You are pretty close to how you would monitor the change in the number of features. Consider the following.
previousselectcount = -1
# GetCount_management returns a Result object, so convert its output to an int before comparing
selectcount = int(arcpy.GetCount_management("StreamT_StreamO1").getOutput(0))
while selectcount != previousselectcount:
    # do stuff
    # update both counts at the end of what you want to do in the while loop
    previousselectcount = selectcount
    selectcount = int(arcpy.GetCount_management("StreamT_StreamO1").getOutput(0))
Note the not equals operator (!=) in the while loop condition.
If selectcount or previousselectcount is of type float, you probably want to compare with a small tolerance instead of testing for exact inequality,
i.e.
while selectcount >= previousselectcount+c:
    ....
with c a positive constant very close to zero.
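The repeat-until-the-count-stops-changing pattern can be tried outside ArcGIS with a plain Python sketch. Everything here is a stand-in: repeat_until_stable, step and get_count are hypothetical names, the small neighbours dictionary replaces the real select-by-location calls, and max_iterations is an added safety cap that the original answers do not mention.

```python
def repeat_until_stable(step, get_count, max_iterations=1000):
    """Call step() until get_count() stops changing (illustrative helper)."""
    previous = -1
    current = get_count()
    iterations = 0
    while current != previous and iterations < max_iterations:
        step()
        # Update both counts at the end of each pass.
        previous = current
        current = get_count()
        iterations += 1
    return current, iterations

# Stand-in for the selection: grow a set by adding neighbours of selected items.
neighbours = {1: [2], 2: [3], 3: []}
selected = {1}

def step():
    for item in list(selected):
        selected.update(neighbours[item])

final_count, passes = repeat_until_stable(step, lambda: len(selected))
print(final_count, passes)
```

The loop always runs one extra pass after the last real change, because that unchanged pass is how it detects that the selection has stabilised.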