Assign value via iloc to Pandas Data Frame - python

Code snippet:
for row in df.itertuples():
    current_index = df.index.get_loc(row.Index)
    if current_index < max_index - 29:
        if df.iloc[current_index + 30].senkou_span_a == 0.0:
            df.iloc[current_index + 30].senkou_span_a = 6700
        if df.iloc[current_index + 30].senkou_span_b == 0.0:
            df.iloc[current_index + 30].senkou_span_b = 6700.0
On the last line, where I am assigning a value via iloc, the assignment runs without error, but the resulting value is still 0.0. I ran into this before when I was assigning the value to a copy of the df, but that doesn't seem to be the case this time. I have been staring at this all day. I hope it is just something silly.
df is timeseries(DateTimeIndex) financial data. I have verified the correct index exists, and no exceptions are thrown.
There is other code to assign values and those work just fine, omitted that code for the sake of brevity.
EDIT
This line works:
df.iloc[current_index + 30, df.columns.get_loc('senkou_span_b')] = 6700
Why does this one work, and not the original?

I'm not sure exactly what's causing your problem (I'm pretty new to Python), but here's some things that came to mind that might be helpful:
Assuming that the value you're replacing is always 0 or 0.0, maybe try switching from = to += to add instead of assign?
Is your dataframe in tuple format when you attempt to assign the value? Your issue might be that tuples are immutable.
If senkou_span_a refers to a column, maybe try using iloc alone to select the value, like df.iloc[current_index + 30, 1] == 0.0, where the second argument is the column's integer position (1 here is only a placeholder).
Hope this helped!
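On the EDIT in the question: a likely explanation is that df.iloc[current_index + 30] first builds a separate row Series (often a copy), and the attribute assignment then writes to that temporary object rather than to df. Passing the row and column positions to iloc in a single call sets the value on the frame itself. A minimal sketch, assuming a small made-up frame in place of the asker's data:

```python
import pandas as pd

# Hypothetical stand-in for the asker's time-series frame.
df = pd.DataFrame(
    {"senkou_span_a": [0.0, 0.0, 0.0], "senkou_span_b": [0.0, 0.0, 0.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# One indexing call with (row position, column position) writes into df itself.
col = df.columns.get_loc("senkou_span_b")
df.iloc[1, col] = 6700.0

value_after = df.iloc[1, col]
print(value_after)
```

Whether the two-step form writes through can depend on the pandas version and the column dtypes, so the single-call form is the safer habit.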

Related

Can I use a value stored in a variable after an if statement with the loc method?

I created an if else statement to determine whether the min or max is the bigger difference and then stored the number to a variable.
findValue = 0.0
minAbs = abs(df[["numbers"]].min())
maxAbs = abs(df[["numbers"]].max())
if minAbs > maxAbs:
    findValue = minAbs
else:
    findValue = maxAbs
**df2=df.loc[df['numbers'] == findValue, 'day_related']**
df2
Python hates that I use findValue and not the actual number that it's set equal to in the statement with ** around it, but I thought these are interchangeable?
df[["numbers"]] creates a new dataframe with one column called numbers, so df[["numbers"]].max() isn't actually going to return a number; it's going to return a Series object with one item. df["numbers"] will return the actual numbers column so .max() and .min() will work as expected.
Change your code to this:
minAbs = abs(df["numbers"].min())
maxAbs = abs(df["numbers"].max())
...and then the rest of your code.
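Put together, the corrected version might look like the sketch below. The sample data and the snake_case names are made up for illustration. One extra assumption on my part: since findValue is an absolute value, the filter compares against the absolute values of the column, otherwise a negative minimum would never match.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame(
    {"numbers": [-7.5, 2.0, 4.0], "day_related": ["Mon", "Tue", "Wed"]}
)

# Single brackets return a Series, so min() and max() give plain scalars.
min_abs = abs(df["numbers"].min())
max_abs = abs(df["numbers"].max())
find_value = max(min_abs, max_abs)

# Compare absolute values so that a negative extreme is still found.
df2 = df.loc[df["numbers"].abs() == find_value, "day_related"]
print(df2)
```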

Appending values to a dictionary keeps replacing the previous value

I am trying to append values to my dictionary under the key 'UMINV', which it seems to be doing. The problem is, it keeps replacing the values that were previously there.
colpath = '/home/jacob/PHOTOMETRY/RESTFRAME_COLOURS/'
goodcolindx = {}
colfiledat = {}
colors = {}
for iclust in range(len(clustname)):
    filepath = catpath + clustname[iclust] + "_totalall_" + extname[iclust] + ".cat"
    photdat[clustname[iclust]] = ascii.read(filepath)
    filepath = zpath + "compilation_" + clustname[iclust] + ".dat"
    zdat[clustname[iclust]] = ascii.read(filepath)
    colfilepath = colpath + 'RESTFRAME_MASTER_' + clustname[iclust] + '_indivredshifts.cat'
    colfiledat[clustname[iclust]] = ascii.read(colfilepath)
    goodcolindx[clustname[iclust]] = np.where((colfiledat[clustname[iclust]]['REDSHIFTUSED'] > 0.9) & \
        (colfiledat[clustname[iclust]]['REDSHIFTUSED'] < 1.5) & \
        (photdat[clustname[iclust]]['totmask'] == 0) & \
        (photdat[clustname[iclust]]['K_flag'] == 0) & \
        ((zdat[clustname[iclust]]['quality'] == 3) | (zdat[clustname[iclust]]['quality'] == 4)))
    goodcolindx[clustname[iclust]] = goodcolindx[clustname[iclust]][0]
    for igood in range(len(goodcolindx[clustname[iclust]])):
        colors['UMINV'] = np.array([])
        print(colfiledat[clustname[iclust]]['UMINV'][goodcolindx[clustname[iclust]][igood]])
        colors['UMINV'] = np.append(colors['UMINV'], colfiledat[clustname[iclust]]['UMINV'][goodcolindx[clustname[iclust]][igood]])
print(colors)
The print statement at the end outputs 1.859, which is the last value in the data set, so it is cycling through them correctly, but it keeps appending over the previous value when I run a debugger. How do I make it so it appends all the values, not just replacing the previous one?
I only understand maybe 10% of the code you show, but I suspect I know what's wrong with your code.
On each iteration of the last loop, you're clobbering the value of colors['UMINV'] with this line:
colors['UMINV'] = np.array([])
When you later append a value to that empty array, it will be the only one. On the next iteration, you reinitialize to an empty array before appending another single value.
I suspect you want the line above to only run once (or maybe once per run of the outer loop, your code is pretty confusing so I'm only guessing at your intentions). That's not hard to fix, just move it up the file, either all the way near the top, just below the dictionary definition, or just above the for igood in range(...) line.
I'm also not sure what use that dictionary is, really, if you're not using more than one key in it. Just use a simple variable if you only want one array!
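A stripped-down sketch of the fix, assuming the array should collect values across all clusters. The cluster names, values and index arrays below are invented stand-ins for the catalogue data in the question:

```python
import numpy as np

# Hypothetical stand-ins for colfiledat[...]['UMINV'] and goodcolindx[...].
uminv_by_cluster = {
    "clusterA": np.array([1.2, 0.8, 1.859]),
    "clusterB": np.array([0.5, 1.1]),
}
good_idx = {"clusterA": np.array([0, 2]), "clusterB": np.array([1])}

# Initialise once, before the loops, so earlier values are not thrown away.
colors = {"UMINV": np.array([])}
for name in uminv_by_cluster:
    for i in good_idx[name]:
        colors["UMINV"] = np.append(colors["UMINV"], uminv_by_cluster[name][i])

print(colors)
```

If a separate array per cluster is wanted instead, the initialisation would go just inside the outer loop rather than above it.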

Unable to change value of dataframe at specific location

So I'm trying to go through my dataframe in pandas and if the value of two columns is equal to something, then I change a value in that location, here is a simplified version of the loop I've been using (I changed the values of the if/else function because the original used regex and stuff and was quite complicated):
pro_cr = ["IgA", "IgG", "IgE"] # CR's considered productive
rows_changed = 0
prod_to_unk = 0
unk_to_prod = 0
changed_ids = []
for index in df_sample.index:
    if num == 1 and color == "red":
        pass
    elif num == 2 and color == "blue":
        prod_to_unk += 1
        changed_ids.append(df_sample.loc[index, "Sequence ID"])
        df_sample.at[index, "Functionality"] = "unknown"
        rows_changed += 1
    elif num == 3 and color == "green":
        unk_to_prod += 1
        changed_ids.append(df_sample.loc[index, "Sequence ID"])
        df_sample.at[index, "Functionality"] = "productive"
        rows_changed += 1
    else:
        pass
print("Number of productive columns changed to unknown: {}".format(prod_to_unk))
print("Number of unknown columns changed to productive: {}".format(unk_to_prod))
print("Total number of rows changed: {}".format(rows_changed))
So the main problem is the changing code:
df_sample.at[index, "Functionality"] = "unknown" # or productive
If I run this code without these lines of code, it works properly, it finds all the correct locations, tells me how many were changed and what their ID's are, which I can use to validate with the CSV file.
If I use df_sample["Functionality"][index] = "unknown" # or productive the code runs, but checking the rows that have been changed shows that they were not changed at all.
When I use df.at[row, column] = value I get "AttributeError: 'BlockManager' object has no attribute 'T'"
I have no idea why this is showing up. There are no duplicate columns. Hope this was clear (if not let me know and I'll try to clarify it). Thanks!
To be honest, I've never used df.at - but try using df.loc instead:
df_sample.loc[index, "Functionality"] = "unknown"
You can also use iat, which takes integer positions instead of labels.
Example: df.iat[i, j] for the i-th row and j-th column.
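For what it's worth, the same update can be written without a loop by building boolean masks and assigning through loc. This is only a sketch: the num and color columns and the sample rows are invented to mirror the simplified conditions in the question.

```python
import pandas as pd

# Hypothetical sample; "num" and "color" stand in for the real conditions.
df_sample = pd.DataFrame(
    {
        "Sequence ID": ["s1", "s2", "s3"],
        "num": [1, 2, 3],
        "color": ["red", "blue", "green"],
        "Functionality": ["productive", "productive", "unknown"],
    }
)

# One boolean mask per rule.
to_unknown = (df_sample["num"] == 2) & (df_sample["color"] == "blue")
to_productive = (df_sample["num"] == 3) & (df_sample["color"] == "green")

# Record which rows will change, then assign in place with loc.
changed_ids = df_sample.loc[to_unknown | to_productive, "Sequence ID"].tolist()
df_sample.loc[to_unknown, "Functionality"] = "unknown"
df_sample.loc[to_productive, "Functionality"] = "productive"

print("Total number of rows changed: {}".format(len(changed_ids)))
```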

Separating if conditions for which there can be some overlapping cases

Given a pandas dataframe wb, which looks like this (in Excel, before bringing it into pandas with read_csv()):
Column ad_tag_name is in groups of 3. I want to append _level2 to every second of each group of 3, and _level3 to the value of this column in every third of each group of 3, so I end up with something like:
I have decided to use mod division, with the logic that "if it divides evenly by both 2 and 3, then append _level3; if it divides evenly only by 2, then append _level2; if it divides evenly only by 3, then append _level3. Otherwise, leave it alone."
for index, elem in enumerate(wb['ad_requests']):
    if np.mod(index+1,2) == 0 and np.mod(index+1,3) == 0:
        wb.at[index,'\xef\xbb\xbf"ad_tag_name"'] = wb.at[index,'\xef\xbb\xbf"ad_tag_name"'] + "_level3"
    elif np.mod(index+1,3) == 0:
        wb.at[index,'\xef\xbb\xbf"ad_tag_name"'] = wb.at[index,'\xef\xbb\xbf"ad_tag_name"'] + "_level3"
    elif np.mod(index+1,2) == 0:
        wb.at[index,'\xef\xbb\xbf"ad_tag_name"'] = wb.at[index,'\xef\xbb\xbf"ad_tag_name"'] + "_level2"
Yet when I save the resulting CSV and examine it, I see:
The pattern is: no suffix, _level2, _level3, _level2, no suffix, _level3, no suffix, _level2, _level3, and then this repeats. So it's correct in 8 out of 9 cases, but really that is an accident. I don't like the fact that there may be some overlap between the ifs/elifs I have defined, and I am sure it is this flawed logic that is at the root of the problem.
How can we re-write the conditions so that they are properly achieving the logic I have in mind?
Python: 2.7.10
Pandas: 0.18.0
While pandas can provide some elegant shortcuts, it can also lead one down rabbit-holes of trial-and-error.
Sometimes going back to basics, to what Python provides built in, is the way to go.
for i in range(len(wb))[2::3]:
    wb.at[i,'\xef\xbb\xbf"ad_tag_name"'] = wb.at[i,'\xef\xbb\xbf"ad_tag_name"'] + "_level3"
for i in range(len(wb))[1::3]:
    wb.at[i,'\xef\xbb\xbf"ad_tag_name"'] = wb.at[i,'\xef\xbb\xbf"ad_tag_name"'] + "_level2"
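The underlying point is that the suffix depends only on a row's position within its group of three, that is, on the position modulo 3, so no overlapping conditions are needed. A small sketch of that idea, assuming a plain ad_tag_name column (without the BOM prefix) and made-up values:

```python
import pandas as pd

# Hypothetical frame with two groups of three rows.
wb = pd.DataFrame({"ad_tag_name": ["a", "a", "a", "b", "b", "b"]})

# Position within each group of three decides the suffix.
suffixes = ["", "_level2", "_level3"]
wb["ad_tag_name"] = [
    name + suffixes[i % 3] for i, name in enumerate(wb["ad_tag_name"])
]

print(wb["ad_tag_name"].tolist())
```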

Add up the value of data[x] to data[x+1]

I have a long list of data which I am working with now, containing 'timestamp' versus 'quantity' values. However, the timestamps in the list are not all in order (for example, timestamp[x] can be 140056 while timestamp[x+1] can be 560). I am not going to sort them; instead I want to add the value of timestamp[x] to timestamp[x+1] when this happens.
P.S. The quantity values need to stay in the same order as in the list when plotting.
I have been working with this using the following code, which timestamp is the name of the list which contain all the timestamp values:
for t in timestamp:
    previous = timestamp[t-1]
    increment = 0
    if previous > timestamp[t]:
        increment = previous
    t += increment
    delta = datetime.timedelta(0, (t - startTimeStamp) / 1000);
    timeAtT = fileStartDate + (delta + startTime)
    print("time at t=" + str(t) + " is: " + str(timeAtT));
    previous = t
However it comes out with TypeError: list indices must be integers, not tuples. May I know how to solve this, or any other ways of doing this task? Thanks!
The problem is that you're treating t as if it is an index of the list. In your case, t holds the actual values of the list, so constructions like timestamp[t] are not valid. You either want:
for t in range(len(timestamp)):
Or if you want both an index and the value:
for (t, value) in enumerate(timestamp):
When you write for t in timestamp, t takes on the value of each item in timestamp. But then you try to use t as an index to get previous. To do this, try:
for i, t in enumerate(timestamp):
    previous = timestamp[i - 1]  # note: for i == 0 this wraps around to the last item
    current = t
Also when you get TypeErrors like this make sure you try printing out the intermediate steps, so you can see exactly what is going wrong.
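One way to do the whole task is to carry a running offset and bump it whenever a value is smaller than the one before it. This is a sketch under two assumptions of mine: a drop means the counter restarted, and the offsets should accumulate across several restarts. The function name and the sample values are made up.

```python
def unwrap_timestamps(timestamps):
    """Return timestamps with an accumulated offset added after each drop."""
    adjusted = []
    offset = 0
    previous = None  # raw value of the previous item
    for t in timestamps:
        if previous is not None and t < previous:
            # The raw value dropped, so shift everything from here on.
            offset += previous
        adjusted.append(t + offset)
        previous = t
    return adjusted


result = unwrap_timestamps([100, 140056, 560, 900, 50])
print(result)
```

The output list has the same length and order as the input, so the quantity values can still be plotted against it position by position.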
