I have a table of permissible stress (pink) values indexed by slenderness ratio (green) and yield strength (blue), taken from local government byelaws.
I look values up with df.at[A, B], where A and B are variables coming from earlier processing steps. Say A = 100 and B = 250: I then want to get 110, the next slenderness ratio value (green), and 72, the corresponding permissible stress (pink).
What I have tried:
I need these values for interpolation. I have tried the interpolation methods that come with pandas, but for this particular use case I need to interpolate the values mathematically myself.
I have also tried finding the index of a value and then adding 1 to that index, but that method is not viable for various reasons.
I have also tried simply adding the column interval to the A and B values, but as can be observed, while the green values (A) are uniform and incremental, the intervals between the yield strength values (B, in blue), while incremental, are not uniform.
[edit 2]
I have also tried df.where; there should be another way to find the coordinates of a value.
I have been stuck for a while, any help/suggestions will be appreciated! Thanks!
Not sure if I understood what you are trying to achieve. If I got it right, you want to get the pink value based on the slenderness ratio index that follows a certain value, and a yield strength at a specific value (i.e. a specific column).
df.loc[np.roll(df.index == A, shift=1), B]
This shifts the boolean mask over the (green) index by one, so the row after A is selected.
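As a minimal sketch of how that behaves, here is a toy table whose shape mimics the byelaw table; all the numbers are made up for illustration:
import numpy as np
import pandas as pd

# Toy stand-in: index = slenderness ratio (green), columns = yield
# strength (blue, non-uniform intervals), values = permissible stress (pink)
df = pd.DataFrame(
    [[132, 140, 150],
     [100, 108, 118],
     [66, 72, 80]],
    index=[90, 100, 110],
    columns=[230, 250, 280],
)

A, B = 100, 250
# df.index == A is a boolean mask; np.roll shifts it one position,
# so the row after A is selected instead of A itself
next_row = df.loc[np.roll(df.index == A, shift=1), B]
print(next_row.index[0])  # 110, the next slenderness ratio
print(next_row.iloc[0])   # 72, the (made-up) stress value in that row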
I would like to generate a list of random values with its highest value at a given index and decreasing values towards the start and end of the list. E.g. if I have a 1-dimensional, 10-element list and the peak is at index 3, then that position holds the highest value and the other values decrease towards index 0 and index 9. So the two primary parameters would be the list length and the index of the top value. It would also be nice to control the random value range and the mean of the list.
Would anyone know a function (or combination of functions) from numpy/scipy etc. that would satisfy this case? I was looking at numpy's various norm functions, but that is not what I'm looking for.
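One possible sketch, with a made-up helper name peaked_random: draw random values, sort them, and hand the larger ones to positions closer to the peak, which guarantees the values fall off towards both ends. The low/high arguments control the value range, and shifting them also shifts the mean:
import numpy as np

def peaked_random(length, peak_idx, low=0.0, high=1.0, rng=None):
    # Draw `length` uniform values in [low, high], sort them in descending
    # order, and assign the larger values to positions closer to peak_idx
    rng = np.random.default_rng() if rng is None else rng
    values = np.sort(rng.uniform(low, high, size=length))[::-1]
    distance = np.abs(np.arange(length) - peak_idx)  # 0 at the peak
    out = np.empty(length)
    out[np.argsort(distance, kind="stable")] = values
    return out

print(peaked_random(10, 3))  # peak at index 3, falling off towards 0 and 9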
I have the following data (see attached - easier this way). I am trying to find the first occurrence of the value 0 for each customer ID. Then, I plan to use code similar to below to create a Kaplan-Meier curve:
from lifelines import KaplanMeierFitter
## Example Data
durations = [5,6,6,2.5,4,4]
event_observed = [1, 0, 0, 1, 1, 1]
## create a kmf object
kmf = KaplanMeierFitter()
## Fit the data into the model
kmf.fit(durations, event_observed,label='Kaplan Meier Estimate')
## Create an estimate
kmf.plot(ci_show=False) ## ci_show toggles the confidence interval; the data set here is too tiny, so it is hidden
(this code is from here).
What's the simplest way to do this? Note that I want to ignore the NAs: I have plenty of them and there's no getting around that.
Thanks!
I'm gonna assume that all rows contain at least one non-NaN value.
First we have to make sure we only operate on rows that actually contain a zero; a row-wise min accomplishes this:
min_series = df.min(axis=1)
This gives us a Series, and we just have to select the rows whose minimum is zero:
df.loc[min_series == 0]
Then, we can use idxmin on those rows:
df.loc[min_series == 0].idxmin(axis=1)
This spits out the period in which the first 0 is encountered: idxmin returns the column of the first occurrence of the row minimum, and we've guaranteed that the minimum of every remaining row is 0.
Then, this should give you what you're looking for!
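Putting the pieces together, here is a minimal runnable sketch; the customer IDs, period columns, and values are invented for illustration:
import numpy as np
import pandas as pd

# One row per customer, one column per period, NaN where nothing was observed
df = pd.DataFrame(
    {"p1": [1, 2, np.nan], "p2": [0, 1, 1], "p3": [1, 0, np.nan]},
    index=["cust_a", "cust_b", "cust_c"],
)

min_series = df.min(axis=1)            # row-wise minimum, NaNs skipped by default
zero_rows = df.loc[min_series == 0]    # customers that ever hit 0
first_zero = zero_rows.idxmin(axis=1)  # column of the first 0 in each row
print(first_zero)                      # cust_a -> p2, cust_b -> p3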
I'm trying to find a vectorized way of determining the first instance where my column of data has a sign change. I looked at this question and it gets close to what I want, except it evaluates my first zeros as true. I'm open to different solutions including changing how the data is set up in the first place. I'll detail what I'm doing below.
I have two columns, let's call them positive and negative, that look at a third column. The third column has values ranging between [-5, 5]. When this column is in [3, 5], my positive column gets a +1 on that same row; all other rows get 0 in that column. Likewise, when the third column is in [-5, -3], my negative column gets a -1 in that row; all other rows get 0.
I combine these columns into one column. You can conceptualize this as 'turn the machine on, keep it on/off, turn it off, keep it on/off, turn the machine on ... etc.' The problem I'm having is that my combined column looks something like below:
pos = [1,1,1,0, 0, 0,0,0,0,0,1, 0,1]
neg = [0,0,0,0,-1,-1,0,0,0,0,0,-1,0]
com = [1,1,1,0,-1,-1,0,0,0,0,1,-1,1]
# Below is what I want to have as the final column.
cor = [1,0,0,0,-1, 0,0,0,0,0,1,-1,1]
The problem with what I've linked is that it gets close, but it evaluates the first 0 as a sign change as well. Zeros should be ignored, and the things I tried only seemed to create new errors. For the sake of completeness, this is what the linked code outputs:
lnk = [True,False,False,True,True,False,True,False,False,False,True,True,True]
As you can see, it handles the 1 and -1 values that don't flip, but it treats the zeros as flips. I'm not sure whether I should change how the combined column is made, change the logic for creating the component columns, or both. The big thing is that I need this code vectorized for performance reasons.
Any help would be greatly appreciated!
Let's suppose your dataframe is named df with columns pos and neg; then you can try something like the following:
df.loc[:, "switch_pos"] = (np.diff(df.pos, prepend=0) > 0)*1
df.loc[:, "switch_neg"] = (np.diff(df.neg, prepend=0) > 0)*(-1)
You can then combine your two switch columns, for example by adding them.
Explanation
np.diff gives you the row-by-row difference: for the pos column it yields 1 for a 0 to 1 step and -1 for a 1 to 0 step. Given your desired output you only want to keep the 0 to 1 steps, which is why only the differences greater than zero are kept. For the neg column a 0 to -1 step yields -1, so there you keep the differences below zero instead.
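Here is the corrected logic run end-to-end on the arrays from the question, as a small self-contained sketch:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "pos": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "neg": [0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0],
})

# A 0 -> 1 step in pos gives a diff of +1; a 0 -> -1 step in neg gives -1
df["switch_pos"] = (np.diff(df.pos, prepend=0) > 0) * 1
df["switch_neg"] = (np.diff(df.neg, prepend=0) < 0) * -1
df["cor"] = df.switch_pos + df.switch_neg
print(df.cor.tolist())
# [1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 1, -1, 1]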
Is it possible to use .nlargest to get the two highest numbers in a set of numbers, but ensure that they are a certain number of rows apart?
For example, in the following code I would want to find the two largest values while ensuring that they are more than 5 rows apart from each other. Is there an easy way to do this?
data = {'Pressure' : [100,112,114,120,123,420,1222,132,123,333,123,1230,132,1,23,13,13,13,123,13,123,3,222,2303,1233,1233,1,1,30,20,40,401,10,40,12,122,1,12,333],
}
If I understand the question correctly, you need to output the largest value, and then the next largest value that's at least X rows apart from it (based on the index).
The first value is just df.Pressure.max() (with df = pd.DataFrame(data)), and its index is df.Pressure.idxmax().
The second value is either before or after the first value's index:
max_before = df.Pressure.loc[:df.Pressure.idxmax() - X].max()
max_after = df.Pressure.loc[df.Pressure.idxmax() + X:].max()
second_value = max(max_before, max_after)
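As a runnable sketch with the data from the question and X = 5 (this assumes the maximum is not within X rows of either end of the frame, otherwise one of the two slices is empty and its max is NaN):
import pandas as pd

data = {'Pressure': [100, 112, 114, 120, 123, 420, 1222, 132, 123, 333,
                     123, 1230, 132, 1, 23, 13, 13, 13, 123, 13, 123, 3,
                     222, 2303, 1233, 1233, 1, 1, 30, 20, 40, 401, 10, 40,
                     12, 122, 1, 12, 333]}
df = pd.DataFrame(data)
X = 5

first_value = df.Pressure.max()   # 2303
first_idx = df.Pressure.idxmax()  # 23

# .loc slicing is label-based and inclusive, so these windows end/start
# exactly X rows away from the first maximum
max_before = df.Pressure.loc[:first_idx - X].max()
max_after = df.Pressure.loc[first_idx + X:].max()
second_value = max(max_before, max_after)

print(first_value, second_value)  # 2303 1230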
I have a data frame that I divide into 2 subsets based on the values of one column (positive or negative).
Let's say one column contains the following values:
1
4
9
2
1
I basically want to create a column in each subset computing the difference between each value and the one just before. So it would give something like this:
n/a
3
5
-7
-1
Then I just want to shift the values one row up. I used the code below, which in the end gives me the result, but I always get this warning that I don't understand:
"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"
Can you please help?
df_left = df_s[df_s['Benchmark Sigma'] < 0]
df_right = df_s[df_s['Benchmark Sigma'] > 0]
df_left['Benchmark Sigma Interval'] = (df_left['Benchmark Sigma']-df_left['Benchmark Sigma'].shift(1))
df_right['Benchmark Sigma Interval'] = (df_right['Benchmark Sigma']-df_right['Benchmark Sigma'].shift(1))
df_left['Benchmark Sigma Interval']=df_left['Benchmark Sigma Interval'].shift(-1)
df_right['Benchmark Sigma Interval']=df_right['Benchmark Sigma Interval'].shift(-1)
This warning is just letting you know that you may be modifying a copy of a df and not the original. In my experience, it's often a false positive. If you find this to be the case for you as well, you can turn off the warning.
For more information, see this post, and especially the links @Garrett provided.
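For reference, here is a small sketch of both routes, using a toy stand-in for df_s (the data is made up; the option name is real pandas):
import pandas as pd

# Toy stand-in for the question's dataframe
df_s = pd.DataFrame({'Benchmark Sigma': [-2.0, -1.5, 0.5, 1.2, 3.0]})

# Option 1: take an explicit copy, so the column assignment below operates
# on an independent frame and the warning goes away
df_left = df_s[df_s['Benchmark Sigma'] < 0].copy()
# .diff() is shorthand for x - x.shift(1), so this matches the question's code
df_left['Benchmark Sigma Interval'] = df_left['Benchmark Sigma'].diff().shift(-1)

# Option 2: turn the warning off globally, as suggested above (use with care)
pd.options.mode.chained_assignment = None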