Modify Pandas dataFrame column values based on a condition - python

I want to modify only the values greater than 750 in one column of a pandas DataFrame:
datf.iloc[:,index][datf.iloc[:,index] > 750] = datf.iloc[:,index][datf.iloc[:,index] > 750]/10.7639
I think the syntax is fine, but I get a pandas warning, so I don't know whether it is correct this way:
<ipython-input-24-72eef50951a4>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
What is the correct way to do this without getting this warning?

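You can use the apply method to modify your column with a custom function. Note that apply returns a new Series, so assign the result back to the column if you want to change the DataFrame itself.
N.B. You can also use applymap to apply a function element-wise across multiple columns (recent pandas versions rename it to DataFrame.map).
def my_func(x):
    if x > 750:
        x = x / 10.7639  # do your modification here
    return x

new_dta = datf['col_name'].apply(my_func)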
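
For what it's worth, the warning in the question comes from the chained indexing (iloc followed by a second [] on the result), and it can usually be avoided with a single .loc assignment driven by a boolean mask. A minimal sketch, assuming a made-up frame: the column names `area` and `rooms` and the sample values are illustrative only, while `datf`, `index`, 750 and 10.7639 come from the question.

```python
import pandas as pd

# Hypothetical sample data; only `datf`, `index`, 750 and 10.7639 come from the question.
datf = pd.DataFrame({"area": [500.0, 1076.39, 2152.78], "rooms": [2, 3, 4]})
index = 0  # position of the column to modify

col = datf.columns[index]   # translate the position into a column label
mask = datf[col] > 750      # boolean mask of the rows to change

# One .loc call selects rows and column together, so there is no chained assignment.
datf.loc[mask, col] = datf.loc[mask, col] / 10.7639
```

Rows at or below 750 are left untouched, and the assignment writes directly into `datf` rather than into an intermediate object.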

Related

Transforming negative to positive values in dataframe - Python [duplicate]

I am trying to transform the negative values under the 'Age' column in my dataset to positive values. So if Age = -15, then Age_new = +15, else if Age >=0, then Age_new remains as Age.
My original dataframe is called df_no_mv.
So I have the following code:
def tran_neg_to_pos(df):
    if df['Age'] < 0:
        return df['Age'] * (-1)
    elif df['Age'] >0:
        return df['Age']
#create Age_new
df_no_mv['Age_new']=df_no_mv.apply(tran_neg_to_pos,axis=1)
df_no_mv
I see that a new column Age_new is successfully created according to above logic. However I get this warning message:
C:\Users\Admin\anaconda3\lib\site-packages\ipykernel_launcher.py:20: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
How can I fix this?
Just use the Series abs() method:
df_no_mv['Age_new'] = df_no_mv['Age'].abs()
This is just as per https://stackoverflow.com/a/29077254/1021819
REF: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.abs.html
FWIW, Python has a built-in abs(), and NumPy has one too (np.abs).
A pandas Series has an .abs() method as well, so:
df_no_mv['Age_new']=df_no_mv['Age'].abs()
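
To make the answers above concrete, here is a minimal sketch. It assumes, purely for illustration, that df_no_mv was produced by filtering a hypothetical frame called `raw` (a common source of this warning); the data values are made up as well. Note that the row-wise function in the question has no branch for Age == 0, so it returns None for those rows, whereas .abs() handles zero naturally.

```python
import pandas as pd

# Hypothetical raw data; `raw` and its values are assumptions, not from the question.
raw = pd.DataFrame({"Age": [-15, 0, 42, None]})

# Taking an explicit copy after filtering makes df_no_mv an independent DataFrame,
# so adding a column to it does not trigger SettingWithCopyWarning.
df_no_mv = raw[raw["Age"].notna()].copy()

# Vectorized absolute value; no row-wise apply needed.
df_no_mv["Age_new"] = df_no_mv["Age"].abs()
```

The original `raw` frame is left unchanged; only df_no_mv gains the Age_new column.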

Python Pandas dataframe modify column value based on function that cleans string value and assign to new column

I have some keys to clean. The keys have six leading zeros that I want to get rid of, and if a key does not end with "ABC" or "DEFG", I also need to remove the currency code in its last 3 characters. If a key doesn't start with a leading zero, it should be returned as it is.
To achieve this I wrote a function that works on a single string, as below:
def cleanAttainKey(dirtyAttainKey):
    if dirtyAttainKey[0] != "0":
        return dirtyAttainKey
    else:
        dirtyAttainKey = dirtyAttainKey.strip("0")
        if dirtyAttainKey[-3:] != "ABC" and dirtyAttainKey[-3:] != "DEFG":
            dirtyAttainKey = dirtyAttainKey[:-3]
        cleanAttainKey = dirtyAttainKey
    return cleanAttainKey
I built a dummy DataFrame to test it, but it reports errors:
df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
                  columns=["dirtyKey","amount"])
I need a new column called "cleanAttainKey" in df: each value in "dirtyKey" should be passed through the "cleanAttainKey" function and the cleaned key assigned to the new column. However, it seems pandas doesn't support this type of modification.
# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and get into the new column of cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
    df['cleanAttainKey'][i] = cleanAttainKey(vpAttainKeyList[i])
I am getting the below error message:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The result should be the same as the df2 below:
df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
                    'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
                   columns=["dirtyKey","cleanAttainKey","amount"])
df2
Is there any better way to modify the dirty keys and get a new column with the clean keys in Pandas?
Thanks
Here is the culprit:
df['cleanAttainKey'][i] = cleanAttainKey(vpAttainKeyList[i])
When you extract part of a DataFrame with chained indexing such as df['cleanAttainKey'][i], pandas is free to return either a view or a copy. That does not matter if you are only reading the data, but it means you should never assign through it, because the write may land on a temporary copy.
The idiomatic way is to use loc (or iloc, at, or iat):
df.loc[i, 'cleanAttainKey'] = cleanAttainKey(vpAttainKeyList[i])
(above assumes a natural range index...)
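
As a possible alternative to the loop, the whole column can be built in one step with Series.apply, which sidesteps element-wise assignment altogether. The sketch below is not the asker's function: `clean_attain_key` is a hypothetical rewrite that uses lstrip (so only leading zeros are removed) and endswith (a 3-character slice such as key[-3:] can never equal the 4-character "DEFG"). The data is the dummy frame from the question.

```python
import pandas as pd

def clean_attain_key(dirty_key: str) -> str:
    """Hypothetical rewrite of the question's cleaning rule."""
    if not dirty_key.startswith("0"):
        return dirty_key              # no leading zero: return the key unchanged
    key = dirty_key.lstrip("0")       # remove leading zeros only
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]                # drop the trailing 3-character currency code
    return key

df = pd.DataFrame(
    {"dirtyKey": ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
     "amount": [100, 101, 102]},
    columns=["dirtyKey", "amount"],
)

# Assigning a whole new column is a plain column assignment, not a chained one.
df["cleanAttainKey"] = df["dirtyKey"].apply(clean_attain_key)
```

Because the assignment targets an entire column of df itself, it does not depend on the index being a natural range index.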

Get a KeyError in Pandas

I am trying to call a function from a different module as below:
module1 - func1: returns a dataframe
module1 - func2(p_df_in_fromfunc1)
function 2:
for i in range(0, len(p_df_in_fromfunc1)):
    # Trying to retrieve row values of individual columns and assign to variables
    v_tmp = p_df_in_fromfunc1.loc[i,"Col1"]
When trying to run the above code, I get the error:
KeyError 0
Could the issue be because I don't have a zero numbered row?
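Without seeing more of your code, my guess is that you want positional indexing: loc looks rows up by index label, so it raises a KeyError when no row is labelled 0. Try iloc instead if you are iterating by position. Note that iloc only accepts integer positions, so select the column by name first.
Something like:
v_tmp = p_df_in_fromfunc1["Col1"].iloc[i]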
You may have missed closing the quote after Col1 in the loc call?
v_tmp = p_df_in_fromfunc1.loc[i,"Col1"]
For retrieving a row for specific columns do:
columns = ['Col1', 'Col2']
df[columns].iloc[index]
If you only want one column, you can simplify it to: df['Col1'].iloc[index]
As per your comment, you do not need to reset the index; you can iterate over the labels in your index instead: df.index
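
The difference between label-based and positional lookup can be sketched as follows. This assumes a made-up frame whose index starts at 10 instead of 0; the name `df` and the values are illustrative, only "Col1" comes from the question.

```python
import pandas as pd

# Hypothetical frame with no row labelled 0 (e.g. after filtering another frame).
df = pd.DataFrame({"Col1": ["a", "b", "c"]}, index=[10, 11, 12])

# loc is label-based: asking for label 0 fails because that label does not exist.
try:
    df.loc[0, "Col1"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Option 1: positional access with iloc, independent of the index labels.
by_position = [df["Col1"].iloc[i] for i in range(len(df))]

# Option 2: keep loc, but iterate over the labels that actually exist.
by_label = [df.loc[label, "Col1"] for label in df.index]
```

Both loops visit the same rows in the same order; which one reads better depends on whether the index labels carry meaning in your code.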

Is this Pandas 'SettingWithCopyWarning' a False Positive?

I have a dataframe that I subset to produce a new dataframe:
temp_df = initial_df.loc[initial_df['col'] == val]
And then I add columns to this dataframe, setting all values to np.nan:
temp_df[new_col] = np.nan
This triggers a 'SettingWithCopyWarning', as it should, and tells me:
Try using .loc[row_indexer,col_indexer] = value instead
However, when I do that, like so:
temp_df.loc[:,new_col] = np.nan
I still get the same warning. In fact, I get one instance of the warning using the first method, but two instances of the warning using .loc.
Is this warning incorrect here? I don't care that the new column I am adding doesn't make it back to the initial_df. Is it a false positive? And why are there two warnings?

Proper way to utilize .loc in python's pandas

When trying to change a column of numbers from object to float dtypes using pandas dataframes, I receive the following warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Now, the code runs just fine, but what would be the proper and intended way to avoid this warning and still achieve the goal of:
df2[col] = df2[col].astype('float')
Let it be noted that df2 is a subset of df1 using a condition similar to:
df2 = df1[df1[some col] == value]
Use the copy method. Instead of:
df2 = df1[df1[some col] == value]
Just write:
df2 = df1[df1[some col] == value].copy()
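Without copy(), pandas treats df2 as a slice of df1 (possibly a view, possibly a copy) rather than an independent DataFrame, which is why it emits the warning when you try to modify it. The warning comes from pandas, and it is a warning, not an error.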
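
Put together, the approach might look like the sketch below. The column names `group` and `price` and the data are hypothetical stand-ins for the placeholders "some col" and "col" in the question.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings in an object-dtype column.
df1 = pd.DataFrame({
    "group": ["a", "b", "a"],
    "price": pd.Series(["1.5", "2.0", "3.25"], dtype=object),
})

# Explicit copy: df2 is now independent of df1.
df2 = df1[df1["group"] == "a"].copy()

# Converting the column on the copy no longer warns, and df1 is unaffected.
df2["price"] = df2["price"].astype("float")
```

The trade-off is the extra memory for the copy, which is usually negligible unless the subset is very large.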
