If conditions Pandas - python

All the values in my data frame are int, and I am trying to apply this condition: if the value is greater than 1, multiply it by 1, otherwise multiply it by -1, and store the result in a new column. It gives this error:
'>' not supported between instances of 'str' and 'int'
Below is the code I wrote:
Cfile["Equ"] = [i*1 if i>1 else i*-1 for i in Cfile["Net Salary"]]

s = df['Net Salary']
df['Equ'] = np.where(s.gt(1), s, s.mul(-1))

Alternatively, use boolean masks:
df['new'] = (df.net>1)*df.net - (1>=df.net)*df.net
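The error message suggests the column actually holds strings rather than ints, so a numeric conversion is probably needed before any comparison. A minimal sketch, assuming the values are numeric text (the sample data below is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: numbers that were read in as strings
df = pd.DataFrame({"Net Salary": ["5", "-3", "1", "10"]})

# Convert to numbers first; errors="coerce" turns unparseable entries into NaN
s = pd.to_numeric(df["Net Salary"], errors="coerce")

# Keep values greater than 1 as they are, flip the sign of everything else
df["Equ"] = np.where(s.gt(1), s, s.mul(-1))
```

Checking df.dtypes before and after the conversion shows whether the column was really stored as object.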

Related

AttributeError: 'DataFrame' object has no attribute 'str' while trying to fix my dataframe

I am trying to take all columns whose name contains a '%', strip the trailing '%' from the values, convert the strings to float, and divide by 100 to get a decimal fraction.
I created a list of all the columns that I want to do that to with:
percentages = (df.filter(like='%').columns)
perclist = percentages.tolist()
Then I run this code but it won't work:
df[perclist] = df[perclist].str.rstrip('%').astype('float') / 100.0
The .str accessor (and therefore str.rstrip) exists on Series, not on DataFrame. Use replace with a regex instead:
df[perclist] = df[perclist].replace('%$', '', regex=True).astype('float') / 100.0
Tip: avoid creating an unnecessary subset of your dataframe just to get the column names.
Replace:
percentages = (df.filter(like='%').columns)
perclist = percentages.tolist()
By:
perclist = df.columns[df.columns.str.contains('%')]
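If you prefer to keep str.rstrip, it can be applied column by column, since each column is a Series. A small sketch with invented column names and values:

```python
import pandas as pd

# Hypothetical data: two percentage columns stored as text
df = pd.DataFrame({
    "name": ["a", "b"],
    "growth %": ["12.5%", "3%"],
    "share %": ["50%", "7.25%"],
})

# Select the column labels that contain a percent sign
perclist = df.columns[df.columns.str.contains("%")]

# apply() hands each column to the lambda as a Series, where .str is available
df[perclist] = df[perclist].apply(
    lambda col: col.str.rstrip("%").astype("float") / 100.0
)
```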

Can't change column name from 'NaN' to something else

I tried converting the column names to strings and then renaming them, with no success; the names stayed NaN:
data.rename(columns=str).rename(columns={'NaN':'Tip Analiza','NaN':'Limite' }, inplace=True)
I also tried using the in operator to replace NaN, also with no success; it raised an error:
TypeError: argument of type 'float' is not iterable
data.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in data.columns.values]).ffill().values.flatten()
What should I try?
Try:
data.columns = map(str, data)
# if the column names are unique, rename them by label
data = data.rename(columns={"col1": "rnm1", "col2": "rnm2"})
# otherwise ignore the lines above and just assign the full list of names
data.columns = ["rnm1", "rnm2"]
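Two details are worth knowing here: str(np.nan) is 'nan', not 'NaN', and a dict literal cannot hold the same key twice, so a rename keyed on 'NaN' has nothing to match. One possible way around both, sketched with made-up data, is to fill the missing labels positionally:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two unnamed (NaN) column labels
data = pd.DataFrame([[1, 2, 3]], columns=["id", np.nan, np.nan])

# Replacement names, consumed in order for each missing label
new_names = iter(["Tip Analiza", "Limite"])

# pd.isna() is safe for both strings and floats, unlike the `in` operator
data.columns = [next(new_names) if pd.isna(c) else c for c in data.columns]
```

This assumes you know how many NaN labels there are; next() raises StopIteration if the list of new names runs out.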

'>=' not supported between instances of 'str' and 'datetime.datetime' - used strptime as well

I need to select rows of the data frame that fall within my training period, shown below, but it always raises this error.
train_period = [
    ['1/1/2018', '10/30/2018']]
train_period = [[datetime.strptime(y,'%m/%d/%Y') for y in x] for x in train_period]
for tp in train_period:
    print()
    #print('Begin:%d End:%d' % (tp[0], tp[1]))
    print()
    df_train_period = df_sku[
        (df_sku['To_Date'] >= tp[begin]) & (df_sku['To_Date'] <= tp[end])]
Your 'To_Date' column needs to have a datetime64 dtype before it can be compared against datetime objects; at the moment it holds strings. Convert it first:
df_sku['To_Date'] = pd.to_datetime(df_sku['To_Date'], format='%m/%d/%Y')
Then your code will work. You can always check the dtype with df_sku['To_Date'].dtype
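Here is a small end-to-end sketch of the conversion followed by the filter. The sample rows and the qty column are invented, and the date format is assumed to be month/day/year as in the question:

```python
from datetime import datetime

import pandas as pd

# Hypothetical data with dates stored as strings
df_sku = pd.DataFrame({
    "To_Date": ["12/31/2017", "03/15/2018", "10/30/2018", "11/01/2018"],
    "qty": [1, 2, 3, 4],
})

# Convert the string column to datetime64 so comparisons are meaningful
df_sku["To_Date"] = pd.to_datetime(df_sku["To_Date"], format="%m/%d/%Y")

train_period = [["1/1/2018", "10/30/2018"]]
train_period = [[datetime.strptime(y, "%m/%d/%Y") for y in x] for x in train_period]

# Unpack each period into its begin and end bounds (both inclusive)
for begin, end in train_period:
    df_train_period = df_sku[
        (df_sku["To_Date"] >= begin) & (df_sku["To_Date"] <= end)
    ]
```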

pandas function return multiple values error - TypeError: unhashable type: 'list'

I have written a pandas function and it runs fine (the second-to-last line of my code). When I try to assign my function's output to columns in the dataframe, I get TypeError: unhashable type: 'list'.
I posted something similar before, and I am using the method from the answer to that question in the function below, but it still fails.
import pandas as pd
import numpy as np
def benford_function(value):
    if value == '':
        return []
    if ("." in value):
        before_decimal=value.split(".")[0]
        if len(before_decimal)==0:
            bd_first="0"
            bd_second="0"
        if len(before_decimal)>1:
            before_decimal=before_decimal[:2]
            bd_first=before_decimal[0]
            bd_second=before_decimal[1]
        elif len(before_decimal)==1:
            bd_first="0"
            bd_second=before_decimal[0]
        after_decimal=value.split(".")[1]
        if len(after_decimal)>1:
            ad_first=after_decimal[0]
            ad_second=after_decimal[1]
        elif len(after_decimal)==1:
            ad_first=after_decimal[0]
            ad_second="0"
        else:
            ad_first="0"
            ad_second="0"
    else:
        ad_first="0"
        ad_second="0"
        if len(value)>1:
            bd_first=value[0]
            bd_second=value[1]
        else:
            bd_first="0"
            bd_second=value[0]
    return pd.Series([bd_first,bd_second,ad_first,ad_second])
df = pd.DataFrame(data = {'a': ["123"]})
df.apply(lambda row: benford_function(row['a']), axis=1)
df[['bd_first'],['bd_second'],['ad_first'],['ad_second']]= df.apply(lambda row: benford_function(row['a']), axis=1)
Change:
df[['bd_first'],['bd_second'],['ad_first'],['ad_second']] = ...
to
df[['bd_first', 'bd_second', 'ad_first', 'ad_second']] = ...
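This will fix your TypeError. Writing df[['bd_first'],['bd_second'],['ad_first'],['ad_second']] passes a tuple of single-element lists as the key, and pandas then treats each of those lists as an individual label. Labels must be hashable and lists are not, hence the error. A single list of column names selects (or creates) all four columns at once.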
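To see the pattern in isolation, here is a cut-down sketch. The split_digits helper and the column names are hypothetical stand-ins for the longer function in the question:

```python
import pandas as pd

def split_digits(value):
    # Hypothetical simplified helper: return the first two characters,
    # left-padding with "0" when the string is shorter than two characters
    padded = value.zfill(2)
    return pd.Series([padded[0], padded[1]])

df = pd.DataFrame({"a": ["123", "7"]})

# One flat list of column names on the left-hand side;
# apply(axis=1) returns a DataFrame with one column per Series element
df[["first", "second"]] = df.apply(lambda row: split_digits(row["a"]), axis=1)
```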

ValueError: Unknown format code 'f' for object of type 'str' - why do I get this the second time but not the first time?

Below is my code. I'm trying to print both the Top 250 lines and the Bottom 250 lines, and strategized to make a copy of my main dataframe, re-sort, and then format the percentages into a string format with a "%" sign. Unfortunately I'm getting a ValueError: Unknown format code 'f' for object of type 'str' on the line with upreportdataframe, but not with downreportdataframe. Why would this be?
Does this have something to do with how the dataframe is copied?
upreportdataframe.sort(['dailypctchange'], ascending = False, inplace=True)
downreportdataframe = upreportdataframe
downreportdataframe.is_copy = False
downreportdataframe.sort(['dailypctchange'], ascending = True, inplace = True)
downreportdataframe['dailypctchange'] = pd.Series(
    ["{0:.2f}%".format(val * 100)
     for val in downreportdataframe['dailypctchange']],
    downreportdataframe.index)
upreportdataframe['dailypctchange'] = pd.Series(
    ["{0:.2f}%".format(val * 100)
     for val in upreportdataframe['dailypctchange']],
    upreportdataframe.index)
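downreportdataframe is not a copy of upreportdataframe; it is just another reference to the same object. The first formatting step therefore turns the shared 'dailypctchange' column into strings, and the second step fails because it tries to apply the 'f' format code to those strings.
If you want a copy, use the dataframe.copy() method: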
downreportdataframe = upreportdataframe.copy()
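The difference between a second reference and a real copy can be checked with a tiny example. The frame and variable names below are invented for illustration:

```python
import pandas as pd

up = pd.DataFrame({"dailypctchange": [0.05, -0.02]})

alias = up               # second name for the same object
independent = up.copy()  # separate object with its own data

# Formatting through the alias also changes `up`, because they are one object
alias["dailypctchange"] = [
    "{0:.2f}%".format(v * 100) for v in alias["dailypctchange"]
]

print(alias is up)                    # same object
print(up["dailypctchange"].dtype)     # now holds strings
print(independent["dailypctchange"])  # still floats, safe to format later
```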
