Python pandas: using a variable that holds a column name in groupby dot notation

I am trying to use a list that holds the column names in my groupby call. My end goal is to loop through multiple columns and run the same calculation without rewriting the line for each column. Is this possible?
a_list = ['', 'BTC_', 'ETH_']
a_variable = '{}ClosePrice'.format(a_list[0])
proccessing_data['RSI'] = proccessing_data.groupby('Symbol').a_variable.transform(lambda x: talib.RSI(x, timeperiod=14))
This is the error I currently get, because pandas looks for a column literally named 'a_variable', which doesn't exist:
AttributeError: 'DataFrameGroupBy' object has no attribute 'a_variable'

Bracket notation works, because the brackets accept any expression that evaluates to the column name:
proccessing_data['RSI'] = proccessing_data.groupby('Symbol')['{}ClosePrice'.format(a_list[0])].transform(lambda x: talib.RSI(x, timeperiod=14))
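
To cover the original goal of looping over several columns, here is a minimal sketch. The sample data and the output column names are made up for illustration, and a 2-period rolling mean stands in for talib.RSI so the snippet runs without TA-Lib installed.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question's naming scheme.
df = pd.DataFrame({
    "Symbol": ["A", "A", "A", "B", "B", "B"],
    "ClosePrice": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "BTC_ClosePrice": [2.0, 4.0, 6.0, 1.0, 1.0, 1.0],
    "ETH_ClosePrice": [5.0, 5.0, 5.0, 3.0, 6.0, 9.0],
})

prefixes = ["", "BTC_", "ETH_"]

for prefix in prefixes:
    column = f"{prefix}ClosePrice"
    # Bracket indexing accepts a string held in a variable; dot access does not.
    # The rolling mean is a stand-in for talib.RSI(x, timeperiod=14).
    df[f"{prefix}MA"] = df.groupby("Symbol")[column].transform(
        lambda s: s.rolling(2).mean()
    )
```

Each pass writes one new column per prefix, computed separately within each Symbol group.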

Related

'str' object has no attribute 'loc' when using dataframe string name

I have DataFrames named df_1, df_2, df_3, ..., df_10. I am looping from 1 to 10, and each iteration should refer to a different DataFrame, whose name is "df_" plus the loop value:
name = ['1','2','3','4','5','6','7','8','9','10']
for na in name:
    data = f'df_{na}'.iloc[:,0]
When I do this, I get AttributeError: 'str' object has no attribute 'loc'.
So I need to convert the string into the DataFrame with that name.
How can I do it?

Based on our chat, you're trying to make 100 copies of a single dataframe. Since creating variables with dynamically generated names is bad practice, use a dict instead:
names = ["df_" + str(i) for i in range(1, 101)]
dataframes = {name: df.copy() for name in names}  # df is the existing dataframe
for key, dataframe in dataframes.items():
    temp_data = dataframe.iloc[:, 0]
    do_something(temp_data)
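
As a small runnable illustration of the dict approach (the sample frame and the names first_columns and df are assumptions for the example), the string keys give you lookup by name, and each copy can be changed independently of the original:

```python
import pandas as pd

# Hypothetical source frame standing in for the existing dataframe.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Ten independent copies, keyed by the names the question wanted as variables.
dataframes = {f"df_{i}": df.copy() for i in range(1, 11)}

# Changing one copy leaves the original and the other copies untouched.
dataframes["df_1"].loc[0, "a"] = 99

# Look up each frame by its string name and take its first column.
first_columns = {name: frame.iloc[:, 0] for name, frame in dataframes.items()}
```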

Python Pandas - 'DataFrame' object has no attribute 'str' - .str.replace error

I am trying to replace "," with "" in 80 columns of a pandas DataFrame.
I have created a list of the headers to iterate over:
headers = ['h1', 'h2', 'h3'... 'h80']
and I am using that list to replace the string values in multiple columns, as below:
dataFrame[headers] = dataFrame[headers].str.replace(',','')
This gave me the error: AttributeError: 'DataFrame' object has no attribute 'str'
When I try the same on a single header it works, and I need "str.replace" because the plain "replace" method does not replace the ",".
Thank you

Using df.apply
pd.Series.str.replace is a Series method and is not available on DataFrames. You can use apply to run it on each column Series instead.
dataFrame[headers] = dataFrame[headers].apply(lambda x: x.str.replace(',',''))
Using df.applymap
Or you can use applymap, which calls the function on each cell, so the plain str.replace method can be used directly (this assumes every cell is a string). Note that applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
dataFrame[headers] = dataFrame[headers].applymap(lambda x: x.replace(',',''))
Using df.replace
You can also use df.replace, which replaces values across all selected columns directly. By default it only matches whole cell values, so to replace a substring you have to set regex=True:
dataFrame[headers] = dataFrame[headers].replace(',','',regex=True)
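
A short sketch comparing two of these approaches on made-up data. The frame, the column named other, and the final conversion to int are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical data: two numeric-looking columns with thousands separators,
# plus one column that should be left untouched.
data = pd.DataFrame({
    "h1": ["1,000", "2,500"],
    "h2": ["10", "3,333,333"],
    "other": ["a,b", "c"],
})
headers = ["h1", "h2"]

# Column-wise: each column is a Series, so the .str accessor is available.
via_apply = data[headers].apply(lambda col: col.str.replace(",", "", regex=False))

# Frame-wise: regex=True makes replace match substrings, not whole cells.
via_replace = data[headers].replace(",", "", regex=True)

# With the separators gone, the selected columns can be converted to integers.
data[headers] = via_apply.astype(int)
```

Both intermediate results hold the same strings; only the columns listed in headers are modified.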

Unpivot dataframe in Python - 'builtin_function_or_method' object has no attribute 'insert'

I unpivoted a dataframe like this:
full_unpivot = full.unstack().reset_index(name='Value')
full_unpivot.rename(columns={'level_0': 'Attribute', 'level_1': 'Scenario'}, inplace=True)
Now I want to drop the decimals in the values and add a column filled with 1 or -1 depending on the sign of the 'Value' column.
However, when I try:
full_unpivot = full_unpivot.applymap(np.int64)
or
list='Value'
full_unpivot[list] = full_unpivot[list].astype(int)
or
full_unpivot = full_unpivot.insert(4,'sign',1)
I get an error:
'builtin_function_or_method' object has no attribute 'insert'
Does anyone know what the problem could be?
Thanks in advance!

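I believe you need numpy.sign:
full_unpivot['sign'] = np.sign(full_unpivot['Value'])
A problem in your code is the variable named list, which shadows the Python built-in of the same name.
To restore it, reassign the name from the builtins module (after import builtins):
list = builtins.list
Also, if you want to use insert to add a second column called sign, filled with the values of np.sign, use the line below. Note that insert modifies the DataFrame in place and returns None, so do not assign its result back to full_unpivot:
full_unpivot.insert(1,'sign',np.sign(full_unpivot['Value']))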
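
Putting the pieces together, here is a minimal end-to-end sketch on made-up data. The frame full and its labels are assumptions; the sign is computed before the decimals are dropped so that values such as -0.5 keep their sign:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: scenarios as rows, attributes as columns.
full = pd.DataFrame(
    {"attr1": [1.7, -2.2], "attr2": [-0.5, 3.9]},
    index=["s1", "s2"],
)

# Unpivot; note the parentheses on unstack().
full_unpivot = full.unstack().reset_index(name="Value")
full_unpivot = full_unpivot.rename(
    columns={"level_0": "Attribute", "level_1": "Scenario"}
)

# Compute the sign first, then truncate the decimals toward zero.
sign = np.sign(full_unpivot["Value"]).astype(int)
full_unpivot["Value"] = full_unpivot["Value"].astype(int)

# insert works in place and returns None, so its result is not assigned back.
result = full_unpivot.insert(1, "sign", sign)
```

Here result is None, which is why reassigning full_unpivot from insert loses the DataFrame.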

loop over names of several pandas DataFrames

I have a couple of DataFrames from different files, which are named for example df001, df002 and so on.
Now I want to loop over those DataFrames to execute similar tasks. But I can't figure out how to address them.
This failed (AttributeError: 'str' object has no attribute 'iloc'):
names = ['df001', 'df002']
for name in names:
    name.iloc[1,1]

Put the DataFrame objects themselves in the list, not their names as strings:
names = [df001, df002]
for name in names:
    name.iloc[1,1]

If you use the string name for purposes other than looping, you can always store the dataframes in a dictionary:
d = {'df001': df001, 'df002': df002}
for name in d:
    d[name].iloc[1, 1]

Getting "'int' object is not subscriptable" error while apply a method to a pandas data series

I have a stocks_df data frame that looks like that one in the picture. When I apply the lambda as in the picture it doesn't throw any errors.
However when I do
list = pandas.Series([1,2,3,4,5])
new_list = list.apply(lambda x: x/x[0])
It gives me "'int' object is not subscriptable" error. Is there any difference between the two? What am I doing wrong here?

For a Series, apply operates element-wise, so x is a single integer and x[0] fails. To reference the first element of the Series, you need to use list[0] instead of x[0]:
new_list = list.apply(lambda x: x/list[0])
For a DataFrame, apply by default operates column-wise, so x is a whole column and x/x[0] works.
To use the same syntax, you could use:
new_list = list.to_frame().apply(lambda x: x/x[0])
By the way, using a built-in type name (list) as a variable name is not a good idea.
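
A small sketch of the same idea with a name that does not shadow a built-in. The sample values and variable names are assumptions; plain division is shown as well, since arithmetic on a Series is already element-wise and needs no apply:

```python
import pandas as pd

# Hypothetical values; "values" avoids shadowing the built-in list.
values = pd.Series([2, 4, 6, 8])
first = values.iloc[0]  # positional access to the first element

# Vectorized: divides every element by the first one.
normalized = values / first

# Series.apply: the lambda receives one scalar at a time.
via_apply = values.apply(lambda x: x / first)

# DataFrame.apply: the lambda receives a whole column (a Series).
via_frame = values.to_frame(name="v").apply(lambda col: col / col.iloc[0])
```

All three produce the same numbers; the difference is only what the lambda receives.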
