I have the following class, and the print statement shows an empty DataFrame even though I'm sure my get_percent_change method is returning values. I even tried just assigning 3 to the Test column. Still an empty DataFrame.
Does it have something to do with the code being inside a class, or inside the __init__ method? I tried using self.metrics too.
class options_metrics:
    def __init__(self, calls, puts):
        self.calls, self.puts = calls, puts
        self.calls = self.calls.drop(["Type"])
        self.puts = self.puts.drop(["Type"])
        metrics = pd.DataFrame()
        metrics['Perc_Chg_Vol_Call'], metrics['Perc_Chg_Open_Int_Call'] = self.get_percent_change(self.calls)
        metrics['Test'] = 3
        print(metrics)
        input()

    def get_percent_change(self, option_df):
        perc_changes = option_df.pct_change(axis=1)
        print(perc_changes)
        return (perc_changes.ix['Vol',1], perc_changes.ix['Open_Int',1])
Here is the output:
Empty DataFrame
Columns: [Perc_Chg_Vol_Call, Perc_Chg_Open_Int_Call, Test]
Index: []
Switching the DataFrame to a Series worked.
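A likely explanation, sketched below with illustrative names (empty, as_series and one_row are not from the original code): a freshly created DataFrame has no index, so a scalar assigned to a new column has no rows to fill, while a Series accepts a new label directly. Giving the DataFrame an index first is another option.

```python
import pandas as pd

# A brand-new DataFrame has an empty index, so a scalar has no rows to fill:
# the column is created, but the frame still has zero rows.
empty = pd.DataFrame()
empty["Test"] = 3

# Option 1: a Series grows when a value is assigned to a new label.
as_series = pd.Series(dtype="float64")
as_series["Test"] = 3

# Option 2: give the DataFrame an index first; scalars then broadcast over it.
one_row = pd.DataFrame(index=[0])
one_row["Test"] = 3

print(len(empty), len(as_series), len(one_row))
```

Which option fits depends on whether the metrics are meant to be one record (a Series) or one row in a larger table (a DataFrame with an index).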
I'm sure I'm missing something in how classes work here, but basically this is my class:
import pandas as pd
import numpy as np
import scipy

#example DF with OHLC columns and 100 rows
gold = pd.DataFrame({'Open':[i for i in range(100)],'Close':[i for i in range(100)],'High':[i for i in range(100)],'Low':[i for i in range(100)]})

class Backtest:
    def __init__(self, ticker, df):
        self.ticker = ticker
        self.df = df
        self.levels = pivot_points(self.df)

    def pivot_points(self,df,period=30):
        highs = scipy.signal.argrelmax(df.High.values,order=period)
        lows = scipy.signal.argrelmin(df.Low.values,order=period)
        return list(df.High[highs[0]]) + list(df.Low[lows[0]])

inst = Backtest('gold',gold) #gold is a Pandas Dataframe with Open High Low Close columns and data
inst.levels # This give me the whole dataframe (inst.df) instead of the expected output of the pivot_point function (a list of integers)
The problem is that inst.levels returns the whole DataFrame instead of the return value of pivot_points, which is supposed to be a list of integers.
When I call pivot_points on the same DataFrame outside this class, I get the list I expect.
Inside the class you have to call pivot_points() as self.pivot_points().
There is also no need to make period an argument if you never change it; if you do change it, it's fine where it is.
I'm not sure if this helps, but here are some tips about your class:
class Backtest:
    def __init__(self, ticker, df):
        self.ticker = ticker
        self.df = df
        # no need to define an instance variable here, you can access the method directly
        # self.levels = pivot_points(self.df)

    def pivot_points(self):
        period = 30
        # period is a local variable to pivot_points so I can access it directly
        print(f'period inside Backtest.pivot_points: {period}')
        # df is an instance variable and can be accessed in any method of Backtest after it is instantiated
        print(f'self.df inside Backtest.pivot_points(): {self.df}')
        # to get any values out of pivot_points we return some calculations
        return 1 + 1

    # if you do need an attribute like level to access it by inst.level you could create a property
    @property
    def level(self):
        return self.pivot_points()

gold = 'some data'
inst = Backtest('gold', gold) # gold is a Pandas Dataframe with Open High Low Close columns and data
print(f'inst.pivot_points() outside the class: {inst.pivot_points()}')
print(f'inst.level outside the class: {inst.level}')
This would be the result:
period inside Backtest.pivot_points: 30
self.df inside Backtest.pivot_points(): some data
inst.pivot_points() outside the class: 2
period inside Backtest.pivot_points: 30
self.df inside Backtest.pivot_points(): some data
inst.level outside the class: 2
Thanks to the commenter Henry Ecker, I found that I had a function with the same name defined elsewhere in the file, and its output is the df. After changing that, my original code works as expected.
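The name clash described above can be reproduced in a few lines. This is only a sketch: the module-level pivot_points here is a hypothetical stand-in for the stray function, and plain lists replace the DataFrame so the example runs without pandas or scipy.

```python
def pivot_points(df):
    # Hypothetical leftover module-level function: it just hands back its input.
    return df


class Backtest:
    def __init__(self, df):
        self.df = df
        # A bare name is looked up in the module, not on the instance,
        # so this silently calls the function defined above.
        self.unqualified = pivot_points(self.df)
        # With self. the lookup goes to the method defined on the class.
        self.qualified = self.pivot_points(self.df)

    def pivot_points(self, df):
        # Stand-in for the real calculation: return the high and the low.
        return [max(df), min(df)]


inst = Backtest([3, 1, 2])
print(inst.unqualified)
print(inst.qualified)
```

Without the stray module-level function, the unqualified call would raise a NameError instead of quietly returning the wrong value, which is why the problem was hard to spot.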
I want my code to:
read data from a CSV and make a dataframe: "source_df"
see if the dataframe contains any columns specified in a list: "possible_columns"
call a unique function to replace the values in each column whose header is found in the "possible_columns" list, then insert the modified values in a new dataframe: "destination_df"
Here it is:
import pandas as pd

#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)

#creates destination_df
blanklist = []
destination_df = pd.DataFrame(blanklist)

#create the column header lists for comparison in the while loop
columns = source_df.head(0)
possible_columns = ['yes/no','true/false']

#establish the functions list and define the functions to replace column values
fix_functions_list = ['yes_no_fix()','true_false_fix()']

def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")

def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')

'''use the counter to call a unique function from the function list to replace the values in each column whose header is found in the "possible_columns" list, insert the modified values in "destination_df", then advance the counter'''
counter = 0
while counter < len(possible_columns):
    if possible_columns[counter] in columns:
        destination_df.insert(counter, possible_columns[counter], source_df[possible_columns[counter]])
        fix_functions_list[counter]
    counter = counter + 1

#see if it works
print(destination_df.head(10))
When I print(destination_df), I see the unmodified column values from source_df. When I call the functions independently they work, which makes me think something is going wrong in my while loop.
Your issue is that you are trying to call a function whose name is stored in a list as a string.
fix_functions_list[counter]
This does not run the function; it only looks up the string value.
I would find another way to run these functions, for example by storing the function objects themselves.
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")

def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')

fix_functions_list = {0:yes_no_fix,1:true_false_fix}
and change the function call to:
fix_functions_list[counter]()
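One way to put this together is to key the functions by column name and drop the counter. The sketch below makes assumptions that are not in the original: the fix functions take a column and return the fixed column rather than writing to a global, and a small in-memory DataFrame stands in for the CSV. The value mappings are copied from the question as written.

```python
import pandas as pd

# Stand-in for the CSV contents (assumed data, for illustration only).
source_df = pd.DataFrame({
    "yes/no": ["Yes", "No"],
    "true/false": ["True", "False"],
    "other": [1, 2],
})


def yes_no_fix(column):
    # Return a new Series with the replaced values.
    return column.replace({"No": "0", "Yes": "1"})


def true_false_fix(column):
    # Mapping kept exactly as in the question ('False' -> '1', 'True' -> '0').
    return column.replace({"False": "1", "True": "0"})


# Store the function objects themselves (no quotes, no parentheses).
fix_functions = {"yes/no": yes_no_fix, "true/false": true_false_fix}

# Call the matching function only for columns present in the source.
destination_df = pd.DataFrame({
    name: fix(source_df[name])
    for name, fix in fix_functions.items()
    if name in source_df.columns
})
print(destination_df)
```

Keying by column name means a missing column is skipped without the positions of the remaining functions shifting.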
import pandas as pd

#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
possible_columns = ['yes/no','true/false']
mapping_dict = {'yes/no': {"No":"0","Yes":"1"}, 'true/false': {'False':'1','True':'0'}}
# split the columns into those that need fixing and those that do not
old_columns = [column for column in source_df.columns if column not in possible_columns]
existed_columns = [column for column in source_df.columns if column in possible_columns]
# copy so the assignments below do not write into a view of source_df
new_df = source_df[existed_columns].copy()
for column in new_df.columns:
    # map returns a new Series, so assign it back to the column
    new_df[column] = new_df[column].map(mapping_dict[column])
new_df[old_columns] = source_df[old_columns]
I am sending an Ajax GET request to a Flask server (http://localhost:5000/req/?q=139,2,10,60,5,1462,7,5,6,9,17,78) in order to retrieve some values and assign them to a DataFrame. Doing it manually works fine:
df = pd.DataFrame(data=[[139,2,10,60,5,1462,7,5,6,9,17,78]],columns=['col1','col2','col3','col4','col5','col6','col7','col8','col9','col10','col11','col12'])
But I need the numbers to come from request.args via Ajax and then be passed to the DataFrame as an array.
@app.route('/req/', methods=['GET'])
def foo():
    args = dict(request.args.to_dict())
    t = request.args["q"]
    return getResults(t), 200
And the getResults() would be something like:
def getResults(name):
    df = pd.DataFrame(data=[[name]], columns=['col1','col2','col3','col4','col5','col6','col7','col8','col9','col10','col11','col12'])
""""
But of course this doesn't work. It gives an error: ValueError: 12 columns passed, passed data had 1 columns
How can I do this? I've tried splitting the string and converting it to an array, but nothing worked.
The query argument arrives as a string, so after t = request.args["q"], t is "139,2,10,60,5,1462,7,5,6,9,17,78". You need a list of ints:
@app.route('/req/')  # GET is the default method
def foo():
    t = request.args["q"]
    t = [int(val) for val in t.split(",")]
    return getResults(t)  # 200 is the default status code
And
def getResults(name):
    df = pd.DataFrame(data=[name], # no extra []
Also mind the trailing slash in the rule. A rule that ends in a slash (/req/) also serves /req, because Flask redirects to the slashed URL, whereas a rule without one (/req) answers /req/ with a 404. Your request goes to /req/?q=..., so keep /req/; see the Flask documentation on unique URLs and redirection behavior for details.
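The parsing step can be tried without a running server. In this sketch the helper names parse_query_values and build_frame are hypothetical, and the raw string stands in for what request.args["q"] would return; the Flask route itself is left out.

```python
import pandas as pd

# col1 .. col12, matching the column names used in the question.
COLUMNS = [f"col{i}" for i in range(1, 13)]


def parse_query_values(raw):
    # Turn "139,2,10" into [139, 2, 10]; raises ValueError on non-numeric input.
    return [int(val) for val in raw.split(",")]


def build_frame(values):
    # One outer list means one row; the inner list supplies the 12 columns.
    return pd.DataFrame(data=[values], columns=COLUMNS)


raw = "139,2,10,60,5,1462,7,5,6,9,17,78"  # what request.args["q"] would hold
df = build_frame(parse_query_values(raw))
print(df)
```

In a real view you would probably catch the ValueError and return a 400 response, since the query string comes from the client.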
I am writing a function that will serve as a filter for the rows I want to use.
The sample data frame is as follows:
df = pd.DataFrame()
df['Xstart'] = [1,2.5,3,4,5]
df['Xend'] = [6,8,9,10,12]
df['Ystart'] = [0,1,2,3,4]
df['Yend'] = [6,8,9,10,12]
df['GW'] = [1,1,2,3,4]

def filter(data,Game_week):
    pass_data = data [(data['GW'] == Game_week)]
When I call the function filter as follows, I get an error.
df1 = filter(df,1)
The error message is
AttributeError: 'NoneType' object has no attribute 'head'
But when I filter manually, it works.
pass_data = df [(df['GW'] == [1])]
This is my first issue.
My second issue is that I want to filter the rows for multiple GW values (1, 2, 3, etc.).
I can do that manually as follows:
pass_data = df [(df['GW'] == [1])|(df['GW'] == [2])|(df['GW'] == [3])]
If I want to pass the values to the function as a list such as [1,2,3],
how can I write the function so that it accepts a range of 1 to 3?
Could anyone please advise?
Thanks,
Zep
Use isin to pass a list of values instead of a scalar. Also, filter is a built-in function in Python, so it is better to rename your function:
def filter_vals(data,Game_week):
    return data[data['GW'].isin(Game_week)]
df1 = filter_vals(df,range(1,4))
Your function has no return statement, so it returns None rather than the desired DataFrame. Add one (note that the parentheses inside data[...] are not needed either):
def filter(data,Game_week):
    return data[data['GW'] == Game_week]
Also, isin may well be better:
def filter(data,Game_week):
    return data[data['GW'].isin(Game_week)]
For the first part, use return to return the data from the function. For the second, use:
def filter(data,Game_week):
    return data[data['GW'].isin(Game_week)]
Now apply the filter function:
df1 = filter(df,[1,2])
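The answers above can be combined into one function that takes either a single game week or several. This is a sketch: the name filter_game_weeks and the scalar check are additions that no answer proposes, and the sample frame is cut down to two columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Xstart": [1, 2.5, 3, 4, 5],
    "GW": [1, 1, 2, 3, 4],
})


def filter_game_weeks(data, game_weeks):
    # Wrap a single value in a list so isin always receives a list.
    if pd.api.types.is_scalar(game_weeks):
        game_weeks = [game_weeks]
    # The return statement is what the original function was missing.
    return data[data["GW"].isin(list(game_weeks))]


single = filter_game_weeks(df, 1)             # one game week
ranged = filter_game_weeks(df, range(1, 4))   # game weeks 1, 2 and 3
listed = filter_game_weeks(df, [1, 2])        # an explicit list
print(len(single), len(ranged), len(listed))
```

Using a name other than filter keeps Python's built-in filter available in the same module.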
class Dataframe: #Recommended to instantiate your dataframe with your csv name.
    """
    otg_merge = Dataframe("/Users/zachary/Desktop/otg_merge.csv") #instantiate as a pandas dataframe
    """
    def __init__(self, filepath, filename = None):
        pd = __import__('pandas') #import pandas when the class is instantiated
        self.filepath = filepath
        self.filename = filename

    def df(self): #it makes the DataFrame
        df = pd.read_csv(self.filepath, encoding = "cp949", index_col= 0) #index col is not included
        return df

    def shape(self): #it returns the dimensions of the DataFrame
        shape = list(df.shape)
        return shape

    def head(self): #it returns the head of the DataFrame
        primer = pd.DataFrame.head(df)
        del primer["Unnamed: 0"]
        return primer

    def cust_types(self): #it returns the list of cust_type included in .csv
        cust_type = []
        for i in range(0, shape[0]):
            if df.at[i, "cust_type"] not in cust_type: #if it's new..
                cust_type.append(df.at[i, "cust_type"]) #append it as a new list element
        return cust_type
I am wrapping some pandas functions for people who don't necessarily need to know pandas.
If you look at the code, the third def, shape, returns the shape as a list such as [11000, 134], i.e. the x and y dimensions.
Now I'd like to use shape again in the last def, cust_types; however, I get an error saying shape is not defined.
How can I share the variable "shape" across defs in the same class?
Interestingly, without my doing anything, df seems to be shared from the second def (df) to the third (shape) without error.
First, prefix all your attributes with "self."; any Python OOP tutorial covers this. Another issue you might miss is this:
def df(self):
    df = pd.read_csv(self.filepath, encoding = "cp949", index_col= 0)
    return df
Here the method and the local variable share the name df, which is fine as long as the variable is not an instance attribute, as is the case here. But if you prefix it with "self." and make it an instance attribute, the attribute self.df replaces the method on that instance, so self.df() can no longer be called as a function after the first call.
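A minimal sketch of the wrapper with state shared through self follows. Several things here are assumptions rather than the original design: the class is renamed CsvFrame, pandas is imported at module level so every method can see pd, the loaded frame is cached under the separate name _df to avoid the clash described above, and the encoding and index_col arguments are dropped so the example can run on an in-memory CSV.

```python
import io

import pandas as pd  # module-level import, visible to every method


class CsvFrame:
    """Hypothetical rewrite of the wrapper class from the question."""

    def __init__(self, source):
        self.source = source  # a file path or any file-like object
        self._df = None       # cache; named _df so it cannot shadow df()

    def df(self):
        # Load the CSV once and reuse it on later calls.
        if self._df is None:
            self._df = pd.read_csv(self.source)
        return self._df

    def shape(self):
        # Other methods are reached through self, not as bare names.
        return list(self.df().shape)

    def cust_types(self):
        # unique() keeps first-seen order, like the manual loop did.
        return list(self.df()["cust_type"].unique())


sample = io.StringIO("cust_type,amount\nA,1\nB,2\nA,3\n")
wrapper = CsvFrame(sample)
print(wrapper.shape())
print(wrapper.cust_types())
```

Caching the frame also avoids re-reading the file each time shape() or cust_types() is called.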