Declaring a pandas series with a user-defined function - python

I am trying to simplify some code with a function. The intent is to use the function to declare blank series to populate later.
The code currently declares each series on a separate line like this:
series1=pd.Series()
series2=pd.Series()
This approach works well but makes the code lengthy with many series.
I would like to do the following:
Create a list of blank objects to use in the function with the names series1, series2, etc. or with a more descriptive name for each
series_list=[series1,series2]
Declare function
def series(name):
    name=pd.Series()
    return name
Call function with input
for i in series_list:
    series(i)
However, when I try to define series_list, it raises NameError: [variable] is not defined. Is there a way to populate series_list with empty objects (i.e. no data, but with the names series1, series2, ... series1000)?

Here's how you instantiate the Series objects iteratively, then unpack the generated list into known variable names:
def assign_series(n):
    series_list = []
    #series_dict = {}
    num_of_series = n
    for i in range(num_of_series):
        # explicit dtype keeps the result the same across pandas versions
        series_list.append(pd.Series(dtype='float64'))
        #or if you want to call them by name
        #series_dict['series'+str(i)] = pd.Series(dtype='float64')
    return series_list
corporate_securities, agency_securities, unrealized_gainloss = assign_series(3)
>>> corporate_securities
Series([], dtype: float64)
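If the series need to be looked up by name rather than unpacked positionally, the commented-out dictionary variant can be written as a comprehension. This is only a sketch: the names series1 to series3 and the float dtype are illustrative assumptions, not something the question specifies.

```python
import pandas as pd

# Hypothetical names; any iterable of strings works here.
names = [f"series{i}" for i in range(1, 4)]

# One independent empty Series per name. The dtype is passed explicitly
# so the result does not depend on the pandas default for empty Series.
series_by_name = {name: pd.Series(dtype="float64") for name in names}

# Populate one entry later; the other entries stay empty.
series_by_name["series1"] = pd.Series([1.5, 2.5])
```

A dictionary scales to a thousand names without creating a thousand module-level variables, and a misspelled name fails with a KeyError at the lookup.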

Related

Dynamically adding functions to array columns

I'm trying to dynamically add function calls to fill in array columns. I will be accessing the array millions of times so it needs to be quick.
I'm thinking to add the call of a function into a dictionary by using a string variable
numpy_array[row,column] = dict[key[index containing function call]]
The full code I'm working with is too large to post here, so this is an equivalent simplified example of what I've tried.
def hello(input):
    return input
dict1 = {}
#another function returns the name and ID values
name = 'hello'
ID = 0
dict1["hi"] = globals()[name](ID)
print(dict1)
But this calls the function immediately when using
globals()[name](ID)
instead of storing hello(0) in the dictionary as something to be called later.
I'm a bit out of my depth here.
What is the proper way to implement this?
Is there a more efficient way to do this than reading into a dictionary on every call of
numpy_array[row,column] = dict[key[index containing function call]]
as I will be accessing and updating it millions of times.
I don't know if the dictionary is called every time the array is written to or if the location of the column is already saved into cache.
Would appreciate the help.
edit
Ultimately what I'm trying to do is initialize some arrays, dictionaries, and values with a function
def initialize(*args):
    # create arrays and dictionaries
    # assign values to global and local variables, arrays, dictionaries
    ...
Each time the initialize() function is used it creates a new set of variables (names, values, etc.) that point to a different function with a different set of variables.
I have a numpy array in which I want to store information from the function and the associated values created by the initialize() function.
So in other words, in the above example hello(0): the name of the function, its value, and some other things as set up within initialize()
What I'm trying to do is add the function with these settings to the numpy array as a new column before I run the main program.
So as another example. If I was setting up hello() (and hello() was a complex function) and when I used initialize() it might give me a value of 1 for hello(1).
Then if I use initialize again it might give me a value of 2 for hello(2).
If I used it one more time it might give the value 0 for the function goodbye(0).
So in this scenario let's say I have an array
array[row,0] = stuff()
array[row,1] = things()
array[row,2] = more_stuff()
array[row,3] = more_things()
Now I want it to look like
array[row,0] = stuff()
array[row,1] = things()
array[row,2] = more_stuff()
array[row,3] = more_things()
array[row,4] = hello(1)
array[row,5] = hello(2)
array[row,6] = goodbye(0)
As a third example:
def function1():
    do something
def function2():
    do something
def function3():
    do something
numpy_array(size)
initialize():
    do some stuff
    then add function1(23) to the next column in numpy_array
initialize():
    do some stuff
    then add function2(5) to the next column in numpy_array
initialize():
    do some stuff
    then add function3(50) to the next column in numpy_array
So as you can see, I need to permanently append new columns to the array and feed the new columns with the function/value as directed by the initialize() function, without manual intervention.
So fundamentally I need to figure out how to assign a function call to an array column based upon a string value, without executing the call on assignment.
edit #2
I guess my explanations weren't clear enough.
Here is another way to look at it.
I'm trying to dynamically assign functions to an additional column in a numpy array based upon the output of a function.
The functions added to the array column will be used to fill the array millions of times with data.
The functions added to the array can be various different functions with different input values, and the number of functions added can vary.
I've tried assigning the functions to a dictionary using exec(), eval(), and globals(), but during assignment these call the functions immediately instead of storing them.
numpy_array = np.array((1,5))
def some_function():
    do some stuff
    return ('other_function(15)')
#somehow add 'other_function(15)' to the array column.
numpy_array[1,6] = other_function(15)
The functions returned by some_function() may or may not exist each time the program is run so the functions added to the array are also dynamic.
I'm not sure this is what the OP is after, but here is a way to make an indirection of functions by name:
def make_fun_dict():
    magic = 17
    def foo(x):
        return x + magic
    def bar(x):
        return 2 * x + 1
    def hello(x):
        return x**2
    return {k: f for k, f in locals().items() if hasattr(f, '__name__')}
mydict = make_fun_dict()
>>> mydict
{'foo': <function __main__.make_fun_dict.<locals>.foo(x)>,
'bar': <function __main__.make_fun_dict.<locals>.bar(x)>,
'hello': <function __main__.make_fun_dict.<locals>.hello(x)>}
>>> mydict['foo'](0)
17
Example usage:
x = np.arange(5, dtype=int)
names = ['foo', 'bar', 'hello', 'foo', 'hello']
>>> np.array([mydict[name](v) for name, v in zip(names, x)])
array([17, 3, 4, 20, 16])
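To store a function together with its argument without calling it at assignment time, one option is functools.partial from the standard library. The sketch below is only an illustration: the bodies of hello and goodbye, the registry dictionary, and the column_fillers list are all made up, and a plain list stands in for one row of the array.

```python
from functools import partial

def hello(x):
    # Placeholder body; the real function is assumed to be more complex.
    return x * 10

def goodbye(x):
    return x - 1

# Name-to-function table. Looking a function up does not call it.
registry = {"hello": hello, "goodbye": goodbye}

# Each entry binds a function to its argument but defers the call.
column_fillers = [
    partial(registry["hello"], 1),
    partial(registry["hello"], 2),
    partial(registry["goodbye"], 0),
]

# The calls happen only here, once per column.
row = [fill() for fill in column_fillers]
print(row)  # [10, 20, -1]
```

Because the name lookup happens once while column_fillers is built, the hot loop only calls already-resolved objects and does no string or dictionary lookups.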

Specify helper function that's used by another helper function inside a class

Update to question:
I want to include a helper function in my class that uses another helper function that's only used within one of the methods of the class. Using @staticmethod and self.func_name is what I'd do if I had one static method. However, if I want to call another static method from a static method and refer to it as self.helper_func, I get a "name 'self' is not defined" error.
To give you some context, the reason I'm doing this is because in my actual use case, I'm working with a list of grouped dataframes. Then within that outer apply statement, I then iterate through sets of specific columns in each grouped dataframe and apply the actual function. So the outer helper function is just an apply over the groups in the grouped dataframes, and it then calls the inner helper that performs manipulations on groups of columns.
import pandas as pd
import numpy as np
class DataManipulation():
    def __init__(self, data):
        self.data = data
    @staticmethod
    def helper_func(const):
        return const
    @staticmethod
    def add_constant(var):
        res = var+self.helper_func(5)
        return res
    def manipulate_data(self):
        res = self.data.apply(add_constant)
        return res
test_df = pd.DataFrame({'a': np.arange(4), 'b': np.arange(4)})
data_manip = DataManipulation(test_df)
data_manip.manipulate_data()
How can a @staticmethod access self?
A static method can be called without creating an object or instance.
So what would self be when the static method is called before any object has been created?
PS. Well that's my opinion, I may be wrong (I am new to python, that's how it was in C / C++ / Java).
Maybe you need to call DataManipulation.helper_func(5) instead of self.helper_func(5).
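The suggestion to go through the class name can be sketched as follows. It reuses the class from the question with two changes that are assumptions about the intended behaviour: add_constant calls DataManipulation.helper_func, and manipulate_data passes self.add_constant to apply.

```python
import numpy as np
import pandas as pd

class DataManipulation:
    def __init__(self, data):
        self.data = data

    @staticmethod
    def helper_func(const):
        return const

    @staticmethod
    def add_constant(var):
        # A static method receives no self, so the other static method
        # is reached through the class name instead.
        return var + DataManipulation.helper_func(5)

    def manipulate_data(self):
        # In a regular method self exists, so self.add_constant resolves.
        return self.data.apply(self.add_constant)

test_df = pd.DataFrame({"a": np.arange(4), "b": np.arange(4)})
result = DataManipulation(test_df).manipulate_data()
```

If hard-coding the class name is undesirable, for example with subclasses, a @classmethod that receives cls and calls cls.helper_func(5) is a common alternative.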

Call many python functions from a module by looping through a list of function names and making them variables

I have three similar functions in tld_list.py. I am working out of mainBase.py file.
I am trying to build a string variable that calls the appropriate function while looping through a list of all the function names. My code reads from a list of function names, iterates through the list, and runs the function on each iteration. Each function returns 10 pieces of information from separate websites.
I have tried 2 variations annotated as Option A and Option B below
# This is mainBase.py
import tld_list  # I use this in conjunction with Option A
from tld_list import *  # I use this with Option B
functionList = ["functionA", "functionB", "functionC"]
tldIterator = 0
while tldIterator < len(functionList):
    # This will determine which function is called first
    # In the first case, the function is functionA
    currentFunction = str(functionList[tldIterator])
Option A
    currentFunction = "tld_list." + currentFunction
    websiteName = currentFunction(x, y)
    print(websiteName[1])
    print(websiteName[2])
    ...
    print(websiteName[10])
Option B
    websiteName = currentFunction(x, y)
    print(websiteName[1])
    print(websiteName[2])
    ...
    print(websiteName[10])
Although it is not shown, each pass through the loop ends with tldIterator += 1.
Both options fail for the same reason: TypeError: 'str' object is not callable
I am wondering what I am doing wrong, or whether it is even possible to call a function in a loop through a variable.
You have the function names but what you really want are the function objects bound to those names in tld_list. Since function names are attributes of the module, getattr does the job. Also, it seems like list iteration rather than keeping track of your own tldIterator index would suffice.
import tld_list
function_names = ["functionA", "functionB", "functionC"]
functions = [getattr(tld_list, name) for name in function_names]
for fctn in functions:
    website_name = fctn(x,y)
You can create a dictionary to provide a name to function conversion:
def funcA(...): pass
def funcB(...): pass
def funcC(...): pass
func_find = {"Huey": funcA, "Dewey": funcB, "Louie": funcC}
Then you can call them, e.g.
result = func_find["Huey"](...)
You should avoid this type of code. Prefer if statements or direct function references instead. But if you must, you can try eval (exec would run the call but always return None):
websiteName = eval('{}(x, y)'.format(currentFunction))
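The getattr approach can be tried without the tld_list module. In the sketch below the standard-library math module stands in for tld_list (an assumption made so the example runs anywhere), and the function names and the argument 2.5 are arbitrary.

```python
import math  # stand-in for tld_list, which is not available here

function_names = ["floor", "ceil", "trunc"]

results = {}
for name in function_names:
    # Look the attribute up by name; None if the module lacks it.
    func = getattr(math, name, None)
    if not callable(func):
        continue  # skip names that do not resolve to a function
    results[name] = func(2.5)

print(results)  # {'floor': 2, 'ceil': 3, 'trunc': 2}
```

Unlike eval, getattr only resolves an attribute on the given module, so a malformed name cannot execute arbitrary code.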

Create variable from string in python

I have read a lot of posts about this subject but I haven't found an answer to my problem.
I want to write a function that lets me create DataFrames with different names and columns.
So I try this:
def createDT(name,c1,c2,c3):
    name = pd.DataFrame(columns = [c1,c2,c3])
    print(type(name))
    return name
createDT(DT,"col1","col2","col3")
and I receive:
NameError: name 'DT' is not defined
When I pass the "name" argument as a string instead, I receive the message:
<class 'pandas.core.frame.DataFrame'>
and the table below
which confirms that the DataFrame was created, but if I then try to use the DT variable I get a
NameError: name 'DT' is not defined
I know I can do it this way
DT2 = createDT(DT,"col1","col2","col3")
But then I have to name the variables again and I would like to avoid that and
I want it to be written as a function. Any ideas on how to solve it?
It's not that easy unfortunately:
def createDT(name,c1,c2,c3):
    globals()[name] = pd.DataFrame(columns = [c1,c2,c3])
    print(type(globals()[name]))
    return globals()[name]
createDT("DT","col1","col2","col3")
But a preferred and efficient solution would be:
def createDT(c1,c2,c3):
    return pd.DataFrame(columns = [c1,c2,c3])
DT = createDT("col1","col2","col3")
Wouldn't a simple
def createDT(c1,c2,c3):
    temp = pd.DataFrame(columns = [c1,c2,c3])
    print(type(temp))
    return temp
DT = createDT("col1","col2","col3")
work?
In Python you (almost always) don't use function parameters as return values. And you don't need to worry about copying, since in Python every name is (more or less) a reference to an object.
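When the table names really are only known as strings, a dictionary keyed by name avoids writing into globals(). This is a sketch under that assumption: create_dt and the tables dictionary are hypothetical names, not part of the question.

```python
import pandas as pd

def create_dt(*columns):
    # Build an empty DataFrame with the requested column labels.
    return pd.DataFrame(columns=list(columns))

# Tables are stored under string keys instead of as global variables.
tables = {}
tables["DT"] = create_dt("col1", "col2", "col3")
tables["DT2"] = create_dt("x", "y")

print(list(tables["DT"].columns))  # ['col1', 'col2', 'col3']
```

The name then lives in exactly one place, the dictionary key, so nothing has to be named twice.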

How do I call a function within my class?

I'm trying to practice classes in Python by making a class that normalizes currencies to GBP using an exchange rate table. I'm not sure why I'm getting the error below. CurrencyCombo is a column name in the exchange rate table, which I pass into __init__ as 'CurrencyPairCol'
rateList = ['EURGBP','USDGBP', 'SEKGBP']
Month = ['2018-11', '2018-12', '2019-01', '2019-02', '2019-03']
class CurrencyNormalize():
    def __init__(self,filename,rateList,monthList,orders_filename,CurrencyPair):
        self.ExchangeRate = pd.read_csv(filename)
        self.OrdersTrain = pd.read_csv(orders_filename)
        self.rateList=rateList
        self.monthList=monthList
        self.currencyPairCol=self.ExchangeRate[CurrencyPair]
    def remove_char(self):
        return (self.replace('GBP', ''))
    def normalize(self):
        ExchangeRateFilt= self.ExchangeRate[self.ExchangeRate.CurrencyCombo.isin(self.rateList)]
        monthOnly= ExchangeRateFilt[ExchangeRateFilt.TradeMonth.isin(self.monthList)]
        print(monthOnly['CurrencyCombo'])
        monthOnly['CurrencyCombo'] = monthOnly['CurrencyCombo'].apply(self.remove_char())
I want to apply the function remove_char inside the normalize method, but I'm not sure whether I'm doing it correctly. When I run the above as follows:
CurrencyNormalize('MonthlyExchangeRates.csv',rateList,Month,'Orderss.csv','CurrencyCombo').normalize()
I get the following error:
AttributeError: 'CurrencyNormalize' object has no attribute 'replace'
I think the error has something to do with how I apply the remove_char function. Before I tried the OOP way, the function was:
def remove_char(col):
    return (col.replace('GBP', ''))
and I would call it as:
ExchangeRate['CurrencyCombo'].apply(remove_char)
where ExchangeRate is the df. How do I generalise the function remove_char within the class?
self refers to the instance of your class. When you call self.replace() you are trying to run a method replace on that instance, which doesn't exist. What you want to do is something like the following (note .str.replace for substring replacement; a plain Series.replace only matches whole values):
self.ExchangeRate[CurrencyPair].str.replace('GBP', '')
EDIT: Since you correctly defined the attribute currencyPairCol you can simply call:
self.currencyPairCol.str.replace('GBP', '')
Note that .str.replace returns a new Series. Assigning the result back to currencyPairCol changes only that attribute, not the originally imported dataframe ExchangeRate (nor the column CurrencyPair in it)
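To keep remove_char as a per-value helper inside the class, one option is to make it a static method and pass it to apply without parentheses. The sketch below rests on assumptions: the constructor takes a DataFrame directly instead of CSV filenames, the month filter is left out, and the rates are made-up illustrative numbers.

```python
import pandas as pd

class CurrencyNormalize:
    def __init__(self, exchange_rate, rate_list):
        # Assumption: a DataFrame is passed in rather than a filename.
        self.exchange_rate = exchange_rate
        self.rate_list = rate_list

    @staticmethod
    def remove_char(value):
        # value is a single string from the column, not the instance.
        return value.replace("GBP", "")

    def normalize(self):
        mask = self.exchange_rate["CurrencyCombo"].isin(self.rate_list)
        # Copy so the assignment below does not touch the original frame.
        filtered = self.exchange_rate[mask].copy()
        # Pass the function itself (no parentheses); apply calls it per value.
        filtered["CurrencyCombo"] = filtered["CurrencyCombo"].apply(self.remove_char)
        return filtered

# Made-up sample data for illustration only.
rates = pd.DataFrame({
    "CurrencyCombo": ["EURGBP", "USDGBP", "JPYGBP"],
    "Rate": [0.88, 0.79, 0.007],
})
normalized = CurrencyNormalize(rates, ["EURGBP", "USDGBP"]).normalize()
print(normalized["CurrencyCombo"].tolist())  # ['EUR', 'USD']
```

Writing self.remove_char() with parentheses calls the method immediately and hands its return value to apply, which is what triggered the original AttributeError.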
