Naming dataframes from a loop - Python

I have a properly working function and would like to add the naming of the dataframes while looping.
There is a function:
def function1(link):
    ...

v0 = (x, y, z)
v1 = (aa, bb, cc)
for link, name in zip(v0, v1):
    df = function1(link)
The issue seems to be that I cannot pass the loop variable through as the dataframe name.
The result I want to achieve:
df.aa from function1(x)
df.bb from function1(y)
df.cc from function1(z)

If I understand correctly, you want to use a dictionary to store the named results of the function calls:
def foo(x):
    return some_dataframe

v0 = (x, y, z)
v1 = ('aa', 'bb', 'cc')

data = dict()
for v, name in zip(v0, v1):
    data[name] = foo(v)
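The stored frames can then be looked up by name, which gives the df.aa-style access the question asks for, just spelled data['aa'] instead of an attribute:
result_aa = data['aa']  # the dataframe returned by foo(x)
result_bb = data['bb']  # the dataframe returned by foo(y)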


How to return arrays from a function in python

I have a function that reads some data files and builds pandas dataframes. I have 4 paths I want to read and turn into dataframes.
def read_files(paths: np.array, scalings: np.array):
    names = ['E', 'I']
    for p, s in zip(paths, scalings):
        df = pd.read_csv(p, engine='python', sep='\s+', names=names)
        energy_ = df['E']
        intensity_ = df['I']
    return energy_, intensity_
I want to make 2 arrays that hold all of the dataframes, to use in other functions, where energy_0 is the df['E'] from the first path in the paths array and so on:
energy = [energy_0, energy_1, energy_2, energy_3]
intensity = [intensity_0, intensity_1, intensity_2, intensity_3]
to use in
fig = plotfunction(energy, intensity, etc)
How can I call each specific dataframe so I can build an array of them? Edit: how do I get the energy and intensity dataframes for path3 if I use paths = [path0, path1, path2, path3]?
How to return arrays from a function in python
Here's an example of how to return 2 arrays from a function:
def function():
    array = [1, 2, 3]
    array2 = [4, 5, 6]
    return array, array2

a, a2 = function()
print(a)
print(a2)
How can I call each specific dataframe

What?
I can only guess you want:
def read_files(paths: np.array, scalings: np.array):
    names = ['E', 'I']
    energy_ = []     # create an array here
    intensity_ = []  # create another array here
    for p, s in zip(paths, scalings):
        df = pd.read_csv(p, engine='python', sep=r'\s+', names=names)
        energy_.append(df['E'])      # append to that array
        intensity_.append(df['I'])   # append to that other array
    return energy_, intensity_      # return both arrays
Note that the actual contents of scalings are not shown in the code you posted.
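A usage sketch for the edit in the question (paths, scalings and plotfunction are the names from the question, not defined here):
energy, intensity = read_files(paths, scalings)
e3 = energy[3]      # the df['E'] read from path3, the fourth entry of paths
i3 = intensity[3]   # the df['I'] read from path3
fig = plotfunction(energy, intensity)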

Python, loops with changeable parts of filenames

I have a bunch of very similar commands which all look like this (df means pandas dataframe):
df1_part1=...
df1_part2=...
...
df1_part5=...
df2_part1=...
I would like to make a loop for it, as follows:
for i in range(1,5):
    for j in range(1,5):
        df%i_part%j = ...
Of course, it doesn't work with %. But there has to be some easy way to do it, I suppose.
Could you help me, please?
You can try one of the following options:
Create a dictionary which maps names to your dataframes and access each dataframe by its name:
mapping = {"df1_part1": df1_part1, "df1_part2": df1_part2}
for i in range(1,5):
    for j in range(1,5):
        mapping[f"df{i}_part{j}"] = ...
Use globals to access your variables dynamically:
df1_part1 = ...
df1_part2 = ...
...
df1_part5 = ...
df2_part1 = ...

for i in range(1,5):
    for j in range(1,5):
        globals()[f"df{i}_part{j}"] = ...
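A variable created this way can also be read back by name through globals(), for example:
part = globals()["df1_part2"]  # the same object as the variable df1_part2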
One way would be to collect your pandas dataframes in a list of lists and iterate over that list, instead of trying to generate variable names dynamically in your Python code.
df1_part1 = ...
df1_part2 = ...
...
df1_part5 = ...
df2_part1 = ...

dflist = [[df1_part1, df1_part2, df1_part3, df1_part4, df1_part5],
          [df2_part1, df2_part2, df2_part3, df2_part4, df2_part5]]

for df in dflist:
    for df_part in df:
        # do something with df_part
Assuming that this process is part of data preparation, I would like to mention that you should try to work with "data preparation pipelines" whenever possible. Otherwise, the code will be a huge mess to read after a couple of months.
There are several ways to deal with this problem.
A dictionary is the most straightforward way to deal with this.
df_parts = {
    'df1': {'part1': df1_part1, 'part2': df1_part2, ..., 'partN': df1_partN},
    'df2': {'part1': df2_part1, 'part2': df2_part2, ..., 'partN': df2_partN},
    '...': {'part1': ..._part1, 'part2': ..._part2, ..., 'partN': ..._partN},
    'dfN': {'part1': dfN_part1, 'part2': dfN_part2, ..., 'partN': dfN_partN},
}

# print parts from `dfN`
for val in df_parts['dfN'].values():
    print(val)

# print part1 for all dfs
for df in df_parts.values():
    print(df['part1'])

# print everything
for df in df_parts:
    for val in df_parts[df].values():
        print(val)
The good thing about this approach is that you can iterate through the whole dictionary without needing range, which may be confusing later. Also, it is better to assign every df_part directly to a dict instead of creating N*N variables that may be used only once or twice. In that case you can just use one variable and re-assign it as you progress:
# code using df1_partN
df1 = df_parts['df1']['partN']
# stuff to do
# happy? checkpoint
df_parts['df1']['partN'] = df1
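If the parts are produced by some function rather than assigned one by one, the nested dictionary can also be built in a loop instead of being typed out by hand. A minimal sketch, assuming a hypothetical make_part(i, j) that returns the data for df{i}_part{j}:
# make_part(i, j) is a placeholder for whatever produces df{i}_part{j}
df_parts = {
    f"df{i}": {f"part{j}": make_part(i, j) for j in range(1, 6)}
    for i in range(1, 3)
}
Note that range(1, 6) is needed to reach part5; range(1, 5) stops at part4.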

creating lambda expressions and apply on dataframe column in a loop in python

I have a jsonDict dictionary, read from a JSON file, which tells me which method should be executed for which column, and the createMethodName method in Utilities creates the code to be executed:
for col in columnList:
    if col in df_output.columns:
        methodNameWithParam = Utilities.createMethodName(
            jsonDict["MethodMapper"][col], dict1[col])
        df_output[col] = df_input.apply(lambda x: methodNameWithParam, axis=1)
Sample:
Suppose col = Coffee.
The JSON method mapper will give me the method name Coffee_maker, and dict1[col] will give me the list of inputs to the method -- suppose the inputs are [coffee, sugar, milk].
So createMethodName will give me the output Maker.Coffee_maker(x['coffee'], x['sugar'], x['milk']),
which I then put into the lambda -- Maker is the class where all the methods are defined:
methodNameWithParam = Maker.Coffee_maker(x['coffee'],x['sugar'],x['milk'])
So for each iteration it becomes:
df_output[col] = df_input.apply(
lambda x: Maker.Coffee_maker(x['coffee'],x['sugar'],x['milk']), axis=1)
Another example:
col = tea
jsonDict["MethodMapper"][col] = tea_maker
dict1[col] = tea,milk,sugar
methodNameWithParam = Maker.tea_maker(x['tea'],x['milk'],x['sugar'])
But the issue I am getting is that apply treats it as a string, and the whole column is filled with the literal text Maker.tea_maker(x['tea'],x['milk'],x['sugar']); it is not executing the function.
Below is the createMethodName function:
class Utilities:
    def createMethodName(methodname, inputlist):
        args = []
        for item in inputlist:
            args.append("x['" + item + "']")
        argsString = ','.join(args)
        return 'Maker.' + methodname + '(' + argsString + ')'
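One hedged sketch of a way around the string problem: look the method up with getattr and call it directly, instead of building source code as text. This reuses the names columnList, jsonDict, dict1, Maker, df_input and df_output from the question:
for col in columnList:
    if col in df_output.columns:
        method = getattr(Maker, jsonDict["MethodMapper"][col])  # e.g. Maker.Coffee_maker
        inputs = dict1[col]                                      # e.g. ['coffee', 'sugar', 'milk']
        df_output[col] = df_input.apply(
            lambda x, m=method, cols=inputs: m(*(x[c] for c in cols)), axis=1)
Binding m=method and cols=inputs as default arguments keeps each lambda tied to the current column's method and inputs.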

Issue in writing a function to filter rows of a data frame

I am writing a function that will serve as a filter for the rows that I want to use.
The sample data frame is as follows:
df = pd.DataFrame()
df['Xstart'] = [1, 2.5, 3, 4, 5]
df['Xend'] = [6, 8, 9, 10, 12]
df['Ystart'] = [0, 1, 2, 3, 4]
df['Yend'] = [6, 8, 9, 10, 12]
df['GW'] = [1, 1, 2, 3, 4]
def filter(data, Game_week):
    pass_data = data[(data['GW'] == Game_week)]
When I call the function filter as follows, I get an error.
df1 = filter(df, 1)
The error message is
AttributeError: 'NoneType' object has no attribute 'head'
But when I filter manually, it works:
pass_data = df[(df['GW'] == [1])]
This is my first issue.
My second issue is that I want to filter rows with multiple GW values (1, 2, 3), etc.
For that I can do it manually as follows:
pass_data = df[(df['GW'] == [1]) | (df['GW'] == [2]) | (df['GW'] == [3])]
If I want the function input to be a list like [1, 2, 3],
how can I write the function so that I can input a range of 1 to 3?
Could anyone please advise?
Thanks,
Zep
Use isin to pass a list of values instead of a scalar; also, filter is an existing function in Python, so it is better to change the function name:
def filter_vals(data, Game_week):
    return data[data['GW'].isin(Game_week)]

df1 = filter_vals(df, range(1, 4))
Because you don't return anything from the function, it returns None rather than the desired dataframe. Do this instead (note also that the extra parentheses inside data[...] are not needed):
def filter(data, Game_week):
    return data[data['GW'] == Game_week]
Also, isin may well be better:
def filter(data, Game_week):
    return data[data['GW'].isin(Game_week)]
Use return to return data from the function for the first part. For the second, use -
def filter(data, Game_week):
    return data[data['GW'].isin(Game_week)]
Now apply the filter function -
df1 = filter(df,[1,2])
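On the sample frame from the question, this keeps the first three rows (the ones with GW 1 or 2):
print(df1)
#    Xstart  Xend  Ystart  Yend  GW
# 0     1.0     6       0     6   1
# 1     2.5     8       1     8   1
# 2     3.0     9       2     9   2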

pandas: fill a column by applying a class method to another column (which contains classes)

I have a pandas dataframe where one of the columns is filled with class instances, as in the code below:
import pandas as pd

class rec:
    def test(self, a):
        return a

class rec1:
    def test(self, a):
        return a*3

x = rec()
y = rec1()
list = [x, y]
df = pd.DataFrame(list, columns=['first'])
df['second'] = ['a1', 'b1']
print(df)
first second
0 <__main__.rec object at 0x000000180AAE9208> a1
1 <__main__.rec1 object at 0x000000180AACBEB8> b1
Now I wish to create a new column by applying the method "test" to column 'first', reading the input for "test" from column 'second'.
This loop works:
df['third'] = 0
for i in (0, 1):
    df['third'][i] = df['first'][i].test(df['second'][i])
But I wonder if I can avoid the loop and use something more like the following code (which does not work):
df['third'] = df['first'].test(df['second'])
Any advice? Thank you.
This isn't that hard to do actually. You can use np.vectorize.
import numpy as np

f = lambda x, y: x.test(y)
v = np.vectorize(f)
df['third'] = v(df['first'], df['second'])
df
first second third
0 <__main__.rec object at 0x1038b1ef0> a1 a1
1 <__main__.rec1 object at 0x1038b1c18> b1 b1b1b1
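An equivalent alternative without NumPy is a row-wise apply that calls the stored object's method on each row:
df['third'] = df.apply(lambda row: row['first'].test(row['second']), axis=1)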
