Loop to dynamically assign new pandas DataFrame variables

I have an imports dictionary with:
- keys equal to the names of the new variables I would like to create, for example dataset_1, dataset_2, etc.
- values that are pandas DataFrames (the type of each value is pd.DataFrame)
What I would like to achieve is to create len(imports) new variables. Each variable would be named after a key and would hold the respective pd.DataFrame.
The code below doesn't work, and in any case I have a strong feeling that it's a bad approach and that a 'regular programmer' would do this another way.
for key in imports.keys():
    import_str = '{} = imports.get({})'.format(key, key)
    globalize = 'global {}'.format(key)
    exec(globalize)
    exec(import_str)
Can you please advise how to proceed?
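One likely reason the loop fails, sketched under the assumption that imports looks like the small stand-in below: format() inserts the key without quotes, so exec() receives dataset_1 = imports.get(dataset_1), and dataset_1 is read as an undefined variable name. The answers to the related questions below recommend skipping new variables altogether and reading the frames straight from the dictionary, roughly like this:

```python
import pandas as pd

# Hypothetical stand-in for the `imports` dictionary from the question.
imports = {
    "dataset_1": pd.DataFrame({"a": [1, 2]}),
    "dataset_2": pd.DataFrame({"a": [3, 4, 5]}),
}

# The string that the question's loop builds for the first key.
# The key appears without quotes on the right-hand side.
generated = "{} = imports.get({})".format("dataset_1", "dataset_1")

# Looking frames up by key needs no new variables at all.
row_counts = {name: len(df) for name, df in imports.items()}
first = imports["dataset_1"]
```

Any code that would have used the variable dataset_1 can use imports["dataset_1"] instead, and a loop over imports.items() reaches every frame.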

Related

How to name subsets of a dataframe inside a loop

I'm having trouble naming the subsets I create inside a loop. I want to name each one after the first five letters of the condition (or even just the iteration number), but I haven't figured out how to.
Here's my code:
list_mun = list(ensud21.NOM_MUN.unique())
for mun in list_mun:
    name = ensud21[ensud21['NOM_MUN'] == mun]
list_mun is a list of the unique values that a column of my dataframe can take. Inside the for loop I wrote name where I want the name described above to go. I am unable to give each dataframe a different name. Thank you!

You shouldn't try to set variable names dynamically. Use a container; a dictionary is perfect here:
list_mun = list(ensud21.NOM_MUN.unique())
out_dic = {}
for mun in list_mun:
    # here we set "mun" as the key
    out_dic[mun] = ensud21[ensud21['NOM_MUN'] == mun]
Then access a subset with:
out_dic[the_mun_you_want]
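As a possible shortcut, the same dictionary can be built with groupby, which yields one (key, sub-frame) pair per unique value. This is only a sketch: the ensud21 frame below is a made-up stand-in, and the short dictionary is one hypothetical way to get the five-letter names the question asked for.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `ensud21` frame.
ensud21 = pd.DataFrame({
    "NOM_MUN": ["Leon", "Celaya", "Leon", "Irapuato"],
    "value": [1, 2, 3, 4],
})

# groupby yields (key, sub-frame) pairs, one per unique NOM_MUN value.
out_dic = {mun: subset for mun, subset in ensud21.groupby("NOM_MUN")}

# Optional: key by the first five letters, as the question asked.
# Names that share a five-letter prefix would overwrite each other.
short = {mun[:5]: subset for mun, subset in out_dic.items()}
```

Keying by the full value is the safer default, since truncated names can collide.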

How can I manipulate a DataFrame name within a function?

How can I manipulate a DataFrame's name within a function so that the function returns a new DataFrame whose name is derived from the name of the input DataFrame?
Let's say I have this:
def some_func(df):
    # some operations
    return(df_copy)
and whatever df I pass to this function, it should return the new df as ..._copy, e.g. some_func(my_frame) should return my_frame_copy.
Things that I considered are as follows:
String operations, such as
new_df_name = "{}_copy".format(df)
I know this will not work since df refers to an object, but it helps to explain what I am trying to do.
def date_timer(df):
    df_copy = df.copy()
    dates = df_copy.columns[df_copy.columns.str.contains('date')]
    for i in range(len(dates)):
        df_copy[dates[i]] = pd.to_datetime(df_copy[dates[i]].str.replace('T', ' '), errors='coerce')
    return(df_copy)
Actually, this was the first thing that I tried. If only DataFrame had a "name" attribute that let us manipulate the name, but there is no such attribute:
df.name
Maybe f-strings or some other string operation could make it happen. If not, it might not be possible in Python.
I think this might be related to the variable name assignment rules in Python. In a sense, what I want is to reverse engineer them, but that is probably not possible.
Please advise...

It looks like you're trying to dynamically set a variable name in the global or local namespace from within your program.
Unless your data object belongs to a more structured namespace object, I'd discourage you from setting names dynamically in this way, since a lot can go wrong. As the docs for locals() put it:
Changes may not affect the values of local and free variables used by the interpreter.
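The quoted warning can be observed directly. A minimal sketch (the function and variable names are hypothetical): writing a new key into the mapping returned by locals() inside a function does not create a local variable.

```python
def make_copy_name():
    # Try to create a local variable by writing into the locals() mapping.
    locals()["my_frame_copy"] = "new object"
    try:
        # The function body never assigns my_frame_copy, so the name is
        # looked up as a global and is not found.
        return my_frame_copy
    except NameError:
        return "NameError"

result = make_copy_name()
```

Here result ends up as the string "NameError", which is why building names such as my_frame_copy this way is unreliable inside functions.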
The name attribute of your df is not an ideal solution, since that attribute is not set by default, nor is it particularly common. However, here is a solid SO answer which addresses this.
You might be better off storing your data objects in a dictionary, using dates or something else meaningful as keys. Example:
my_data = {}
for my_date in dates:
    df_temp = df.copy(deep=True)  # deep copy ensures no changes propagate to the parent object
    # Modify your df here (not sure what you are trying to do exactly)
    df_temp[my_date] = "foo"
    # Now save that df
    my_data[my_date] = df_temp
Hope this answers your Q. Feel free to clarify in the comments.
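Applied to the question's date_timer function, the dictionary idea might look like the sketch below. The frames dictionary and its sample data are assumptions made for illustration; the point is that the new name is derived from the dictionary key, which is an ordinary string, rather than from the DataFrame object.

```python
import pandas as pd

def date_timer(df):
    """Return a copy whose 'date' columns are parsed as datetimes (as in the question)."""
    df_copy = df.copy()
    for col in df_copy.columns[df_copy.columns.str.contains("date")]:
        df_copy[col] = pd.to_datetime(df_copy[col].str.replace("T", " "), errors="coerce")
    return df_copy

# Hypothetical input: frames stored under their names.
frames = {"my_frame": pd.DataFrame({"start_date": ["2021-01-05T10:00:00", "bad"]})}

# Derive each new name from the key, not from the DataFrame object.
for name in list(frames):
    frames[f"{name}_copy"] = date_timer(frames[name])
```

After the loop, frames["my_frame_copy"] holds the converted copy and frames["my_frame"] is unchanged.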

Creating/Getting/Extracting multiple data frames from python dictionary of dataframes

I have a Python dictionary with dataset names as keys and the entire DataFrames as values; see the dictionary dict below.
[Image: dictionary of DataFrames]
One way is to write all the code manually, like below:
csv = dict['csv.pkl']
csv_emp = dict['csv_emp.pkl']
csv_emp_yr = dict['csv_emp_yr.pkl']
emp_wf = dict['emp_wf.pkl']
emp_yr_wf = dict['emp_yr_wf.pkl']
But this becomes very inefficient as the number of datasets grows.
Any help on how to get this done in a loop?

Although I would not recommend this method, you can try this:
import sys
this = sys.modules[__name__]  # the current module, i.e. your current namespace
for key in dict.keys():
    # strip the '.pkl' suffix so the name is a valid identifier
    setattr(this, key.replace('.pkl', ''), dict[key])
Now you can check that new variables have been created, named after the dictionary keys without the .pkl suffix.
I would also avoid writing to the dictionary returned by globals(): it does create the names, but variables that appear only at run time are hard to track and can silently overwrite existing ones.
A list can also be used (limited use cases):
dataframes = []
for key in dict.keys():
    dataframes.append(dict[key])
Still, this is your choice; both of the above methods have some limitations.
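A third option, sketched here under the assumption that the keys all end in a file extension such as .pkl, is to leave the frames in a dictionary and only clean up the keys. The name frames is hypothetical and replaces the question's dict so that the built-in type is not shadowed.

```python
import pandas as pd

# Hypothetical stand-in for the question's dictionary of DataFrames.
frames = {
    "csv.pkl": pd.DataFrame({"x": [1]}),
    "csv_emp.pkl": pd.DataFrame({"x": [1, 2]}),
}

# Strip the file extension once so the keys read like variable names.
by_name = {key.rsplit(".", 1)[0]: df for key, df in frames.items()}

# Work on every frame in a loop, or pick one by name.
shapes = {name: df.shape for name, df in by_name.items()}
csv_emp = by_name["csv_emp"]
```

This keeps every dataset reachable by name without touching the module namespace.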

Creating new pandas dataframe in each loop iteration

I have several pandas dataframes (A,B,C,D) and I want to merge each one of them individually with another dataframe (E).
I wanted to write a for loop that allows me to run the merge code for all of them and save each resulting dataframe with a different name, so for example something like:
tables = [A, B, C, D]
n = 0
for df in tables:
    merged_n = df.merge(E, left_index=True, right_index=True)
    n = n + 1
I can't find a way to give different names to the new dataframes created in the loop. I have searched Stack Overflow, but people say either that this should never be done (without explaining why) or to use dictionaries, and having dataframes inside dictionaries is not as practical.

you want to clutter the namespace with automatically generated variable names? if so, don't do that. just use a dictionary.
if you really don't want to use a dictionary (really think about why you don't want to do this), you can just do it the slow-to-write, obvious way:
ea = E.merge(A)
eb = E.merge(B)
...
edit: if you really want to add vars to your namespace, which i don't recommend, you can do something like this:
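# at module level, locals() is the same dictionary as globals();
# inside a function, writing to it does not create variables
l = locals()
for c in 'abcd':
    l[f'e{c}'] = E.merge(l[c.upper()])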
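For comparison, the dictionary version that this answer recommends could look like the sketch below. The small frames and the merged_ key prefix are made up for illustration; the merge call itself is the index-based one from the question.

```python
import pandas as pd

# Hypothetical small frames standing in for A, B and E from the question.
A = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
B = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])
E = pd.DataFrame({"e": [10, 20, 30]}, index=["x", "y", "z"])

tables = {"A": A, "B": B}

# One merged frame per input, keyed by a name derived from the input's key.
merged = {
    f"merged_{name}": df.merge(E, left_index=True, right_index=True)
    for name, df in tables.items()
}
```

Each result is then available as merged["merged_A"], merged["merged_B"], and so on, and adding a fifth table means adding one dictionary entry rather than another variable.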

Generating a list from several dataframes

I have several data frames named data1, data2, data3, data4, ... data100. How can I store them in a list so that I'm able to plot them with a for loop.
Thanks in advance.

The problem you are experiencing is a symptom of using numbered variables.
You can avoid the problem entirely by using a list of DataFrames (e.g. data)
instead of using numbered variables (data1, data2, data3, etc.)
The trick is to avoid creating the numbered variables in the first place.
If you have code of the form
data1 = ...
data2 = ...
data3 = ...
Try to replace it with something like
data = []
data.append(...)
data.append(...)
data.append(...)
or better yet, use a list comprehension or a for-loop to define data. For more specific suggestions, show us the code that defines the numbered variables.
Then you could loop over the DataFrames with
for df in data:
    df.plot(...)
If for some reason you can not prevent (someone else?) from defining the numbered variables, then you could use globals() (or locals()) to access the numbered variables programmatically:
g = globals()
data = [g['data{}'.format(i)] for i in range(1, 101)]
globals() returns a dictionary whose keys are string representations of names
in the global namespace. The associated values are the Python objects bound to
those names. Thus, you can use globals() to look up the values bound to
variable names based on the string representation of those variable names.
Use locals() if the variable names are defined in the local (rather than global) namespace.
Still, try to avoid using numbered variables. This use of globals() is merely a workaround for trouble someone else is causing. Using string formatting to look up variable names is not great programming style when simple integer indexing (of a list) should suffice. The best solution is to convince that someone to stop using numbered variables and to instead deliver the values in a list.
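The list-based approach recommended above can be sketched as follows. The frame contents are invented for illustration, and the loop prints a summary instead of plotting so that the example does not depend on matplotlib.

```python
import pandas as pd

# Build the list directly instead of creating data1, data2, ...
# The frame contents here are made up for illustration.
data = [pd.DataFrame({"y": range(i)}) for i in range(1, 4)]

# Integer indexing replaces the numbered names: data[0] plays the role of data1.
lengths = [len(df) for df in data]

for i, df in enumerate(data):
    # df.plot(...) would go here in the plotting use case.
    print(f"data[{i}] has {len(df)} rows")
```

Because the frames live in one list, adding a hundredth frame changes only the range, not the code that consumes the list.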
