Creating/Getting/Extracting multiple data frames from python dictionary of dataframes - python

I have a python dictionary with keys as dataset names and values as the entire data frames themselves, see the dictionary dict below
[Dictionary of Dataframes ]
One way is to write out every assignment manually, like below:
csv = dict['csv.pkl']
csv_emp = dict['csv_emp.pkl']
csv_emp_yr= dict['csv_emp_yr.pkl']
emp_wf=dict['emp_wf.pkl']
emp_yr_wf=dict['emp_yr_wf.pkl']
But this gets tedious and error-prone as the number of datasets grows.
Any help on how to do this in a loop?

I would not recommend this method, but you can try this:
import sys
this = sys.modules[__name__] # this is now your current namespace
for key in dict.keys():
    setattr(this, key, dict[key])
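Now you can check the new variables, whose names are the same as the dictionary keys. Note that a key such as 'csv.pkl' is not a valid identifier, so strip the extension first if you want to refer to the variable by name.
Using globals() carries risk: it returns whatever namespace the code is currently pointing to, which can change, so modifying its return value is not a good idea.
A list can also be used (limited use cases):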
dataframes = []
for key in dict.keys():
    dataframes.append(dict[key])
Still, the choice is yours; both of the above methods have limitations.
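If attribute-style access is really wanted, one alternative that avoids touching the module namespace is to copy the entries into a separate namespace object. This is only a minimal sketch, not the method from the answer above: the names frames, clean and ns are hypothetical, plain lists stand in for the DataFrames, and it assumes every key becomes a valid identifier once the '.pkl' extension is stripped.

```python
import os
from types import SimpleNamespace

# Stand-ins for the DataFrames; in the question these would be pandas objects.
frames = {
    "csv.pkl": [1, 2, 3],
    "csv_emp.pkl": [4, 5],
    "emp_wf.pkl": [6],
}

# Strip the ".pkl" extension so each key is a valid Python identifier.
clean = {os.path.splitext(key)[0]: value for key, value in frames.items()}

# Expose the entries as attributes of one object instead of module globals.
ns = SimpleNamespace(**clean)

print(ns.csv_emp)     # the object stored under "csv_emp.pkl"
print(sorted(clean))  # the cleaned names
```

Keeping everything on one object means the original dictionary stays the single source of truth, and nothing in the surrounding module can be overwritten by accident.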

Related

Loop to dynamically assign new pandas DataFrame variables

I have an imports dictionary with:
keys equal to names of new variables I would like to build, for example dataset_1, dataset_2 etc.
values being the pandas DataFrames (the type of each value is pd.DataFrame)
What I would like to achieve is to create len(keys) new variables. The name of each variable would be equal to its key, and the variable would hold the respective pd.DataFrame.
The code below doesn't work, and in any case I have a strong feeling that it's a bad approach and that a 'regular programmer' would do this another way.
for key in imports.keys():
    import_str = '{} = imports.get({})'.format(key, key)
    globalize = 'global {}'.format(key)
    exec(globalize)
    exec(import_str)
Can you please advise how to proceed?
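One likely reason the snippet fails: the formatted string reads dataset_1 = imports.get(dataset_1), so the key is looked up as an undefined name rather than passed as a string, and a global statement run in its own exec call does not carry over to the next call. The sketch below illustrates both points under stated assumptions: plain lists stand in for the DataFrames, and namespace is a hypothetical name. In most cases simply indexing imports[key] is the simpler route.

```python
imports = {"dataset_1": [1, 2, 3], "dataset_2": [4, 5]}  # stand-ins for DataFrames

key = "dataset_1"
broken = "{} = imports.get({})".format(key, key)
fixed = "{} = imports.get({!r})".format(key, key)
print(broken)  # dataset_1 = imports.get(dataset_1)
print(fixed)   # dataset_1 = imports.get('dataset_1')

# Running the corrected statements against an explicit dictionary keeps the
# generated names out of the module namespace.
namespace = {"imports": imports}
for key in imports:
    exec("{} = imports.get({!r})".format(key, key), namespace)

print(namespace["dataset_2"])  # [4, 5]
```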

Error Changing Dictionary Keys

I've two defaultdicts I eventually want to merge, but first I need to make their keys match. According to some threads I've seen here, I can use pop() to replace keys in a dictionary. But that only updates the existing dictionary, whereas I want to create a new dictionary with the new keys. So something like:
existing_dict_one -> new_dict_one
This is what I've so far:
def split_tabs(x):
    """
    Function to split tab-separated strings, used to break up the keys that are separated by tabs.
    """
    return x.split('\t')

def create_dict(old_dict):
    """
    Function to create a new defaultdict from an existing defaultdict, just with
    different keys.
    """
    new_dict = old_dict.copy() # Create a copy of old_dict to house the new keys, but with the same values.
    for key, value in new_dict.iteritems():
        umi = split_tabs(key)[0] # Change key to be UMI, which is the 0th index of the tab-delimited key.
        # new_key = key.replace(key, umi)
        new_dict[umi] = new_dict.pop(key)
    return new_dict
However, I'm getting the following error
RuntimeError: dictionary changed size during iteration
and I don't know how to fix it. Does anyone know how to correct it? I'd like to use the variable "umi" as the new key.
I'd like to post the variable "key" and dictionary "old_dict" I'm using for testing this code, but it's messy and takes up a lot of space. So here's a pastebin link that contains them instead.
Note that "umi" comes from variable "key" which is separated by tabs. So I split "key" and get the first object as "umi".
Just use a dict comprehension for this:
new_dict = {split_tabs(key)[0]: value for key, value in old_dict.iteritems()}
Trying to modify a dictionary while iterating over it is not a good idea in general.
In Python 2, if you use .items() instead of .iteritems(), you won't have that problem, because it returns a list that is disconnected from the dictionary. In Python 3 the equivalent would be list(new_dict.items()).
Also if there's any possibility that the dictionary values are mutable, you'll have to use copy.deepcopy(old_dict) instead of just old_dict.copy().
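As a rough Python 3 illustration of both suggestions above, the sketch below builds a new mapping with a comprehension and, separately, renames keys in place by looping over a snapshot of the keys. The sample keys and values are invented for the example. Two caveats: the comprehension yields a plain dict rather than a defaultdict, and if two old keys share the same first field the later one overwrites the earlier one.

```python
from collections import defaultdict

def split_tabs(x):
    """Split a tab-separated string into its fields."""
    return x.split("\t")

# Invented sample data: keys are tab-delimited, the first field is the UMI.
old_dict = defaultdict(list)
old_dict["AAAC\tchr1\t100"].append("read_1")
old_dict["GGTA\tchr2\t200"].append("read_2")

# Option 1: build a new mapping; old_dict is left untouched.
new_dict = {split_tabs(key)[0]: value for key, value in old_dict.items()}

# Option 2: rename in place, iterating over a snapshot of the keys so the
# dictionary can change size safely during the loop.
renamed = dict(old_dict)
for key in list(renamed):
    renamed[split_tabs(key)[0]] = renamed.pop(key)

print(new_dict)
```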

Create a dictionary with name of variable

I am trying to write a program to parse a file, break it into sections, and read it into a nested dictionary. I want the output to be something like this:
output = {'section1':{'nested_section1':{'value1':'value2'}}}
I'm trying to do this by building separate dictionaries and then merging them, but I'm running into trouble naming them. I want the dictionaries inside of the others to be named based on the sections of the file they're taken from. But it seems I can't name a dictionary from a variable.
You can name a dictionary entry from a variable. If you have
text = "myKey" # or myNumber or any hashable type
data = dict()
You can do
data[text] = anyValue
Store all your dictionaries in a single root dictionary.
all_dicts['output'] = {'section1':{'nested_section1':{'value1':'value2'}}}
As you merge dictionaries, remove the children from all_dicts.
all_dicts['someotherdict']['key'] = all_dicts['output']
del all_dicts['output']
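Put together, the root-dictionary idea might look like the sketch below. The names section_name and nested_name are hypothetical and stand for whatever the parser reads from the file; pop() is used here to detach a child and attach it to its parent in one step.

```python
all_dicts = {}

# Names that would normally come from parsing the file (hypothetical here).
section_name = "section1"
nested_name = "nested_section1"

# Build each dictionary under a key taken from a variable.
all_dicts[nested_name] = {"value1": "value2"}
all_dicts[section_name] = {}

# Merge: move the child under its parent and remove it from the root.
all_dicts[section_name][nested_name] = all_dicts.pop(nested_name)

output = all_dicts
print(output)  # {'section1': {'nested_section1': {'value1': 'value2'}}}
```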

creating keys for a dictionary from a list containing the keys

I'm trying to build a matrix that holds several values in several levels.
I'm trying to generate a dictionary build up like this:
{'routername':{'channel':{'01':<value>,'02':<value>}}}
The number of keys on the highest level may vary.
The script is generating a list of available routers and another list of available channels.
I wrote a rather cumbersome function that tests for a key and, if it is not already there, adds the key to the dictionary.
So I was wondering whether there is an easy way to create a dictionary with empty values for the keys in the list 'routers'.
def AddToChart(passed_seq):
    try:
        if str(passed_seq[0]) in chart_dict:
            if str(passed_seq[1]) in chart_dict[passed_seq[0]]:
                if str(passed_seq[2]) in chart_dict[passed_seq[0]][passed_seq[1]]:
                    if str(passed_seq[3]) in chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]]:
                        chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
                    else:
                        chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]].update({passed_seq[3]:{}})
                        chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
                else:
                    chart_dict[passed_seq[0]][passed_seq[1]].update({passed_seq[2]:{passed_seq[3]:{}}})
                    chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
            else:
                chart_dict[passed_seq[0]].update({passed_seq[1]:{passed_seq[2]:{passed_seq[3]:{}}}})
                chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
        else:
            chart_dict.update({passed_seq[0]:{passed_seq[1]:{passed_seq[2]:{passed_seq[3]:{}}}}})
            chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
    except ValueError:
        print "AddToChart: ",err_sub_dict,sys.exc_info()[1][0]
    except:
        print sys.exc_info()
        print "AddToChart: variable not defined: " + str(passed_seq)
I suggest using a nested collections.defaultdict for chart_dict. It lets you provide a factory function to set up new values, so any key you request will always work. It's a little tricky to get such a deeply nested structure set up, but I think the following will do the right thing for your four-level structure (I'm assuming your <value> items are also dictionaries, as it seems your current code expects):
from collections import defaultdict
chart_dict = defaultdict(lambda:defaultdict(lambda:defaultdict(dict)))
With that in place, you should then be able to do the following without worrying about whether any of the keys previously referenced anything in the dictionary:
a, b, c = passed_seq
chart_dict[a][b][c].update(err_sub_dict)
I'd suggest doing something like the variable unpacking above too, though you should probably use better names than a, b, and c. Good variable names can turn something incomprehensible into something easy to grasp.
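The definition above creates missing entries automatically for three keys, with a plain dict at the third level. The function in the question indexes with four elements of passed_seq, so if all four are needed, one more defaultdict level is required. A minimal sketch under that assumption follows; add_to_chart and the names router, channel, slot and code are hypothetical, as is the sample data.

```python
from collections import defaultdict

# Four auto-created key levels; the innermost value is a plain dict.
chart_dict = defaultdict(
    lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
)

def add_to_chart(passed_seq, err_sub_dict):
    """Merge err_sub_dict into the entry addressed by four keys."""
    router, channel, slot, code = passed_seq
    chart_dict[router][channel][slot][code].update(err_sub_dict)

add_to_chart(("router_a", "channel", "01", "crc"), {"count": 1})
add_to_chart(("router_a", "channel", "01", "crc"), {"last_seen": "12:00"})

print(chart_dict["router_a"]["channel"]["01"]["crc"])
```

One trade-off worth knowing: merely reading a missing key from a defaultdict creates it, so membership tests should use `in` rather than indexing.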
You should use
dict.setdefault()
See docs.
Example:
d = {}
d.setdefault("k", "eggs")  # key missing: stores "eggs" and returns it
>> d["k"]
eggs
d2 = {"k": 1}
d2.setdefault("k", "spam")  # key present: returns 1 and leaves d2 unchanged
>> d2["k"]
1
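For the original question of pre-filling keys from the routers list, and for nested levels, setdefault calls can be chained. The sketch below is illustrative only; the router names and values are invented.

```python
routers = ["router_a", "router_b"]

# One fresh empty dict per router. dict.fromkeys(routers, {}) would instead
# share a single dict object between all keys.
chart = {name: {} for name in routers}

# Chained setdefault creates any missing level on the way down.
chart.setdefault("router_c", {}).setdefault("channel", {})["01"] = 5
chart.setdefault("router_a", {}).setdefault("channel", {})["02"] = 7

print(chart)
```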

How to rewrite this Dictionary For Loop in Python?

I have a Dictionary of Classes where the classes hold attributes that are lists of strings.
I made this function to find the maximum number of items in any one of those lists for a particular person.
def find_max_var_amt(some_person):  # pass in a patient id number, get back their max number of variables for a type of variable
    max_vars = 0
    for key, value in patients[some_person].__dict__.items():
        challenger = len(value)
        if max_vars < challenger:
            max_vars = challenger
    return max_vars
What I want to do is rewrite it so that I do not have to use the .iteritems() function. This find_max_var_amt function works fine as is, but I am converting my code from using a dictionary to be a database using the dbm module, so typical dictionary functions will no longer work for me even though the syntax for assigning and accessing the key:value pairs will be the same. Thanks for your help!
Since dbm doesn't let you iterate over the values directly, you can iterate over the keys. To do so, you could modify your for loop to look like
for key in patients[some_person].__dict__:
    value = patients[some_person].__dict__[key]
    # then continue as before
I think a bigger issue, though, will be the fact that dbm only stores strings. So you won't be able to store the list directly in the database; you'll have to store a string representation of it. And that means that when you try to compute the length of the list, it won't be as simple as len(value); you'll have to develop some code to figure out the length of the list based on whatever string representation you use. It could just be as simple as len(the_string.split(',')), just be aware that you have to do it.
By the way, your existing function could be rewritten using a generator, like so:
def find_max_var_amt(some_person):
    return max(len(value) for value in patients[some_person].__dict__.itervalues())
and if you did it that way, the change to iterating over keys would look like
def find_max_var_amt(some_person):
    dct = patients[some_person].__dict__
    return max(len(dct[key]) for key in dct)
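To make the string-storage point concrete, here is a rough Python 3 sketch in which each list is stored as a JSON string and decoded before its length is taken. Several things here are assumptions rather than part of the question: the use of dbm.dumb (chosen only because it ships with every Python installation), JSON as the string representation, one database per patient, and the sample attribute names.

```python
import dbm.dumb
import json
import os
import tempfile

def find_max_var_amt(db):
    """Return the length of the longest JSON-encoded list stored in db."""
    # Iterate over keys and look each value up, as suggested above.
    return max(len(json.loads(db[key])) for key in db.keys())

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one database holding one patient's attributes.
    db = dbm.dumb.open(os.path.join(tmp, "patient"), "c")
    db["symptoms"] = json.dumps(["cough", "fever"])
    db["medications"] = json.dumps(["aspirin"])
    max_len = find_max_var_amt(db)
    db.close()

print(max_len)  # 2
```

JSON is used instead of a comma-joined string so that list items containing commas survive the round trip.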
