Assign variable names and data in a loop - python

I'm trying to load data into datasets in Python. My data is arranged by years. I want to assign variable names in a loop. Here's what it should look like in pseudocode:
import pandas as pd

for i in range(2010, 2017):
    Data+i = pd.read_csv("Data_from_" + str(i) + ".csv")
    # Stores data from file "Data_from_YYYY.csv" as dataset DataYYYY.
The resulting datasets would be Data2010 - Data2017.

While this is possible, it is not a good idea. Code with dynamically generated variable names is difficult to read and maintain. That said, you could do it with the exec function, which executes code given as a string, letting you build the variable names by string concatenation.
However, you really should use a dictionary instead. A dictionary gives you dynamically named keys, which is much better suited to your purpose. For example:
import pandas as pd

Data = {}
for i in range(2010, 2018):  # 2018, so that 2017 is included
    Data[i] = pd.read_csv("Data_from_" + str(i) + ".csv")
    # Stores data from file "Data_from_YYYY.csv" under the key YYYY.

# Access data like this:
Data[2011]
You should also use snake_case for variable names in Python, so Data should be data.
If you really wanted to dynamically generate variables, you could do it like this. (But you aren't going to, right?)
import pandas as pd

for i in range(2010, 2018):
    exec('Data{} = pd.read_csv("Data_from_" + str(i) + ".csv")'.format(i))
    # Stores data from file "Data_from_YYYY.csv" as variable DataYYYY.
You can do this without exec too; have a look at Jooks' answer.

You can use globals(), locals(), or an __init__ that calls setattr on self to assign the variables to an instance.
In [1]: globals()['data' + 'a'] = 'n'

In [2]: print(dataa)
n

In [3]: locals()['data' + 'b'] = 'n'

In [4]: print(datab)
n

In [5]: class Data(object):
   ...:     def __init__(self, **kwargs):
   ...:         for k, v in kwargs.items():
   ...:             setattr(self, k, v)
   ...:

In [6]: my_data = Data(a=1, b=2)

In [7]: my_data.a
Out[7]: 1
I would probably go the third route. If you find yourself needing this pattern, you may be approaching the problem in an unconventional way: it is possible, but rarely idiomatic.
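One caveat about the locals() option above: assigning through locals() only works at the interactive top level, where locals() is the same dict as globals(). Inside a function body, CPython treats the locals() dict as a snapshot, so writes to it never create a real variable. A minimal sketch:

```python
def f():
    # Writing to locals() inside a function does not create a real
    # local variable; the dict is only a snapshot.
    locals()['x'] = 1
    try:
        return x  # compiled as a global lookup, which fails too
    except NameError:
        return 'x was never created'

print(f())  # x was never created
```

globals() does not have this problem, which is one more reason the setattr-based class is the safer of the three routes.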

Related

Can I define how accessing python object (and not just its attributes) is handled?

I have a custom class in Python that I would like to behave in a certain way when the object itself (i.e., not one of its methods/properties) is accessed.
This is a contrived minimal working example to show what I mean. I have a class that holds various pandas DataFrames so that they can separately be manipulated:
import pandas as pd
import numpy as np

class SplitDataFrame:
    def __init__(self, df0, df1):
        self._dfs = [df0, df1]

    def increase(self, num, inc):
        self._dfs[num] = self._dfs[num] + inc

    @property
    def asonedf(self):
        return pd.concat(self._dfs, axis=1)

d = SplitDataFrame(pd.DataFrame(np.random.rand(2, 2), columns=['a', 'b']),
                   pd.DataFrame(np.random.rand(2, 2), columns=['q', 'r']))
d.increase(0, 10)
This works, and I can verify that d._dfs now indeed is
[          a          b
 0  10.845681  10.561956
 1  10.036739  10.262282,
          q         r
 0  0.164336  0.412171
 1  0.440800  0.945003]
So far, so good.
Now, I would like to change/add to the class's definition so that, when not using the .increase method, it returns the concatenated dataframe. In other words, when accessing d, I would like it to return the same dataframe as when typing d.asonedf, i.e.,
           a          b         q         r
0  10.143904  10.154455  0.776952  0.247526
1  10.039038  10.619113  0.443737  0.040389
That way, the object more closely follows the pandas.DataFrame api:
instead of needing to use d.asonedf['a'], I could access d['a'];
instead of needing to use d.asonedf + 12, I could do d + 12;
etc.
Is that possible?
I could make SplitDataFrame inherit from pandas.DataFrame, but that does not magically add the desired behaviour.
Many thanks!
You could of course proxy all the relevant magic methods to a concatenated dataframe built on demand. If you don't want to repeat yourself endlessly, you can generate those proxies dynamically.
I'm not saying this is the way to go, but it kind of works:
import textwrap

import pandas as pd
import numpy as np

class SplitDataFrame:
    def __init__(self, df0, df1):
        self._dfs = [df0, df1]

    def increase(self, num, inc):
        self._dfs[num] = self._dfs[num] + inc

    for name in dir(pd.DataFrame):
        if name in (
            "__init__",
            "__new__",
            "__getattribute__",
            "__getattr__",
            "__setattr__",
        ) or not callable(getattr(pd.DataFrame, name)):
            continue
        exec(
            textwrap.dedent(
                f"""
                def {name}(self, *args, **kwargs):
                    return pd.concat(self._dfs, axis=1).{name}(*args, **kwargs)
                """
            )
        )
As you might guess, there are all kinds of strings attached to this solution, and it uses horrible practices (exec, dir(), ...).
At the very least I would implement __repr__ so you don't lie to yourself about what kind of object this is, and maybe you'd want to explicitly enumerate all methods you want defined instead of getting them via dir(). Instead of exec() you can define the function normally and then set it on the class with setattr.
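To sketch that last suggestion: define one proxy factory and attach the proxies with setattr instead of exec. The Inner/Wrapper names below are made up for illustration, with a toy class standing in for the concatenated dataframe:

```python
class Inner:
    """Stands in for the concatenated dataframe in this sketch."""
    def total(self):
        return 42

    def describe(self):
        return "inner"

class Wrapper:
    def __init__(self, inner):
        self._inner = inner

    def __repr__(self):
        # Honest repr, so you don't lie to yourself about the type.
        return f"Wrapper({self._inner!r})"

def _make_proxy(name):
    # Closure over `name`: each proxy forwards one method call.
    def proxy(self, *args, **kwargs):
        return getattr(self._inner, name)(*args, **kwargs)
    proxy.__name__ = name
    return proxy

# Explicitly enumerate the methods instead of scraping dir():
for _name in ("total", "describe"):
    setattr(Wrapper, _name, _make_proxy(_name))

w = Wrapper(Inner())
print(w.total())     # 42
print(w.describe())  # inner
```

The same loop would work with an explicit list of pandas.DataFrame method names; the point is that no code needs to be built from strings.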

Pandas style object members

I'm trying to figure out how Pandas manages to create new object members on the fly. For example, if you do this:
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
You can immediately do this:
df.col1
and get the contents of col1. How does Pandas create the col1 member on the fly?
Thanks.
Relevant code in the repository that checks for a dictionary input:
class DataFrame(NDFrame):
    def __init__(self, data=None, index=None, columns=None, dtype=None,
                 copy=False):
        if data is None:
            data = {}
        # Some other if-statements that check data types...
        elif isinstance(data, dict):
            mgr = self._init_dict(data, index, columns, dtype=dtype)
which calls the _init_dict method:
def _init_dict(self, data, index, columns, dtype=None):
    if columns is not None:
        ...  # Does some stuff - but this isn't your case
    else:
        keys = list(data.keys())
        if not isinstance(data, OrderedDict):
            # This part sorts keys to columns in alphabetical order.
            # The _try_sort function is simple; it lives in pandas.core.common.
            keys = _try_sort(keys)
        columns = data_names = Index(keys)
So the real work comes from the Index class in pandas.core.indexes.base. From there things get really complicated (and my understanding of what it means to explain the "how" of anything, without regressing all the way down to machine code, started to melt away), but it's safe to say that if you give the pandas.Index class a 1-dimensional array of data, it will create an object that is sliceable and knows its associated data type.
Which is exactly what you're observing - you essentially fed it a bunch of keys and pandas understood that it needed to give you something back that you could access as an index (since df.col1 is just syntactic sugar for df['col1']), that you could slice (df[0:1]), and that knew its own data types.
And, of course, after asking the question, I find the answer myself.
It turns out you can use __getattr__ to achieve this. The easiest way (and the one I happen to want) is to use a dictionary, then have __getattr__ return values from that dictionary, like so:
class test():
    def __init__(self):
        # super().__init__()
        self.adict = {'spam': 'eggs'}

    def __getattr__(self, attr):
        return self.adict[attr]

nt = test()
print(nt.spam)
__getattr__ is called when an attribute isn't found by normal lookup, as is the case here. The interpreter can't find the spam attribute, so it defers to __getattr__. Things to keep in mind:
If the key doesn't exist in the dictionary, this will raise a KeyError, not an AttributeError.
Don't use __getattribute__: it is called on every attribute access and will break your entire class.
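If the KeyError bothers you, one refinement of the class above is to translate it into an AttributeError, so that hasattr() and getattr() with a default behave as callers expect:

```python
class Test:
    def __init__(self):
        self.adict = {'spam': 'eggs'}

    def __getattr__(self, attr):
        # Called only when normal lookup fails; translate the dict miss
        # into the AttributeError that attribute protocols expect.
        try:
            return self.adict[attr]
        except KeyError:
            raise AttributeError(attr) from None

nt = Test()
print(nt.spam)                        # eggs
print(hasattr(nt, 'ham'))             # False
print(getattr(nt, 'ham', 'default'))  # default
```

Without the translation, hasattr(nt, 'ham') would propagate a KeyError instead of returning False.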
Thanks to everyone for the input on this.

Unpacking data with h5py

I want to write numpy arrays to a file and easily load them in again.
I would like to have a function save() that preferably works in the following way:
data = [a, b, c, d]
save('data.h5', data)
which then does the following
h5f = h5py.File('data.h5', 'w')
h5f.create_dataset('a', data=a)
h5f.create_dataset('b', data=b)
h5f.create_dataset('c', data=c)
h5f.create_dataset('d', data=d)
h5f.close()
Then subsequently I would like to easily load this data with for example
a, b, c, d = load('data.h5')
which does the following:
h5f = h5py.File('data.h5', 'r')
a = h5f['a'][:]
b = h5f['b'][:]
c = h5f['c'][:]
d = h5f['d'][:]
h5f.close()
I can think of the following for saving the data:
h5f = h5py.File('data.h5', 'w')
data_str = ['a', 'b', 'c', 'd']
for name in data_str:
    h5f.create_dataset(name, data=eval(name))
h5f.close()
I can't think of a similar way of using data_str to then load the data again.
Rereading the question (was this edited or not?), I see load is supposed to function as:
a, b, c, d = load('data.h5')
This eliminates the global variable names issue that I worried about earlier. Just return the 4 arrays (as a tuple), and the calling expression takes care of assigning names. Of course this way, the global variable names do not have to match the names in the file, nor the names used inside the function.
def load(filename):
    h5f = h5py.File(filename, 'r')
    a = h5f['a'][:]
    b = h5f['b'][:]
    c = h5f['c'][:]
    d = h5f['d'][:]
    h5f.close()
    return a, b, c, d
Or using a data_str parameter:
def load(filename, data_str=['a', 'b', 'c', 'd']):
    h5f = h5py.File(filename, 'r')
    arrays = []
    for name in data_str:
        var = h5f[name][:]
        arrays.append(var)
    h5f.close()
    return arrays
For loading all the variables in the file, see Reading ALL variables in a .mat file with python h5py
An earlier answer that assumed you wanted to take the variable names from the file key names.
This isn't an h5py issue. It's about creating global (or local) variables using names from a dictionary (or other structure). In other words, how to create a variable using a string as its name.
This issue comes up often in connection with argparse, the command-line parser. It gives you an object like args = Namespace(a=1, b='value'). It is easy to turn that into a dictionary with vars(args): {'a': 1, 'b': 'value'}. But you have to do something tricky, and not Pythonic, to create a and b variables from it.
It's even worse if you create that dictionary inside a function, and then want to create global variables (i.e. outside the function).
The trick involves assigning to locals() or globals(). But since it's un-pythonic I'm reluctant to be more specific.
In so many words I'm saying the same thing as the accepted answer in https://stackoverflow.com/a/4467517/901925
For loading variables from a file into an Ipython environment, see
https://stackoverflow.com/a/28258184/901925 ipython-loading-variables-to-workspace
I would use deepdish (deepdish.io):
import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'obj2': obj2}, compression=('blosc', 9))

Class created with different number of class attributes every instance?

I know that sound generic, so I will try my best to explain the case.
I am getting back from a process, some data. I would like to organize this data in a format that I can interact with it, and access it.
So far my idea is to make a simple class, which is empty, and at creation time takes **kwargs and cycles through them, adding them to the instance.
Although I am not sure if this is the correct way to do so. Imagine the following data:
dict1 = {param1: a, param2: b, param3: c, param4: d}             # Operation1
dict2 = {param2: t, param1: 2, param1: r}                        # Operation2
dict3 = {param1: 1, param7: 2, param2: 4, param4: b, param3: m}  # Operation3
I would like to make a class that when creating, will take the parameters, resulting in the class attribute name taken from the parameter name, and as value, the value of that parameter:
myclass1 = MyClass(dict1)
myclass1.param1 returns a, myclass1.param2 returns b, and so on.
But if I want to make an instance using dict2, I can also do so:
myclass2 = MyClass(dict2)
myclass2.param2 returns t, myclass2.param1 returns 2, and so on.
In this way I have a name for each parameter and can retrieve it later, and at the same time I do not have to worry about how many elements my data has, since the class will always take the parameters it is given and create instance attributes named after the dictionary keys.
Is it possible to achieve this in a simple way in Python? I could use a dictionary of dictionaries, but that feels utterly complicated for big data sets.
You can do something like:
In [2]: dict1 = {'param1': 'a', 'param2': 'b', 'param3': 'c', 'param4': 'd'}

In [3]: class A(object):
   ...:     def __init__(self, params):
   ...:         for k, v in params.items():
   ...:             setattr(self, k, v)
   ...:

In [4]: a = A(dict1)

In [5]: a.param1
Out[5]: 'a'

In [6]: a.param2
Out[6]: 'b'
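For what it's worth, the standard library covers this pattern directly: types.SimpleNamespace accepts keyword arguments and exposes them as attributes, so you may not need a custom class at all.

```python
from types import SimpleNamespace

dict1 = {'param1': 'a', 'param2': 'b', 'param3': 'c', 'param4': 'd'}

# ** unpacks the dict into keyword arguments, one attribute per key.
ns = SimpleNamespace(**dict1)
print(ns.param1)  # a
print(ns.param4)  # d

# vars() converts back to a plain dict:
print(vars(ns) == dict1)  # True
```

This handles a different number of attributes per instance for free, since each call simply takes whatever keys the dictionary has.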

Export function that names the export file after the input variable

I'm looking to get a function to export Numpy arrays, but to use the name of the variable input to the function as the name of the exported file. Something like:
MyArray = [some numbers]

def export(Varb):
    return np.savetxt("%s.dat" % Varb.name, Varb)

export(MyArray)
that will output a file called 'MyArray.dat' filled with [some numbers]. I can't work out how to do the 'Varb.name' bit. Any suggestions?
I'm new to Python and programming so I hope there is something simple I've missed!
Thanks.
I don't recommend such a code style, but you can do it this way:
import copy

myarray = range(4)

for k, v in iter(copy.copy(locals()).items()):
    if myarray == v:
        print(k)
This gives myarray as output. To do this in a function useful for exports use:
import copy

def export_with_name(arg):
    """Export a variable's string representation with its name as filename."""
    for k, v in iter(copy.copy(globals()).items()):
        if arg == v:
            with open(k, 'w') as handle:
                handle.writelines(repr(arg))
locals() and globals() both give dictionaries holding the variable names as keys and the variable values as values.
Use the function the following way:
some_data = list(range(4))
export_with_name(some_data)
gives a file called some_data with
[0, 1, 2, 3]
as content.
Tested and compatible with Python 2.7 and 3.3
You can't. Python objects don't know what named variables happen to be referencing them at any particular time. By the time the variable has been dereferenced and sent to the function, you don't know where it came from. Consider this bit of code:
map(export, myvarbs)
Here, the varbs were in some sort of container and didn't even have a named variable referencing them.
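Because the object cannot tell you its own name, the conventional fix is simply to pass the name in explicitly. A sketch (with a plain text file standing in for np.savetxt, so the example is self-contained):

```python
def export(name, data):
    # The caller supplies the filename stem; no introspection needed.
    with open(f"{name}.dat", "w") as f:
        f.write(repr(data))

my_array = [1, 2, 3]
export("my_array", my_array)

with open("my_array.dat") as f:
    print(f.read())  # [1, 2, 3]
```

This also works for objects that never had a variable name, such as elements of a list, where any name-introspection trick necessarily fails.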
import inspect
import re

number_array = [8, 3, 90]

def varname(p):
    # Look two frames up, past export()'s own frame, to the source line
    # that called export(), and pull the argument's name out of it.
    for line in inspect.getframeinfo(inspect.currentframe().f_back.f_back)[3]:
        m = re.search(r'\bexport\s*\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)', line)
        if m:
            return m.group(1)

def export(arg):
    file_name = "%s.dat" % varname(arg)
    with open(file_name, "w") as fd:
        fd.write(str(arg))

export(number_array)
Refer to "How can you print a variable name in python?" for more details about varname().
