Imagine the following problem: you have a dictionary of some content in python and want to generate python code that would create this dict. (which is like eval but in reverse)
Is there something that can do this?
Scenario:
I am working with a remote python interpreter. I can give source files to it but no input. So I am now looking for a way to encode my input data into a python source file.
Example:
d = {'a': [1,4,7]}
str_d = reverse_eval(d)
# "{'a': [1, 4, 7]}"
eval(str_d) == d
repr(thing)
will output text that when executed will (in most cases) reproduce the dictionary.
Actually, it's important for which types of data do you want this reverse function to exist. If you're talking about built-in/standard classes, usually their .__repr__() method returns the code you want to access. But if your goal is to save something in a human-readable format, but to use an eval-like function to use this data in python, there is a json library.
It's better to use json for this reason because using eval is not safe.
Json's problem is that it can't save any type of data, it can save only standard objects, but if we're talking about not built-in types of data, you never know, what is at their .__repr__(), so there's no way to use repr-eval with this kind of data
So, there is no reverse function for all types of data, you can use repr-eval for built-in, but for built-in data the json library is better at least because it's safe
Related
I'm performing data analysis on a large number of variables contained in an hdf5 file. The code I've written loops over a list of variables and then performs analyses and outputs some graphs. It would be nice to be able to use the code for combinations of variables (like A+B or sqrt((A**2)+(B**2)) without having to put in a bunch of if statements, i.e. execute the statement in the string when loading the variables from my hdf5 file. If possible, I would like to avoid using pandas, but I'm not completely against it if that's the only efficient way to do what I want.
My hdf5 file looks something like this :
HDF5 "blahblah.hdf5" {
FILE_CONTENTS {
group /
group /all
dataset /all/blargle
dataset /all/blar
}
}
And what I would like to do is this (this functionality doesn't exist in h5py, so it bugs) :
myfile = h5py.File('/myfile/blahblah.hdf5')
varlist = ['blargle', 'blar', 'blargle+blar']
savelist = [None]*len(varlist)
for ido, varname in enumerate(varlist):
savelist[ido] = myfile['all'][varname]
#would like to evaluate varname upon loading
First you have to ask yourself: Do I know the arithmetic operations only at runtime or already at programming time?
If you know it already now, just write a function in Python for it.
If you know it only at runtime, you will need a parser. While there are libraries specialized on this out there (example), Python itself is already a parser. With exec you can execute strings containing Python code.
Now all you need to define is some sort of grammar for specific language. You need some conventions. You have them already, it seems you want to convert myfile['all']['blargle+blar'] to myfile['all']['blargle']+myfile['all']['blar']. In order to make life easier I recommend.
Put names of data sets in brackets.
varlist = ['[blargle]', '[blar]', '[blargle]+[blar]', 'sqrt(([blargle]**2)+([blar]**2)']
Then simply replace all terms in brackets by myfile['all'][name_in_brackets] and then execute the string with exec.
import re
for ido, varname in enumerate(varlist):
term = re.sub(r'\[(.*?)\]', lambda x: "myfile['all']['{}']".format(x), varname, flag='g')
savelist[ido] = exec(term)
The line using regular expression to match the variable names re.sub is actually not tested by me.
And still another drawback. I'm not sure reading data sets from an hdf5 object is fast since the same data set may be read multiple times and if hdf5 is not caching it might be better to store the data sets intermediately before doing computation on them.
Say, I have a python list:
arr = [
[1,2,3,4],
[11,22,33,44]
]
I want to dump this object to a file with the code so I can eval() it back soon, the content of the file should be :
[
[1,2,3,4],
[11,22,33,44]
]
I don't want to use pickle since it is way too slow.
print repr(arr).
Of course, pickle isn't especially slow. And, as zhangyangyu notes, while this works for a list, it won't work for objects whose repr cannot be eval'd.
I think you can use repr, and then write the result to a file.
repr(...)
repr(object) -> string
Return the canonical string representation of the object.
For most object types, eval(repr(object)) == object.
But this is not safe, if the file is changed, something terrible may happen.
And what's more, it seems the list in your file is of format. And when this happen how do you convert it back. When you read the contents back, you have to add logic to see if the string represents the list comes to end. If they are in one line, it may be easier.
So, using some existing module is not a bad idea and is the usual way.
Use cPickle. It's orders of magnitude faster than pickle.
Use cPickle, and you can do import cPickle as pickle so that existing code doesn't have to change. I'm using cPickle frequently myself and 20+MB files load decently fast (layered dictionaries/lists with thousands of entries).
Currently expensively parsing a file, which generates a dictionary of ~400 key, value pairs, which is seldomly updated. Previously had a function which parsed the file, wrote it to a text file in dictionary syntax (ie. dict = {'Adam': 'Room 430', 'Bob': 'Room 404'}) etc, and copied and pasted it into another function whose sole purpose was to return that parsed dictionary.
Hence, in every file where I would use that dictionary, I would import that function, and assign it to a variable, which is now that dictionary. Wondering if there's a more elegant way to do this, which does not involve explicitly copying and pasting code around? Using a database kind of seems unnecessary, and the text file gave me the benefit of seeing whether the parsing was done correctly before adding it to the function. But I'm open to suggestions.
Why not dump it to a JSON file, and then load it from there where you need it?
import json
with open('my_dict.json', 'w') as f:
json.dump(my_dict, f)
# elsewhere...
with open('my_dict.json') as f:
my_dict = json.load(f)
Loading from JSON is fairly efficient.
Another option would be to use pickle, but unlike JSON, the files it generates aren't human-readable so you lose out on the visual verification you liked from your old method.
Why mess with all these serialization methods? It's already written to a file as a Python dict (although with the unfortunate name 'dict'). Change your program to write out the data with a better variable name - maybe 'data', or 'catalog', and save the file as a Python file, say data.py. Then you can just import the data directly at runtime without any clumsy copy/pasting or JSON/shelve/etc. parsing:
from data import catalog
JSON is probably the right way to go in many cases; but there might be an alternative. It looks like your keys and your values are always strings, is that right? You might consider using dbm/anydbm. These are "databases" but they act almost exactly like dictionaries. They're great for cheap data persistence.
>>> import anydbm
>>> dict_of_strings = anydbm.open('data', 'c')
>>> dict_of_strings['foo'] = 'bar'
>>> dict_of_strings.close()
>>> dict_of_strings = anydbm.open('data')
>>> dict_of_strings['foo']
'bar'
If the keys are all strings, you can use the shelve module
A shelf is a persistent, dictionary-like object. The difference with
“dbm” databases is that the values (not the keys!) in a shelf can be
essentially arbitrary Python objects — anything that the pickle module
can handle. This includes most class instances, recursive data types,
and objects containing lots of shared sub-objects. The keys are
ordinary strings.
json would be a good choice if you need to use the data from other languages
If storage efficiency matters, use Pickle or CPickle(for execution performance gain). As Amber pointed out, you can also dump/load via Json. It will be human-readable, but takes more disk.
I suggest you consider using the shelve module since your data-structure is a mapping.
That was my answer to a similar question titled If I want to build a custom database, how could I? There's also a bit of sample code in another answer of mine promoting its use for the question How to get a object database?
ActiveState has a highly rated PersistentDict recipe which supports csv, json, and pickle output file formats. It's pretty fast since all three of those formats are implement in C (although the recipe itself is pure Python), so the fact that it reads the whole file into memory when it's opened might be acceptable.
JSON (or YAML, or whatever) serialisation is probably better, but if you're already writing the dictionary to a text file in python syntax, complete with a variable name binding, you could just write that to a .py file instead. Then that python file would be importable and usable as is. There's no need for the "function which returns a dictionary" approach, since you can directly use it as a global in that file. e.g.
# generated.py
please_dont_use_dict_as_a_variable_name = {'Adam': 'Room 430', 'Bob': 'Room 404'}
rather than:
# manually_copied.py
def get_dict():
return {'Adam': 'Room 430', 'Bob': 'Room 404'}
The only difference is that manually_copied.get_dict gives you a fresh copy of the dictionary every time, whereas generated.please_dont_use_dict_as_a_variable_name[1] is a single shared object. This may matter if you're modifying the dictionary in your program after retrieving it, but you can always use copy.copy or copy.deepcopy to create a new copy if you need to modify one independently of the others.
[1] dict, list, str, int, map, etc are generally viewed as bad variable names. The reason is that these are already defined as built-ins, and are used very commonly. So if you give something a name like that, at the least it's going to cause cognitive-dissonance for people reading your code (including you after you've been away for a while) as they have to keep in mind that "dict doesn't mean what it normally does here". It's also quite likely that at some point you'll get an infuriating-to-solve bug reporting that dict objects aren't callable (or something), because some piece of code is trying to use the type dict, but is getting the dictionary object you bound to the name dict instead.
on the JSON direction there is also something called simpleJSON. My first time using json in python the json library didnt work for me/ i couldnt figure it out. simpleJSON was...easier to use
I'm doing some work evaluating log data that has been saved as JSON objects to a file.
To facilitate my work I have created 2 small python scripts that filter out logged entries according to regular expressions and print out multiple fields from an event.
Now I'd like to be able to evaluate simple mathematical operations when printing fields. This way I could just say something like
./print.py type download/upload
and it would print the type and the upload to download ratio. My problem is that I can't use eval() because the values are actually inside a dict.
Is there a simple solution?
eval optionally takes a globals and locals dictionaries. You can therefore do this:
namespace = dict(foo=5, bar=6)
print eval('foo*bar', namespace)
Keep in mind that eval is "evil" because it's not safe if the executed string cannot be trusted. It should be fine for your helper script though.
For completness, there's also ast.literal_eval() which is safer but it evaluates literals only which means there's no way to give it a dict.
You could pass the dict to eval() as the locals. That will allow it to resolve download and upload as names if they are keys in the dict.
>>> d = {'a': 1, 'b': 2}
>>> eval('a+b', globals(), d)
3
Construct is a DSL implemented in Python used to describe data structures (binary and textual). Once you have the data structure described, construct can parse and build it for you. Which is good ("DRY", "Declarative", "Denotational-Semantics"...)
Usage example:
# code from construct.formats.graphics.png
itxt_info = Struct("itxt_info",
CString("keyword"),
UBInt8("compression_flag"),
compression_method,
CString("language_tag"),
CString("translated_keyword"),
OnDemand(
Field("text",
lambda ctx: ctx._.length - (len(ctx.keyword) +
len(ctx.language_tag) + len(ctx.translated_keyword) + 5),
),
),
)
I am in need for such a tool for Haskell and
I wonder if something like this exists.
I know of:
Data.Binary: User implements parsing and building seperately
Parsec: Only for parsing? Only for text?
I guess one must use Template Haskell to achieve this?
I'd say it depends what you want to do, and if you need to comply with any existing format.
Data.Binary will (surprise!) help you with binary data, both reading and writing.
You can either write the code to read/write yourself, or let go of the details and generate the required code for your data structures using some additional tools like DrIFT or Derive. DrIFT works as a preprocessor, while Derive can work as a preprocessor and with TemplateHaskell.
Parsec will only help you with parsing text. No binary data (as easily), and no writing. Work is done with regular Strings. There are ByteString equivalents on hackage.
For your example above I'd use Data.Binary and write custom put/geters myself.
Have a look at the parser category at hackage for more options.
Currently (afaik) there is no equivalent to Construct in Haskell.
One can be implemented using Template Haskell.
I don't know anything about Python or Construct, so this is probably not what you are searching for, but for simple data structures you can always just derive read:
data Test a = I Int | S a deriving (Read,Show)
Now, for the expression
read "S 123" :: Test Double
GHCi will emit: S 123.0
For anything more complex, you can make an instance of Read using Parsec.