To explain it directly: I want to "parse" user input - a simple input string -> output string mapping.
However, there isn't any "logical" way to parse this other than checking the input string against a dictionary (a regex check). That isn't difficult at all - I'd just create a dictionary with regex search strings as keys and regexes/function references as values.
The problem is that there will probably be around 100-200 "keys", and I can easily see myself wanting to add/remove (or maybe merge) keys in the future.
So is there a way to make creating such a dictionary look "structured", keeping the "data" (the regex-function name pairs) away from the "code"?
Store the dictionary in JSON format in a file, with the function names as ordinary strings. Demo of how to load a JSON file:
Content of sample file:
{"somestring":"myfunction"}
Code:
import json

with open('very_small_dic.txt', 'r') as f:
    d = json.load(f)

print(d)  # {'somestring': 'myfunction'}
How to get the string:function mapping:
First you load the dictionary from a file as illustrated in the code above. After that, you build a new dictionary where the strings of the function names are replaced by the actual functions. Demo:
def myfunction(x):
    return 2 * x
d = {'somestring': 'myfunction'}  # in the real code this came from json.load
d = {k: globals()[v] for k, v in d.items()}
print(d) # {'somestring': <function myfunction at 0x7f36e69d8c20>}
print(d['somestring'](42)) # 84
You could also store your functions in a separate file myfunctions.py and use getattr. This is probably cleaner than using globals().
import myfunctions # for this demo, this module only contains the function myfunction
d = {'somestring': 'myfunction'}  # in the real code this came from json.load
d = {k: getattr(myfunctions, v) for k, v in d.items()}
print(d) # {'somestring': <function myfunction at 0x7f36e69d8c20>}
print(d['somestring'](42)) # 84
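Putting the pieces together for the original regex use case, here is a minimal sketch: the pattern strings, function names, and the dict_from_json variable are invented for the demo, and in real code the mapping would come from json.load as shown above. It resolves the name strings with globals() and dispatches on the first matching pattern:

```python
import re

def greet(text):
    return "Hello to you too!"

def farewell(text):
    return "Goodbye!"

# Invented demo mapping; in real code this came from json.load
dict_from_json = {r"\bhello\b": "greet", r"\bbye\b": "farewell"}

# Compile the patterns and resolve function names once, up front.
dispatch = {re.compile(k, re.IGNORECASE): globals()[v]
            for k, v in dict_from_json.items()}

def parse(text):
    # Return the result of the first handler whose pattern matches.
    for pattern, func in dispatch.items():
        if pattern.search(text):
            return func(text)
    return "Sorry, I don't understand."

print(parse("Well hello there"))  # Hello to you too!
print(parse("bye for now"))       # Goodbye!
```

Because the mapping lives in the JSON file, adding, removing, or merging keys never touches this dispatch code.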
You could also use JSON Schema (http://json-schema.org/example1.html).
I think that in order to transform values belonging to certain keys, you'd have to write a function to perform the conversions.
If you just want to sanitize the input based on the presence of certain keys, it would be ideal to convert it into a dictionary, define a schema, and then check for required/optional fields, check each field's type, or validate a field against a list of enums.
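As a rough illustration of that idea without pulling in a JSON-Schema library, a hand-rolled check might look like the sketch below. The schema layout, field names, and enum values here are all invented for the demo:

```python
# A minimal "schema": required field names mapped to expected types,
# plus an optional enum list per field. Invented for the demo.
schema = {
    "name": {"type": str, "required": True},
    "age": {"type": int, "required": True},
    "color": {"type": str, "required": False, "enum": ["red", "green", "blue"]},
}

def validate(record, schema):
    errors = []
    for field, rules in schema.items():
        if field not in record:
            if rules.get("required"):
                errors.append("missing required field: %s" % field)
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):
            errors.append("field %s has wrong type" % field)
        elif "enum" in rules and value not in rules["enum"]:
            errors.append("field %s not in %s" % (field, rules["enum"]))
    return errors

print(validate({"name": "Ann", "age": 30, "color": "red"}, schema))  # []
print(validate({"age": "30", "color": "mauve"}, schema))  # three errors
```

For anything beyond a toy check like this, a real validator such as the jsonschema package is the better choice.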
I have a Python dictionary with dataset names as keys and the entire data frames themselves as values; see the dictionary dict below.
[Dictionary of Dataframes ]
One way is to write all the code manually, like below:
csv = dict['csv.pkl']
csv_emp = dict['csv_emp.pkl']
csv_emp_yr = dict['csv_emp_yr.pkl']
emp_wf = dict['emp_wf.pkl']
emp_yr_wf = dict['emp_yr_wf.pkl']
But this will get very inefficient as the number of datasets grows.
Any help on how to get this done in a loop?
Although I would not recommend this method, you can try this:
import sys

this = sys.modules[__name__]  # this is now your current namespace
for key in dict.keys():
    setattr(this, key, dict[key])
Now you can check that new variables have been created with the same names as the keys of the dictionary.
globals() is risky because it gives you whatever the namespace is currently pointing to, but this can change, so modifying the return value of globals() is not a good idea.
A list can also be used (for limited use cases):
dataframes = []
for key in dict.keys():
    dataframes.append(dict[key])
Still, this is your choice; both of the above methods have some limitations.
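Often the cleanest option is to create no separate variables at all and just keep working through the dictionary. A sketch, with plain lists standing in for the dataframes (the keys mirror the question; the values and the row-count operation are invented for the demo):

```python
# Plain lists stand in for the dataframes here.
datasets = {
    'csv.pkl': [1, 2, 3],
    'csv_emp.pkl': [4, 5],
    'emp_wf.pkl': [6],
}

# Process every dataset in one loop instead of naming each one by hand.
row_counts = {name: len(df) for name, df in datasets.items()}
print(row_counts)  # {'csv.pkl': 3, 'csv_emp.pkl': 2, 'emp_wf.pkl': 1}
```

Adding a new dataset then means adding one dictionary entry, with no new variable names anywhere.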
I have a piece of code that allows me to print dictionary items returned using .json() method on an XHR response from a website:
teamStatDicts = responser[u'teamTableStats']
for statDict in teamStatDicts:
    print("{seasonId},{tournamentRegionId},{minsPlayed},".decode('cp1252').format(**statDict))
This prints in the following format:
9155,5,900
9155,5,820
...
...
...
9155,5,900
9155,5,820
The above method works fine, provided the keys in the dictionary never change. However, in some of the XHR submissions I am making, they do. Is there a way I can print all dictionary values in exactly the same format as above? I've tried a few things but didn't really get anywhere.
In general, given a dict, you can do:
print(','.join(str(v) for v in dct.values()))
The problem here is that you don't know the order of the values. I.e., is the first value in the CSV data the seasonId? Is it the tournamentRegionId? The minsPlayed? Or is it something else entirely that you don't know about?
So, my point is that unless you know the field names, you can't put them in a string in any reliable order if the data comes to you as vanilla dicts.
If you're decoding the XHR elsewhere using json, you could make the object_pairs_hook an OrderedDict:
from collections import OrderedDict
import json
...
data = json.loads(datastring, object_pairs_hook=OrderedDict)
Now the data is guaranteed to be in the same order as the datastring was, but that only helps if the data in datastring was ordered in a particular way (which is usually not the case).
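A self-contained sketch of that approach; the JSON string below is invented to mirror the question's field names:

```python
import json
from collections import OrderedDict

# Invented sample payload mirroring the question's field names.
datastring = '{"seasonId": 9155, "tournamentRegionId": 5, "minsPlayed": 900}'

# object_pairs_hook preserves the key order found in the JSON text.
stat = json.loads(datastring, object_pairs_hook=OrderedDict)

# Values come back in the same order the keys appeared in the JSON.
print(','.join(str(v) for v in stat.values()))  # 9155,5,900
```

(In Python 3.7+ plain dicts preserve insertion order too, so json.loads alone gives the same guarantee; OrderedDict is what you need on older versions.)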
I have a list of dictionaries that is encoded:
[u"{'name':'Tom', 'uid':'asdlfkj223'}", u"{'name':'Jerry', 'uid':'alksd32'}", ...]
Is there any way I can create a list of just the values of the key name?
Even better if someone knows Django ORM well enough to pull down a list of a data/column with properties from a PostgreSQL database.
Thanks!
To get only that value for the name column from the DB table, use:
names = Person.objects.values_list('name', flat=True)
(as per https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values_list)
otherwise, given
people = [{'name':'Tom', 'uid':'asdlfkj223'}, {'name':'Jerry', 'uid':'alksd32'},]
this should do the job:
names = [person['name'] for person in people]
And you should find out why your data items are strings (containing a string representation of a dict) to start with - it doesn't look like the way it's supposed to be.
Or, if you're actually storing dicts in your database as strings, either prefer JSON over the Python string representation, or, if you must keep the current format, the AST-parsing solution provided in another answer here should do the job.
You can use ast.literal_eval:
>>> data = [u"{'name':'Tom', 'uid':'asdlfkj223'}",u"{'name':'Jerry', 'uid':'alksd32'}"]
>>> import ast
>>> [ast.literal_eval(d)['name'] for d in data]
['Tom', 'Jerry']
What is the best way to generate a unique key for the contents of a dictionary? My intention is to store each dictionary in a document store along with a unique id or hash, so that I don't have to load the whole dictionary from the store to check whether it already exists. Dictionaries with the same keys and values should generate the same id or hash.
I have the following code:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
print str(a)
print hashlib.sha1(str(a)).hexdigest()
print hashlib.sha1(str(b)).hexdigest()
The last two print statements generate the same hash. Is this a good implementation, or are there any pitfalls with this approach? Is there a better way to do this?
Update
Combining suggestions from the answers below, the following might be a good implementation:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
def get_id_for_dict(dict):
    unique_str = ''.join(["'%s':'%s';" % (key, val) for (key, val) in sorted(dict.items())])
    return hashlib.sha1(unique_str).hexdigest()
print get_id_for_dict(a)
print get_id_for_dict(b)
I prefer serializing the dict as JSON and hashing that:
import hashlib
import json
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
# Python 2
print hashlib.sha1(json.dumps(a, sort_keys=True)).hexdigest()
print hashlib.sha1(json.dumps(b, sort_keys=True)).hexdigest()
# Python 3
print(hashlib.sha1(json.dumps(a, sort_keys=True).encode()).hexdigest())
print(hashlib.sha1(json.dumps(b, sort_keys=True).encode()).hexdigest())
Returns:
71083588011445f0e65e11c80524640668d3797d
71083588011445f0e65e11c80524640668d3797d
No - you can't rely on a particular order of elements when converting a dictionary to a string.
You can, however, convert it to a sorted list of (key, value) tuples, convert that to a string, and compute a hash like this:
a_sorted_list = [(key, a[key]) for key in sorted(a.keys())]
print hashlib.sha1(str(a_sorted_list)).hexdigest()
It's not fool-proof: the formatting of a list or tuple converted to a string can change in some future major Python version, sort order depends on locale, etc. But I think it can be good enough.
A possible option would be using a serialized representation of the list that preserves order. I am not sure whether the default list-to-string mechanism imposes any kind of order, but it wouldn't surprise me if it were interpreter-dependent. So I'd basically build something akin to urlencode that sorts the keys beforehand.
Not that I believe your method would fail, but I'd rather play with predictable things and avoid undocumented and/or unpredictable behavior. It's true that despite being "unordered", dictionaries end up having an order that may even be consistent, but the point is that you shouldn't take that for granted.
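One way to sketch that idea with only the standard library: urlencode over the sorted items gives a documented, order-independent serialization to hash. This assumes flat dictionaries with string/number values, like those in the question:

```python
import hashlib
from urllib.parse import urlencode

def dict_hash(d):
    # Sorting the items first makes the serialization independent of
    # insertion order; urlencode's output format is documented and stable.
    return hashlib.sha1(urlencode(sorted(d.items())).encode()).hexdigest()

a = {'name': 'Danish', 'age': 107}
b = {'age': 107, 'name': 'Danish'}
print(dict_hash(a) == dict_hash(b))  # True
```

Nested dictionaries would need a recursive serialization; for those, json.dumps with sort_keys=True (as in the answer above) is the simpler route.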
I'm trying to build a matrix that holds several values in several levels.
I'm trying to generate a dictionary build up like this:
{'routername':{'channel':{'01':<value>,'02':<value>}}}
The number of keys on the highest level may vary.
The script is generating a list of available routers and another list of available channels.
I wrote a rather cumbersome function that tests for a key and, if it is not already there, adds the key to the dictionary.
So I was wondering whether there isn't an easier way to create a dictionary with empty values for the keys in the list 'routers'.
def AddToChart(passed_seq):
    try:
        if str(passed_seq[0]) in chart_dict:
            if str(passed_seq[1]) in chart_dict[passed_seq[0]]:
                if str(passed_seq[2]) in chart_dict[passed_seq[0]][passed_seq[1]]:
                    if str(passed_seq[3]) in chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]]:
                        chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
                    else:
                        chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]].update({passed_seq[3]: {}})
                        chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
                else:
                    chart_dict[passed_seq[0]][passed_seq[1]].update({passed_seq[2]: {passed_seq[3]: {}}})
                    chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
            else:
                chart_dict[passed_seq[0]].update({passed_seq[1]: {passed_seq[2]: {passed_seq[3]: {}}}})
                chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
        else:
            chart_dict.update({passed_seq[0]: {passed_seq[1]: {passed_seq[2]: {passed_seq[3]: {}}}}})
            chart_dict[passed_seq[0]][passed_seq[1]][passed_seq[2]][passed_seq[3]].update(err_sub_dict)
    except ValueError:
        print "AddToChart: ", err_sub_dict, sys.exc_info()[1][0]
    except:
        print sys.exc_info()
        print "AddToChart: variable not defined: " + str(passed_seq)
I suggest using a nested collections.defaultdict for chart_dict. It lets you provide a factory function to set up new values, so any key you request will always work. It's a little tricky to get such a deeply nested structure set up, but I think the following will do the right thing for your four-level structure (I'm assuming your <value> items are also dictionaries, as your current code expects):
from collections import defaultdict

chart_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict))))
With that in place, you should be able to do the following without worrying about whether any of the keys previously existed in the dictionary:
a, b, c, d = passed_seq
chart_dict[a][b][c][d].update(err_sub_dict)
I'd suggest doing something like the variable unpacking above too, though you should probably use better names than a, b, c, and d. Good variable names can turn something incomprehensible into something easy to grasp.
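A runnable sketch of the defaultdict approach for the four-level structure; the router/channel names and the error values below are invented for the demo:

```python
from collections import defaultdict

# Four nesting levels: any missing intermediate key is created on demand,
# and the innermost level is a plain dict ready for .update().
chart = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict))))

# No membership tests needed - just index and update.
chart['router1']['channel']['01']['errors'].update({'crc': 3})
chart['router1']['channel']['02']['errors'].update({'crc': 0})

print(chart['router1']['channel']['01']['errors'])  # {'crc': 3}
```

Note that merely reading a missing key also creates it, so use `key in chart` rather than indexing when you only want to test for presence.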
You should use
dict.setdefault()
See the docs.
Note that setdefault returns the value for the key (inserting the default first if the key is missing), not the dictionary, so don't reassign the dict to its return value. Example:
>>> d = {}
>>> d.setdefault("k", "eggs")
'eggs'
>>> d["k"]
'eggs'
>>> d2 = {"k": 1}
>>> d2.setdefault("k", "spam")
1
>>> d2["k"]
1
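Applied to the router/channel structure from the question (the router names and channel list below are invented for the demo), setdefault lets you create the intermediate dicts only when they are missing:

```python
routers = ['router1', 'router2']  # invented names for the demo
channels = ['01', '02']

chart = {}
for router in routers:
    for ch in channels:
        # setdefault returns the existing inner dict, or inserts and
        # returns a fresh one - so chaining builds the levels safely.
        chart.setdefault(router, {}).setdefault('channel', {})[ch] = None

print(chart['router1'])  # {'channel': {'01': None, '02': None}}
```

This avoids the cascade of `if key in dict` checks entirely, at the cost of constructing a default value on every call even when the key already exists.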