How do I put a dictionary in the datastore? (Python)

Is there a good way to store a Python dictionary in the datastore? I want to do something like the following:
from google.appengine.ext import db

class Recipe(db.Model):
    name = db.StringProperty()
    style = db.StringProperty()
    yeast = db.StringProperty()
    hops = db.ListofDictionariesProperty()
Of course, that last line doesn't actually work. I need hops to be a list of key-value pairs, where the key is always a string and the value can be a string, an int, or a float, but I can't see anything in the Property classes that would let me do that.

Serializing a dict with repr is a good way to do it. You can then reconstitute it with eval, or if you don't trust the data, a "safe eval".
An advantage of repr over pickling is that the data is readable in the database, even queryable in desperate cases.
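For example, a minimal sketch of the round trip, using ast.literal_eval as the "safe eval":

import ast

hops = [{'variety': 'Cascade', 'ounces': 1.5}]
stored = repr(hops)                  # "[{'variety': 'Cascade', 'ounces': 1.5}]"
restored = ast.literal_eval(stored)  # parses Python literals only, never runs code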

You can use json.
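For instance, a minimal sketch with the db API, using a hypothetical hops_json property; json.dumps turns the list of dicts into a string that fits in a TextProperty, and json.loads turns it back:

import json
from google.appengine.ext import db

class Recipe(db.Model):
    name = db.StringProperty()
    hops_json = db.TextProperty()  # hypothetical property holding the JSON text

recipe = Recipe(name='IPA')
recipe.hops_json = json.dumps([{'variety': 'Cascade', 'ounces': 1.5}])
recipe.put()

hops = json.loads(recipe.hops_json)  # back to a list of dicts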

You could pickle the dictionary and store it as a StringProperty.

I'm pretty sure there's no way to store a Python dictionary. But why not just place what you'd like in hops as a second model?
Also, as mentioned by John you could use pickle, but (and correct me if I'm wrong) store it as a Blob value instead.
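A minimal sketch of that pickle-to-Blob approach, with a hypothetical hops_blob property; pickle output is binary, so it goes in a BlobProperty rather than a StringProperty:

import pickle
from google.appengine.ext import db

class Recipe(db.Model):
    hops_blob = db.BlobProperty()  # hypothetical property holding the pickle bytes

recipe = Recipe()
recipe.hops_blob = db.Blob(pickle.dumps([{'variety': 'Saaz', 'ounces': 2.0}]))
recipe.put()

hops = pickle.loads(recipe.hops_blob)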

Your options are basically to use pickle, to use a db.Expando and make each key in the dict a separate property, or to have a StringListProperty of keys and one of values and zip() them back to a dict when reading.
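The last option might look like this sketch (property names are illustrative):

from google.appengine.ext import db

class Recipe(db.Model):
    hop_keys = db.StringListProperty()
    hop_values = db.StringListProperty()

recipe = Recipe(hop_keys=['variety', 'ounces'],
                hop_values=['Cascade', '1.5'])
recipe.put()

hops = dict(zip(recipe.hop_keys, recipe.hop_values))  # back to a dict
# Note: every value comes back as a string; converting ints and floats is up to you.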

You can use JsonProperty.
Value is a Python object (such as a list or a dict or a string) that is serializable using Python's json module; Cloud Datastore stores the JSON serialization as a blob. Unindexed by default.
Optional keyword argument: compressed.
from google.appengine.ext import ndb

class Article(ndb.Model):
    title = ndb.StringProperty(required=True)
    stars = ndb.IntegerProperty()
    tags = ndb.StringProperty(repeated=True)
    info = ndb.JsonProperty()
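Storing and reading a dict through it is then direct; a short usage sketch with the model above:

article = Article(title='Brewing', info={'hops': [{'variety': 'Fuggle', 'ounces': 1.0}]})
key = article.put()

article = key.get()
print(article.info['hops'][0]['variety'])  # prints 'Fuggle'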

I did it like this:
class MyEntity(db.Model):
    dictionary_string = db.StringProperty()

payload = {...}  # your dictionary here

# Store dict
my_entity = MyEntity(key_name=your_key_here)
my_entity.dictionary_string = str(payload)
my_entity.put()

# Get dict
import ast
my_entity_k = db.Key.from_path('MyEntity', your_key_here)
my_entity = db.get(my_entity_k)
payload = ast.literal_eval(my_entity.dictionary_string)


OrderedDict of OrderedDicts and issues storing data in YAML

So basically I have an app I'm making that has user data which I want to back up and load in the database. I'm storing the data in YAML files. Now, a user has posts; each post has a timestamp, text, and tags. I want to use an OrderedDict in order to retain order when I write the data to the YAML files. Currently, I'm doing something like this:
def get_posts(user):
    posts_arr = []
    for post in user.posts.all():
        temparr = OrderedDict()
        temparr['timestamp'] = post.created_at.strftime("%Y-%m-%d %H:%M %p")
        temparr['text'] = post.text
        temparr['tags'] = ','.join(post.tags.all().values_list('field', flat=True))
        posts_arr.append(temparr)
    return posts_arr
As you can see, I'm using an array of OrderedDicts, and I think that is the reason my posts for each user are not ordered. How can I resolve this issue?
I am returning this posts_arr object to be stored within another OrderedDict.
Also, since each post's text is nested and is a large block of text, I want to make sure the text is stored in YAML's literal block style.
Basically, your issue is a misunderstanding of how ordered dictionaries work in Python. The Python documentation states that an OrderedDict is a:
dict subclass that remembers the order entries were added
https://docs.python.org/3/library/collections.html#module-collections
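That is, an OrderedDict remembers insertion order; it does not sort its keys:

from collections import OrderedDict

d = OrderedDict()
d['b'] = 1
d['a'] = 2
print(list(d.keys()))  # ['b', 'a'] -- insertion order, not key order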
Personally, I'd recommend a list of dictionaries created from a pre-sorted list of posts. In this case, it would look something like this if we were to keep the majority of your code as-is:
def get_posts(user):
    posts_arr = []
    # Sort the posts based on their created_at date
    sorted_posts = sorted(user.posts.all(), key=lambda post: post.created_at)
    for post in sorted_posts:
        temparr = dict()
        temparr['timestamp'] = post.created_at.strftime("%Y-%m-%d %H:%M %p")
        temparr['text'] = post.text
        temparr['tags'] = ','.join(post.tags.all().values_list('field', flat=True))
        posts_arr.append(temparr)
    return posts_arr
You could use list comprehensions to build this list from the sorted one like chepner suggested, but I don't want to change too much.
Use an ordinary dict (or OrderedDict if you really need to) for each post, and use a list for the collection of all posts. Once you do that, it's a short jump to using a list comprehension to define the return value directly.
def get_posts(user):
    return [{
        'timestamp': post.created_at.strftime("%Y-%m-%d %H:%M %p"),
        'text': post.text,
        'tags': ','.join(post.tags.all().values_list('field', flat=True))
    } for post in user.posts.all()]

How to retrieve key-value pairs from a URL using Python

I am working on a Python 2.7 script that, given a URL with some number of key-value pairs (not a fixed number of them), retrieves those values into a JSON structure.
This is what I have done so far:
from furl import furl

url = "https://search/address?size=10&city=Madrid&offer_type=1&query=Gran%20v"
f = furl(url)

fields = ['size', 'city', 'offer_type', 'query']

l = []
l.append(f.args['size'])
l.append(f.args['city'])
l.append(f.args['offer_type'])
l.append(f.args['query'])

body = {
    fields[0]: f.args[fields[0]],
    fields[1]: f.args[fields[1]],
    fields[2]: f.args[fields[2]],
    fields[3]: f.args[fields[3]],
}
This code works, but only if I know there will be exactly four key-value pairs and what their names are. I don't know how to handle the case where the URL has fewer or more pairs.
Using length = len(f.args) I can obtain the number of pairs, but I have no idea how to extract the key names from the f.args object.
Thank you very much,
Álvaro
I'm slightly confused... f.args is already a dictionary-like object of the type you want. If you want to explicitly convert it to a dictionary, you can use:
body = dict(f.args)
But even this seems unnecessary. If you want a new copy of the object so that you can change it without affecting the original instance, you can call the .copy() method.
Is this what you're looking for?
from furl import furl

url = "https://search/address?size=10&city=Madrid&offer_type=1&query=Gran%20v"
f = furl(url)
print zip(f.args.keys(), f.args.values())
Output:
[('size', '10'), ('city', 'Madrid'), ('offer_type', '1'), ('query', 'Gran v')]
The furl library is not especially well documented, but digging through the source code shows that f.args is a property that redirects eventually to an orderedmultidict.omdict object. This supports all the standard dictionary methods, in addition to lots more interesting stuff (also not well documented).
You can therefore just use f.args wherever you need body. If you need a copy for some reason, do f.args.copy(), or possibly dict(f.args).
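For example, since f.args supports the standard dict interface, a dict comprehension handles any number of pairs without naming them up front:

from furl import furl

url = "https://search/address?size=10&city=Madrid&offer_type=1&query=Gran%20v"
f = furl(url)

body = {key: value for key, value in f.args.items()}
print(body)
# {'size': '10', 'city': 'Madrid', 'offer_type': '1', 'query': 'Gran v'} (key order may vary)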

Is naively parsing JSON into a Python class or struct secure?

Some background first: I have a few rather simple data structures which are persisted as JSON files on disk. These JSON files are shared between applications in different languages and different environments (like web frontends and data manipulation tools).
For each of the files I want to create a Python "POPO" (Plain Old Python Object), and a corresponding data mapper class for each item should implement some simple CRUD-like behavior (e.g. save will serialize the class and store it as a JSON file on disk).
I think a simple mapper (which only knows about basic types) will work. However, I'm concerned about security. Some of the JSON files will be generated by a web frontend, so there is a possible security risk if a user feeds me some bad JSON.
Finally, here is the simple mapping code (found at How to convert JSON data into a Python object):
class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

import json
j = json.loads(your_json)
u = User(**j)
What possible security issues do you see?
NB: I'm new to Python.
Edit: Thanks all for your comments. I've found that I have one JSON file with two arrays, each holding a map. Unfortunately, this starts to look like it will get cumbersome as I accumulate more of these.
I'm extending the question to mapping a JSON input to a recordtype. The original code is from here: https://stackoverflow.com/a/15882054/1708349.
Since I need mutable objects, I'd change it to use a namedlist instead of a namedtuple:
import json
from namedlist import namedlist

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedlist('X', d.keys())(*d.values()))

print x.name, x.hometown.name, x.hometown.id
Is it still safe?
There's not much wrong that can happen in the first case. You're limiting what arguments can be provided and it's easy to add validation/conversion right after loading from JSON.
The second example is a bit worse. Packing things into records like this will not help you in any way. You don't inherit any methods, because each type you define is new. You can't compare values easily, because dicts are not ordered. You don't know if you have all arguments handled, or if there is some extra data, which can lead to hidden problems later.
So in summary: with User(**data) you're pretty safe. With namedlist there's room for ambiguity, and you don't really gain anything compared to bare, parsed JSON.
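For instance, using the User class from the question, an unexpected key fails fast instead of silently attaching attributes:

import json

j = json.loads('{"name": "Ann", "username": "ann", "is_admin": true}')
u = User(**j)  # TypeError: __init__() got an unexpected keyword argument 'is_admin'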
If you blindly accept a user's JSON input without a sanity check, you are at risk of becoming a JSON injection victim.
See a detailed explanation of the JSON injection attack here: https://www.acunetix.com/blog/web-security-zone/what-are-json-injections/
Besides the security vulnerability, parsing JSON into a Python object this way is not type safe.
With your example User class, I would assume you expect both fields, name and username, to be strings. What if the JSON input is like this:
{
    "name": "my name",
    "username": 1
}

j = json.loads(your_json)
u = User(**j)
type(u.username)  # int
You get back an object with an unexpected type.
One solution to ensure type safety is to use JSON Schema to validate the input JSON. More about JSON Schema: https://json-schema.org/
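A minimal sketch with the third-party jsonschema package (assuming it is installed) that rejects the input above:

import json
import jsonschema

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "username": {"type": "string"},
    },
    "required": ["name", "username"],
}

data = json.loads('{"name": "my name", "username": 1}')
jsonschema.validate(data, schema)  # raises ValidationError: 1 is not of type 'string'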

StringListProperty limited to 500 char strings (Google App Engine / Python)

It seems that StringListProperty can only contain strings up to 500 chars each, just like StringProperty...
Is there a way to store longer strings than that? I don't need them to be indexed or anything. What I would need would be something like a "TextListProperty", where each string in the list can be any length and not limited to 500 chars.
Can I create a property like that? Or can you experts suggest a different approach? Perhaps I should use a plain list and pickle/unpickle it in a Blob field, or something like that? I'm a bit new to Python and GAE and I would greatly appreciate some pointers instead of spending days on trial and error...thanks!
Alex already answered long ago, but in case someone else comes along with the same issue:
You'd just make item_type equal to db.Text (as OP mentions in a comment).
Here's a simple example:
from google.appengine.ext import db

class LargeTextList(db.Model):
    large_text_list = db.ListProperty(item_type=db.Text)

# The handlers below are assumed to live in a request handler class
# with a render() helper, as in the original answer's app.
def post(self):
    # Get the value from a POST request and split it into a list
    # using some delimiter, then add it to the datastore.
    L = self.request.get('large_text_list').split()  # your delimiter here
    LTL = [db.Text(i) for i in L]
    new = LargeTextList()
    new.large_text_list = LTL
    new.put()

def get(self):
    # Return one entity to make sure it's working.
    query = LargeTextList.all()
    results = query.fetch(limit=1)
    self.render('index.html', {
        'results': results,
        'title': 'LargeTextList Example',
    })
You can use a generic ListProperty with an item_type as you require (str, or unicode, or whatever).

List of non-datastore types in AppEngine?

I'm building an AppEngine model class. I need a simple list of tuples:
class MyTuple(object):
    field1 = "string"
    field2 = 3

class MyModel(db.Model):
    the_list = db.ListProperty(MyTuple)
This does not work, since AppEngine does not accept MyTuple as a valid field.
Solutions I can think of:
Make MyTuple extend db.Model. But doesn't that mean every entry in the list will be stored in a dedicated MyTuple table?
Make it a list of strings, which are a "serialized" form of MyTuple; add parsing (unserializing) code. Yuck.
Maintain two lists (one of strings, one of ints). Another yuck.
Any other solution that I'm missing?
In app-engine-patch there's a FakeModelListProperty and FakeModel (import both from ragendja.dbutils). Derive MyTuple from FakeModel and set fields = ('field1', 'field2'). Those fields will automatically get converted to JSON when stored in the list, so you could manually edit them in a textarea. Of course, this only works for primitive types (strings, integers, etc.). Take a look at the source if this doesn't suffice.
http://code.google.com/p/app-engine-patch/
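Based purely on the description above, a sketch might look like this; the exact FakeModelListProperty constructor form is an assumption, so check the app-engine-patch source:

from google.appengine.ext import db
from ragendja.dbutils import FakeModel, FakeModelListProperty

class MyTuple(FakeModel):
    fields = ('field1', 'field2')  # these fields get converted to JSON when stored

class MyModel(db.Model):
    the_list = FakeModelListProperty(MyTuple)  # assumed constructor signature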
