I have a problem that I would like to know how to efficiently tackle.
I have data that is JSON-formatted (used with dumps / loads) and contains unicode.
This is part of a protocol implemented with JSON to send messages, so messages will be sent as strings and then loaded into Python dictionaries. This means that the representation as a Python dictionary will afterwards look something like:
{u"mykey": u"myVal"}
It is no problem in itself for the system to handle such structures, but the problem arises when I'm going to make a database query to store this structure.
I'm using pyorient against OrientDB. The command ends up something like:
"CREATE VERTEX TestVertex SET data = {u'mykey': u'myVal'}"
Which will end up in the data field getting the following values in OrientDB:
{'_NOT_PARSED_': '_NOT_PARSED_'}
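For reference, the u'' representation comes from interpolating the parsed dict directly into the command string, roughly like this (a sketch with the same placeholder names as above):

# Python 2.7: str()/%s formatting of a dict yields the u'' repr form
data = {u"mykey": u"myVal"}
command = "CREATE VERTEX TestVertex SET data = %s" % (data,)
print command
# CREATE VERTEX TestVertex SET data = {u'mykey': u'myVal'}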
I'm assuming this problem relates to other cases as well when you wish to make a query or somehow represent a data object containing unicode.
How could I efficiently get a representation of this data, of arbitrary depth, to be able to use it in a query?
To clarify even more, this is the string the db expects:
"CREATE VERTEX TestVertex SET data = {'mykey': 'myVal'}"
If I'm simply stating the wrong problem/question and should handle it some other way, I'm very much open to suggestions. But what I want to achieve is an efficient way, in Python 2.7, to build a DB query against OrientDB (using pyorient) that specifies an arbitrary data structure. The data property being set is of the OrientDB type EMBEDDEDMAP.
Any help greatly appreciated.
EDIT1:
More explicitly stating that the first code block shows the object as a dict AFTER being dumped / loaded with json to avoid confusion.
Dargolith:
OK, based on your last response it seems you are simply looking for code that will dump a Python expression in a way that lets you control how unicode and other data types print. Here is a very simple function that provides this control. There are ways to make this function more efficient (for example, by using a string buffer rather than doing all of the recursive string concatenation happening here). Still, it is a very simple function, and as it stands its execution is probably still dominated by your DB lookup.
As you can see in each of the 'if' statements, you have full control of how each data type prints.
def expr_to_str(thing):
    if hasattr(thing, 'keys'):
        pairs = ['%s:%s' % (expr_to_str(k), expr_to_str(v)) for k, v in thing.iteritems()]
        return '{%s}' % ', '.join(pairs)
    if hasattr(thing, '__setslice__'):
        parts = [expr_to_str(ele) for ele in thing]
        return '[%s]' % (', '.join(parts),)
    if isinstance(thing, basestring):
        return "'%s'" % (str(thing),)
    return str(thing)

print "dumped: %s" % expr_to_str({'one': 33, 'two': [u'unicode', 'just a str', 44.44, {'hash': 'here'}]})
outputs:
dumped: {'two':['unicode', 'just a str', 44.44, {'hash':'here'}], 'one':33}
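You could then interpolate the result straight into your query string, e.g.:

data = {u'mykey': u'myVal'}
query = "CREATE VERTEX TestVertex SET data = %s" % expr_to_str(data)
# CREATE VERTEX TestVertex SET data = {'mykey':'myVal'}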
I went on to use json.dumps() as sobolevn suggested in the comments. I didn't think of that one at first since I wasn't really using json in the driver. It turned out, however, that json.dumps() provided exactly the format I needed for all the data types I use. Some examples:
>>> json.dumps('test')
'"test"'
>>> json.dumps(['test1', 'test2'])
'["test1", "test2"]'
>>> json.dumps([u'test1', u'test2'])
'["test1", "test2"]'
>>> json.dumps({u'key1': u'val1', u'key2': [u'val21', 'val22', 1]})
'{"key2": ["val21", "val22", 1], "key1": "val1"}'
If you need more control over the format, quoting or other aspects of this conversion, see the reply by Dan Oblinger.
Related
I'm trying to retrieve data from a database named RethinkDB. It outputs JSON when called with r.db("Databasename").table("tablename").insert([{ "id or primary key": line}]).run(); when doing so it outputs [{'id': 'ValueInRowOfid\n'}], and I want to parse that down to just the value, e.g. "ValueInRowOfid". I've tried with the json module in Python, but I always end up with TypeError: list indices must be integers or slices, not str, and I've been told that this is because the database outputs invalid JSON. My question is how a JSON format can be invalid (I can't see what is invalid about the output), and also what would be the best way to parse it so that the value "ValueInRowOfid" is left in a variable, e.g. value = "ValueInRowOfid".
This part imports the modules used and connects to RethinkDB:
import json
from rethinkdb import RethinkDB
r = RethinkDB()
r.connect( "localhost", 28015).repl()
This part gets the output/value and shows my attempt at parsing it:
getvalue = r.db("Databasename").table("tablename").sample(1).run()  # gets a single row/value from the table
print(getvalue)  # if I print that, it shows as [{'id': 'ValueInRowOfid\n'}]
dumper = json.dumps(getvalue)  # I can't use json.loads() on getvalue directly, as a JSON object must be str and the database output is a list
parsevalue = json.loads(dumper)  # after json.dumps(getvalue) I can now load it, but I can't use the loaded JSON
print(parsevalue["id"])  # this now says that list indices must be integers or slices, not str; quite frustrating, as it first wants a str and now it can't use one
print(parsevalue{'id'})  # I also tried shuffling it around as seen here, but still the same result
I know this is janky, and it may be very hard to comprehend the level of stupidity I might be at, as I don't know whether this is the simplest of problems or something that just isn't possible (which it should be, or else I can't use the data in my database).
Thank you for reading this through and not jumping straight into the comments to say that I have to read the JSON documentation, because I have, and I haven't found a single piece that could help me.
I tried reading the documentation and watching tutorials about JSON and JSON parsing. I also looked for others who have had the same problem as me, and couldn't find anything.
It looks like it's returning a dictionary ({}) inside a list ([]) of one element.
Try:
getvalue = r.db("Databasename").table("tablename").sample(1).run()
print(getvalue[0]['id'])
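Since the stored value in your example ends with a newline ('ValueInRowOfid\n'), you may also want to strip it:

getvalue = r.db("Databasename").table("tablename").sample(1).run()
value = getvalue[0]['id'].strip()  # 'ValueInRowOfid'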
Studying Python, I am following an excellent Corey Schafer tutorial on Flask, where he does this (I have extracted and summarized it for obvious reasons):
from folder_app import app  # I did it this way to follow the structure and keep the code equal to the original
s = Serializer(app.config['SECRET_KEY'], 1800) # key, seconds
token = s.dumps({'user_id': 1}).decode('utf-8')
s = Serializer(app.config['SECRET_KEY'])
user_id = s.loads(token)['user_id'] # This is where I have the doubt
print(user_id)
print(type(s.loads(token)))
The code works. The problem I have is that although, as you can see, s.loads(token) is a dict, I expected to see something like s.loads({token['user_id']}), or s.loads(token['user_id']), or something like that. That is, it is a dict but it does not look like one. My doubt is whether this comes from some broader concept of what people call "pythonic" (which I have not come across so far), or whether it is something that only happens in this particular case. Incidentally, at https://itsdangerous.palletsprojects.com/en/1.1.x/jws/ this appears: loads(self, s, salt=None, return_header=False), with the arguments in parentheses. I hope it is clear what my doubt is :)
I know this is not an answer per se, but just to add to my comment: this is an example of how the loads function works on dictionaries with the json module (https://docs.python.org/3/library/json.html#json.loads). What it does is take a JSON string and return the dictionary-type object in Python. Your Serializer is doing something similar: it takes the token string and represents it as a dict-like object.
The s.dumps, I am assuming, is similar to json.dumps, which gives you the JSON string representation of a Python dictionary.
import json
my_dict = json.loads('{"user_id": "Mane", "name": "Joe"}')
my_dict['user_id']
So you could just do json.loads('{"user_id": "Mane", "name": "Joe"}')['user_id'] which is just chaining the operations.
I get back an XML object from an MSSQL server when I call an SP from Python (2.7). I get it in the following form:
{u'XML_F52E2B61-18A1-11d1-B105-00805F49916B': 'D\x02i\x00d\x00D\x05d\x00e\x00s\x00c\x00r\x00D\x0bd\x00a\x00t\x00a\x00t\x00y\x00p\x00e\x00_\x00i\x00d\x00D\x13e\x00n\x00u\x00m\x00e\x00r\x00a\x00t\x00i\x00o\x00n\x00_\x00t\x00y\x00p\x00e\x00_\x00i\x00d\x00D\rs\x00y\x00s\x00t\x00e\x00m\x00f\x00e\x00a\x00t\x00u\x00r\x00e\x00D\x04l\x00i\x00n\x00k\x00D\x07F\x00e\x00a\x00t\x00u\x00r\x00e\x00\x01\x00\x08F\x00e\x00a\x00t\x00u\x00r\x00e\x00S\x00A\x01\x07A\x01\x01A\x03B\x01\x00\x00\x00\x81\x01\x01\x02A\x03\x11\x1a\x00r\x00e\x00s\x00p\x00o\x00n\x00d\x00e\x00n\x00t\x00_\x00i\x00d\x00\x81\x02\x01\x03A\x03B\x01\x00\x00\x00\x81\x03\x01\x05A\x03F\x01\x81\x05\x01\x06A\x03F\x00\x81\x06\x81\x07\x01\x07A\x01\x01A\x03B\x02\x00\x00\x00\x81\x01\x01\x02A\x03\x11 \x00W\x00o\x00r\x00k\x00s\x00 \x00a\x00t\x00 \x00c\x00o\x00m\x00p\x00a\x00n\x00y\x00\x81\x02\x01\x03A\x03B\x01\x00\x00\x00\x81\x03\x01\x05A\x03F\x01\x81\x05\x01\x06A\x03F\x01\x81\x06\x81\x07\x01\x07A\x01\x01A\x03B\x03\x00\x00\x00\x81\x01\x01\x02A\x03\x11\x0c\x00G\x00e\x00n\x00d\x00e\x00r\x00\x81\x02\x01\x03A\x03B\x08\x00\x00\x00\x81\x03\x01\x04A\x03B\x01\x00\x00\x00\x81\x04\x01\x05A\x03F\x00\x81\x05\x01\x06A\x03F\x00\x81\x06\x81\x07\x81\x00\x08F\x00e\x00a\x00t\x00u\x00r\x00e\x00S\x00'}
I have two questions:
1: What encoding is this?
2: What library should I use to decode this?
Addition:
The XML as it shows in the SQL Management Studio:
The SP:
ALTER PROCEDURE [dbo].[rdb_sql2python]
AS
BEGIN
SET NOCOUNT ON
SELECT * FROM [_rdb].[dbo].[features] FOR XML RAW ('Feature'), ROOT ('FeatureS'), ELEMENTS
SET NOCOUNT OFF
END
Let me attempt something like an answer, at least to the question of what this is:
In a JSON viewer, your string as you presented it did not work. But when I removed the u, replaced the single quotes with double quotes and removed the leading D, it somehow worked:
This string
{"XML_F52E2B61-18A1-11d1-B105-00805F49916B":
"\x02i\x00d\x00D\x05d\x00e\x00s\x00c\x00r\x00D\x0bd\x00a\x00t\x00a\x00t\x00y\x00p\x00e\x00_\x00i\x00d\x00D\x13e\x00n\x00u\x00m\x00e\x00r\x00a\x00t\x00i\x00o\x00n\x00_\x00t\x00y\x00p\x00e\x00_\x00i\x00d\x00D\rs\x00y\x00s\x00t\x00e\x00m\x00f\x00e\x00a\x00t\x00u\x00r\x00e\x00D\x04l\x00i\x00n\x00k\x00D\x07F\x00e\x00a\x00t\x00u\x00r\x00e\x00\x01\x00\x08F\x00e\x00a\x00t\x00u\x00r\x00e\x00S\x00A\x01\x07A\x01\x01A\x03B\x01\x00\x00\x00\x81\x01\x01\x02A\x03\x11\x1a\x00r\x00e\x00s\x00p\x00o\x00n\x00d\x00e\x00n\x00t\x00_\x00i\x00d\x00\x81\x02\x01\x03A\x03B\x01\x00\x00\x00\x81\x03\x01\x05A\x03F\x01\x81\x05\x01\x06A\x03F\x00\x81\x06\x81\x07\x01\x07A\x01\x01A\x03B\x02\x00\x00\x00\x81\x01\x01\x02A\x03\x11
\x00W\x00o\x00r\x00k\x00s\x00 \x00a\x00t\x00
\x00c\x00o\x00m\x00p\x00a\x00n\x00y\x00\x81\x02\x01\x03A\x03B\x01\x00\x00\x00\x81\x03\x01\x05A\x03F\x01\x81\x05\x01\x06A\x03F\x01\x81\x06\x81\x07\x01\x07A\x01\x01A\x03B\x03\x00\x00\x00\x81\x01\x01\x02A\x03\x11\x0c\x00G\x00e\x00n\x00d\x00e\x00r\x00\x81\x02\x01\x03A\x03B\x08\x00\x00\x00\x81\x03\x01\x04A\x03B\x01\x00\x00\x00\x81\x04\x01\x05A\x03F\x00\x81\x05\x01\x06A\x03F\x00\x81\x06\x81\x07\x81\x00\x08F\x00e\x00a\x00t\x00u\x00r\x00e\x00S\x00"}
converts to
Name: XML_F52E2B61-18A1-11d1-B105-00805F49916B
Value: "idDdescrDdatatype_idDenumeration_type_idD systemfeatureDlinkDFeatureFeatureSAAABArespondent_idABAFAFAABA Works at companyABAFAFAABAGenderABABAFAFFeatureS"
This is certainly not the final solution, but it is clear that this is some binary serialization rather than plain XML text; the readable parts are UTF-16 encoded (e.g. i\x00d\x00 is "id"), with type/length markers in between.
It might be a good idea to show (the relevant parts of) your SP and the way you are calling it. There might be a completely different / better approach...
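If you can get the XML back as plain text (for example by wrapping the FOR XML query in a CAST(... AS NVARCHAR(MAX)) inside the SP), parsing it on the Python side is then straightforward. A sketch with xml.etree.ElementTree, using the element names your SP produces (Feature rows under a FeatureS root) and a purely illustrative payload:

import xml.etree.ElementTree as ET

# illustrative payload in the shape produced by FOR XML RAW('Feature'), ROOT('FeatureS'), ELEMENTS
xml_text = "<FeatureS><Feature><id>1</id><descr>respondent_id</descr></Feature></FeatureS>"
root = ET.fromstring(xml_text)
for feature in root.findall('Feature'):
    print feature.find('id').text, feature.find('descr').text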
I'm writing an API for a Mongo database. I need to pass a JSON object as a GET parameter:
example.com/api/obj/list/1/?find={"foo":"bar"}
How should I organize this better?
I thought about using JSON-like objects without quotes and spaces, for example:
{$or:[{a:foo+bar},{b:2}]}
So are there any tools to parse it in Python/Django?
It should be fine as long as the JSON objects aren't too big, they don't contain sensitive data (it sucks to see your password in your browser history) and you URL-escape them.
Unfortunately, you have to take shortcuts if you want a human-readable JSON parameter. The JSON brackets ({, }, [, ]) are all characters that should be percent-escaped; you can often get away without escaping them, but you are taking a risk if you don't. More annoying is the colon (:), which is ubiquitous in JSON and does need escaping.
If you want human-readable query strings, then the sensible solution is to encode all query parameters explicitly. A compromise that might work quite well is to unpack the top-level JSON object into explicit query parameters, each of which remains JSON-encoded. Going a small step further, you could drop any top-level delimiters that remain, e.g.:
JSON: {"foo":"bar", "items":[1, 2, 3], "staff":{"id":432, "first":"John", "last":"Doe"}}
Query: foo=bar&items=1,2,3&staff="id"%3A432,"first"%3A"John","last"%3A"Doe"
Since you know that foo is a string, items is an array and staff is an object, you can rehydrate the JSON syntax correctly before sending the lot to a JSON parser.
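Either way, the round trip itself is simple. A sketch of encoding the parameter on the client and decoding it again in a Django view (URL, view and parameter names are illustrative; Python 2 shown, use urllib.parse on Python 3):

import json
import urllib

# client side: URL-escape the JSON before putting it in the query string
params = urllib.urlencode({'find': json.dumps({'foo': 'bar'})})
url = 'http://example.com/api/obj/list/1/?' + params
# example.com/api/obj/list/1/?find=%7B%22foo%22%3A+%22bar%22%7D

# Django view side: read it back out of request.GET and parse it
from django.http import HttpResponse

def obj_list(request, page):
    find = json.loads(request.GET.get('find', '{}'))  # now a normal dict, e.g. {'foo': 'bar'}
    return HttpResponse(json.dumps(find), content_type='application/json')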
I'd like to set a cookie via Django that has several different values, similar to .NET's HttpCookie.Values property. Looking at the documentation, I can't tell if this is possible. It looks like it just takes a string, so is there another way?
I've tried passing it an array ([10, 20, 30]) and a dictionary ({'name': 'Scott', 'id': 1}), but they just get converted to their string representations. My current solution is to use an arbitrary separator and then parse it when reading it in, which feels icky. If multi-values aren't possible, is there a better way? I'd rather not use lots of cookies, because that would get annoying.
.NET's multi-value cookies work exactly the same way as what you're doing in Django with a separator; they've just abstracted that away for you. What you're doing is fine and proper, and I don't think Django has anything specific to 'solve' this problem.
I will say that you're doing the right thing in not using multiple cookies. Keep the over-the-wire overhead down by doing what you're doing.
If you're looking for something a little more abstracted, try using sessions. I believe the way they work is by storing an id in the cookie that matches a database record. You can store whatever you want in it. It's not exactly the same as what you're looking for, but it could work if you don't mind a small amount of db overhead.
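A minimal sketch of that approach (the session key is just an example):

# in the view that stores the data
request.session['prefs'] = {'name': 'Scott', 'id': 1}

# in a later request, read it back as a normal dict
prefs = request.session.get('prefs', {})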
(Late answer!)
This will be bulkier, but you can always use Python's built-in serializing.
You could always do something like:
import pickle

class MultiCookie():
    def __init__(self, cookie=None, values=None):
        if cookie != None:
            try:
                self.values = pickle.loads(cookie)
            except:
                # assume that it used to just hold a string value
                self.values = cookie
        elif values != None:
            self.values = values
        else:
            self.values = None

    def __str__(self):
        return pickle.dumps(self.values)
Then, you can get the cookie:
newcookie = MultiCookie(cookie=request.COOKIES.get('multi'))
values_for_cookie = newcookie.values
Or set the values (note that set_cookie lives on the response object):
mylist = [1, 2, 3]
newcookie = MultiCookie(values=mylist)
response.set_cookie('multi', value=str(newcookie))
Django does not support it. The best way would be to separate the values with an arbitrary separator and then just split the string, as you already said.
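For example (cookie name and separator are arbitrary):

# when building the response
values = ['10', '20', '30']
response.set_cookie('multi', ','.join(values))

# when reading the next request
values = request.COOKIES.get('multi', '').split(',')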