MongoDB is returning weirdly formatted data - python

When I try to get this data from my MongoDB database using Flask-RESTful and PyMongo, I get some weirdly formatted data back.
For example, this is what the data looks like in the database:
{ "_id" : ObjectId("5217f3cc7466c06862c4a4f7"), "Hello" : "World" }
This is what it looks like when it gets returned from the database:
"{\"_id\": {\"$oid\": \"5217f3cc7466c06862c4a4f7\"}, \"Hello\": \"World\"}"
Using this code:
def toJSON(data):
    return json.dumps(data, default=json_util.default)
And this:
def get(self, objectid):
    collection = db["products"]
    result = collection.find_one({"_id": ObjectId(objectid)})
    return toJSON(result)
Anyone know what I'm doing wrong?

No, that's supposed to be like that.
MongoDB uses BSON, which extends JSON with some extra types, such as ObjectId. To represent those in JSON, you get the weird-looking $oid and friends.
The backslashes are most likely added by some tool to allow for quotes inside a string literal (which is itself enclosed by quotes), unless you are somehow double-encoding things.

Flask-RESTful expects you to return a dictionary here, not JSON; it converts the dictionary into JSON on its own. So your code should look like:
def get(self, objectid):
    collection = db["products"]
    result = collection.find_one({"_id": ObjectId(objectid)})
    result['_id'] = str(result['_id'])  # ObjectId is not JSON serializable on its own
    return result
When you return JSON yourself, Flask-RESTful sees a string, serializes it as one, and escapes the double quotes.
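The double-encoding is easy to reproduce with the standard json module alone; a minimal sketch:

```python
import json

doc = {"Hello": "World"}

# Serializing the dictionary once produces normal JSON.
once = json.dumps(doc)
print(once)   # {"Hello": "World"}

# Serializing the resulting *string* again escapes the inner quotes,
# which is exactly what happens when a pre-serialized JSON string is
# handed to a framework that serializes the return value itself.
twice = json.dumps(once)
print(twice)  # "{\"Hello\": \"World\"}"
```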

Related

Unable to get records using Django ORM

Problem:
I'm trying to get a record with the Django ORM from a table that contains a JSON field, using the following line:
test_object = House.objects.get(id=301)
Error
TypeError: the JSON object must be str, bytes or bytearray, not dict
Possible issue
I noticed that a previous developer updated the format of the JSON field in the table; it seems the JSON had a bad format. This is the script that was used to reformat the JSON column:
for i in data:
    jsonstring = json.dumps(i.result)
    new_clear = jsonstring.replace("\\", "")
    new_clear = jsonstring.replace("NaN", "null")  # note: this overwrites the previous line's replace
    i.result = json.loads(new_clear)
    i.save()
Comments
In pgAdmin the JSON field looks good and it is formatted properly, see a partial copy of the JSON below:
{"owner_id": 45897, "first_name": "John", "last_name": "DNC", "estate_id": 3201, "sale_date": "3/18/19", "property_street": "123 main st", "property_city": "Miami", "property_state": "FL", "property_zipcode": 33125, "Input_First_Name": "John", "Input_Last_Name": "DNC"}
I would like to know how to deal with this JSON field in order to query the object. Any help will be appreciated. Thanks.
Check whether a custom decoder is being used on the field (see the Django docs on JSONField).
If the JSON data in the db is valid, try connecting to the db in a shell using psycopg2.connect(), running the query, and decoding the result with json.loads().
I'd intended to post this as a comment, but I don't have enough reputation; apologies if that's a concern.
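For what it's worth, the TypeError in the question is exactly what the standard json module raises when json.loads is handed a value that has already been decoded into a dict, which suggests a decoder somewhere is being applied to already-parsed data; a minimal reproduction:

```python
import json

already_decoded = {"owner_id": 45897, "first_name": "John"}

try:
    json.loads(already_decoded)  # json.loads expects a string, not a dict
except TypeError as exc:
    print(exc)  # the JSON object must be str, bytes or bytearray, not dict

# Decoding the *string* form works fine.
assert json.loads(json.dumps(already_decoded)) == already_decoded
```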

How to deserialize relaxed extended json or simplified json string in Python and get bson objects where appropriate?

I have a simplified JSON string such as the following:
j = '{"_id": {"_id": "5923e0e8bf681d1000abea4c", "copyingData": true}, "currency": "USD"}'
What I'd like to do is deserialize it so that I get back a dictionary with _id as a bson.ObjectId and currency as a string, similar to the original document retrieved using PyMongo.
How do I go about it?
I tried bson.json_util.loads with various arguments, but it just loaded the string as plain JSON (so _id is a dict).
Thank you!
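bson.json_util targets MongoDB extended JSON ($oid and friends), so it has no reason to recognize this simplified {"_id": ..., "copyingData": true} wrapper. One option is to unwrap it yourself with a json.loads object_hook; a sketch, using str as a stand-in for the conversion (with pymongo installed you would return bson.ObjectId(d["_id"]) instead), and with a hypothetical key check you should adjust to match your real payloads:

```python
import json

def unwrap_id(d):
    # Treat {"_id": ..., "copyingData": true} as a simplified ObjectId wrapper.
    # With pymongo available: return bson.ObjectId(d["_id"]) instead of str.
    if set(d) == {"_id", "copyingData"}:
        return str(d["_id"])
    return d

j = '{"_id": {"_id": "5923e0e8bf681d1000abea4c", "copyingData": true}, "currency": "USD"}'
doc = json.loads(j, object_hook=unwrap_id)
print(doc)  # {'_id': '5923e0e8bf681d1000abea4c', 'currency': 'USD'}
```

object_hook is called on every decoded object, innermost first, so the wrapper dict is replaced before the outer document is assembled.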

Having a special character such as a period in a python key

I'm trying to generate SQL insert statements using SQLAlchemy, like this:
def get_sql(self):
    """Returns SQL as a string."""
    baz_ins = baz.insert().values(
        Id=self._id,
        Foo.Bar=self.foo_dot_bar,
    )
    return str(baz_ins.compile(dialect=mysql.dialect(),
                               compile_kwargs={"literal_binds": True}))
This raises keyword can't be an expression. Escaping the period as \. also doesn't work.
One solution I came up with is using FooDOTBar instead of Foo.Bar and then replacing every "DOT" with "." in the generated SQL files, but this corrupts some other data and is not optimal. Any better suggestions for dealing with this from the ground up?
In your query, values can be assigned with a dictionary, so you can do something like:
baz_ins = baz.insert().values({"Id": self._id, "Foo.Bar": self.foo_dot_bar})
Check out the insert documentation for more options.
There's an alternate form of .values where you pass in a dict instead:
baz.insert().values({
    "Id": self._id,
    "Foo.Bar": self.foo_dot_bar,
})

python convert unicode to readable character

I am using Python 2.7 and psycopg2 to connect to PostgreSQL.
I read a bunch of data from a source that has strings like 'Aéropostale', then store it in the database. However, in PostgreSQL it ends up as 'A\u00e9ropostale', but I want it stored as 'Aéropostale'.
The encoding of the PostgreSQL database is UTF-8.
Please tell me how I can store the actual string 'Aéropostale' instead.
I suspect the problem is happening in Python. Please advise.
EDIT:
Here is my data source:
response_json = json.loads(response.json())
response is obtained via a service call and looks like:
print(type(response.json()))
>> <type 'str'>
print(response.json())
>> {"NameRecommendation": {"ValueRecommendation": [{"Value": "\"Handmade\""}, {"Value": "Abercrombie & Fitch"}, {"Value": "A\u00e9ropostale"}, {"Value": "Ann Taylor"}]}}
From the above data, my goal is to construct a list of all ValueRecommendation.Value entries and store it in a PostgreSQL json column. So the equivalent Python list that I want to store is:
py_list = ["Handmade", "Abercrombie & Fitch", "A\u00e9ropostale", "Ann Taylor"]
Then I convert py_list into a JSON representation using json.dumps():
json_py_list = json.dumps(py_list)
And finally, to insert, I use psycopg2.cursor() and mogrify():
conn = psycopg2.connect("connectionString")
cursor = conn.cursor()
cursor.execute(cursor.mogrify("INSERT INTO table (columnName) VALUES (%s)", (json_py_list,)))
As I mentioned earlier, with the above logic, strings with special characters like é are getting stored as escaped character codes.
Please spot my mistake.
json.dumps escapes non-ASCII characters by default so its output can work in non-Unicode-safe environments. You can turn this off with:
json_py_list = json.dumps(py_list, ensure_ascii=False)
Now you will get UTF-8-encoded bytes (unless you change that too with encoding=) so you'll need to make sure your database connection is using that encoding.
In general it shouldn't make any difference as both forms are valid JSON and even with ensure_ascii off there are still characters that get \u-encoded.
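A minimal demonstration of the difference (the database insert is unchanged; only the dumps call differs):

```python
import json

py_list = ["Handmade", "Abercrombie & Fitch", "A\u00e9ropostale", "Ann Taylor"]

escaped = json.dumps(py_list)                      # default: ASCII-safe output
literal = json.dumps(py_list, ensure_ascii=False)  # keeps the é as-is

print(escaped)  # ... "A\u00e9ropostale" ...
print(literal)  # ... "Aéropostale" ...

# Both forms decode back to the same list.
assert json.loads(escaped) == json.loads(literal) == py_list
```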

Python MySQL insert and retrieve a list in Blob

I'm trying to insert a list of elements into a MySQL database (into a Blob column). This is an example of my code:
myList = [1345, 22, 3, 4, 5]
myListString = str(myList)
myQuery = 'INSERT INTO table (blobData) VALUES (%s)'
cursor.execute(myQuery, (myListString,))
Everything works fine and the list is stored in my database. But when I want to retrieve it, because it's now a string, I have no idea how to get a real list of integers back instead of a string.
For example, if I now do:
myQuery = 'SELECT blobData FROM db.table'
cursor.execute(myQuery)
myRetrievedList = cursor.fetchall()
print myRetrievedList[0]
I'll get:
[
instead of:
1345
Is there any way to transform my string [1345,22,3,4,5] back into a list?
You have to pick a data format for your list. Common solutions, in order of my preference:
json -- fast, readable, allows nested data, and very useful if your table is ever used by any other system; loading also checks that the blob is in a valid format. Use json.dumps() and json.loads() to convert to and from the string/blob representation.
repr() -- fast, readable, works across Python versions; unsafe if someone gets into your db. Use repr() and eval() to get data to and from string/blob format.
pickle -- fast, unreadable, does not work across multiple architectures (afaik), and does not detect a truncated blob. Use cPickle.dumps(..., protocol=cPickle.HIGHEST_PROTOCOL) and cPickle.loads(...) to convert your data.
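A round-trip sketch of the first option (JSON), independent of the database layer; the string is what you would store in the BLOB column:

```python
import json

myList = [1345, 22, 3, 4, 5]

# Serialize before INSERT.
blob = json.dumps(myList)
print(blob)  # [1345, 22, 3, 4, 5]

# Deserialize after SELECT.
restored = json.loads(blob)
assert restored == myList
assert all(isinstance(x, int) for x in restored)
```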
As per the comments on this answer, the OP has a list of lists stored in the blob field. In that case, JSON seems the better way to go:
import json
...
...
myRetrievedList = cursor.fetchall()
jsonOfBlob = json.loads(myRetrievedList[0][0])  # first row, first column holds the blob string
integerListOfLists = []
for oneList in jsonOfBlob:
    listOfInts = [int(x) for x in oneList]
    integerListOfLists.append(listOfInts)
return integerListOfLists  # or print, or whatever
