Postgresql Json column not saving utf-8 character - python

Hi, I'm trying to save data that I get from this API into a JSON column in my PostgreSQL database using SQLAlchemy and Python requests.
r = requests.get(api)
content = r.content
data = json.loads(content)
crawl_item = {}
crawl_item = session.query(CrawlItem).filter_by(site_id=3, href=list_id).first()
crawl_item.description = data['ad']['body']
crawl_item.meta_data = {}
crawl_item.meta_data["ward"] = data['ad_params']['ward']['value']
try:
    session.commit()
except:
    session.rollback()
    raise
finally:
    ret_id = crawl_item.id
    session.close()
My model:
class CrawlItem(Base):
    ...
    description = Column(Text)
    meta_data = Column(postgresql.JSON)
I want to get the value of ward:
"ward": {
    "id": "ward",
    "value": "Thị trấn Trạm Trôi",
    "label": " Phường, thị xã, thị trấn"
}
I already set my PostgreSQL encoding to UTF-8, so fields that are not JSON columns (description = Column(Text)) save UTF-8 characters normally. Only the data in my JSON column is stored with escaped characters:
{
"ward":"Th\u1ecb tr\u1ea5n Tr\u1ea1m Tr\u00f4i"
}
I tried using:
crawl_item.meta_data["ward"] = data['ad_params']['ward']['value'].decode('utf-8')
but then the ward data doesn't get saved.
I have no idea what is wrong; I hope someone can help me.
EDIT:
I checked the description and meta_data columns with psql.
It seems that only the meta_data JSON column has trouble with these characters.

SQLAlchemy serializes a JSON field before saving it to the database:
json_serializer = dialect._json_serializer or json.dumps
By default, the PostgreSQL dialect uses json.dumps and json.loads.
When you work with Text column, the data is converted in the following flow:
str -> bytes in utf-8 encoding
When you work with JSON column for PostgreSQL dialect, the data is converted in the following flow:
dict -> str with escaped non-ascii symbols -> bytes in utf-8 encoding
You can override the serializer in your engine configuration by passing the json_serializer parameter to create_engine:
json_serializer=partial(json.dumps, ensure_ascii=False)
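The difference between the two serializers can be seen with the standard library alone. The following is a minimal sketch, not SQLAlchemy's internal code: the names ward, escaped, readable_serializer, and readable are chosen here for illustration, and the sample value is taken from the question.

```python
import json
from functools import partial

# Sample value from the question (illustrative variable name).
ward = {"ward": "Thị trấn Trạm Trôi"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(ward)

# Same serializer with ensure_ascii=False: characters are kept as-is.
# A callable like this is what would be passed as json_serializer
# to create_engine.
readable_serializer = partial(json.dumps, ensure_ascii=False)
readable = readable_serializer(ward)

print(escaped)
print(readable)
```

Both strings decode to the same dictionary with json.loads, so the escaped form is not corrupted data; it is only a different textual representation of the same JSON value.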

Use the "jsonb" data type for your JSON column, or cast the "meta_data" field to "jsonb" like this:
select meta_data::jsonb from your_table;

Related

How to convert Django TextField string into JSON?

I have the following model:
class Car(models.Model):
    data = models.TextField(default="[]")
Also, I have the following serializer:
class CarSerializer(serializers.ModelSerializer):
    data = serializers.ListField(child=serializers.CharField())
The REST API gets data and saves it as text field. In my to_dict method of Car, I want to convert self.data into JSON and return the dict:
def to_dict(self):
    result = dict()
    result['data'] = json.loads(self.data)
    return result
But it fails with the error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
As I understand, the reason is that self.data is:
"['a', 'b', 'c']"
And not:
'["a", "b", "c"]'
I'm familiar with JSONField, but since I'm using SQLite without the JSON1 extension, I can't use it. How can I convert self.data to JSON?
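You can use Python's json.dumps() method to convert a value into JSON format and then json.loads() to convert the JSON back into a Python object. Note that when self.data is already a string, this round trip returns the same string, so the single-quoted text still has to be parsed separately.
import json
def to_dict(self):
    result = dict()
    data = json.dumps(self.data)
    result['data'] = json.loads(data)
    return result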
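The simplest way to solve this problem is json.loads(self.data.replace('\'','\"')), which replaces each ' with ". This breaks if an item itself contains a quote character.
Alternatively, you can try eval(self.data), but only on trusted input, since eval executes arbitrary Python code.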
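A possible alternative to both the quote replacement and eval is ast.literal_eval from the standard library, which parses Python literals without executing code. The sketch below is only an illustration: parse_list_text is a hypothetical helper name, not part of Django or of the answers above, and it assumes the stored text is either valid JSON or the repr of a Python list.

```python
import ast
import json


def parse_list_text(text):
    """Parse text that is either JSON or the repr of a Python list."""
    try:
        # Works when the text is real JSON, e.g. '["a", "b", "c"]'.
        return json.loads(text)
    except json.JSONDecodeError:
        # Falls back to Python literal syntax, e.g. "['a', 'b', 'c']".
        # literal_eval only accepts literals and never runs code.
        return ast.literal_eval(text)


items = parse_list_text("['a', 'b', 'c']")
print(items)
```

Inside to_dict, the call would replace json.loads(self.data). Storing the list with json.dumps when saving would avoid the fallback entirely.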

Access Nested Data from JSONField in Django

I have the following model:
class BaseTransaction(models.Model):
    """Model to store JSON data"""
    name = models.CharField(max_length=255)
    json_data = JSONField(null=True)
If I create an instance with the following data:
base_transaction = models.BaseTransaction.objects.create(
    name="Test Transaction",
    json_data={{"sales": 3.24, "date": "2020-06-05"},
               {"sales": 5.50, "date": "2020-06-04"},
               {"sales": 256.53, "date": "2020-06-02"}}
)
How would I access the second row of data without a key? Or is this the wrong format for JSON? I am using this format because the original data is from a CSV and this is how it converts to JSON.
No, the above structure is not in JSON format. You can always validate whether something is JSON using JSON Formatter & Validator.
You would want to restructure it according to the rules of JSON, manually if that is feasible. Once it's in JSON format, you can access the second row without keys using a for loop and a counter, e.g.
counter = 0
for key in obj:
    counter += 1
    if counter == 2:
        # Do anything
        pass
    else:
        print("Key: " + key)
        print("Value: " + str(obj[key]))
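One way to restructure the data, sketched here under the assumption that each CSV row should stay a separate object, is to wrap the rows in a JSON array (square brackets) instead of an outer pair of curly braces. An array needs no keys and can be indexed by position. The variable names below are illustrative only.

```python
import json

# Rows from the question, wrapped in a list instead of a set-like {...}.
rows = [
    {"sales": 3.24, "date": "2020-06-05"},
    {"sales": 5.50, "date": "2020-06-04"},
    {"sales": 256.53, "date": "2020-06-02"},
]

# Round-trip through JSON text to confirm the structure is valid JSON.
encoded = json.dumps(rows)
decoded = json.loads(encoded)

# The second row is simply index 1; no key or counter is needed.
second_row = decoded[1]
print(second_row["sales"], second_row["date"])
```

A list like rows could be passed as json_data when creating the model instance, and the second row read back as base_transaction.json_data[1].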

Datatype conversion using Python Marshmallow

I am trying to use Marshmallow schema to serialize the python object. Below is the schema I have defined for my data.
from marshmallow import Schema, fields
class User:
    def __init__(self, name = None, age = None, is_active = None, details = None):
        self.name = name
        self.age = age
        self.is_active = is_active
        self.details = details
class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()
    is_active = fields.Bool()
    details = fields.Dict()
The input will be in dictionary format and all the values will be in string.
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details":"{'key1':'val1', 'key2':'val2'}"}
When I try to run the below snippet, values of age and is_active got converted into respective datatype but details remains unchanged.
user_schema = UserSchema()
user_dump_data = user_schema.dump(user_data)
print(user_dump_data)
Output:
{'name': 'xyz', 'is_active': True, 'details': "{'key1':'val1', 'key2':'val2'}", 'age': 20}
I need to serialize the input data into the respective data types I defined in my schema. Am I doing anything wrong? Can anyone guide me on how to achieve this using Marshmallow?
I am using
python 3.6
marshmallow 3.5.1
Edit
The above input data is fetched from HBase. By default HBase stores all its values as bytes and returns them as bytes. Below is the format I get from HBase:
{b'name': b'xyz', b'age': b'20', b'is_active': b'true', b'details': b"{'key1':'val1', 'key2':'val2'}"}
Then I decode this dictionary and pass it to my UserSchema to serialize it to be used in web API.
You're confusing serializing (dumping) and deserializing (loading).
Dumping is going from object form to json-serializable basic python types (using Schema.dump) or json string (using Schema.dumps). Loading is the reverse operation.
Typically, your API loads (and validates) data from the outside world and dumps (without validation) your objects to the outside world.
If your input data is this data and you want to load it into objects, you need to use load, not dump.
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details":"{'key1':'val1', 'key2':'val2'}"}
user_loaded_data = user_schema.load(user_data)
user = User(**user_loaded_data)
However, if you do so, you'll be caught by another issue: fields.Dict expects data as a dict, not a str. You need to pass
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details": {'key1':'val1', 'key2':'val2'}}
As Jérôme mentioned, you're confusing serializing (dumping) with deserializing (loading). For your requirement, you should use Schema.load as suggested.
Since all the input values are expected to be strings, you can use pre_load to register a method that pre-processes the data, as follows:
from marshmallow import Schema, fields, pre_load
class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()
    is_active = fields.Bool()
    details = fields.Dict()
    @pre_load
    def pre_process_details(self, data, **kwargs):
        # Turn the string form of the dict into a real dict before the fields are deserialized
        data['details'] = eval(data['details'])
        return data
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details":"{'key1':'val1', 'key2':'val2'}"}
user_schema = UserSchema()
user_loaded_data = user_schema.load(user_data)
print(user_loaded_data)
Here, pre_process_details will convert string type to dictionary for correct deserialization.
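The decoding step that happens before the schema is applied can be sketched with the standard library only. This is an illustration under assumptions: decode_row is a hypothetical helper name, the HBase row is assumed to be UTF-8 encoded, and ast.literal_eval is swapped in for eval because it only accepts Python literals and does not execute code.

```python
import ast


def decode_row(raw_row):
    """Decode a bytes-keyed HBase-style row and parse its details field."""
    # Convert every key and value from bytes to str (UTF-8 assumed).
    decoded = {
        key.decode("utf-8"): value.decode("utf-8")
        for key, value in raw_row.items()
    }
    # The details value is the repr of a dict, so parse it as a literal.
    decoded["details"] = ast.literal_eval(decoded["details"])
    return decoded


raw_row = {
    b'name': b'xyz',
    b'age': b'20',
    b'is_active': b'true',
    b'details': b"{'key1':'val1', 'key2':'val2'}",
}
user_data = decode_row(raw_row)
print(user_data)
```

The resulting user_data still has age and is_active as strings; converting those is left to the schema's load call, which is where the field types are applied.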

Change string to objectId within json in pymongo+flask and insert it into mongodb

I'm using Swagger and pymongo, and I want a really simple way to convert a string into an ObjectId. How can I do this with minimal effort, without touching other DB schemas?
Code:
jsonResponse = request.json['business']
# convert business_id Datatype to ObjectId
business_id=ObjectId(jsonResponse['business_id'])
#add business_id (ObjectId)to mongodb
data = collection.insert_one(jsonResponse).inserted_id
return data
response = request.json['business']
response_oid = ObjectId(response['business_id'])
mongo_item = response.copy()
mongo_item['business_id'] = response_oid
return collection.insert_one(mongo_item).inserted_id
Should do the job.

Python - How to parse JSON string

I am trying to find a way to parse a JSON string and save its fields into MySQL.
This is my JSON:
{"title": My title, "desc": mydesc, "url": http//example.com}
So far I have no problem saving the whole JSON into one column using json.dumps(), so now I'm trying to parse each field of the JSON string and send it to its own MySQL table column: Title | Desc | Url.
This is my Python code, using desc as an example (pyspider-resultdb.py):
def _parse(self, data):
    for key, value in list(six.iteritems(data)):
        if isinstance(value, (bytearray, six.binary_type)):
            data[key] = utils.text(value)
    if 'result' in data:
        decoded = json.loads(data['result'])
        data['result'] = json.dumps(decoded['desc'])
    return data
def _stringify(self, data):
    if 'result' in data:
        decoded = json.loads(data['result'])
        data['result'] = json.dumps(decoded['desc'])
    return data
It's unclear from your question what you are trying to achieve, but if the question is how to convert JSON to a Python dict and then load it into the table, this is how you can do it:
my_dict = json.loads('{"title": "foo", "dest": "bar"}')
curs.execute('INSERT INTO test (title, dest) values(%(title)s, %(dest)s)', my_dict)
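The same parse-then-insert pattern can be tried end to end without a MySQL server. The sketch below is an assumption-laden stand-in: it uses the standard library's sqlite3 with an in-memory database instead of MySQL, so the placeholder style is :name rather than %(name)s, and the table name pages and its column names are invented for the example. The JSON text is a corrected, valid version of the string in the question.

```python
import json
import sqlite3

# Valid JSON version of the question's data (values quoted).
payload = '{"title": "My title", "desc": "mydesc", "url": "http://example.com"}'
record = json.loads(payload)

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT, description TEXT, url TEXT)")

# Map each JSON field to its own column via named parameters.
params = {
    "title": record["title"],
    "description": record["desc"],
    "url": record["url"],
}
conn.execute(
    "INSERT INTO pages (title, description, url) "
    "VALUES (:title, :description, :url)",
    params,
)
conn.commit()

row = conn.execute("SELECT title, description, url FROM pages").fetchone()
print(row)
```

The column is named description rather than desc here because DESC is a reserved word in SQL; with a MySQL driver the dict would be passed to the cursor's execute call in the same way.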
