I am trying to print to the console the actual content (which is HTML) of the field 'htmlfile': 16543 (see below).
So far I am able to print the whole row using the .values() method.
Here is what I am getting in my Python shell:
>>>
>>> Htmlfiles.objects.values()[0]
{'id': 1, 'name': 'error.html', 'htmlfile': 16543}
>>>
I want to print out the content of 16543. I have scanned the Django QuerySet docs many times and still cannot find the right method.
Here is my data model in models.py:
class Htmlfiles(models.Model):
    name = models.CharField(max_length=30, blank=True, null=True)
    htmlfile = models.TextField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'htmlfiles'
Any assistance would be greatly appreciated.
You can fetch only the htmlfile value with:
Htmlfiles.objects.values('htmlfile')
Which, for each row, will give you a dictionary like so:
{'htmlfile': 12345}
So to print all the htmlfile values, you need something like this:
objects = Htmlfiles.objects.values('htmlfile')
for obj in objects:
    print(obj['htmlfile'])
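Alternatively, values_list() with flat=True gives you the raw values directly (a minimal sketch against the same model), so there is no dictionary to index into:

for html in Htmlfiles.objects.values_list('htmlfile', flat=True):
    print(html)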
While parsing my request data from the front end and converting it into JSON format using a serializer, I am getting some unexpected errors.
When parsing the request with the serializers shown below, I get the following error (found via contact_serializer.errors):
{'address': {u'non_field_errors': [u'Invalid data. Expected a dictionary, but got str.']}}
I do not think it will work like this. Keep in mind that values assigned this way end up hard-coded and stored in the DB as-is. If you still insist on doing it like this, then use a list of dictionaries:
request.data['phone_number'] = [{'number': '9999999999'}]
request.data['cont_email'] = [{'email': 'tim@gmail.com'}]
And update the serializer like this:
class CrmContactSerializer(serializers.ModelSerializer):
    phone_number = PhoneNumberSerializer(source='contact_number', many=True)
    cont_email = ContactEmailSerializer(source='contact_email', many=True)

    class Meta:
        model = RestaurantContactAssociation
        fields = ('id', 'phone_number', 'cont_email', 'contact')

    def create(self, validated_data):
        # With many=True these are lists of dicts, so create one related row per entry.
        phone_numbers = validated_data.pop('contact_number')
        cont_emails = validated_data.pop('contact_email')
        restaurant = super(CrmContactSerializer, self).create(validated_data)
        for phone_data in phone_numbers:
            phone_instance = PhoneNumber(**phone_data)
            phone_instance.restaurant = restaurant
            phone_instance.save()
        for email_data in cont_emails:
            email_instance = ContactEmail(**email_data)
            email_instance.restaurant = restaurant
            email_instance.save()
        return restaurant
The reason for many=True is that one restaurant can have multiple numbers or emails (it has a one-to-many relationship with the respective models).
Now, if you want a cleaner implementation, you can make phone_number and cont_email read-only fields, so that they are used only when reading, not writing:
class CrmContactSerializer(serializers.ModelSerializer):
    phone_number = PhoneNumberSerializer(source='contact_number', many=True, read_only=True)
    cont_email = ContactEmailSerializer(source='contact_email', many=True, read_only=True)

    class Meta:
        model = RestaurantContactAssociation
        fields = ('id', 'phone_number', 'cont_email', 'contact')
That way, the validation errors for phone_number and cont_email on write are avoided.
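For illustration, a hypothetical view-side sketch of how this read-only variant would be used; the nested fields are ignored on write and rendered from the related objects on read:

serializer = CrmContactSerializer(data=request.data)
serializer.is_valid(raise_exception=True)   # no nested validation errors any more
instance = serializer.save()
print(CrmContactSerializer(instance).data)  # includes phone_number and cont_email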
I want to serialize a Python object. After saving it into MySQL (via the Django ORM), I want to fetch it back and pass it to a function that expects this kind of object as a parameter.
The following two parts are my main logic:
1. Save param part:
class Param(object):
    def __init__(self, name=None, targeting=None, start_time=None, end_time=None):
        self.name = name
        self.targeting = targeting
        self.start_time = start_time
        self.end_time = end_time
        # ...

param = Param()
param.name = "name1"
param.targeting = "targeting1"

task_param = {
    "task_id": task_id,             # string
    "user_name": user_name,         # string
    "param": param,                 # Param object
    "save_param": save_param_dict,  # dictionary
    "access_token": access_token,   # string
    "account_id": account_id,       # string
    "page_id": page_id,             # string
    "task_name": "sync_create_ad"   # string
}
class SyncTaskList(models.Model):
    task_id = models.CharField(max_length=128, blank=True, null=True)
    ad_name = models.CharField(max_length=128, blank=True, null=True)
    user_name = models.CharField(max_length=128, blank=True, null=True)
    task_status = models.SmallIntegerField(blank=True, null=True)
    task_fail_reason = models.CharField(max_length=255, blank=True, null=True)
    task_name = models.CharField(max_length=128, blank=True, null=True)
    start_time = models.DateTimeField()
    end_time = models.DateTimeField(blank=True, null=True)
    task_param = models.TextField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'sync_task_list'
SyncTaskList(
    task_id=task_id,
    ad_name=param.name,
    user_name=user_name,
    task_status=0,
    task_param=task_param,
).save()
2. Use param part:
def add_param(param, access_token):
    pass

task_list = SyncTaskList.objects.filter(task_status=0)
for task in task_list:
    task_param = json.loads(task.task_param)
    add_param(task_param["param"], task_param["access_token"])  # pass param object to function add_param
If I directly use the Django ORM to save task_param into MySQL, I get the error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
because after the ORM operation the stored value is a string whose property names are enclosed in single quotes, like:
# in mysql it saved as
task_param: "task_param: {'task_id': 'e4b8b240cefaf58fa9fa5a591221c90a',
             'user_name': 'jimmy',
             'param': Param(name='name1',
                            targeting='geo_locations',
                            ),
             'save_param': {}}"
I am now confused about serializing a Python object: how do I load the original object back and pass it to a function?
Any commentary is very welcome. Many thanks.
Update: my solution so far
task_param = {
    # ...
    "param": vars(param),  # turn the Param object into a dictionary
    # ...
}

SyncTaskList(
    # ...
    task_param=json.dumps(task_param),
    # ...
).save()

# task_list = SyncTaskList.objects.filter(task_status=0)
# for task in task_list:
task_param = json.loads(task.task_param)
add_param(Param(**task_param["param"]), task_param["access_token"])
Update based on @AJS's answer:
Directly using pickle.dumps and saving the result as a binary field, then pickle.loads when reading, also works.
Is there a better solution for this?
Try looking into msgpack
https://msgpack.org/index.html
Unlike pickle, which is Python-specific, msgpack is supported by many languages (so the language you use to write to MySQL can be different from the language used to read).
There are also some projects out there that integrate these serializer libraries into Django model fields:
Pickle: https://pypi.org/project/django-picklefield/
MsgPack: https://github.com/vakorol/django-msgpackfield/blob/master/msgpackfield/msgpackfield.py
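For reference, a minimal sketch of the msgpack round trip (like json, msgpack handles plain dicts/lists, so a Param instance would first need to be converted, e.g. with vars(param) as in the question's update):

import msgpack

packed = msgpack.packb({"task_id": "abc123", "user_name": "jimmy"})  # bytes, ready for a binary column
restored = msgpack.unpackb(packed)                                   # back to a plain dict
print(restored["user_name"])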
You can use pickle. Basically you serialize your Python object and save it as bytes in your MySQL db, using BinaryField as the model field type in Django. I don't think JSON serialization would work in your case, since you have a Python object as a value in your dict. When you fetch your data from the db, simply unpickle it; the syntax is similar to the json library, see below.
import pickle

# to pickle
data = pickle.dumps({'name': 'testname'})

# to unpickle just do
pickle.loads(data)
So in your case, when you unpickle your object you should get your data back in the same form it was in before you pickled it.
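A minimal sketch of how that could look with the model from the question, assuming task_param is switched to a BinaryField (that change is my assumption, not part of the original code):

import pickle

# assumed model change: task_param = models.BinaryField(blank=True, null=True)
SyncTaskList(
    task_id=task_id,
    task_status=0,
    task_param=pickle.dumps(task_param),  # store the whole dict, Param object included, as bytes
).save()

task = SyncTaskList.objects.filter(task_status=0).first()
restored = pickle.loads(task.task_param)  # original dict back, with the Param instance intact
add_param(restored["param"], restored["access_token"])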
Hope this helps.
After solving the problem I asked about in this question, I am trying to optimize performance of the FTS using indexes.
I issued the following command on my db:
CREATE INDEX my_table_idx ON my_table USING gin(to_tsvector('italian', very_important_field), to_tsvector('italian', also_important_field), to_tsvector('italian', not_so_important_field), to_tsvector('italian', not_important_field), to_tsvector('italian', tags));
Then I edited my model's Meta class as follows:
class MyEntry(models.Model):
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'my_table'
        indexes = [
            GinIndex(
                fields=['very_important_field', 'also_important_field', 'not_so_important_field', 'not_important_field', 'tags'],
                name='my_table_idx'
            )
        ]
But nothing seems to have changed. The lookup takes exactly the same amount of time as before.
This is the lookup script:
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

# other unrelated stuff here

vector = SearchVector("very_important_field", weight="A") + \
         SearchVector("tags", weight="A") + \
         SearchVector("also_important_field", weight="B") + \
         SearchVector("not_so_important_field", weight="C") + \
         SearchVector("not_important_field", weight="D")

query = SearchQuery(search_string, config="italian")
rank = SearchRank(vector, query, weights=[0.4, 0.6, 0.8, 1.0])  # D, C, B, A

full_text_search_qs = MyEntry.objects.annotate(rank=rank).filter(rank__gte=0.4).order_by("-rank")
What am I doing wrong?
Edit:
The above lookup is wrapped in a function that I time with a decorator. The function actually returns a list, like this:
@timeit
def search(search_string):
    # the above code here
    qs = list(full_text_search_qs)
    return qs
Might this be the problem?
You need to add a SearchVectorField to your MyEntry, update it from your actual text fields and then perform the search on this field. However, the update can only be performed after the record has been saved to the database.
Essentially:
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVector, SearchVectorField


class MyEntry(models.Model):
    # The fields that contain the raw data.
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)

    # The field we are actually going to search.
    # Must be null=True because we cannot set it immediately during create().
    search_vector = SearchVectorField(editable=False, null=True)

    class Meta:
        # The search index pointing to our actual search field.
        indexes = [GinIndex(fields=["search_vector"])]
Then you can create the plain instance as usual, for example:
# Does not set MyEntry.search_vector yet.
my_entry = MyEntry.objects.create(
    very_important_field="something very important",  # Fake Italian text ;-)
    also_important_field="something different but equally important",
    not_so_important_field="this one matters less",
    not_important_field="we don't care about that one at all",
    tags="things, stuff, whatever",
)
Now that the entry exists in the database, you can update the search_vector field using all kinds of options. For example weight to specify the importance and config to use one of the default language configurations. You can also completely omit fields you don't want to search:
# Update search vector on existing database record.
my_entry.search_vector = (
    SearchVector("very_important_field", weight="A", config="italian")
    + SearchVector("also_important_field", weight="A", config="italian")
    + SearchVector("not_so_important_field", weight="C", config="italian")
    + SearchVector("tags", weight="B", config="italian")
)
my_entry.save()
Manually updating the search_vector field every time some of the text fields change can be error prone, so you might consider adding an SQL trigger to do that for you using a Django migration. For an example on how to do that see for instance a blog article on Full-text Search with Django and PostgreSQL.
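For existing rows you can also populate the field in bulk with a single UPDATE; a minimal sketch, reusing the field names, weights and config from above:

# One-off backfill of search_vector for every existing MyEntry row.
MyEntry.objects.update(
    search_vector=SearchVector("very_important_field", weight="A", config="italian")
    + SearchVector("also_important_field", weight="A", config="italian")
    + SearchVector("not_so_important_field", weight="C", config="italian")
    + SearchVector("tags", weight="B", config="italian")
)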
To actually search in MyEntry using the index, you need to filter and rank by your search_vector field. The config for the SearchQuery should match the one of the SearchVector above (to use the same stopwords, stemming, etc.).
For example:
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.core.exceptions import ValidationError
from django.db.models import F, QuerySet

search_query = SearchQuery("important", search_type="websearch", config="italian")
search_rank = SearchRank(F("search_vector"), search_query)

my_entries_found = (
    MyEntry.objects.annotate(rank=search_rank)
    .filter(search_vector=search_query)  # Perform full text search on index.
    .order_by("-rank")                   # Yield most relevant entries first.
)
I'm not sure, but according to the PostgreSQL documentation (https://www.postgresql.org/docs/9.5/static/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX):
Because the two-argument version of to_tsvector was used in the index above, only a query reference that uses the 2-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index, but WHERE to_tsvector(body) @@ 'a & b' cannot. This ensures that an index will be used only with the same configuration used to create the index entries.
I don't know what configuration Django uses, but you can try to remove the first argument.
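For reference, passing the config explicitly on the Django side looks like this (a sketch reusing the fields from the question; whether the planner can actually use a multi-column expression index for the combined vector is a separate question):

vector = (
    SearchVector("very_important_field", weight="A", config="italian")
    + SearchVector("tags", weight="A", config="italian")
    + SearchVector("also_important_field", weight="B", config="italian")
    + SearchVector("not_so_important_field", weight="C", config="italian")
    + SearchVector("not_important_field", weight="D", config="italian")
)
query = SearchQuery(search_string, config="italian")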
I am unable to store a string in ArrayField. There are no exceptions thrown when I try to save something in it, but the array remains empty.
Here is some code from models.py:
# models.py
from django.db import models
import uuid
from django.contrib.auth.models import User
from django.contrib.postgres.fields import JSONField, ArrayField


# Create your models here.
class UserDetail(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    key = models.CharField(max_length=50, default=False, primary_key=True)
    api_secret = models.CharField(max_length=50)
    user_categories = ArrayField(models.CharField(max_length=1000), default=list)

    def __str__(self):
        return self.key


class PreParentProduct(models.Model):
    product_user = models.ForeignKey(UserDetail, default=False, on_delete=models.CASCADE)
    product_url = models.URLField(max_length=1000)
    pre_product_title = models.CharField(max_length=600)
    pre_product_description = models.CharField(max_length=2000)
    pre_product_variants_data = JSONField(blank=True, null=True)
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

    def __str__(self):
        return self.pre_product_title
I try to save it this way:
catlist = ast.literal_eval(res.text)
for jsonitem in catlist:
    key = jsonitem.get('name')
    id = jsonitem.get("id")
    dictionary = {}
    dictionary['name'] = key
    dictionary['id'] = id
    tba = json.dumps(dictionary)
    print("It works till here.")
    print(type(tba))
    usersearch[0].user_categories.append(tba)
    print(usersearch[0].user_categories)
    usersearch[0].save()
    print(usersearch[0].user_categories)
The output I get is:
It works till here.
<class 'str'>
[]
It works till here.
<class 'str'>
[]
[]
Is this the correct way to store a string inside ArrayField?
I cannot store JSONField inside an ArrayField, so I had to convert it to a string.
How do I fix this?
Solution to the append problem.
You haven't demonstrated how your usersearch[0] is obtained; I suspect it's something like this:
usersearch = UserDetail.objects.all()
If that is so, you are making changes to a queryset item: each usersearch[0] fetches a fresh instance from the database, so your changes are not kept. Try this and you will see that the id is unchanged too:
usersearch[0].id = 1000
print(usersearch[0].id)
But this works
usersearch = list(UserDetail.objects.all())
and so does
u = usersearch[0]
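So a minimal sketch of the fix for the append problem, keeping the names from the question: fetch once, mutate that one instance, and save the same instance.

u = usersearch[0]              # fetch once and hold on to the instance
u.user_categories.append(tba)
u.save()
print(u.user_categories)       # now contains the appended string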
Solution to the real problem
user_categories = ArrayField(models.CharField(max_length = 1000), default = list)
This is wrong. ArrayFields shouldn't be used in this manner. You will soon find that you need to search through them, and:
Arrays are not sets; searching for specific array elements can be a
sign of database misdesign. Consider using a separate table with a row
for each item that would be an array element. This will be easier to
search, and is likely to scale better for a large number of elements
ref: https://www.postgresql.org/docs/9.5/static/arrays.html
You need to normalize your data: have a Category model, and relate it to UserDetail through a foreign key.
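A minimal sketch of what that normalization could look like (the model and field names here are my assumptions, not taken from the question):

class Category(models.Model):
    # One row per category instead of one string per array element.
    user = models.ForeignKey(UserDetail, related_name='categories', on_delete=models.CASCADE)
    external_id = models.CharField(max_length=50)
    name = models.CharField(max_length=1000)


# Saving the parsed categories then becomes:
for jsonitem in catlist:
    Category.objects.create(
        user=usersearch[0],
        external_id=jsonitem.get('id'),
        name=jsonitem.get('name'),
    )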
I am looking to get a QuerySet that is sorted by field1, function, field2.
The model:
class Task(models.Model):
    issue_id = models.CharField(max_length=20, unique=True)
    title = models.CharField(max_length=100)
    priority_id = models.IntegerField(blank=True, null=True)
    created_date = models.DateTimeField(auto_now_add=True)

    def due_date(self):
        ...
        return ageing
I'm looking for something like:
taskList = Task.objects.all().order_by('priority_id', ***duedate***, 'title')
Obviously, you can't sort a queryset by a custom function. Any advice?
Since the actual sorting happens in the database, which does not speak Python, you cannot use a Python function for ordering. You will need to implement your due date logic in an SQL expression, as a QuerySet.extra(select={...}) calculated field, something along the lines of:
due_date_expr = '(implementation of your logic in SQL)'
taskList = Task.objects.all().extra(select={'due_date': due_date_expr}).order_by('priority_id', 'due_date', 'title')
If your logic is too complicated, you might need to implement it as a stored procedure in your database.
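On newer Django versions, annotate() with database expressions is generally preferred over extra(). A hypothetical sketch, assuming purely for illustration that the due date were created_date plus seven days (the real due_date logic is not shown in the question):

from datetime import timedelta
from django.db.models import DateTimeField, ExpressionWrapper, F

taskList = Task.objects.annotate(
    due_date=ExpressionWrapper(
        F('created_date') + timedelta(days=7),  # stand-in for the real due-date logic
        output_field=DateTimeField(),
    )
).order_by('priority_id', 'due_date', 'title')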
Alternatively, if your data set is very small (say, tens to a few hundred records), you can fetch the entire result set in a list and sort it post-factum:
taskList = list(Task.objects.all())
taskList.sort(key=key_function)  # e.g. key=lambda t: (t.priority_id, t.due_date(), t.title)
The answer by @lanzz, even though it seems correct, didn't work for me, but this answer from another thread did the magic for me:
https://stackoverflow.com/a/37648265/6420686
from django.db.models import Case, When
ids = [list of ids]
preserved = Case(*[When(id=pk, then=pos) for pos, pk in enumerate(ids)])
filtered_users = User.objects \
    .filter(id__in=ids) \
    .order_by(preserved)
You can use sort in Python if the queryset is not too large:
ordered = sorted(Task.objects.all(), key=lambda o: (o.priority_id, o.due_date(), o.title))