Storing a binary hash value in a Django model field - python

I have a twenty-byte binary hash that I would like to store in a Django model.
If I use a text field, it's interpreted as Unicode and it comes back garbled.
Currently I'm encoding it and decoding it, which really clutters up the code,
because I have to be able to filter by it:
    def get_changeset(self):
        # bin() here is presumably Mercurial's node codec (hex string ->
        # raw bytes), not the Python builtin.
        return bin(self._changeset)

    def set_changeset(self, value):
        # hex() likewise encodes the raw node to a hex string for storage.
        self._changeset = hex(value)

    changeset = property(get_changeset, set_changeset)
Here's an example of filtering:

    Change.objects.get(_changeset=hex(ctx.node()))
This is the approach that was recommended by a Django developer, but I'm really struggling to come to terms with the fact that it's this ugly just to store twenty bytes.
Maybe I'm too much of a purist, but ideally I would be able to write:

    Change.objects.get(changeset=ctx.node())

The properties allow me to write:

    change.changeset = ctx.node()
So that's as good as I can ask.

Starting with 1.6, Django has a BinaryField, which allows storing raw binary data. However, for hashes and other values up to 128 bits it's more efficient (at least with the PostgreSQL backend) to use the UUIDField available in Django 1.8+.
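A minimal sketch of both options (the model and field names are hypothetical):

    import uuid
    from django.db import models

    class Change(models.Model):
        # Raw bytes of any length; maps to bytea on PostgreSQL (Django 1.6+).
        changeset = models.BinaryField()

    class ApiKey(models.Model):
        # 128-bit values map to PostgreSQL's native uuid type (Django 1.8+).
        token = models.UUIDField(default=uuid.uuid4, unique=True)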

I'm assuming if you were writing raw SQL you'd be using a Postgres bytea or a MySQL VARBINARY. There's a ticket with a patch (marked "needs testing") that purportedly makes a field like this (Ticket 2417: Support for binary type fields (aka: bytea in postgres and VARBINARY in mysql)).
Otherwise, you could probably try your hand at writing a custom field type.
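The custom-field route can be fairly small. A minimal, untested sketch that maps to the column types the ticket mentions (the field name is hypothetical):

    from django.db import models

    class BinaryHashField(models.Field):
        def db_type(self, connection):
            # Pick the backend's native binary column type.
            if connection.vendor == 'postgresql':
                return 'bytea'
            return 'VARBINARY(20)'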

"I have a twenty byte hex hash that I would like to store in a django model."
Django does this. They use hex digests, which are -- technically -- strings. Not bytes.
Do not use someHash.digest() -- you get bytes, which you cannot easily store.
Use someHash.hexdigest() -- you get a string, which you can easily store.
Edit -- The code is nearly identical.
See http://docs.python.org/library/hashlib.html
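For illustration, the difference in hashlib terms (a SHA-1 digest is the twenty-byte case from the question):

    import hashlib

    h = hashlib.sha1(b"some content")
    h.digest()     # 20 raw bytes -- awkward to store in a text column
    h.hexdigest()  # 40-character hex string -- fits a CharField(max_length=40)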

You could also write your own custom Model Manager that does the escaping and unescaping for you.
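A minimal sketch of that idea, using binascii in place of Mercurial's helpers (all names here are hypothetical):

    from binascii import hexlify
    from django.db import models

    class ChangeManager(models.Manager):
        def get_by_node(self, node_bytes):
            # Hex-encode the raw 20 bytes before querying the text column.
            return self.get(_changeset=hexlify(node_bytes).decode('ascii'))

    class Change(models.Model):
        _changeset = models.CharField(max_length=40)
        objects = ChangeManager()

Calling code then reads as Change.objects.get_by_node(ctx.node()).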

If this issue is still of interest, Disqus' django-bitfield fits the bill:
https://github.com/disqus/django-bitfield
... the example code on GitHub is a little confusing at first with regard to the module's actual function, because of the asinine variable names -- generally I'm hardly the sort of person with either the wherewithal or the high ground to take someone else's goofy identifiers to task... but flaggy_foo? Seriously, you guys.
If that project isn't to your taste, and you're on Postgres, you have a lot of excellent options, as many people have written and released code for an assortment of Django fields that take advantage of Postgres' native types. Here's an hstore model field:
https://github.com/jordanm/django-hstore -- I have used this and it works well.
Here's a full-text search implementation that uses Postgres' tsvector type:
https://github.com/aino/django-pgindex
And while I cannot vouch for this specific project, there are Django bytea fields as well:
https://github.com/aino/django-arrayfields

Related

Django. Check in template or use an extra field?

I have some text fields in my Django model that are filled by a script, with values in English (the list of values is known).
But the app is actually made for Russian clients only. I'd like to translate those fields into Russian, and here comes a little question. These values are taken from an API response, which means I have to check each value in order to translate it. What's faster: to check and translate the fields in the template, or to add extra fields and translate the strings in the Python script?
The problem is the overhead of compiling templates when rendering. The more complicated the template gets (method calls, etc.), the slower rendering tends to be (templates are compiled at render time, somewhat like .py files being compiled to .pyc). Django has template caching, but that too is limited (I don't know by how much). I have run into performance issues caused by putting a lot of logic in templates. Plus it's always good to have a dumb client (the template). I would prefer the Python approach because of the idea of keeping the client thin, not because of the performance gap. Also, if tomorrow you need to add one more language, changing the templates is always going to be more difficult than changing the server.
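A minimal sketch of the Python-side approach, assuming the known list of English values (the names and translations here are hypothetical):

    # Known API values mapped to their Russian translations.
    TRANSLATIONS = {
        "pending": u"в ожидании",
        "approved": u"одобрено",
    }

    def translate(value):
        # Fall back to the English value if no translation is known.
        return TRANSLATIONS.get(value, value)

    record.status_ru = translate(api_response["status"])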

The maximum number of objects that can be instantiated with a Django model?

I wrote an app to record user interactions with the website's search box;
the query string is saved as an object of the model SearchQuery. Whenever a user enters some data in the search box, I save the search query and some related info in the database.
The idea is to track search trends.
The fields in my database model are:
A Character Field (max_length=30)
A PositiveIntegerField
A BooleanField
My questions are:
How many objects can be instantiated from the model SearchQuery? Is there a limit?
As the objects are not related (no DB relationships), should I use MongoDB or some kind of NoSQL store for performance?
Is this a good design, or should I do some more work to make it efficient?
Django version 1.6.5
Python version 2.7
How many objects can be instantiated from the model SearchQuery? Is there a limit?
As many as your chosen database can handle; this probably runs into the millions. If you are concerned, you can use a scheduler to delete older queries when they are no longer useful.
As the objects are not related (no DB relationships), should I use MongoDB or some kind of NoSQL store for performance?
You could, but it's unlikely to give you much (if any) efficiency gain. Because you are doing frequent writes and (presumably) infrequent reads, you are unlikely to hit the database very hard at all.
Is this a good design, or should I do some more work to make it efficient?
There are two recommendations I'd make:
a. If you are going to be doing frequent reads on the search log, look at using multiple databases: one for your log, and one for everything else.
b. Consider just using a regular log file for this information. Again, you will probably only be examining this data infrequently, so there are strong arguments for piping it into a log file, probably CSV-like, to make data analysis easier.
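A minimal sketch of option (b), assuming Python 2.7 as stated in the question (the file name and fields are hypothetical):

    import csv

    def log_search(query, result_count, was_found):
        # Append one CSV row per search; analyze the file offline.
        with open("search_queries.csv", "ab") as f:
            csv.writer(f).writerow(
                [query.encode("utf-8"), result_count, was_found])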

How to implement database compatible python objects with arbitrary fields/arbitrary number of fields

I'm working on creating a Python program to interact with many different types of conceptual objects. For example, it might represent a person, in which case it'd have something like this:

    type = "person"
    name = "Bono"
    profession = "performer"
    nationality = "Irish"

However, it might also represent a magazine, in which case it'd look something like this:

    type = "publication"
    name = "Rolling Stone"
    editor = ("Jann Wenner", "Will Dana")
    founding_year = "1967"
Aside from type and name, all of the other fields are optional. Here's the tricky bit -- it's part of code written for a scraper, so all of the other fields are determined/created on the fly. In other words, we won't know that we need an "editor" field until the scraper spits back "editor" to the code.
Ideally, this would be implemented fairly straightforwardly as a python dictionary of lists. However, we'll be working with a large number of records -- too many to keep in memory at the same time. As a result, I'd like to have database compatibility -- something like Django's MVC, so we can easily query the record set.
One option I had considered was Django fieldsets, but it looks like they're still in beta and I worry that I'll lose some generality in what I can store -- ideally, I'd be able to store any type of data with a key, (value_list) pair. I'd love any input on the feasibility of fieldsets or example code.
Another option I had considered was a combination of the Django MVC and JSON. In this case, I'd have three columns for each object -- type, name, and attributes. Attributes would be a JSON serialization (or other appropriate pickling method) of all of the other attributes, so that once you had the object, you could reconstitute its attributes and query the set. I'd store something like this or this (links). With this method, I'd lose the ability to easily search on any of the attributes in the dict (a minimal sketch of this idea is below).
I'd very much appreciate any input or guidance. If anyone knows of similar projects, I'd love to know.
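For concreteness, a minimal sketch of that second option (the model and field names are hypothetical):

    import json
    from django.db import models

    class Entity(models.Model):
        type = models.CharField(max_length=50)
        name = models.CharField(max_length=200)
        attributes = models.TextField()  # JSON dict of everything else

        def set_attrs(self, attrs):
            self.attributes = json.dumps(attrs)

        def get_attrs(self):
            return json.loads(self.attributes)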
This seems like an excellent opportunity to use a NoSQL database. Something like MongoDB doesn't rely on a fixed schema, so it might be suitable for your scenario.
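For example, with MongoDB via pymongo (the connection, database, and collection names are assumptions), arbitrary per-record attributes need no schema at all:

    from pymongo import MongoClient

    entities = MongoClient().scraper_db.entities

    entities.insert_one({
        "type": "publication",
        "name": "Rolling Stone",
        "editor": ["Jann Wenner", "Will Dana"],
        "founding_year": "1967",
    })

    # Any attribute remains directly queryable:
    doc = entities.find_one({"editor": "Jann Wenner"})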

Storing a python set in a database with django

I need to store a Python set in a database for later access. What's the best way to go about doing this? My initial plan was to use a TextField on my model and just store the set as a comma- or pipe-delimited string; when I need to pull it back out for use in my app, I could initialize a set by calling split on the string. Obviously, if there is a simple way to serialize the set so I can pull it back out as a set later, that would be best.
If your database is better at storing blobs of binary data, you can pickle your set. Actually, pickle's default protocol (in Python 2) stores data as ASCII text, so it might be better than the delimited-string approach anyway. Just pickle.dumps(your_set), and later unpickled = pickle.loads(database_string).
There are a number of options here, depending on what kind of data you wish to store in the set.
If it's regular integers, CommaSeparatedIntegerField might work fine, although it often feels like a clumsy storage method to me.
If it's other kinds of Python objects, you can try pickling it before saving it to the database, and unpickling it when you load it again. That seems like a good approach.
If you want something human-readable in your database though, you could even JSON-encode it into a TextField, as long as the data you're storing doesn't include Python objects.
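For the JSON route, note that sets are not JSON-serializable directly, so round-trip through a list:

    import json

    encoded = json.dumps(list(my_set))   # store this in the TextField
    restored = set(json.loads(encoded))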
Redis natively stores sets (as well as other data structures: lists, hashes, sorted sets, and more) and provides set operations -- and it's rocket fast too. I find it's the Swiss Army knife of Python development.
I know it's not a relational database per se, but it does solve this problem very concisely.
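A quick sketch with the redis-py client (the connection details are assumptions):

    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    r.sadd("myset", "a", "b", "c")   # members are stored natively as a set
    members = r.smembers("myset")
    r.sismember("myset", "a")        # membership tests run server-side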
What about CommaSeparatedIntegerField?
If you need another type (strings, for example), you can create your own field that works like CommaSeparatedIntegerField but uses strings (ones without commas in them).
Or, probably a better way of doing it: keep a dictionary that maps integers to your values, as sketched below.
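A minimal sketch of that mapping idea (the codes and values are hypothetical):

    # Stable integer codes mapped to your string values.
    CODES = {1: "red", 2: "green", 3: "blue"}
    LOOKUP = dict((v, k) for k, v in CODES.items())

    stored = ",".join(str(LOOKUP[s]) for s in my_set)         # for the field
    restored = set(CODES[int(i)] for i in stored.split(","))  # back to a set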

Is this a good approach to avoid using SQLAlchemy/SQLObject?

Rather than use an ORM (SQLObject/SQLAlchemy), I am considering the following approach in Python and MySQL. I would like to get some feedback on whether this seems likely to have any negative long-term consequences, since in the short term it seems fine from what I can tell.
Rather than translate a row from the database into an object:
each table is represented by a class
a row is retrieved as a dict
an object representing a cursor provides access to a table like so:

    cursor.mytable.get_by_ids(low, high)

removing means setting the time_of_removal to the current time
So essentially this does away with the need for an ORM since each table has a class to represent it and within that class, a separate dict represents each row.
Type mapping is trivial because each dict (row) is a first-class object in Python, so you always know the class of each value; besides, the low-level database library in Python handles converting types at the field level into their appropriate application-level types.
If you see any potential problems with going down this road, please let me know. Thanks.
That doesn't do away with the need for an ORM. That is an ORM. In which case, why reinvent the wheel?
Is there a compelling reason you're trying to avoid using an established ORM?
You will still be using SQLAlchemy. The rows a ResultProxy returns from .fetchmany() (or similar) already behave like dictionaries.
Use SQLAlchemy as a tool that makes managing connections and executing statements easier. The documentation is separated into sections, so you can read just the part that you need.
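A minimal sketch of that style of usage, with no ORM involved (the URL and table name are assumptions):

    from sqlalchemy import create_engine, text

    engine = create_engine("sqlite:///example.db")

    with engine.connect() as conn:
        result = conn.execute(
            text("SELECT * FROM mytable WHERE id BETWEEN :low AND :high"),
            {"low": 1, "high": 100},
        )
        for row in result.fetchmany(10):  # iterate over a batch of rows
            print(row)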
web.py has a decent db abstraction too (not an ORM).
Queries are written in SQL (not specific to any RDBMS), but your code remains compatible with any of the supported databases (sqlite, mysql, postgresql, and others).
from http://webpy.org/cookbook/select:

    import web

    db = web.database(dbn='sqlite', db='example.db')  # connection details assumed
    myvar = dict(name="Bob")
    results = db.select('mytable', myvar, where="name = $name")
