I'm playing with an API and it gives me ids like this: ff869d1f-0923-4d28-8577-4c36291f0fca
I wonder whether the id is encoded in a specific format and whether I can convert it into an integer. I'm working in Python.
Your ID is a UUID. Technically it is a 128-bit integer, but since such long numbers are impractical to work with, UUIDs are usually presented as dash-separated groups of hexadecimal digits, like your ff869d1f-0923-4d28-8577-4c36291f0fca.
Python, as well as most modern databases, offers direct support for UUIDs, so in most cases it is not helpful to convert them.
Exact details about UUIDs, their types (versions), fields and conversion options can be found in the documentation of Python's uuid module.
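If you do want the integer anyway, here is a minimal sketch using the standard uuid module (nothing in it is specific to your API):

import uuid

u = uuid.UUID("ff869d1f-0923-4d28-8577-4c36291f0fca")
n = u.int                  # the UUID as a plain 128-bit Python int
print(n)
print(uuid.UUID(int=n))    # round-trips back to the original representation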
In the beginning I used VARCHAR, at least until I found out that it supposedly doesn't support Unicode, for which reason I then switched to NVARCHAR. However, I encountered the need to insert invalid Unicode data into the database (stemming from the fact that on Linux paths are arbitrary bytes and therefore can contain messed up Unicode data).
So now I have to switch the data type again, but I'm stumped on the possibilities. After looking at the SQLite docs, the following is "only a small subset of the datatype names that SQLite will accept":
CHARACTER
VARCHAR
VARYING CHARACTER
NCHAR
NATIVE CHARACTER
NVARCHAR
TEXT
CLOB
I'm aware that I could store file paths as BLOBs, but it doesn't feel right to store data that is text most of the time as a BLOB.
Is there a preferred/idiomatic SQL data type to use for this use case: text that's Unicode compliant most of the time?
If possible, that data type should transparently convert to and from Python's str type with surrogateescape (surrogateescape is Python's way of still being able to represent faulty file paths as str).
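To make the round trip concrete, here is roughly what I mean by transparent conversion (using BLOB only for illustration, since that's what I'd like to avoid; the table and column names are made up):

import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path BLOB)")

path = "caf\udce9.txt"              # str containing a surrogate escape
raw = os.fsencode(path)             # back to the original arbitrary bytes
conn.execute("INSERT INTO files VALUES (?)", (raw,))

stored = conn.execute("SELECT path FROM files").fetchone()[0]
print(os.fsdecode(stored) == path)  # True: surrogateescape round-trips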
I couldn't find help for this use case by googling, so I'd appreciate any help or pointers!
I have a list of strings. Some of them can be successfully entered into a Postgres table as valid datetime values; some are invalid. I want to separate them out in Python - is there a library function that can do this? I would even be happy to see PostgreSQL's underlying functionality for determining whether a string is a valid datetime, so that I can build the Python function myself. This is especially needed because PostgreSQL seems to have arbitrary rules on what constitutes a valid date - for example, '5-31,2019' is not valid while '5,31-2019' is.
The Postgres date routines all produce valid dates, and the conversion routines also produce valid dates given sufficient information. If you intentionally use non-standard formats, you can always find strings that fail to convert. In your examples, one string happened to match the default settings for converting ambiguous strings while the other didn't. You do not have to rely on the defaults; you can tell Postgres how to interpret the string using the format specification of the to_date function. Doing so, both your examples convert to the same valid date.
select to_date('5-31,2019','mm-dd,yyyy')
union all
select to_date('5,31-2019','mm,dd-yyyy');
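If you want to mirror that check in Python before loading, a minimal sketch (assuming you know the handful of formats you expect; the function name is made up) could look like this:

from datetime import datetime

# Formats taken from the examples above; extend as needed.
FORMATS = ("%m-%d,%Y", "%m,%d-%Y")

def is_valid_date(s):
    # True if s parses under any of the expected formats.
    for fmt in FORMATS:
        try:
            datetime.strptime(s, fmt)
            return True
        except ValueError:
            pass
    return False

candidates = ["5-31,2019", "5,31-2019", "not a date"]
valid = [s for s in candidates if is_valid_date(s)]
invalid = [s for s in candidates if not is_valid_date(s)]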
How can I find the size of a particular field that Scapy supports? For example, what is the size of LongField (in bytes)? I checked the Scapy documentation (http://www.secdev.org/projects/scapy/doc/build_dissect.html#fields) but couldn't find it.
Is there any function to check it, or any documentation which shares this info?
Just have a look at the source code. Scapy's built-in fields use struct.pack() to convert integers to Big- and Little-Endian representations of various lengths. LongField, which is an implementation of Field, serializes with fmt=Q, which according to the Python documentation for struct is an unsigned long long with a size of 8 bytes (also see struct.calcsize).
Note that fields can be highly specific to one protocol. For fields based on the Field class you could try to access the instance's fmt attribute, but this can be inaccurate because your field may override its serialization method. A more generic solution is to spawn an instance of the field, serialize it and call len() on the result.
(Note: the links above point to an unofficial GitHub mirror.)
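A minimal sketch of both approaches (assuming a standard Scapy install; attribute details may differ slightly between versions):

import struct
from scapy.fields import LongField

field = LongField("example", 0)

# Approach 1: inspect the struct format string the field stores.
# Field prepends network byte order, so this is "!Q" here.
print(field.fmt, struct.calcsize(field.fmt))  # !Q 8

# Approach 2: serialize a default value and measure the result.
# Field.addfield(pkt, s, val) returns s plus the packed value.
print(len(field.addfield(None, b"", 0)))      # 8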
When storing a time in Python (in my case in ZODB, but applies to any DB), what format (epoch, datetime etc) do you use and why?
The datetime module has the standard types for modern Python handling of dates and times, and I use it because I like standards (I also think it's well designed); I typically also have timezone information via pytz.
Most DBs have their own standard way of storing dates and times, of course, but modern Python adapters to/from the DBs typically support datetime (another good reason to use it;-) on the Python side of things -- for example that's what I get with Google App Engine's storage, Python's own embedded SQLite, and so on.
If the database has a native date-time format, I try to use that even if it involves encoding and decoding. Even when the support is not 100% standard, as with SQLite, I would still use the date and time adapters described near the bottom of the sqlite3 documentation page.
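A minimal sketch of those sqlite3 adapters in action (the column is declared as TIMESTAMP so the default converter kicks in; note that recent Python versions deprecate the default adapters in favour of registering your own):

import sqlite3
from datetime import datetime

# PARSE_DECLTYPES makes sqlite3 apply the registered converters based on
# the declared column type, so datetime values round-trip transparently.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE events (created TIMESTAMP)")

now = datetime.now()
conn.execute("INSERT INTO events VALUES (?)", (now,))

row = conn.execute("SELECT created FROM events").fetchone()
print(type(row[0]), row[0] == now)  # <class 'datetime.datetime'> True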
In all other cases I would use ISO 8601 format unless it was a Python object database that stores some kind of binary encoding of the object.
ISO 8601 format is sortable, and that is often required in databases for indexing. It is also unambiguous, so you know that 2009-01-12 was in January, not December. The people who swap the positions of month and day always put the year last, so putting the year first stops anyone from automatically assuming an incorrect format.
Of course, you can reformat however you want for display and input in your applications but data in databases is often viewed with other tools, not your application.
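To illustrate the sortability point, a small sketch (plain Python, nothing database-specific):

from datetime import datetime

stamps = [datetime(2009, 1, 12), datetime(2008, 12, 1), datetime(2009, 2, 3)]
as_text = [d.isoformat() for d in stamps]

# Lexicographic order of ISO 8601 strings matches chronological order,
# which is why a plain text index on such a column sorts correctly.
print(sorted(as_text))
# ['2008-12-01T00:00:00', '2009-01-12T00:00:00', '2009-02-03T00:00:00']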
Seconds since the epoch is the most compact and portable format for storing time data. MySQL's native DATETIME type, for example, takes 8 bytes, whereas TIMESTAMP (seconds since the epoch) takes 4. You also avoid timezone issues if you need to get the time from clients in multiple geographic locations, and logical operations (for sorting, etc.) are fastest on integers.
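For completeness, a minimal sketch of the epoch-seconds approach (UTC in and out, so no timezone ever leaks into storage):

import time
from datetime import datetime, timezone

# Store: a plain integer, independent of server or client timezone.
stored = int(time.time())

# Load: reconstruct an aware datetime in UTC for display or arithmetic.
loaded = datetime.fromtimestamp(stored, tz=timezone.utc)
print(stored, loaded.isoformat())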
I have a twenty byte hex hash that I would like to store in a django model.
If I use a text field, it's interpreted as unicode and it comes back garbled.
Currently I'm encoding it and decoding it, which really clutters up the code,
because I have to be able to filter by it.
def get_changeset(self):
    # bin()/hex() are presumably Mercurial's node helpers (mercurial.node),
    # not the built-ins: they convert between the raw node and its hex form.
    return bin(self._changeset)

def set_changeset(self, value):
    self._changeset = hex(value)
changeset = property(get_changeset, set_changeset)
Here's an example for filtering
Change.objects.get(_changeset=hex(ctx.node()))
This is the approach that was recommended by a django developer, but I'm really struggling to come to terms with the fact that it's this ugly to just store twenty bytes.
Maybe I'm too much of a purist, but ideally I would be able to write
Change.objects.get(changeset=ctx.node())
The properties allow me to write:
change.changeset = ctx.node()
So that's as good as I can ask.
Starting with 1.6, Django has BinaryField, which allows you to store raw binary data. However, for hashes and other values of up to 128 bits it is more efficient (at least with the PostgreSQL backend) to use UUIDField, available in Django 1.8+.
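A minimal sketch of the BinaryField route (model and field names are made up, and a 20-byte SHA-1 is too large for UUIDField, which only covers 128 bits):

from django.db import models

class Change(models.Model):
    # Raw 20-byte hash; maps to bytea on PostgreSQL.
    changeset = models.BinaryField(max_length=20)

# With a recent enough Django, exact lookups on BinaryField accept bytes:
# Change.objects.get(changeset=ctx.node())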
I'm assuming if you were writing raw SQL you'd be using a Postgres bytea or a MySQL VARBINARY. There's a ticket with a patch (marked "needs testing") that purportedly makes a field like this (Ticket 2417: Support for binary type fields (aka: bytea in postgres and VARBINARY in mysql)).
Otherwise, you could probably try your hand at writing a custom field type.
"I have a twenty byte hex hash that I would like to store in a django model."
Django does this. They use hex digests, which are -- technically -- strings. Not bytes.
Do not use someHash.digest() -- you get bytes, which you cannot easily store.
Use someHash.hexdigest() -- you get a string, which you can easily store.
Edit -- The code is nearly identical.
See http://docs.python.org/library/hashlib.html
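A quick illustration of the difference (nothing Django-specific here):

import hashlib

h = hashlib.sha1(b"some content")

raw = h.digest()      # 20 raw bytes; awkward to put in a text column
text = h.hexdigest()  # 40-character str; stores cleanly in a CharField(max_length=40)

print(len(raw), len(text))         # 20 40
print(bytes.fromhex(text) == raw)  # True: the hex form round-trips losslessly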
You could also write your own custom Model Manager that does the escaping and unescaping for you.
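A minimal sketch of that manager idea (field and method names are made up; node.hex() assumes Python 3's bytes.hex()):

from django.db import models

class ChangeManager(models.Manager):
    # Accept raw bytes from callers and translate to the stored hex form.
    def get_by_node(self, node):
        return self.get(_changeset=node.hex())

    def filter_by_node(self, node):
        return self.filter(_changeset=node.hex())

class Change(models.Model):
    _changeset = models.CharField(max_length=40)
    objects = ChangeManager()

# Usage: Change.objects.get_by_node(ctx.node())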
If this issue is still of interest, Disqus' django-bitfield fits the bill:
https://github.com/disqus/django-bitfield
... the example code on GitHub is a little confusing at first w/r/t the module's actual function, because of the asinine variable names -- generally I am hardly the sort of person with either the wherewithal or the high ground to take someone else's goofy identifiers to task... but flaggy_foo?? Srsly, U guys.
If that project isn't to your taste, and you're on Postgres, you have a lot of excellent options as many people have written and released code for an assortment of Django fields that take advantage of Postgres' native type. Here's an hstore model field:
https://github.com/jordanm/django-hstore -- I have used this and it works well.
Here's a full-text search implementation that uses Postgres' tsvector type:
https://github.com/aino/django-pgindex
And while I cannot vouch for this specific project, there are Django bytea fields as well:
https://github.com/aino/django-arrayfields