I have a SQLite database from which I am pulling data for a specific set of dates (let's say 01-01-2011 to 01-01-2011). What is the best way to implement this query in SQL? Ideally I would like the following line to run:
SELECT * FROM database where start_date < date_stamp and end_date > date_stamp
This obviously does not work when I store the dates as strings.
My solution (which I think is messy, and I am hoping for a better one) is to convert the dates into integers in the following format:
YYYYMMDD
Which makes the above line able to run (theoretically). Is there a better method?
Using python sqlite3
Would the answer be any different if I were using a different SQL database rather than SQLite?
For SQLite this is the best approach, as integer comparisons are much faster than string comparisons or any date/time manipulation.
You should store the dates in one of the supported date/time datatypes, then comparisons will work without conversions, and you would be able to use the built-in date/time functions on them.
(Whether you use strings or numbers does not matter for speed; database performance is mostly determined by the amount of I/O needed.)
In other SQL databases that have a built-in date datatype, you could use that.
(However, this is usually not portable.)
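To make the suggestion above concrete: with Python's sqlite3 and ISO-8601 text dates, the original query shape works as-is, because ISO strings compare correctly as text. The table and column names here are illustrative, matching the question's sketch:

```python
import sqlite3

# Hypothetical table mirroring the question's schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (name TEXT, start_date TEXT, end_date TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("inside", "2010-12-01", "2011-02-01"),
        ("outside", "2012-05-01", "2012-06-01"),
    ],
)

# ISO-8601 strings (YYYY-MM-DD) sort lexicographically in date order,
# so the original comparison query needs no integer conversion.
date_stamp = "2011-01-01"
rows = con.execute(
    "SELECT name FROM events WHERE start_date < ? AND end_date > ?",
    (date_stamp, date_stamp),
).fetchall()
print(rows)  # [('inside',)]
```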
Related
What would be the best approach to handle the following case with Django?
Django needs access to a database (in MariaDB) in which datetime values are stored in UTC, except for one table that has all values for all of its datetime columns stored in local timezone (obviously different from UTC). This particular table is being populated by a different system, not Django, and for some reasons we do not have the option to convert the timestamps in that table to UTC or to change that system to start storing values in UTC. The queries involving that table are read-only, but may join data from other tables. The table itself does not have a foreign key, but other tables have foreign keys to it. The table is very big (millions of rows), and one of its datetime columns is part of more than one index used for optimized queries.
I am asking your opinion for an approach to the above case that would be as seamless as it can be, preferably without doing conversions here and there in various parts of the codebase while accessing and filtering on the datetime fields of this "problematic" table / model. I think an approach at the model layer, which will let Django ORM work as if the values for that table were stored in UTC timezone, would be preferable. Perhaps a solution based on a custom model field that does the conversions from and back to the database "transparently". Am I thinking right? Or perhaps there is a better approach?
It is what it is. If you have different timezones, then you need to convert them all to the one you prefer. And there is really no such thing as "for reasons we cannot have the option to convert the timestamps in that table to UTC": if that is truly the constraint, too bad, you should have thought about it earlier and now have to deal with it. In practice it rarely is the constraint; this is "programming", after all, and everything can be changed.
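For what it's worth, the conversion logic such a custom model field would wrap (its `from_db_value`/`get_prep_value` pair) is small. Here is a plain-Python sketch using the stdlib zoneinfo module; the choice of "Europe/Athens" as the table's local timezone is a made-up assumption, swap in your own:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed local timezone of the "problematic" table (hypothetical).
LOCAL_TZ = ZoneInfo("Europe/Athens")

def local_naive_to_utc(value: datetime) -> datetime:
    """What a custom field's from_db_value could do: interpret the naive
    local-time value read from the database as LOCAL_TZ, return aware UTC."""
    return value.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)

def utc_to_local_naive(value: datetime) -> datetime:
    """The reverse direction (get_prep_value): aware UTC -> naive local,
    so filters written in UTC compare against the stored local values."""
    return value.astimezone(LOCAL_TZ).replace(tzinfo=None)

stored = datetime(2023, 6, 1, 15, 30)        # naive local wall-clock time
as_utc = local_naive_to_utc(stored)
print(as_utc)                                # 2023-06-01 12:30:00+00:00 (EEST = UTC+3)
assert utc_to_local_naive(as_utc) == stored  # round-trips cleanly
```

One caveat worth noting: naive local timestamps are ambiguous around DST transitions, so a field like this can never be fully lossless for the repeated hour.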
Say you have a column that contains the year, month, and day. Is it possible to get just the year? In particular I have
ALTER TABLE pmk_pp_disturbances.disturbances_natural ADD COLUMN sdate timestamp without time zone;
and want just the 2004 from 2004-08-10 05:00:00. Can this be done with Postgres or must a script parse the string? By the way, any rules as to when to "let the database do the work" vs. let the script running on the local computer do the work? I once heard querying databases is slower than the rest of the program written in C/C++, generally speaking.
You can use extract:
SELECT extract('year' FROM sdate) FROM pmk_pp_disturbances.disturbances_natural;
For many queries it's worth investigating whether the database can perform the data transformations as needed. That being said, it also depends on what your application will do with the data so it's a trade-off as to whether the work should be done by the database or in the application.
SELECT date_part('year', your_column) FROM your_table;
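The same idea can be tried out without a Postgres instance using Python's bundled sqlite3, where strftime plays the role of extract/date_part (the table and column names below are made up for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE disturbances (sdate TEXT)")
con.execute("INSERT INTO disturbances VALUES ('2004-08-10 05:00:00')")

# strftime('%Y', ...) is SQLite's rough equivalent of Postgres
# extract('year' FROM ...); note it returns text, not a number.
year = con.execute(
    "SELECT strftime('%Y', sdate) FROM disturbances"
).fetchone()[0]
print(year)  # '2004'
```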
I think not; you are forced to read the entire value of the column. You could split the date across several columns (one for the year, another for the month, and so on), or store the date in an integer format if you want an aggressive space optimization, but that makes the database worse in terms of scalability and future modifications.
Databases are slower than C/C++ code, you have to accept that, but they make hard things much easier.
If you are thinking of making a game and keeping your save-game data in SQL, forget it. Use a database when you are building a back-end server, a management application, a tool, etc.
I have a complex django ORM query that I'd really rather not have to convert to raw SQL, because it's a very non-trivial query, I want consistency, I use a number of the ORM features to generate the query, and it's been thoroughly tested as it stands.
I want to add a single filter to the WHERE clause on a datetime field. However, I want to test against the date part only, not the time.
Here's a simplified version of my existing query:
MyTable.objects.filter(date_field__gte=datetime.now().date())
But I've converted the date_field to datetime_field for more precision in some scenarios. In this scenario, however, I still want a date-only comparison. Something like:
MyTable.objects.filter(datetime_field__datepartonly__gte=datetime.now().date())
In postgres, my database of choice, that's simple:
SELECT * FROM mytable WHERE DATE(datetime_field) >= ...
How can I do this in django, without converting the entire query to raw SQL?
I tried using F(), but you can only specify field names, not custom SQL.
I tried using Q(), but same deal.
I tried Django's SQL functions (Sum, etc.), but there are only a few, and it looks like they're designed solely for aggregate queries.
I tried using an alias, but you can't use aliases in a WHERE clause, either in Django or in SQL.
The year, month and day lookups on the datetime field are available to you, but after testing here, they don't seem to allow chaining an additional __gte onto the field.
This will work:
now = datetime.now()
results = MyTable.objects.filter(datetime_field__year=now.year, datetime_field__month=now.month, datetime_field__day=now.day)
But doesn't allow gte.
you can always just create a datetime starting at 0:00
now = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
results = MyTable.objects.filter(datetime_field__gte=now)
When storing a time in Python (in my case in ZODB, but applies to any DB), what format (epoch, datetime etc) do you use and why?
The datetime module has the standard types for modern Python handling of dates and times, and I use it because I like standards (I also think it's well designed); I typically also have timezone information via pytz.
Most DBs have their own standard way of storing dates and times, of course, but modern Python adapters to/from the DBs typically support datetime (another good reason to use it;-) on the Python side of things -- for example that's what I get with Google App Engine's storage, Python's own embedded SQLite, and so on.
If the database has a native date-time format, I try to use that even if it involves encoding and decoding. Even if this is not 100% standard, as with SQLite, I would still use the date and time adapters described near the bottom of the sqlite3 documentation.
In all other cases I would use ISO 8601 format unless it was a Python object database that stores some kind of binary encoding of the object.
ISO 8601 format is sortable and that is often required in databases for indexing. Also, it is unambiguous so you know that 2009-01-12 was in January, not in December. The people who change the position of month and day, always put the year last, so putting it first stops people from automatically assuming an incorrect format.
Of course, you can reformat however you want for display and input in your applications but data in databases is often viewed with other tools, not your application.
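The sortability claim above is easy to check: lexicographic order on ISO-8601 strings matches chronological order, which is what makes them safe for string-typed database columns and indexes.

```python
from datetime import date

dates = [date(2009, 1, 12), date(2004, 8, 10), date(2011, 1, 1)]
iso = [d.isoformat() for d in dates]

# Sorting the strings gives the same order as sorting the dates.
assert sorted(iso) == [d.isoformat() for d in sorted(dates)]
print(sorted(iso))  # ['2004-08-10', '2009-01-12', '2011-01-01']
```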
Seconds since epoch is the most compact and portable format for storing time data. Native DATETIME format in MySQL, for example, takes 8 bytes instead of 4 for TIMESTAMP (seconds since epoch). You'd also avoid timezone issues if you need to get the time from clients in multiple geographic locations. Logical operations (for sorting, etc.) are also fastest on integers.
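A small sketch of the epoch approach in Python; using timezone-aware UTC datetimes sidesteps the local-timezone ambiguity that naive values would reintroduce:

```python
from datetime import datetime, timezone

# Store as an integer: seconds since the Unix epoch (always UTC).
dt = datetime(2011, 1, 1, 12, 0, tzinfo=timezone.utc)
epoch_seconds = int(dt.timestamp())

# Plain integer comparisons double as chronological comparisons.
later = int(datetime(2011, 1, 2, tzinfo=timezone.utc).timestamp())
assert later > epoch_seconds

# Round-trip back to a datetime for display.
restored = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
assert restored == dt
```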
I'm trying to create a system in Python in which one can select a number of rows from a set of tables, which are to be formatted in a user-defined way. Let's say a table has a set of columns, some of which hold a date or timestamp value. The user-defined format for each column should be stored in another table, then queried and applied to the main query at runtime.
Let me give you an example: There are different ways of formatting a date column, e.g. using
SELECT to_char(column, 'YYYY-MM-DD') FROM table;
in PostgreSQL.
For example, I'd like the second parameter of the to_char() builtin to be queried dynamically from another table at runtime, and then applied if it has a value.
Reading the definition from a table is not that much of a problem; the harder part is designing a database schema to receive data from a user interface in which a user selects which formatting instructions to apply to the different columns. The user should be able to pick which columns to include in their query, as well as their user-defined formatting for each column.
I've been thinking about how to do this elegantly and efficiently for some days now, but to no avail. Having the user put their desired definition in a text field and splicing it into a query would pretty much be an invitation for SQL injection attacks (although I could use escape() functions), and storing every possible combination doesn't seem feasible to me either.
It seems to me a stored procedure or a sub-select would work well here, though I haven't tested it. Let's say you store a date_format for each user in the users table.
SELECT to_char(column, (SELECT date_format FROM users WHERE users.id = 123)) FROM table;
Your mileage may vary.
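A runnable sketch of the per-user-format idea, using Python's bundled sqlite3 with strftime standing in for to_char (the users/events tables and the user id are invented for the demo). The key point for the injection worry: the format string comes out of a table, but it is always passed as a bound parameter, never interpolated into the SQL text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, date_format TEXT);
    CREATE TABLE events (happened_at TEXT);
    INSERT INTO users VALUES (123, '%Y-%m-%d');
    INSERT INTO events VALUES ('2004-08-10 05:00:00');
""")

# Fetch the user's stored format, then bind it as a parameter.
# Even though the value is user-controlled, it never becomes part
# of the SQL string, so there is no injection surface.
fmt = con.execute(
    "SELECT date_format FROM users WHERE id = ?", (123,)
).fetchone()[0]
rows = con.execute(
    "SELECT strftime(?, happened_at) FROM events", (fmt,)
).fetchall()
print(rows)  # [('2004-08-10',)]
```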
Pull the dates out as Unix timestamps and format them in Python:
SELECT date_part('epoch', my_col::timestamp) FROM my_table;
my_date = datetime.datetime.fromtimestamp(row[0]) # Or equivalent for your toolkit
I've found a couple of advantages to this approach: Unix timestamps are the most space-efficient common format, and the approach is effectively language-neutral. Also, the language you're querying the database from is richer than the underlying database, giving you plenty of options if you start wanting friendlier formatting like "today", "yesterday", "last week", or "June 23rd".
I don't know what sort of application you're developing but if it's something like a web app which will be used by multiple people I'd also consider storing your database values in UTC so you can apply user-specific timezone settings when formatting without having to consider them for all of your database operations.
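The "friendlier formatting" point above is where doing the work in Python shines. A minimal sketch of the kind of relative labels meant there (the thresholds and fallback format are arbitrary choices for illustration):

```python
from datetime import date

def friendly(d: date, today: date) -> str:
    """Turn a date into a human label, falling back to 'Month DD'."""
    delta = (today - d).days
    if delta == 0:
        return "today"
    if delta == 1:
        return "yesterday"
    if 1 < delta <= 7:
        return "last week"
    return d.strftime("%B %d")

today = date(2011, 6, 24)
assert friendly(date(2011, 6, 24), today) == "today"
assert friendly(date(2011, 6, 23), today) == "yesterday"
assert friendly(date(2011, 6, 20), today) == "last week"
print(friendly(date(2011, 3, 5), today))  # March 05
```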