SQLAlchemy, Postgres datetime without timezone - Python

I am trying to add timezone support to a python/sqlalchemy script. I have studied timezones and use pytz. I understand I should do as much as possible in UTC and only display local times. Due to the nature of the application, this is very easy.
Everything works, except that when I insert UTC data, it gets somehow converted to local time (BST) before entering the database, and I am completely lost why this happens and how I can avoid it.
My table (postgres) is defined as follows (relevant part only):
fpp=> \d foo;
          Table "public.foo"
  Column  |            Type             | Modifiers
----------+-----------------------------+-----------
 x        | integer                     |
 y        | integer                     |
 when_utc | timestamp without time zone |
I have debugged SQLAlchemy while it performs an insert. This is what happens:
2016-07-28 17:16:27,896 INFO sqlalchemy.engine.base.Engine INSERT INTO foo (x, y, "when_utc") VALUES (%(x)s, %(y)s, %(when_utc)s) RETURNING foo.id
2016-07-28 17:16:27,896 INFO sqlalchemy.engine.base.Engine {'when_utc': datetime.datetime(2016, 7, 11, 23, 0, tzinfo=<UTC>), 'y': 0, 'x': 0}
So it inserts 2016-07-11 23:00:00 UTC. When I query it on the command line with psql, this is what I find:
fpp=> select x,y,when_utc from foo;
x | y | when_utc
---+---+---------------------
0 | 0 | 2016-07-12 00:00:00
(1 row)
What is going on? I am adamant nothing modifies the field in between. It just seems to add the DST hour to my database entry. Why? How can I avoid this?

The problem is that your column type is timestamp without time zone, when it should instead be timestamp with time zone. In SQLAlchemy this is achieved by declaring the column with DateTime(timezone=True). Unfortunately the default is timezone=False. See the documentation for more information: https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.DateTime
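For example, a minimal Core sketch for the question's table (only the table and column names come from the question; the rest is an assumption):
from sqlalchemy import Column, DateTime, Integer, MetaData, Table

metadata = MetaData()

foo = Table(
    "foo",
    metadata,
    Column("x", Integer),
    Column("y", Integer),
    # timezone=True emits "timestamp with time zone" on PostgreSQL
    Column("when_utc", DateTime(timezone=True)),
)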

I was struggling with this problem as well. The previous answer is very helpful if you are using SQLAlchemy Core, but that wasn't my case: I was using the SQLAlchemy ORM. If you are using the ORM, you might be interested in how I solved it.
I did it by mapping the column like this:
import datetime

from sqlalchemy import DateTime
from sqlalchemy.orm import Mapped, mapped_column

class Event(Base):
    __tablename__ = "event"
    id: Mapped[int] = mapped_column(primary_key=True)
    occurred_on: Mapped[datetime.datetime] = mapped_column(DateTime(timezone=True))
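As a quick sanity check, a hedged sketch (Base, the engine and the session are assumed to be set up elsewhere in your project): with timezone=True, an aware UTC datetime round-trips as the same instant, although the driver may report it in the connection's time zone:
evt = Event(occurred_on=datetime.datetime(2016, 7, 11, 23, 0, tzinfo=datetime.timezone.utc))
session.add(evt)
session.commit()

# same instant; the offset shown depends on the connection's TimeZone setting
print(session.get(Event, evt.id).occurred_on)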

Related

Store timestamp to BigQuery with specific timezone

I have a CSV with timestamps in UTC+8.
whatever.csv:
timestamp
2020-09-09 11:42:33
2020-09-09 11:42:51
2020-09-09 11:49:29
I want to store them in BQ. After loading them, the result shows UTC instead of UTC+8.
The instants themselves are correct, but is there any way I can store the value like 2020-09-11 19:58:51 UTC+8, or anything similar, as long as it reflects the actual time zone of the timestamp?
Secondly, can I specify this in the field schema? I'm storing the data with a Python script and mapping the schema from a YAML file such as:
somefile.yaml:
schema:
  - name: "timestamp"
    type: "TIMESTAMP"
    mode: "NULLABLE"
You may need to state more about what you want to achieve to get better help.
For one, BigQuery always stores TIMESTAMP in UTC. My guess is that you don't really need the timestamp to be stored in a certain time zone (it's hard to see why how the value is stored would matter); you care more about how to display the timestamp in UTC+8. If my guess is right, there are two ways:
SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00", "UTC+8")
That approach requires you to decorate each of your timestamp columns. A set-once-for-all alternative is:
SET @@time_zone = "Asia/Shanghai";
-- All subsequent query will use time zone "Asia/Shanghai"
SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00");
Both output:
+------------------------+
| f0_ |
+------------------------+
| 2008-12-25 23:30:00+08 |
+------------------------+
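If you instead want the stored UTC instants themselves to be correct (the CSV values are local UTC+8 times with no offset), one option is to convert before loading. A hedged pandas sketch; only the file and column names come from the question:
import pandas as pd

df = pd.read_csv("whatever.csv", parse_dates=["timestamp"])
# interpret the naive values as UTC+8, then convert to UTC for BigQuery
df["timestamp"] = (
    df["timestamp"].dt.tz_localize("Asia/Shanghai").dt.tz_convert("UTC")
)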

Python - Filtering SQL query based on dates

I am trying to build a SQL query that will filter based on system date (Query for all sales done in the last 7 days):
import datetime
import pandas as pd
import psycopg2
con = psycopg2.connect(db_details)
cur = con.cursor()
df = pd.read_sql("""select store_name,count(*) from sales
where created_at between datetime.datetime.now() - (datetime.today() - timedelta(7))""",con=con)
I get an error
psycopg2.NotSupportedError: cross-database references are not implemented: datetime.datetime.now
You are mixing Python syntax into your SQL query. SQL is parsed and executed by the database, not by Python, and the database knows nothing about datetime.datetime.now(), datetime.date() or timedelta(). The specific error you see is caused by your Python code being interpreted as SQL: as SQL, datetime.datetime.now references the now column of the datetime table in the datetime database, which is a cross-database reference, and psycopg2 doesn't support queries that involve multiple databases.
Instead, use SQL parameters to pass in values from Python to the database. Use placeholders in the SQL to show the database driver where the values should go:
params = {
    # all rows after this timestamp, 7 days ago relative to 'now'
    'earliest': datetime.datetime.now() - datetime.timedelta(days=7),
    # if you must have a date *only* (no time component), use
    # 'earliest': datetime.date.today() - datetime.timedelta(days=7),
}
df = pd.read_sql("""
    select store_name, count(*) from sales
    where created_at >= %(earliest)s
    group by store_name""", params=params, con=con)
This uses placeholders as defined by the psycopg2 parameters documentation, where %(earliest)s refers to the earliest key in the params dictionary. datetime.datetime instances are directly supported by the driver.
Note that I also fixed your "7 days ago" expression, added the group by store_name that the count(*) aggregate requires, and replaced your BETWEEN syntax with >=; without a second date you are not querying for values between two dates, so use >= to limit the column to dates at or after the given date.
datetime.datetime.now() is not proper SQL syntax and thus cannot be executed by read_sql(). I suggest either using the correct SQL syntax that computes the current time, or creating variables for datetime.datetime.now() and datetime.today() - timedelta(7) and substituting them into your string.
Edit: Do not follow the second suggestion. See comments below by Martijn Pieters.
Maybe you should remove the Python code inside your SQL, compute your dates in Python, and then use strftime to convert them to strings.
Then you'll be able to use them in your SQL query.
Actually, you do not necessarily need any params or computations in Python. Just use the corresponding SQL statement, which should look like this:
select store_name,count(*)
from sales
where created_at >= now()::date - 7
group by store_name
Edit: I also added a group by which I think is missing.
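For completeness, a hedged sketch of running that statement through pandas, reusing the connection from the question:
df = pd.read_sql("""
    select store_name, count(*)
    from sales
    where created_at >= now()::date - 7
    group by store_name
""", con=con)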

Django/Python and Raw SQL Querying with PostgreSQL

I'm practicing my raw SQL querying in a Django project using cursor.execute.
Here's my Django models.py database schema:
class Client(models.Model):
    date_incorporated = models.DateTimeField(default=timezone.now)
And here's the psql description of the table:
# \d+ client
      Column       |           Type           | Modifiers | Storage
-------------------+--------------------------+-----------+---------
 date_incorporated | timestamp with time zone | not null  | plain
Here's where I get confused:
If I use psql to query the data from the table, I get:
# SELECT date_incorporated FROM client;
date_incorporated
------------------------
2017-06-14 19:42:15-04
2017-11-02 19:42:33-04
(2 rows)
This makes sense to me. The PostgreSQL docs show that this is (I believe) just a correctly formatted string on output, stored internally as a UTC timestamp.
When I go through Django using this query:
cursor.execute('SELECT date_incorporated FROM client;')
columns = [col[0] for col in cursor.description]
data = [dict(zip(columns, row)) for row in cursor.fetchall()]
(using the dictfetchall method from the Django docs)
...my date_incorporated field gets turned into a python datetime object.
{'date_incorporated': datetime.datetime(2017, 11, 2, 23, 42, 33, tzinfo=<UTC>)}
In this app I'm building, I wanted a user to be able to input raw SQL, and put that inputted string into the cursor.execute(rawSQL) function to be executed. I expected the output to be the same as the psql version.
If I was using the Django ORM, I might've expected the timestamp with time zone to be converted to a time-zone aware datetime object, but since I'm doing a raw SQL call, I expected to get back 2017-06-14 19:42:15-04, not a python datetime object.
Is the fetchall method still acting as the Django ORM and converting certain fields?
I believe this is the standard conversion performed by any interface driver.
You would get the same result even if you used py-postgresql, i.e. the cursor is doing the conversion according to the field type defined in the database.
Long story short, dictfetchall is not doing any conversion, but rather packaging the already-converted result from the cursor.
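If you really want the textual form that psql shows, one hedged workaround is to cast to text in the SQL itself, so the driver has nothing to convert (a sketch against the question's table):
cursor.execute("SELECT date_incorporated::text FROM client;")
# each value now arrives as a plain string, e.g. '2017-06-14 19:42:15-04'
rows = cursor.fetchall()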

How to retrieve only the year from timestamp column?

I have the following query that runs correctly on Postgres 9.3:
select distinct date_part('year', date_created)
from "Topic";
The intention is to return only the distinct years on the column date_created which is created thus:
date_created | timestamp with time zone | not null default now()
I need to turn it into a SQLAlchemy query but what I wrote does a select distinct on the date_created, not on the year, and returns the whole row, not just the distinct value:
topics = Topic.query.distinct(func.date_part('YEAR', Topic.date_created)).all()
How can I get only the distinct years from the table Topic?
Here are two variants:
Using ORM:
from sqlalchemy import func, distinct

result = session.query(distinct(func.date_part('YEAR', Topic.date_created)))
for row in result:
    print(row[0])
SQL Expression:
from sqlalchemy import func, select, distinct

query = select([distinct(func.date_part('YEAR', Topic.date_created))])
for row in session.execute(query):
    print(row[0])
SQLAlchemy syntax aside, you have a potential problem in your query.
Your data type is timestamptz (timestamp with time zone), which is a good choice. However, you cannot tell the year reliably from a timestamptz alone; you need to specify the time zone additionally. If you don't, the current time zone setting of the session is applied silently, which may or may not work for you.
Think of New Year's Eve: timestamptz '2016-01-01 04:00:00+00' - what year is it?
It's 2016 in Europe, but still 2015 in the USA.
You should make that explicit with the AT TIME ZONE construct to avoid sneaky mistakes:
SELECT extract(year FROM timestamptz '2016-01-01 04:00:00+00'
AT TIME ZONE 'America/New_York') AS year;
Detailed explanation:
Ignoring timezones altogether in Rails and PostgreSQL
date_part() and extract() do the same thing in Postgres; extract() is the SQL standard, so prefer that.
BTW, you could also just:
SELECT extract(year FROM date_created) AS year
FROM "Topic"
GROUP BY 1;
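Translating that back into SQLAlchemy, a hedged sketch: Postgres exposes AT TIME ZONE as the timezone(zone, ts) function, so func.timezone should render it (the zone name is only an example):
from sqlalchemy import distinct, extract, func

years = session.query(
    distinct(extract('year', func.timezone('America/New_York', Topic.date_created)))
).all()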
Use the extract function:
from sqlalchemy import extract

session.query(extract('year', Topic.date_created))
This is concept code, not tested.

SQLAlchemy: adding Integer column to DateTime

Given a table
class Order(db.Model):
    created_at = db.Column(db.DateTime)
    days = db.Column(db.Integer)
I need to compose a query with a filter like this: Order.created_at + Order.days < datetime.now(). The straightforward way doesn't work, since the result of adding an integer to a datetime is a double :) I came to this conclusion after a practical experiment with MySQL.
After searching a little I've found out correct SQL statement for MySQL which solves described issue:
SELECT *
FROM orders o
WHERE o.created_at + INTERVAL o.days DAY < '2014-06-10 14:22:00';
And what I'd like to ask is how to code the query above for sqlalchemy filter?
I've found one article here on Stack Overflow about how to use intervals, but DATEADD is missing in MySQL and I have no idea where I would need to bind an interval.
Sorry for my poor English, I hope I could explain my problem correctly :)
UPD: Maybe the right thing would be to define days as an Interval, but currently that's not an option.
MySQL has a function called TIMESTAMPADD which seems to do what you want. Here's how to use it in a query:
from datetime import datetime

from sqlalchemy import func, text

session.query(Order).filter(
    func.timestampadd(text('DAY'), Order.days, Order.created_at)
    < datetime(2014, 6, 10, 14, 22)
)
from sqlalchemy import func

# note: DATEADD does not exist in MySQL; see TIMESTAMPADD above
query = (session.query(Orders).filter(
    func.DATEADD('day', Orders.days, Orders.created_at) < '2014-06-10 14:22:00'))
Old question, but for anyone using PostgreSQL / SQLAlchemy 1.0.10+ you can use the below:
from datetime import datetime

from sqlalchemy.sql import cast

db = SQLAlchemy()  # instantiated

Order.query.filter(
    Order.created_at + cast('1 DAY', db.Interval) * Order.days < datetime.now()
).all()
