Store timestamp to BigQuery with specific timezone - python

I have a CSV with timestamps in UTC+8.
whatever.csv:
timestamp
2020-09-09 11:42:33
2020-09-09 11:42:51
2020-09-09 11:49:29
I want to store them in BQ. After loading them, the result shows the timestamps labeled UTC instead of UTC+8.
The timestamp values themselves are correct, but is there any way I can store them like 2020-09-11 19:58:51 UTC+8, or something similar, as long as it reflects the actual timezone of the timestamps?
Secondly, can I specify this requirement in the field schema? I'm storing the data with a Python script and mapping it through a schema from a YAML file such as:
somefile.yaml:
schema:
- name: "timestamp"
type: "TIMESTAMP"
mode: "NULLABLE"

You may need to say more about what you want to achieve in order to get better help.
For one, BigQuery always stores TIMESTAMP values in UTC. My guess is that you don't really need the timestamp to be stored in a particular timezone (it's hard to see why the internal storage format would matter); what you care about is how to display the timestamp in UTC+8. If my guess is right, there are two ways:
SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00", "UTC+8")
This approach requires you to decorate each of your TS columns. A set-it-once-for-everything approach could be:
SET @@time_zone = "Asia/Shanghai";
-- All subsequent query will use time zone "Asia/Shanghai"
SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00");
Both output:
+------------------------+
| f0_ |
+------------------------+
| 2008-12-25 23:30:00+08 |
+------------------------+
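As a side note, if you want the stored instants to be real UTC (rather than the local wall-clock digits labeled UTC), you could normalize at load time. A minimal sketch, assuming a pandas-based loader (file and table ids are placeholders):

import pandas as pd
from google.cloud import bigquery

# Parse the naive timestamps, interpret them as UTC+8, convert to UTC.
df = pd.read_csv("whatever.csv", parse_dates=["timestamp"])
df["timestamp"] = (
    df["timestamp"].dt.tz_localize("Asia/Shanghai").dt.tz_convert("UTC")
)

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("timestamp", "TIMESTAMP", mode="NULLABLE")]
)
client.load_table_from_dataframe(
    df, "my_dataset.my_table", job_config=job_config  # placeholder table id
).result()

Either way, the TIMESTAMP type itself carries no per-row timezone, so there is nothing to declare in the YAML schema; timezone handling stays in the loading or display code.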

How to insert a value into SQLAlchemy TIMESTAMP column

I am trying to set a value for a timestamp column using SQLAlchemy, but I get the following error:
column "timestamp" is of type timestamp without time zone but expression is of type numeric
My table looks as follows:
class SomeTable(db.Model):
timestamp = Column(TIMESTAMP)
The insertion attempt looks like this:
SomeTable(timestamp=time.time())
When I use datetime.now() instead of time.time(), the record is inserted into the table, but when I select it from the database it has the following format:
2019-04-02 11:44:24.801046
So it looks like the TIMESTAMP field does not store a Unix timestamp.
Is this regular Postgres behaviour? Or am I missing something?
I think that is fine, because the following is correct:
Column('timestamp', TIMESTAMP(timezone=False), nullable=False, default=datetime.now())
So by default you have datetime.now() there; the rest is about presentation:
datetime.now() will give you '2019-04-02 14:21:33.715782'
datetime.now().isoformat() will give you '2019-04-02T14:31:09.071147'
Check it with: http://www.sqlfiddle.com/#!15/d0c6a/8
If you uncomment the third insert you will get exactly the same exception as yours.
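To actually insert the epoch value you started with, convert it to a datetime first. A minimal self-contained sketch (using a plain declarative_base instead of your db.Model, which is an assumption about your setup):

import time
from datetime import datetime
from sqlalchemy import Column, Integer, TIMESTAMP
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class SomeTable(Base):
    __tablename__ = "some_table"  # placeholder table name
    id = Column(Integer, primary_key=True)
    timestamp = Column(TIMESTAMP)

# time.time() returns a float epoch, which Postgres rejects for TIMESTAMP;
# datetime.fromtimestamp() turns it into a value the column accepts.
row = SomeTable(timestamp=datetime.fromtimestamp(time.time()))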

SQLite3 export Data

I am currently working on a Python project with TensorFlow and I need to preprocess my data.
The data I want to use is stored in an sqlite3 database with the columns:
timestamp|dev|event
10:00 |01 | on
11:00 |02 | off
11:15 |01 | off
11:30 |02 | on
And I would like to export the data into a file (.csv) looking like this:
Timestamp|01 |02 |...
10:00 |on |0 |...
11:00 |on |off|...
11:15 |off|off|...
11:30 |off|on |...
Each row should carry the latest information for every device at that timestamp: with every new timestamp the old values should stay, and if there is an update, only the affected value(s) should change.
The number of Devices does not change and I can find that number with
SELECT COUNT(DISTINCT dev) FROM table01;
Currently that number is 38 different devices and a total of 10000 entries.
Is there a way to do this computation with sqlite3, or do I have to write a program in Python to process the data? I am new to both topics.
~Fabian
You can work it in SQLite, something along these lines:
select
timestamp,
group_concat(case when dev="01" then event else "" end, "") as D01,
group_concat(case when dev="02" then event else "" end, "") as D02
from
table01
group by
timestamp;
Basically you are pivoting the table.
The challenge is that the pivot needs to be somewhat dynamic, i.e. the list of devices is not fixed. You need to query the list of devices first and then build the query (the CASE WHEN ... ELSE ... END part) from that list.
Also, you generally need to group by the timestamp, as the statuses of different devices will be in different rows for a single timestamp.
And if {timestamp, device} is not unique, you need to make it unique.
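Since the device list isn't fixed, one way to build the query dynamically is from Python. A sketch under my own assumptions (a database file events.db, and device ids that come from your own table and are safe to splice into SQL):

import sqlite3

conn = sqlite3.connect("events.db")  # placeholder path
devices = [row[0] for row in conn.execute("SELECT DISTINCT dev FROM table01")]

# Build one group_concat(CASE ...) column per device.
cols = ",\n  ".join(
    f"group_concat(CASE WHEN dev = '{d}' THEN event ELSE '' END, '') AS D{d}"
    for d in devices
)
query = f"SELECT timestamp,\n  {cols}\nFROM table01\nGROUP BY timestamp;"

for row in conn.execute(query):
    print(row)

Note this is only safe because the device ids are your own data; user-supplied values would need escaping.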

Django/Python and Raw SQL Querying with PostgreSQL

I'm practicing my raw SQL querying in a Django project using cursor.execute.
Here's my Django models.py database schema:
class Client(models.Model):
date_incorporated = models.DateTimeField(default=timezone.now)
And here's the psql description of the table:
# \d+ client
Column | Type | Modifiers | Storage |
-------------------+--------------------------+--------------------+----------+
date_incorporated | timestamp with time zone | not null | plain |
Here's where I get confused:
If I use psql to query the data from the table, I get:
# SELECT date_incorporated FROM client;
date_incorporated
------------------------
2017-06-14 19:42:15-04
2017-11-02 19:42:33-04
(2 rows)
This makes sense to me. From the PostgreSQL docs, I believe this is just a correctly formatted string representation of a timestamp stored as UTC.
When I go through Django using this query:
cursor.execute('SELECT date_incorporated FROM client;')
data = [dict(zip(columns,row)) for row in cursor.fetchall()]
(using the dictfetchall method from the Django docs)
...my date_incorporated field gets turned into a python datetime object.
{'date_incorporated': datetime.datetime(2017, 11, 2, 23, 42, 33, tzinfo=<UTC>)}
In this app I'm building, I want a user to be able to enter raw SQL, and I pass that string to the cursor.execute(rawSQL) function to be executed. I expected the output to be the same as the psql version.
If I was using the Django ORM, I might've expected the timestamp with time zone to be converted to a time-zone aware datetime object, but since I'm doing a raw SQL call, I expected to get back 2017-06-14 19:42:15-04, not a python datetime object.
Is the fetchall method still acting as the Django ORM and converting certain fields?
I believe this is the standard conversion performed by any interface driver.
You would get the same result even if you used py-postgresql; the cursor does the conversion according to the field type defined in the database.
Long story short, dictfetchall is not doing any conversion; it is just repackaging the already-converted result from the cursor.
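A quick way to see this (a sketch; the connection string is a placeholder) is to skip Django entirely and use psycopg2, the driver Django uses for PostgreSQL, directly:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
cur.execute("SELECT date_incorporated FROM client;")
# The driver already returns a timezone-aware datetime, before any
# Django code touches the row:
print(cur.fetchone()[0])  # datetime.datetime(2017, 11, 2, 23, 42, 33, tzinfo=...)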

SQL Timestamp in PostgreSQL

I'm trying to understand the raw manner in which PostgreSQL saves timestamp data types. I get 2 different results depending on the client I use:
1. psql
# SELECT date_incorporated FROM client;
date_incorporated
------------------------
2017-06-14 19:42:15-04
2. records python module
rows = db.query('SELECT date_incorporated FROM client')
print(rows[0])
# {"date_incorporated": "2017-06-14T19:42:15-04:00"}
Since the psql interface and the records module are both supposed to give me back the raw data, I can't understand why each gives me a different format of the stored timestamp.
The two differences I see so far are the T between the date and the time in the records version, and the differing ways they show the time zone at the end of the string.
Is one of them altering it? Which one is showing the real data?
https://www.postgresql.org/docs/current/static/datatype-datetime.html
All timezone-aware dates and times are stored internally in UTC. They
are converted to local time in the zone specified by the TimeZone
configuration parameter before being displayed to the client.
https://www.postgresql.org/docs/current/static/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT
The output format of the date/time types can be set to one of the four
styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date
format), or German. The default is the ISO format.
E.g.:
t=# select now();
now
-------------------------------
2017-11-29 09:07:31.716276+00
(1 row)
t=# set datestyle to SQL;
SET
t=# select now();
now
--------------------------------
11/29/2017 09:07:52.341416 UTC
(1 row)
So the time is not saved the way it is returned, at least not necessarily. You can control, to some extent, how it is returned to your client. psql does not post-process the time, but Python does: not the records module itself, I believe, but Python's own datetime handling.
https://en.wikipedia.org/wiki/ISO_8601
T is the time designator that precedes the time components of the
representation.
And that T is definitely not added by Postgres itself (unless you deliberately format the date with to_char).
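You can see this from Python alone (a small illustrative sketch): the T appears only when the datetime is serialized via isoformat(), which is presumably what records does when rendering rows.

from datetime import datetime, timedelta, timezone

dt = datetime(2017, 6, 14, 19, 42, 15, tzinfo=timezone(timedelta(hours=-4)))
print(str(dt))         # 2017-06-14 19:42:15-04:00
print(dt.isoformat())  # 2017-06-14T19:42:15-04:00  <- the T comes from Python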

Sqlalchemy, postgres datetime without timezone

I am trying to add timezone support to a python/sqlalchemy script. I have studied timezones and use pytz. I understand I should do as much as possible in UTC and only display local times. Due to the nature of the application, this is very easy.
Everything works, except that when I insert UTC data, it somehow gets converted to local time (BST) before entering the database, and I am completely lost as to why this happens and how I can avoid it.
My table (postgres) is defined as follows (relevant part only):
fpp=> \d foo;
Table "public.foo"
Column | Type | Modifiers
-----------+-----------------------------+-------------------------------------------------------------
x | integer |
y | integer |
when_utc | timestamp without time zone |
I have debugged sqlalchemy when it does an insert. This is what happens:
2016-07-28 17:16:27,896 INFO sqlalchemy.engine.base.Engine INSERT INTO
foo (x, y, "when_utc") VALUES (%(x)s, %(y)s, %(when_utc)s) RETURNING fb_foo.id
2016-07-28 17:16:27,896 INFO sqlalchemy.engine.base.Engine {
'when_utc': datetime.datetime(2016, 7, 11, 23, 0, tzinfo=<UTC>), 'y': 0, 'x': 0}
So it inserts 2016-07-11 23:00:00 UTC. When I query it on the command line with psql, this is what I find:
fpp=> select x,y,when_utc from foo;
x | y | when_utc
---+---+---------------------
0 | 0 | 2016-07-12 00:00:00
(1 row)
What is going on? I am adamant nothing modifies the field in between. It just seems to add the DST hour to my database entry. Why? How can I avoid this?
R
The problem is that your column type is timestamp without time zone, when it should instead be timestamp with time zone. This can be achieved in SQLAlchemy with DateTime(timezone=True) when declaring the column. Unfortunately the default is False. See the documentation for more information: https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.DateTime
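A minimal Core-style sketch of that fix, using the table from the question (column set abbreviated):

from sqlalchemy import Column, DateTime, Integer, MetaData, Table

metadata = MetaData()
foo = Table(
    "foo",
    metadata,
    Column("x", Integer),
    Column("y", Integer),
    # timezone=True makes Postgres use "timestamp with time zone",
    # so the UTC datetime is stored without a silent local-time shift.
    Column("when_utc", DateTime(timezone=True)),
)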
I was struggling with this problem as well. I found the previous answer very helpful if you are using SQLAlchemy Core, but that wasn't my case, as I was using the SQLAlchemy ORM. If you are using the ORM, you might be interested in how I solved it.
I did it by mapping the column like this:
import datetime
from sqlalchemy import DateTime
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Event(Base):
    __tablename__ = "event"
    id: Mapped[int] = mapped_column(primary_key=True)
    # timezone=True maps to "timestamp with time zone" on PostgreSQL
    occurred_on: Mapped[datetime.datetime] = mapped_column(DateTime(timezone=True))
