Using BigQuery SQL with Built-in Python Functions - python

I recently started using Google's BigQuery service, and their Python API, to query some large databases. I'm new to SQL, and the BigQuery documentation isn't incredibly helpful for what I'm doing.
Currently I'm looking through the reddit_comments database, and there's a 'created_utc' field that I'm trying to filter by. This created_utc field holds Unix timestamps (e.g. November 1st, 12:00 AM UTC is 1541030400).
I'd like to grab comments day by day (or between two Unix timestamps) but in a way that I'm iterating over each day. Something like:
from datetime import datetime, timedelta
start = datetime.fromtimestamp(1538352000)
end = datetime.fromtimestamp(1541030400)
time = start
while time < end:
    print(time)
    time = time + timedelta(days=1)
Printing the times here yields values like: 2018-09-30 20:00:00
However, in order to query, I have to convert back to a Unix timestamp by calling datetime's timestamp() method, like time.timestamp().
The problem is, I'm trying to use the timestamp() function inside the query like so:
SELECT *
FROM 'fh-bigquery.reddit_comments.2018_10'
...
AND (created_utc >= curr_day.timestamp() AND created_utc <= next_day.timestamp())
However, it throws a BadRequest: 400 Function not found. Is there a way to use built-in Python functions in the way that I've described above? Or does there need to be some alternative?
Everything so far seems pretty intuitive, but it's weird that I can't find much helpful information on this specifically.

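You should use BigQuery's built-in functions. Python expressions such as curr_day.timestamp() are not evaluated inside the query string; they are sent to BigQuery as SQL text, which is why it reports that the function is not found.
For example: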
To get current timestamp - CURRENT_TIMESTAMP()
To get timestamp of start of current date - TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)
To get timestamp of start of next date - TIMESTAMP_TRUNC(TIMESTAMP_ADD(CURRENT_TIMESTAMP() , INTERVAL 1 DAY), DAY)
and so on
Also, to convert created_utc to TIMESTAMP type - you can use TIMESTAMP_SECONDS(created_utc)
You can read more in the BigQuery documentation on TIMESTAMP functions.
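If you would rather keep the day-by-day loop in Python, another option is to compute the Unix bounds in Python first and put only the resulting integers into the SQL. Below is a minimal sketch, assuming created_utc is stored as integer seconds (as the question describes); the helper names daily_bounds and build_query are made up for illustration, and day boundaries are taken in UTC rather than local time.

```python
from datetime import datetime, timedelta, timezone


def daily_bounds(start_ts, end_ts):
    """Yield (day_start, day_end) Unix-second pairs covering [start_ts, end_ts)."""
    day = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ts, tz=timezone.utc)
    while day < end:
        next_day = min(day + timedelta(days=1), end)
        yield int(day.timestamp()), int(next_day.timestamp())
        day = next_day


def build_query(day_start, day_end):
    # Hypothetical helper: the bounds are formatted as plain integers,
    # so the SQL text contains no Python function calls.
    return (
        "SELECT * "
        "FROM `fh-bigquery.reddit_comments.2018_10` "
        f"WHERE created_utc >= {day_start:d} AND created_utc < {day_end:d}"
    )


bounds = list(daily_bounds(1538352000, 1541030400))
first_query = build_query(*bounds[0])
print(len(bounds))
print(first_query)
```

Each query string can then be passed to the client in the loop. The upper bound is exclusive here so that a comment posted exactly at midnight is not counted in two days.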

Related

How to count the date between 1945 and 1950 year in SQLite? [duplicate]

I can't seem to get reliable results from the query against a sqlite database using a datetime string as a comparison as so:
select *
from table_1
where mydate >= '1/1/2009' and mydate <= '5/5/2009'
how should I handle datetime comparisons to sqlite?
update:
field mydate is a DateTime datatype
To solve this problem, I store dates as YYYYMMDD. Thus,
where mydate >= '20090101' and mydate <= '20090505'
It just plain WORKS all the time. You may only need to write a parser to handle how users might enter their dates so you can convert them to YYYYMMDD.
SQLite doesn't have dedicated datetime types, but does have a few datetime functions. Follow the string representation formats (actually only formats 1-10) understood by those functions (storing the value as a string) and then you can use them, plus lexicographical comparison on the strings will match datetime comparison (as long as you don't try to compare dates to times or datetimes to times, which doesn't make a whole lot of sense anyway).
Depending on which language you use, you can even get automatic conversion. (Which doesn't apply to comparisons in SQL statements like the example, but will make your life easier.)
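To illustrate the point about lexicographical comparison, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table name follows the question; the sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (mydate TEXT)")
conn.executemany(
    "INSERT INTO table_1 (mydate) VALUES (?)",
    [
        ("2008-12-31",),
        ("2009-01-01",),
        ("2009-03-15",),
        ("2009-05-05",),
        ("2009-05-06",),
    ],
)

# ISO-8601 text sorts in calendar order, so plain string comparison works.
rows = conn.execute(
    "SELECT mydate FROM table_1 "
    "WHERE mydate >= ? AND mydate <= ? ORDER BY mydate",
    ("2009-01-01", "2009-05-05"),
).fetchall()
in_range = [r[0] for r in rows]
print(in_range)

# The same comparison on 'M/D/YYYY' text does not follow calendar order:
# SQLite reports December 2008 as >= January 2009 (1 means true).
misordered = conn.execute("SELECT '12/1/2008' >= '1/1/2009'").fetchone()[0]
print(misordered)
```

This is why the query in the question gives unreliable results: '1/1/2009' and '5/5/2009' are compared character by character, not as dates.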
I had the same issue recently, and I solved it like this:
SELECT * FROM table WHERE
strftime('%s', date) BETWEEN strftime('%s', start_date) AND strftime('%s', end_date)
The following is working fine for me using SQLite:
SELECT *
FROM ingresosgastos
WHERE fecharegistro BETWEEN "2010-01-01" AND "2013-01-01"
The following worked for me.
SELECT *
FROM table_log
WHERE DATE(start_time) <= '2017-01-09' AND DATE(start_time) >= '2016-12-21'
SQLite cannot compare dates directly. We need to convert them into seconds and cast the result to an integer.
Example:
SELECT * FROM Table
WHERE
CAST(strftime('%s', date_field) AS integer) <=CAST(strftime('%s', '2015-01-01') AS integer) ;
I have a situation where I want data from up to two days ago and up until the end of today.
I arrived at the following.
WHERE dateTimeRecorded between date('now', 'start of day','-2 days')
and date('now', 'start of day', '+1 day')
Ok, technically I also pull in midnight on tomorrow like the original poster, if there was any data, but my data is all historical.
The key thing to remember: the initial poster excluded all data after 2009-11-15 00:00:00. So any data recorded exactly at midnight on the 15th was included, but any data after midnight on the 15th was not.
If their query had been
select *
from table_1
where mydate between Datetime('2009-11-13 00:00:00')
and Datetime('2009-11-15 23:59:59')
it would have been slightly better, with the between clause used for clarity. It still does not take leap seconds into account, in which a minute can actually have more than 60 seconds, but it is good enough for the discussion here :)
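One way to avoid worrying about the 23:59:59 endpoint at all is a half-open range: greater than or equal to the start, and strictly less than midnight of the day after the end. This is not what the answers above use, just an alternative sketch with Python's sqlite3 module and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (mydate TEXT)")
conn.executemany(
    "INSERT INTO table_1 (mydate) VALUES (?)",
    [
        ("2009-11-12 23:59:59",),
        ("2009-11-13 00:00:00",),
        ("2009-11-15 23:59:59.500",),  # fractional second late on the last day
        ("2009-11-16 00:00:00",),
    ],
)

# Inclusive BETWEEN ending at 23:59:59 misses the fractional-second row.
between_rows = conn.execute(
    "SELECT mydate FROM table_1 WHERE mydate BETWEEN ? AND ? ORDER BY mydate",
    ("2009-11-13 00:00:00", "2009-11-15 23:59:59"),
).fetchall()

# Half-open range: start inclusive, next midnight exclusive.
half_open_rows = conn.execute(
    "SELECT mydate FROM table_1 WHERE mydate >= ? AND mydate < ? ORDER BY mydate",
    ("2009-11-13 00:00:00", "2009-11-16 00:00:00"),
).fetchall()

print(len(between_rows), len(half_open_rows))
```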
I had to store the time with the time-zone information in it, and was able to get queries working with the following format:
"SELECT * FROM events WHERE datetime(date_added) BETWEEN
datetime('2015-03-06 20:11:00 -04:00') AND datetime('2015-03-06 20:13:00 -04:00')"
The time is stored in the database as regular TEXT in the following format:
2015-03-06 20:12:15 -04:00
Right now I am developing with the System.Data.SQLite NuGet package (version 1.0.109.2), which uses SQLite version 3.24.0.
And this works for me:
SELECT * FROM tables WHERE datetime
BETWEEN '2018-10-01 00:00:00' AND '2018-10-10 23:59:59';
I don't need to use the datetime() function. Perhaps this behavior was changed in that SQLite version.
Below are some ways to compare dates, but first we need to identify the format of the dates stored in the DB.
I have dates stored in MM/DD/YYYY HH:MM format, so they have to be compared in that format. Note that text in this format only sorts correctly within a single year, because the month comes first.
The query below converts the dates into MM/DD/YYYY format and gets data from the last five days until today. The BETWEEN operator helps here: you simply specify the start date AND the end date.
select * from myTable where myColumn BETWEEN strftime('%m/%d/%Y %H:%M', datetime('now','localtime'), '-5 day') AND strftime('%m/%d/%Y %H:%M',datetime('now','localtime'));
The query below uses the greater-than operator (>).
select * from myTable where myColumn > strftime('%m/%d/%Y %H:%M', datetime('now','localtime'), '-5 day');
All the computation I have done is using current time, you can change the format and date as per your need.
Hope this will help you
Summved
You could also write up your own user functions to handle dates in the format you choose. SQLite has a fairly simple method for writing your own user functions. For example, I wrote a few to add time durations together.
I wrote my query as follows:
SELECT COUNT(carSold)
FROM cars_sales_tbl
WHERE date
BETWEEN '2015-04-01' AND '2015-04-30'
AND carType = "Hybrid"
I got the hint from @ifredy's answer. All I wanted was to run this query on iOS, using Objective-C. And it works!
I hope someone who does iOS development will get some use out of this answer too!
Here is a working example in C# in three ways:
string tableName = "TestTable";
var startDate = DateTime.Today.ToString("yyyy-MM-dd 00:00:00"); // From today midnight
var endDate = DateTime.Today.AddDays(1).ToString("yyyy-MM-dd HH:mm:ss"); // Whole day
string way1 /*long way*/ = $"SELECT * FROM {tableName} WHERE strftime('%s', DateTime) " +
    $"BETWEEN strftime('%s', '{startDate}') AND strftime('%s', '{endDate}')";
string way2= $"SELECT * FROM {tableName} WHERE DateTime BETWEEN \'{startDate}\' AND \'{endDate}\'";
string way3= $"SELECT * FROM {tableName} WHERE DateTime >= \'{startDate}\' AND DateTime <=\'{endDate}\'";
select *
from table_1
where date(mydate) >= '1/1/2009' and date(mydate) <= '5/5/2009'
This works for me

Discrepancies when subtracting dates with timestamp in SQLAlchemy and Postgresql

I have some discrepancy when subtracting dates in Postgresql and SQLAlchemy. For instance, I have the following in Postgresql:
SELECT trunc(EXTRACT(EPOCH FROM ('2019-07-05 15:20:10.111497-07:00'::timestamp - '2019-07-04 11:45:17.293328-07:00'::timestamp)))
--99292
and the following query in SQLAlchemy:
date_diff = session.query(func.trunc((func.extract('epoch',
func.date('2019-07-05 15:20:10.111497-07:00'))-
func.extract('epoch',
func.date('2019-07-04 11:45:17.293328-07:00'))))).all()
print(date_diff)
#[(86400.0,)]
We can see that the most exact difference is coming from Postgresql query. How can I get the same result using SQLAlchemy? I have not been able to spot what is the cause of this difference. If you know please let me know.
Thanks a lot.
I have never used SQLAlchemy before, but it looks like you are truncating to a date instead of a timestamp or datetime.
Don't worry, this is an easy mistake to make. DateTime libraries can be confusing with their definitions (a date is literally a date, so YYYY-MM-DD, whereas a timestamp includes both the date and the time to some precision).
This is why you get a difference of 86,400 seconds (one day): it is comparing only the date parts of the two values (2019-07-05 - 2019-07-04).
Try replacing func.date with something that yields a full timestamp (for example, a cast to a timestamp type), so the time portion is kept.
You want to be comparing the WHOLE timestamp.
EDIT: Sorry, didn't see your comment until after posting.
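The two numbers from the question can be reproduced with plain Python datetimes, which may make the cause easier to see. This is only an illustration of the date-versus-timestamp difference using the standard library, not SQLAlchemy code.

```python
from datetime import datetime

later = datetime.fromisoformat("2019-07-05 15:20:10.111497-07:00")
earlier = datetime.fromisoformat("2019-07-04 11:45:17.293328-07:00")

# Subtracting the full timestamps keeps the time of day.
full_diff = int((later - earlier).total_seconds())

# Reducing each value to a date first throws the time of day away,
# which is what the date-based version in the question effectively does.
date_only_diff = int((later.date() - earlier.date()).total_seconds())

print(full_diff, date_only_diff)
```

The first value matches the PostgreSQL result (99292) and the second matches the SQLAlchemy result (86400).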

String to datetime for Google Calendar API

I've gotten some events data from a website and stored the dates and starting times as string variables in python. My aim is to iterate over a for loop and over each iteration, create and add a new event to a google calendar using the Google Calendar API. I've stored the date and start/end times for each event as string variables so theoretically I would have:
date='2019-11-01'
start_time='10:00am'
end_time='11:00am'
I'd gotten so far until I realised that the way one must format the date and start/end times for an event is as follows:
'dateTime': '2015-05-28T09:00:00-07:00'
where if I am not mistaken, the RHS is a datetime object rather than a string. At first I thought I'd try sticking all my strings together with a T in between the date and time out of desperation, but obviously that didn't work because the object isn't supposed to be a string. I was wondering if there was any way I could use the variables I've obtained to create a new google event, or whether I've reached a dead end?
Many thanks in advance.
To add an event using Google Calendar's API you need a start and an end time. The API takes them as RFC 3339 formatted strings like the one you quoted, and the easiest way to produce those is from datetime objects.
The following code creates datetime objects from your strings.
If you print those datetime objects with isoformat(), you will see the "T" that you mentioned.
import datetime
date='2019-11-01'
start_time='1:20am'
end_time='11:00pm'
start_date_str = date + start_time
end_date_str = date + end_time
start_date = datetime.datetime.strptime(start_date_str, '%Y-%m-%d%I:%M%p')
end_date = datetime.datetime.strptime(end_date_str, '%Y-%m-%d%I:%M%p')
print(start_date.isoformat())
print(end_date.isoformat())
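The isoformat() output above has no UTC offset, while the example in the question ends in -07:00. A possible way to get there is to attach a timezone before formatting. In this sketch the -07:00 offset, the helper name to_rfc3339, and the event summary are assumptions for illustration; use whatever offset applies to your events.

```python
from datetime import datetime, timedelta, timezone

date = "2019-11-01"
start_time = "10:00am"
end_time = "11:00am"

# Assumed offset, copied from the example string in the question.
tz = timezone(timedelta(hours=-7))


def to_rfc3339(date_str, time_str, tzinfo):
    """Combine the date and 12-hour time strings into an RFC 3339 string."""
    parsed = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %I:%M%p")
    return parsed.replace(tzinfo=tzinfo).isoformat()


event = {
    "summary": "Example event",  # hypothetical title
    "start": {"dateTime": to_rfc3339(date, start_time, tz)},
    "end": {"dateTime": to_rfc3339(date, end_time, tz)},
}
print(event["start"]["dateTime"])
```

The resulting dictionary has the same shape as the 'dateTime' fragment quoted in the question, with the values being ordinary strings.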

Python: How to extract time date specific information from text/nltk_contrib timex.py bug

I am new to python. I am looking for ways to extract/tag the date & time specific information from text
e.g.
1.I will meet you tomorrow
2. I had sent it two weeks back
3. Waiting for you last half an hour
I found timex from nltk_contrib, but ran into a couple of problems with it:
https://code.google.com/p/nltk/source/browse/trunk/nltk_contrib/nltk_contrib/timex.py
b. Not sure of the Date data type passed to ground(tagged_text, base_date)
c. It deals only with dates, i.e. granularity at the day level. I can't find expressions like "next one hour" etc.
Thank you for your help
b) The data type that you need to pass to ground(tagged_text, base_date) is an instance of the datetime.date class which you'd initialize using something like:
from datetime import date
base_date = date.today()

Comparing a python date variable with timestamp from select query

I want to take some action based on comparing two dates. Date 1 is stored in a python variable. Date 2 is retrieved from the database in the select statement. For example I want to retrieve some records from the database where the associated date in the record (in form of the timestamp) is later than the date defined by the python variable. Preferably, I would like the comparison to be in readable date format rather than in timestamps.
I am a beginner with python.
----edit -----
Sorry for being ambiguous. Here's what I am trying to do:
import MySQLdb as mdb
from datetime import datetime
from datetime import date
import time
conn = mdb.connect('localhost','root','root','my_db')
cur = conn.cursor()
right_now = date.today()  # python date
this is the part which I want to figure out
The database has a table which has timestamp. I want to compare that timestamp with this date and then retrieve records based on that comparison. For example I want to retrieve all records for which timestamp is above this date
cur.execute("SELECT created from node WHERE timestamp > right_now")
results = cur.fetchall()
for row in results:
    print(row)
First of all, I guess Date 1 (the python variable) is a datetime object. http://docs.python.org/2/library/datetime.html
As far as I have used it, MySQLdb gives you results in a (python) datetime object if the sql type was datetime.
So actually you have nothing to do, you can use python datetime comparison methods with date 1 and date 2.
I am a little bit confused about "comparison to be in readable date format rather than in timestamps". I mean, timestamps are readable enough, right?
If Date 1 is a timestamp, then you can simply do the comparison. If not, either convert it to a timestamp or convert the date in the database to a date type; both ways work.
If you are asking how to write the code to do the comparison, you could use either '_mysql' or sqlalchemy to help you. The detailed syntax can be found in their documentation.
Anyway, the question itself is not clear enough, so the answer is blurry, too.
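For the query in the question, the main issue is that right_now sits inside the SQL string as literal text, so the database never sees the Python value. Passing it as a query parameter avoids that. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server; the sample rows and the cutoff date are made up. With MySQLdb the placeholder would be %s instead of ?, along the lines of cur.execute("SELECT created FROM node WHERE timestamp > %s", (right_now,)).

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (created TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO node (created, timestamp) VALUES (?, ?)",
    [
        ("old", "2013-01-01 09:00:00"),
        ("new", "2013-06-01 09:00:00"),
    ],
)

# Fixed date for a repeatable example; date.today() would be used in practice.
cutoff = date(2013, 3, 1)

# The value is passed separately from the SQL text, as a parameter.
rows = conn.execute(
    "SELECT created FROM node WHERE timestamp > ?",
    (cutoff.isoformat(),),
).fetchall()

for row in rows:
    print(row)
```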
