sql: select list of columns - python

I want to pass a str or list argument and have the SQL know how to treat it. For example, with list_col = 'date1, date2, date3, date4', I want to end up with a dataframe with columns
date1, date2, date3, id
query = """
SELECT {list_col} AT TIME ZONE 'Europe/Paris' as {list_col}, {table}.{id}
FROM {table}
ORDER BY {table}.{id}
"""

def fun_query(table_name, list_col, id):
    return query.format(table=table_name, list_col=list_col, id=id)
Does anyone know how to do it, please?

As already noted, this is not doable the way you suggested, because both the AT TIME ZONE and AS clauses have to appear with each column individually. I would suggest doing something like this:
query = """
SELECT {date_cols_as_tz}, {table}.{id}
FROM {table}
ORDER BY {table}.{id}
"""

def fun_query(table_name, list_col, id, tz="'Europe/Paris'"):
    date_cols_as_tz = ",".join(f"{c} AT TIME ZONE {tz} as {c}" for c in list_col)
    return query.format(date_cols_as_tz=date_cols_as_tz, table=table_name, id=id)
When you call e.g. fun_query("my_table", ["date1", "date2"], "table_id") and print the result, you get the following query:
SELECT date1 AT TIME ZONE 'Europe/Paris' as date1,date2 AT TIME ZONE 'Europe/Paris' as date2, my_table.table_id
FROM my_table
ORDER BY my_table.table_id
The major changes are:
- date_cols_as_tz is created inside fun_query
- list_col takes a real list (["date1", "date2"], not a string like "date1,date2")
- an optional tz parameter was added to the function
The advantage of this solution is that you can easily change the timezone by passing a different tz value instead of relying on a hard-coded one.
Also note that this function expects all columns in list_col to be dates (but that's probably what you expect, if I understood your question correctly).
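Since the question asks to accept either a string or a list, a small normalization step can cover both. This is a sketch, not part of the original answer; normalize_cols is a name invented here, and the query is built inline for self-containment:

```python
def normalize_cols(list_col):
    # Accept either "date1, date2" or ["date1", "date2"] and return a list.
    if isinstance(list_col, str):
        return [c.strip() for c in list_col.split(",") if c.strip()]
    return list(list_col)

def fun_query(table_name, list_col, id, tz="'Europe/Paris'"):
    cols = normalize_cols(list_col)
    # One "col AT TIME ZONE tz as col" fragment per column.
    date_cols_as_tz = ", ".join(f"{c} AT TIME ZONE {tz} as {c}" for c in cols)
    return (f"SELECT {date_cols_as_tz}, {table_name}.{id}\n"
            f"FROM {table_name}\n"
            f"ORDER BY {table_name}.{id}")
```

Both fun_query("my_table", "date1, date2", "table_id") and fun_query("my_table", ["date1", "date2"], "table_id") then produce the same SQL.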


SQLAlchemy Python 3.8 work with renamed columns [duplicate]

I'm using the following SQL expression, but I'm getting an error:
select
CampaignCustomer.CampaignCustomerID,
convert(varchar, CampaignCustomer.ModifiedDate, 111) as startdate,
CampaignCustomer.CampaignID,
CampaignCustomer.CampaignCallStatusID,
CampaignCustomer.UserID,
CampaignCustomerSale.Value,
Users.Name
from CampaignCustomer
inner join CampaignCustomerSale
on CampaignCustomer.CampaignCustomerID = CampaignCustomerSale.CampaignCustomerID
inner join Users
on CampaignCustomer.UserID = Users.UserID
where
CampaignCustomer.CampaignCallStatusID = 21
and CampaignCustomer.startdate = '2011/11/22' <------- THIS
order by
startdate desc,
Users.Name asc
Error:
Msg 207, Level 16, State 1, Line 1
Invalid column name 'startdate'.
It can't recognize my alias startdate in the WHERE clause, but it can in my ORDER BY clause. What's wrong?
Edit:
And no, it is not possible for me to change the datatype to date instead of datetime; the time is needed elsewhere. But in this case I only need to get all posts on a specific date, and I really don't care what time of day the modifieddate has :)
Maybe another method is needed instead of convert()?
You can't use a column alias in the WHERE clause.
Change it to:
where
CampaignCustomer.CampaignCallStatusID = 21
and convert(varchar, CampaignCustomer.ModifiedDate, 111) = '2011/11/22'
Do this:
select
CampaignCustomer.CampaignCustomerID,
convert(varchar, CampaignCustomer.ModifiedDate, 111) as startdate,
CampaignCustomer.CampaignID,
CampaignCustomer.CampaignCallStatusID,
CampaignCustomer.UserID,
CampaignCustomerSale.Value,
Users.Name
from CampaignCustomer
inner join CampaignCustomerSale
on CampaignCustomer.CampaignCustomerID = CampaignCustomerSale.CampaignCustomerID
inner join Users
on CampaignCustomer.UserID = Users.UserID
where
CampaignCustomer.CampaignCallStatusID = 21
and convert(varchar, CampaignCustomer.ModifiedDate, 111) = '2011/11/22'
order by
startdate desc,
Users.Name asc
You can't use aliases in your WHERE clause, so in the query above I replaced your alias with the expression it represents.
You didn't mention what version of SQL Server you're using - but if you're on 2008 or newer, you could use:
where
CampaignCustomer.CampaignCallStatusID = 21
and CAST(CampaignCustomer.ModifiedDate AS DATE) = '20111122'
You could cast it to a DATE - just for this comparison.
Also: I would recommend always using the ISO-8601 standard format when you need to compare a date to a string. ISO-8601 defines a date as YYYYMMDD, and it is the only format in SQL Server that always works, no matter what language/regional settings you have. Any other string representation of a date is subject to the settings of your SQL Server - it might work for you, but I bet for someone else, it will break.
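If you're building that comparison string from Python, strftime produces the unambiguous ISO-8601 basic form directly (a small illustration using the date from the question):

```python
import datetime

d = datetime.date(2011, 11, 22)
# YYYYMMDD: the one string format SQL Server parses the same
# regardless of language/regional settings.
iso_basic = d.strftime("%Y%m%d")
```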

SQLite query in Python using DATETIME and variables not working as expected

I'm trying to query a database using Python/Pandas. This will be a recurring request where I'd like to look back into a window of time that changes over time, so I'd like to use some smarts in how I do this.
In my SQLite query, if I say
WHERE table.date BETWEEN DATETIME('now', '-6 month') AND DATETIME('now')
I get the result I expect. But if I try to move those to variables, the resulting table comes up empty. I found out that the endDate variable works but startDate does not. Presumably I'm doing something wrong with the escapes around the apostrophes? Since the result comes up empty, it's as if the query sees DATETIME('now') without the '-6 month' part (comparing now to now, which would match nothing). Any ideas how I can pass this through to the query correctly from Python?
startDate = 'DATETIME(\'now\', \'-6 month\')'
endDate = 'DATETIME(\'now\')'
query = '''
SELECT some stuff
FROM table
WHERE table.date BETWEEN ? AND ?
'''
df = pd.read_sql_query(query, db, params=[startDate, endDate])
You can try string formatting, as shown below:
startDate = "DATETIME('now', '-6 month')"
endDate = "DATETIME('now')"
query = '''
SELECT some stuff
FROM table
WHERE table.date BETWEEN {start_date} AND {end_date}
'''
df = pd.read_sql_query(query.format(start_date=startDate, end_date=endDate), db)
When you provide parameters to a query, they're treated as literals, not expressions that SQL should evaluate. Instead, you can pass the DATETIME() arguments as parameters, rather than the whole function call as a string:
startDate = 'now'
startOffset = '-6 month'
endDate = 'now'
endOffset = '+0 seconds'
query = '''
SELECT some stuff
FROM table
WHERE table.date BETWEEN DATETIME(?, ?) AND DATETIME(?, ?)
'''
df = pd.read_sql_query(query, db, params=[startDate, startOffset, endDate, endOffset])
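The difference is easy to check with an in-memory SQLite database (the table name and data here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT)")
# One row three months ago, one row twelve months ago.
conn.execute("INSERT INTO t VALUES (DATETIME('now', '-3 month'))")
conn.execute("INSERT INTO t VALUES (DATETIME('now', '-12 month'))")

# Bound parameters are literals: this compares against the *string*
# "DATETIME('now', '-6 month')", so nothing matches.
bad = conn.execute(
    "SELECT COUNT(*) FROM t WHERE date BETWEEN ? AND ?",
    ("DATETIME('now', '-6 month')", "DATETIME('now')")).fetchone()[0]

# Binding the DATETIME() arguments instead works as intended:
# only the three-month-old row falls in the window.
good = conn.execute(
    "SELECT COUNT(*) FROM t WHERE date BETWEEN DATETIME(?, ?) AND DATETIME(?, ?)",
    ("now", "-6 month", "now", "+0 seconds")).fetchone()[0]
```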

peewee select() return SQL query, not the actual data

I'm trying to sum up the values in two columns and truncate my date field to the day. I've constructed the SQL query to do this (which works):
SELECT date_trunc('day', date) AS Day,
       SUM(fremont_bridge_nb) AS Sum_NB,
       SUM(fremont_bridge_sb) AS Sum_SB
FROM bike_count
GROUP BY Day
ORDER BY Day;
But I run into issues when I try to translate this into peewee:
query = (Bike_Count
         .select(fn.date_trunc('day', Bike_Count.date).alias('Day'),
                 fn.SUM(Bike_Count.fremont_bridge_nb).alias('Sum_NB'),
                 fn.SUM(Bike_Count.fremont_bridge_sb).alias('Sum_SB'))
         .group_by('Day')
         .order_by('Day'))
I don't get any errors, but when I print out the variable I stored this in, it shows:
<class 'models.Bike_Count'> SELECT date_trunc(%s, "t1"."date") AS Day, SUM("t1"."fremont_bridge_nb") AS Sum_NB, SUM("t1"."fremont_bridge_sb") AS Sum_SB FROM "bike_count" AS t1 ORDER BY %s ['day', 'Day']
The only thing that I've written in Python to get data successfully is:
Bike_Count.get(Bike_Count.id == 1).date
If you just stick a string into your group_by() / order_by(), Peewee will try to parameterize it as a value, to avoid SQL injection.
To solve the problem, you can use SQL('Day') in place of 'Day' inside the group_by() and order_by() calls.
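The parameterization behavior described here isn't peewee-specific; it can be seen with plain sqlite3 (table and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("b",), ("c",), ("a",)])

# 'day' arrives as a bound value, so the query orders by the constant
# string 'day' -- every row ties, and no real sorting is guaranteed.
as_param = [r[0] for r in conn.execute("SELECT day FROM t ORDER BY ?", ("day",))]

# Interpolated as raw SQL (the effect of peewee's SQL(...)), it is
# treated as a column reference and actually sorts the rows.
as_sql = [r[0] for r in conn.execute("SELECT day FROM t ORDER BY day")]
```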
Another way is to just stick the function call into the GROUP BY and ORDER BY. Here's how you would do that:
day = fn.date_trunc('day', Bike_Count.date)
nb_sum = fn.SUM(Bike_Count.fremont_bridge_nb)
sb_sum = fn.SUM(Bike_Count.fremont_bridge_sb)
query = (Bike_Count
         .select(day.alias('Day'), nb_sum.alias('Sum_NB'), sb_sum.alias('Sum_SB'))
         .group_by(day)
         .order_by(day))
Or, if you prefer:
query = (Bike_Count
         .select(day.alias('Day'), nb_sum.alias('Sum_NB'), sb_sum.alias('Sum_SB'))
         .group_by(SQL('Day'))
         .order_by(SQL('Day')))

Get range of columns from Cassandra based on TimeUUIDType using Python and the datetime module

I've got a table set up like so:
{"String" : {uuid1 : "String", uuid1: "String"}, "String" : {uuid : "String"}}
Or...
Row_validation_class = UTF8Type
Default_validation_class = UTF8Type
Comparator = UUID
(It's basically got website as a row label, and has dynamically generated columns based on datetime.datetime.now() with TimeUUIDType in Cassandra and a string as the value)
I'm looking to use Pycassa to retrieve slices of the data based on both the row and the columns. On other (smaller) tables I've done this by downloading the whole data set (or at least filtering it to one row) and then comparing an ordered dictionary against datetime objects.
I'd like to be able to use something like Pycassa's multiget or get_indexed_slice function to pull certain columns and rows. Does something like this exist that allows filtering on datetime? All my current attempts result in the following error message:
TypeError: can't compare datetime.datetime to UUID
The best I've managed to come up with so far is...
def get_number_of_visitors(site, start_date,
                           end_date=datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S:%f")):
    pool = ConnectionPool('Logs', timeout=2)
    col_fam = ColumnFamily(pool, 'sessions')
    result = col_fam.get(site)
    number_of_views = [(k, v) for k, v in col_fam.get(site).items()
                       if get_posixtime(k) > datetime.datetime.strptime(str(start_date), "%Y-%m-%d %H:%M:%S:%f")
                       and get_posixtime(k) < datetime.datetime.strptime(str(end_date), "%Y-%m-%d %H:%M:%S:%f")]
    total_unique_sessions = len(number_of_views)
    return total_unique_sessions
With get_posixtime being defined as:
def get_posixtime(uuid1):
    assert uuid1.version == 1, ValueError('only applies to type 1')
    t = uuid1.time
    t = t - 0x01b21dd213814000  # offset from Gregorian to Unix epoch, in 100-ns units
    t = t / 1e7
    return datetime.datetime.fromtimestamp(t)
This doesn't seem to work (isn't returning the data I'd expect) and also feels like it shouldn't be necessary. I'm creating the column timestamps using:
timestamp = datetime.datetime.now()
Does anybody have any ideas? It feels like this is the sort of thing that Pycassa (or another python library) would support but I can't figure out how to do it.
p.s. table schema as described by cqlsh:
CREATE COLUMNFAMILY sessions (
KEY text PRIMARY KEY
) WITH
comment='' AND
comparator='TimeUUIDType' AND
row_cache_provider='ConcurrentLinkedHashCacheProvider' AND
key_cache_size=200000.000000 AND
row_cache_size=0.000000 AND
read_repair_chance=1.000000 AND
gc_grace_seconds=864000 AND
default_validation=text AND
min_compaction_threshold=4 AND
max_compaction_threshold=32 AND
row_cache_save_period_in_seconds=0 AND
key_cache_save_period_in_seconds=14400 AND
replicate_on_write=True;
p.s.
I know you can specify a column range in Pycassa, but I can't guarantee that the start and end values of the range will have entries in every row, so the columns may not exist.
You do want to request a "slice" of columns using the column_start and column_finish parameters to get(), multiget(), get_count(), get_range(), etc. For TimeUUIDType comparators, pycassa actually accepts datetime instances or timestamps for those two parameters; it will internally convert them to a TimeUUID-like form with a matching timestamp component. There's a section of the documentation dedicated to working with TimeUUIDs that provides more details.
For example, I would implement your function like this:
def get_number_of_visitors(site, start_date, end_date=None):
    """
    start_date and end_date should be datetime.datetime instances or
    timestamps like those returned from time.time().
    """
    if end_date is None:
        end_date = datetime.datetime.now()
    pool = ConnectionPool('Logs', timeout=2)
    col_fam = ColumnFamily(pool, 'sessions')
    return col_fam.get_count(site, column_start=start_date, column_finish=end_date)
You could use the same form with col_fam.get() or col_fam.xget() to get the actual list of visitors.
P.S. try not to create a new ConnectionPool() for every request. If you have to, set a lower pool size.
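As an aside, the timestamp arithmetic in the question's get_posixtime can be sanity-checked with nothing but the standard library (in Python 3 the hex constant loses its L suffix); this is a standalone sketch, not pycassa code:

```python
import datetime
import uuid

def uuid1_to_datetime(u):
    # uuid1.time counts 100-ns intervals since 1582-10-15 (the Gregorian
    # epoch); subtract the offset to the Unix epoch, then scale to seconds.
    assert u.version == 1, 'only applies to type 1 UUIDs'
    return datetime.datetime.fromtimestamp((u.time - 0x01b21dd213814000) / 1e7)

# A freshly generated time UUID should decode to (roughly) the current time.
dt = uuid1_to_datetime(uuid.uuid1())
```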

Django GROUP BY strftime date format

I would like to do a SUM on rows in a database and group by date.
I am trying to run this SQL query using Django aggregates and annotations:
select strftime('%m/%d/%Y', time_stamp) as the_date, sum(numbers_data)
from my_model
group by the_date;
I tried the following:
data = (My_Model.objects
        .values("strftime('%m/%d/%Y', time_stamp)")
        .annotate(Sum("numbers_data"))
        .order_by())
but it seems like you can only use column names in the values() function; it doesn't like the use of strftime().
How should I go about this?
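For reference, the raw SQL in the question does behave as intended; a quick in-memory sqlite3 check (values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (time_stamp TEXT, numbers_data INTEGER)")
conn.executemany("INSERT INTO my_model VALUES (?, ?)",
                 [("2011-11-22 10:00:00", 1),
                  ("2011-11-22 15:30:00", 2),
                  ("2011-11-23 09:00:00", 5)])

# Two rows on 11/22 collapse into one group; their values are summed.
rows = conn.execute(
    "SELECT strftime('%m/%d/%Y', time_stamp) AS the_date, SUM(numbers_data) "
    "FROM my_model GROUP BY the_date ORDER BY the_date").fetchall()
# rows == [('11/22/2011', 3), ('11/23/2011', 5)]
```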
This works for me:
select_data = {"d": """strftime('%%m/%%d/%%Y', time_stamp)"""}
data = My_Model.objects.extra(select=select_data).values('d').annotate(Sum("numbers_data")).order_by()
Took a bit to figure out I had to escape the % signs.
As of v1.8, you can use Func() expressions.
For example, if you happen to be targeting AWS Redshift's date and time functions:
from django.db.models import F, Func, Value
def TimezoneConvertedDateF(field_name, tz_name):
    tz_fn = Func(Value(tz_name), F(field_name), function='CONVERT_TIMEZONE')
    dt_fn = Func(tz_fn, function='TRUNC')
    return dt_fn
Then you can use it like this:
SomeDbModel.objects \
.annotate(the_date=TimezoneConvertedDateF('some_timestamp_col_name',
'America/New_York')) \
.filter(the_date=...)
or like this:
SomeDbModel.objects \
.annotate(the_date=TimezoneConvertedDateF('some_timestamp_col_name',
'America/New_York')) \
.values('the_date') \
.annotate(...)
Any reason not to just do this in the database, by running the following query against it:
select date, sum(numbers_data)
from my_model
group by date;
If your answer is that the date is a datetime with non-zero hours, minutes, seconds, or milliseconds, my answer is to use a date function to truncate the datetime; but I can't tell you exactly what that is without knowing which RDBMS you're using.
I'm not sure about strftime; my solution below uses Postgres's date_trunc:
select_data = {"date": "date_trunc('day', creationtime)"}
ttl = ReportWebclick.objects.using('cms')\
.extra(select=select_data)\
.filter(**filters)\
.values('date', 'tone_name', 'singer', 'parthner', 'price', 'period')\
.annotate(loadcount=Sum('loadcount'), buycount=Sum('buycount'), cancelcount=Sum('cancelcount'))\
.order_by('date', 'parthner')
-- equivalent to this SQL query:
select date_trunc('month', creationtime) as date, tone_name, sum(loadcount), sum(buycount), sum(cancelcount)
from webclickstat
group by tone_name, date;
My solution looks like this when my DB is MySQL:
select_data = {"date":"""FROM_UNIXTIME( action_time,'%%Y-%%m-%%d')"""}
qs = ViewLogs.objects.filter().extra(select=select_data).values('mall_id', 'date').annotate(pv=Count('id'), uv=Count('visitor_id', distinct=True))
To choose which function to use, see the MySQL date-and-time function docs, e.g. DATE_FORMAT and FROM_UNIXTIME.
