SQL Query, Getting Rid of Repeat Selects - python

I'm using sqlite3 in python and I have a table that has the following columns:
recordid(int), username(text), locations(text), types(text), occupancy(int), time_added(datetime), token(text) and (undo).
I have the following query, which counts the rows for each occupancy value where time_added falls between two times the user inputs, start_date and end_date:
('''SELECT locations, types,
(SELECT COUNT (occupancy) FROM traffic WHERE undo = 0 AND occupancy = 1 AND types = ? AND time_added BETWEEN ? AND ?),
(SELECT COUNT (occupancy) FROM traffic WHERE undo = 0 AND occupancy = 2 AND types = ? AND time_added BETWEEN ? AND ?),
(SELECT COUNT (occupancy) FROM traffic WHERE undo = 0 AND occupancy = 3 AND types = ? AND time_added BETWEEN ? AND ?),
(SELECT COUNT (occupancy) FROM traffic WHERE undo = 0 AND occupancy = 4 AND types = ? AND time_added BETWEEN ? AND ?),
FROM traffic WHERE types = ? GROUP BY type''',
(vehicle, start_date, end_date, vehicle, start_date, end_date, vehicle, start_date, end_date, vehicle, start_date, end_date, vehicle)
Is there any way to condense this so I don't have to copy and paste the same thing multiple times just to change the occupancy? I tried using a for loop but that didn't really get me anywhere.
Cheers!

I'm pretty sure you can simplify the query considerably:
SELECT types,
SUM( occupancy = 1 ) as cnt_1,
SUM( occupancy = 2 ) as cnt_2,
SUM( occupancy = 3 ) as cnt_3,
SUM( occupancy = 4 ) as cnt_4
FROM traffic
WHERE undo = 0 AND
types = ? AND
time_added BETWEEN ? AND ?
GROUP BY types;
I'm not sure if that is exactly what your question has in mind, though.
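To run the conditional-aggregation idea from Python, a minimal sketch with the standard sqlite3 module might look like this. The helper name occupancy_counts, the levels parameter, and the sample rows are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory table with just the columns the query touches (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic (types TEXT, occupancy INTEGER, time_added TEXT, undo INTEGER)"
)
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?, ?, ?)",
    [
        ("car", 1, "2022-03-01 08:00:00", 0),
        ("car", 1, "2022-03-01 09:00:00", 0),
        ("car", 2, "2022-03-02 10:00:00", 0),
        ("car", 4, "2022-03-03 11:00:00", 1),  # undone row, must not be counted
        ("bus", 3, "2022-03-01 12:00:00", 0),  # other vehicle type, filtered out
    ],
)


def occupancy_counts(conn, vehicle, start_date, end_date, levels=(1, 2, 3, 4)):
    """Hypothetical helper: one row per type with one count column per occupancy level."""
    # Only the integer levels are formatted into the SQL text;
    # user-supplied values stay in ? placeholders.
    columns = ", ".join(f"SUM(occupancy = {int(n)}) AS cnt_{int(n)}" for n in levels)
    sql = f"""
        SELECT types, {columns}
        FROM traffic
        WHERE undo = 0 AND types = ? AND time_added BETWEEN ? AND ?
        GROUP BY types
    """
    return conn.execute(sql, (vehicle, start_date, end_date)).fetchall()


rows = occupancy_counts(conn, "car", "2022-03-01 00:00:00", "2022-03-31 23:59:59")
print(rows)
```

In SQLite a comparison such as occupancy = 1 evaluates to 0 or 1, so summing it counts the matching rows. The three parameters are passed once instead of being repeated for every subquery.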

Related

Combining Python variables into SQL queries

I am pulling data from an online database using SQL/postgresql queries and converting it into a Python dataframe using Pandas. I want to be able to change the dates in the SQL query from one point in my Python script instead of having to manually go through every SQL query and change it one by one as there are many queries and many lines in each one.
This is what I have to begin with for example:
random_query = """
select *
from table_A as a
where date_trunc('day',a.created_at) >= date('2022-03-01')
and date_trunc('day',a.created_at) <= date('2022-03-31')
group by 1,2,3
"""
Then I will read the data into Pandas as follows:
df_random_query = pd.read_sql(random_query, conn)
The connection above is to the database - the issue is not there so I am excluding that portion of code here.
What I have attempted is the following:
start_date = '2022-03-01'
end_date = '2022-03-31'
I have set the above 2 dates as variables and then below I have tried to use them in the SQL query as follows:
attempted_solution = """
select *
from table_A as a
where date_trunc('day',a.created_at) >= date(
""" + start_date + """)
and date_trunc('day',a.created_at) <= date(
""" + end_date + """)
group by 1,2,3
"""
This does run but it gives me a dataframe with no data in it - i.e. no numbers. I am not sure what I am doing wrong - any assistance will really help.
Try dropping the date function and formatting the value into the query, keeping the quotes around it:
my_query = f"... where date_trunc('day', a.created_at) >= '{start_date}'"
I was able to work it out as follows:
start_date = '2022-03-01'
end_date = '2022-03-31'
random_query = f"""
select *
from table_A as a
where date_trunc('day',a.created_at) >= date('{start_date}')
and date_trunc('day',a.created_at) <= date('{end_date}')
group by 1,2,3
"""
It was amusing to see that all I needed to do was put the start_date and end_date values in ' ' as well, so that the database receives them as quoted strings. I noticed this simply by printing the query the script was actually sending. The key thing here is to know how to troubleshoot.
Another option is to call .format() at the end of the query, with {start_date} and {end_date} in the query text, for example .format(start_date = '2022-03-01', end_date = '2022-03-31').
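An alternative to building the SQL text by hand is to leave placeholders in the query and pass the dates separately, which avoids the quoting problem altogether. The sketch below uses the standard sqlite3 module as a stand-in for the PostgreSQL connection, so the placeholder is ? and SQLite's date() replaces date_trunc; the sample rows are made up.

```python
import sqlite3

# Stand-in database; the real case would use the existing PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO table_A VALUES (?, ?)",
    [
        (1, "2022-02-28 23:59:00"),
        (2, "2022-03-01 00:00:00"),
        (3, "2022-03-31 18:30:00"),
        (4, "2022-04-01 00:00:00"),
    ],
)

# Dates are defined once and reused by every query that needs them.
start_date = "2022-03-01"
end_date = "2022-03-31"

query = """
    SELECT id
    FROM table_A
    WHERE date(created_at) >= ? AND date(created_at) <= ?
    ORDER BY id
"""
# The driver binds the values, so no quotes are needed in the SQL text.
ids = [row[0] for row in conn.execute(query, (start_date, end_date))]
print(ids)
```

With a psycopg2 connection the placeholder would be %s instead of ?, and pandas accepts the values through the params argument, along the lines of pd.read_sql(query, conn, params=(start_date, end_date)).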

Extracting max date from a database and use output in another query

I want to query the max date in a table and use it as a parameter in the where clause of another query. I am doing this:
query = (""" select
cast(max(order_date) as date)
from
tablename
""")
cursor.execute(query)
d = cursor.fetchone()
as output:[(datetime.date(2021, 9, 8),)]
Then I want to use this output as parameter in another query:
query3=("""select * from anothertable
where order_date = d::date limit 10""")
cursor.execute(query3)
as output: column "d" does not exist
I tried to cast(d as date) , d::date but nothing works. I also tried to datetime.date(d) no success too.
What I am doing wrong here?
There is no reason to select the date then use it in another query. That requires 2 round trips to the server. Do it in a single query. This has the advantage of removing all client side processing of that date.
select *
from anothertable
where order_date =
( select max(cast(order_date as date ))
from tablename
);
I am not sure exactly how this translates into your obfuscation layer but, from what I see, I believe it would be something like this:
query = (""" select *
from anothertable
where order_date =
( select max(cast(order_date as date ))
from tablename
) """)
cursor.execute(query)
Heed the warning by @OneCricketeer. You may need a cast on anothertable's order_date as well, so: where cast(order_date as date) = ( select ... )
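If the two-step approach is still wanted, the "column d does not exist" error comes from writing the Python name d inside the SQL string, where the database reads it as a column. A sketch of passing the fetched value as a bound parameter instead, using the standard sqlite3 module as a stand-in (with psycopg2 the placeholder would be %s); the sample rows and the id column are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (order_date TEXT)")
conn.execute("CREATE TABLE anothertable (id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)", [("2021-09-07",), ("2021-09-08",)]
)
conn.executemany(
    "INSERT INTO anothertable VALUES (?, ?)",
    [(1, "2021-09-07"), (2, "2021-09-08"), (3, "2021-09-08")],
)

cursor = conn.cursor()
cursor.execute("SELECT max(order_date) FROM tablename")
# fetchone() returns a one-element tuple; take the value out of it.
d = cursor.fetchone()[0]

# Pass d as a parameter; the driver substitutes it for the placeholder.
cursor.execute(
    "SELECT id FROM anothertable WHERE order_date = ? ORDER BY id LIMIT 10",
    (d,),
)
matching_ids = [row[0] for row in cursor.fetchall()]
print(d, matching_ids)
```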

Update values in sqlite database when there are multiple with the same name

I'll do my best to explain my problem.
I'm working on CS50's C$50 Finance and am currently implementing a function called sell. The purpose of this function is to update the cash value of a specific user in the database and update his portfolio.
I'm struggling with updating the portfolio database.
This is the database query for better clarification:
CREATE TABLE portfolio(id INTEGER, username TEXT NOT NULL, symbol TEXT NOT NULL, shares INTEGER, PRIMARY KEY(id));
Let's say I've these values in it:
id | username | symbol | shares
1 | eminem | AAPL | 20
2 | eminem | NFLX | 5
3 | eminem | AAPL | 5
And the user sells some of his stocks. I have to update the shares.
If it were for the NFLX symbol it would be easy. A simple query like the one below is sufficient:
db.execute("UPDATE portfolio SET shares = shares - ? WHERE username = ? AND symbol = ?", int(shares), username, quote["symbol"])
However, if I want to update the AAPL shares, this is where the problem arises. If I execute the above query when, let's say, the user sold 5 of his shares, it will change the AAPL shares in both ids 1 and 3 into 20, making the total shares of AAPL 40, not 20.
Which approach should I consider? Should I group the shares by symbol before inserting them into the portfolio table? If so, how? Or is there a query that could solve my problem?
If your version of SQLite is 3.33.0+, then use the UPDATE...FROM syntax like this:
UPDATE portfolio AS p
SET shares = (p.id = t.id) * t.shares_left
FROM (
SELECT MIN(id) id, username, symbol, shares_left
FROM (
SELECT *, SUM(shares) OVER (ORDER BY id) - ? shares_left -- change ? to the number of stocks the user sold
FROM portfolio
WHERE username = ? AND symbol = ?
)
WHERE shares_left >= 0
) AS t
WHERE p.username = t.username AND p.symbol = t.symbol AND p.id <= t.id;
The window function SUM() returns a running total of the shares, ordered by id, from which the number of shares sold is subtracted.
The UPDATE statement sets the column shares to 0 in every row with an id less than the first id at which the running total reaches or exceeds the number of shares sold, and in the row with that id it sets shares to the difference between the running total and the number of shares sold.
See a simplified demo.
For prior versions you can use this:
WITH
cte AS (
SELECT MIN(id) id, username, symbol, shares_left
FROM (
SELECT *, SUM(shares) OVER (ORDER BY id) - ? shares_left -- change ? to the number of stocks the user sold
FROM portfolio
WHERE username = ? AND symbol = ?
)
WHERE shares_left >= 0
)
UPDATE portfolio
SET shares = (id = (SELECT id FROM cte)) * (SELECT shares_left FROM cte)
WHERE (username, symbol) = (SELECT username, symbol FROM cte) AND id <= (SELECT id FROM cte)
See a simplified demo.
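If window functions or UPDATE...FROM are not available, the same first-in-first-out deduction can be done row by row from Python. This is a swapped-in technique rather than the SQL above: a sketch using the standard sqlite3 module (not the CS50 db wrapper), with the helper name sell_shares as an assumption.

```python
import sqlite3


def sell_shares(conn, username, symbol, quantity):
    """Hypothetical helper: deduct `quantity` shares from the oldest rows first."""
    rows = conn.execute(
        "SELECT id, shares FROM portfolio "
        "WHERE username = ? AND symbol = ? ORDER BY id",
        (username, symbol),
    ).fetchall()
    if sum(shares for _, shares in rows) < quantity:
        raise ValueError("not enough shares to sell")
    remaining = quantity
    with conn:  # commits on success, rolls back if an exception is raised
        for row_id, shares in rows:
            if remaining == 0:
                break
            taken = min(shares, remaining)  # never take more than the row holds
            conn.execute(
                "UPDATE portfolio SET shares = shares - ? WHERE id = ?",
                (taken, row_id),
            )
            remaining -= taken


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE portfolio(id INTEGER, username TEXT NOT NULL, "
    "symbol TEXT NOT NULL, shares INTEGER, PRIMARY KEY(id))"
)
conn.executemany(
    "INSERT INTO portfolio (username, symbol, shares) VALUES (?, ?, ?)",
    [("eminem", "AAPL", 20), ("eminem", "NFLX", 5), ("eminem", "AAPL", 5)],
)

sell_shares(conn, "eminem", "AAPL", 22)
holdings = conn.execute(
    "SELECT id, symbol, shares FROM portfolio ORDER BY id"
).fetchall()
print(holdings)
```

Selling 22 AAPL shares empties the row with id 1 and leaves 3 shares in the row with id 3, which matches what the SQL statements above are described as doing. The NFLX row is untouched.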

SQLAlchemy group by minute

The task is a grouping of datetime values (using SQLAlchemy) into per minute points (group by minute).
I have a custom SQL-query:
SELECT COUNT(*) AS point_value, MAX(time) as time
FROM `Downloads`
LEFT JOIN Mirror ON Downloads.mirror = Mirror.id
WHERE Mirror.domain_name = 'localhost.local'
AND `time` BETWEEN '2012-06-30 00:29:00' AND '2012-07-01 00:29:00'
GROUP BY DAYOFYEAR( time ) , ( 60 * HOUR( time ) + MINUTE(time ))
ORDER BY time ASC
It works great, but now I have to do it in SQLAlchemy. This is what I've got for now (grouping by year is just an example):
rows = (DBSession.query(func.count(Download.id), func.max(Download.time)).
filter(Download.time >= fromInterval).
filter(Download.time <= untilInterval).
join(Mirror,Download.mirror==Mirror.id).
group_by(func.year(Download.time)).
order_by(Download.time)
)
It gives me this SQL:
SELECT count("Downloads".id) AS count_1, max("Downloads".time) AS max_1
FROM "Downloads" JOIN "Mirror" ON "Downloads".mirror = "Mirror".id
WHERE "Downloads".time >= :time_1 AND "Downloads".time <= :time_2
GROUP BY year("Downloads".time)
ORDER BY "Downloads".time
As you can see, it is lacking only the correct grouping:
GROUP BY DAYOFYEAR( time ) , ( 60 * HOUR( time ) + MINUTE(time ))
Does SQLAlchemy have some function to group by minute?
You can use any SQL-side function from SQLAlchemy by means of func, which you already use for the YEAR part. I think in your case you just need to add (not tested):
from sqlalchemy.sql import func
...
# add another group_by to your existing query:
rows = ...
group_by(func.year(Download.time),
60 * func.HOUR(Download.time) + func.MINUTE(Download.time)
)
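To sanity-check the grouping key itself (day of year plus minute of day) without a MySQL server, the expression can be tried against an in-memory SQLite database. This is only a stand-in sketch: SQLite has no DAYOFYEAR, HOUR or MINUTE functions, so strftime is used in their place, and the table name, alias and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads (id INTEGER PRIMARY KEY, time TEXT)")
conn.executemany(
    "INSERT INTO downloads (time) VALUES (?)",
    [
        ("2012-06-30 00:29:05",),
        ("2012-06-30 00:29:40",),  # same minute as the row above
        ("2012-06-30 00:30:10",),
        ("2012-07-01 00:29:00",),  # same minute of day, different day
    ],
)

rows = conn.execute(
    """
    SELECT COUNT(*) AS point_value, MAX(time) AS last_time
    FROM downloads
    GROUP BY strftime('%j', time),                           -- day of year
             60 * CAST(strftime('%H', time) AS INTEGER)
                + CAST(strftime('%M', time) AS INTEGER)      -- minute of day
    ORDER BY last_time ASC
    """
).fetchall()
print(rows)
```

Grouping by the minute of day alone would merge the first and last sample rows, which is why the day-of-year term is part of the key.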

Grouping by week, and padding out 'missing' weeks

In my Django model, I've got a very simple model which represents a single occurrence of an event (such as a server alert occurring):
class EventOccurrence:
event = models.ForeignKey(Event)
time = models.DateTimeField()
My end goal is to produce a table or graph that shows how many times an event occurred over the past n weeks.
So my question has two parts:
How can I group_by the week of the time field?
How can I "pad out" the result of this group_by to add a zero-value for any missing weeks?
For example, for the second part, I'd like to transform a result like this:
| week | count |                    | week | count |
| 2    | 3     |                    | 2    | 3     |
| 3    | 5     |   -- becomes -->   | 3    | 5     |
| 5    | 1     |                    | 4    | 0     |
                                     | 5    | 1     |
What's the best way to do this in Django? General Python solutions are also OK.
Django's DateField, like datetime, doesn't support a week attribute. To fetch everything in one query you need to do:
from django.db import connection
cursor = connection.cursor()
# MySQL-specific: WEEK() returns the week number of a date
cursor.execute(" SELECT WEEK(`time`) AS 'week', COUNT(*) AS 'count' FROM %s GROUP BY WEEK(`time`) ORDER BY WEEK(`time`)" % EventOccurrence._meta.db_table, [])
data = []
results = cursor.fetchall()
for i, row in enumerate(results[:-1]):
    data.append(row)
    week = row[0] + 1
    next_week = results[i+1][0]
    # fill the gap up to the next week that has events
    while week < next_week:
        data.append( (week, 0) )
        week += 1
# the last row has no successor, so append it separately
if results:
    data.append( results[-1] )
print(data)
After digging through the Django query API docs, I haven't found a way to make this query through the Django ORM. A cursor is a workaround, if your database is MySQL:
from django.db import connection, transaction
cursor = connection.cursor()
cursor.execute("""
select
week(time) as `week`,
count(*) as `count`
from EventOccurrence
group by week(time)
order by 1;""")
myData = dictfetchall(cursor)
This is, in my opinion, the best-performing solution. But note that this doesn't pad missing weeks.
EDITED: Database-independent solution via Python (lower performance)
If you are looking for database-independent code, then you should take the dates one by one and aggregate them in Python. In that case the code may look like this:
# get all weeks:
import datetime
weeks = set()
d7 = datetime.timedelta(days=7)
iterDay = datetime.date(2012, 1, 1)
while iterDay <= datetime.date.today():
    weeks.add(iterDay.isocalendar()[1])
    iterDay += d7
# get all events
allEvents = EventOccurrence.objects.values_list('time', flat=True)
# aggregate events by week
result = dict()
for w in weeks:
    result.setdefault(w, 0)
for e in allEvents:
    result[e.isocalendar()[1]] += 1
(Disclaimer: not tested)
Since I have to query multiple tables by joining them, I'm using a database view to meet these requirements.
CREATE VIEW my_view
AS
SELECT
*, -- <-- other fields go here
YEAR(time_field) as year,
WEEK(time_field) as week
FROM my_table;
and the model as:
from django.db import models
class MyView(models.Model):
    # other fields go here
    year = models.IntegerField()
    week = models.IntegerField()
    class Meta:
        managed = False
        db_table = 'my_view'
def query():
    rows = MyView.objects.filter(week__range=[2, 5])
    # to handle the rows
After getting the rows from this database view, use the approach by @danihp to pad the "hole" weeks/months with 0.
NOTE: this is only tested for MySQL backend, I'm not sure if it's OK for MS SQL Server or other.
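The zero-padding step can also be written as a small standalone helper that works on any list of (week, count) pairs, whichever query produced them. A minimal sketch, with the function name pad_missing_weeks as an assumption; it treats weeks as plain integers and does not handle ranges that wrap around a year boundary.

```python
def pad_missing_weeks(rows):
    """Return (week, count) pairs for every week from the first to the last,
    using 0 for weeks that have no row."""
    counts = dict(rows)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(week, counts.get(week, 0)) for week in range(first, last + 1)]


padded = pad_missing_weeks([(2, 3), (3, 5), (5, 1)])
print(padded)
```

To cover a fixed window such as the past n weeks, the range bounds could be passed in instead of being taken from the data.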
