I'm trying to create a method where I can pass a parameter (a number) and get that many rows of output. See below:
def get_data(i):
    for i in range(0, i):
        TNG = "SELECT DISTINCT hub, value, date_inserted FROM ZE_DATA.AESO_CSD_SUMMARY where opr_date >= trunc(sysdate) order by date_inserted desc fetch first i rows only"
Here i is a number. In the clause "fetch first i rows only", I want the query to return i rows.
Thoughts on the syntax?
Seems like you're looking for a limit argument. You didn't mention which SQL dialect you're using, but here are a couple of examples for various databases.
I'm also a little confused by the structure of that function: it looks like you want to run the query once and iterate through the result set, rather than run the query i times.
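For example, most drivers let you pass the row limit as a bind parameter instead of splicing i into the SQL string. Here is a minimal sketch using SQLite's LIMIT ? so it is self-contained; the table and columns are invented stand-ins for the Oracle table above, and Oracle's fetch first :n rows only accepts a bind variable the same way:

```python
import sqlite3

def get_data(i):
    """Return the i most recent rows (hypothetical table and columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE summary (hub TEXT, value REAL, date_inserted INTEGER)")
    conn.executemany("INSERT INTO summary VALUES (?, ?, ?)",
                     [("A", 1.0, 1), ("B", 2.0, 2), ("C", 3.0, 3)])
    # pass i as a bind parameter -- never by string concatenation
    cur = conn.execute(
        "SELECT hub, value, date_inserted FROM summary "
        "ORDER BY date_inserted DESC LIMIT ?", (i,))
    return cur.fetchall()

print(get_data(2))  # the two most recent rows
```

The key point is that the limit travels as a query parameter, so the SQL text never changes and the driver handles quoting.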
Problem Summary:
I'm using Python to send a series of queries to a database, one by one in a loop, until a non-empty result set is found. The query has three conditions, combined in a WHERE clause, and each iteration of the loop relaxes the conditions from specific to more generic.
Details:
Assuming the conditions are keywords based on a pre-made list ordered by accuracy such as:
Option  KEYWORD1  KEYWORD2  KEYWORD3
1       exact     exact     exact      # most accurate!
2       generic   exact     exact      # accurate
3       generic   generic   exact      # close enough
4       generic   generic   generic    # close
5       generic+  generic   generic    # almost there
... and so on.
On the database side, I have a description column that should contain all three keywords, either in their specific form or a generic form. When I run the loop in Python, this is what actually happens:
-- The first sql statement will be like
Select *
From MyTable
Where Description LIKE 'keyword1-exact%'
AND Description LIKE 'keyword2-exact%'
AND Description LIKE 'keyword3-exact%'
-- if no results, the second sql statement will be like
Select *
From MyTable
Where Description LIKE 'keyword1-generic%'
AND Description LIKE 'keyword2-exact%'
AND Description LIKE 'keyword3-exact%'
-- if no results, the third sql statement will be like
Select *
From MyTable
Where Description LIKE 'keyword1-generic%'
AND Description LIKE 'keyword2-generic%'
AND Description LIKE 'keyword3-exact%'
-- and so on until a non-empty result set is found or all keywords were used
I'm using the approach above to get the most accurate results with the minimum number of irrelevant ones (the more generic the keywords, the more irrelevant results show up, and those need additional processing).
Question:
My approach above does exactly what I want, but I'm sure it's not efficient.
What would be the proper way to do this in a single query instead of a Python loop (knowing that I only have read access to the database, so I can't create stored procedures)?
Here is an idea:
select top 1
*
from
(
select
MyTable.*,
accuracy = case when description like keyword1 + '%'
and description like keyword2 + '%'
and description like keyword3 + '%'
then accuracy
end
-- an example of data from MyTable
from (select description = 'exact') MyTable
cross join
(values
-- generate full list like this in python
-- or read it from a table if it is in database
(1, ('exact'), ('exact'), ('exact')),
(2, ('generic'), ('exact'), ('exact')),
(3, ('generic'), ('generic'), ('exact'))
) t(accuracy, keyword1, keyword2, keyword3)
) t
where accuracy is not null
order by accuracy
I would not do a loop over database queries. Instead I would search for the least specific, i.e. most generic, keyword and return all these rows.
Select *
From MyTable
Where Description LIKE '%iPhone%'
This returns all the rows with iPhones. Now do the further processing, i.e. find the best match, in memory. This is much faster than multiple queries.
If you have several equally most generic keywords, then query them with OR
Select *
From MyTable
Where Description LIKE '%iPhone%' OR
Description LIKE '%i-Phone%'
But in any case, make only one query.
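That single-query-then-rank idea can be sketched in plain Python. The rows, keyword strings, and accuracy tiers below are invented for illustration; in practice the rows would come from the one generic query against MyTable:

```python
# hypothetical rows fetched once with the most generic LIKE query
rows = [
    "keyword1-exact keyword2-exact keyword3-exact",
    "keyword1-generic keyword2-exact keyword3-exact",
    "keyword1-generic keyword2-generic keyword3-generic",
]

# accuracy tiers, most specific first (mirrors the table in the question)
tiers = [
    ("keyword1-exact",   "keyword2-exact",   "keyword3-exact"),
    ("keyword1-generic", "keyword2-exact",   "keyword3-exact"),
    ("keyword1-generic", "keyword2-generic", "keyword3-exact"),
    ("keyword1-generic", "keyword2-generic", "keyword3-generic"),
]

def best_matches(rows, tiers):
    # return the rows matching the first (most accurate) tier that has any hits
    for tier in tiers:
        hits = [r for r in rows if all(k in r for k in tier)]
        if hits:
            return hits
    return []

print(best_matches(rows, tiers))  # rows matching the most accurate tier
```

One round trip to the database, and the accuracy ranking happens in memory.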
Please try the regular expression functionality of SQL Server.
Alternatively, you can import re in Python for regular expressions: first collect the data, then use re to achieve your goal.
Hope this is helpful.
I have the following query:
self.cursor.execute("SELECT platform_id_episode, title FROM table WHERE asset_type='movie'")
Is there a way to get the number of results returned directly? Currently I am doing the inefficient:
r = self.cursor.fetchall()
num_results = len(r)
If you don't actually need the results,* don't ask MySQL for them; just use COUNT:**
self.cursor.execute("SELECT COUNT(*) FROM table WHERE asset_type='movie'")
Now, you'll get back one row, with one column, whose value is the number of rows your other query would have returned.
Notice that I ignored your specific columns and just did COUNT(*). A COUNT(platform_id_episode) would also be legal, but it means the number of found rows with non-NULL platform_id_episode values; COUNT(*) is the number of found rows full stop.***
* If you do need the results… well, you have to call fetchall() or equivalent to get them, so I don't see the problem.
** If you've never used aggregate functions in SQL before, make sure to look over some of the examples on that page; you've probably never realized you can do things like that so simply (and efficiently).
*** If someone taught you "never use * in a SELECT", well, that's good advice, but it's not relevant here. The problem with SELECT * is that it dumps every column across your result set, instead of just the columns you actually need, in the order you need them. SELECT COUNT(*) doesn't do that.
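A minimal runnable sketch of the difference, using SQLite in place of MySQL (the table and its contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (platform_id_episode INTEGER, title TEXT, asset_type TEXT)")
conn.executemany("INSERT INTO assets VALUES (?, ?, ?)",
                 [(1, "A", "movie"), (2, "B", "movie"), (3, "C", "series")])

# one row, one column: the count itself, with no row data transferred
cur = conn.execute("SELECT COUNT(*) FROM assets WHERE asset_type='movie'")
(num_results,) = cur.fetchone()
print(num_results)  # 2
```

The database does the counting, so only a single integer crosses the wire instead of the whole result set.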
On MySQL I would enter the following query, but running the same on Google BigQuery throws an error for the upper limit. How do I specify limits on a query? Say I have a query that returns 20 results and I want only results 5 through 10; how should I frame the query on Google BigQuery?
For example:
SELECT id,
COUNT(total) AS total
FROM ABC.data
GROUP BY id
ORDER BY count DESC
LIMIT 5,10;
If I only put "LIMIT 5" on the end of the query I get the top 5, and if I put "LIMIT 10" I get the top 10, but what syntax do I use to get the rows between 5 and 10?
Could someone please shed some light on this?
Any help is much appreciated.
Thanks and have a great day.
I would use window functions...
something like
select * from
  (select id, total, row_number() over (order by total desc) as rnb
   from
     (select id,
             count(total) as total
      from ABC.data
      group by id) t1
  ) t2
where rnb >= 5 and rnb <= 10
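BigQuery's exact syntax differs slightly, but the row_number() pattern itself can be demonstrated end to end with SQLite (version 3.25+, which ships with modern Python). The table and data below are invented so that id i appears i times, making the per-id counts easy to predict:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, total INTEGER)")
# id i is inserted i times, so COUNT(total) per id equals i
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(i, 1) for i in range(1, 7) for _ in range(i)])

rows = conn.execute("""
    SELECT id, total, rnb FROM
      (SELECT id, total, ROW_NUMBER() OVER (ORDER BY total DESC) AS rnb
       FROM (SELECT id, COUNT(total) AS total FROM data GROUP BY id))
    WHERE rnb BETWEEN 5 AND 6
""").fetchall()
print(rows)  # ranks 5 and 6 of the ordered counts
```

The inner query aggregates, the middle one numbers the rows in descending order of the aggregate, and the outer WHERE picks the desired slice.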
The windowing function answer is a good one, but I thought I'd give another option that involves how your result is fetched rather than how the query is run.
If you only need the first N rows, you can add LIMIT N to your query. But if you don't need the first M rows, you can change how you fetch the results. If you're using the Java API, you can use the setStartIndex() method on either the TableData.list() or the Jobs.getQueryResults() call to fetch rows starting from a particular index.
That question makes little sense for an ever-changing dataset: if there is a one-second delay between asking for the first 5 rows and the next 5, the data could have changed, its order is now different, and you will miss rows or get duplicates. That's why databases like BigTable have a method for doing one query of the data and handing the result set back to you in groups. If that were the case, what you are looking for is called query cursors. I can't say this any better than their own example, so [here is the documentation on them.][1]
But since you said the data does not change, fetch() will work just fine. fetch() has two options you will want to take note of: limit and offset. 'limit' is the maximum number of results to return; if set to None, all available results will be retrieved. 'offset' is how many results to skip.
Check out other options here: https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_fetch
I have a general ledger table in my DB with the columns: member_id, is_credit and amount. I want to get the current balance of the member.
Ideally that could be obtained with two queries, where the first query has is_credit == True and the second is_credit == False, something close to:
credit_amount = session.query(func.sum(Funds.amount).label('Debit_Amount')).filter(Funds.member_id==member_id, Funds.is_credit==True)
debit_amount = session.query(func.sum(Funds.amount).label('Debit_Amount')).filter(Funds.member_id==member_id, Funds.is_credit==False)
balance = credit_amount - debit_amount
and then subtract the result. Is there a way to have the above run in one query to give the balance?
From the comments you state that hybrids are too advanced right now, so I will propose an easier but not as efficient solution (still, it's okay):
(session.query(Funds.is_credit, func.sum(Funds.amount).label('Debit_Amount')).
    filter(Funds.member_id==member_id).group_by(Funds.is_credit))
What will this do? You will receive a two-row result: one row holds the credit sum, the other the debit sum, distinguished by the is_credit column. The second column (Debit_Amount) holds the value. You then subtract the two to get the balance: only one query, fetching both values.
If you are unsure what group_by does, I recommend you read up on SQL before doing it in SQLAlchemy. SQLAlchemy offers very easy usage of SQL but it requires that you understand SQL as well. Thus, I recommend: First build a query in SQL and see that it does what you want - then translate it to SQLAlchemy and see that it does the same. Otherwise SQLAlchemy will often generate highly inefficient queries, because you asked for the wrong thing.
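Following that advice, here is the underlying SQL pattern sketched first in plain SQL, using SQLite and an invented ledger table. A single conditional sum goes one step further than the two-row group_by result and computes the balance directly; in SQLAlchemy the same shape can be built with func.sum and a case expression:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE funds (member_id INTEGER, is_credit INTEGER, amount REAL)")
conn.executemany("INSERT INTO funds VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 1, 50.0), (1, 0, 30.0)])

# credits count positive, debits negative: the balance in one query
(balance,) = conn.execute("""
    SELECT SUM(CASE WHEN is_credit THEN amount ELSE -amount END)
    FROM funds WHERE member_id = ?
""", (1,)).fetchone()
print(balance)  # 100 + 50 - 30 = 120.0
```

Once the SQL does what you want, translating it to SQLAlchemy is mechanical, and you can verify both produce the same plan.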
Using SQLAlchemy, I have a one to many relation with two tables - users and scores. I am trying to query the top 10 users sorted by their aggregate score over the past X amount of days.
users:
id
user_name
score
scores:
user
score_amount
created
My current query is:
top_users = DBSession.query(User).options(eagerload('scores')).filter_by(User.scores.created > somedate).order_by(func.sum(User.scores).desc()).all()
I know this is clearly not correct, it's just my best guess. However, after looking at the documentation and googling I cannot find an answer.
EDIT:
Perhaps it would help if I sketched what the MySQL query would look like:
SELECT user.*, SUM(scores.amount) AS score_increase
FROM user LEFT JOIN scores ON scores.user_id = user.user_id
WHERE scores.created_at > someday
GROUP BY user.user_id
ORDER BY score_increase DESC
The single-joined-row way, with a group_by added in for all user columns although MySQL will let you group on just the "id" column if you choose:
sess.query(User, func.sum(Score.amount).label('score_increase')).\
join(User.scores).\
filter(Score.created_at > someday).\
group_by(User).\
order_by("score_increase desc")
Or if you just want the users in the result:
sess.query(User).\
join(User.scores).\
filter(Score.created_at > someday).\
group_by(User).\
order_by(func.sum(Score.amount))
The above two have an inefficiency in that you're grouping on all columns of "user" (or you're using MySQL's "group on only a few columns" thing, which is MySQL only). To minimize that, the subquery approach:
subq = sess.query(Score.user_id, func.sum(Score.amount).label('score_increase')).\
filter(Score.created_at > someday).\
group_by(Score.user_id).subquery()
sess.query(User).join((subq, subq.c.user_id==User.user_id)).order_by(subq.c.score_increase)
An example of the identical scenario is in the ORM tutorial at: http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#selecting-entities-from-subqueries
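The subquery approach reads like this in plain SQL, sketched here with SQLite and invented data (the subquery() call above generates the equivalent statement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute("CREATE TABLE scores (user_id INTEGER, amount INTEGER, created_at INTEGER)")
conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                 [(1, 10, 5), (1, 20, 6), (2, 50, 6), (2, 5, 1)])  # created_at 1 is too old

someday = 2
rows = conn.execute("""
    SELECT user.user_name, subq.score_increase
    FROM user
    JOIN (SELECT user_id, SUM(amount) AS score_increase
          FROM scores WHERE created_at > ? GROUP BY user_id) AS subq
      ON subq.user_id = user.user_id
    ORDER BY subq.score_increase DESC
    LIMIT 10
""", (someday,)).fetchall()
print(rows)
```

The aggregation groups only on scores.user_id inside the subquery, so the outer query never has to group on every user column.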
You will need to use a subquery in order to compute the aggregate score for each user. Subqueries are described here: http://www.sqlalchemy.org/docs/05/ormtutorial.html?highlight=subquery#using-subqueries
I am assuming the column (not the relation) you're using for the join is called Score.user_id, so change it if this is not the case.
You will need to do something like this:
DBSession.query(Score.user_id, func.sum(Score.score_amount).label('total_score')).group_by(Score.user_id).filter(Score.created > somedate).order_by('total_score DESC')[:10]
However this will result in tuples of (user_id, total_score). I'm not sure if the computed score is actually important to you, but if it is, you will probably want to do something like this:
users_scores = []
q = DBSession.query(Score.user_id, func.sum(Score.score_amount).label('total_score')).group_by(Score.user_id).filter(Score.created > somedate).order_by('total_score DESC')[:10]
for user_id, total_score in q:
user = DBSession.query(User).get(user_id)
users_scores.append((user, total_score))
This will result in 11 queries being executed, however. It is possible to do it all in a single query, but due to various limitations in SQLAlchemy, it will likely create a very ugly multi-join query or subquery (dependent on engine) and it won't be very performant.
If you plan on doing something like this often and you have a large amount of scores, consider denormalizing the current score onto the user table. It's more work to upkeep, but will result in a single non-join query like:
DBSession.query(User).order_by(User.computed_score.desc())
Hope that helps.