I have a dataframe like this:
tablename  columnname
t1         crd
t2         deb
t3         lon
...        ...
and I want to combine these two columns into a query like this (in Python):
select crd from t1
union
select deb from t2
union
select lon from t3 ;
Thank you
Is this what you expect?
make_query = lambda x: f"select {x['columnname']} from {x['tablename']}"
qs = ' union '.join(df.apply(make_query, axis=1))
print(qs)
# Output:
# select crd from t1 union select deb from t2 union select lon from t3
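For reference, here is a self-contained version of the same sketch; the DataFrame contents are reconstructed from the question (append a trailing ';' yourself if your database requires it):

```python
import pandas as pd

# Rebuild the example dataframe from the question
df = pd.DataFrame({'tablename': ['t1', 't2', 't3'],
                   'columnname': ['crd', 'deb', 'lon']})

# Build one "select ... from ..." fragment per row, then glue them with ' union '
make_query = lambda x: f"select {x['columnname']} from {x['tablename']}"
qs = ' union '.join(df.apply(make_query, axis=1))
print(qs)
# select crd from t1 union select deb from t2 union select lon from t3
```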
I have a problem with a SQL query in Python that I hope you can help me with - I'm trying to retrieve some data from WordPress/WooCommerce.
My code:
cursor.execute("""
SELECT t1.ID, t1.post_date, t2.meta_value AS first_name, t3.meta_value AS last_name
FROM test_posts t1
LEFT JOIN test_postmeta t2
ON t1.ID = t2.post_id
WHERE t2.meta_key = '_billing_first_name' and t2.post_id = t1.ID
LEFT JOIN test_postmeta t3
ON t1.ID = t3.post_id
WHERE t3.meta_key = '_billing_last_name' and t3.post_id = t1.ID
GROUP BY t1.ID
ORDER BY t1.post_date DESC LIMIT 20""")
I'm getting the following error:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LEFT JOIN test_postmeta t3 ON t1.ID = t3.post_id WHERE t3.meta_key = '_billing' at line 1
What am I doing wrong?
Thanks in advance.
A query can have only one WHERE clause, placed before GROUP BY.
But since you use LEFT joins, putting a condition on the right table, like t2.meta_key = '_billing_first_name', turns it into an INNER join, because unmatched rows are rejected.
So put all the conditions in the ON clauses:
cursor.execute("""
SELECT t1.ID, t1.post_date, t2.meta_value AS first_name, t3.meta_value AS last_name
FROM test_posts t1
LEFT JOIN test_postmeta t2
ON t1.ID = t2.post_id AND t2.meta_key = '_billing_first_name'
LEFT JOIN test_postmeta t3
ON t1.ID = t3.post_id AND t3.meta_key = '_billing_last_name'
GROUP BY t1.ID
ORDER BY t1.post_date DESC LIMIT 20""")
Although this query may be syntactically correct for MySQL, it does not make sense to use GROUP BY here, since you do not do any aggregation.
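To see why moving the conditions into ON matters, here is a minimal sketch using the standard library's sqlite3 (the table contents are invented; MySQL behaves the same way for this join). A post with no matching meta rows survives the LEFT JOINs with NULL names instead of being dropped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test_posts (ID INTEGER, post_date TEXT)")
cur.execute("CREATE TABLE test_postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)")
cur.executemany("INSERT INTO test_posts VALUES (?, ?)",
                [(1, '2020-01-02'), (2, '2020-01-01')])
# Only post 1 has billing meta; post 2 has none at all
cur.executemany("INSERT INTO test_postmeta VALUES (?, ?, ?)",
                [(1, '_billing_first_name', 'Ada'),
                 (1, '_billing_last_name', 'Lovelace')])

# Conditions live in the ON clauses, so unmatched posts are kept with NULLs
cur.execute("""
    SELECT t1.ID, t1.post_date, t2.meta_value, t3.meta_value
    FROM test_posts t1
    LEFT JOIN test_postmeta t2
        ON t1.ID = t2.post_id AND t2.meta_key = '_billing_first_name'
    LEFT JOIN test_postmeta t3
        ON t1.ID = t3.post_id AND t3.meta_key = '_billing_last_name'
    ORDER BY t1.post_date DESC
""")
print(cur.fetchall())
# [(1, '2020-01-02', 'Ada', 'Lovelace'), (2, '2020-01-01', None, None)]
```

With the same conditions in a WHERE clause instead, post 2 would disappear from the result entirely.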
Your SQL syntax is incorrect. Try this:
cursor.execute("""
SELECT t1.ID, t1.post_date, t2.meta_value AS first_name, t3.meta_value AS last_name
FROM test_posts t1
LEFT JOIN test_postmeta t2 ON t1.ID = t2.post_id
LEFT JOIN test_postmeta t3 ON t1.ID = t3.post_id
WHERE t3.meta_key = '_billing_last_name' and t2.meta_key = '_billing_first_name'
GROUP BY t1.ID
ORDER BY t1.post_date DESC LIMIT 20""")
It might be worth reading a little bit about SQL Joins and WHERE statements.
I am trying to build a compound SQL query that builds a table from a join I have previously performed (using SQLAlchemy Core with Python 3 and PostgreSQL 9.4).
I include here the relevant part of my python3 code. I first create "in_uuid_set" using a select with a group_by. Then I join "in_uuid_set" with "in_off_messages" to get "jn_in".
Finally, I try to build a new table "incoming" from "jn_in" by selecting and generating the wanted columns:
in_uuid_set = \
sa.select([in_off_messages.c.src_uuid.label('remote_uuid')])\
.select_from(in_off_messages)\
.where(in_off_messages.c.dst_uuid == local_uuid)\
.group_by(in_off_messages.c.src_uuid)\
.alias()
jn_in = in_uuid_set.join(in_off_messages,\
and_(\
in_off_messages.c.src_uuid == in_uuid_set.c.remote_uuid,\
in_off_messages.c.dst_uuid == local_uuid,\
))\
.alias()
incoming = sa.select([\
in_off_messages.c.msg_uuid.label('msg_uuid'),\
in_uuid_set.c.remote_uuid.label('remote_uuid'),\
in_off_messages.c.msg_type.label('msg_type'),\
in_off_messages.c.date_sent.label('date_sent'),\
in_off_messages.c.content.label('content'),\
in_off_messages.c.was_read.label('was_read'),\
true().label('is_incoming')]
)\
.select_from(jn_in)
Surprisingly, I get that "incoming" has more rows than "jn_in". "incoming" has 12 rows, while "jn_in" has only 2 rows. I expect that "incoming" will have the same amount of rows (2) as "jn_in".
I also include here the SQL output the SqlAlchemy generates for "incoming":
SELECT in_off_messages.msg_uuid AS msg_uuid,
anon_1.remote_uuid AS remote_uuid,
in_off_messages.msg_type AS msg_type,
in_off_messages.date_sent AS date_sent,
in_off_messages.content AS content,
in_off_messages.was_read AS was_read,
1 AS is_incoming
FROM in_off_messages,
(SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1,
(SELECT anon_1.remote_uuid AS anon_1_remote_uuid,
in_off_messages.msg_uuid AS in_off_messages_msg_uuid,
in_off_messages.orig_src_uuid AS in_off_messages_orig_src_uuid,
in_off_messages.src_uuid AS in_off_messages_src_uuid,
in_off_messages.dst_uuid AS in_off_messages_dst_uuid,
in_off_messages.msg_type AS in_off_messages_msg_type,
in_off_messages.date_sent AS in_off_messages_date_sent,
in_off_messages.content AS in_off_messages_content,
in_off_messages.was_read AS in_off_messages_was_read
FROM (SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1
JOIN in_off_messages
ON in_off_messages.src_uuid = anon_1.remote_uuid
AND in_off_messages.dst_uuid = :dst_uuid_2) AS anon_2
Something doesn't look right to me in this SQL output, mostly because GROUP BY appears too many times. I would have expected it to show up once, but it shows up twice here.
My guess is that some parentheses went out of place in the generated SQL. I also suspect that I did something wrong with alias(), though I'm not sure about it.
What should I do to get the wanted result (Same amount of rows for "jn_in" and "incoming")?
After playing with the code for a while, I found a way to fix it.
The answer was eventually related to the alias().
In order to make this work, the second alias() (of jn_in) should be omitted, like this:
in_uuid_set = \
sa.select([in_off_messages.c.src_uuid.label('remote_uuid')])\
.select_from(in_off_messages)\
.where(in_off_messages.c.dst_uuid == local_uuid)\
.group_by(in_off_messages.c.src_uuid)\
.alias()
jn_in = in_uuid_set.join(in_off_messages,\
and_(\
in_off_messages.c.src_uuid == in_uuid_set.c.remote_uuid,\
in_off_messages.c.dst_uuid == local_uuid,\
))
# <<< The alias() is gone >>>
incoming = sa.select([\
in_off_messages.c.msg_uuid.label('msg_uuid'),\
in_uuid_set.c.remote_uuid.label('remote_uuid'),\
in_off_messages.c.msg_type.label('msg_type'),\
in_off_messages.c.date_sent.label('date_sent'),\
in_off_messages.c.content.label('content'),\
in_off_messages.c.was_read.label('was_read'),\
true().label('is_incoming')]
)\
.select_from(jn_in)
It seems, however, that the first alias() (of in_uuid_set) cannot be omitted. If I try to omit it, I get this error message:
E subquery in FROM must have an alias
E LINE 2: FROM (SELECT in_off_messages.src_uuid AS remote_uuid
E ^
E HINT: For example, FROM (SELECT ...) [AS] foo.
As a generalization: if you have a select that you want to use as a clause somewhere else, you should alias() it; however, if you have a join that you want to use as a clause, you should not alias() it.
For the sake of completeness, I include here the resulting SQL of the new code:
SELECT in_off_messages.msg_uuid AS msg_uuid,
anon_1.remote_uuid AS remote_uuid,
in_off_messages.msg_type AS msg_type,
in_off_messages.date_sent AS date_sent,
in_off_messages.content AS content,
in_off_messages.was_read AS was_read,
1 AS is_incoming
FROM (SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1
JOIN in_off_messages
ON in_off_messages.src_uuid = anon_1.remote_uuid
AND in_off_messages.dst_uuid = :dst_uuid_2
Much shorter than the one in the question.
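That alias() rule can be sketched in a minimal, self-contained form. This is a made-up table with invented data, written against the modern select() style of SQLAlchemy 1.4+ rather than the older select([...]) style used above; the point is the same: the grouped SELECT needs .subquery() (the modern spelling of .alias() for selects) before it can sit in a FROM clause, while the join is passed to select_from() directly:

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite:///:memory:")
meta = sa.MetaData()
msgs = sa.Table("in_off_messages", meta,
                sa.Column("src_uuid", sa.Integer),
                sa.Column("dst_uuid", sa.Integer),
                sa.Column("content", sa.String))

# A SELECT used in a FROM clause must be wrapped ...
uuid_set = (sa.select(msgs.c.src_uuid.label("remote_uuid"))
            .where(msgs.c.dst_uuid == 7)
            .group_by(msgs.c.src_uuid)
            .subquery())

# ... but a join goes into select_from() as-is, with no alias()
jn = uuid_set.join(msgs, msgs.c.src_uuid == uuid_set.c.remote_uuid)
incoming = sa.select(msgs.c.content, uuid_set.c.remote_uuid).select_from(jn)

with engine.begin() as conn:
    meta.create_all(conn)
    conn.execute(msgs.insert(), [
        {"src_uuid": 1, "dst_uuid": 7, "content": "hi"},
        {"src_uuid": 1, "dst_uuid": 7, "content": "again"},
        {"src_uuid": 2, "dst_uuid": 9, "content": "other"},
    ])
    rows = conn.execute(incoming).fetchall()

print(rows)  # two rows, matching the join; no cross-product blow-up
```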
So how do I solve this problem:
cursor.execute("""SELECT * FROM Users AS t1
INNER JOIN Users_has_Users AS t ON t.Users_id = t1.id
INNER JOIN Users AS t2 ON t.Users_id1 = t2.id
WHERE t1.email = %s AND t1.id != t2.id AND t2.id >= %s
ORDER BY t2.name {}
LIMIT 10""".format(order), (email, since_id, limit))
Error:
not all arguments converted during string formatting
You cannot use SQL parameters to interpolate anything other than data; you cannot use them for SQL keywords such as ASC, nor for the LIMIT value. That is the point of SQL parameters: to prevent their values from being interpreted as SQL.
Use string formatting to interpolate your sort direction and query limit instead:
cursor.execute("""SELECT * FROM Users AS t1
INNER JOIN Users_has_Users AS t ON t.Users_id = t1.id
INNER JOIN Users AS t2 ON t.Users_id1 = t2.id
WHERE t1.email = %s AND t1.id != t2.id AND t2.id >= %s
ORDER BY t2.name {}
LIMIT {}""".format(order, limit), (email, since_id))
This does assume that you have full control over the contents of order and limit; never set them from user-supplied data, as string formatting like this would otherwise open you up to a SQL injection attack.
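A minimal sketch of that split, using the standard library's sqlite3 for illustration (the table and data are made up, and sqlite3 uses ? placeholders where MySQL's connector uses %s): data values go through placeholders, while the direction and limit are validated before being formatted into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER, name TEXT, email TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)",
                 [(1, 'bob', 'b@x'), (2, 'ann', 'a@x'), (3, 'cat', 'c@x')])

def fetch_users(order, limit):
    # Whitelist the pieces that cannot be bound as parameters
    if order not in ('ASC', 'DESC'):
        raise ValueError('invalid sort direction')
    limit = int(limit)  # force to integer before formatting
    query = "SELECT name FROM Users ORDER BY name {} LIMIT {}".format(order, limit)
    return [row[0] for row in conn.execute(query)]

print(fetch_users('DESC', 2))
# ['cat', 'bob']
```

Anything that fails the whitelist raises before it ever reaches the SQL string, which is what keeps the format() call safe.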
I am trying to fetch in a single query a fixed set of rows, plus some other rows found by a subquery. My problem is that the query generated by my SQLAlchemy code is incorrect.
The problem is that the query generated by SQLAlchemy is as follows:
SELECT tbl.id AS tbl_id
FROM tbl
WHERE tbl.id IN
(
SELECT t2.id AS t2_id
FROM tbl AS t2, tbl AS t1
WHERE t2.id =
(
SELECT t3.id AS t3_id
FROM tbl AS t3, tbl AS t1
WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1 OFFSET 0
)
AND t1.id IN (4, 8)
)
OR tbl.id IN (0, 8)
while the correct query should not have the second tbl AS t1 (the goal of this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8).
Unfortunately, I can't find how to get SQLAlchemy to generate the correct one (see the code below).
Suggestions to also achieve the same result with a simpler query are also welcome (they need to be efficient though -- I tried a few variants and some were a lot slower on my real use case).
The code producing the query:
from sqlalchemy import create_engine, or_
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.orm import sessionmaker
engine = create_engine('sqlite:///:memory:', echo=True)
meta = MetaData(bind=engine)
table = Table('tbl', meta, Column('id', Integer))
session = sessionmaker(bind=engine)()
meta.create_all()
# Insert IDs 0, 2, 4, 6, 8.
i = table.insert()
i.execute(*[dict(id=i) for i in range(0, 10, 2)])
print session.query(table).all()
# output: [(0,), (2,), (4,), (6,), (8,)]
# Subquery of interest: look for the row just before IDs 4 and 8.
sub_query_txt = (
'SELECT t2.id '
'FROM tbl t1, tbl t2 '
'WHERE t2.id = ( '
' SELECT t3.id from tbl t3 '
' WHERE t3.id < t1.id '
' ORDER BY t3.id DESC '
' LIMIT 1) '
'AND t1.id IN (4, 8)')
print session.execute(sub_query_txt).fetchall()
# output: [(2,), (6,)]
# Full query of interest: get the rows mentioned above, as well as more rows.
query_txt = (
'SELECT * '
'FROM tbl '
'WHERE ( '
' id IN (%s) '
'OR id IN (0, 8))'
) % sub_query_txt
print session.execute(query_txt).fetchall()
# output: [(0,), (2,), (6,), (8,)]
# Attempt at an SQLAlchemy translation (from innermost sub-query to full query).
t1 = table.alias('t1')
t2 = table.alias('t2')
t3 = table.alias('t3')
q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
limit(1)
q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
q3 = session.query(table).filter(
or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
print list(q3)
# output: [(0,), (6,), (8,)]
What you are missing is a correlation between the innermost sub-query and the next level up; without the correlation, SQLAlchemy will include the t1 alias in the innermost sub-query:
>>> print str(q1)
SELECT t3.id AS t3_id
FROM tbl AS t3, tbl AS t1
WHERE t3.id < t1.id ORDER BY t3.id DESC
LIMIT ? OFFSET ?
>>> print str(q1.correlate(t1))
SELECT t3.id AS t3_id
FROM tbl AS t3
WHERE t3.id < t1.id ORDER BY t3.id DESC
LIMIT ? OFFSET ?
Note that tbl AS t1 is now missing from the query. From the .correlate() method documentation:
Return a Query construct which will correlate the given FROM clauses to that of an enclosing Query or select().
Thus, t1 is assumed to be part of the enclosing query, and isn't listed in the query itself.
Now your query works:
>>> q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
... limit(1).correlate(t1)
>>> q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
>>> q3 = session.query(table).filter(
... or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
>>> print list(q3)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine SELECT tbl.id AS tbl_id
FROM tbl
WHERE tbl.id IN (SELECT t2.id AS t2_id
FROM tbl AS t2, tbl AS t1
WHERE t2.id = (SELECT t3.id AS t3_id
FROM tbl AS t3
WHERE t3.id < t1.id ORDER BY t3.id DESC
LIMIT ? OFFSET ?) AND t1.id IN (?, ?)) OR tbl.id IN (?, ?)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine (1, 0, 4, 8, 0, 8)
[(0,), (2,), (6,), (8,)]
I'm only kinda sure I understand the query you're asking for. Let's break it down, though:
the goal from this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8.
It looks like you want to query for two kinds of things, and then combine them. The proper operator for that is union. Do the simple queries and add them up at the end. I'll start with the second bit, "ids just before X".
To start with, let's look at all the ids that come before some given value. For this, we'll join the table on itself with a <:
# select t1.id t1_id, t2.id t2_id from tbl t1 join tbl t2 on t1.id < t2.id;
t1_id | t2_id
-------+-------
0 | 2
0 | 4
0 | 6
0 | 8
2 | 4
2 | 6
2 | 8
4 | 6
4 | 8
6 | 8
(10 rows)
That certainly gives us all of the pairs of rows where the left is less than the right. Of all of them, we want, for each t2_id, the row where t1_id is as high as possible; we'll group by t2_id and select the maximum t1_id:
# select max(t1.id), t2.id from tbl t1 join tbl t2 on t1.id < t2.id group by t2.id;
max | id
-----+-------
0 | 2
2 | 4
4 | 6
6 | 8
(4 rows)
Your query, using a LIMIT, could achieve this, but it's usually a good idea to avoid that technique when alternatives exist, because it does not have good, portable support across database implementations. SQLite can use it, but PostgreSQL doesn't like it; it uses a technique called "analytic queries" (which are both standardised and more general). MySQL can do neither. The above query, though, works consistently across all SQL database engines.
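The plain-SQL step above can be checked with the standard library's sqlite3, using the same toy table of even ids as the question (an ORDER BY is added so the output order is deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(i,) for i in range(0, 10, 2)])

# "Highest id before each id": self-join on <, group by the right side, take max
rows = conn.execute("""
    SELECT max(t1.id), t2.id
    FROM tbl t1 JOIN tbl t2 ON t1.id < t2.id
    GROUP BY t2.id
    ORDER BY t2.id
""").fetchall()
print(rows)
# [(0, 2), (2, 4), (4, 6), (6, 8)]
```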
The rest of the work is just using IN or other equivalent filtering queries, and is not difficult to express in SQLAlchemy. The boilerplate...
>>> import sqlalchemy as sa
>>> from sqlalchemy.orm import Query
>>> engine = sa.create_engine('sqlite:///:memory:')
>>> meta = sa.MetaData(bind=engine)
>>> table = sa.Table('tbl', meta, sa.Column('id', sa.Integer))
>>> meta.create_all()
>>> table.insert().execute([{'id':i} for i in range(0, 10, 2)])
>>> t1 = table.alias()
>>> t2 = table.alias()
>>> before_filter = [4, 8]
The first interesting bit is that we give the max(id) expression a name. This is needed so that we can refer to it more than once, and to lift it out of a subquery.
>>> c1 = sa.func.max(t1.c.id).label('max_id')
>>> # ^^^^^^
The 'heavy lifting' portion of the query, join the above aliases, group and select the max
>>> q1 = Query([c1, t2.c.id]) \
... .join((t2, t1.c.id < t2.c.id)) \
... .group_by(t2.c.id) \
... .filter(t2.c.id.in_(before_filter))
Because we'll be using a union, we need this to produce the right number of fields: we wrap it in a subquery and project down to the only column we're interested in. That column will have the name we gave it in the label() call above.
>>> q2 = Query(q1.subquery().c.max_id)
>>> # ^^^^^^
The other half of the union is much simpler:
>>> t3 = table.alias()
>>> exact_filter = [0, 8]
>>> q3 = Query(t3).filter(t3.c.id.in_(exact_filter))
All that's left is to combine them:
>>> q4 = q2.union(q3)
>>> engine.execute(q4.statement).fetchall()
[(0,), (2,), (6,), (8,)]
The responses here helped me fix my issue, but in my case I had to use both correlate() and subquery():
# ...
subquery = subquery.correlate(OuterCorrelationTable).subquery()
filter_query = db.session.query(func.sum(subquery.c.some_count_column))
filter = filter_query.as_scalar() == as_many_as_some_param
# ...
final_query = db.session.query(OuterCorrelationTable).filter(filter)