How to use another method to achieve mysql left join? - python

MySQL limits a single query to 61 joined tables, so I am looking for an alternative to chaining LEFT JOINs. Here is a sample table:
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for test
-- ----------------------------
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
`id` int(11) DEFAULT NULL,
`pid` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
-- ----------------------------
-- Records of test
-- ----------------------------
BEGIN;
INSERT INTO `test` VALUES (1, 3);
INSERT INTO `test` VALUES (1, 4);
INSERT INTO `test` VALUES (2, 4);
INSERT INTO `test` VALUES (3, 5);
INSERT INTO `test` VALUES (3, 6);
COMMIT;
SET FOREIGN_KEY_CHECKS = 1;
This is MySQL SQL:
SELECT
t1.pid AS lev1,
t2.pid AS lev2,
t3.pid AS lev3,
t4.pid AS lev4
FROM
test AS t1
LEFT JOIN test AS t2 ON ( t2.id = t1.pid )
LEFT JOIN test AS t3 ON ( t3.id = t2.pid )
LEFT JOIN test AS t4 ON ( t4.id = t3.pid )
LEFT JOIN test AS t5 ON ( t5.id = t4.pid )
WHERE t1.id = 1 ;
I want output like this, but without using MySQL LEFT JOIN:
lev1,lev2,lev3,lev4
3, 6,
3, 5,
4, ,
A Python solution would also be welcome.

The limit of 61 is there for a reason and your query might run somewhat slowly, but something like the following should work for your needs:
SELECT
t1.id AS lev1, t2.id AS lev2, t3.id AS lev3, t4.id AS lev4
FROM
t1, t2, t3, t4
WHERE
t2.pid = t1.id AND t3.pid = t1.id AND t4.pid = t1.id
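Since the question also asks for a Python option, one way to sidestep the join limit is to fetch the (id, pid) rows once and walk the hierarchy in application code. The following is a minimal sketch, assuming the rows of the sample `test` table have already been fetched into a list of tuples; the helper name `walk_levels` and its `max_depth` parameter are hypothetical and not part of any library.

```python
from collections import defaultdict

# (id, pid) rows as they appear in the sample `test` table.
rows = [(1, 3), (1, 4), (2, 4), (3, 5), (3, 6)]


def walk_levels(rows, start_id, max_depth=4):
    """Return one tuple per path below start_id, padded with None.

    Each tuple corresponds to one row of the chained LEFT JOIN
    (lev1, lev2, ...), without issuing any join at all.
    """
    children = defaultdict(list)
    for id_, pid in rows:
        children[id_].append(pid)

    paths = []

    def descend(node, path):
        next_nodes = children.get(node, [])
        # Stop when the node has no further levels or the depth cap is hit.
        if not next_nodes or len(path) == max_depth:
            if path:
                paths.append(tuple(path) + (None,) * (max_depth - len(path)))
            return
        for nxt in next_nodes:
            descend(nxt, path + [nxt])

    descend(start_id, [])
    return paths


result = walk_levels(rows, start_id=1)
for levels in result:
    print(levels)
```

The depth cap plays the role of the number of joins, and it also keeps the walk finite if the data happens to contain a cycle.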

Related

Select from a query with peewee

I have some troubles implementing the following query with peewee:
SELECT *
FROM (
SELECT datas.*, (rank() over(partition by tracking_id order by date_of_data DESC)) as rank_result
FROM datas
WHERE tracking_id in (1, 2, 3, 4, 5, 6)
)
WHERE rank_result < 3;
I have tried to do the following:
subquery = (Datas.select(Datas.tracking, Datas.value, Datas.date_of_data,
fn.rank().over(partition_by=[Datas.tracking],
order_by=[Datas.date_of_data.desc()]).alias('rank'))
.where(Datas.tracking.in_([1, 2, 3, 4, 5, 6])))
result = (Datas.select()
.from_(subquery)
.where(SQL('rank') < 3))
but since I'm calling "Model.select()", I get all of the model's fields in the outer SQL SELECT, which I don't want and which breaks the query.
Here is the schema of my table:
CREATE TABLE IF NOT EXISTS "datas"
(
"id" INTEGER NOT NULL PRIMARY KEY,
"tracking_id" INTEGER NOT NULL,
"value" INTEGER NOT NULL,
"date_of_data" DATETIME NOT NULL,
FOREIGN KEY ("tracking_id") REFERENCES "follower" ("id")
);
CREATE INDEX "datas_tracking_id" ON "datas" ("tracking_id");
Thanks!
You probably want to use the .select_from() method on the subquery:
subq = (Datas.select(Datas.tracking, Datas.value, Datas.date_of_data,
fn.rank().over(partition_by=[Datas.tracking],
order_by=[Datas.date_of_data.desc()]).alias('rank'))
.where(Datas.tracking.in_([1, 2, 3, 4, 5, 6])))
result = subq.select_from(
subq.c.tracking, subq.c.value, subq.c.date_of_data,
subq.c.rank).where(subq.c.rank < 3)
Produces:
SELECT "t1"."tracking", "t1"."value", "t1"."date_of_data", "t1"."rank"
FROM (
SELECT "t2"."tracking",
"t2"."value",
"t2"."date_of_data",
rank() OVER (
PARTITION BY "t2"."tracking"
ORDER BY "t2"."date_of_data" DESC) AS "rank"
FROM "datas" AS "t2"
WHERE ("t2"."tracking" IN (?, ?, ?, ?, ?, ?))) AS "t1"
WHERE ("t1"."rank" < ?)

Multiple insertion of one value in sqlalchemy statement to pandas

I have constructed a sql clause where I reference the same table as a and b to compare the two geometries as a postgis command.
I would like to pass a value into the SQL statement using the %s placeholder and read the result into a pandas DataFrame using read_sql and its params keyword argument. Currently my code allows one value to be passed to one %s, but I'm looking to insert the same list of values in several places.
I'm connecting to a postgresql database using psycopg2.
Simplified code is below
sql = """
SELECT
st_distance(a.the_geom, b.the_geom, true) AS dist
FROM
(SELECT
table.*
FROM table
WHERE id in %s) AS a,
(SELECT
table.*
FROM table
WHERE id in %s) AS b
WHERE a.nid <> b.nid """
sampList = (14070,11184)
df = pd.read_sql(sql, con=conn, params = [sampList])
Basically, I'm looking to substitute the sampList value for both %s placeholders. The code as written only fills the first one and then fails with a "list index out of range" error. If I keep a single %s and hard-code the numbers in the second IN clause, the code runs, but ultimately I would like a way to reuse the same values.
You don't need the subqueries, just join the table with itself:
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM ztable a
JOIN ztable b ON a.nid < b.nid
WHERE a.id IN (%s)
AND b.id IN (%s)
;
You can avoid the repetition by using a CTE (this may be non-optimal, performance-wise):
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
;
Performance-wise, I would just stick to the first version, and supply the list-argument twice. (or refer to it twice, using a FORMAT() construct)
First of all, I would recommend using the updated SQL from @wildplasser. It is a much better and more efficient way to do this.
Now you can do the following:
sql_ = """\
WITH zt AS (
SELECT * FROM ztable
WHERE id IN ({})
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
"""
sampList = (14070,11184)
sql = sql_.format(','.join(['%s' for x in sampList]))
df = pd.read_sql(sql, con=conn, params=sampList)
Dynamically generated SQL with parameters (AKA: prepared statements, bind variables, etc.); note that psycopg2 expects %s placeholders rather than ?:
In [27]: print(sql)
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s,%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
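The question states that the connection is psycopg2, which also accepts named placeholders of the form `%(name)s`. A named placeholder can appear any number of times in the statement while its value is supplied once, and psycopg2 adapts a Python tuple to a parenthesised SQL list, so `IN %(ids)s` should work without building the placeholder list by hand. The sketch below only builds the statement and its parameters; the `pd.read_sql` call is left commented out because it needs a live connection, `build_distance_query` is a hypothetical helper name, and `ztable` is the stand-in table name used in the answers above.

```python
def build_distance_query(ids):
    """Return (sql, params) with one named placeholder referenced twice."""
    if not ids:
        # An empty tuple would be rendered as "()", which is not valid SQL.
        raise ValueError("ids must not be empty")
    sql = """
    SELECT st_distance(a.the_geom, b.the_geom, true) AS dist
    FROM ztable a
    JOIN ztable b ON a.nid < b.nid
    WHERE a.id IN %(ids)s
      AND b.id IN %(ids)s
    """
    # psycopg2 adapts a tuple to a list such as (14070, 11184).
    params = {"ids": tuple(ids)}
    return sql, params


sql, params = build_distance_query([14070, 11184])
# Requires pandas and an open psycopg2 connection named `conn`:
# df = pd.read_sql(sql, con=conn, params=params)
```

With this shape the list of ids is written once in Python, however many times the statement refers to it.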

The same query returns different results in the shell and in code

Previously, I stumbled across one interesting thing in Oracle -
Oracle: ORA-01722: invalid number. It turned out to be Oracle's natural behaviour (though different from the other major databases I had dealt with before - MySQL, Postgres and SQLite). But now I see another counterintuitive thing - I have a very simple query which returns results in the shell, but returns nothing from Python code. This is the query:
SELECT * FROM TEST_TABLE T0
INNER JOIN TEST_TABLE_2 T1 ON T1.ATTR=T0.ID
INNER JOIN TEST_TABLE_3 T2 ON T2.ID = T1.ID
As you can see, it's a very simple query with just two simple joins. And here is voodoo magic screencast:
So, as you can see in the shell it returns data. Here is another picture of the ghost:
Now you see that in Python code it returns nothing. However, it does return if we tune this query a little bit - just remove the second join:
So, what is wrong with all that? And how can I trust Oracle? (Right now it seems to me that I can rely more on a file database like SQLite than on Oracle.)
EDIT
Below is schema with data:
SQL> SELECT COLUMN_NAME, DATA_TYPE FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TEST_TABLE';
COLUMN_NAME
------------------------------
DATA_TYPE
------------------------------
ID
NUMBER
SQL> SELECT * FROM TEST_TABLE;
ID
----------
1
SQL> SELECT COLUMN_NAME, DATA_TYPE FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TEST_TABLE_2';
COLUMN_NAME
------------------------------
DATA_TYPE
------------------------------
ID
NUMBER
TXT
VARCHAR2
ATTR
NUMBER
SQL> SELECT * FROM TEST_TABLE_2;
ID
----------
TXT
-----------------------------------
ATTR
----------
2
hello
1
SQL> SELECT COLUMN_NAME, DATA_TYPE FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TEST_TABLE_3';
COLUMN_NAME
------------------------------
DATA_TYPE
------------------------------
ID
NUMBER
SQL> SELECT * FROM TEST_TABLE_3;
ID
----------
2
EDIT
To be more precise, I created my three tables with these statements:
CREATE TABLE test_table(id number(19) default 0 not null)
CREATE TABLE test_table_2(txt varchar(255),id number(19) default 0 not null,attr number(19) default 0 not null)
CREATE TABLE test_table_3(id number(19) default 0 not null)

How to specify the FROM tables in SQLAlchemy subqueries?

I am trying to fetch, in a single query, a fixed set of rows plus some other rows found by a subquery. My problem is that the query generated by my SQLAlchemy code is incorrect.
It comes out as follows:
SELECT tbl.id AS tbl_id
FROM tbl
WHERE tbl.id IN
(
SELECT t2.id AS t2_id
FROM tbl AS t2, tbl AS t1
WHERE t2.id =
(
SELECT t3.id AS t3_id
FROM tbl AS t3, tbl AS t1
WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1 OFFSET 0
)
AND t1.id IN (4, 8)
)
OR tbl.id IN (0, 8)
while the correct query should not have the second tbl AS t1 (the goal of this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8).
Unfortunately, I can't find how to get SQLAlchemy to generate the correct one (see the code below).
Suggestions to also achieve the same result with a simpler query are also welcome (they need to be efficient though -- I tried a few variants and some were a lot slower on my real use case).
The code producing the query:
from sqlalchemy import create_engine, or_
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.orm import sessionmaker
engine = create_engine('sqlite:///:memory:', echo=True)
meta = MetaData(bind=engine)
table = Table('tbl', meta, Column('id', Integer))
session = sessionmaker(bind=engine)()
meta.create_all()
# Insert IDs 0, 2, 4, 6, 8.
i = table.insert()
i.execute(*[dict(id=i) for i in range(0, 10, 2)])
print session.query(table).all()
# output: [(0,), (2,), (4,), (6,), (8,)]
# Subquery of interest: look for the row just before IDs 4 and 8.
sub_query_txt = (
'SELECT t2.id '
'FROM tbl t1, tbl t2 '
'WHERE t2.id = ( '
' SELECT t3.id from tbl t3 '
' WHERE t3.id < t1.id '
' ORDER BY t3.id DESC '
' LIMIT 1) '
'AND t1.id IN (4, 8)')
print session.execute(sub_query_txt).fetchall()
# output: [(2,), (6,)]
# Full query of interest: get the rows mentioned above, as well as more rows.
query_txt = (
'SELECT * '
'FROM tbl '
'WHERE ( '
' id IN (%s) '
'OR id IN (0, 8))'
) % sub_query_txt
print session.execute(query_txt).fetchall()
# output: [(0,), (2,), (6,), (8,)]
# Attempt at an SQLAlchemy translation (from innermost sub-query to full query).
t1 = table.alias('t1')
t2 = table.alias('t2')
t3 = table.alias('t3')
q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
limit(1)
q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
q3 = session.query(table).filter(
or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
print list(q3)
# output: [(0,), (6,), (8,)]
What you are missing is a correlation between the innermost sub-query and the next level up; without the correlation, SQLAlchemy will include the t1 alias in the innermost sub-query:
>>> print str(q1)
SELECT t3.id AS t3_id
FROM tbl AS t3, tbl AS t1
WHERE t3.id < t1.id ORDER BY t3.id DESC
LIMIT ? OFFSET ?
>>> print str(q1.correlate(t1))
SELECT t3.id AS t3_id
FROM tbl AS t3
WHERE t3.id < t1.id ORDER BY t3.id DESC
LIMIT ? OFFSET ?
Note that tbl AS t1 is now missing from the query. From the .correlate() method documentation:
Return a Query construct which will correlate the given FROM clauses to that of an enclosing Query or select().
Thus, t1 is assumed to be part of the enclosing query, and isn't listed in the query itself.
Now your query works:
>>> q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
... limit(1).correlate(t1)
>>> q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
>>> q3 = session.query(table).filter(
... or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
>>> print list(q3)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine SELECT tbl.id AS tbl_id
FROM tbl
WHERE tbl.id IN (SELECT t2.id AS t2_id
FROM tbl AS t2, tbl AS t1
WHERE t2.id = (SELECT t3.id AS t3_id
FROM tbl AS t3
WHERE t3.id < t1.id ORDER BY t3.id DESC
LIMIT ? OFFSET ?) AND t1.id IN (?, ?)) OR tbl.id IN (?, ?)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine (1, 0, 4, 8, 0, 8)
[(0,), (2,), (6,), (8,)]
I'm only kinda sure I understand the query you're asking for. Let's break it down, though:
the goal of this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8.
It looks like you want to query for two kinds of things, and then combine them. The proper operator for that is union. Do the simple queries and add them up at the end. I'll start with the second bit, "ids just before X".
To start with, let's look at all the ids that are before some given value. For this, we'll join the table on itself with a <:
# select t1.id t1_id, t2.id t2_id from tbl t1 join tbl t2 on t1.id < t2.id;
t1_id | t2_id
-------+-------
0 | 2
0 | 4
0 | 6
0 | 8
2 | 4
2 | 6
2 | 8
4 | 6
4 | 8
6 | 8
(10 rows)
That certainly gives us all of the pairs of rows where the left is less than the right. Of those, for each t2_id we want the row whose t1_id is as high as possible, so we group by t2_id and select the maximum t1_id:
# select max(t1.id), t2.id from tbl t1 join tbl t2 on t1.id < t2.id group by t2.id;
max | id
-----+-------
0 | 2
2 | 4
4 | 6
6 | 8
(4 rows)
Your query, using a limit, could achieve this, but it's usually a good idea to avoid that technique when alternatives exist, because this kind of partitioning does not have good, portable support across database implementations. SQLite can use this technique, but PostgreSQL doesn't like it; it uses a technique called "analytic queries" (which are both standardised and more general). MySQL can do neither. The above query, though, works consistently across all SQL database engines.
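The group-by query above, combined with the exact-ID filter through a UNION, can be tried with Python's built-in sqlite3 module. This is a minimal sketch, assuming the same five rows (0, 2, 4, 6, 8) as in the question; the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)",
                 [(i,) for i in range(0, 10, 2)])

before_ids = (4, 8)   # we want the id just before each of these
exact_ids = (0, 8)    # we want these ids themselves

query = """
SELECT max(t1.id) AS id
FROM tbl AS t1
JOIN tbl AS t2 ON t1.id < t2.id
WHERE t2.id IN (?, ?)
GROUP BY t2.id
UNION
SELECT id FROM tbl WHERE id IN (?, ?)
ORDER BY 1
"""

# The four placeholders are filled in order: before_ids, then exact_ids.
ids = [row[0] for row in conn.execute(query, before_ids + exact_ids)]
print(ids)
conn.close()
```

UNION (as opposed to UNION ALL) removes duplicates, so an id that qualifies on both sides appears only once.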
The rest of the work is just using IN or other equivalent filters, which are not difficult to express in SQLAlchemy. The boilerplate...
>>> import sqlalchemy as sa
>>> from sqlalchemy.orm import Query
>>> engine = sa.create_engine('sqlite:///:memory:')
>>> meta = sa.MetaData(bind=engine)
>>> table = sa.Table('tbl', meta, sa.Column('id', sa.Integer))
>>> meta.create_all()
>>> table.insert().execute([{'id':i} for i in range(0, 10, 2)])
>>> t1 = table.alias()
>>> t2 = table.alias()
>>> before_filter = [4, 8]
The first interesting bit is that we give the 'max(id)' expression a name. This is needed so that we can refer to it more than once, and to lift it out of a subquery.
>>> c1 = sa.func.max(t1.c.id).label('max_id')
>>> # ^^^^^^
The 'heavy lifting' portion of the query: join the above aliases, group, and select the max.
>>> q1 = Query([c1, t2.c.id]) \
... .join((t2, t1.c.id < t2.c.id)) \
... .group_by(t2.c.id) \
... .filter(t2.c.id.in_(before_filter))
Because we'll be using a union, we need this to produce the right number of fields: we wrap it in a subquery and project down to the only column we're interested in. This will have the name we gave it in the above label() call.
>>> q2 = Query(q1.subquery().c.max_id)
>>> # ^^^^^^
The other half of the union is much simpler:
>>> t3 = table.alias()
>>> exact_filter = [0, 8]
>>> q3 = Query(t3).filter(t3.c.id.in_(exact_filter))
All that's left is to combine them:
>>> q4 = q2.union(q3)
>>> engine.execute(q4.statement).fetchall()
[(0,), (2,), (6,), (8,)]
The responses here helped me fix my issue but in my case I had to use both correlate() and subquery():
# ...
subquery = subquery.correlate(OuterCorrelationTable).subquery()
filter_query = db.session.query(func.sum(subquery.c.some_count_column))
filter = filter_query.as_scalar() == as_many_as_some_param
# ...
final_query = db.session.query(OuterCorrelationTable).filter(filter)

Python: How to Sort SQL statements in a text file?

The output below is from Oracle; where it generates "create table" statements using a supplied package. I feed them into the python diff tool HtmlDiff which uses difflib under the covers. Each table is followed by a number of "alter table add constraint" commands that add the various constraints.
The problem is that there is no explicit ordering of the "alter table" commands, and I have to re-order them after generating the output file, before I run the comparison.
I need help doing this; ideally using python, sed or awk.
The rules would be
After each line containing "CREATE TABLE" until the next line containing "CREATE TABLE"
sort each line containing " ADD CONSTRAINT "
This is a sample of my output:
CREATE TABLE T1
( BOOK_ID NUMBER NOT NULL ENABLE,
LOCATION_ID NUMBER NOT NULL ENABLE,
NAME VARCHAR2(255) NOT NULL ENABLE,
LEGAL_ENTITY VARCHAR2(255) NOT NULL ENABLE,
STATUS CHAR(1) NOT NULL ENABLE
) ;
ALTER TABLE T1 ADD CONSTRAINT T1_CHK_1 CHECK ( status IN ( 'A', 'I' , 'T' ) ) ENABLE;
ALTER TABLE T1 ADD CONSTRAINT T1_PK PRIMARY KEY (BOOK_ID) ENABLE;
ALTER TABLE T1 ADD CONSTRAINT T1_AK_1 UNIQUE (NAME) ENABLE;
ALTER TABLE T1 ADD CONSTRAINT T1_FK_1 FOREIGN KEY (LOCATION_ID) REFERENCES T6 (LOCATION_ID) ENABLE;
CREATE TABLE T2
(
BUCKET_ID NUMBER NOT NULL ENABLE,
DATA_LOAD_SESSION_ID NUMBER,
IS_LOCKED CHAR(1) NOT NULL ENABLE,
LOCK_DATE_TIME DATE
) ;
ALTER TABLE T2 ADD CONSTRAINT CKC_IS_LOCKED_BUCKET CHECK (IS_LOCKED in ('T','F')) ENABLE;
ALTER TABLE T2 ADD CONSTRAINT T2_PK PRIMARY KEY (BUCKET_ID) ENABLE;
ALTER TABLE T2 ADD CONSTRAINT T2_FK_2 FOREIGN KEY (RRDB_STAGING_TABLE_ID) REFERENCES T4 (RRDB_STAGING_TABLE_ID) ENABLE;
CREATE TABLE T3
( VALUE_DATE DATE,
NODE_ID NUMBER,
RESULT_UID VARCHAR2(255),
LATEST_EOD_SING_VAL_RESULT_ID NUMBER NOT NULL ENABLE
)
ALTER TABLE T3 ADD CONSTRAINT T3_PK PRIMARY KEY (VALUE_DATE, NODE_ID, RESULT_UID) ENABLE;
Desired output: the only difference is the ordering of the " ADD CONSTRAINT " lines
CREATE TABLE T1
( BOOK_ID NUMBER NOT NULL ENABLE,
LOCATION_ID NUMBER NOT NULL ENABLE,
NAME VARCHAR2(255) NOT NULL ENABLE,
LEGAL_ENTITY VARCHAR2(255) NOT NULL ENABLE,
STATUS CHAR(1) NOT NULL ENABLE
) ;
ALTER TABLE T1 ADD CONSTRAINT T1_AK_1 UNIQUE (NAME) ENABLE;
ALTER TABLE T1 ADD CONSTRAINT T1_CHK_1 CHECK ( status IN ( 'A', 'I' , 'T' ) ) ENABLE;
ALTER TABLE T1 ADD CONSTRAINT T1_FK_1 FOREIGN KEY (LOCATION_ID) REFERENCES T6 (LOCATION_ID) ENABLE;
ALTER TABLE T1 ADD CONSTRAINT T1_PK PRIMARY KEY (BOOK_ID) ENABLE;
CREATE TABLE T2
(
BUCKET_ID NUMBER NOT NULL ENABLE,
DATA_LOAD_SESSION_ID NUMBER,
IS_LOCKED CHAR(1) NOT NULL ENABLE,
LOCK_DATE_TIME DATE
) ;
ALTER TABLE T2 ADD CONSTRAINT CKC_IS_LOCKED_BUCKET CHECK (IS_LOCKED in ('T','F')) ENABLE;
ALTER TABLE T2 ADD CONSTRAINT T2_FK_2 FOREIGN KEY (RRDB_STAGING_TABLE_ID) REFERENCES T4 (RRDB_STAGING_TABLE_ID) ENABLE;
ALTER TABLE T2 ADD CONSTRAINT T2_PK PRIMARY KEY (BUCKET_ID) ENABLE;
CREATE TABLE T3
( VALUE_DATE DATE,
NODE_ID NUMBER,
RESULT_UID VARCHAR2(255),
LATEST_EOD_SING_VAL_RESULT_ID NUMBER NOT NULL ENABLE
)
ALTER TABLE T3 ADD CONSTRAINT T3_PK PRIMARY KEY (VALUE_DATE, NODE_ID, RESULT_UID) ENABLE;
Load it all into a string, then split on ";" to build a list of SQL commands. Loop over the list and build a new, sorted list: pass the CREATE TABLE statements straight through and collect the ALTER TABLE statements in a separate list. When the next CREATE TABLE arrives, sort() the ALTER list and extend the result with it. When you're done, ';\n'.join(result_array) + ';'.
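The split-on-";" approach described above could look roughly like this. It is a minimal sketch, assuming no ";" occurs inside a string literal or CHECK expression in the DDL; the function name `sort_constraints` is hypothetical.

```python
def sort_constraints(sql_text):
    """Sort each run of ALTER TABLE statements that follows a CREATE TABLE."""
    # Split into statements, dropping empty fragments left by trailing ";".
    statements = [s.strip() for s in sql_text.split(";") if s.strip()]

    result = []
    alters = []
    for stmt in statements:
        if stmt.upper().startswith("ALTER TABLE"):
            alters.append(stmt)
        else:
            # Any other statement ends the current run: flush it sorted.
            result.extend(sorted(alters))
            alters = []
            result.append(stmt)
    # Flush the run that follows the last CREATE TABLE.
    result.extend(sorted(alters))
    return ";\n".join(result) + ";"


sample = (
    "CREATE TABLE T1 (A NUMBER);\n"
    "ALTER TABLE T1 ADD CONSTRAINT T1_PK PRIMARY KEY (A) ENABLE;\n"
    "ALTER TABLE T1 ADD CONSTRAINT T1_AK_1 UNIQUE (A) ENABLE;\n"
    "CREATE TABLE T2 (B NUMBER);\n"
)
print(sort_constraints(sample))
```

A statement that lacks its terminating ";" (as the CREATE TABLE T3 in the sample output does) stays glued to the statement after it, which is harmless when only one constraint follows.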
Quick-and-dirty, but I think this will do it for your case (it worked on the example for sure):
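import sys

conList = []
# Assumption: the path of the generated DDL file is the first argument.
with open(sys.argv[1]) as f:
    for ln in f:
        ln = ln.rstrip('\n')
        if ' ADD CONSTRAINT ' in ln:
            # collect the run of constraint lines
            conList.append(ln)
        else:
            # any other line ends the run: print it sorted, then the line
            for it in sorted(conList):
                print(it)
            conList = []
            print(ln)
# finish any unfinished business
for it in sorted(conList):
    print(it)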
