Snowflake does not support the recursive WITH clause, and I need help achieving the query below, which works well in Teradata.
If anyone can also help me achieve this using Python, that would be great.
WITH RECURSIVE RECURTEMP(ID,KCODE,LVL)
AS(SELECT ID, MIN(KCODE) AS KCODE,1
FROM TABLE_A
GROUP BY 1
UNION ALL
SELECT b.ID, trim(a.KCODE)|| ';'||trim(b.KCODE), LVL+1
FROM TABLE_A a
INNER JOIN RECURTEMP b ON a.ID = b.ID AND a.KCODE > b.KCODE
)
SELECT * FROM RECURTEMP
![Result]: https://imgur.com/a/ppSRXeT
CREATE TABLE MYTABLE (
ID VARCHAR2(50),
KCODE VARCHAR2(50)
);
INSERT INTO MYTABLE VALUES ('ABCD','K10');
INSERT INTO MYTABLE VALUES ('ABCD','K53');
INSERT INTO MYTABLE VALUES ('ABCD','K55');
INSERT INTO MYTABLE VALUES ('ABCD','K56');
COMMIT;
Expected output:
ID    KCODE             LEVEL
--------------------------------------
ABCD  K10               1
ABCD  K53;K10           2
ABCD  K55;K10           2
ABCD  K56;K10           2
ABCD  K55;K53;K10       3
ABCD  K56;K53;K10       3
ABCD  K56;K55;K10       3
ABCD  K56;K55;K53;K10   4
Recursive WITH is now supported in Snowflake.
Your query, reformatted:
WITH RECURSIVE RECURTEMP(ID,KCODE,LVL) AS(
SELECT
ID,
MIN(KCODE) AS KCODE,
1
FROM
TABLE_A
GROUP BY
1
UNION ALL
SELECT
b.ID,
trim(a.KCODE) || ';' || trim(b.KCODE) AS KCODE,
LVL+1
FROM
TABLE_A a
INNER JOIN RECURTEMP b ON (a.ID = b.ID AND a.KCODE > b.KCODE)
)
SELECT * FROM RECURTEMP
The relevant documentation is linked below.
https://docs.snowflake.net/manuals/user-guide/queries-cte.html#overview-of-recursive-cte-syntax
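Since the question also asks for a Python route, here is a minimal sketch that emulates the recursion in plain Python. The function name `expand_chains` and the in-memory list of `(ID, KCODE)` rows are assumptions for illustration. It mirrors the query's comparison of each candidate code against the whole concatenated string, using Python's string ordering, so it does not reproduce any database-specific collation.

```python
from collections import defaultdict


def expand_chains(rows):
    """Emulate the recursive CTE for an iterable of (id, kcode) pairs.

    Returns a list of (id, concatenated_kcodes, level) tuples.
    """
    codes_by_id = defaultdict(list)
    for id_, kcode in rows:
        codes_by_id[id_].append(kcode.strip())

    # Anchor member: the smallest KCODE per ID, at level 1.
    frontier = [(id_, min(codes), 1) for id_, codes in codes_by_id.items()]
    result = []
    while frontier:
        result.extend(frontier)
        next_frontier = []
        for id_, chain, lvl in frontier:
            for kcode in codes_by_id[id_]:
                # Same test as the SQL join: the candidate code is compared
                # with the whole concatenated string built so far.
                if kcode > chain:
                    next_frontier.append((id_, kcode + ";" + chain, lvl + 1))
        frontier = next_frontier
    return result


# Sample rows taken from the INSERT statements in the question.
sample_rows = [("ABCD", "K10"), ("ABCD", "K53"), ("ABCD", "K55"), ("ABCD", "K56")]
for row in expand_chains(sample_rows):
    print(row)
```

Each pass of the `while` loop plays the role of one recursion step, and the loop stops once a pass produces no new rows.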
I'm trying to run the following query through pandasql, but the output is not what I expected. I expected a table with exactly 800 rows, because I select only the 800 employee_day_transmitter values in the CTE employee_days_transmitters, but I get more than 800 rows. What's wrong? How can I get exactly 800 rows, one for each employee_day_transmitter selected in employee_days_transmitters?
query_text = '''WITH employee_days_transmitters AS (
SELECT DISTINCT
employeeId
, theDate
, transmitterId
, employeeId || '-' || CAST(theDate AS STRING) || '-' || transmitterId AS employee_day_transmitter
FROM
table1
WHERE variable='rpv'
ORDER BY
RANDOM()
LIMIT
800
)
SELECT
*
FROM
table1
WHERE
(employeeId || '-' || CAST(theDate AS STRING) || '-' || transmitterId) IN (SELECT employee_day_transmitter FROM employee_days_transmitters) AND variable = 'rpv'
'''
table2=pandasql.sqldf(query_text,globals())
You are using DISTINCT in the CTE, so I suspect table1 has duplicate rows for the combination of the columns employeeId, theDate, transmitterId, and this is why you get more than 800 rows.
The CTE selects 800 distinct combinations, but the IN operator in your main query returns every row that matches one of them, and there are more than 800 such rows.
But why do you use the CTE?
You could apply the conditions directly in the main query:
SELECT DISTINCT employeeId, theDate, transmitterId
FROM table1
WHERE variable='rpv'
ORDER BY RANDOM()
LIMIT 800
Or maybe with ROW_NUMBER() window function:
WITH cte AS (
SELECT id
FROM (
SELECT rowid id,
ROW_NUMBER() OVER (PARTITION BY employeeId, theDate, transmitterId ORDER BY RANDOM()) rn
FROM table1
WHERE variable='rpv'
)
WHERE rn = 1
ORDER BY RANDOM()
LIMIT 800
)
SELECT *
FROM table1
WHERE rowid IN cte
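To illustrate the explanation above, here is a small sketch using Python's built-in sqlite3 module directly (pandasql runs on SQLite by default). The tiny table, the `value` column, and the sample size of 2 are made up for the demonstration: every key combination appears twice, so sampling 2 distinct keys matches 4 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (employeeId TEXT, theDate TEXT, transmitterId TEXT, "
    "variable TEXT, value REAL)"  # 'value' is a hypothetical payload column
)

# Three key combinations, each stored twice with a different reading.
rows = [
    (emp, "2020-01-01", "t1", "rpv", reading)
    for emp in ("e1", "e2", "e3")
    for reading in (1.0, 2.0)
]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)

query = """
WITH sampled AS (
    SELECT * FROM (
        SELECT DISTINCT employeeId, theDate, transmitterId
        FROM table1
        WHERE variable = 'rpv'
    )
    ORDER BY RANDOM()
    LIMIT 2
)
SELECT t.*
FROM table1 AS t
JOIN sampled AS s
  ON t.employeeId = s.employeeId
 AND t.theDate = s.theDate
 AND t.transmitterId = s.transmitterId
WHERE t.variable = 'rpv'
"""
matched = conn.execute(query).fetchall()
sampled_keys = {row[:3] for row in matched}

# Number of sampled keys versus number of rows those keys match.
print(len(sampled_keys), len(matched))
```

The row count only equals the sample size when each key combination occurs once in table1, which is what the ROW_NUMBER() variant enforces by keeping one row per combination.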
Say I have a sqlite table set up as such:
ColumnA | ColumnB
---------|----------
A | One
B | One
C | Two
D | Three
E | Two
F | Three
G | Three
Is there a query that would count the rows in Column A that share the same value in Column B? Or would it be better to use a script that pulls the rows (Python sqlite3)?
For instance,
query("One") = 2
query("Two") = 2
query("Three") = 3
Thank you
This can easily be achieved by sqlite3 itself.
$ sqlite3 mydb.db
SQLite version 3.11.0 2016-02-15 17:29:24
Enter ".help" for usage hints.
sqlite> .databases
seq name file
--- --------------- ----------------------------------------------------------
0 main /home/ziya/mydb.db
sqlite> create table two_column( col_a char(5), col_b varchar(20) );
sqlite> insert into two_column values('A', 'One');
sqlite> insert into two_column values('B', 'One');
sqlite> insert into two_column values('C', 'Two');
sqlite> insert into two_column values('D', 'Three');
sqlite> insert into two_column values('E', 'Two');
sqlite> insert into two_column values('F', 'Three');
sqlite> insert into two_column values('G', 'Three');
sqlite> select * from two_column;
A|One
B|One
C|Two
D|Three
E|Two
F|Three
G|Three
sqlite> select count(*) from two_column where col_b = 'One';
2
sqlite> select count(*) from two_column where col_b = 'Two';
2
sqlite> select count(*) from two_column where col_b = 'Three';
3
If you are ok with python,
>>> import sqlite3
>>> c = sqlite3.connect("mydb.db")
>>> cur = c.execute("SELECT COUNT(col_b) FROM two_column WHERE col_b = ?", ('A',))  # bind the value instead of formatting it into the SQL
>>> [r for r in cur]
[(0,)]
You can easily wrap the statements above in a function.
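One way such a function might look is sketched below. The name `count_matches` is an assumption, and an in-memory database stands in for mydb.db so the example is self-contained. The value is passed as a bound parameter rather than formatted into the SQL string.

```python
import sqlite3


def count_matches(conn, value):
    """Return how many rows of two_column have col_b equal to value."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM two_column WHERE col_b = ?", (value,)
    )
    return cur.fetchone()[0]


# In-memory stand-in for mydb.db, filled with the rows from the session above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE two_column (col_a CHAR(5), col_b VARCHAR(20))")
conn.executemany(
    "INSERT INTO two_column VALUES (?, ?)",
    [("A", "One"), ("B", "One"), ("C", "Two"), ("D", "Three"),
     ("E", "Two"), ("F", "Three"), ("G", "Three")],
)

print(count_matches(conn, "One"))
print(count_matches(conn, "Three"))
```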
If you want to get all counts at once, you can use grouping:
SELECT ColumnB, count(*)
FROM MyTable
GROUP BY ColumnB;
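From Python, the grouped query can be read straight into a dictionary. This is a minimal sketch: the function name `counts_by_column_b` is an assumption, and the in-memory table simply reuses the sample data from the question.

```python
import sqlite3


def counts_by_column_b(conn):
    """Map each ColumnB value to the number of rows that carry it."""
    cur = conn.execute(
        "SELECT ColumnB, COUNT(*) FROM MyTable GROUP BY ColumnB"
    )
    # Each result row is a (ColumnB, count) pair, so dict() can consume it.
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("A", "One"), ("B", "One"), ("C", "Two"), ("D", "Three"),
     ("E", "Two"), ("F", "Three"), ("G", "Three")],
)

counts = counts_by_column_b(conn)
print(counts["Three"])
```

A single grouped query avoids running one COUNT query per value.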
I have constructed a SQL statement that references the same table as a and b to compare the two geometries with a PostGIS function.
I would like to pass a value into the SQL statement using the %s placeholder and read the result into a pandas DataFrame using read_sql and its params keyword argument. Currently my code allows one value to be passed to one %s, but I'm looking to insert the same list of values in multiple places.
I'm connecting to a PostgreSQL database using psycopg2.
Simplified code is below:
sql = """
SELECT
st_distance(a.the_geom, b.the_geom, true) AS dist
FROM
(SELECT
table.*
FROM table
WHERE id in %s) AS a,
(SELECT
table.*
FROM table
WHERE id in %s) AS b
WHERE a.nid <> b.nid """
sampList = (14070,11184)
df = pd.read_sql(sql, con=conn, params = [sampList])
Basically, I'm looking to replace both %s placeholders with the sampList value. The code as written only fills the first placeholder and then fails with 'list index out of range'. If I keep one %s and hard-code the numbers in the second IN clause, the code runs, but ultimately I would like a way to reuse those values.
You don't need the subqueries; just join the table with itself:
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM ztable a
JOIN ztable b ON a.nid < b.nid
WHERE a.id IN (%s)
AND b.id IN (%s)
;
Or avoid the repetition by using a CTE (this may be non-optimal, performance-wise):
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
;
Performance-wise, I would just stick to the first version and supply the list argument twice (or refer to it twice, using a FORMAT() construct).
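One way to refer to the same list twice from Python is a named placeholder, sketched below under the assumption that the connection is a psycopg2 one as stated in the question. psycopg2 accepts `%(name)s` placeholders with a dict of parameters and adapts a Python tuple to a parenthesised SQL list, so the placeholder is written as `IN %(ids)s` without extra parentheses. The helper name `build_distance_query` and the selected column aliases are made up for illustration.

```python
def build_distance_query(ids):
    """Return (sql, params) suitable for pd.read_sql with a psycopg2 connection.

    Table and column names (ztable, id, nid, the_geom) follow the answer above.
    """
    sql = """
    SELECT a.nid AS nid_a, b.nid AS nid_b,
           st_distance(a.the_geom, b.the_geom, true) AS dist
    FROM ztable a
    JOIN ztable b ON a.nid < b.nid
    WHERE a.id IN %(ids)s
      AND b.id IN %(ids)s
    """
    # The same named parameter is referenced twice but supplied only once.
    # Note: an empty tuple would render as "IN ()", which is invalid SQL.
    return sql, {"ids": tuple(ids)}


sql, params = build_distance_query([14070, 11184])

# Assumed usage, with conn being an open psycopg2 connection:
# df = pd.read_sql(sql, con=conn, params=params)
```

With positional `%s` placeholders the equivalent is to pass the tuple once per placeholder, for example `params=[sampList, sampList]`.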
First of all, I would recommend using the updated SQL from @wildplasser; it's a much better and more efficient way to do this.
Now you can do the following:
sql_ = """\
WITH zt AS (
SELECT * FROM ztable
WHERE id IN ({})
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
"""
sampList = (14070,11184)
sql = sql_.format(','.join(['%s'] * len(sampList)))  # psycopg2 expects %s placeholders, not ?
df = pd.read_sql(sql, con=conn, params=list(sampList))
This dynamically generates SQL with one placeholder per value (AKA bind variables):
In [27]: print(sql)
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s,%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
Previously, I stumbled across one interesting thing in Oracle:
Oracle: ORA-01722: invalid number. It turned out to be Oracle's natural behaviour (though different from the other major databases I have dealt with before: MySQL, Postgres and SQLite). But now I see another counterintuitive thing: I have a very simple query that returns results in the shell but returns nothing from Python code. This is the query:
SELECT * FROM TEST_TABLE T0
INNER JOIN TEST_TABLE_2 T1 ON T1.ATTR=T0.ID
INNER JOIN TEST_TABLE_3 T2 ON T2.ID = T1.ID
As you can see, it's a very simple query with just two simple joins. And here is voodoo magic screencast:
So, as you can see in the shell it returns data. Here is another picture of the ghost:
Now you see that in Python code it returns nothing. However, it does return if we tune this query a little bit - just remove the second join:
So, what is wrong with all that? And how can I trust Oracle? (Now it seems to me that I can rely more on a file database like SQLite than on Oracle.)
EDIT
Below is schema with data:
SQL> SELECT COLUMN_NAME, DATA_TYPE FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TEST_TABLE';
COLUMN_NAME
------------------------------
DATA_TYPE
------------------------------
ID
NUMBER
SQL> SELECT * FROM TEST_TABLE;
ID
----------
1
SQL> SELECT COLUMN_NAME, DATA_TYPE FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TEST_TABLE_2';
COLUMN_NAME
------------------------------
DATA_TYPE
------------------------------
ID
NUMBER
TXT
VARCHAR2
ATTR
NUMBER
SQL> SELECT * FROM TEST_TABLE_2;
ID
----------
TXT
-----------------------------------
ATTR
----------
2
hello
1
SQL> SELECT COLUMN_NAME, DATA_TYPE FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TEST_TABLE_3';
COLUMN_NAME
------------------------------
DATA_TYPE
------------------------------
ID
NUMBER
SQL> SELECT * FROM TEST_TABLE_3;
ID
----------
2
EDIT
To be more precise, I created my three tables with these statements:
CREATE TABLE test_table(id number(19) default 0 not null)
CREATE TABLE test_table_2(txt varchar(255),id number(19) default 0 not null,attr number(19) default 0 not null)
CREATE TABLE test_table_3(id number(19) default 0 not null)
How can I do this?
I have the following tables:
Table1:
-id
-table2_id_1
-table2_id_2
Table2:
-id
-table3_id
Table3:
-id
-table4_id
-table5_id
-table6_id
Table4, Table5 and Table6:
-id
-name
-date
The main table is Table1:
db(db.Table1).select()
I need to join Table2 twice (once for each of the columns table2_id_1 and table2_id_2), join Table3 to each of those through the table3_id field, and then join Table4, Table5 and Table6.
I'm not sure I fully understood what you are trying to do, but if you just want to join the tables according to their ids, something like this should work:
SELECT *
FROM table1 a JOIN table2 b ON (a.table2_id_1 = b.id) JOIN
table2 c ON (a.table2_id_2 = c.id) JOIN
table3 d ON (b.table3_id = d.id) JOIN
table3 e ON (c.table3_id = e.id) JOIN
table4 f ON (d.table4_id = f.id) JOIN
table5 g ON (d.table5_id = g.id) JOIN
table6 h ON (d.table6_id = h.id) JOIN
table4 i ON (e.table4_id = i.id) JOIN
table5 j ON (e.table5_id = j.id) JOIN
table6 k ON (e.table6_id = k.id)
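The key idea in the query above is that the same table can be joined more than once as long as each occurrence gets its own alias. Here is a reduced sketch of that pattern using Python's sqlite3 module with only table1, table2 and table3. The `label` column and the sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table3 (id INTEGER PRIMARY KEY, label TEXT);  -- label is hypothetical
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table3_id INTEGER);
CREATE TABLE table1 (id INTEGER PRIMARY KEY, table2_id_1 INTEGER, table2_id_2 INTEGER);
INSERT INTO table3 VALUES (1, 'first'), (2, 'second');
INSERT INTO table2 VALUES (10, 1), (20, 2);
INSERT INTO table1 VALUES (100, 10, 20);
""")

# table2 and table3 each appear twice, once per foreign key in table1.
rows = conn.execute("""
SELECT a.id, d.label, e.label
FROM table1 a
JOIN table2 b ON a.table2_id_1 = b.id
JOIN table2 c ON a.table2_id_2 = c.id
JOIN table3 d ON b.table3_id = d.id
JOIN table3 e ON c.table3_id = e.id
""").fetchall()

print(rows)
```

Because these are inner joins, a Table1 row disappears from the result if any of the referenced ids is missing; a LEFT JOIN would keep such rows.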