troubles with 'WHERE...IN' clause - python

I'm trying to run the following query through pandasql, but the output is not what I expected. I expected a table with exactly 800 rows, since I select only the employee_day_transmitter values present in employee_days_transmitters, but I get a table with more than 800 rows. What's wrong? How can I get exactly 800 rows, one for each employee_day_transmitter selected in employee_days_transmitters?
query_text = '''WITH employee_days_transmitters AS (
SELECT DISTINCT
employeeId
, theDate
, transmitterId
, employeeId || '-' || CAST(theDate AS STRING) || '-' || transmitterId AS employee_day_transmitter
FROM
table1
WHERE variable='rpv'
ORDER BY
RANDOM()
LIMIT
800
)
SELECT
*
FROM
table1
WHERE
(employeeId || '-' || CAST(theDate AS STRING) || '-' || transmitterId) IN (SELECT employee_day_transmitter FROM employee_days_transmitters) AND variable = 'rpv'
'''
table2=pandasql.sqldf(query_text,globals())

You are using DISTINCT in the CTE, so I suspect table1 contains duplicate rows for the combination of the columns employeeId, theDate, transmitterId, and this is why you get more than 800 rows.
The CTE selects 800 distinct keys, but the IN operator in your main query returns every row that matches one of those keys, and there are more than 800 such rows.
But why do you use the CTE?
You could apply the conditions directly in the main query:
SELECT DISTINCT employeeId, theDate, transmitterId
FROM table1
WHERE variable='rpv'
ORDER BY RANDOM()
LIMIT 800
Or, to get whole rows with exactly one row per combination, use the ROW_NUMBER() window function:
WITH cte AS (
SELECT id
FROM (
SELECT rowid id,
ROW_NUMBER() OVER (PARTITION BY employeeId, theDate, transmitterId ORDER BY RANDOM()) rn
FROM table1
WHERE variable='rpv'
)
WHERE rn = 1
ORDER BY RANDOM()
LIMIT 800
)
SELECT *
FROM table1
WHERE rowid IN cte
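The difference between the two approaches can be checked on a tiny in-memory table. This is only a sketch using the standard-library sqlite3 module (pandasql runs queries on SQLite as well); the sample rows and the names picked, n_keys, in_rows and one_per_key are made up for illustration, and the window function assumes SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (employeeId INTEGER, theDate TEXT, "
    "transmitterId INTEGER, variable TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", 10, "rpv", 0.1),
        (1, "2020-01-01", 10, "rpv", 0.2),    # same key as the row above
        (2, "2020-01-01", 11, "rpv", 0.3),
        (2, "2020-01-02", 11, "rpv", 0.4),
        (3, "2020-01-03", 12, "rpv", 0.5),
        (3, "2020-01-03", 12, "other", 0.6),  # filtered out by variable
    ],
)

n_keys = 4  # the sample holds exactly 4 distinct 'rpv' keys

# Pattern from the question: pick distinct keys, then filter with IN.
in_rows = conn.execute(
    """
    WITH picked AS (
        SELECT DISTINCT employeeId || '-' || theDate || '-' || transmitterId AS k
        FROM table1
        WHERE variable = 'rpv'
        ORDER BY RANDOM()
        LIMIT ?
    )
    SELECT * FROM table1
    WHERE employeeId || '-' || theDate || '-' || transmitterId IN (SELECT k FROM picked)
      AND variable = 'rpv'
    """,
    (n_keys,),
).fetchall()

# Pattern from the answer: keep one random row per key, then limit.
one_per_key = conn.execute(
    """
    WITH cte AS (
        SELECT id FROM (
            SELECT rowid AS id,
                   ROW_NUMBER() OVER (
                       PARTITION BY employeeId, theDate, transmitterId
                       ORDER BY RANDOM()
                   ) AS rn
            FROM table1
            WHERE variable = 'rpv'
        )
        WHERE rn = 1
        ORDER BY RANDOM()
        LIMIT ?
    )
    SELECT * FROM table1 WHERE rowid IN (SELECT id FROM cte)
    """,
    (n_keys,),
).fetchall()

print(len(in_rows), len(one_per_key))  # 5 4
```

With four keys requested, the IN version returns five rows because one key occurs twice, while the ROW_NUMBER() version returns exactly four.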


Split a column into multiple columns dynamically in Python or SQL

I'm trying to split the Details column into multiple columns using T-SQL or Python.
The table looks like this:
ID   Details
---------------------------------------------------------------------
15   Hotel:Campsite;Message:Reservation inquiries
150  Page:45-discount-y;PageLink:https://xx.xx.net/SS/45-discount-y/|
13   NULL
There are a lot of keys (future columns) inside Details, so I want a dynamic way to split Details into multiple columns using Python or T-SQL.
The desired output:
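ID   Details                                                           Hotel     Message                Page           PageLink
--------------------------------------------------------------------------------------------------------------------------------------------------------------
15   Hotel:Campsite;Message:Reservation inquiries                      Campsite  Reservation inquiries  NULL           NULL
150  Page:45-discount-y;PageLink:https://xx.xx.net/SS/45-discount-y/|  NULL      NULL                   45-discount-y  https://xx.xx.net/SS/45-discount-y/|
13   NULL                                                              NULL      NULL                   NULL           NULL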
First, split Details on ';' with string_split.
Second, split each resulting part on ':' with string_split. replace is used to protect the ':' inside 'https://' in PageLink so that it is not treated as a separator.
Finally, use pivot to turn the keys into columns.
DECLARE @cols AS NVARCHAR(MAX), @scols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
set @query = '
;with cte as (
select Id,Details,valuesd,[1],replace( [2],''https//'',''https://'') as [2] from (
select * from (
select Id,Details,value as valuesd
from T
cross apply(
select *
from string_split(Details,'';'')
)d
)t
cross apply (
select RowN=Row_Number() over (Order by (SELECT NULL)), value
from string_split(replace( t.valuesd,''https://'',''https//''), '':'')
) d
) src
pivot (max(value) for src.RowN in([1],[2])) p
)
SELECT T.id,T.Details,Max([Hotel]) as [Hotel],Max([Message]) as Message,Max([Page]) as Page,Max([PageLink]) as PageLink from
(
select Id,Details, valuesd,[1],[2]
from cte
) x
pivot
(
max( [2]) for [1] in ([Hotel],[Message],[Page],[PageLink])
) p
right join T on p.id=T.id
group by T.id,T.Details
'
execute(@query)
You can create the sample data with the following code:
create table T(id int,Details nvarchar(max))
insert into T
select 15,'Hotel:Campsite;Message:Reservation inquiries' union all
select 150,'Page:45-discount-y;PageLink:https://xx.xx.net/SS/45-discount-y/|' union all
select 13, null
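Since the question also asks for a Python route, here is a minimal plain-Python sketch of the same idea: split on ';', then split each part on the first ':' only, so the ':' inside the URL survives without the replace trick. The names parse_details, rows, columns and table are hypothetical, and the three rows are the sample data above.

```python
def parse_details(details):
    """Turn 'k1:v1;k2:v2' into {'k1': 'v1', 'k2': 'v2'}; None gives {}."""
    if details is None:
        return {}
    pairs = {}
    for part in details.split(";"):
        # partition splits on the first ':' only, so 'https://...' stays whole
        key, sep, value = part.partition(":")
        if sep:
            pairs[key.strip()] = value
    return pairs


rows = [
    (15, "Hotel:Campsite;Message:Reservation inquiries"),
    (150, "Page:45-discount-y;PageLink:https://xx.xx.net/SS/45-discount-y/|"),
    (13, None),
]

parsed = [parse_details(details) for _, details in rows]

# Collect the column names dynamically, in order of first appearance.
columns = []
for pairs in parsed:
    for key in pairs:
        if key not in columns:
            columns.append(key)

# One dict per input row; keys missing from a row become None (NULL).
table = [
    {"ID": row_id, "Details": details, **{c: pairs.get(c) for c in columns}}
    for (row_id, details), pairs in zip(rows, parsed)
]

for record in table:
    print(record)
```

If pandas is available, pd.DataFrame(table) should turn this list of dicts into a frame with the same columns.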

Alternative to WITH RECURSIVE CLAUSE

Snowflake does not support the recursive WITH clause, so I need help rewriting the query below, which works well in Teradata.
If anyone can also help me achieve this with Python, that would be great.
WITH RECURSIVE RECURTEMP(ID,KCODE,LVL)
AS(SELECT ID, MIN(KCODE) AS KCODE,1
FROM TABLE_A
GROUP BY 1
UNION ALL
SELECT b.ID, trim(a.KCODE)|| ';'||trim(b.KCODE), LVL+1
FROM TABLE_A a
INNER JOIN RECURTEMP b ON a.ID = b.ID AND a.KCODE > b.KCODE
)
SELECT * FROM RECURTEMP
Result (image): https://imgur.com/a/ppSRXeT
CREATE TABLE MYTABLE (
ID VARCHAR2(50),
KCODE VARCHAR2(50)
);
INSERT INTO MYTABLE VALUES ('ABCD','K10');
INSERT INTO MYTABLE VALUES ('ABCD','K53');
INSERT INTO MYTABLE VALUES ('ABCD','K55');
INSERT INTO MYTABLE VALUES ('ABCD','K56');
COMMIT;
Expected output:
ID    KCODE            LEVEL
----------------------------
ABCD  K10              1
ABCD  K53;K10          2
ABCD  K55;K10          2
ABCD  K56;K10          2
ABCD  K55;K53;K10      3
ABCD  K56;K53;K10      3
ABCD  K56;K55;K10      3
ABCD  K56;K55;K53;K10  4
Recursive WITH is now supported in Snowflake.
Your query, lightly reformatted:
WITH RECURSIVE RECURTEMP(ID,KCODE,LVL) AS(
SELECT
ID,
MIN(KCODE) AS KCODE,
1
FROM
TABLE_A
GROUP BY
1
UNION ALL
SELECT
b.ID,
trim(a.KCODE) || ';' || trim(b.KCODE) AS KCODE,
LVL+1
FROM
TABLE_A a
INNER JOIN RECURTEMP b ON (a.ID = b.ID AND a.KCODE > b.KCODE)
)
SELECT * FROM RECURTEMP
The relevant documentation is linked below.
https://docs.snowflake.net/manuals/user-guide/queries-cte.html#overview-of-recursive-cte-syntax
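For the Python part of the question, the recursion can be emulated with a loop that keeps extending the rows produced in the previous level. This is a rough sketch, not a drop-in replacement: the function name expand is hypothetical, the input is assumed to be a list of (ID, KCODE) tuples, and Python's plain string comparison stands in for the database's a.KCODE > b.KCODE, which may differ under other collations.

```python
from collections import defaultdict


def expand(rows):
    """Emulate the recursive CTE: anchor on MIN(KCODE) per ID, then extend."""
    codes_by_id = defaultdict(list)
    for id_, kcode in rows:
        codes_by_id[id_].append(kcode.strip())

    result = []
    # Anchor member: one row per ID with its smallest KCODE at level 1.
    frontier = [(id_, min(codes), 1) for id_, codes in codes_by_id.items()]
    while frontier:
        result.extend(frontier)
        next_frontier = []
        for id_, path, level in frontier:
            for kcode in codes_by_id[id_]:
                # Same join condition as the SQL: a.ID = b.ID AND a.KCODE > b.KCODE
                if kcode > path:
                    next_frontier.append((id_, kcode + ";" + path, level + 1))
        frontier = next_frontier
    return result


sample = [("ABCD", "K10"), ("ABCD", "K53"), ("ABCD", "K55"), ("ABCD", "K56")]
result = expand(sample)
for row in result:
    print(row)
```

On the four sample rows this yields the same eight rows as the expected output, level by level.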

How to prevent nested queries in sqlalchemy from selecting a table again?

I wrote a query for MySQL that achieves what I want. It's structured a bit like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c where table_a.y = table_c.y and table_b.z = table_c.z
)
)
I translated the query to SQLAlchemy and the result is structured like this:
session.query(table_a).filter(
session.query(table_b).filter(table_a.x == table_b.x).filter(
session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.z == table_c.z).exists()
).exists()
)
Which generates a query like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c, table_a where table_a.y = table_c.y and table_b.z = table_c.z
)
)
Note the re-selection of table_a in the innermost query - which breaks the intended functionality.
How can I stop sqlalchemy from selecting the table again in a nested query?
Tell the innermost query to correlate all except table_c:
session.query(table_a).filter(
session.query(table_b).filter(table_a.x == table_b.x).filter(
session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.z == table_c.z)
.exists().correlate_except(table_c)
).exists()
)
In contrast to "auto-correlation", which only considers FROM elements from the enclosing Select, explicit correlation will consider FROM elements from any nesting level as candidates.
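Why the extra table_a in the innermost FROM matters can be seen by running both SQL shapes on a small data set. The sketch below uses the standard-library sqlite3 module instead of MySQL and SQLAlchemy, and the table contents are invented for illustration; it only compares the intended query with the one that re-selects table_a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (x INTEGER, y INTEGER);
    CREATE TABLE table_b (x INTEGER, z TEXT);
    CREATE TABLE table_c (y INTEGER, z TEXT);
    INSERT INTO table_a VALUES (1, 100), (2, 200);
    INSERT INTO table_b VALUES (1, 'p'), (2, 'p');
    INSERT INTO table_c VALUES (100, 'p');
    """
)

# Intended query: the innermost subquery is correlated to the outer table_a.
correlated = conn.execute(
    """
    SELECT * FROM table_a WHERE EXISTS (
        SELECT * FROM table_b WHERE table_a.x = table_b.x AND EXISTS (
            SELECT * FROM table_c
            WHERE table_a.y = table_c.y AND table_b.z = table_c.z
        )
    )
    ORDER BY x
    """
).fetchall()

# Generated query: table_a appears again in the innermost FROM, so
# table_a.y there refers to that new copy and matches any row of table_a.
reselected = conn.execute(
    """
    SELECT * FROM table_a WHERE EXISTS (
        SELECT * FROM table_b WHERE table_a.x = table_b.x AND EXISTS (
            SELECT * FROM table_c, table_a
            WHERE table_a.y = table_c.y AND table_b.z = table_c.z
        )
    )
    ORDER BY x
    """
).fetchall()

print(correlated)  # [(1, 100)]
print(reselected)  # [(1, 100), (2, 200)]
```

The row (2, 200) has no matching table_c row of its own, yet the re-selected version lets it through because the inner copy of table_a supplies y = 100.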

Multiple insertion of one value in sqlalchemy statement to pandas

I have constructed a SQL statement in which I reference the same table as a and b in order to compare the two geometries with a PostGIS function.
I would like to pass a value into the SQL statement using the %s placeholder and read the result into a pandas DataFrame using read_sql with the params keyword argument. Currently my code allows one value to be passed to one %s, but I'm looking to insert the same list of values in several places.
I'm connecting to a PostgreSQL database using psycopg2.
Simplified code is below
sql = """
SELECT
st_distance(a.the_geom, b.the_geom, true) AS dist
FROM
(SELECT
table.*
FROM table
WHERE id in %s) AS a,
(SELECT
table.*
FROM table
WHERE id in %s) AS b
WHERE a.nid <> b.nid """
sampList = (14070,11184)
df = pd.read_sql(sql, con=conn, params = [sampList])
Basically I'm looking to replace both %s placeholders with the sampList value. The code as written only fills the first placeholder and then fails with 'list index out of range'. If I keep a single %s and hard-code the numbers in the second IN clause, the code runs, but ultimately I would like a way to reuse the same values.
You don't need the subqueries; just join the table with itself:
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM ztable a
JOIN ztable b ON a.nid < b.nid
WHERE a.id IN (%s)
AND b.id IN (%s)
;
Or avoid the repetition by using a CTE (this may be non-optimal, performance-wise):
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
;
Performance-wise, I would just stick to the first version and supply the list argument twice (or refer to it twice, using a FORMAT() construct).
First of all, I would recommend using the updated SQL from @wildplasser; it is a much better and more efficient way to do this.
Now you can do the following:
sql_ = """\
WITH zt AS (
SELECT * FROM ztable
WHERE id IN ({})
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
"""
sampList = (14070,11184)
# psycopg2 expects %s placeholders (sqlite3 and some other drivers use ? instead)
sql = sql_.format(','.join(['%s' for x in sampList]))
df = pd.read_sql(sql, con=conn, params=sampList)
This gives dynamically generated SQL with parameters (AKA: prepared statements, bind variables, etc.):
In [27]: print(sql)
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s,%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
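The "supply the list argument twice" variant can be sketched end to end with the standard-library sqlite3 module. This is an illustration under stated assumptions, not the asker's setup: sqlite3 stands in for psycopg2 (so the placeholder is ? rather than %s), the pos column and ABS(a.pos - b.pos) are hypothetical stand-ins for the_geom and st_distance, and the three rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ztable (id INTEGER, nid INTEGER, pos REAL)")
conn.executemany(
    "INSERT INTO ztable VALUES (?, ?, ?)",
    [(14070, 1, 0.0), (11184, 2, 3.5), (99999, 3, 10.0)],
)

samp_list = (14070, 11184)

# Only placeholder marks are interpolated into the SQL text;
# the values themselves are still passed as bound parameters.
placeholders = ",".join("?" for _ in samp_list)

sql = f"""
SELECT a.id, b.id, ABS(a.pos - b.pos) AS dist
FROM ztable a
JOIN ztable b ON a.nid < b.nid
WHERE a.id IN ({placeholders})
  AND b.id IN ({placeholders})
"""

# The list appears in two IN clauses, so the parameters are repeated.
rows = conn.execute(sql, samp_list * 2).fetchall()
print(rows)  # [(14070, 11184, 3.5)]
```

The same statement and parameter tuple could be handed to pd.read_sql(sql, con=conn, params=samp_list * 2) if pandas is in use.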

Nested Query in From and Having clauses

Schema:
CREATE TABLE companies (
company_name varchar(200),
market varchar(200),
funding_total integer,
status varchar(20),
country varchar(10),
state varchar(10),
city varchar(30),
funding_rounds integer,
founded_at date,
first_funding_at date,
last_funding_at date,
PRIMARY KEY (company_name,market,city)
);
Query:
What is/are the state(s) that has/have the largest number(s) of startups in the "Security" market (i.e. market column contains the word "Security"), listing all ties?
Code:
db.executescript("""
DROP VIEW IF EXISTS q3;
select companies.state, count(*)as total
from companies
where companies.market like '%Security%'
group by companies.state
having count(*) =
(
select max(countGroup) as maxNumber
from (select C.state, count(*) as countGroup
from companies as C
where C.market like '%Security%'
group by C.state)
);
""")
EDIT:
There is still an error because the output/result is empty. Any ideas why?
Try this. (Please adapt the syntax to your RDBMS.)
select state, total from
( select companies.state, count(*)as total
from companies
where companies.market like '%Security%'
group by companies.state
) as countgroups
where total =
(
select max(countGroup) as maxNumber
from (select C.state, count(*) as countGroup
from companies as C
where C.market like '%Security%'
group by C.state)
);
Alternatively, if a single top state is enough (LIMIT 1 does not list ties):
select state, total from
(select companies.state, count(*)as total
from companies
where companies.market like '%Security%'
group by companies.state
) order by 2 desc
limit 1; --please adapt syntax of your RDBMS
Surround the subquery with parentheses.
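The tie-listing query can be tried on a small in-memory table. The sketch below assumes db in the question is a sqlite3 connection; the table is trimmed to three columns and the rows are invented. One thing worth checking in that case: executescript() is meant for running scripts and does not hand back result rows, so the sketch uses execute() followed by fetchall().

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE companies (company_name TEXT, market TEXT, state TEXT)"
)
db.executemany(
    "INSERT INTO companies VALUES (?, ?, ?)",
    [
        ("A", "Cyber Security", "CA"),
        ("B", "Security", "CA"),
        ("C", "Network Security", "NY"),
        ("D", "Security", "NY"),
        ("E", "Security", "TX"),
        ("F", "Games", "TX"),
        ("G", "Games", "TX"),
        ("H", "Games", "TX"),
    ],
)

# Count Security startups per state and keep every state whose
# count equals the maximum, so ties are all listed.
top_states = db.execute(
    """
    SELECT state, COUNT(*) AS total
    FROM companies
    WHERE market LIKE '%Security%'
    GROUP BY state
    HAVING COUNT(*) = (
        SELECT MAX(countGroup)
        FROM (SELECT COUNT(*) AS countGroup
              FROM companies
              WHERE market LIKE '%Security%'
              GROUP BY state)
    )
    ORDER BY state
    """
).fetchall()

print(top_states)  # [('CA', 2), ('NY', 2)]
```

TX has the most companies overall in this sample but only one in a Security market, so it is correctly left out.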
