AssertionError: issue running an SQL query in Python

Question #1
List all the directors who directed a 'Comedy' movie in a leap year. (You need to check that the genre is 'Comedy' and that the year is a leap year.) Your query should return the director name, the movie name, and the year.
%%time
def grader_1(q1):
    q1_results = pd.read_sql_query(q1, conn)
    print(q1_results.head(10))
    assert q1_results.shape == (232, 3)

# m = Movie, d = M_director, p = Person, mg = M_Genre, g = Genre
query1 = """SELECT m.Title, p.Name, m.year
FROM Movie m
JOIN M_director d ON m.MID = d.MID
JOIN Person p ON d.PID = p.PID
JOIN M_Genre mg ON m.MID = mg.MID
JOIN Genre g ON g.GID = mg.GID
WHERE g.Name LIKE '%Comedy%'
  AND (m.year % 4 = 0
       AND m.year % 100 <> 0
       OR m.year % 400 = 0)
LIMIT 2"""
grader_1(query1)
ERROR:
title Name year
0 Mastizaade Milap Zaveri 2016
1 Harold & Kumar Go to White Castle Danny Leiner 2004
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-a942fcc98f72> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'def grader_1(q1):\n q1_results = pd.read_sql_query(q1,conn)\n print(q1_results.head(10))\n assert (q1_results.shape == (232,3))\n\n#m as movie , m_director as md,Genre as g,Person as p\nquery1 ="""SELECT m.Title,p.Name,m.year\nFROM Movie m JOIN \n M_director d\n ON m.MID = d.MID JOIN \n Person p\n ON d.PID = p.PID JOIN\n M_Genre mg\n ON m.MID = mg.MID JOIN\n Genre g \n ON g.GID = mg.GID\n WHERE g.Name LIKE \'%Comedy%\'\nAND ( m.year%4 = 0\nAND m.year % 100 <> 0\nOR m.year % 400 = 0 ) LIMIT 2"""\ngrader_1(query1)')
<decorator-gen-53> in time(self, line, cell, local_ns)
/usr/local/lib/python3.7/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1191 else:
1192 st = clock2()
-> 1193 exec(code, glob, local_ns)
1194 end = clock2()
1195 out = None
<timed exec> in <module>()
<timed exec> in grader_1(q1)
AssertionError:
I can run this SQL query on the IMDB dataset without the grader_1 function. However, when I run it inside grader_1, I get an AssertionError.
How can I fix this?

Your query has a LIMIT 2 clause, which stops the SQL engine from fetching all the data. The result has only 2 rows, but grader_1 asserts a shape of (232, 3).
Run it again without this clause.

query1 = """SELECT M.title, Pe.Name, M.year
FROM Movie M
JOIN M_Director MD ON M.MID = MD.MID
JOIN M_Genre MG ON M.MID = MG.MID
JOIN Genre Ge ON MG.GID = Ge.GID
JOIN Person Pe ON MD.PID = Pe.PID
WHERE Ge.Name LIKE '%Comedy%'
  AND CAST(SUBSTR(TRIM(M.year), -4) AS INTEGER) % 4 = 0
  AND (CAST(SUBSTR(TRIM(M.year), -4) AS INTEGER) % 100 <> 0
       OR CAST(SUBSTR(TRIM(M.year), -4) AS INTEGER) % 400 = 0)"""
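Running this query should resolve the problem. The expression CAST(SUBSTR(TRIM(M.year), -4) AS INTEGER) takes the last four characters of the year column and converts them to an integer. This helps if some year values are stored as text with extra characters.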
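The sketch below shows why the LIMIT clause makes the grader fail. It builds a tiny in-memory SQLite database whose table and column names follow the question; the rows themselves are invented for illustration. It also uses plain sqlite3 instead of pandas.read_sql_query, so it has no third-party dependencies. The run helper is hypothetical.

```python
import sqlite3

# Hypothetical miniature version of the IMDB schema used in the question.
SCHEMA = """
CREATE TABLE Movie (MID TEXT, title TEXT, year TEXT);
CREATE TABLE Person (PID TEXT, Name TEXT);
CREATE TABLE M_Director (MID TEXT, PID TEXT);
CREATE TABLE Genre (GID INTEGER, Name TEXT);
CREATE TABLE M_Genre (MID TEXT, GID INTEGER);
INSERT INTO Movie VALUES
    ('m1', 'Leap Comedy', '2016'),
    ('m2', 'Century Comedy', '1900'),      -- divisible by 100 but not by 400
    ('m3', 'Millennium Comedy', 'I 2000'), -- junk prefix, divisible by 400
    ('m4', 'Leap Drama', '2004');          -- leap year, but not a comedy
INSERT INTO Person VALUES ('p1', 'Director A'), ('p2', 'Director B');
INSERT INTO M_Director VALUES ('m1', 'p1'), ('m2', 'p1'), ('m3', 'p2'), ('m4', 'p2');
INSERT INTO Genre VALUES (1, 'Comedy'), (2, 'Drama');
INSERT INTO M_Genre VALUES ('m1', 1), ('m2', 1), ('m3', 1), ('m4', 2);
"""

# Same filter as the answer; the last four characters of year are cast to INTEGER.
QUERY = """
SELECT M.title, Pe.Name, M.year
FROM Movie M
JOIN M_Director MD ON M.MID = MD.MID
JOIN M_Genre MG ON M.MID = MG.MID
JOIN Genre Ge ON MG.GID = Ge.GID
JOIN Person Pe ON MD.PID = Pe.PID
WHERE Ge.Name LIKE '%Comedy%'
  AND CAST(SUBSTR(TRIM(M.year), -4) AS INTEGER) % 4 = 0
  AND (CAST(SUBSTR(TRIM(M.year), -4) AS INTEGER) % 100 <> 0
       OR CAST(SUBSTR(TRIM(M.year), -4) AS INTEGER) % 400 = 0)
ORDER BY M.title
"""

def run(conn, limit=None):
    """Run the query, optionally appending a LIMIT clause like the question's."""
    sql = QUERY if limit is None else f"{QUERY} LIMIT {int(limit)}"
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

all_rows = run(conn)
print(all_rows)
# A grader that checks the row count fails as soon as LIMIT truncates the result.
print(len(all_rows), len(run(conn, limit=1)))
```

The same reasoning applies to the real dataset: with LIMIT 2, the DataFrame has 2 rows, so any shape check expecting 232 rows must fail.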

Related

Passing list of strings as parameter to SQL query in Python

I have a Python program that generates a report for returns. When it runs, a GUI pops up that lets the user select categories from a list. I am trying to format my query so that the generated report only includes the selected categories. The formatting for the start and end dates works, so I'm not sure what I'm doing wrong with the category formatting.
The code for my GUI (multchoicebox and msgbox come from easygui):
from easygui import multchoicebox, msgbox
options = ['Bags Children','Bags Mens','Bandanas & Handkerchiefs','Belts Children','Belts Mens','Belts Womens','Cold Weather Childrens','Cold Weather Mens','Cold Weather Womens','Face Masks','Handbags Womens','Headwear Childrens',\
'Headwear Mens','Headwear Womens','Jewelry Mens','Scarves & Wraps Womens','Sleepwear Childrens','Sleepwear Mens','Sleepwear Womens','Slippers Childrens','Slippers Mens','Slippers Womens','Socks & Hosiery Childrens',\
'Socks & Hosiery Mens','Socks & Hosiery Womens','Sunglasses & Cases','Suspenders Childrens','Suspenders Mens','Suspenders Womens','Travel Accessories','Umbrellas & Rain Gear','Umbrellas & Rain Gear Childrens','Umbrellas & Rain Gear Mens',\
'Umbrellas & Rain Gear Womens','Undergarments Childrens','Undergarments Mens','Undergarments Womens','Waist Packs & Belt Bags','Wallets & Small Accessories Childrens','Wallets & Small Accessories Mens','Wallets & Small Accessories Womens',\
'Womens Wallets & Handbag Accessories']
text = "Select Category(ies): "
title = 'Returns Summary'
cat_output = multchoicebox(text, title, options)
title = 'Message Box'
message = "Selected Categories: " + str(cat_output)
msg = msgbox(message, title)
print(cat_output)
The output for print(cat_output) is in the format ['Face Masks', 'Belts Mens', 'Belts Womens'] displaying the selected categories.
The code for my SQL query
SQL1 = "SELECT i.Category, i.ItemName, r.OrderNumber, r.SKUReceived, r.UnitPrice, r.Quantity, o.CartID, o.MarketName, o.Email \
FROM Returns AS r INNER JOIN Orders AS o ON r.OrderNumber = o.OrderNumber INNER JOIN Inventory AS i ON i.LocalSKU = r.SKU \
WHERE (((r.Date) between '%s' and '%s') AND ((r.UnitPrice)>0) AND o.CartID != 12 AND r.Type = 'R' AND i.Category IN ({})) \
ORDER BY r.OrderNumber;".format(cat_output) % (start, end)
The Error I get
Traceback
<module> Z:\Python\Returns Summary 3.0.py 381
GetReturns Z:\Python\Returns Summary 3.0.py 279
ProgrammingError: ('42S22', "[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name ''Bags Children', 'Belts Womens', 'Headwear Mens', 'Sleepwear Childrens''. (207) (SQLExecDirectW)")
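SQL expects an IN clause of the form … IN ('m', 'l'). Because str.format inserts the list's repr, your query ends up as … IN (['m', 'l']). SQL Server uses square brackets to delimit identifiers, so it reads everything between [ and ] as a single column name. That is why you get "Invalid column name".
Try .format(",".join(repr(x) for x in cat_output)) instead.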
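Building the IN list with repr works for simple names, but it can break in two ways. First, repr switches to double quotes for a string that contains an apostrophe, and SQL Server would typically read that as an identifier. Second, the query stays open to SQL injection.

A safer pattern is to generate one placeholder per selected category and pass the values separately. The sketch below assumes a reduced, hypothetical Returns/Inventory schema and sample rows, and uses sqlite3 so it runs anywhere. pyodbc also uses ? placeholders, so the same SQL text and parameter list should carry over to cursor.execute(sql, params) there, but treat that as untested. build_returns_query is a hypothetical helper.

```python
import sqlite3

def build_returns_query(categories):
    """Return SQL with one '?' placeholder per selected category."""
    if not categories:
        raise ValueError("select at least one category")  # IN () is invalid SQL
    placeholders = ", ".join("?" for _ in categories)
    return (
        "SELECT i.Category, r.OrderNumber "
        "FROM Returns AS r "
        "INNER JOIN Inventory AS i ON i.LocalSKU = r.SKU "
        "WHERE r.Date BETWEEN ? AND ? "
        f"AND i.Category IN ({placeholders}) "
        "ORDER BY r.OrderNumber"
    )

# Hypothetical sample data standing in for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (LocalSKU TEXT, Category TEXT);
CREATE TABLE Returns (OrderNumber INTEGER, SKU TEXT, Date TEXT);
INSERT INTO Inventory VALUES ('S1', 'Face Masks'), ('S2', 'Belts Mens'),
                             ('S3', 'Socks & Hosiery Mens');
INSERT INTO Returns VALUES (101, 'S1', '2021-03-01'), (102, 'S2', '2021-03-05'),
                           (103, 'S3', '2021-03-07'), (104, 'S1', '2021-05-01');
""")

cat_output = ['Face Masks', 'Belts Mens']  # what multchoicebox might return
start, end = '2021-03-01', '2021-03-31'
sql = build_returns_query(cat_output)
# Parameter order matches the placeholders: the two dates first, then the categories.
rows = conn.execute(sql, [start, end, *cat_output]).fetchall()
print(rows)
```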

It seems like I can't really get the output of my code to align properly using the format function. How am I supposed to do that?

I have code that calculates the mean, standard deviation, mode, median, and interquartile range from an imported .txt data file.
I've tried many things to sort and align the output so it looks reasonably neat, and I figured Python's format function would be the solution:
fmt = '{0:>18}: {1:>6.2f}'
for catlist in database:
    huidigeCat = categories[counter]
    fmt = '{0:>18}: {1:>6.2f}'
    # numeric
    if isAlleenGetallen(catlist):
        # continuous data (floats)
        if heeftFloat(catlist):
            floatslijst = maakFloats(catlist)
            gemiddelde = getGemiddelde(floatslijst)
            standaarddeviatie = getStandaarddeviatie(floatslijst,
                                                     gemiddelde)
            print(huidigeCat, fmt.format("gemiddelde", gemiddelde))
            print(huidigeCat, fmt.format("standaarddeviatie",
                                         standaarddeviatie))
        # discrete data (integers)
        else:
            gesorteerdeLijst = sorted(maakIntegers(catlist))
            mediaan = getMediaan(gesorteerdeLijst)
            kwartielAfstand = getKwartielAfstand(gesorteerdeLijst)
            print(huidigeCat, fmt.format("mediaan", mediaan))
            print(huidigeCat, fmt.format("kwartielafstand",
                                         kwartielAfstand))
    # categorical
    else:
        # compute the mode
        #print( huidigeCat, "is Categoriaal")
        modus = getModus(catlist)
        print(huidigeCat, "Modus:", modus[0], "Aantal:", modus[1])
    counter += 1
And this is the output of my current code:
Category Modus: Music/Movie/Game Aantal: 209
currency Modus: US Aantal: 663
sellerRating mediaan: 1853.00
sellerRating kwartielafstand: 2762.00
Duration mediaan: 7.00
Duration kwartielafstand: 2.00
endDay Modus: Mon Aantal: 292
ClosePrice gemiddelde: 38.85
ClosePrice standaarddeviatie: 100.10
OpenPrice gemiddelde: 14.21
OpenPrice standaarddeviatie: 49.38
Competitive? Modus: Yes Aantal: 569
How do I make this output somewhat readable?
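The code above has two sources of misalignment. First, huidigeCat is printed as a separate, unpadded argument, so each line shifts by the length of the category name. Second, the Modus lines do not use fmt at all.

One possible approach is to collect (category, statistic, value) rows first and pad every column to the width of its longest entry. The row values below are copied from the output above; the Aantal counts are left out for brevity. format_rows is a hypothetical helper, not part of the original code.

```python
def format_rows(rows):
    """Pad (category, statistic, value) rows into aligned columns."""
    # Format floats with two decimals; keep everything else as plain text.
    cells = [
        (cat, stat, f"{value:.2f}" if isinstance(value, float) else str(value))
        for cat, stat, value in rows
    ]
    cat_w = max(len(c) for c, _, _ in cells)
    stat_w = max(len(s) for _, s, _ in cells)
    val_w = max(len(v) for _, _, v in cells)
    # Left-align the text columns, right-align the values.
    return [f"{c:<{cat_w}}  {s:<{stat_w}}: {v:>{val_w}}" for c, s, v in cells]

rows = [
    ("Category", "Modus", "Music/Movie/Game"),
    ("sellerRating", "mediaan", 1853.0),
    ("sellerRating", "kwartielafstand", 2762.0),
    ("ClosePrice", "gemiddelde", 38.85),
    ("ClosePrice", "standaarddeviatie", 100.10),
]
lines = format_rows(rows)
for line in lines:
    print(line)
```

To use this in the original loop, replace each print(...) call with something like rows.append((huidigeCat, "gemiddelde", gemiddelde)). Then print everything in one pass at the end, once all column widths are known.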

Debugging: SQL inside Python Psycopg2

sql = "WITH users AS(SELECT * FROM stats.core_users cu LEFT JOIN XXXX.sent_hidden_users h USING(user_id)\
WHERE cu.status = 'hidden' AND h.user_id is null AND cu.country_code = 86 LIMIT 100)\
SELECT\
cu.user_id,\
CASE WHEN cu.gender = 'male' THEN 0 ELSE 1 END AS gender,\
CASE WHEN cu.looking_for_gender = cu.gender THEN 2 WHEN cu.looking_for_gender = 'both' THEN 1 ELSE 0 END AS sexual_orientation,\
CASE WHEN e2.os_name = 'iOS' THEN 0 ELSE 1 END AS device,\
ROUND((DATE(NOW()) - cu.birthdate)/365.25) AS user_age,\
SUM(dsb.likes) AS likes,\
SUM(dsb.dislikes) AS dislikes,\
SUM(dsb.blocks) AS blocks,\
SUM(dsb.matches) AS matches,\
SUM(dsb.received_likes) AS received_likes,\
SUM(dsb.received_dislikes) AS received_dislikes,\
SUM(dsb.received_blocks) AS received_blocks,\
cu.search_radius,\
cu.search_min_age,\
cu.search_max_age,\
'' AS recall_case,\
'' AS recall_retention\
FROM \
users cu\
LEFT JOIN \
yay.daily_swipes_by_users dsb ON (dsb.user_id = cu.user_id) \
LEFT JOIN LATERAL (\
SELECT \
cd.os_name \
FROM \
stats.core_devices cd \
WHERE \
cu.user_id = cd.user_id \
ORDER BY cd.updated_time DESC LIMIT 1) e2 ON TRUE \
GROUP BY 1,2,3,4,5,13,14,15,16,17\
;"
Error Information:
File "", line 5
sql = "WITH users AS(SELECT * FROM stats.core_users cu LEFT JOIN zhangqiao.sent_hidden_users h USING(user_id) WHERE cu.status = 'hidden' AND h.user_id is null AND cu.country_code = 86 LIMIT 100)SELECT cu.user_id, CASE WHEN cu.gender = 'male' THEN 0 ELSE 1 END AS gender, CASE WHEN cu.looking_for_gender = cu.gender THEN 2 WHEN cu.looking_for_gender = 'both' THEN 1 ELSE 0 END AS sexual_orientation, CASE WHEN e2.os_name = 'iOS' THEN 0 ELSE 1 END AS device, ROUND((DATE(NOW()) - cu.birthdate)/365.25) AS user_age, SUM(dsb.likes) AS likes, SUM(dsb.dislikes) AS dislikes, SUM(dsb.blocks) AS blocks, SUM(dsb.matches) AS matches,\
^
SyntaxError: EOL while scanning string literal
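There seems to be a space after the \ in SUM(dsb.matches) AS matches,\. Get rid of that. A backslash only continues a string literal onto the next line when the newline comes immediately after it. With a space in between, the line ends inside an unterminated string, which is exactly the "EOL while scanning string literal" error.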
Your second error is because you need a space before the \ in this line:
'' AS recall_retention\
Because when you write:
'' AS recall_retention\
FROM \
users cu\
You get as a result:
'' AS recall_retentionFROM users cu
Hopefully the error there is obvious. Rather than mucking around with all these escapes, maybe you should just simplify your code by using multiline quotations (either ''' or """), like this:
sql = """WITH users AS(SELECT * FROM stats.core_users cu LEFT JOIN XXXX.sent_hidden_users h USING(user_id)
WHERE cu.status = 'hidden' AND h.user_id is null AND cu.country_code = 86 LIMIT 100)
SELECT
cu.user_id,
CASE WHEN cu.gender = 'male' THEN 0 ELSE 1 END AS gender,
CASE WHEN cu.looking_for_gender = cu.gender THEN 2 WHEN cu.looking_for_gender = 'both' THEN 1 ELSE 0 END AS sexual_orientation,
CASE WHEN e2.os_name = 'iOS' THEN 0 ELSE 1 END AS device,
ROUND((DATE(NOW()) - cu.birthdate)/365.25) AS user_age,
SUM(dsb.likes) AS likes,
SUM(dsb.dislikes) AS dislikes,
SUM(dsb.blocks) AS blocks,
SUM(dsb.matches) AS matches,
SUM(dsb.received_likes) AS received_likes,
SUM(dsb.received_dislikes) AS received_dislikes,
SUM(dsb.received_blocks) AS received_blocks,
cu.search_radius,
cu.search_min_age,
cu.search_max_age,
'' AS recall_case,
'' AS recall_retention
FROM
users cu
LEFT JOIN
yay.daily_swipes_by_users dsb ON (dsb.user_id = cu.user_id)
LEFT JOIN LATERAL (
SELECT
cd.os_name
FROM
stats.core_devices cd
WHERE
cu.user_id = cd.user_id
ORDER BY cd.updated_time DESC LIMIT 1) e2 ON TRUE
GROUP BY 1,2,3,4,5,13,14,15,16,17
;"""
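Both failure modes described above can be reproduced in plain Python, without a database:

- A backslash at the end of a line inside a string literal removes the newline, which glues tokens together.
- A space after the backslash produces a SyntaxError, surfaced here through compile().
- A triple-quoted string keeps the newline, which SQL treats as whitespace.

The SQL fragments are taken from the question.

```python
# A backslash-newline inside an ordinary string literal is removed entirely,
# so the last word of one line runs straight into the first word of the next.
glued = "'' AS recall_retention\
FROM users cu"
print(glued)

# With a space after the backslash, the line no longer continues and the
# string literal is left unterminated; compile() raises the SyntaxError.
broken_source = 'sql = "SUM(dsb.matches) AS matches,\\ \nFROM users cu"'
try:
    compile(broken_source, "<demo>", "exec")
    raised = False
except SyntaxError:
    raised = True
print("SyntaxError raised:", raised)

# A triple-quoted string keeps the newline, so the tokens stay separated.
kept = """'' AS recall_retention
FROM users cu"""
print(kept.split())
```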

How to use variables and OR conjunctions in SQL statement in Python?

I have lists of IDs in a pandas DataFrame named res, and I want to use each row's list as the WHERE condition of a SQL query before saving the results in an array:
                              ids
grupos
0       [160, 161, 365, 386, 471]
1                      [296, 306]
Here is how I tried to insert them into a SQL query:
listado = [None]*len(res)
# We store the hashtags that describes the best the groups
# We iterate on the people of a group to construct the WHERE condition
print "res : ", res
for i in (0,len(res)):
conn = psycopg2.connect(**params)
cur = conn.cursor()
listado = [None]*len(res)
for i in (0,len(res)):
print "res[i:p] : ", res.iloc[i]['ids']
cur.execute("""SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id FROM subscriber_hashtag
-- join so that the ads/eclipses a user likes are linked to those in the hashtag mapping table
INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
-- join so that the users are linked to those in the hashtag mapping table
LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
-- retrieve the "likes"
WHERE subscriber_hastag.subscriber_id in (%s)
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;""",(res.iloc[i]['ids']))
n = cur.fetchall()
listado[i] = [{"count": elem[0], "eclipse_id": elem[1]} for elem in n]
Data for a reproducible example
Here is some additional sample data:
subscriber_id hashtag_id
160 345
160 347
161 345
160 334
161 347
306 325
296 362
306 324
296 326
161 322
160 322
Here, the output should look like this:
{0:[324,1],[325,1],[326,1],[362,1], 1 : [345,2],[347,2],[334,1]}
Current error message
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 50))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-f7c3c5b81303> in <module>()
39 WHERE subscriber_hastag.subscriber_id in (%s)
40 GROUP BY subscriber_hashtag.hashtag_id
---> 41 ORDER BY COUNT(swipe.eclipse_id) DESC;""",(res.iloc[i]['ids']))
42
43 n = cur.fetchall()
TypeError: not all arguments converted during string formatting
Have a look at psycopg2's tuples adaptation:
Python tuples are converted into a syntax suitable for the SQL IN operator and to represent a composite type:
Pass ids as a tuple query argument, so your argument to execute is a 1-tuple of tuple of ids, and drop the manual parentheses around %s. At the moment your (res.iloc[i]['ids']) is nothing but a sequence expression in redundant parentheses, so execute() uses it as the argument sequence, which causes your TypeError exception; your argument sequence has more arguments than the query has placeholders.
Try (tuple(res.iloc[i]['ids']),) instead. Note the trailing comma; omitting it is a very common mistake. All in all:
cur.execute("""SELECT COUNT(swipe.eclipse_id),
subscriber_hashtag.hashtag_id
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE subscriber_hashtag.subscriber_id in %s
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;""",
(tuple(res.iloc[i]['ids']),))
Your for-loop is a bit strange, since you iterate over a 2-tuple (0, len(res)). Perhaps you meant range(len(res)). You could also just iterate over the Pandas Series:
for i, ids in enumerate(res['ids']):
...
cur.execute(..., (tuple(ids),))
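Two pitfalls from this answer can be checked without a database: the missing trailing comma, and iterating over (0, len(res)). The sketch below uses plain lists as a stand-in for the res['ids'] column; this is an assumption, since res is a DataFrame in the question. It only inspects the argument shapes that would be passed to cur.execute().

```python
groups = [[160, 161, 365, 386, 471], [296, 306]]  # stand-in for res['ids']

# Parentheses alone do not make a tuple: this is a 5-tuple of ids, i.e. five
# parameters for a query that has only one %s placeholder (the TypeError).
wrong_args = (tuple(groups[0]))

# The trailing comma makes a 1-tuple whose single parameter is the id tuple,
# which psycopg2 adapts to (160, 161, 365, 386, 471) for the IN operator.
right_args = (tuple(groups[0]),)

# Looping over the 2-tuple visits indices 0 and 2; index 2 does not exist here.
tuple_indices = list((0, len(groups)))
range_indices = list(range(len(groups)))

# Idiomatic loop: enumerate yields each position together with its ids.
for i, ids in enumerate(groups):
    args = (tuple(ids),)
    print(i, len(args), args[0])
```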

Insert Python List into a single column in mySQL Database

Hi, I am trying to insert a Python list into a single column, but it keeps giving me a syntax error.
New to this. Appreciate any help. Thanks.
from time import time
import MySQLdb
import urllib
import re
from bs4 import BeautifulSoup

db = MySQLdb.connect("localhost", "testuser", "test123", "testdb")
cursor = db.cursor()
x = 1
while x < 2:
    url = "http://search.insing.com/ts/food-drink/bars-pubs/bars-pubs?page=" + str(x)
    htmlfile = urllib.urlopen(url)
    soup = BeautifulSoup(htmlfile)
    reshtml = [h3.a for h3 in soup.find("div", "results").find_all("h3")]
    reslist = []
    for item in reshtml:
        res = item.text.encode('ascii', 'ignore')
        reslist.append(' '.join(res.split()))
    sql = "INSERT INTO insing(name) VALUES %r" % reslist
    try:
        cursor.execute(sql)
        db.commit()
    except:
        db.rollback()
    db.close()
    x += 1
The generated SQL string is:
'INSERT INTO insing(name) VALUES [\'AdstraGold Microbrewery & Bistro Bar\', \'Alkaff Mansion Ristorante\', \'Parco Caffe\', \'The Fat Cat Bistro\', \'Gravity Bar\', \'The Wine Company (Evans Road)\', \'Serenity Spanish Bar & Restaurant (VivoCity)\', \'The New Harbour Cafe & Bar\', \'Indian Times\', \'Sunset Bay Beach Bar\', \'Friends # Jelita\', \'Talk Cock Sing Song # Thomson\', \'En Japanese Dining Bar (UE Square)\', \'Magma German Wine Bistro\', "Tam Kah Shark\'s Fin", \'Senso Ristorante & Bar\', \'Hard Rock Cafe (HPL House)\', \'St. James Power Station\', \'The St. James\', \'Brotzeit German Bier Bar & Restaurant (Vivocity)\']'
What about inserting all the names as separate rows in one statement:
insert into table(name) values ('name1'), ('name2'), ... , ('name36');
The question "Inserting multiple rows in a single SQL query?" might help too.
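Another option is to let the driver do the quoting through a parameterized executemany() call instead of building the VALUES string by hand. This also copes with names such as "Tam Kah Shark's Fin" from the output above, which would break hand-built quoting.

The sketch uses sqlite3 so it runs without a MySQL server. With MySQLdb the call should look the same except that the placeholder is %s instead of ?. The insing table definition here is an assumption.

```python
import sqlite3

# A few names from the scraped output above, including one with an apostrophe.
reslist = [
    "AdstraGold Microbrewery & Bistro Bar",
    "Alkaff Mansion Ristorante",
    "Tam Kah Shark's Fin",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE insing (name TEXT)")  # hypothetical table definition

# One row per list element: each parameter set is a 1-tuple.
# With MySQLdb the placeholder would be %s instead of ?.
conn.executemany("INSERT INTO insing(name) VALUES (?)", [(name,) for name in reslist])
conn.commit()

stored = [row[0] for row in conn.execute("SELECT name FROM insing ORDER BY rowid")]
print(stored)
```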
EDIT
I automated the process as well:
dataSQL = "INSERT INTO PropertyRow (SWID, Address, APN, PropertyType, PermissableUse, UseDetail, ReviewResult, Analysis, DocReviewed, AqDate, ValuePurchase, ValueCurrent, ValueDate, ValueBasis, ValueSale, SaleDate, PropPurpose, LotSize, Zoning, ParcelValue, EstRevenue, ReqRevenue, EnvHistory, TransitPotential, PlanObjective, PrevHistory, LastUpdDate, LastUpdUser)"
fields = "VALUES ("+"'"+str(rawID)+"', "
if(cell.ctype != 0):
while column < 27:
#column 16 will always be blank
if (column == 16):
column += 1
#column 26 is the end
if (column == 26):
fields += "'"+str(sh.cell_value(rowx=currentRow, colx=column)) + "'"
else:
#append to the value string
fields += "'"+str(sh.cell_value(rowx=currentRow, colx=column)) + "', "
#print fields
column+=1
fields += ');'
writeFyle.write(dataSQL)
writeFyle.write(fields)
In this implementation I write a separate INSERT statement for each row I want to insert. That wasn't necessary, but it was much easier.
