I have a list of string that I need to pass to an sql query.
listofinput = []
for i in input:
listofinput.append(i)
if(len(listofinput)>1):
listofinput = format(tuple(listofinput))
sql_query = f"""SELECT * FROM countries
where
name in {listofinput};
"""
This works when I have a list, but in case of just one value it fails.
as listofinput = ['USA'] for one value
but listofinput ('USA', 'Germany') for multiple
also I need to do this for thousands of input, what is the best optimized way to achieve the same. name in my table countries is an indexed column
You can just convert to tuple and then if the second last character is a coma, remove it.
listofinput = format(tuple(input))
if listofinput[-2] == ",":
listofinput = f"{listofinput[:-2]})"
sql_query = f"""SELECT * FROM countries
where name in {listofinput};"""
Change if(len(listofinput)>1): to if(len(listofinput)>=1):
This might work.
Remove condition if(len(listofinput)>1) .
Because if you don't convert to tuple your query should be like this:
... where name in ['USA']
or
... where name in []
and in [...] not acceptable in SQL, only in (...) is acceptable.
You can remove format() too:
listofinput = tuple(listofinput)
Final Code:
listofinput = []
for i in input:
listofinput.append(i)
listofinput = tuple(listofinput)
sql_query = f"""SELECT * FROM countries
WHERE
name IN {listofinput};
"""
Yes the tuple with one element will required a ","
To circumvent your problem, maybe you can use string instead by just changing your code to the below:
listofinput = []
for i in input:
listofinput.append(i)
if(len(listofinput)>1):
listofinput = format(tuple(listofinput))
else:
listofinput='('+listofinput[0]+')'
Related
I need to search a string and check if it contains numbers in its name. If it does, I want to replace it with nothing. I've started doing something like this but I didn't find a solution for my problem.
table = "table1"
if any(chr.isdigit() for chr in table) == True:
table = table.replace(chr, "_")
print(table)
# The output should be "table"
Any ideas?
You could do this in many different ways. Here's how it could be done with the re module:
import re
table = 'table1'
table = re.sub('\d+', '', table)
This sound like task for .translate method of str, you could do
table = "table1"
table = table.translate("".maketrans("","","0123456789"))
print(table) # table
2 first arguments of maketrans are for replacement character-for-character, as it we do not need this we use empty strs, third (optional) argument is characters to remove.
If you dont want to import any modules you could try:
table = "".join([i for i in table if not i.isdigit()])
table = "table123"
for i in table:
if i.isdigit():
table = table.replace(i, "")
print(table)
I found this works to remove numbers quickly.
table = "table1"
table_temp =""
for i in table:
if i not in "0123456789":
table_temp +=i
print(table_temp)
char_nums = [chr for chr in table if chr.isdigit()]
for i in char_nums:
table = table.replace(i, "")
print(table)
Sorry if this is basic. I have a function that get as an argument one of these 2 :
subjects ['ALL']. //search for any subject
subjects ['A','B','C'] //only one of these
So in my function I need to query according to subjects argument
def function (subjects):
query = ('''
SELECT date_num, subject, in_col
FROM base
WHERE subject in {subjects} // = subject in ('A','B','C') works, but what about ALL ?
''').format(subjects=subjects)
so when the subjects to be found are a,b,c there is no problem, but how can I tell it to search for ALL subjects in the case argument is ALL ?
Could I send * instead ? like subjects[*] ?
(I do turn the ['a','b'] into ('a','b') in my code )
subjects = ['ALL']
# subjects = ['A', 'B']
where_clause = f"WHERE subject in {tuple(subjects)}" if subjects[0] != "ALL" else ""
query = f'''SELECT date_num, subject, in_col FROM base {where_clause}'''
Try uncommenting the second line to check how the query modifies
Create a generic query and add the where clause if needed
query = "SELECT date_num, subject, in_col FROM base"
if subjects[0] != "ALL":
query += f" WHERE subject in {subjects}"
i got a list of ids and i want to use this list as parameter like this:
list_id = ['4833', '43443', '431431']
qry = f"""SELECT
c.nm_cnae as Nome_CNAE
FROM cnae as c
WHERE c.cod_cnae = '{list_id}'"""
resultado_busca = cosmosengine.query('cnae', qry)
resultado_busca = list(resultado_busca)
how should i do this works?
I'm using azure cosmosdb
Assuming you want to find all the ones that have an id in the list it would be:
list_id = ['4833', '43443', '431431']
qry = f"""SELECT
c.nm_cnae as Nome_CNAE
FROM cnae as c
WHERE c.cod_cnae IN ({','.join(list_id)})"""
resultado_busca = cosmosengine.query('cnae', qry)
resultado_busca = list(resultado_busca)
This uses the IN operator in sql: https://www.w3schools.com/sql/sql_in.asp
','.join(list_id) creates a string where each value is separated by a comma.
I'm looking to take an array list and attach it to a string.
Python 2.7.10, Windows 10
The list is loaded from a mySQL table and the output is this:
skuArray = [('000381001238',) ('000381001238',) ('000381001238',) ('FA200513652',) ('000614400967',)]
I'm wanting to take this list and attach it to a separate query
the problem:
query = "SELECT ItemLookupCode,Description, Quantity, Price, LastReceived "
query = query+"FROM Item "
query = query+"WHERE ItemLookupCode IN ("+skuArray+") "
query = query+"ORDER BY LastReceived ASC;"
I get the error:
TypeError: cannot concatenate 'str' and 'tuple' objects
My guess here is that I need to format the string as:
'000381001238', '000381001238', '000381001238', 'FA200513652','000614400967'
Ultimately the string needs to read:
query = query+"WHERE ItemLookupCode IN ('000381001238', '000381001238', '000381001238', 'FA200513652','000614400967') "
I have tried the following:
skuArray = ''.join(skuArray.split('(', 1))
skuArray = ''.join(skuArray.split(')', 1))
Second Try:
skus = [sku[0] for sku in skuArray]
stubs = ','.join(["'?'"]*len(skuArray))
msconn = pymssql.connect(host=r'*', user=r'*', password=r'*', database=r'*')
cur = msconn.cursor()
query ='''
SELECT ItemLookupCode,Description, Quantity, Price, LastReceived
FROM Item
WHERE ItemLookupCode IN { sku_params }
ORDER BY LastReceived ASC;'''.format(sku_params = stubs)
cur.execute(query, params=skus)
row = cur.fetchone()
print row[3]
cur.close()
msconn.close()
Thanks in advance for your help!
If you want to do the straight inline SQL you could use a list comprehension:
', '.join(["'{}'}.format(sku[0]) for sku in skuArray])
Note: You need to add commas between tuples (based on example)
That said, if you want to do some sql, I would encourage you to parameterize your request with ?
Here is an example of how you would do something like that:
skuArray = [('000381001238',), ('000381001238',), ('000381001238',), ('FA200513652',), ('000614400967',)]
skus = [sku[0] for sku in skuArray]
stubs = ','.join(["'?'"]*len(skuArray))
qry = '''
SELECT ItemLookupCode,Description, Quantity, Price, LastReceived
FROM Item
WHERE ItemLookupCode IN ({ sku_params })
ORDER BY LastReceived ASC;'''.format(sku_params = stubs)
#assuming pyodbc connection syntax may be off
conn.execute(qry, params=skus)
Why?
Non-parameterized queries are a bad idea because it leaves you vulnerable to sql injection and is easy to avoid.
Assuming that skuArray is a list, like this:
>>> skuArray = [('000381001238',), ('000381001238',), ('000381001238',), ('FA200513652',), ('000614400967',)]
You can format your string like this:
>>> ', '.join(["'{}'".format(x[0]) for x in skuArray])
"'000381001238', '000381001238', '000381001238', 'FA200513652', '000614400967'"
So let assume we have such simple query:
Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;
The result should looks this way:
tb1 col1
tb1 col7
tb2 col2
tb2 col8
I've tried to solve this problem using some python library:
1) Even extracting only tables using sqlparse might be a huge problem. For example this official book doesn't work properly at all.
2) Using regular expression seems to be really hard to achieve.
3) But then I found this , that might help. However the problem is that I can't connect to any database and execute that query.
Any ideas?
sql-metadata is a Python library that uses a tokenized query returned by python-sqlparse and generates query metadata.
This metadata can return column and table names from your supplied SQL query. Here are a couple of example from the sql-metadata github readme:
>>> sql_metadata.get_query_columns("SELECT test, id FROM foo, bar")
[u'test', u'id']
>>> sql_metadata.get_query_tables("SELECT test, id FROM foo, bar")
[u'foo', u'bar']
>>> sql_metadata.get_query_limit_and_offset('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000')
(50, 1000)
A hosted version of the library exists at sql-app.infocruncher.com to see if it works for you.
Really, this is no easy task. You could use a lexer (ply in this example) and define several rules to get several tokens out of a string. The following code defines these rules for the different parts of your SQL string and puts them back together as there could be aliases in the input string. As a result, you get a dictionary (result) with the different tablenames as key.
import ply.lex as lex, re
tokens = (
"TABLE",
"JOIN",
"COLUMN",
"TRASH"
)
tables = {"tables": {}, "alias": {}}
columns = []
t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"
def t_TABLE(t):
r"from\s(\w+)\sas\s(\w+)"
regex = re.compile(t_TABLE.__doc__)
m = regex.search(t.value)
if m is not None:
tbl = m.group(1)
alias = m.group(2)
tables["tables"][tbl] = ""
tables["alias"][alias] = tbl
return t
def t_JOIN(t):
r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
regex = re.compile(t_JOIN.__doc__)
m = regex.search(t.value)
if m is not None:
tbl = m.group(1)
alias = m.group(2)
tables["tables"][tbl] = ""
tables["alias"][alias] = tbl
return t
def t_COLUMN(t):
r"(\w+\.\w+)"
regex = re.compile(t_COLUMN.__doc__)
m = regex.search(t.value)
if m is not None:
t.value = m.group(1)
columns.append(t.value)
return t
def t_error(t):
raise TypeError("Unknown text '%s'" % (t.value,))
t.lexer.skip(len(t.value))
# here is where the magic starts
def mylex(inp):
lexer = lex.lex()
lexer.input(inp)
for token in lexer:
pass
result = {}
for col in columns:
tbl, c = col.split('.')
if tbl in tables["alias"].keys():
key = tables["alias"][tbl]
else:
key = tbl
if key in result:
result[key].append(c)
else:
result[key] = list()
result[key].append(c)
print result
# {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}
string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
moz-sql-parser is a python library to convert some subset of SQL-92 queries to JSON-izable parse trees. Maybe it what you want.
Here is an example.
>>> parse("SELECT id,name FROM dual WHERE id>3 and id<10 ORDER BY name")
{'select': [{'value': 'id'}, {'value': 'name'}], 'from': 'dual', 'where': {'and': [{'gt': ['id', 3]}, {'lt': ['id', 10]}]}, 'orderby': {'value': 'name'}}
I am tackling a similar problem and found a simpler solution and it seems to work well.
import re
def tables_in_query(sql_str):
# remove the /* */ comments
q = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)
# remove whole line -- and # comments
lines = [line for line in q.splitlines() if not re.match("^\s*(--|#)", line)]
# remove trailing -- and # comments
q = " ".join([re.split("--|#", line)[0] for line in lines])
# split on blanks, parens and semicolons
tokens = re.split(r"[\s)(;]+", q)
# scan the tokens. if we see a FROM or JOIN, we set the get_next
# flag, and grab the next one (unless it's SELECT).
tables = set()
get_next = False
for tok in tokens:
if get_next:
if tok.lower() not in ["", "select"]:
tables.add(tok)
get_next = False
get_next = tok.lower() in ["from", "join"]
dictTables = dict()
for table in tables:
fields = []
for token in tokens:
if token.startswith(table):
if token != table:
fields.append(token)
if len(list(set(fields))) >= 1:
dictTables[table] = list(set(fields))
return dictTables
code adapted from https://grisha.org/blog/2016/11/14/table-names-from-sql/
Create a list of all the tables that are present in the DB. You can then search each table name in the queries.
This obviously isn't foolproof and the code will break in case any column/alias name matches the table name.
But it can be done as a workaround.
import pandas as pd
#%config PPMagics.autolimit=0
#txt = """<your SQL text here>"""
txt_1 = txt
replace_list = ['\n', '(', ')', '*', '=','-',';','/','.']
count = 0
for i in replace_list:
txt_1 = txt_1.replace(i, ' ')
txt_1 = txt_1.split()
res = []
for i in range(1, len(txt_1)):
if txt_1[i-1].lower() in ['from', 'join','table'] and txt_1[i].lower() != 'select':
count +=1
str_count = str(count)
res.append(txt_1[i] + "." + txt_1[i+1])
#df.head()
res_l = res
f_res_l = []
for i in range(0,len(res_l)):
if len(res_l[i]) > 15 : # change it to 0 is you want all the caught strings
f_res_l.append(res_l[i])
else :
pass
All_Table_List = f_res_l
print("All the unique tables from the SQL text, in the order of their appearence in the code : \n",100*'*')
df = pd.DataFrame(All_Table_List,columns=['Tables_Names'])
df.reset_index(level=0, inplace=True)
list_=list(df["Tables_Names"].unique())
df_1_Final = pd.DataFrame(list_,columns=['Tables_Names'])
df_1_Final.reset_index(level=0, inplace=True)
df_1_Final
Unfortunately, in order to do this successfully for "complex SQL" queries, you will more or less have to implement a complete parser for the particular database engine you are using.
As an example, consider this very basic complex query:
WITH a AS (
SELECT col1 AS c FROM b
)
SELECT c FROM a
In this case, a is not a table but a common table expression (CTE), and should be excluded from your output. There's no simple way of using regexp:es to realize that b is a table access but a is not - your code will really have to understand the SQL at a deeper level.
Also consider
SELECT * FROM tbl
You'd have to know the column names actually present in a particular instance of a database (and accessible to a particular user, too) to answer that correctly.
If by "works with complex SQL" you mean that it must work with any valid SQL statement, you also need to specify for which SQL dialect - or implement dialect-specific solutions. A solution which works with any SQL handled by a database that does not implement CTE:s would not work in one that does.
I am sorry to say so, but I do not think you will find a complete solution which works for arbitrarily complex SQL queries. You'll have to settle for a solution which works with a subset of a particular SQL-dialect.
For my simple use case (one table in query, no joins), I used the following tweak
lst = "select * from table".split(" ")
lst = [item for item in lst if len(item)>0]
table_name = lst[lst.index("from")+1]