I am passing data to a SQL statement in Python. The code looks like this:
foo.execute("select test_dat_id from tbl_dat_test where sample_id = ?", (n_value,))
myfoo = foo.fetchall()
zoo = "select result_value from tbl_dat_analyte where test_dat_id ="
for new in myfoo:
    new1 = str(new)
    new2 = float(new1)
    var = zoo + new2
    print(var)
    foo.execute(var)
To make a long story short, myfoo is a SQL row, and I converted its entries to strings. Each entry is mainly a number with a space and brackets, like this: (964005, ). I simply want it converted to an integer so it can be passed to the SQL statement. I believe there are easier ways to do this, but I really can't work it out. Thanks.
A good way would be to use a regular expression:
import re
foo = "(100) "
foo = re.sub("[^0-9]", "", foo)
foo = int(foo)
This will remove all non-numeric characters and convert the string to an int.
If I've understood your need correctly, here's what you need to do:
string_with_number_and_whitespaces = '(100) ' # Also contains brackets
string_with_number_only = string_with_number_and_whitespaces.replace(' ', '').replace('(', '').replace(')', '')
number = int(string_with_number_only)
print(number) # Output 100
.fetchall() returns a list of pyodbc.Row objects. If you want the first (and only) element in each row, just extract it using index [0]:
>>> myfoo = [(1,),(2,)]
>>> for new in myfoo:
... myint = new[0]
... print(myint * 10)
...
10
20
>>>
Thank you very much for all your valuable answers; I am going to try them one by one. Actually, before I got your feedback, I got it working like this (I know it's a primitive solution, but it worked):
foo.execute("select test_dat_id from tbl_dat_test where sample_id = ?", (n_value,))
myfoo = foo.fetchall()
myfoo2 = dict(enumerate(item[0:100] for item in myfoo))
v_value = list(myfoo2.values())[0:100]
zoo = "select result_value from tbl_dat_analyte where test_dat_id = "
for new in v_value:
    new2 = str(new)
    new2 = new2.replace(" ", "")
    new2 = new2.replace("(", "")
    new2 = new2.replace(",)", "")
    var = zoo + new2
    print(var)
    foo100 = cnx.cursor()
    foo100.execute(var)
    myzoo = foo100.fetchall()
    print('zoo is:')
    print(myzoo)
    c5a = my_sheet.cell(row=21, column=3)
    c5a.value = myzoo
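Putting the row-indexing advice from the answers together with a parameterized second query removes the string cleanup entirely. This is only a sketch: it uses sqlite3 with made-up data so it is self-contained, whereas the original code uses a pyodbc cursor and an openpyxl sheet.

```python
import sqlite3

# In-memory stand-in for the database in the question (sqlite3 here so the
# sketch runs on its own; table/column names mirror the snippets above,
# the values are invented).
cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE tbl_dat_test (test_dat_id INTEGER, sample_id INTEGER)")
cnx.execute("CREATE TABLE tbl_dat_analyte (result_value REAL, test_dat_id INTEGER)")
cnx.execute("INSERT INTO tbl_dat_test VALUES (964005, 7)")
cnx.execute("INSERT INTO tbl_dat_analyte VALUES (1.25, 964005)")

foo = cnx.cursor()
foo.execute("select test_dat_id from tbl_dat_test where sample_id = ?", (7,))
myfoo = foo.fetchall()

results = []
for new in myfoo:
    test_dat_id = new[0]  # already an int: no str()/replace() cleanup needed
    foo.execute("select result_value from tbl_dat_analyte where test_dat_id = ?",
                (test_dat_id,))
    results.extend(row[0] for row in foo.fetchall())

print(results)  # [1.25]
```

Besides being shorter, the `?` placeholder means the value never has to be spliced into the SQL text at all.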
I have a list of strings that I need to pass to a SQL query.
listofinput = []
for i in input:
    listofinput.append(i)
if len(listofinput) > 1:
    listofinput = format(tuple(listofinput))
sql_query = f"""SELECT * FROM countries
where
name in {listofinput};
"""
This works when I have a list, but it fails in the case of just one value, since listofinput is ['USA'] for one value but ('USA', 'Germany') for multiple. I also need to do this for thousands of inputs; what is the best-optimized way to achieve this? name in my countries table is an indexed column.
You can just convert to a tuple and then, if the second-to-last character is a comma, remove it:
listofinput = format(tuple(input))
if listofinput[-2] == ",":
listofinput = f"{listofinput[:-2]})"
sql_query = f"""SELECT * FROM countries
where name in {listofinput};"""
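What that slice does, concretely (a sketch; input_values stands in for the question's input list):

```python
# A one-element tuple's repr carries a trailing comma, which breaks the SQL.
input_values = ['USA']  # stand-in for the question's `input`
listofinput = format(tuple(input_values))
print(listofinput)  # ('USA',)

# Drop the trailing comma and close the parenthesis again:
if listofinput[-2] == ",":
    listofinput = f"{listofinput[:-2]})"
print(listofinput)  # ('USA')
```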
Change if(len(listofinput)>1): to if(len(listofinput)>=1):
This might work.
Remove the condition if(len(listofinput)>1), because if you don't convert to a tuple your query will look like this:
... where name in ['USA']
or
... where name in []
and [...] is not acceptable in SQL; only (...) is acceptable.
You can remove format() too:
listofinput = tuple(listofinput)
Final Code:
listofinput = []
for i in input:
    listofinput.append(i)
listofinput = tuple(listofinput)
sql_query = f"""SELECT * FROM countries
WHERE
name IN {listofinput};
"""
Yes, a tuple with one element will require a ",". To circumvent your problem, maybe you can use a string instead by changing your code to the below:
listofinput = []
for i in input:
    listofinput.append(i)
if len(listofinput) > 1:
    listofinput = format(tuple(listofinput))
else:
    listofinput = '(' + listofinput[0] + ')'
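The root of the problem is visible in the REPL: Python's repr of a one-element tuple keeps a trailing comma, which is not valid SQL. A sketch (note the else branch above omits the quotes a SQL string literal needs, so they are added here):

```python
# repr of tuples: the one-element case keeps a trailing comma
one = format(tuple(['USA']))
many = format(tuple(['USA', 'Germany']))
print(one)   # ('USA',)
print(many)  # ('USA', 'Germany')

# Hand-building the one-element case avoids the comma entirely
# (quotes added, since SQL string literals need them):
listofinput = ['USA']
if len(listofinput) > 1:
    clause = format(tuple(listofinput))
else:
    clause = "('" + listofinput[0] + "')"
print(clause)  # ('USA')
```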
I have a code similar to this:
s = some_template
s = s.replace('<HOLDER1>',data1)
s = s.replace('<HOLDER2>',data2)
s = s.replace('<HOLDER3>',data3)
... #(about 30 similar lines)
where data1/data2/etc. is often a call to a function or a complex expression which might take a while to calculate, for example:
s = some_template
s = s.replace('<HOLDER4>',long_func4(a,b,'some_flag') if c==1 else '')
s = s.replace('<HOLDER5>',long_func5(d,e).replace('.',''))
s = s.replace('<HOLDER6>',self.attr6)
s = s.replace('<HOLDER7>',f'{self.name}_{get_cur_month()}')
... #(about 30 similar lines)
In order to save on runtime, I want the str.replace() method to calculate the new value only if the old value is found in s. This can be achieved by:
if '<HOLDER1>' in s:
    s = s.replace('<HOLDER1>', data1)
if '<HOLDER2>' in s:
    s = s.replace('<HOLDER2>', data2)
if '<HOLDER3>' in s:
    s = s.replace('<HOLDER3>', data3)
...
but I don't like this solution because it doubles the number of lines of code, which gets really messy, and it also searches s for the old value twice for each holder. Any ideas? Thanks!
str is immutable. You can't change it; only creating a new instance is allowed.
You could do something like:
def replace_many(replacements, s):
    for pattern, replacement in replacements:
        s = s.replace(pattern, replacement)
    return s
without_replacements = 'this_will_be_replaced, will it?'
replacements = [('this_will_be_replaced', 'by_this')]
with_replacements = replace_many(replacements, without_replacements)
You can easily make it lazy:
def replace_many_lazy(replacements, s):
    for pattern, replacement_func in replacements:
        if pattern in s:
            s = s.replace(pattern, replacement_func())
    return s
without_replacements = 'this_will_be_replaced, will it?'
replacements = [('this_will_be_replaced', lambda: 'by_this')]
with_replacements = replace_many_lazy(replacements, without_replacements)
...now you don't do the expensive computation unless necessary.
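Applied to the placeholder template from the question, only the expensive calls whose holders actually appear get evaluated. A sketch; long_func4 and long_func5 are hypothetical stand-ins for the costly functions:

```python
def replace_many_lazy(replacements, s):
    # same helper as above: replacement functions run only when needed
    for pattern, replacement_func in replacements:
        if pattern in s:
            s = s.replace(pattern, replacement_func())
    return s

calls = []  # records which "expensive" functions actually ran

def long_func4():  # hypothetical stand-in for an expensive call
    calls.append("long_func4")
    return "value4"

def long_func5():  # hypothetical stand-in for an expensive call
    calls.append("long_func5")
    return "value5"

template = "start <HOLDER4> end"  # <HOLDER5> is absent from the template
replacements = [("<HOLDER4>", long_func4), ("<HOLDER5>", long_func5)]
result = replace_many_lazy(replacements, template)
print(result)  # start value4 end
print(calls)   # ['long_func4'] -- long_func5 was never evaluated
```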
I have tried to use the .replace() or .strip() method but have been unsuccessful. I am trying to print out a single stringed list separated by commas. Does anyone know a way to print it out with no [] or single quotes ''?
def get_format(header1):
    format_lookup = "SELECT ID, FormatName, HeaderRow, ColumnStatus, ColumnMobileID, ColumnVendorID, ColumnTechID, " \
                    "ColumnCallType, ColumnCallDate, ColumnCallTime, ColumnCallTo, ColumnQty, ColumnQtyLabel " \
                    "from dynamic_format WHERE HeaderRow=%s"
    header1 = (str(header1),)
    cursor = connection.cursor()
    cursor.execute(format_lookup, header1)
    record = cursor.fetchone()
    return record
I suppose I'll post my comment as an answer:
In [1]: header1 = ['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']
In [2]: ", ".join(header1)
Out[2]: 'ESN, ORIGINAL_QUANTITY, INVOICE_DATE'
In [3]: print(", ".join(header1))
ESN, ORIGINAL_QUANTITY, INVOICE_DATE
The reason you're getting those errors is that header1 is a list object and .replace() is a string method.
#sbabtizied's answer is what you'd use if header1 were a string:
# a single string, what sbabti assumed you had
"['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']"
# a list of strings, what you actually have
['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']
I've encountered the following problem: I want to fill a string with variables in the following way:
myList = [ 1000, 2000 ]
extra = "$"
myString = "{listItem}{extra} {listItem}{extra}".format(listItem = *myList, extra = extra)
I'm encountering the error Invalid Syntax at the * operator. I guess it is because format is taking two arguments instead of one. If I remove the {extra} tag completely and use only the name listItem, like so:
myString = "{} {}".format(*myList)
the code works. What do I need to change?
Use a generator expression to format each list element, then join the results together:
myString = " ".join(f"{listItem}{extra}" for listItem in myList)
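A self-contained run of that one-liner with the values from the question:

```python
myList = [1000, 2000]
extra = "$"

# One f-string per element, joined with spaces: no tuple unpacking needed.
myString = " ".join(f"{listItem}{extra}" for listItem in myList)
print(myString)  # 1000$ 2000$
```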
So let assume we have such simple query:
Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;
The result should look this way:
tb1 col1
tb1 col7
tb2 col2
tb2 col8
I've tried to solve this problem using some Python libraries:
1) Even extracting only the tables using sqlparse might be a huge problem. For example, this official book doesn't work properly at all.
2) Using regular expressions seems really hard to achieve.
3) But then I found this, which might help. However, the problem is that I can't connect to any database and execute that query.
Any ideas?
sql-metadata is a Python library that uses a tokenized query returned by python-sqlparse and generates query metadata.
This metadata can return column and table names from your supplied SQL query. Here are a couple of examples from the sql-metadata GitHub readme:
>>> sql_metadata.get_query_columns("SELECT test, id FROM foo, bar")
[u'test', u'id']
>>> sql_metadata.get_query_tables("SELECT test, id FROM foo, bar")
[u'foo', u'bar']
>>> sql_metadata.get_query_limit_and_offset('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000')
(50, 1000)
A hosted version of the library exists at sql-app.infocruncher.com to see if it works for you.
Really, this is no easy task. You could use a lexer (ply in this example) and define several rules to get several tokens out of a string. The following code defines these rules for the different parts of your SQL string and puts them back together, as there could be aliases in the input string. As a result, you get a dictionary (result) with the different table names as keys.
import ply.lex as lex
import re

tokens = (
    "TABLE",
    "JOIN",
    "COLUMN",
    "TRASH"
)

tables = {"tables": {}, "alias": {}}
columns = []

t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"

def t_TABLE(t):
    r"from\s(\w+)\sas\s(\w+)"
    regex = re.compile(t_TABLE.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_JOIN(t):
    r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
    regex = re.compile(t_JOIN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_COLUMN(t):
    r"(\w+\.\w+)"
    regex = re.compile(t_COLUMN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        t.value = m.group(1)
        columns.append(t.value)
    return t

def t_error(t):
    raise TypeError("Unknown text '%s'" % (t.value,))

# here is where the magic starts
def mylex(inp):
    lexer = lex.lex()
    lexer.input(inp)
    for token in lexer:
        pass
    result = {}
    for col in columns:
        tbl, c = col.split('.')
        if tbl in tables["alias"].keys():
            key = tables["alias"][tbl]
        else:
            key = tbl
        if key in result:
            result[key].append(c)
        else:
            result[key] = list()
            result[key].append(c)
    print(result)
    # {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}

string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
moz-sql-parser is a Python library that converts some subset of SQL-92 queries into JSON-izable parse trees. Maybe it is what you want. Here is an example:
>>> parse("SELECT id,name FROM dual WHERE id>3 and id<10 ORDER BY name")
{'select': [{'value': 'id'}, {'value': 'name'}], 'from': 'dual', 'where': {'and': [{'gt': ['id', 3]}, {'lt': ['id', 10]}]}, 'orderby': {'value': 'name'}}
I am tackling a similar problem and found a simpler solution that seems to work well:
import re

def tables_in_query(sql_str):
    # remove the /* */ comments
    q = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)
    # remove whole-line -- and # comments
    lines = [line for line in q.splitlines() if not re.match(r"^\s*(--|#)", line)]
    # remove trailing -- and # comments
    q = " ".join([re.split("--|#", line)[0] for line in lines])
    # split on blanks, parens and semicolons
    tokens = re.split(r"[\s)(;]+", q)
    # scan the tokens. if we see a FROM or JOIN, we set the get_next
    # flag, and grab the next one (unless it's SELECT).
    tables = set()
    get_next = False
    for tok in tokens:
        if get_next:
            if tok.lower() not in ["", "select"]:
                tables.add(tok)
            get_next = False
        get_next = tok.lower() in ["from", "join"]
    dictTables = dict()
    for table in tables:
        fields = []
        for token in tokens:
            if token.startswith(table):
                if token != table:
                    fields.append(token)
        if len(list(set(fields))) >= 1:
            dictTables[table] = list(set(fields))
    return dictTables
Code adapted from https://grisha.org/blog/2016/11/14/table-names-from-sql/
Create a list of all the tables present in the DB; you can then search for each table name in the queries. This obviously isn't foolproof, and the code will break if any column/alias name matches a table name, but it can serve as a workaround.
import pandas as pd

#%config PPMagics.autolimit=0
#txt = """<your SQL text here>"""
txt_1 = txt
replace_list = ['\n', '(', ')', '*', '=', '-', ';', '/', '.']
count = 0
for i in replace_list:
    txt_1 = txt_1.replace(i, ' ')
txt_1 = txt_1.split()
res = []
for i in range(1, len(txt_1)):
    if txt_1[i-1].lower() in ['from', 'join', 'table'] and txt_1[i].lower() != 'select':
        count += 1
        str_count = str(count)
        res.append(txt_1[i] + "." + txt_1[i+1])
#df.head()
res_l = res
f_res_l = []
for i in range(0, len(res_l)):
    if len(res_l[i]) > 15:  # change it to 0 if you want all the caught strings
        f_res_l.append(res_l[i])
    else:
        pass
All_Table_List = f_res_l
print("All the unique tables from the SQL text, in the order of their appearance in the code:\n", 100*'*')
df = pd.DataFrame(All_Table_List, columns=['Tables_Names'])
df.reset_index(level=0, inplace=True)
list_ = list(df["Tables_Names"].unique())
df_1_Final = pd.DataFrame(list_, columns=['Tables_Names'])
df_1_Final.reset_index(level=0, inplace=True)
df_1_Final
Unfortunately, in order to do this successfully for "complex SQL" queries, you will more or less have to implement a complete parser for the particular database engine you are using.
As an example, consider this very basic complex query:
WITH a AS (
SELECT col1 AS c FROM b
)
SELECT c FROM a
In this case, a is not a table but a common table expression (CTE), and should be excluded from your output. There's no simple way of using regexps to realize that b is a table access but a is not; your code will really have to understand the SQL at a deeper level.
Also consider
SELECT * FROM tbl
You'd have to know the column names actually present in a particular instance of a database (and accessible to a particular user, too) to answer that correctly.
If by "works with complex SQL" you mean it must work with any valid SQL statement, you also need to specify which SQL dialect, or implement dialect-specific solutions. A solution which works with any SQL handled by a database that does not implement CTEs would not work in one that does.
I am sorry to say so, but I do not think you will find a complete solution which works for arbitrarily complex SQL queries. You'll have to settle for a solution which works with a subset of a particular SQL-dialect.
For my simple use case (one table in the query, no joins), I used the following tweak:
lst = "select * from table".split(" ")
lst = [item for item in lst if len(item)>0]
table_name = lst[lst.index("from")+1]
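A quick check of that tweak, wrapped in a function (the function name and sample query here are illustrative); the filter on empty items is what lets it survive repeated blanks:

```python
def table_in_query(sql):
    # split on spaces and drop empty tokens (handles runs of blanks),
    # then take the token right after "from"
    lst = [item for item in sql.split(" ") if len(item) > 0]
    return lst[lst.index("from") + 1]

print(table_in_query("select *  from   my_table"))  # my_table
```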