How to replace a number in a string in Python? - python

I need to search a string and check if it contains numbers in its name. If it does, I want to replace it with nothing. I've started doing something like this but I didn't find a solution for my problem.
table = "table1"
if any(chr.isdigit() for chr in table) == True:
table = table.replace(chr, "_")
print(table)
# The output should be "table"
Any ideas?

You could do this in many different ways. Here's how it could be done with the re module:
import re
table = 'table1'
table = re.sub('\d+', '', table)

This sound like task for .translate method of str, you could do
table = "table1"
table = table.translate("".maketrans("","","0123456789"))
print(table) # table
2 first arguments of maketrans are for replacement character-for-character, as it we do not need this we use empty strs, third (optional) argument is characters to remove.

If you dont want to import any modules you could try:
table = "".join([i for i in table if not i.isdigit()])

table = "table123"
for i in table:
if i.isdigit():
table = table.replace(i, "")
print(table)

I found this works to remove numbers quickly.
table = "table1"
table_temp =""
for i in table:
if i not in "0123456789":
table_temp +=i
print(table_temp)

char_nums = [chr for chr in table if chr.isdigit()]
for i in char_nums:
table = table.replace(i, "")
print(table)

Related

SQL query format

I have a list of string that I need to pass to an sql query.
listofinput = []
for i in input:
listofinput.append(i)
if(len(listofinput)>1):
listofinput = format(tuple(listofinput))
sql_query = f"""SELECT * FROM countries
where
name in {listofinput};
"""
This works when I have a list, but in case of just one value it fails.
as listofinput = ['USA'] for one value
but listofinput ('USA', 'Germany') for multiple
also I need to do this for thousands of input, what is the best optimized way to achieve the same. name in my table countries is an indexed column
You can just convert to tuple and then if the second last character is a coma, remove it.
listofinput = format(tuple(input))
if listofinput[-2] == ",":
listofinput = f"{listofinput[:-2]})"
sql_query = f"""SELECT * FROM countries
where name in {listofinput};"""
Change if(len(listofinput)>1): to if(len(listofinput)>=1):
This might work.
Remove condition if(len(listofinput)>1) .
Because if you don't convert to tuple your query should be like this:
... where name in ['USA']
or
... where name in []
and in [...] not acceptable in SQL, only in (...) is acceptable.
You can remove format() too:
listofinput = tuple(listofinput)
Final Code:
listofinput = []
for i in input:
listofinput.append(i)
listofinput = tuple(listofinput)
sql_query = f"""SELECT * FROM countries
WHERE
name IN {listofinput};
"""
Yes the tuple with one element will required a ","
To circumvent your problem, maybe you can use string instead by just changing your code to the below:
listofinput = []
for i in input:
listofinput.append(i)
if(len(listofinput)>1):
listofinput = format(tuple(listofinput))
else:
listofinput='('+listofinput[0]+')'

Retrieving text from sqlite without string formatting for comparison

Saving a string into a sqlite table, retrieving it again and comparing it to the original requires some filters to work and i dont know why exactly.
tl;dr
How can i retrieve string Data from the SQLITE DB without requiring Filter Nr 3 as its dangerous for more complex strings ?
import sqlite3
RAWSTRING = 'This is a DB Teststing'
# create database and table
currentdb = sqlite3.connect('test.db')
currentdb.execute('''CREATE TABLE tickertable (teststring text)''')
# enter RAWSTRING into databasse
currentdb.execute('''INSERT INTO tickertable VALUES(?);''', (RAWSTRING,))
# get RAWSTRING from database
cursorObj = currentdb.cursor()
cursorObj.execute('SELECT * FROM tickertable')
DB_RAWSTRING = cursorObj.fetchall()
currentdb.commit()
currentdb.close()
# Prints This is a DB Teststing
print('originalstring : ', RAWSTRING)
# Prints [('This is a DB Teststing',)]
print('retrieved from DB: ', DB_RAWSTRING)
# Get first entry from List because fetchall gives a list
FILTER1_DB_RAWSTRING = DB_RAWSTRING[0]
# Convert the Listelement to String because its still a listelement and comparing fails to string
FILTER2_DB_RAWSTRING = str(FILTER1_DB_RAWSTRING)
# Remove annoying db extra characters and i dont know why they exist anyway
FILTER3_DB_RAWSTRING = FILTER2_DB_RAWSTRING.replace("'", "").replace("(", "").replace(")", "").replace(",", "")
if RAWSTRING == FILTER3_DB_RAWSTRING:
print('Strings are the same as they should')
else:
print('String are not the same because of db weirdness')
So here's your problem: fetchall returns a list of tuples. This means that casting them to a string puts pesky parenthesis around each row and commas between each element of each row. If you'd like to retrieve the raw information from each column, that can be done by indexing the tuples:
entries = cursorObj.fetchall()
first_row = entries[0]
first_item = first_row[0]
print(first_item)
This ought to print just the content of the first row and column in the DB. If not, let me know!
David

Populating Insert Statements from a local file

I am trying to write user data from a file into a series of insert statements. I feel I am close but just missing one or two things. I am attempting to run a .format, but all I end up with are ?'s
import time, json, sqlite3
def insertsfromfile(file):
results = open(file).readlines()
output = open('UserINSERTFile.txt', 'w')
for rows in results:
jsonobject = json.loads(rows)
userid = jsonobject['user']['id']
name = jsonobject['user']['name']
screenname = jsonobject['user']['screen_name']
description = jsonobject['user']['description']
friendscount = jsonobject['user']['friends_count']
insert = ('INSERT INTO Users VALUES (?,?,?,?,?'.format(userid, name, screenname,description, friendscount)
insert = insert[:-1] + ''
output.write(insert)
output.close()
Thanks
I figured it out after reviewing it. Essentially I was missing that I had to combine the attributes together with my Insert string with the '+'. Also had to convert the variables to str() in case they were int.

How to extract table names and column names from sql query?

So let assume we have such simple query:
Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;
The result should looks this way:
tb1 col1
tb1 col7
tb2 col2
tb2 col8
I've tried to solve this problem using some python library:
1) Even extracting only tables using sqlparse might be a huge problem. For example this official book doesn't work properly at all.
2) Using regular expression seems to be really hard to achieve.
3) But then I found this , that might help. However the problem is that I can't connect to any database and execute that query.
Any ideas?
sql-metadata is a Python library that uses a tokenized query returned by python-sqlparse and generates query metadata.
This metadata can return column and table names from your supplied SQL query. Here are a couple of example from the sql-metadata github readme:
>>> sql_metadata.get_query_columns("SELECT test, id FROM foo, bar")
[u'test', u'id']
>>> sql_metadata.get_query_tables("SELECT test, id FROM foo, bar")
[u'foo', u'bar']
>>> sql_metadata.get_query_limit_and_offset('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000')
(50, 1000)
A hosted version of the library exists at sql-app.infocruncher.com to see if it works for you.
Really, this is no easy task. You could use a lexer (ply in this example) and define several rules to get several tokens out of a string. The following code defines these rules for the different parts of your SQL string and puts them back together as there could be aliases in the input string. As a result, you get a dictionary (result) with the different tablenames as key.
import ply.lex as lex, re
tokens = (
"TABLE",
"JOIN",
"COLUMN",
"TRASH"
)
tables = {"tables": {}, "alias": {}}
columns = []
t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"
def t_TABLE(t):
r"from\s(\w+)\sas\s(\w+)"
regex = re.compile(t_TABLE.__doc__)
m = regex.search(t.value)
if m is not None:
tbl = m.group(1)
alias = m.group(2)
tables["tables"][tbl] = ""
tables["alias"][alias] = tbl
return t
def t_JOIN(t):
r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
regex = re.compile(t_JOIN.__doc__)
m = regex.search(t.value)
if m is not None:
tbl = m.group(1)
alias = m.group(2)
tables["tables"][tbl] = ""
tables["alias"][alias] = tbl
return t
def t_COLUMN(t):
r"(\w+\.\w+)"
regex = re.compile(t_COLUMN.__doc__)
m = regex.search(t.value)
if m is not None:
t.value = m.group(1)
columns.append(t.value)
return t
def t_error(t):
raise TypeError("Unknown text '%s'" % (t.value,))
t.lexer.skip(len(t.value))
# here is where the magic starts
def mylex(inp):
lexer = lex.lex()
lexer.input(inp)
for token in lexer:
pass
result = {}
for col in columns:
tbl, c = col.split('.')
if tbl in tables["alias"].keys():
key = tables["alias"][tbl]
else:
key = tbl
if key in result:
result[key].append(c)
else:
result[key] = list()
result[key].append(c)
print result
# {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}
string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
moz-sql-parser is a python library to convert some subset of SQL-92 queries to JSON-izable parse trees. Maybe it what you want.
Here is an example.
>>> parse("SELECT id,name FROM dual WHERE id>3 and id<10 ORDER BY name")
{'select': [{'value': 'id'}, {'value': 'name'}], 'from': 'dual', 'where': {'and': [{'gt': ['id', 3]}, {'lt': ['id', 10]}]}, 'orderby': {'value': 'name'}}
I am tackling a similar problem and found a simpler solution and it seems to work well.
import re
def tables_in_query(sql_str):
# remove the /* */ comments
q = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)
# remove whole line -- and # comments
lines = [line for line in q.splitlines() if not re.match("^\s*(--|#)", line)]
# remove trailing -- and # comments
q = " ".join([re.split("--|#", line)[0] for line in lines])
# split on blanks, parens and semicolons
tokens = re.split(r"[\s)(;]+", q)
# scan the tokens. if we see a FROM or JOIN, we set the get_next
# flag, and grab the next one (unless it's SELECT).
tables = set()
get_next = False
for tok in tokens:
if get_next:
if tok.lower() not in ["", "select"]:
tables.add(tok)
get_next = False
get_next = tok.lower() in ["from", "join"]
dictTables = dict()
for table in tables:
fields = []
for token in tokens:
if token.startswith(table):
if token != table:
fields.append(token)
if len(list(set(fields))) >= 1:
dictTables[table] = list(set(fields))
return dictTables
code adapted from https://grisha.org/blog/2016/11/14/table-names-from-sql/
Create a list of all the tables that are present in the DB. You can then search each table name in the queries.
This obviously isn't foolproof and the code will break in case any column/alias name matches the table name.
But it can be done as a workaround.
import pandas as pd
#%config PPMagics.autolimit=0
#txt = """<your SQL text here>"""
txt_1 = txt
replace_list = ['\n', '(', ')', '*', '=','-',';','/','.']
count = 0
for i in replace_list:
txt_1 = txt_1.replace(i, ' ')
txt_1 = txt_1.split()
res = []
for i in range(1, len(txt_1)):
if txt_1[i-1].lower() in ['from', 'join','table'] and txt_1[i].lower() != 'select':
count +=1
str_count = str(count)
res.append(txt_1[i] + "." + txt_1[i+1])
#df.head()
res_l = res
f_res_l = []
for i in range(0,len(res_l)):
if len(res_l[i]) > 15 : # change it to 0 is you want all the caught strings
f_res_l.append(res_l[i])
else :
pass
All_Table_List = f_res_l
print("All the unique tables from the SQL text, in the order of their appearence in the code : \n",100*'*')
df = pd.DataFrame(All_Table_List,columns=['Tables_Names'])
df.reset_index(level=0, inplace=True)
list_=list(df["Tables_Names"].unique())
df_1_Final = pd.DataFrame(list_,columns=['Tables_Names'])
df_1_Final.reset_index(level=0, inplace=True)
df_1_Final
Unfortunately, in order to do this successfully for "complex SQL" queries, you will more or less have to implement a complete parser for the particular database engine you are using.
As an example, consider this very basic complex query:
WITH a AS (
SELECT col1 AS c FROM b
)
SELECT c FROM a
In this case, a is not a table but a common table expression (CTE), and should be excluded from your output. There's no simple way of using regexp:es to realize that b is a table access but a is not - your code will really have to understand the SQL at a deeper level.
Also consider
SELECT * FROM tbl
You'd have to know the column names actually present in a particular instance of a database (and accessible to a particular user, too) to answer that correctly.
If by "works with complex SQL" you mean that it must work with any valid SQL statement, you also need to specify for which SQL dialect - or implement dialect-specific solutions. A solution which works with any SQL handled by a database that does not implement CTE:s would not work in one that does.
I am sorry to say so, but I do not think you will find a complete solution which works for arbitrarily complex SQL queries. You'll have to settle for a solution which works with a subset of a particular SQL-dialect.
For my simple use case (one table in query, no joins), I used the following tweak
lst = "select * from table".split(" ")
lst = [item for item in lst if len(item)>0]
table_name = lst[lst.index("from")+1]

remove last comma from dictionary/list

I am trying to python to generate a script that generates unload command in redshift. I not an expert Python programmer. I need to where I can generate all columns for the unload list. If the column is of specific name, I need to replace with a function. The challenge I am facing is it appending "," to last item in the dictionary. Is there a way I can avoid the last comma? Any help would be appreciated.
import psycopg2 from psycopg2.extras
import RealDictCursor
try:
conn = psycopg2.connect("dbname='test' port='5439' user='scott' host='something.redshift.amazonaws.com' password='tiger'");
except:
print "Unable to connect to the database"
conn.cursor_factory = RealDictCursor
cur = conn.cursor()
conn.set_isolation_level( psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT )
try:
cur.execute("SELECT column from pg_table_def where schema_name='myschema' and table_name ='tab1' " );
except:
print "Unable to execute select statement from the database!"
result = cur.fetchall()
print "unload mychema.tab1 (select "
for row in result:
for key,value in row.items():
print "%s,"%(value)
print ") AWS Credentials here on..."
conn.close()
Use the join function on the list of values in each row:
print ",".join(row.values())
Briefly, the join function is called on a string which we can think of as the "glue", and takes a list of "pieces" as its argument. The result is a string of "pieces" "held together" by the "glue". Example:
>>> glue = "+"
>>> pieces = ["a", "b", "c"]
>>> glue.join(pieces)
"a+b+c"
(since row.values() returns a list, you don't actually need the comprehension, so it's even simpler than I wrote it at first)
Infact, this worked better.
columns = []
for row in result:
if (row['column_name'] == 'mycol1') or (row['column_name'] == 'mycol2') :
columns.append("func_sha1(" + row['column_name'] + "||" + salt +")")
else:
columns.append(row['column_name'])
print selstr + ",".join(columns)+" ) TO s3://"
Thanks for your help, Jon

Categories

Resources