Inserting data into multiple tables simultaneously in sqlite using python

I am trying to read in multiple files to populate multiple tables in SQLite using Python. Something like this:
def connect_db():
    return sqlite3.connect(app.config['DATABASE'])  # Flask stuff

# List of tables and corresponding filenames
file_list = ['A', 'B', 'C', 'D']

for file in file_list:
    with closing(connect_db()) as db:
        cursor = db.execute('select * from ' + file)
        # Get column names for the current table
        names = tuple(map(lambda x: x[0], cursor.description))
        filename = file + '.txt'
        with open(filename, 'r') as f:
            data = [row.split('\t') for row in f.readlines()]
        cursor.executemany('INSERT INTO ' + file + ' ' + str(names) + ' VALUES ' + ???)
Now I realize that in the executemany() statement above I need to supply a placeholder list like (?,?,?), with one ? per column. I can generate a tuple of ? characters, but then each one ends up with quotes around it: ('?', '?', '?'). Is there a way around this? Thanks.

You can implement a query "maker" that loops over a dict or an object's properties and builds the VALUES string from it.
I did something like that on a project:
import copy

d = {
    "id": 0,
    "name": "Foo",
    "bar": "foobar"
}

sql = ""
for key, foo in d.iteritems():
    # First, copy the current value
    bar = copy.copy(foo)
    # Test if it is a number
    try:
        # try to increase the value by 1
        bar += 1
    except TypeError:
        # if we can't, wrap it in double quotes
        foo = '"%s"' % foo
    # concat with the previous sequence
    sql = "%s%s=%s," % (sql, key, foo)
# remove the trailing comma
sql = sql[:-1]
So the value of sql at the end of the loop should be something like this:
bar="foobar",id=0,name="Foo"
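For the placeholder question itself, a common approach is to build the "?" list with str.join, which avoids the quoted tuple entirely. A minimal sketch, reusing the names and data variables from the question and running executemany on the connection (sqlite3 connections expose executemany as a shortcut):

placeholders = ','.join(['?'] * len(names))   # gives "?,?,?" with no quotes
columns = ','.join(names)
sql = 'INSERT INTO {} ({}) VALUES ({})'.format(file, columns, placeholders)
db.executemany(sql, data)
db.commit()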

Related

using logical expression in arcpy

I am trying to write a python command that includes a logical query. I have a series of tables and layers and would like to run a command on them when two shared variables have the same values. I created a list of those tables and layers:
pattern = '*Assess*.dbf'
files = []
tables = []
layernames = []
## create the loop element
for dirpath, dirnames, filenames in arcpy.da.Walk(inputFolder):
    # append shapefile path to list
    for filename in fnmatch.filter(filenames, pattern):
        files.append(os.path.join(dirpath, filename))
tables = arcpy.ListTables()
for file in files:
    names.append("\"TOWN_ID\" = " + str(int(os.path.basename(file).rstrip(os.path.splitext(file)[1])[1:4])))
    layernames.append(str(int(os.path.basename(file).rstrip(os.path.splitext(file)[1])[1:4])))
The code is as follows:
for layername, table in zip(layernames, tables):
    #print layername
    #print table
    #CF_QueryTool = "C:\\ArcGIS_Temp\\parcel_mosaic\\land_parcels_mosaic.gdb\\cf_QueryTool"
    tablelist = [table, "TaxParcels_Layer.{}".format(layername)]
    wherecluase = "TaxParcels_Layer.{}.LOC_ID = table.LOC_ID".format(layername)
    print tablelist
    print wherecluase
    arcpy.MakeQueryTable_management(tablelist, "QueryTable.{}".format(layername), "ADD_VIRTUAL_KEY_FIELD", "", "", wherecluase)
    print "QueryTable.{} created".format(layername)
The problem is that apparently there is something wrong with my wherecluase, which made me wonder whether I can actually use something like table.LOC_ID with a list of tables. Let me know if I'm making a mistake here. Thanks a lot!
Since "table" is both a SQL reserved keyword and an invalid name within a file geodatabase (*.gdb), I assume you want the value of the variable table in your SQL WHERE clause and not the word "table." You need to add another argument specifier to your whereclause formatting:
>>> table = "myTable"
>>> layername = "myLayer"
>>>
>>> # original code
>>> whereclause = "TaxParcels_Layer.{}.LOC_ID = table.LOC_ID".format(layername)
>>> print(whereclause)
TaxParcels_Layer.myLayer.LOC_ID = table.LOC_ID
>>>
>>> # modified code
>>> whereclause = "TaxParcels_Layer.{}.LOC_ID = {}.LOC_ID".format(layername, table)
>>> print(whereclause)
TaxParcels_Layer.myLayer.LOC_ID = myTable.LOC_ID
>>>
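Applied inside the loop from the question, the same fix keeps the question's own variable name (a one-line sketch):

wherecluase = "TaxParcels_Layer.{}.LOC_ID = {}.LOC_ID".format(layername, table)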

Pass values from python list to Cassandra query

Beginner here.
I have the following circumstances:
A text file with each line containing a name.
A Cassandra 3.5 database.
A Python script.
The intention is to have the script read from the file one line (one name) at a time, and query Cassandra with that name.
FYI, everything works fine except for when I try to pass the value of the list to the query.
I currently have something like:
#... driver import, datetime imports done above
#...
with open(fname) as f:
    content = f.readlines()

# Loop for each line from the number of lines in the name list file
# num_of_lines is already set
for x in range(num_of_lines):
    tagname = str(content[x])
    rows = session.execute("""SELECT * FROM tablename where name = %s and date = %s order by time desc limit 1""", (tagname, startDay))
    for row in rows:
        print row.name + ", " + str(row.date)
Everything works fine if I remove the tagname list component and edit the query itself with a name value.
What am I doing wrong here?
Building on the answer from @Vinny above: format simply substitutes the literal value, so you need to put quotes around it.
for x in content:
    rows = session.execute("SELECT * FROM tablename where name ='{}' and date ='{}' order by time desc limit 1".format(x, startDay))
    for row in rows:
        print row.name + ", " + str(row.date)
You can simply iterate over content:
for x in content:
    rows = session.execute("SELECT * FROM tablename where name = {} and date = {} order by time desc limit 1".format(x, startDay))
    for row in rows:
        print row.name + ", " + str(row.date)
    ....
Also, you don't need triple quotes for the string; single quotes are good enough (triple quotes are used for docstrings and multi-line strings in Python).
Note that this might end in a different error, but at least you will be iterating over the lines directly instead of iterating over an index and reading lines.
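As an aside, the %s parameter binding from the original question is what the Cassandra Python driver expects, so it can be kept; a sketch that also strips the newline readlines() leaves on each name, which often makes an equality lookup silently return nothing:

with open(fname) as f:
    for line in f:
        tagname = line.strip()  # drop the trailing newline each file line carries
        rows = session.execute(
            "SELECT * FROM tablename where name = %s and date = %s order by time desc limit 1",
            (tagname, startDay))
        for row in rows:
            print row.name + ", " + str(row.date)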

Populating Insert Statements from a local file

I am trying to write user data from a file into a series of insert statements. I feel I am close but just missing one or two things. I am attempting to use .format, but all I end up with are ?'s.
import time, json, sqlite3

def insertsfromfile(file):
    results = open(file).readlines()
    output = open('UserINSERTFile.txt', 'w')
    for rows in results:
        jsonobject = json.loads(rows)
        userid = jsonobject['user']['id']
        name = jsonobject['user']['name']
        screenname = jsonobject['user']['screen_name']
        description = jsonobject['user']['description']
        friendscount = jsonobject['user']['friends_count']
        insert = 'INSERT INTO Users VALUES (?,?,?,?,?'.format(userid, name, screenname, description, friendscount)
        insert = insert[:-1] + ''
        output.write(insert)
    output.close()
Thanks
I figured it out after reviewing it. Essentially I was missing that I had to combine the attributes with my INSERT string using '+'. I also had to convert the variables with str() in case they were ints.
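For reference, a sketch of the kind of line that matches that description (the exact quoting and line ending are assumptions, not from the original post):

insert = ('INSERT INTO Users VALUES (' + str(userid) + ', "' + str(name) + '", "' +
          str(screenname) + '", "' + str(description) + '", ' + str(friendscount) + ');\n')
output.write(insert)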

How to extract table names and column names from sql query?

So let's assume we have a simple query like this:
Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;
The result should look this way:
tb1 col1
tb1 col7
tb2 col2
tb2 col8
I've tried to solve this problem using some Python libraries:
1) Even extracting only tables using sqlparse might be a huge problem. For example, this official book doesn't work properly at all.
2) Using regular expressions seems really hard to achieve.
3) But then I found this, which might help. However, the problem is that I can't connect to any database and execute that query.
Any ideas?
sql-metadata is a Python library that uses a tokenized query returned by python-sqlparse and generates query metadata.
This metadata can return column and table names from your supplied SQL query. Here are a couple of examples from the sql-metadata GitHub readme:
>>> sql_metadata.get_query_columns("SELECT test, id FROM foo, bar")
[u'test', u'id']
>>> sql_metadata.get_query_tables("SELECT test, id FROM foo, bar")
[u'foo', u'bar']
>>> sql_metadata.get_query_limit_and_offset('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000')
(50, 1000)
A hosted version of the library exists at sql-app.infocruncher.com to see if it works for you.
Really, this is no easy task. You could use a lexer (ply in this example) and define several rules to get several tokens out of a string. The following code defines these rules for the different parts of your SQL string and puts them back together, as there could be aliases in the input string. As a result, you get a dictionary (result) with the different table names as keys.
import ply.lex as lex, re

tokens = (
    "TABLE",
    "JOIN",
    "COLUMN",
    "TRASH"
)

tables = {"tables": {}, "alias": {}}
columns = []

t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"

def t_TABLE(t):
    r"from\s(\w+)\sas\s(\w+)"
    regex = re.compile(t_TABLE.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_JOIN(t):
    r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
    regex = re.compile(t_JOIN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_COLUMN(t):
    r"(\w+\.\w+)"
    regex = re.compile(t_COLUMN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        t.value = m.group(1)
        columns.append(t.value)
    return t

def t_error(t):
    raise TypeError("Unknown text '%s'" % (t.value,))
    t.lexer.skip(len(t.value))

# here is where the magic starts
def mylex(inp):
    lexer = lex.lex()
    lexer.input(inp)

    for token in lexer:
        pass

    result = {}
    for col in columns:
        tbl, c = col.split('.')
        if tbl in tables["alias"].keys():
            key = tables["alias"][tbl]
        else:
            key = tbl

        if key in result:
            result[key].append(c)
        else:
            result[key] = list()
            result[key].append(c)

    print result
    # {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}

string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
moz-sql-parser is a Python library to convert some subset of SQL-92 queries to JSON-izable parse trees. Maybe it's what you want.
Here is an example.
>>> from moz_sql_parser import parse
>>> parse("SELECT id,name FROM dual WHERE id>3 and id<10 ORDER BY name")
{'select': [{'value': 'id'}, {'value': 'name'}], 'from': 'dual', 'where': {'and': [{'gt': ['id', 3]}, {'lt': ['id', 10]}]}, 'orderby': {'value': 'name'}}
I am tackling a similar problem and found a simpler solution, and it seems to work well.
import re

def tables_in_query(sql_str):
    # remove the /* */ comments
    q = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)

    # remove whole line -- and # comments
    lines = [line for line in q.splitlines() if not re.match("^\s*(--|#)", line)]

    # remove trailing -- and # comments
    q = " ".join([re.split("--|#", line)[0] for line in lines])

    # split on blanks, parens and semicolons
    tokens = re.split(r"[\s)(;]+", q)

    # scan the tokens. if we see a FROM or JOIN, we set the get_next
    # flag, and grab the next one (unless it's SELECT).
    tables = set()
    get_next = False
    for tok in tokens:
        if get_next:
            if tok.lower() not in ["", "select"]:
                tables.add(tok)
            get_next = False
        get_next = tok.lower() in ["from", "join"]

    dictTables = dict()
    for table in tables:
        fields = []
        for token in tokens:
            if token.startswith(table):
                if token != table:
                    fields.append(token)
        if len(list(set(fields))) >= 1:
            dictTables[table] = list(set(fields))
    return dictTables
code adapted from https://grisha.org/blog/2016/11/14/table-names-from-sql/
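A quick check with the query from the question (dict ordering may vary; the columns come back table-qualified, and aliased references such as a.col1 are not resolved):

print tables_in_query("Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;")
# {'tb1': ['tb1.col7'], 'tb2': ['tb2.col8']}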
Create a list of all the tables that are present in the DB. You can then search each table name in the queries.
This obviously isn't foolproof and the code will break in case any column/alias name matches the table name.
But it can be done as a workaround.
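A minimal sketch of that workaround, with the table list hard-coded here for illustration (in practice it would come from the database catalogue, e.g. sqlite_master):

import re

known_tables = ["tb1", "tb2", "users"]  # hypothetical list of tables in the DB
query = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"

# split the query into bare words and keep the known table names that appear
words = set(re.split(r"[\s.,;)(]+", query.lower()))
used_tables = [t for t in known_tables if t.lower() in words]
print used_tables  # ['tb1', 'tb2']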
import pandas as pd
#%config PPMagics.autolimit=0

#txt = """<your SQL text here>"""
txt_1 = txt
replace_list = ['\n', '(', ')', '*', '=', '-', ';', '/', '.']
count = 0
for i in replace_list:
    txt_1 = txt_1.replace(i, ' ')
txt_1 = txt_1.split()

res = []
for i in range(1, len(txt_1)):
    if txt_1[i-1].lower() in ['from', 'join', 'table'] and txt_1[i].lower() != 'select':
        count += 1
        str_count = str(count)
        res.append(txt_1[i] + "." + txt_1[i+1])

#df.head()
res_l = res
f_res_l = []
for i in range(0, len(res_l)):
    if len(res_l[i]) > 15:  # change it to 0 if you want all the caught strings
        f_res_l.append(res_l[i])
    else:
        pass

All_Table_List = f_res_l
print("All the unique tables from the SQL text, in the order of their appearance in the code : \n", 100*'*')
df = pd.DataFrame(All_Table_List, columns=['Tables_Names'])
df.reset_index(level=0, inplace=True)
list_ = list(df["Tables_Names"].unique())
df_1_Final = pd.DataFrame(list_, columns=['Tables_Names'])
df_1_Final.reset_index(level=0, inplace=True)
df_1_Final
Unfortunately, in order to do this successfully for "complex SQL" queries, you will more or less have to implement a complete parser for the particular database engine you are using.
As an example, consider this very basic complex query:
WITH a AS (
    SELECT col1 AS c FROM b
)
SELECT c FROM a
In this case, a is not a table but a common table expression (CTE), and should be excluded from your output. There's no simple way of using regexps to realize that b is a table access but a is not; your code will really have to understand the SQL at a deeper level.
Also consider
SELECT * FROM tbl
You'd have to know the column names actually present in a particular instance of a database (and accessible to a particular user, too) to answer that correctly.
If by "works with complex SQL" you mean that it must work with any valid SQL statement, you also need to specify which SQL dialect, or implement dialect-specific solutions. A solution which works with any SQL handled by a database that does not implement CTEs would not work in one that does.
I am sorry to say so, but I do not think you will find a complete solution which works for arbitrarily complex SQL queries. You'll have to settle for a solution which works with a subset of a particular SQL-dialect.
For my simple use case (one table in the query, no joins), I used the following tweak:
lst = "select * from table".split(" ")
lst = [item for item in lst if len(item) > 0]
table_name = lst[lst.index("from") + 1]

Using a 'one key to many values' entry in SQLite3 database with Python

I have a text file that contains many different entries. What I'd like to do is take the first column, use each unique value as a key, and then store the second column as values. I actually have this working, sort of, but I'm looking for a better way to do this. Here is my example file:
account_check:"login/auth/broken"
adobe_air_installed:kb_base+"/"+app_name+"/Path"
adobe_air_installed:kb_base+"/"+app_name+"/Version"
adobe_audition_installed:'SMB/Adobe_Audition/'+version+'/Path'
adobe_audition_installed:'SMB/Adobe_Audition/'+version+'/ExePath'
Here is the code I'm using to parse my text file:
val_dict = {}
for row in creader:
    try:
        value = val_dict[row[0]]
        value += row[1] + ", "
    except KeyError:
        value = row[1] + ", "
    val_dict[row[0]] = value

for row in val_dict.items():
    values = row[1][:-1], row[0]
    cursor.execute("UPDATE 'plugins' SET 'sets_kb_item'= ? WHERE filename= ?", values)
And here is the code I currently use to query and format the data:
def kb_item(query):
    db = get_db()
    cur = db.execute("select * from plugins where sets_kb_item like ?", (query,))
    plugins = cur.fetchall()
    for item in plugins:
        for i in item['sets_kb_item'].split(','):
            print i.strip()
Here is the output:
kb_base+"/Installed"
kb_base+"/Path"
kb_base+"/Version"
It took me many tries, but I finally got the output the way I wanted it; however, I'm looking for critique. Is there a better way to do this? Could my entire for item in plugins .... print i.strip() block be done in one line and saved as a variable? I am very new to working with databases, and my Python skills could also use refreshing.
NOTE: I'm using csv.reader in this code because I originally had a .csv file; however, I found it was just as easy to use the .txt file I was provided.
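On the dict-building part specifically, a more idiomatic sketch of the aggregation step (not from the thread; it assumes the same creader and cursor objects), storing the values as a list and joining them once at write time:

from collections import defaultdict

val_dict = defaultdict(list)
for row in creader:
    val_dict[row[0]].append(row[1])

for filename, items in val_dict.items():
    cursor.execute("UPDATE plugins SET sets_kb_item = ? WHERE filename = ?",
                   (", ".join(items), filename))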
