I have a folder called 'testfolder' that contains two files, 'Sigurdlogfile' and '2004ADlogfile'. Each file holds a list of strings called entries. I need to run my code on both of them and am using glob to do this. My code creates a dictionary for each file and stores data extracted using regex, where the dictionary keys are the terms stored in commonterms below. Then it inserts each dictionary into a MySQL table. It does all of this successfully, but my second SQL statement is not inserting per file the way it should.
import glob
import re
files = glob.glob('/home/user/testfolder/*logfile*')
commonterms = (["freq", "\s?(\d+e?\d*)\s?"],
["tx", "#txpattern"],
["rx", "#rxpattern"], ...)
terms = [commonterms[i][0] for i in range(len(commonterms))]
patterns = [commonterms[i][1] for i in range(len(commonterms))]
def getTerms(entry):
    for i in range(len(terms)):
        term = re.search(patterns[i], entry)
        if term:
            term = term.groups()[0] if term.groups()[0] is not None else term.groups()[1]
        else:
            term = 'NULL'
        d[terms[i]] += [term]
    return d
for filename in files:
    #code to create 'entries'
    objkey = re.match(r'/home/user/testfolder/(.+?)logfile', filename).group(1)
    d = {t: [] for t in terms}
    for entry in entries:
        d = getTerms(entry)

    import MySQLdb
    db = MySQLdb.connect(host='', user='', passwd='', db='')
    cursor = db.cursor()
    cols = d.keys()
    vals = d.values()

    for i in range(len(entries)):
        lst = [item[i] for item in vals]
        csv = "'{}'".format("','".join(lst))
        sql1 = "INSERT INTO table (%s) VALUES (%s);" % (','.join(cols), csv.replace("'NULL'", "NULL"))
        cursor.execute(sql1)

    # now in my 2nd sql statement I need to update the table with data from an old table,
    # which is where I have the problem...
    sql2 = ("UPDATE table, oldtable SET table.key1 = oldtable.key1, "
            "table.key2 = oldtable.key2 WHERE oldtable.obj = %s;" % repr(objkey))
    cursor.execute(sql2)
    db.commit()
    db.close()
The problem is that the second SQL statement ends up inserting the data from only one of the objkeys into all columns of the table, but I need it to insert different data depending on which file the code is currently running on. I can't figure out why, since I've defined objkey inside my for filename in files loop. How can I fix this?
The UPDATE has no condition tying rows of table back to oldtable, so every run overwrites key1 and key2 for the whole table with the values from whichever objkey ran last. Instead of doing a separate INSERT and UPDATE, do them together, so each inserted row picks up the matching fields from the old table:
for i in range(len(entries)):
    lst = [item[i] for item in vals]
    csv = "'{}'".format("','".join(lst))
    sql1 = """INSERT INTO table (key1, key2, %s)
              SELECT o.key1, o.key2, a.*
              FROM (SELECT %s) AS a
              LEFT JOIN oldtable AS o ON o.obj = %s""" % (','.join(cols), csv.replace("'NULL'", "NULL"), repr(objkey))
    cursor.execute(sql1)
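As a side note, you can let the driver bind objkey instead of splicing repr(objkey) into the SQL. A minimal sketch assuming MySQLdb's %s placeholder style (the doubled %% survives the first formatting pass and becomes the bound placeholder):
sql1 = """INSERT INTO table (key1, key2, %s)
          SELECT o.key1, o.key2, a.*
          FROM (SELECT %s) AS a
          LEFT JOIN oldtable AS o ON o.obj = %%s""" % (','.join(cols), csv.replace("'NULL'", "NULL"))
cursor.execute(sql1, (objkey,))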
I need to query rows where a column matches my list of ~60K IDs, out of a table that contains millions of IDs. I think normally you would put the IDs into a temporary table in the database and merge on that, but I can't edit this database. I am doing it like this, using a loop with a Python wrapper, but is there a better way? I mean, it works, but still:
import pyodbc
import pandas as pd
# connect to the database using windows authentication
conn = pyodbc.connect('DRIVER={SQL Server Native Client 11.0};SERVER=my_fav_server;DATABASE=my_fav_db;Trusted_Connection=yes;')
cursor = conn.cursor()
# read in all the ids
ids_list = [...60K ids in here..]
# query in 10K chunks to prevent memory error
def chunks(l, n):
    # split list l into chunks of size n (the last chunk may be smaller)
    n = max(1, n)
    return [l[i:i+n] for i in range(0, len(l), n)]
chunked_ids_lists = chunks(ids_list, 10000)
# looping through to retrieve all cols
for chunk_num, chunked_ids_list in enumerate(chunked_ids_lists):
    temp_ids_string = "('" + "','".join(chunked_ids_list) + "')"
    temp_sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN {temp_ids_string};"
    temp_data = pd.read_sql_query(temp_sql, conn)
    temp_path = f"temp_chunk_{chunk_num}.txt"
    temp_data.to_csv(temp_path, sep='\t', index=None)
# read the query chunks
all_data_list = []
for chunk_num in range(len(chunked_ids_lists)):
    temp_path = f"temp_chunk_{chunk_num}.txt"
    temp_data = pd.read_csv(temp_path, sep='\t')
    all_data_list.append(temp_data)
all_data = pd.concat(all_data_list)
Another way is to use Psycopg's cursor.
import psycopg2
# Connect to an existing database
conn = psycopg2.connect("dbname=test user=postgres")
# Open a cursor to perform database operations
cur = conn.cursor()
# get data from the query
# no need to construct an 'SQL-correct syntax' filter by hand:
# psycopg2 adapts a Python tuple to a parenthesized value list
cur.execute("SELECT * FROM dbo.my_fav_table WHERE ID IN %(filter)s;", {"filter": tuple(ids_list)})
# loop over the fetched rows
for record in cur:
    # we got one record
    print(record)  # or process it however you need
Use parameters rather than concatenating strings.
I don't see the need for the CSV files, if you're just going to read them all into Python in the next loop. Just put everything into all_data_list during the query loop.
all_data_list = []
for chunk in chunked_ids_lists:
    params = ','.join(['?'] * len(chunk))
    sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN ({params});"
    cursor.execute(sql, chunk)
    rows = cursor.fetchall()
    all_data_list.extend(rows)
all_data = pd.DataFrame(all_data_list)
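One caveat: fetchall() returns bare row tuples, so the DataFrame above has no column names. A small sketch that recovers them from cursor.description (standard DB-API metadata, supported by pyodbc):
# the first field of each cursor.description entry is the column name
columns = [col[0] for col in cursor.description]
all_data = pd.DataFrame.from_records(all_data_list, columns=columns)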
I want to search a MySQL table for rows where the specified column has a particular value. For example, given the input string memory=2048, it should search for the rows that have "2048" as the value of the memory column and print them.
This is the code I have tried, but it prints nothing:
input = input()
tag = input.split("=")
desc = tag[1]
tag = tag[0]
mycursor = mydb.cursor()
sql = "(SELECT * FROM comp WHERE %s LIKE %s)"
val = (tag, desc)
mycursor.execute(sql, val)
res = mycursor.fetchall()
for x in res:
    print(x)
Secondly, I tried this code to see where the problem is:
input = input()
tag = input.split("=")
desc = tag[1]
tag = tag[0]
mycursor = mydb.cursor()
sql = "(SELECT * FROM comp WHERE memory LIKE '2048')"
mycursor.execute(sql)
res = mycursor.fetchall()
for x in res:
    print(x)
It gives the desired output. So my problem is that when I pass the column name through %s it arrives as 'memory', and the column can't be found, since the actual column name is memory without the quotes. Is there a way to get rid of the quote characters?
Looking at the mysql.connector execute() documentation, it appears to use %s as the placeholder for bind parameters.
So your execute("SELECT * FROM comp WHERE %s LIKE %s", ("memory", "2048")) call ends up running like the following SQL:
SELECT * FROM comp WHERE 'memory' LIKE '2048'
obviously returning 0 rows.
You need to put the literal column name into the query text before invoking execute():
sql = "SELECT * FROM comp WHERE %s LIKE %s" % (tag, "%s")
# => "SELECT * FROM comp WHERE memory LIKE %s"
mycursor.execute(sql, (desc, ))
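Because the column name ends up in the SQL text unescaped, it's worth validating it before formatting it in. A minimal sketch, assuming a hypothetical whitelist of the columns you actually allow users to search:
ALLOWED_COLUMNS = {"memory", "cpu", "disk"}  # hypothetical set of searchable columns
if tag not in ALLOWED_COLUMNS:
    raise ValueError("unknown column: %s" % tag)
sql = "SELECT * FROM comp WHERE %s LIKE %s" % (tag, "%s")
mycursor.execute(sql, (desc,))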
I am trying to use variables in a Python function to retrieve attributes with MySQL Connector.
It seems to work only when I specify the name of the attribute in the query itself.
def insert(ids, added_attribute):
    insert = ''
    if len(ids) > 0:
        # insert specified attributes wanted
        insert += ' AND (%s = %s' % (added_attribute, ids[0])
        # for loop for more than one specified attribute
        for id_index in range(1, len(ids)):
            insert += ' OR %s = %s' % (added_attribute, ids[id_index])
        insert += ')'  # close parenthesis on query insert
    return insert
def get(name, attributes=0, ids=[]):
    cursor = conn.cursor()
    # insert specific ids
    insert = insert(ids, "id")
    query = 'SELECT %s FROM (TABLE) WHERE (name = %s%s)'
    cursor.execute(query, (attributes, name, insert))
    data = cursor.fetchall()
    cursor.close()
    return data
I keep getting null as a return value
Try this...
query = 'SELECT {} FROM (TABLE) WHERE (name = {}{})'
cursor.execute(query.format(attributes, name, insert))
{} replaces %s here; to fill in the variables you just call .format() with the values you want inserted, in order.
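Be aware that this interpolates every value straight into the SQL text, so a string name would also need quoting and the query is open to injection. A minimal sketch that keeps the identifier/fragment parts in the text but lets the driver bind the name value (assuming MySQL Connector's %s placeholders and a hypothetical table name my_table):
# attributes and the id filter are SQL fragments, so they stay in the query text;
# the name value is passed separately and bound by the driver
query = 'SELECT {} FROM my_table WHERE (name = %s{})'.format(attributes, insert)
cursor.execute(query, (name,))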
I'm quite new to Python (Python 3.4.6) :)
I'm trying to insert some rows into a MySQL database, but with variable columns.
At the beginning, I have a dictionary, list_hosts.
Here is my code:
import mysql.connector
import time
db = mysql.connector.connect(host='localhost', user='xxxxx', passwd='xxxxx', database='xxxxx')
cursor = db.cursor()
now_db = time.strftime('%Y-%m-%d %H:%M:%S')
key_db = ""
value_ex = ""
value_db = ""
for key, value in list_hosts.items():
    key_db += key+", "
    value_ex += "%s, "
    value_db += "\""+value+"\", "
key_db = key_db.strip(" ")
key_db = key_db.strip(",")
value_ex = value_ex.strip(" ")
value_ex = value_ex.strip(",")
value_db = value_db.strip(" ")
value_db = value_db.strip(",")
add_host = ("INSERT INTO nagios_hosts (date_created, date_modified, "+key_db+") VALUES ("+value_ex+")")
data_host = ("\""+now_db+"\", \""+now_db+"\", "+value_db)
cursor.execute(add_host, data_host)
db.commit()
db.close()
Example of list_hosts:
OrderedDict([('xxxx1', 'data1'), ('xxxx2', 'data2'), ('xxxx3', 'data3'), ('xxxx4', 'data4'), ('xxxx5', 'data5'), ('xxxx6', 'data6')])
I've simplified the code of course.
I did it like this as I never have the same number of items in the dictionary.
I'm trying to create something like this:
add_host - INSERT INTO TABLE (date_created, date_modified, xxxx1, xxxx2, xxxx3, xxxx4, xxxx5, xxxx6) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
data_host - now, now, data1, data2, data3, data4, data5, data6
Where there are never the same number of xxxx...
They all exist in the DB, but I don't need to fill each column for each item in the dictionary.
When I execute I get this error :
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'xxxxxxxxxxxxxxxxxxxxx' at line 1
As I'm just getting started with Python, I'm sure there's a lot that could be cleaned up too... don't hesitate :)
Here's a canonical python3 (python2 compatible) solution:
import time
from collections import OrderedDict
list_hosts = OrderedDict([("field1", "value1"), ("field2", "value2"), ("fieldN", "valueN")])
# builds a comma-separated string of db placeholders for the values:
placeholders = ", ".join(["%s"] * (len(list_hosts) + 2))
# builds a comma-separated string of field names
fields = ", ".join(("date_created","date_modified") + tuple(list_hosts.keys()))
# builds a tuple of values including the dates
now_db = time.strftime('%Y-%m-%d %H:%M:%S')
values = (now_db, now_db) + tuple(list_hosts.values())
# build the SQL query:
sql = "INSERT INTO nagio_hosts({}) VALUES({})".format(fields, placeholders)
# and safely execute it
cursor.execute(sql, values)
db.commit()
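As a quick sanity check, printing the generated query and parameters for the sample OrderedDict above gives something like this (the timestamp is illustrative):
print(sql)
# INSERT INTO nagios_hosts(date_created, date_modified, field1, field2, fieldN) VALUES(%s, %s, %s, %s, %s)
print(values)
# ('2018-01-01 12:00:00', '2018-01-01 12:00:00', 'value1', 'value2', 'valueN')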
As #khelwood mentioned in the comments, you should use parameterized queries.
If the number of columns you're inserting varies, you might prefer to generate a tuple and use it in a parameterized query.
cursor.execute() accepts two parameters:
a query as a string;
parameters as a tuple.
The idea is to generate the string and the tuple and pass those to cursor.execute().
You'll need something like this:
list_hosts = {'xxxx1': 'data1', 'xxxx2': 'data2', 'xxxx3': 'data3', 'xxxx4': 'data4'}
keys = [] # creating a list for keys
values = () # creating a tuple for values
for key, value in list_hosts.items():
    keys.append(key)
    values = values + (value,)
keys_str = ', '.join(keys)
ps = ', '.join(['%s'] * len(list_hosts))
query = "INSERT INTO tbl (%s) VALUES (%s)" % (keys_str, ps)
print(query)
# INSERT INTO tbl (xxxx1, xxxx2, xxxx3, xxxx4) VALUES (%s, %s, %s, %s)
cursor.execute(query, values)
Just tried it on a sample data, works fine!
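To also fill date_created and date_modified as in the question, you could prepend them in the same way; a sketch reusing now_db from the question:
keys_str = ', '.join(['date_created', 'date_modified'] + keys)
ps = ', '.join(['%s'] * (len(list_hosts) + 2))
query = "INSERT INTO tbl (%s) VALUES (%s)" % (keys_str, ps)
cursor.execute(query, (now_db, now_db) + values)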
I am trying to copy some columns from a table called temporarytable to another one called scalingData using psycopg2 in python.
scalingData is a pandas dataframe. The dataframe contains data about cities, such as nameOfCities, population, etc.
scalingData = pd.read_csv('myFile.csv') ## 'myFile.csv' is the datasource
Each column of the dataframe has a different kind of data, such as 'int64', 'float64' or 'O'.
Here is a screenshot from Jupyter:
import psycopg2 as ps
## Populate table scalingData
tmp = scalingData.dtypes
con = None
con = ps.connect(dbname = 'mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()
for i in range(0, 5):
    j = header[i]
    stat = """ ALTER TABLE "scalingData" ADD COLUMN "%s" """ % j
    if tmp[i] == 'int64':
        stat = stat + 'bigint'
    if tmp[i] == 'float64':
        stat = stat + 'double precision'
    if tmp[i] == 'O':
        stat = stat + 'text'
    ### Add Column
    cur.execute(stat)
    stat1 = """INSERT INTO "scalingData" ("%s") SELECT "%s" FROM temporarytable""" % (j, j)
    ### Copy Column
    cur.execute(stat1)
cur.close()
con.close()
My problem is that if I look at scalingData only the first column is copied while the others are empty.
Here is a screenshot of the table from pgAdmin after the query:
Also, if I copy the second column first instead, it works, but it then fails with the others as well.
This happens because you add one field to your new table, then insert rows with only that field populated, and you do it 5 times. So you actually end up with 5 copies of the original rows, each copy having only one field filled in.
You need to first set up the full structure of your scalingData table, then insert all the records with all fields in one statement.
Here is the code (I'm not a Python developer):
import psycopg2 as ps

## Populate table scalingData
tmp = scalingData.dtypes

con = None
con = ps.connect(dbname='mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()

for i in range(0, 5):
    j = header[i]
    stat = """ ALTER TABLE "scalingData" ADD COLUMN "%s" """ % j
    if tmp[i] == 'int64':
        stat = stat + 'bigint'
    if tmp[i] == 'float64':
        stat = stat + 'double precision'
    if tmp[i] == 'O':
        stat = stat + 'text'
    ### Add Column
    cur.execute(stat)

### Copy all columns in a single INSERT, after the structure is in place
fieldsStr = '"' + '", "'.join(header[:5]) + '"'  ### will return "header1", "header2", ... , "header5"
stat1 = """INSERT INTO "scalingData" (%s) SELECT %s FROM temporarytable""" % (fieldsStr, fieldsStr)
### Copy Table
cur.execute(stat1)

cur.close()
con.close()
I'm not familiar with Python, but just a guess as to where the issue might be coming from:
"""INSERT INTO "scalingData" ("%s") SELECT "%s" FROM temporarytable"""
... will transform the "%s" bit into "foo, bar, baz" rather than "foo", "bar", "baz".
Put another way, you should remove the unneeded double quotes in your statement and escape the individual column names instead.
Double quotes are used in PG to quote identifiers. You can literally have a table or column called "foo, bar, baz" and PG will work just fine when you do, provided it's always in between double quotes when you use it in a statement.
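If you're on psycopg2 2.7 or newer, the psycopg2.sql module can do that quoting for you; a minimal sketch assuming the same header list and table names as above:
from psycopg2 import sql

# quote each column name individually, then join them with commas
columns = sql.SQL(", ").join([sql.Identifier(h) for h in header[:5]])
stat1 = sql.SQL('INSERT INTO {} ({}) SELECT {} FROM {}').format(
    sql.Identifier('scalingData'), columns, columns, sql.Identifier('temporarytable'))
cur.execute(stat1)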