Question:
I have a Python script that scrapes a website; it gets 2 variables and stores them in 2 lists. I then use executemany to update a MySQL database, using one variable to match a pre-existing row and inserting the other variable into that row.
Code:
Python Script
import mysql.connector
from bs4 import BeautifulSoup as soup
from selenium import webdriver
import time, re
mydb = mysql.connector.connect(
    host="host",
    user="user",
    passwd="passwd",
    database="database"
)
mycursor = mydb.cursor()
d = webdriver.Chrome('D:/Uskompuf/Downloads/chromedriver')
d.get('https://au.pcpartpicker.com/products/cpu/overall-list/#page=1')
def cpus(_source):
    result = soup(_source, 'html.parser').find('ul', {'id':'category_content'}).find_all('li')
    _titles = list(filter(None, [(lambda x:'' if x is None else x.text)(i.find('div', {'class':'title'})) for i in result]))
    data = [list(filter(None, [re.findall(r'(?<=\().*?(?=\))', c.text) for c in i.find_all('div')])) for i in result]
    return _titles, [a for *_, [a] in filter(None, data)]
_titles, _cpus = cpus(d.page_source)
sql = "UPDATE cpu set family = %s where name = %s"
mycursor.executemany(sql, list(zip(_cpus, _titles)))
print(sql, list(zip(_titles, _cpus)))
_last_page = soup(d.page_source, 'html.parser').find_all('a', {'href':re.compile(r'#page\=\d+')})[-1].text
for i in range(2, int(_last_page)+1):
    d.get(f'https://au.pcpartpicker.com/products/cpu/overall-list/#page={i}')
    time.sleep(3)
    _titles, _cpus = cpus(d.page_source)
    sql = "UPDATE cpu set family = %s where name = %s"
    mycursor.executemany(sql, list(zip(_cpus, _titles)))
mydb.commit()
MySQL UPDATE code
sql = "UPDATE cpu set family = %s where name = %s"
mycursor.executemany(sql, list(zip(_cpus, _titles)))
MySQL UPDATE code print
print(sql, list(zip(_cpus, _titles)))
MySQL UPDATE code print output
UPDATE cpu set family = %s where name = %s [('Pinnacle Ridge', 'AMD Ryzen 5 2600'), ('Coffee Lake-S', 'Intel Core i7-8700K'),...
First 2 rows of table
Expected result
The first variable is the name, which is the value that needs to be matched; the second variable is the family to be written into that row. The names match perfectly and there are no errors when running the program, yet all family values remain NULL.
I'm not sure of the best way to go about solving this. I thought I could make a fiddle, but I'm not sure how to represent the list passed to executemany.
Other
If you need any more information please let me know.
Thanks
Just had to add:
mydb.commit()
after executemany.
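In context, the fix is just a commit after the bulk update; a minimal sketch using the same cursor and statement as above:
sql = "UPDATE cpu set family = %s where name = %s"
mycursor.executemany(sql, list(zip(_cpus, _titles)))
mydb.commit()  # mysql.connector does not autocommit by default, so without this the UPDATEs are never persisted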
Related
I want to search a MySQL table for rows where the specified column has a particular value. For example, given the input string memory=2048 it should search for the rows that have "2048" as the value of the memory column and print them.
This is the code I have tried, but it prints out nothing.
input = input()
tag = input.split("=")
desc = tag[1]
tag = tag[0]
mycursor = mydb.cursor()
sql = "(SELECT * FROM comp WHERE %s LIKE %s)"
val = (tag, desc)
mycursor.execute(sql, val)
res = mycursor.fetchall()
for x in res:
    print(x)
Secondly, I tried this code to see where the problem is:
input = input()
tag = input.split("=")
desc = tag[1]
tag = tag[0]
mycursor = mydb.cursor()
sql = "(SELECT * FROM comp WHERE memory LIKE '2048')"
mycursor.execute(sql)
res = mycursor.fetchall()
for x in res:
    print(x)
It gives the desired output. So my problem is that when I try to pass the column name via %s it arrives as 'memory', and the query can't find it, since the actual column name is memory without the quotes. Is there a way to get rid of the '' characters?
confirmation of inputs
Looking at the mysql.connector execute() documentation, it appears to use %s as the placeholder for bind parameters.
So your execute("SELECT * FROM comp WHERE %s LIKE %s", ("memory", "2048")) call ends up running like the following SQL:
SELECT * FROM comp WHERE 'memory' LIKE '2048'
obviously returning 0 rows.
You need to put the literal column name into the query text before invoking execute():
sql = "SELECT * FROM comp WHERE %s LIKE %s" % (tag, "%s")
# => "SELECT * FROM comp WHERE memory LIKE %s"
mycursor.execute(sql, (desc, ))
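Because the column name is spliced into the SQL text rather than bound, it is worth validating it before building the query; a small sketch, where the set of allowed column names is hypothetical (use whatever columns comp really has):
allowed_columns = {"memory", "cpu", "disk"}  # hypothetical whitelist of real column names
if tag not in allowed_columns:
    raise ValueError("unknown column: " + tag)
sql = "SELECT * FROM comp WHERE {} LIKE %s".format(tag)
mycursor.execute(sql, (desc,))
for x in mycursor.fetchall():
    print(x)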
I have a folder called 'testfolder' that includes two files -- 'Sigurdlogfile' and '2004ADlogfile'. Each file has a list of strings called entries. I need to run my code on both of them and am using glob to do this. My code creates a dictionary for each file and stores data extracted using regex, where the dictionary keys are the terms in commonterms below. It then inserts each dictionary into a MySQL table. It does all of this successfully, but my second SQL statement is not updating the table the way it should (per file).
import glob
import re
files = glob.glob('/home/user/testfolder/*logfile*')
commonterms = (["freq", "\s?(\d+e?\d*)\s?"],
["tx", "#txpattern"],
["rx", "#rxpattern"], ...)
terms = [commonterms[i][0] for i in range(len(commonterms))]
patterns = [commonterms[i][1] for i in range(len(commonterms))]
def getTerms(entry):
    for i in range(len(terms)):
        term = re.search(patterns[i], entry)
        if term:
            term = term.groups()[0] if term.groups()[0] is not None else term.groups()[1]
        else:
            term = 'NULL'
        d[terms[i]] += [term]
    return d
for filename in files:
    #code to create 'entries'
    objkey = re.match(r'/home/user/testfolder/(.+?)logfile', filename).group(1)
    d = {t: [] for t in terms}
    for entry in entries:
        d = getTerms(entry)

    import MySQLdb
    db = MySQLdb.connect(host='', user='', passwd='', db='')
    cursor = db.cursor()
    cols = d.keys()
    vals = d.values()

    for i in range(len(entries)):
        lst = [item[i] for item in vals]
        csv = "'{}'".format("','".join(lst))
        sql1 = "INSERT INTO table (%s) VALUES (%s);" % (','.join(cols), csv.replace("'NULL'", "NULL"))
        cursor.execute(sql1)

        #now in my 2nd sql statement I need to update the table with data from an old table, which is where I have the problem...
        sql2 = ("UPDATE table, oldtable SET table.key1 = oldtable.key1, "
                "table.key2 = oldtable.key2 WHERE oldtable.obj = %s;" % repr(objkey))
        cursor.execute(sql2)

    db.commit()
    db.close()
The problem is that the second SQL statement ends up writing the data from only one of the objkeys into every row of the table, but I need it to write different data depending on which file the code is currently running on. I can't figure out why this is, since I've defined objkey inside my for filename in files loop. How can I fix this?
Instead of doing separate INSERT and UPDATE, do them together to incorporate the fields from the old table.
for i in range(len(entries)):
    lst = [item[i] for item in vals]
    csv = "'{}'".format("','".join(lst))
    sql1 = """INSERT INTO table (key1, key2, %s)
              SELECT o.key1, o.key2, a.*
              FROM (SELECT %s) AS a
              LEFT JOIN oldtable AS o ON o.obj = %s""" % (','.join(cols), csv.replace("'NULL'", "NULL"), repr(objkey))
    cursor.execute(sql1)
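To make the string building concrete: with hypothetical columns freq and tx, row values '144' and NULL, and objkey 'Sigurd' (all made-up examples), sql1 expands to roughly:
INSERT INTO table (key1, key2, freq,tx)
SELECT o.key1, o.key2, a.*
FROM (SELECT '144',NULL) AS a
LEFT JOIN oldtable AS o ON o.obj = 'Sigurd'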
Hi, I have the following block of code that is meant to take the variable search_term and pass it into the WHERE clause of a MySQL SELECT statement.
import MySQLdb
search_term = input('Enter your search term: ')
print (search_term)
conn = MySQLdb.connect(my connection info)
c = conn.cursor()
q = "SELECT * FROM courses WHERE course_area = %(value)s "
params = {'value': search_term}
c.execute(q, params)
rows = c.fetchall()
for eachRow in rows:
    print (eachRow)
I know that I need to use %s somewhere, but I'm not sure of the exact syntax. I did some searching online but have only found examples for INSERT statements, and I know those have slightly different syntax. Thanks.
This should work:
q = "SELECT * FROM courses WHERE course_area = %(value)s "
params = {'value':'some_value_here'}
c.execute(q, params)
.....
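The same query also works with a positional placeholder instead of the named %(value)s style; a minimal sketch against the same courses table:
q = "SELECT * FROM courses WHERE course_area = %s"
c.execute(q, (search_term,))  # the value goes in as a 1-tuple, not a bare string
for eachRow in c.fetchall():
    print (eachRow)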
I often need to process several hundred million rows of a MySQL table on a line by line basis using Python. I want a script that is robust and does not need to be monitored.
Below I have pasted a script that classifies the language of the message field in each row. It utilizes the sqlalchemy and MySQLdb.cursors.SSCursor modules. Unfortunately this script consistently throws a 'Lost connection to MySQL server during query' error after 4840 rows when I run it remotely and 42000 rows when I run it locally.
Also, I have checked that max_allowed_packet = 32M in my MySQL server's /etc/mysql/my.cnf file, as per the answers to the Stack Overflow question "Lost connection to MySQL server during query".
Any advice on fixing this error, or on another approach for processing very large MySQL tables from Python in a robust way, would be much appreciated!
import sqlalchemy
import MySQLdb.cursors
import langid

schema = "twitterstuff"
table = "messages_en" #900M row table
engine_url = "mysql://myserver/{}?charset=utf8mb4&read_default_file=~/.my.cnf".format(schema)
db_eng = sqlalchemy.create_engine(engine_url, connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
langid.set_languages(['fr', 'de'])

print "Executing input query..."
data_iter = db_eng.execute("SELECT message_id, message FROM {} WHERE langid_lang IS NULL LIMIT 10000".format(table))

def process(inp_iter):
    for item in inp_iter:
        item = dict(item)
        (item['langid_lang'], item['langid_conf']) = langid.classify(item['message'])
        yield item

def update_table(update_iter):
    count = 0
    for item in update_iter:
        count += 1
        if count % 10 == 0:
            print "{} rows processed".format(count)
        lang = item['langid_lang']
        conf = item['langid_conf']
        message_id = item['message_id']
        db_eng.execute("UPDATE {} SET langid_lang = '{}', langid_conf = {} WHERE message_id = {}".format(table, lang, conf, message_id))

data_iter_upd = process(data_iter)

print "Begin processing..."
update_table(data_iter_upd)
According to MySQLdb developer Andy Dustman,
[When using SSCursor,] no new queries can be issued on the connection until
the entire result set has been fetched.
That post says that if you issue another query you will get a "commands out of sequence" error, which is not the error you are seeing. So I am not sure that the following will necessarily fix your problem. Nevertheless, it might be worth trying to remove SSCursor from your code and use the simpler default Cursor just to test if that is the source of the problem.
You could, for example, use LIMIT chunksize OFFSET n in your SELECT statement
to loop through the data set in chunks:
import sqlalchemy
import MySQLdb.cursors
import langid
import itertools as IT

chunksize = 1000

def process(inp_iter):
    for item in inp_iter:
        item = dict(item)
        (item['langid_lang'], item['langid_conf']) = langid.classify(item['message'])
        yield item

def update_table(update_iter, engine):
    for count, item in enumerate(update_iter):
        if count % 10 == 0:
            print "{} rows processed".format(count)
        lang = item['langid_lang']
        conf = item['langid_conf']
        message_id = item['message_id']
        engine.execute(
            "UPDATE {} SET langid_lang = '{}', langid_conf = {} WHERE message_id = {}"
            .format(table, lang, conf, message_id))

schema = "twitterstuff"
table = "messages_en" #900M row table
engine_url = ("mysql://myserver/{}?charset=utf8mb4&read_default_file=~/.my.cnf"
              .format(schema))
db_eng = sqlalchemy.create_engine(engine_url)
langid.set_languages(['fr', 'de'])

for offset in IT.count(start=0, step=chunksize):
    print "Executing input query..."
    result = db_eng.execute(
        "SELECT message_id, message FROM {} WHERE langid_lang IS NULL LIMIT {} OFFSET {}"
        .format(table, chunksize, offset))
    result = list(result)
    if not result: break
    data_iter_upd = process(result)
    print "Begin processing..."
    update_table(data_iter_upd, db_eng)
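One detail worth noting (an aside, not part of the original answer): because each UPDATE fills in langid_lang, the set of rows matching langid_lang IS NULL shrinks while the loop runs, so pairing that filter with a growing OFFSET can skip rows. A minimal variant that simply keeps taking the first chunk of still-NULL rows until none are left:
while True:
    result = list(db_eng.execute(
        "SELECT message_id, message FROM {} WHERE langid_lang IS NULL LIMIT {}"
        .format(table, chunksize)))
    if not result:
        break
    update_table(process(result), db_eng)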
def StatusUpdate(self, table):
    inventoryCurs.execute('SELECT * from Table')
    for i in inventoryCurs:
        html = urlopen(i[5]).read()
        Soup = BeautifulSoup(html)
        if table.StockStatus(Soup) == 'Out of Stock':
            inventoryCurs.execute('''UPDATE table SET status = 'Out of Stock' WHERE id = %s)''', i[0])
That execute() call raises:
OperationalError: near "%": syntax error
Without seeing more of the code, it's difficult to fix the problem completely, but looking at your code, I think the problem might be the %s in this line:
inventoryCurs.execute('''UPDATE table SET status = 'Out of Stock' WHERE id = %s)''', i[0])
According to the documentation for the SQLite module in both Python 2 and Python 3, the sqlite3 module requires a ? as a placeholder, not %s or some other format string.
According to the Python 2 documentation, a %s placeholder could be used like this:
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
# Never do this -- insecure!
symbol = 'IBM'
c.execute("select * from stocks where symbol = '%s'" % symbol)
but that's a simple format string, not actually the database's placeholder. Also, as the comment shows, you should never build queries that way because it makes them vulnerable to SQL injection. Rather, you should build them like this, using a ? instead:
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
# Do this instead
symbol = 'IBM'
t = (symbol,)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
The documentation has more details, but I believe that is the solution to the error you posted.
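Applied back to the UPDATE from the question (assuming inventoryCurs is a sqlite3 cursor and table stands in for the real table name), the call would look roughly like this, with the stray closing parenthesis removed and the parameter passed as a 1-tuple:
inventoryCurs.execute("UPDATE table SET status = 'Out of Stock' WHERE id = ?", (i[0],))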