Insert Python list into a single column in a MySQL database - python

Hi, I am trying to insert a Python list into a single column, but it keeps giving a syntax error. I'm new to this; I appreciate any help. Thanks.
from time import time
import MySQLdb
import urllib
import re
from bs4 import BeautifulSoup

db = MySQLdb.connect("localhost", "testuser", "test123", "testdb")
cursor = db.cursor()

x = 1
while x < 2:
    url = "http://search.insing.com/ts/food-drink/bars-pubs/bars-pubs?page=" + str(x)
    htmlfile = urllib.urlopen(url)
    soup = BeautifulSoup(htmlfile)
    reshtml = [h3.a for h3 in soup.find("div", "results").find_all("h3")]

    reslist = []
    for item in reshtml:
        res = item.text.encode('ascii', 'ignore')
        reslist.append(' '.join(res.split()))

    sql = "INSERT INTO insing(name) \
           VALUES %r" \
           % reslist

    try:
        cursor.execute(sql)
        db.commit()
    except:
        db.rollback()

    db.close()
    x += 1
The generated SQL statement is:
'INSERT INTO insing(name) VALUES [\'AdstraGold Microbrewery & Bistro Bar\', \'Alkaff Mansion Ristorante\', \'Parco Caffe\', \'The Fat Cat Bistro\', \'Gravity Bar\', \'The Wine Company (Evans Road)\', \'Serenity Spanish Bar & Restaurant (VivoCity)\', \'The New Harbour Cafe & Bar\', \'Indian Times\', \'Sunset Bay Beach Bar\', \'Friends # Jelita\', \'Talk Cock Sing Song # Thomson\', \'En Japanese Dining Bar (UE Square)\', \'Magma German Wine Bistro\', "Tam Kah Shark\'s Fin", \'Senso Ristorante & Bar\', \'Hard Rock Cafe (HPL House)\', \'St. James Power Station\', \'The St. James\', \'Brotzeit German Bier Bar & Restaurant (Vivocity)\']'

What about
insert into table(name) values ('name1'), ('name2'), ..., ('name36');
as in Inserting multiple rows in a single SQL query? That might help too; a parameterized sketch of the idea follows.
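A minimal sketch of that approach, assuming the db and cursor from the question. executemany() binds each name as its own row, so the driver handles the quoting that the %r interpolation got wrong:

sql = "INSERT INTO insing (name) VALUES (%s)"
try:
    # one single-element tuple per row; MySQLdb quotes each value itself
    cursor.executemany(sql, [(name,) for name in reslist])
    db.commit()
except MySQLdb.Error:
    db.rollback()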
EDIT
I automated the process as well:
dataSQL = "INSERT INTO PropertyRow (SWID, Address, APN, PropertyType, PermissableUse, UseDetail, ReviewResult, Analysis, DocReviewed, AqDate, ValuePurchase, ValueCurrent, ValueDate, ValueBasis, ValueSale, SaleDate, PropPurpose, LotSize, Zoning, ParcelValue, EstRevenue, ReqRevenue, EnvHistory, TransitPotential, PlanObjective, PrevHistory, LastUpdDate, LastUpdUser)"
fields = "VALUES (" + "'" + str(rawID) + "', "
if cell.ctype != 0:
    while column < 27:
        # column 16 will always be blank
        if column == 16:
            column += 1
        # column 26 is the end
        if column == 26:
            fields += "'" + str(sh.cell_value(rowx=currentRow, colx=column)) + "'"
        else:
            # append to the value string
            fields += "'" + str(sh.cell_value(rowx=currentRow, colx=column)) + "', "
        # print fields
        column += 1
fields += ');'
writeFyle.write(dataSQL)
writeFyle.write(fields)
In this implementation I am writing a separate INSERT statement for each row I wanted to insert. That wasn't strictly necessary, but it was much easier. A parameterized variant is sketched below.
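If the statements were going straight to a live connection instead of a .sql file, placeholders would remove the hand-quoting; a rough sketch, assuming the xlrd-style sheet access above and a hypothetical cursor (the column range mirrors the loop above and must match the INSERT's column list):

# Rough sketch: collect the row's cells (skipping always-blank column 16),
# then let the driver quote them via placeholders.
values = [rawID] + [sh.cell_value(rowx=currentRow, colx=col)
                    for col in range(1, 27) if col != 16]
placeholders = ", ".join(["%s"] * len(values))
cursor.execute(dataSQL + " VALUES (" + placeholders + ")", values)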

Related

Passing list of strings as parameter to SQL query in Python

I have a Python program that generates a report for returns. When it runs, a GUI pops up that lets the user select categories from a list. I am trying to format my query so that the generated report only includes the selected categories. The formatting for the start and end dates works, so I'm not sure what I'm doing wrong with the category formatting.
The code for my GUI
options = ['Bags Children','Bags Mens','Bandanas & Handkerchiefs','Belts Children','Belts Mens','Belts Womens','Cold Weather Childrens','Cold Weather Mens','Cold Weather Womens','Face Masks','Handbags Womens','Headwear Childrens',\
'Headwear Mens','Headwear Womens','Jewelry Mens','Scarves & Wraps Womens','Sleepwear Childrens','Sleepwear Mens','Sleepwear Womens','Slippers Childrens','Slippers Mens','Slippers Womens','Socks & Hosiery Childrens',\
'Socks & Hosiery Mens','Socks & Hosiery Womens','Sunglasses & Cases','Suspenders Childrens','Suspenders Mens','Suspenders Womens','Travel Accessories','Umbrellas & Rain Gear','Umbrellas & Rain Gear Childrens','Umbrellas & Rain Gear Mens',\
'Umbrellas & Rain Gear Womens','Undergarments Childrens','Undergarments Mens','Undergarments Womens','Waist Packs & Belt Bags','Wallets & Small Accessories Childrens','Wallets & Small Accessories Mens','Wallets & Small Accessories Womens',\
'Womens Wallets & Handbag Accessories']
text = "Select Category(ies): "
title = 'Returns Summary'
cat_output = multchoicebox(text, title, options)
title = 'Message Box'
message = "Selected Categories: " + str(cat_output)
msg = msgbox(message, title)
print(cat_output)
The output for print(cat_output) is in the format ['Face Masks', 'Belts Mens', 'Belts Womens'] displaying the selected categories.
(Screenshot of the GUI display omitted.)
The code for my SQL query
SQL1 = "SELECT i.Category, i.ItemName, r.OrderNumber, r.SKUReceived, r.UnitPrice, r.Quantity, o.CartID, o.MarketName, o.Email \
FROM Returns AS r INNER JOIN Orders AS o ON r.OrderNumber = o.OrderNumber INNER JOIN Inventory AS i ON i.LocalSKU = r.SKU \
WHERE (((r.Date) between '%s' and '%s') AND ((r.UnitPrice)>0) AND o.CartID != 12 AND r.Type = 'R' AND i.Category IN ({})) \
ORDER BY r.OrderNumber;".format(cat_output) % (start, end)
The Error I get
Traceback
<module> Z:\Python\Returns Summary 3.0.py 381
GetReturns Z:\Python\Returns Summary 3.0.py 279
ProgrammingError: ('42S22', "[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name ''Bags Children', 'Belts Womens', 'Headwear Mens', 'Sleepwear Childrens''. (207) (SQLExecDirectW)")
SQL syntax expects an IN clause like ... IN ('m', 'l'). From what I see in the question, you have it as ... IN (['m', 'l']), because the list's repr, brackets and all, is interpolated into the query.
Try .format(",".join(repr(x) for x in cat_output)) instead.
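A safer variant is to let the driver bind every value; a sketch assuming a pyodbc-style cursor, as the ODBC traceback suggests (column list trimmed, same joins as the question):

# One '?' placeholder per selected category, plus the two dates,
# so the driver quotes everything (pyodbc '?' paramstyle assumed).
in_clause = ", ".join("?" * len(cat_output))
sql = ("SELECT i.Category, i.ItemName, r.OrderNumber "
       "FROM Returns AS r "
       "INNER JOIN Orders AS o ON r.OrderNumber = o.OrderNumber "
       "INNER JOIN Inventory AS i ON i.LocalSKU = r.SKU "
       "WHERE r.Date BETWEEN ? AND ? AND i.Category IN ({}) "
       "ORDER BY r.OrderNumber;").format(in_clause)
cursor.execute(sql, [start, end] + cat_output)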

How to Sum Rows in SQLite in Python

I want to add the items together, but they come out separately. There are 3 item weights from weapons that are equipped. Here is the code:
conn = sqlite3.connect('character.db')
c = conn.cursor()
c.execute("SELECT * FROM equipment WHERE character_id = :character_id AND equip = :equip",
          {
              'character_id': self.controller.currentid.get(),
              'equip': 1
          })
for row in c.fetchall():
    wconn = sqlite3.connect('equipment.db')
    w = wconn.cursor()
    w.execute("SELECT * FROM weapons WHERE weapon_id = :weapon_id",
              {
                  'weapon_id': str(row[1])
              })
    for row1 in w.fetchall():
        test = str(row1[4])
        print(test)
    wconn.commit()
    wconn.close()
conn.commit()
conn.close()
My output is:
1.0
0.5
10.0
I want this to read 11.5 instead of the separate numbers.
Use the SUM function in your query:
SELECT SUM(column_to_sum) FROM weapons WHERE weapon_id = :weapon_id
You'll need to replace column_to_sum with the actual column name. Note that the loop above runs one query per equipped weapon, each returning a single weight, so to get 11.5 you need one query covering all of them, as sketched below.
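A sketch of that single query, replacing the nested loops; the weight column name is an assumption (it is whatever row1[4] holds in the question's schema):

# Collect the equipped weapon ids first, then sum their weights in one query.
weapon_ids = [str(row[1]) for row in c.fetchall()]
placeholders = ", ".join("?" * len(weapon_ids))
wconn = sqlite3.connect('equipment.db')
w = wconn.cursor()
# 'weight' is assumed -- substitute the real column behind row1[4]
w.execute("SELECT SUM(weight) FROM weapons WHERE weapon_id IN ({})".format(placeholders),
          weapon_ids)
print(w.fetchone()[0])  # 11.5 for the three weights above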

Speeding up insertion of point data from netcdf

I've got this NetCDF of weather data (one of thousands that require PostgreSQL ingestion). I'm currently able to insert each band into a PostGIS-enabled table at a rate of about 20-23 seconds per band (for monthly data; there is also daily data that I have yet to test).
I've heard of different ways of speeding this up using COPY FROM, removing the gid, using SSDs, etc., but I'm new to Python and have no idea how to get the NetCDF data into something I could use with COPY FROM, or what the best route might be.
If anyone has any other ideas on how to speed this up, please share!
Here is the ingestion script
import netCDF4, psycopg2, time

# Establish connection
db1 = psycopg2.connect("host=localhost dbname=postgis_test user=********** password=********")
cur = db1.cursor()

# Create Table in postgis
print(str(time.ctime()) + " CREATING TABLE")
try:
    cur.execute("DROP TABLE IF EXISTS table_name;")
    db1.commit()
    cur.execute(
        "CREATE TABLE table_name (gid serial PRIMARY KEY not null, thedate DATE, thepoint geometry, lon decimal, lat decimal, thevalue decimal);")
    db1.commit()
    print("TABLE CREATED")
except:
    print(psycopg2.DatabaseError)
    print("TABLE CREATION FAILED")

rawvalue_nc_file = 'netcdf_file.nc'
nc = netCDF4.Dataset(rawvalue_nc_file, mode='r')
nc.variables.keys()

lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:], time_var.units)
newtime = [fdate.strftime('%Y-%m-%d') for fdate in dtime]
rawvalue = nc.variables['tx_max'][:]

lathash = {}
lonhash = {}
entry1 = 0
entry2 = 0

lattemp = nc.variables['lat'][:].tolist()
for entry1 in range(lat.size):
    lathash[entry1] = lattemp[entry1]

lontemp = nc.variables['lon'][:].tolist()
for entry2 in range(lon.size):
    lonhash[entry2] = lontemp[entry2]

for timestep in range(dtime.size):
    print(str(time.ctime()) + " " + str(timestep + 1) + "/180")
    for _lon in range(lon.size):
        for _lat in range(lat.size):
            latitude = round(lathash[_lat], 6)
            longitude = round(lonhash[_lon], 6)
            thedate = newtime[timestep]
            thevalue = round(float(rawvalue.data[timestep, _lat, _lon] - 273.15), 3)
            if thevalue > -100:
                cur.execute("INSERT INTO table_name (thedate, thepoint, thevalue) VALUES (%s, ST_MakePoint(%s,%s,0), %s)",
                            (thedate, longitude, latitude, thevalue))
db1.commit()
cur.close()
db1.close()
print(" Done!")
If you're certain most of the time is spent in PostgreSQL, and not in any other code of your own, you may want to look at the fast execution helpers, namely psycopg2.extras.execute_values() in your case.
Also, you may want to make sure you're in a transaction, so the database doesn't fall back to an autocommit mode. ("If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it.")
Something like this could do the trick -- not tested though.
import psycopg2.extras  # execute_values lives in the extras module

for timestep in range(dtime.size):
    print(str(time.ctime()) + " " + str(timestep + 1) + "/180")
    values = []
    cur.execute("BEGIN")
    for _lon in range(lon.size):
        for _lat in range(lat.size):
            latitude = round(lathash[_lat], 6)
            longitude = round(lonhash[_lon], 6)
            thedate = newtime[timestep]
            thevalue = round(
                float(rawvalue.data[timestep, _lat, _lon] - 273.15), 3
            )
            if thevalue > -100:
                values.append((thedate, longitude, latitude, thevalue))
    psycopg2.extras.execute_values(
        cur,
        "INSERT INTO table_name (thedate, thepoint, thevalue) VALUES %s",
        values,
        template="(%s, ST_MakePoint(%s,%s,0), %s)"
    )
    db1.commit()
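Since the question also mentions COPY FROM: one way to feed it from Python without an intermediate file is an in-memory buffer. A sketch, assuming the same values list as above; COPY cannot call ST_MakePoint, so thepoint is filled in by a follow-up UPDATE (the table already has lon/lat columns):

import io

buf = io.StringIO()
for thedate, longitude, latitude, thevalue in values:
    # tab-separated rows, one per record, matching copy_from's default separator
    buf.write("%s\t%s\t%s\t%s\n" % (thedate, longitude, latitude, thevalue))
buf.seek(0)
cur.copy_from(buf, 'table_name', columns=('thedate', 'lon', 'lat', 'thevalue'))
cur.execute("UPDATE table_name SET thepoint = ST_MakePoint(lon, lat, 0) WHERE thepoint IS NULL")
db1.commit()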

SQL insert into.. where (python)

I have the following code:
def create_table():
    c.execute('CREATE TABLE IF NOT EXISTS TEST(SITE TEXT, SPORT TEXT, TOURNAMENT TEXT, TEAM_1 TEXT, TEAM_2 TEXT, DOUBLE_CHANCE_1X TEXT, DOUBLE_CHANCE_X2 TEXT, DOUBLE_CHANCE_12 TEXT, DRAW_1 TEXT, DRAW_2 TEXT, DATE_ODDS TEXT, TIME_ODDS TEXT)')

create_table()

def data_entry():
    c.execute("INSERT INTO TEST(SITE, SPORT, TOURNAMENT, TEAM_1, TEAM_2, DOUBLE_CHANCE_1X, DOUBLE_CHANCE_X2, DOUBLE_CHANCE_12, DATE_ODDS, TIME_ODDS) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
              (Site, sport.strip(), tournament.strip(), team_1.strip(), team_2.strip(), x_odd.strip(), y_odd.strip(), z_odd.strip(), Date_odds, Time_odds))
    conn.commit()

def double_chance():
    c.execute("UPDATE TEST SET DOUBLE_CHANCE_1X = x_odd, DOUBLE_CHANCE_X2 = y_odd, DOUBLE_CHANCE_12 = z_odd WHERE TOURNAMENT = tournament and TEAM_1 = team_1 and TEAM_2 = team_2 and DATE_ODDS = Date_odds and TIME_ODDS = Time_odds")
    conn.commit()

driver.get(link)
Date_odds = time.strftime('%Y-%m-%d')
Time_odds = time.strftime('%H:%M')
sport = (driver.find_element_by_xpath(".//*[@id='breadcrumb']/li[2]/a")).text  # example: Football
tournament = (driver.find_element_by_xpath(".//*[@id='breadcrumb']/li[4]/a")).text  # example: Premier League
try:
    div = (driver.find_element_by_xpath(".//*[@id='breadcrumb']/li[5]/a")).text  # to find any division if it exists
except NoSuchElementException:
    div = ""
market = driver.find_element_by_xpath(".//*[contains(@id,'ip_market_name_')]")
market_name = market.text
market_num = market.get_attribute('id')[-9:]
print market_num
team_1 = (driver.find_element_by_xpath(".//*[@id='ip_marketBody" + market_num + "']/tr/td[1]//*[contains(@id,'name')]")).text
team_2 = (driver.find_element_by_xpath(".//*[@id='ip_marketBody" + market_num + "']/tr/td[3]//*[contains(@id,'name')]")).text
print sport, tournament, market_name, team_1, team_2
data_entry()  # first SQL call
for ip in driver.find_elements_by_xpath(".//*[contains(@id,'ip_market3')]"):
    num = ip.get_attribute('id')[-9:]
    type = (driver.find_element_by_xpath(".//*[contains(@id,'ip_market_name_" + num + "')]")).text
    if type == 'Double Chance':
        print type
        print num
        x_odd = (driver.find_element_by_xpath(".//*[@id='ip_market" + num + "']/table/tbody/tr/td[1]//*[contains(@id,'price')]")).text
        y_odd = (driver.find_element_by_xpath(".//*[@id='ip_market" + num + "']/table/tbody/tr/td[2]//*[contains(@id,'price')]")).text
        z_odd = (driver.find_element_by_xpath(".//*[@id='ip_market" + num + "']/table/tbody/tr/td[3]//*[contains(@id,'price')]")).text
        print x_odd, y_odd, z_odd
        double_chance()  # second SQL call
c.close()
conn.close()
Update:
Based on the answer below I updated the code, but I can't make it work.
When I run it, I get the following error:
sqlite3.OperationalError: no such column: x_odd
What should I do?
Update 2:
I found the solution:
I created a unique ID so I can select exactly the row I want when I run the second SQL query. That way it doesn't modify any other rows:
def double_chance():
    c.execute("UPDATE TEST SET DOUBLE_CHANCE_1X = (?), DOUBLE_CHANCE_X2 = (?), DOUBLE_CHANCE_12 = (?) WHERE ID = (?)", (x_odd, y_odd, z_odd, ID_unique))
    conn.commit()
Now it works perfectly.
Use the UPDATE statement to update columns in an existing row:
UPDATE TEST SET DRAW_1=value1, DRAW_2=value2 WHERE column3=value3;
If data_entry() is always called first, then change the statement in double_chance() to an UPDATE. If not, you will need to check whether the row exists and INSERT or UPDATE accordingly; a sketch of that check follows.
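A sketch of the existence check, using the question's column names and parameter binding (pasting Python variable names into the SQL string is what caused the no such column: x_odd error):

# Update the row if it exists, otherwise insert it.
c.execute("SELECT 1 FROM TEST WHERE TOURNAMENT = ? AND TEAM_1 = ? AND TEAM_2 = ? AND DATE_ODDS = ? AND TIME_ODDS = ?",
          (tournament, team_1, team_2, Date_odds, Time_odds))
if c.fetchone():
    c.execute("UPDATE TEST SET DOUBLE_CHANCE_1X = ?, DOUBLE_CHANCE_X2 = ?, DOUBLE_CHANCE_12 = ? "
              "WHERE TOURNAMENT = ? AND TEAM_1 = ? AND TEAM_2 = ? AND DATE_ODDS = ? AND TIME_ODDS = ?",
              (x_odd, y_odd, z_odd, tournament, team_1, team_2, Date_odds, Time_odds))
else:
    data_entry()
conn.commit()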

How to use python to ETL between databases?

Using psycopg2, I'm able to select data from a table in one PostgreSQL database connection and INSERT it into a table in a second PostgreSQL database connection.
However, I'm only able to do it by setting the exact feature I want to extract, and writing out separate variables for each column I'm trying to insert.
Does anyone know of a good practice for either:
moving an entire table between databases, or
iterating through features while not having to declare variables for every column you want to move
or...?
Here's the script I'm currently using where you can see the selection of a specific feature, and the creation of variables (it works, but this is not a practical method):
import psycopg2
connDev = psycopg2.connect("host=host1 dbname=dbname1 user=postgres password=*** ")
connQa = psycopg2.connect("host=host2 dbname=dbname2 user=postgres password=*** ")
curDev = connDev.cursor()
curQa = connQa.cursor()
sql = ('INSERT INTO "tempHoods" (nbhd_name, geom) values (%s, %s);')
curDev.execute('select cast(geom as varchar) from "CCD_Neighborhoods" where nbhd_id = 11;')
tempGeom = curDev.fetchone()
curDev.execute('select nbhd_name from "CCD_Neighborhoods" where nbhd_id = 11;')
tempName = curDev.fetchone()
data = (tempName, tempGeom)
curQa.execute (sql, data)
#commit transactions
connDev.commit()
connQa.commit()
#close connections
curDev.close()
curQa.close()
connDev.close()
connQa.close()
One other note: Python lets you work explicitly with SQL functions and data type casting, which for us is important since we work with the GEOMETRY data type. Above you can see I'm casting it to text and then dumping it into an existing geometry column in the target table. This will also work with MS SQL Server, which is a huge feature for the geospatial community...
In your solution (your solution and your question have a different order of statements), change the lines that start with sql = and the loop before the #commit transactions comment to:
sql_insert = 'INSERT INTO "tempHoods" (nbhd_id, nbhd_name, typology, notes, geom) values '
sql_values = ['(%s, %s, %s, %s, %s)']
data_values = []
# you can make this larger if you want
# ...try experimenting to see what works best
batch_size = 100
sql_stmt = sql_insert + ','.join(sql_values * batch_size) + ';'
for i, row in enumerate(rows, 1):
    data_values += row[:5]
    if i % batch_size == 0:
        curQa.execute(sql_stmt, data_values)
        data_values = []
if i % batch_size != 0:
    sql_stmt = sql_insert + ','.join(sql_values * (i % batch_size)) + ';'
    curQa.execute(sql_stmt, data_values)
BTW, you don't need the commits on the source connection: all you did on it was a bunch of SELECTs, so there is nothing to write. The commits on the inserting connection do matter, though; psycopg2 implicitly opens a transaction on the first execute, so nothing is saved until you commit.
Here's my updated code based on Dmitry's brilliant solution:
import psycopg2
connDev = psycopg2.connect("host=host1 dbname=dpspgisdev user=postgres password=****")
connQa = psycopg2.connect("host=host2 dbname=dpspgisqa user=postgres password=****")
curDev = connDev.cursor()
curQa = connQa.cursor()
print "Truncating Source"
curQa.execute('delete from "tempHoods"')
connQa.commit()
# Get Data
curDev.execute('select nbhd_id, nbhd_name, typology, notes, cast(geom as varchar) from "CCD_Neighborhoods";')  # cast geom to varchar and insert into geometry column!
rows = curDev.fetchall()
sql_insert = 'INSERT INTO "tempHoods" (nbhd_id, nbhd_name, typology, notes, geom) values '
sql_values = ['(%s, %s, %s, %s, %s)']  # number of columns selecting / inserting
data_values = []
batch_size = 1000  # customize for size of tables...
sql_stmt = sql_insert + ','.join(sql_values * batch_size) + ';'
for i, row in enumerate(rows, 1):
    data_values += row[:5]  # relates to number of columns (%s)
    if i % batch_size == 0:
        curQa.execute(sql_stmt, data_values)
        connQa.commit()
        print "Inserting..."
        data_values = []
if i % batch_size != 0:
    sql_stmt = sql_insert + ','.join(sql_values * (i % batch_size)) + ';'
    curQa.execute(sql_stmt, data_values)
    print "Last Values..."
    connQa.commit()
# close connections
curDev.close()
curQa.close()
connDev.close()
connQa.close()
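For the "move an entire table" case, psycopg2's copy_expert() can stream one table straight into the other without touching the rows in Python at all. A rough sketch, assuming the column layouts match (PostGIS renders geom as hex EWKB text, which the receiving geometry column accepts):

import io

# Dump the source table to an in-memory buffer, then replay it into the target.
buf = io.BytesIO()
curDev.copy_expert('COPY "CCD_Neighborhoods" (nbhd_id, nbhd_name, typology, notes, geom) TO STDOUT', buf)
buf.seek(0)
curQa.copy_expert('COPY "tempHoods" (nbhd_id, nbhd_name, typology, notes, geom) FROM STDIN', buf)
connQa.commit()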
