Create/Insert Json in Postgres with requests and psycopg2 - python

I just started a project with PostgreSQL. I would like to make the leap from Excel to a database, and I am stuck on create and insert. Once I run this I believe I will have to switch it to an update so I don't keep writing over the current data. I know my connection is working, but I get the following error.
My error is: TypeError: not all arguments converted during string formatting
#!/usr/bin/env python
import requests
import psycopg2
conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018')
data = req.json()['data']
my_data = []
for item in data:
    season = item['seasonId']
    player = item['playerName']
    first_name = item['playerFirstName']
    last_Name = item['playerLastName']
    playerId = item['playerId']
    height = item['playerHeight']
    pos = item['playerPositionCode']
    handed = item['playerShootsCatches']
    city = item['playerBirthCity']
    country = item['playerBirthCountry']
    state = item['playerBirthStateProvince']
    dob = item['playerBirthDate']
    draft_year = item['playerDraftYear']
    draft_round = item['playerDraftRoundNo']
    draft_overall = item['playerDraftOverallPickNo']
    my_data.append([playerId, player, first_name, last_Name, height, pos, handed, city, country, state, dob, draft_year, draft_round, draft_overall, season])
cur = conn.cursor()
cur.execute("CREATE TABLE t_skaters (data json);")
cur.executemany("INSERT INTO t_skaters VALUES (%s)", (my_data,))
Sample of data:
[[8468493, 'Ron Hainsey', 'Ron', 'Hainsey', 75, 'D', 'L', 'Bolton', 'USA', 'CT', '1981-03-24', 2000, 1, 13, 20172018], [8471339, 'Ryan Callahan', 'Ryan', 'Callahan', 70, 'R', 'R', 'Rochester', 'USA', 'NY', '1985-03-21', 2004, 4, 127, 20172018]]

It seems like you want to create a table with one column named "data". The type of this column is JSON. (I would recommend creating one column per field, but it's up to you.)
In this case the variable data (read from the request) is a list of dicts. As I mentioned in my comment, you can loop over data and do the inserts one at a time, since executemany() is not faster than multiple calls to execute().
What I did was the following:
1. Create a list of the fields you care about.
2. Loop over the elements of data.
3. For each item in data, extract the fields into my_data.
4. Call execute() and pass in json.dumps(my_data) (this converts my_data from a dict into a JSON string).
Try this:
#!/usr/bin/env python
import requests
import psycopg2
import json
conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018')
# data here is a list of dicts
data = req.json()['data']
cur = conn.cursor()
# create a table with one column of type JSON
cur.execute("CREATE TABLE t_skaters (data json);")
fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]
for item in data:
    my_data = {field: item[field] for field in fields}
    cur.execute("INSERT INTO t_skaters VALUES (%s)", (json.dumps(my_data),))
# commit changes
conn.commit()
# Close the connection
conn.close()
I am not 100% sure if all of the postgres syntax is correct here (I don't have access to a PG database to test), but I believe that this logic should work for what you are trying to do.
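If insert speed ever becomes a concern, psycopg2.extras.execute_values can send the rows in batches instead of one execute() per row. A minimal sketch, assuming the same conn, cur, fields and data as above:
from psycopg2.extras import execute_values

# build all the JSON strings up front
rows = [(json.dumps({field: item[field] for field in fields}),) for item in data]

# one batched statement instead of one round trip per row
execute_values(cur, "INSERT INTO t_skaters (data) VALUES %s", rows)
conn.commit()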
Update For Separate Columns
You can modify your create statement to handle multiple columns, but it would require knowing the data type of each column. Here's some pseudocode you can follow:
# same boilerplate code from above
cur = conn.cursor()
# create a table with one column per field
cur.execute(
"""CREATE TABLE t_skaters (seasonId INTEGER, playerName VARCHAR, ...);"""
)
fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]
for item in data:
    my_data = [item[field] for field in fields]
    # need a placeholder (%s) for each variable
    # refer to postgres docs on INSERT statement on how to specify order
    cur.execute("INSERT INTO t_skaters VALUES (%s, %s, ...)", tuple(my_data))
# commit changes
conn.commit()
# Close the connection
conn.close()
Replace the ... with the appropriate values for your data.
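To make the pseudocode concrete without typing hundreds of placeholders by hand, the column list and the %s placeholders can be generated from fields. A sketch, assuming the table was created with one column per field as above and that the column names match the JSON field names:
cur = conn.cursor()
columns = ", ".join(fields)
placeholders = ", ".join(["%s"] * len(fields))
insert_sql = "INSERT INTO t_skaters ({}) VALUES ({})".format(columns, placeholders)

for item in data:
    # item.get() turns a missing field into NULL instead of raising KeyError
    cur.execute(insert_sql, tuple(item.get(field) for field in fields))

conn.commit()
conn.close()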

Related

MySQL Insert / update from python, data from excel spreadsheet

Upsert to MySQL using Python and data from Excel.
I'm working on populating a MySQL DB using Python.
The data is stored in Excel sheets.
Because the DB is supposed to be used for monitoring "projects", there's a possibility of repeated primary keys; in that case the row needs to be updated instead of inserted, because a project can have many stages.
Also, there's a value to be inserted into the DB table that can't be taken from the spreadsheet. I'm wondering whether inserting this value must be done with a separate query or whether there's a way to insert it in the same query. The value is the supplier ID and needs to go between id_ops and cif_store.
Finally, I need to perform an inner join to import the store_id using the store_cif from another table called store. I know how to do it, but I'm wondering whether it must also be executed as a separate query or can be performed in the same one.
So far, I have done this.
import xlrd
import MySQLdb
def insert():
    book = xlrd.open_workbook(r"C:\Users\DevEnviroment\Desktop\OPERACIONES.xlsx")
    sheet = book.sheet_by_name("Sheet1")
    database = MySQLdb.connect (host="localhost", user = "pytest", passwd = "password", db = "opstest1")
    cursor = database.cursor()
    query = """INSERT INTO operation (id_ops, cif_store, date, client,
    time_resp, id_area_service) VALUES (%s, %s, %s, %s, %s, %s)"""
    for r in range(1, sheet.nrows):
        id_ops = sheet.cell(r,0).value
        cif_store = sheet.cell(r,1).value
        date = sheet.cell(r,2).value
        client = sheet.cell(r,3).value
        time_resp = sheet.cell(r,4).value
        id_area_service = sheet.cell(r,5).value
        values = (id_ops, cif_store, date, client, time_resp, id_area_service)
        cursor.execute(query, values)
    # Close the cursor
    cursor.close()
    # Commit the transaction
    database.commit()
    # Close the database connection
    database.close()
    # Print results
    print ("")
    print ("")
    columns = str(sheet.ncols)
    rows = str(sheet.nrows)
    print ("Imported", columns,"columns and", rows, "rows. All Done!")
insert()
What you are looking for is INSERT ... ON DUPLICATE KEY UPDATE ...
Take a look here https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
Regarding the extraneous data, if it's a static value for all rows you can just hard-code it right into the INSERT query. If it's dynamic you'll have to write some additional logic.
For example:
query = """INSERT INTO operation (id_ops, hard_coded_value, cif_store, date, client,
time_resp, id_area_service) VALUES (%s, "my hard coded value", %s, %s, %s, %s, %s)"""
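For the upsert part of the question, the same INSERT can be extended with an ON DUPLICATE KEY UPDATE clause, so a repeated primary key updates the existing row instead of failing. A sketch, assuming id_ops is the primary key and the remaining columns should be refreshed:
query = """INSERT INTO operation (id_ops, cif_store, date, client, time_resp, id_area_service)
           VALUES (%s, %s, %s, %s, %s, %s)
           ON DUPLICATE KEY UPDATE
               cif_store = VALUES(cif_store),
               date = VALUES(date),
               client = VALUES(client),
               time_resp = VALUES(time_resp),
               id_area_service = VALUES(id_area_service)"""
cursor.execute(query, values)
This drops into the existing loop in place of the original query, so the rest of the script stays the same.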

Handling of binary data with Python + psycopg2 (DB: Postgres 9.5)

Let's say I have a table test with columns id (of bigint type) and data (of bytea type).
I don't want to display the actual binary data from column data when I execute the query below in Python.
select * from test;
I just want a placeholder that displays <binary data> or <BLOB>, because some of the values in that column are hundreds of MB, and it makes no sense to display the binary data itself.
Is it possible to identify and replace binary data with a placeholder in psycopg2?
#!/usr/bin/python
import psycopg2
conn = psycopg2.connect(database = "testdb", user = "postgres", password = "pass123", host = "127.0.0.1", port = "5432")
print "Opened database successfully"
cur = conn.cursor()
cur.execute("SELECT id, data from test")
rows = cur.fetchall()
for row in rows:
print("ID = ", row[0])
print("DATA = ", row[1])
print "Operation done successfully";
conn.close()
We fetch the result from the database and generate an HTML report from it. The user can provide any query in an HTML textbox, so the query is not static; we execute that query and generate the HTML table. This is an in-house report generation script.
If the data is bytea you can write your own bytea typecaster as an object that wraps a binary string.
Note that the data is fetched and sent on the network anyway. If you don't want that overhead just don't select those fields.
>>> import psycopg2
>>> class Wrapper:
...     def __init__(self, thing):
...         self.thing = thing
...
>>> psycopg2.extensions.register_type(
... psycopg2.extensions.new_type(
... psycopg2.BINARY.values, "WRAPPER", lambda x, cur: Wrapper(x)))
>>> cnn = psycopg2.connect('')
>>> cur = cnn.cursor()
>>> cur.execute("create table btest(id serial primary key, data bytea)")
>>> cur.execute("insert into btest (data) values ('foobar')")
>>> cur.execute("select * from btest")
>>> r = cur.fetchone()
>>> r
(1, <__main__.Wrapper instance at 0x7fb8740eba70>)
>>> r[1].thing
'\\x666f6f626172'
Please refer to the documentation for the extensions functions used.
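If the report query can be rewritten, another option that avoids fetching the bytes at all is to substitute a literal placeholder in the SQL itself. A minimal sketch against the test table from the question (pg_column_size is a Postgres built-in that reports the stored size without transferring the value):
cur = conn.cursor()
cur.execute("SELECT id, '<binary data>' AS data, pg_column_size(data) AS size_bytes FROM test")
for row in cur.fetchall():
    print(row)  # the bytea content itself never leaves the server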

Iterating through MySQL table and updating

I have a MySQL table which stores a couple thousand addresses. I need to pass them to a geolocation API, get latitude and longitude, and then put them back into the corresponding address row (I made special columns for that). The question is: what is the most efficient way to do it? Currently I am using Python with mysql.connector and geopy for geolocation. This is the simple code I use for geocoding:
cursor = conn.cursor()
cursor.execute("SELECT description FROM contacts WHERE kind = 'Home adress'")
row = cursor.fetchone()
while row is not None:
    geocoded = geolocator.geocode(row, exactly_one=True)
    if geocoded is not None:
        lat = geocoded.latitude
        lon = geocoded.longitude
    row = cursor.fetchone()
You can use cursor.executemany() to update the table in one go. This requires that a list of update parameters be created which can then be passed to executemany(). The parameter list can be created from the results of the initial SELECT query. In the example below I have assumed that there is some primary key named key_id for the contacts table:
cursor = conn.cursor()
cursor.execute("SELECT key_id, description FROM contacts WHERE kind = 'Home adress'")
update_params = []
for key_id, description in cursor:
    geocoded = geolocator.geocode(description, exactly_one=True)
    if geocoded is not None:
        lat = geocoded.latitude
        lon = geocoded.longitude
        update_params.append((lat, lon, key_id))
cursor.executemany("update contacts set lat = %s, lon = %s where key_id = %s", update_params)
As mentioned above this assumes existence of a primary key. If there is not one and description is a unique field in the table then you could use that. Just remove key_id from the SELECT query, and replace key_id with the description field for both the update_params list and the update query.
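For reference, the description-keyed variant might look something like this (a sketch, assuming description values are unique and match exactly what was selected):
cursor.execute("SELECT description FROM contacts WHERE kind = 'Home adress'")
update_params = []
for (description,) in cursor:
    geocoded = geolocator.geocode(description, exactly_one=True)
    if geocoded is not None:
        update_params.append((geocoded.latitude, geocoded.longitude, description))
cursor.executemany("update contacts set lat = %s, lon = %s where description = %s", update_params)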
@mhavke, thanks a lot! Just what I needed. Here is the final working code (I made some adjustments). Also I am aware that using '%s' is unsafe, but this is for internal use only, so I'm not really worried about it.
cursor = conn.cursor()
cursor.execute("SELECT key_id, description FROM contacts WHERE kind = 'Home address'")
update_params = []
for key_id, description in cursor:
geocoded = geolocator.geocode(description, exactly_one=True)
if geocoded is not None:
lat = geocoded.latitude
lon = geocoded.longitude
update_params.append((lat, lon, key_id))
cursor.executemany("update contacts set latitude = %s, longitude = %s where key_id = %s", update_params)
conn.commit()

How to store python dictionary in to mysql DB through python

I am trying to store the following dictionary into a MySQL DB by converting the dictionary into a string and then inserting it, but I am getting the following error. How can this be solved, or is there another way to store a dictionary in a MySQL DB?
dic = {'office': {'component_office': ['Word2010SP0', 'PowerPoint2010SP0']}}
d = str(dic)
# Sql query
sql = "INSERT INTO ep_soft(ip_address, soft_data) VALUES ('%s', '%s')" % ("192.xxx.xx.xx", d )
soft_data is a VARCHAR(500)
Error:
execution exception (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to
use near 'office': {'component_office': ['Word2010SP0', 'PowerPoint2010SP0' at line 1")
Any suggestions or help please?
First of all, don't ever construct raw SQL queries like that. Never ever. This is what parametrized queries are for. You're asking for an SQL injection attack.
If you want to store arbitrary data, such as Python dictionaries, you should serialize that data. JSON would be a good choice for the format.
Overall your code should look like this:
import MySQLdb
import json
db = MySQLdb.connect(...)
cursor = db.cursor()
dic = {'office': {'component_office': ['Word2010SP0', 'PowerPoint2010SP0']}}
sql = "INSERT INTO ep_soft(ip_address, soft_data) VALUES (%s, %s)"
cursor.execute(sql, ("192.xxx.xx.xx", json.dumps(dic)))
db.commit()
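To get the dictionary back out later, json.loads reverses the serialization. A quick sketch against the same table, assuming the connection and cursor from above:
cursor.execute("SELECT soft_data FROM ep_soft WHERE ip_address = %s", ("192.xxx.xx.xx",))
row = cursor.fetchone()
if row is not None:
    dic = json.loads(row[0])  # back to a Python dict
    print(dic['office']['component_office'])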
Change your code as below:
dic = {'office': {'component_office': ['Word2010SP0', 'PowerPoint2010SP0']}}
d = str(dic)
# Sql query
sql = """INSERT INTO ep_soft(ip_address, soft_data) VALUES (%r, %r)""" % ("192.xxx.xx.xx", d )
Try this:
dic = { 'office': {'component_office': ['Word2010SP0', 'PowerPoint2010SP0'] } }
"INSERT INTO `db`.`table`(`ip_address`, `soft_data`) VALUES (`{}`, `{}`)".format("192.xxx.xx.xx", str(dic))
Change db and table to the values you need.
It is a good idea to sanitize your inputs, and '.format' is useful when needing to use the same variable multiple times within a query. (Not that you to for this example)
import json

dic = {'office': {'component_office': ['Word2010SP0', 'PowerPoint2010SP0']}}
ip = '192.xxx.xx.xx'
with conn.cursor() as cur:
    cur.execute("INSERT INTO `ep_soft`(`ip_address`, `soft_data`) VALUES ({0}, '{1}')".format(cur.escape(ip), json.dumps(dic)))
conn.commit()
If you do not use cur.escape(variable), you will need to enclose the placeholder {} in quotes.
This answer uses some pseudocode for the connection object, and the flavor of MySQL is MemSQL, but other than that it should be straightforward to follow.
import json
#... do something
a_big_dict = getAHugeDict() #build a huge python dict
conn = getMeAConnection(...)
serialized_dict = json.dumps(a_big_dict) #serialize dict to string
#Something like this to hold the serialization...
qry_create = """
CREATE TABLE TABLE_OF_BIG_DICTS (
ROWID BIGINT NOT NULL AUTO_INCREMENT,
SERIALIZED_DICT BLOB NOT NULL,
UPLOAD_DT TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
KEY (`ROWID`) USING CLUSTERED COLUMNSTORE
);
"""
conn.execute(qry_create)
#Something like this to hold em'
qry_insert = """
INSERT INTO TABLE_OF_BIG_DICTS (SERIALIZED_DICT)
SELECT '{SERIALIZED_DICT}' as SERIALIZED_DICT;
"""
#Send it to db
conn.execute(qry_insert.format(SERIALIZED_DICT=serialized_dict))
#grab the latest
qry_read = """
SELECT a.SERIALIZED_DICT
from TABLE_OF_BIG_DICTS a
JOIN
(
SELECT MAX(UPLOAD_DT) AS MAX_UPLOAD_DT
FROM TABLE_OF_BIG_DICTS
) b
ON a.UPLOAD_DT = b.MAX_UPLOAD_DT
LIMIT 1
"""
#something like this to read the latest dict...
df_dict = conn.sql_to_dataframe(qry_read)
dict_str = df_dict.iloc[df_dict.index.min()][0]
#dicts never die they just get rebuilt
dict_better = json.loads(dict_str)

Good way to read csvData using psycopg2

I am trying to find a quick way, i.e. fast and not a lot of code, to get CSV data into a Postgres database. I am reading it into Python using csv.DictReader, which works fine. Then I need to somehow generate code that takes the dicts and puts them into a table. I want to do this automatically, as my tables often have hundreds of variables. (I don't want to read directly into Postgres because in many cases I must transform the data, and Python is good for that.)
This is some of what I have got:
import psycopg2
import sys
import itertools
import sys, csv
import psycopg2.extras
import psycopg2.extensions
csvReader=csv.DictReader(open( '/home/matthew/Downloads/us_gis_data/statesp020.csv', "rb"), delimiter = ',')
#close.cursor()
x = 0
ConnectionString = "host='localhost' dbname='mydb' user='postgres' password='######"
try:
connection = psycopg2.extras.DictConnection(ConnectionString)
print "connecting"
except:
print "did not work"
# Create a test table with some data
dict_cur = connection.cursor()
#dict_cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);")
for i in range(1,50):
x = x+1
print x
dict_cur.execute("INSERT INTO test (num, data) VALUES(%s, %s)",(x, 3.6))#"abc'def"))
### how to I create the table and insert value using the dictreader?
dict_cur.execute("SELECT * FROM test")
for k in range(0,x+1):
rec = dict_cur.fetchone()
print rec['num'], rec['data']
Say you have a list of field names (presumably you can get this from the header of your csv file):
fieldnames = ['Name', 'Address', 'City', 'State']
Assuming they're all VARCHARs, you can create the table "TableName":
sql_table = 'CREATE TABLE TableName (%s)' % ','.join('%s VARCHAR(50)' % name for name in fieldnames)
cursor.execute(sql_table)
You can insert the rows from a dictionary "dict":
sql_insert = ('INSERT INTO TableName (%s) VALUES (%s)' %
              (','.join('%s' % name for name in fieldnames),
               ','.join('%%(%s)s' % name for name in fieldnames)))
cursor.execute(sql_insert, dict)
Or do it in one go, given a list dictionaries:
dictlist = [dict1, dict2, ...]
cursor.executemany(sql_insert, dictlist)
You can adapt this as necessary based on the type of your fields and the use of DictReader.
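Putting the pieces together with DictReader, a rough end-to-end sketch might look like this (it assumes every CSV column becomes a VARCHAR, that the header names are valid identifiers, and that the connection details and file path are placeholders to replace):
import csv
import psycopg2

conn = psycopg2.connect("host='localhost' dbname='mydb' user='postgres' password='secret'")
cursor = conn.cursor()

reader = csv.DictReader(open('/path/to/file.csv'))
fieldnames = reader.fieldnames

# one VARCHAR column per CSV header
sql_table = 'CREATE TABLE TableName (%s)' % ','.join('%s VARCHAR(50)' % name for name in fieldnames)
cursor.execute(sql_table)

sql_insert = ('INSERT INTO TableName (%s) VALUES (%s)' %
              (','.join(fieldnames),
               ','.join('%%(%s)s' % name for name in fieldnames)))

# each row from DictReader is already a dict keyed by column name
cursor.executemany(sql_insert, list(reader))

conn.commit()
conn.close()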
I am a novice but this worked for me. I used PG Admin to create the 'testCSV' table.
import psycopg2 as dbapi
con = dbapi.connect(database="testpg", user="postgres", password="secret")
cur = con.cursor()
import csv
csvObject = csv.reader(open(r'C:\testcsv.csv', 'r'), dialect = 'excel', delimiter = ',')
passData = "INSERT INTO testCSV (param1, param2, param3, param4, param5) VALUES (%s,%s,%s,%s,%s);"
for row in csvObject:
    csvLine = row
    cur.execute(passData, csvLine)
con.commit()
