Handling of binary data with Python + psycopg2 (DB: Postgres 9.5)

Let's say I have a table test with columns id (of type bigint) and data (of type bytea).
I don't want to display the actual binary data from the data column when I execute the query below in Python.
select * from test;
I just want a placeholder that displays <binary data> or <BLOB>, because some values in that column are hundreds of MB and it makes no sense to display the raw binary data.
Is it possible to identify binary data and replace it with a placeholder in psycopg2?
#!/usr/bin/python
import psycopg2

conn = psycopg2.connect(database="testdb", user="postgres", password="pass123",
                        host="127.0.0.1", port="5432")
print "Opened database successfully"

cur = conn.cursor()
cur.execute("SELECT id, data FROM test")
rows = cur.fetchall()
for row in rows:
    print "ID   =", row[0]
    print "DATA =", row[1]

print "Operation done successfully"
conn.close()
We fetch the result from the database and generate an HTML report from it. The user can provide any query in an HTML textbox, so the query is not static; we execute that query and generate the HTML table. This is an in-house report generation script.

If the data is bytea, you can write your own bytea typecaster that returns an object wrapping the binary string.
Note that the data is fetched and sent over the network anyway. If you don't want that overhead, just don't select those fields.
>>> import psycopg2
>>> class Wrapper:
...     def __init__(self, thing):
...         self.thing = thing
...
>>> psycopg2.extensions.register_type(
...     psycopg2.extensions.new_type(
...         psycopg2.BINARY.values, "WRAPPER", lambda x, cur: Wrapper(x)))
>>> cnn = psycopg2.connect('')
>>> cur = cnn.cursor()
>>> cur.execute("create table btest(id serial primary key, data bytea)")
>>> cur.execute("insert into btest (data) values ('foobar')")
>>> cur.execute("select * from btest")
>>> r = cur.fetchone()
>>> r
(1, <__main__.Wrapper instance at 0x7fb8740eba70>)
>>> r[1].thing
'\\x666f6f626172'
Please refer to the documentation for the extensions functions used.
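If the report only needs a placeholder string, a variant of the same idea is to have the typecaster return the placeholder itself instead of a wrapper object. A minimal sketch, reusing the connection settings from the question (the bytes are still transferred from the server, as noted above):
import psycopg2
import psycopg2.extensions

def bytea_placeholder(value, cur):
    # psycopg2 passes the raw string from the server (or None for NULL);
    # return a fixed label instead of the binary payload.
    return None if value is None else '<binary data>'

psycopg2.extensions.register_type(
    psycopg2.extensions.new_type(
        psycopg2.BINARY.values, 'BYTEA_PLACEHOLDER', bytea_placeholder))

conn = psycopg2.connect(database="testdb", user="postgres",
                        password="pass123", host="127.0.0.1", port="5432")
cur = conn.cursor()
cur.execute("SELECT id, data FROM test")
for row in cur.fetchall():
    print(row)  # e.g. (1, '<binary data>')
conn.close()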

Read PostgreSQL array data in Python

After a query with Python's psycopg2:
SELECT
    id,
    array_agg(еnty_pub_uuid) AS ptr_entity_public
FROM table
GROUP BY id
I get an array returned:
{a630e0a3-c544-11ea-9b8c-b73c488956ba,c2f03d24-2402-11eb-ab91-3f8e49eb63e7}
How can I parse this into a list in Python?
Is there a built-in function in psycopg2?
psycopg2 takes care of type conversions between Python and Postgres:
import psycopg2

conn = psycopg2.connect("...")
cur = conn.cursor()
cur.execute(
    "select user_id, array_agg(data_name) from user_circles where user_id = '81' group by user_id"
)
res = cur.fetchall()
print(res[0])
print(type(res[0][1]))
Out:
('81', ['f085b2e3-b943-429e-850f-4ecf358abcbc', '65546d63-be96-4711-a4c1-a09f48fbb7f0', '81d03c53-9d71-4b18-90c9-d33322b0d3c6', '00000000-0000-0000-0000-000000000000'])
<class 'list'>
For UUID columns you need to register the UUID type so psycopg2 knows how to convert the values:
import psycopg2.extras

psycopg2.extras.register_uuid()

sql = """
    SELECT
        id,
        array_agg(еnty_pub_uuid) AS ptr_entity_public
    FROM table
    GROUP BY id
"""
cursor = con.cursor()
cursor.execute(sql)
results = cursor.fetchall()
for r in results:
    print(type(r[1]))
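Without register_uuid(), an array of an unregistered element type comes back as the raw Postgres array literal (the {...} string shown in the question). A minimal hand-parsing sketch, assuming plain UUIDs with no quoting inside the braces:
import uuid

raw = '{a630e0a3-c544-11ea-9b8c-b73c488956ba,c2f03d24-2402-11eb-ab91-3f8e49eb63e7}'

# Strip the braces and split on commas; safe here because UUIDs never
# contain commas, quotes or braces themselves.
uuids = [uuid.UUID(part) for part in raw.strip('{}').split(',') if part]
print(uuids)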

Storing a pickled dictionary in Postgresql (Psycopg2)

I am trying to store a pickled nested dictionary in PostgreSQL (I am aware that this is a quick and dirty method and that I won't be able to access the dictionary contents from PostgreSQL; it's usually bad practice).
# boilerplate, preamble and upstream work.
import pickle
import psycopg2

''' Inputs: nd = dictionary to be pickled '''
pickled = pickle.dumps(nd)

connection = psycopg2.connect(user="-----",
                              password="----",
                              host="----",
                              port="----",
                              database="----")
name = 'database1'
print('Connected...')
cursor = connection.cursor()
print(connection.get_dsn_parameters(), "\n")
cursor.execute("CREATE TABLE thetable (name TEXT, ablob BYTEA)")
print('Created Table...')
cursor.execute("INSERT INTO thetable VALUES(%s)", (psycopg2.Binary(pickled),))
connection.commit()
print('Added Data...')
cursor.close()
connection.close()
print('Connection closed...')
When I come to data retrieval, I am having many issues importing the data from Postgres; essentially the data is to be opened, unpickled back into the dictionary, and visualised. I have tried:
import pickle
import psycopg2
from io import BytesIO

connection = psycopg2.connect(user="----",
                              password="----",
                              host="----",
                              port="----",
                              database="----")
cursor = connection.cursor()
cursor.execute("SELECT ablob FROM thetable")
result, = cursor.fetchone()
cursor.close()
connection.rollback()
result = BytesIO(result)
print(pickle.load(result))
I followed this link: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch08s08.html, and consulted Insert an image in postgresql database and saving python object in postgres table with pickle, but I have been unable to return the pickled dictionary.
Any advice in achieving this is greatly appreciated!
When your CREATE TABLE lists two fields, your INSERT either has to name which ones you want to fill or supply values for all of them.
import pickle
import psycopg2

dict = {
    "foo": "bar"
}
p = pickle.dumps(dict)

connection = psycopg2.connect(database="test")
cursor = connection.cursor()
cursor.execute("CREATE TABLE thetable (name TEXT, ablob BYTEA)")
cursor.execute("INSERT INTO thetable VALUES (%s, %s)", ('test', p))
connection.commit()
cursor.close()
connection.close()
and reading:
import pickle
import psycopg2

connection = psycopg2.connect(database="test")
cursor = connection.cursor()
cursor.execute("SELECT ablob FROM thetable WHERE name = 'test';")
result = cursor.fetchone()
print(pickle.loads(result[0]))
cursor.close()
connection.close()
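Note that psycopg2 returns bytea values as a memoryview on Python 3. pickle.loads accepts a memoryview directly, but you can make the conversion explicit if you prefer (a small sketch using the same table and column as above):
import pickle
import psycopg2

connection = psycopg2.connect(database="test")
cursor = connection.cursor()
cursor.execute("SELECT ablob FROM thetable WHERE name = 'test';")
raw = cursor.fetchone()[0]           # memoryview on Python 3
restored = pickle.loads(bytes(raw))  # bytes() just makes the conversion explicit
print(restored)
cursor.close()
connection.close()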

Create/Insert Json in Postgres with requests and psycopg2

I just started a project with PostgreSQL. I would like to make the leap from Excel to a database and I am stuck on create and insert. Once I run this I will have to switch it to an update, I believe, so I don't keep writing over the current data. I know my connection is working, but I get the following error.
My error is: TypeError: not all arguments converted during string formatting
#!/usr/bin/env python
import requests
import psycopg2

conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018')
data = req.json()['data']

my_data = []
for item in data:
    season = item['seasonId']
    player = item['playerName']
    first_name = item['playerFirstName']
    last_Name = item['playerLastName']
    playerId = item['playerId']
    height = item['playerHeight']
    pos = item['playerPositionCode']
    handed = item['playerShootsCatches']
    city = item['playerBirthCity']
    country = item['playerBirthCountry']
    state = item['playerBirthStateProvince']
    dob = item['playerBirthDate']
    draft_year = item['playerDraftYear']
    draft_round = item['playerDraftRoundNo']
    draft_overall = item['playerDraftOverallPickNo']
    my_data.append([playerId, player, first_name, last_Name, height, pos, handed, city, country, state, dob, draft_year, draft_round, draft_overall, season])

cur = conn.cursor()
cur.execute("CREATE TABLE t_skaters (data json);")
cur.executemany("INSERT INTO t_skaters VALUES (%s)", (my_data,))
Sample of data:
[[8468493, 'Ron Hainsey', 'Ron', 'Hainsey', 75, 'D', 'L', 'Bolton', 'USA', 'CT', '1981-03-24', 2000, 1, 13, 20172018], [8471339, 'Ryan Callahan', 'Ryan', 'Callahan', 70, 'R', 'R', 'Rochester', 'USA', 'NY', '1985-03-21', 2004, 4, 127, 20172018]]
It seems like you want to create a table with one column named "data". The type of this column is JSON. (I would recommend creating one column per field, but it's up to you.)
In this case the variable data (that is read from the request) is a list of dicts. As I mentioned in my comment, you can loop over data and do the inserts one at a time as executemany() is not faster than multiple calls to execute().
What I did was the following:
1. Create a list of fields that you care about.
2. Loop over the elements of data.
3. For each item in data, extract the fields into my_data.
4. Call execute() and pass in json.dumps(my_data) (this converts my_data from a dict into a JSON string).
Try this:
#!/usr/bin/env python
import requests
import psycopg2
import json

conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018')
# data here is a list of dicts
data = req.json()['data']

cur = conn.cursor()
# create a table with one column of type JSON
cur.execute("CREATE TABLE t_skaters (data json);")

fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]
for item in data:
    my_data = {field: item[field] for field in fields}
    cur.execute("INSERT INTO t_skaters VALUES (%s)", (json.dumps(my_data),))

# commit changes
conn.commit()
# Close the connection
conn.close()
I am not 100% sure if all of the postgres syntax is correct here (I don't have access to a PG database to test), but I believe that this logic should work for what you are trying to do.
Update For Separate Columns
You can modify your CREATE statement to handle multiple columns, but it would require knowing the data type of each column. Here's some pseudocode you can follow:
# same boilerplate code from above
cur = conn.cursor()
# create a table with one column per field
cur.execute(
    """CREATE TABLE t_skaters (seasonId INTEGER, playerName VARCHAR, ...);"""
)

fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]
for item in data:
    my_data = [item[field] for field in fields]
    # need a placeholder (%s) for each variable
    # refer to postgres docs on INSERT statement on how to specify order
    cur.execute("INSERT INTO t_skaters VALUES (%s, %s, ...)", tuple(my_data))

# commit changes
conn.commit()
# Close the connection
conn.close()
Replace the ... with the appropriate values for your data.
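If typing out fifteen placeholders by hand feels error-prone, a small sketch (continuing from the fields, data and cur variables in the pseudocode above, and assuming each field name doubles as the column name) can build the column list and placeholders from fields:
# Build "col1, col2, ..." and "%s, %s, ..." from the fields list so the
# INSERT always matches the number of columns.
columns = ', '.join(fields)
placeholders = ', '.join(['%s'] * len(fields))
insert_sql = "INSERT INTO t_skaters ({}) VALUES ({})".format(columns, placeholders)

for item in data:
    cur.execute(insert_sql, tuple(item[field] for field in fields))
conn.commit()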

Replace L, in SQL results in python

I'm running pyodbc connected to my DB, and when I run a simple query I get a load of results back such as
(7L, )(12L,) etc.
How do I replace the 'L, ' with '' so I can pass the IDs into another query?
Thanks
Here's my code:
import pyodbc

cnxn = pyodbc.connect('DSN=...;UID=...;PWD=...', ansi=True)
cursor = cnxn.cursor()
rows = cursor.execute("select id from orders")
for row in rows:
    test = cursor.execute("select name from customer where order_id = %(id)s" % {'id': row})
    print test
Use parameters:
...
test = cursor.execute("select name from customer where order_id = ?", row.id)
...
The L after the number indicates that the value is a long type.
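A fuller sketch of the corrected loop (assuming the same table and column names as the question, and a second cursor so the outer result set is not clobbered) might look like this:
import pyodbc

cnxn = pyodbc.connect('DSN=...;UID=...;PWD=...', ansi=True)
order_cur = cnxn.cursor()
name_cur = cnxn.cursor()

# The L suffix is only part of the long's repr; the underlying value is a
# plain number, so it can be passed straight through as a parameter.
for row in order_cur.execute("select id from orders"):
    name_cur.execute("select name from customer where order_id = ?", row.id)
    for name_row in name_cur.fetchall():
        print name_row.name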

Good way to read csvData using psycopg2

I am trying to get a fast (i.e. quick and not a lot of code) way to get CSV data into a Postgres database. I am reading into Python using csv.DictReader, which works fine. Then I somehow need to generate code that takes the dicts and puts them into a table. I want to do this automatically, as my tables often have hundreds of variables. (I don't want to read directly into Postgres because in many cases I must transform the data, and Python is good for that.)
This is some of what I have got:
import sys
import csv
import itertools
import psycopg2
import psycopg2.extras
import psycopg2.extensions

csvReader = csv.DictReader(open('/home/matthew/Downloads/us_gis_data/statesp020.csv', "rb"), delimiter=',')
#close.cursor()
x = 0
ConnectionString = "host='localhost' dbname='mydb' user='postgres' password='######'"
try:
    connection = psycopg2.extras.DictConnection(ConnectionString)
    print "connecting"
except:
    print "did not work"
# Create a test table with some data
dict_cur = connection.cursor()
#dict_cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);")
for i in range(1, 50):
    x = x + 1
    print x
    dict_cur.execute("INSERT INTO test (num, data) VALUES(%s, %s)", (x, 3.6))  # "abc'def"))
### how do I create the table and insert values using the DictReader?
dict_cur.execute("SELECT * FROM test")
for k in range(0, x + 1):
    rec = dict_cur.fetchone()
    print rec['num'], rec['data']
Say you have a list of field names (presumably you can get this from the header of your csv file):
fieldnames = ['Name', 'Address', 'City', 'State']
Assuming they're all VARCHARs, you can create the table "TableName":
sql_table = 'CREATE TABLE TableName (%s)' % ','.join('%s VARCHAR(50)' % name for name in fieldnames)
cursor.execute(sql_table)
You can insert the rows from a dictionary "dict":
sql_insert = ('INSERT INTO TableName (%s) VALUES (%s)' %
              (','.join('%s' % name for name in fieldnames),
               ','.join('%%(%s)s' % name for name in fieldnames)))
cursor.execute(sql_insert, dict)
Or do it in one go, given a list of dictionaries:
dictlist = [dict1, dict2, ...]
cursor.executemany(sql_insert, dictlist)
You can adapt this as necessary based on the type of your fields and the use of DictReader.
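Putting the pieces together with DictReader, a sketch (assuming a hypothetical file path, CSV headers that are valid SQL identifiers, and that every column can live as VARCHAR) could look like this:
import csv
import psycopg2

conn = psycopg2.connect("host='localhost' dbname='mydb' user='postgres'")
cursor = conn.cursor()

with open('/path/to/data.csv') as f:  # hypothetical path
    reader = csv.DictReader(f, delimiter=',')
    fieldnames = reader.fieldnames

    # Create the table from the CSV header, everything as VARCHAR for simplicity.
    cursor.execute('CREATE TABLE TableName (%s)' %
                   ','.join('%s VARCHAR(50)' % name for name in fieldnames))

    # Named placeholders let each row dict from DictReader be passed directly.
    sql_insert = ('INSERT INTO TableName (%s) VALUES (%s)' %
                  (','.join(fieldnames),
                   ','.join('%%(%s)s' % name for name in fieldnames)))
    cursor.executemany(sql_insert, reader)

conn.commit()
conn.close()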
I am a novice, but this worked for me. I used pgAdmin to create the 'testCSV' table.
import csv
import psycopg2 as dbapi

con = dbapi.connect(database="testpg", user="postgres", password="secret")
cur = con.cursor()

csvObject = csv.reader(open(r'C:\testcsv.csv', 'r'), dialect='excel', delimiter=',')
passData = "INSERT INTO testCSV (param1, param2, param3, param4, param5) VALUES (%s,%s,%s,%s,%s);"
for row in csvObject:
    csvLine = row
    cur.execute(passData, csvLine)
con.commit()
