I have a PostgreSQL database table that has a JSON datatype column. When I store JSON strings in the database, None is not preserved. This can be reproduced with this code snippet:
import json
import psycopg2

dictionary = {}
dictionary[None] = 0
conn = psycopg2.connect("host='localhost' dbname='testdb' "
                        "user='postgres' password='postgres'")
cursor = conn.cursor()
cursor.execute("""INSERT INTO testtable (json_column) VALUES (%s)""",
               (json.dumps(dictionary),))
cursor.execute("""SELECT json_column FROM testtable""")
result = cursor.fetchall()[0][0].keys()[0]
print result
print type(result)
if result is None:
    print 'result is None'
else:
    print 'result is not None'
The output of this Python code is:
drew@debian:~/python$ python test.py
null
<type 'unicode'>
result is not None
drew@debian:~/python$
How can I store None as a key in the JSON column? The JSON object also has keys for 'None' and 'null', so the value stored must be None or null.
From RFC 7159:
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string.
And from the Python docs:
Keys in key/value pairs of JSON are always of the type str. When a
dictionary is converted into JSON, all the keys of the dictionary are
coerced to strings.
JSON does not have any concept of non-string dict keys. If you want those, you don't want JSON. The closest you'll get with JSON is detecting the string that None gets converted to ('null') and hoping that you never need to actually use the string 'null' as a dict key.
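A quick way to see the coercion, plus a sketch of that workaround (mapping the 'null' string key back to None after loading; as noted, it breaks the moment a genuine 'null' string key appears):
import json

d = {None: 0, 'None': 1}
s = json.dumps(d)
print s  # {"null": 0, "None": 1} - the None key was coerced to the string "null" (key order may vary)

loaded = json.loads(s)
# Workaround sketch: translate the "null" string key back to None after loading.
# This breaks if a real 'null' string key is ever present, as noted above.
restored = {(None if k == u'null' else k): v for k, v in loaded.items()}
print restored  # {None: 0, u'None': 1}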
I'm trying to populate a database with a single column with a list of strings (links). I scraped the list and I must modify every single link before sending it to the database. This is the code:
for event in events:
    link_url = "https://www.website.com"+event+"#all"
    c.execute("INSERT INTO table (links) VALUES(?)", link_url)
I can get it working if I modify the variables and send a tuple, like this:
for event in events:
    link_url = "https://www.website.com"+event+"#all"
    link = (link_url,)
    c.execute("INSERT INTO seriea (links) VALUES(?)", link)
but I don't want to use this solution since I want to get a list of strings back out later:
c = connection.execute('select links from table')
list_of_urls = c.fetchall()
But this gives me a list of tuples.
This is the error I have: ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 80 supplied.
I think that's because the string's characters are being counted individually (the count doesn't quite match the string length, but I noticed that the number before "supplied" changes with the link that's fed in).
I don't want to use this solution since I want to get a list of strings back out later:
c = connection.execute('select links from table')
list_of_urls = c.fetchall()
But this gives me a list of tuples.
The list of tuples you're getting when you do a select has nothing to do with the way you insert data. Remember, tables have two dimensions:
id | links  | something | else
---+--------+-----------+------
1  | "foo"  | "bar"     | "baz"
2  | "quux" | "herp"    | "derp"
When you do a select you get a list that corresponds to the rows here. But each row has multiple fields: id, links, something, and else. Each tuple in the list contains the values for each of the fields in the table.
If you just want the URLs as a list of strings you can use a list comprehension or similar:
c = connection.execute('select links from table')
list_of_rows = c.fetchall()
# each row is the tuple of values for that row; row[0] is its first element
list_of_strings = [row[0] for row in list_of_rows]
Note that you do have to provide a tuple or other sequence when you insert the data:
For the qmark style, parameters must be a sequence. For the named style, it can be either a sequence or dict instance. The length of the sequence must match the number of placeholders, or a ProgrammingError is raised. If a dict is given, it must contain keys for all named parameters.
You might be thinking of the tuple part of it the wrong way. You don't need to pass in a tuple of URLs, you need to pass in a tuple of parameters. You're not saying "the links column should contain this tuple" but rather "this tuple contains enough values to fill in the placeholders in this query".
I'd rewrite that like so:
for event in events:
    link_url = "https://www.website.com"+event+"#all"
    c.execute("INSERT INTO seriea (links) VALUES(?)", (link_url,))
This is so you can have multiple parameters, e.g.
c.execute(
    "INSERT INTO seriea (links, some, other) VALUES(?, ?, ?)",
    (link_url, foo, bar),
)
The current statement uses 1, and there are 80 supplied.
I think that's because the string characters are counted
Yes, that's most likely what's happening. c.execute() expects to receive a sequence, and strings are a sequence of characters.
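Here's a minimal sketch of that failure mode next to the fix, using an in-memory SQLite database (the table name seriea is taken from your snippet):
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute("CREATE TABLE seriea (links TEXT)")

link_url = "https://www.website.com/some-event#all"

# Wrong: the string itself is the sequence, so every character counts as
# one binding, and sqlite3 raises:
#   ProgrammingError: Incorrect number of bindings supplied.
#c.execute("INSERT INTO seriea (links) VALUES(?)", link_url)

# Right: a one-element tuple supplies exactly one binding.
c.execute("INSERT INTO seriea (links) VALUES(?)", (link_url,))
conn.commit()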
TypeError: Error while fetching data from PostgreSQL: not all arguments converted during string formatting.
Please help me.
Main_List = [[1,12345,10.100.30.27,2020-10-09 09:45:31,12,0,0,9]]
records_to_insert = [tuple(x) for x in Main_List]
sql_insert_query = """ INSERT INTO dailypunch (ip_address, emp_code, datetime,number)
VALUES (%s,%s,%s,%s) """
result = cursor.executemany(sql_insert_query, records_to_insert)
Your tuples have 8 fields but you are trying to insert into a table with 4 fields. The values in each tuple also need to be in the correct order: ip_address should be first, emp_code second, and so on. What's more, your Main_List is not valid Python; the strings lack quotes:
Main_List = [[1,12345,'10.100.30.27','2020-10-09 09:45:31',12,0,0,9]]
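A hedged sketch of a working insert follows; the index-to-column mapping here is an assumption (I'm guessing the second, third, fourth, and fifth fields are emp_code, ip_address, datetime, and number), so adjust the indices to your real layout:
Main_List = [[1, 12345, '10.100.30.27', '2020-10-09 09:45:31', 12, 0, 0, 9]]

# Pick out only the four fields the INSERT expects, in the column order
# of the statement (ip_address, emp_code, datetime, number).
records_to_insert = [(row[2], row[1], row[3], row[4]) for row in Main_List]

sql_insert_query = """INSERT INTO dailypunch (ip_address, emp_code, datetime, number)
                      VALUES (%s, %s, %s, %s)"""
cursor.executemany(sql_insert_query, records_to_insert)
connection.commit()  # assumes 'connection' is your open database connection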
I'm using an API that returns different dicts depending on the query. So because I can't be certain I'll have the desired keys, I'm using dict.get() to avoid raising a KeyError. But I am currently inserting these results into a database, and would like to avoid filling rows with None.
What would be the preferred way to deal with this?
Use 'NULL' as the default value with dict.get(). If the key is not present in your dict object, it will return the string 'NULL' instead of None. NULL in most databases is equivalent to None in Python. (Note this applies when you're building the SQL string yourself; with parameterized queries, most drivers already translate a plain None into SQL NULL.) For example:
>>> my_dict = {}
# v Returns `NULL` if key not found in `my_dict`
>>> my_dict.get('key', 'NULL')
'NULL'
If you have a column defined as NOT NULL, use an empty string as the default instead. For example:
>>> my_dict.get('key', '')
''
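With parameterized queries you often don't need either default, since the driver turns a plain None into SQL NULL on insert; a minimal sketch, assuming a psycopg2-style cursor and a hypothetical results table with name and score columns:
row = {'name': 'alice'}  # no 'score' key in this API response

# row.get('score') returns None, which the driver binds as SQL NULL
cursor.execute(
    "INSERT INTO results (name, score) VALUES (%s, %s)",
    (row.get('name'), row.get('score')),
)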
I'm trying to insert data from a JSON string to MySQL using MySQLdb. The total number of columns is fixed. Each row of data from the JSON string does not always have values for each column.
Here is my sample code:
import json
import urllib2
import MySQLdb

vacant_building = 'http://data.cityofchicago.org/resource/7nii-7srd.json?%24where=date_service_request_was_received=%272014-06-02T00:00:00%27'
obj = urllib2.urlopen(vacant_building)
data = json.load(obj)

def insert_mysql(columns, placeholders, data):
    sql = "INSERT INTO vacant_buildings (%s) VALUES (%s)" % (columns, placeholders)
    db = MySQLdb.connect(host="localhost", user="xxxx", passwd="xxxx", db="chicago_data")
    cur = db.cursor()
    cur.execute(sql, data)

for row in data:
    placeholders = ', '.join(['%s'] * len(row))
    columns = ', '.join(c[:64] for c in row.keys())
    row_data = ', '.join(str(value) for value in row.values())
    insert_mysql(columns, placeholders, row_data)
I get the following error:
query = query % tuple([db.literal(item) for item in args])
TypeError: not all arguments converted during string formatting
I'm pretty sure the error has to do with the way I'm inserting the values. I've tried to change this to:
sql = "INSERT INTO vacant_buildings (%s) VALUES (%s) (%s)" % (columns, placeholders, data)
but I get a 1064 error. It's because the values are not enclosed by quotes (').
Thoughts to fix?
In order to parameterize your query using MySQLdb's cursor.execute method, the second argument to execute has to be a sequence of values; in your for loop, you're joining the values together into one string with the following line:
row_data = ', '.join(str(value) for value in row.values())
Since you generated a number of placeholders for your values equal to len(row), you need to supply that many values to cursor.execute. If you gave it only a single string, it will put that entire string into the first placeholder, leaving the others without any arguments. This will throw a TypeError - the message in this case would read, "not enough arguments for format string," but I'm going to assume you simply mixed up when copy/pasting because the opposite case (supplying too many arguments/too few placeholders) reads as you indicate, "not all arguments converted during string formatting."
In order to run an INSERT statement through MySQLdb with a variable set of columns, you could do just as you've done for the columns and placeholders, but I prefer to use mapping types with the extended formatting syntax supported by MySQLdb (e.g., %(name)s instead of %s) to make sure that I've constructed my query correctly and not put the values into any wrong order. I also like using advanced string formatting where possible in my own code.
You could prepare your inputs like this:
max_key_length = 64
columns = ','.join(k[:max_key_length] for k in row.keys())
placeholders = ','.join('%({})s'.format(k[:max_key_length]) for k in row.keys())
row_data = {k[:max_key_length]: str(v) for k, v in row.items()}
Note that the two passes over row.keys() are guaranteed to produce the same ordering, so long as you don't alter the dict in the meantime, so columns and placeholders will line up. Since row_data is a mapping keyed by the same (truncated) names, the values can't end up in the wrong order.
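The insert itself then just passes the mapping through; a short usage sketch, with db and cur as in the insert_mysql function above:
sql = "INSERT INTO vacant_buildings (%s) VALUES (%s)" % (columns, placeholders)
cur.execute(sql, row_data)  # row_data is a dict matching the %(name)s placeholders
db.commit()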
Generally speaking, this should work okay with the sort of code in your insert_mysql function. However, looking at the JSON data you're actually pulling from that URL, you should be aware that you may run into nesting issues; for example:
>>> pprint.pprint(data[0])
{u'address_street_direction': u'W',
u'address_street_name': u'61ST',
u'address_street_number': u'424',
u'address_street_suffix': u'ST',
u'any_people_using_property_homeless_childen_gangs_': True,
u'community_area': u'68',
u'date_service_request_was_received': u'2014-06-02T00:00:00',
u'if_the_building_is_open_where_is_the_entry_point_': u'FRONT',
u'is_building_open_or_boarded_': u'Open',
u'is_the_building_currently_vacant_or_occupied_': u'Vacant',
u'is_the_building_vacant_due_to_fire_': False,
u'latitude': u'41.78353874626324',
u'location': {u'latitude': u'41.78353874626324',
u'longitude': u'-87.63573355602661',
u'needs_recoding': False},
u'location_of_building_on_the_lot_if_garage_change_type_code_to_bgd_': u'Front',
u'longitude': u'-87.63573355602661',
u'police_district': u'7',
u'service_request_number': u'14-00827306',
u'service_request_type': u'Vacant/Abandoned Building',
u'ward': u'20',
u'x_coordinate': u'1174508.30988836',
u'y_coordinate': u'1864483.93566661',
u'zip_code': u'60621'}
The string representation of the u'location' column is:
"{u'latitude': u'41.78353874626324', u'needs_recoding': False, u'longitude': u'-87.63573355602661'}"
You may not want to put that into a database field, especially considering that there are atomic lat/lon fields already in the JSON object.
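One hedged way to deal with that is to serialize any nested value before building row_data, so the location dict goes in as a JSON string (or you could simply drop it, since the atomic lat/lon fields cover it); a sketch:
def flatten_value(value):
    # Nested dicts/lists become JSON strings; scalars are stringified as before.
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return str(value)

row_data = {k[:max_key_length]: flatten_value(v) for k, v in row.items()}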
I have the following code:
query = """
SELECT Coalesce((SELECT sp.param_value
FROM sites_params sp
WHERE sp.param_name = 'ci'
AND sp.site_id = s.id
ORDER BY sp.id DESC
LIMIT 1), -1) AS ci
FROM sites s
WHERE s.deleted = 0
AND s.id = 10
"""
site = db_session.execute(query)
# print site
# <sqlalchemy.engine.result.ResultProxy object at 0x033E63D0>
site = db_session.execute(query).fetchone()
print site # (u'375')
print list(site) # [u'375']
Why does SQLAlchemy return tuples, not dicts, for this query? I want to use the following style to access the results of the query:
print site.ci
# u'375'
This is an old question, but still relevant today. Getting SQLAlchemy to return a dictionary is very useful, especially when working with RESTful APIs that return JSON.
Here is how I did it using the db_session in Python 3:
resultproxy = db_session.execute(query)

a = []
for rowproxy in resultproxy:
    d = {}  # reset for each row so columns don't leak across rows
    # rowproxy.items() returns an array like [(key0, value0), (key1, value1)]
    for column, value in rowproxy.items():
        # build up the dictionary
        d = {**d, **{column: value}}
    a.append(d)
The end result is that the array a now contains your query results in dictionary format.
As for how this works in SQLAlchemy:
The db_session.execute(query) call returns a ResultProxy object.
The ResultProxy object is made up of RowProxy objects.
The RowProxy object has an .items() method that returns key, value tuples of all the items in the row, which can be unpacked as key, value in a for loop.
And here is a one-liner alternative:
[{column: value for column, value in rowproxy.items()} for rowproxy in resultproxy]
From the docs:
class sqlalchemy.engine.RowProxy(parent, row, processors, keymap)
Proxy values from a single cursor row.
Mostly follows “ordered dictionary” behavior, mapping result values to the string-based column name, the integer position of the result in the row, as well as Column instances which can be mapped to the original Columns that produced this result set (for results that correspond to constructed SQL expressions).
has_key(key)
Return True if this RowProxy contains the given key.
items()
Return a list of tuples, each tuple containing a key/value pair.
keys()
Return the list of keys as strings represented by this RowProxy.
Link: http://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.RowProxy.items
Did you take a look at the ResultProxy docs?
It describes exactly what @Gryphius and @Syed Habib M suggest, namely to use site['ci'].
The ResultProxy does not "return a tuple" as you claim - it is (not surprisingly) a proxy that behaves (e.g. prints) like a tuple but also supports dictionary-like access:
From the docs:
Individual columns may be accessed by their integer position,
case-insensitive column name, or by schema.Column object. e.g.:
row = fetchone()
col1 = row[0] # access via integer position
col2 = row['col2'] # access via name
col3 = row[mytable.c.mycol] # access via Column object.
I've built a simple class to work like a database interface in our processes. Here it goes:
from sqlalchemy import create_engine

class DBConnection:
    def __init__(self, db_instance):
        self.db_engine = create_engine(db_instance)  # db_instance is the database URI string
        self.db_engine.connect()

    def read(self, statement):
        """Executes a read query and returns a list of dicts, whose keys are column names."""
        data = self.db_engine.execute(statement).fetchall()
        results = []
        if len(data) == 0:
            return results

        # results from sqlalchemy are returned as a list of tuples;
        # this procedure converts it into a list of dicts
        for row_number, row in enumerate(data):
            results.append({})
            for column_number, value in enumerate(row):
                results[row_number][row.keys()[column_number]] = value
        return results
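Usage might look like this (the URI and table name are placeholders):
db = DBConnection('postgresql://user:password@localhost/testdb')
rows = db.read('SELECT id, links FROM some_table')
# rows is e.g. [{'id': 1, 'links': 'https://...'}, ...]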
You can easily convert each result row to a dictionary by using dict(site).
Then site['ci'] will be available if the ci column exists.
In order to have site.ci (according to https://stackoverflow.com/a/22084672/487460):
from collections import namedtuple
Site = namedtuple('Site', site.keys())
record = Site(*site)
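After that, attribute access works exactly as the question asks:
print(record.ci)  # u'375'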
This may help solve the OP's question. I think the problem he was having is that the row object only contained column values, not the column names themselves, as is the case with ORM queries, where the results have a __dict__ attribute with both keys and values.
python sqlalchemy get column names dynamically?
The easiest way that I found is using a list comprehension and calling dict() on every RowProxy:
site = db_session.execute(query)
result = [dict(row) for row in site]
Based on the Essential SQLAlchemy book:
A ResultProxy is a wrapper around a DBAPI cursor object, and its main
goal is to make it easier to use and manipulate the results of a
statement
Simple select example:
from sqlalchemy.sql import select
stmnt = select([cookies])
result_proxy = connection.execute(stmnt)
results = result_proxy.fetchall()
The results are going to look like this:
# ID, cookie_name, quantity, amount
[
(1, u'chocolate chip', 12, Decimal('0.50')),
(2, u'dark chocolate chip', 1, Decimal('0.75')),
(3, u'peanut butter', 24, Decimal('0.25')),
(4, u'oatmeal raisin', 100, Decimal('1.00'))
]
It makes handling query results easier by allowing access using an index, name, or Column object.
Accessing cookie_name in different ways:
first_row = results[0]
first_row[1]
first_row.cookie_name
first_row[cookies.c.cookie_name]
These all result in u'chocolate chip' and they each reference the exact same data element in the first record of our results variable. This flexibility in access is only part of the power of the ResultProxy.
We can also leverage the ResultProxy as an iterable:
result_proxy = connection.execute(stmnt)
for record in result_proxy:
    print(record.cookie_name)
This method uses a list comprehension: it receives a SQLAlchemy rowset object and returns the same items as a list of dictionaries:
class ResultHelper():
    @classmethod
    def resultproxy_to_list(cls, sql_alchemy_rowset):
        return [{key: value for key, value in rowproxy.items()}
                for rowproxy in sql_alchemy_rowset]
After you call db.execute(sql).fetchall(), you can use the following function to convert the returned rows into a list of dicts:
def query_to_dict(ret):
    if ret is not None:
        return [{key: value for key, value in row.items()}
                for row in ret if row is not None]
    else:
        return [{}]
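A usage sketch, with the db_session and query from the question:
rows = db_session.execute(query).fetchall()
print(query_to_dict(rows))  # e.g. [{u'ci': u'375'}]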