python string parsing issues when saving sql commands to file


I have dict data that I loop through to build INSERT commands for PostgreSQL, where the top-level keys of the dict are the column names and the top-level values are the column values:
data = {
    'id': 45,
    'col1': "foo's",
    'col2': {'dict_key': 5}
}
columns = ', '.join(data.keys())
# replace single quote with double to form a json type for psql column
data['col2'] = str(data['col2']).replace("'", '"')
with open("file.sql", "w") as f:
    command = "INSERT INTO table1({}) VALUES {};"
    f.write(command.format(columns, tuple(data.values())))
The problem is that the output of this is not formatted correctly for sql to execute. This is the output of the above:
INSERT INTO table1(id, col1, col2) VALUES (45, "foo's", '{"dict_key":5}');
The JSON field is formatted correctly, with single quotes around the value. But col1 gets double quotes because the string in col1 contains a single quote. This is a problem because PostgreSQL requires single quotes around string (TEXT) literals.
Is there a better way to parse data into psql insert commands?

Did you try using json.dumps() and repr()?
import json

columns = ', '.join(data.keys())
data['col1'] = repr(data['col1'])
data['col2'] = json.dumps(data['col2'])
...
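One thing to watch with this approach: repr("foo's") still comes out wrapped in double quotes, so repr() alone does not fix col1. Below is a minimal sketch of a different technique, assuming the goal is a literal .sql file: escape an embedded single quote by doubling it, which is how standard SQL (including PostgreSQL) writes a quote inside a string literal, and use json.dumps() for the dict column. sql_literal() is a hypothetical helper, not a library function, and it only handles the value types in this example. When executing statements directly, passing the values as query parameters through the database driver avoids hand-built literals altogether.

```python
import json

def sql_literal(value):
    """Hypothetical helper: render a Python value as a PostgreSQL literal.

    Only covers None, bool, int, float, dict and str; a sketch, not a
    general-purpose escaper.
    """
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        value = json.dumps(value)  # JSON text, e.g. {"dict_key": 5}
    # Standard SQL string literal: wrap in single quotes, double any inner quote
    return "'" + str(value).replace("'", "''") + "'"

data = {'id': 45, 'col1': "foo's", 'col2': {'dict_key': 5}}
columns = ', '.join(data)
values = ', '.join(sql_literal(v) for v in data.values())
command = "INSERT INTO table1({}) VALUES ({});".format(columns, values)
print(command)
# INSERT INTO table1(id, col1, col2) VALUES (45, 'foo''s', '{"dict_key": 5}');
```

The resulting command can be written to file.sql exactly as in the question.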

This appears to be a limitation (or rather an implementation detail) of how Python defines __repr__() for str.
Try this sample code out:
value = 'wont fix'; assert f'{value!r}' == "'wont fix'"
value = 'won\'t fix'; assert f'{value!r}' == '"won\'t fix"'
As can be seen, single quotes are preferred in the repr for strings, unless the string itself contains a single quote - in that case, double quotes are used to wrap the repr for the string.
A "quick and dirty" solution is to implement a custom string subclass, SQStr, which effectively overrides the default repr to always wrap a string with single quotes:
class SQStr(str):
    def __repr__(self):
        value = self.replace("'", r"\'")
        return f"'{value}'"
If you want to also support double-escaped single quotes like r"\\\'", then something like this:
class SQStr(str):
    def __repr__(self, _escaped_sq=r"\'", _tmp_symbol="|+*+|",
                 _default_repr=str.__repr__):
        if "'" in self:
            if _escaped_sq in self:
                value = (self
                         .replace(_escaped_sq, _tmp_symbol)
                         .replace("'", _escaped_sq)
                         .replace(_tmp_symbol, r"\\\'"))
            else:
                value = self.replace("'", _escaped_sq)
            return f"'{value}'"
        # else, string doesn't contain single quotes, so we
        # can use the default str.__repr__()
        return _default_repr(self)
Now it appears to work as expected:
value = 'wont fix'; assert f'{SQStr(value)!r}' == "'wont fix'"
value = 'won\'t fix'; assert f'{SQStr(value)!r}' == r"'won\'t fix'"
# optional
value = r"won't fix, won\'t!"; assert f'{SQStr(value)!r}' == r"'won\'t fix, won\\\'t!'"
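To tie this back to the question, here is a sketch that plugs the simple SQStr into the original tuple-based INSERT. Formatting a tuple uses its repr, which calls repr() on each element, so the overridden __repr__ decides the quoting. Converting the dict with json.dumps() first is an assumption, not part of this answer. One caveat: in PostgreSQL with standard_conforming_strings on (the default since 9.1), a backslash inside an ordinary '...' literal is a plain character, so 'foo\'s' only parses as intended when written as an escape string (E'foo\'s'); doubling the quote ('foo''s') is what works in standard string literals.

```python
import json

class SQStr(str):
    """The simple SQStr from above: repr() always wraps in single quotes."""
    def __repr__(self):
        value = self.replace("'", r"\'")
        return f"'{value}'"

data = {'id': 45, 'col1': "foo's", 'col2': {'dict_key': 5}}
columns = ', '.join(data)

# Dicts become JSON text first; strings are wrapped in SQStr; numbers stay as-is
values = tuple(
    SQStr(json.dumps(v)) if isinstance(v, dict)
    else SQStr(v) if isinstance(v, str)
    else v
    for v in data.values()
)

# Formatting the tuple uses its repr, which calls repr() on every element,
# so SQStr.__repr__ controls how the strings are quoted
command = "INSERT INTO table1({}) VALUES {};".format(columns, values)
print(command)
# INSERT INTO table1(id, col1, col2) VALUES (45, 'foo\'s', '{"dict_key": 5}');
```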

Related

how do i convert a list into a string

I have tried to use the .replace() and .strip() methods but have been unsuccessful. I am trying to print the list as a single string separated by commas.
Does anyone know a way to print it without the [] or the single quotes ''?
def get_format(header1):
    format_lookup = "SELECT ID, FormatName, HeaderRow, ColumnStatus, ColumnMobileID, ColumnVendorID, ColumnTechID, " \
                    "ColumnCallType, ColumnCallDate, ColumnCallTime, ColumnCallTo, ColumnQty, ColumnQtyLabel " \
                    "from dynamic_format WHERE HeaderRow=%s"
    header1 = (str(header1),)
    cursor = connection.cursor()
    cursor.execute(format_lookup, header1)
    record = cursor.fetchone()
    return record

I suppose I'll post my comment as an answer:
In [1]: header1 = ['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']
In [2]: ", ".join(header1)
Out[2]: 'ESN, ORIGINAL_QUANTITY, INVOICE_DATE'
In [3]: print(", ".join(header1))
ESN, ORIGINAL_QUANTITY, INVOICE_DATE
The reason you're getting those errors is that header1 is a list object, and .replace() is a string method.
@sbabti's answer is what you'd use if header1 were a string:
# a single string, what sbabti assumed you had
"['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']"
# a list of strings, what you actually have
['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']
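The difference is easy to see side by side. In the question's get_format(), str(header1) turns the list into its repr, brackets and quotes included, before it is sent as the HeaderRow parameter; assuming HeaderRow stores the plain comma-separated text, joining the list first would be the fix. The mixed list at the end is a made-up example showing that join() only accepts strings.

```python
header1 = ['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']

# str() on a list returns its repr: brackets and quotes included
as_repr = str(header1)
# ", ".join() concatenates the strings themselves with ", " between them
as_joined = ", ".join(header1)

print(as_repr)    # ['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']
print(as_joined)  # ESN, ORIGINAL_QUANTITY, INVOICE_DATE

# join() raises TypeError on non-string items, so convert them first
mixed = ['ESN', 42, 3.5]
print(", ".join(map(str, mixed)))  # ESN, 42, 3.5
```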

When I query MySQL I get a result with \u3000 in it

I'm connecting to a MySQL database, using utf8mb4, with the following code:
import re

def name():
    with conn.cursor() as cursor:
        sql = "select name from fake_user where id = 147951"
        cursor.execute(sql)
        interentname = cursor.fetchall()
        for i in interentname:
            i = str(i)
            new_name = i.strip("',)")
            new_name = new_name.strip("('")
            # return new_name.encode('utf8').decode('unicode_escape')
            return re.sub("[\u3000]", "", new_name)

print(name())
This keeps printing ♚\u3000\u3000 恏😊, and I want to know how to get rid of the \u3000 part.
The above code doesn't remove the \u3000, though. Why is that?
interentname is a tuple
new_name is a str string
How do I decode this properly?

You are turning each row, a tuple, into a string representation:
for i in interentname:
    i = str(i)
Don't do that. A tuple is a sequence of values, and for your specific query, there will be only a single value in it, the value for the name column. Index into the tuple to get the single value:
for row in interentname:
    name = row[0]
You can also use tuple assignment:
for row in interentname:
    name, = row
Note the comma after name there: it tells Python that row must be a sequence with exactly one value, and that this value should be assigned to name. You can even do this in the for loop target:
for name, in interentname:
    print(name)
interentname is a sequence of tuples, not just a single tuple, so each iteration, you get a value like:
>>> row = ('♚\u3000\u3000 恏😊',)
The \u3000 codepoints in there are U+3000 IDEOGRAPHIC SPACE characters, which Python will always echo as \uxxxx escapes when the string is represented (as anything will be inside the standard containers).
By turning a tuple into a string, you then capture the representation as a string:
>>> str(row)
"('♚\\u3000\\u3000 恏😊',)"
Python represents tuples using valid Python syntax, and uses valid Python syntax for strings too. But removing the tuple syntax from that output (so the "(' at the start and ',) at the end) does not give you the proper string value back.
Indexing the tuple object gives you the value in it:
>>> row[0]
'♚\u3000\u3000 恏😊'
>>> print(row[0])
♚　　 恏😊
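A short check makes the failure in the question concrete: after str(row), the ideographic spaces exist only as the six-character escape text \u3000, so the question's re.sub("[\u3000]", ...), which matches the real U+3000 character, finds nothing. Applied to row[0] instead, the same substitution works. The row value is the sample from above.

```python
import re

# One row as returned by cursor.fetchall() for the single-column query
row = ('♚\u3000\u3000 恏😊',)

# str(row) is the tuple's repr, where U+3000 appears as the escape text "\u3000"
as_repr = str(row)
print('\u3000' in as_repr)   # False: no real ideographic space left
print('\\u3000' in as_repr)  # True: only the backslash escape sequence

# Index the tuple to get the real string, then remove the real characters
name = row[0]
cleaned = re.sub("[\u3000]", "", name)
print(cleaned)  # ♚ 恏😊
```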

Convert csv file to json with no quotes around float values

I have some csv files that I need to convert to json. Some of the float values in the csv are numeric strings (to maintain trailing zeros). When converting to json, all keys and values are wrapped in double quotes. I need the numeric string float values to not have quotes, but maintain the trailing zeros.
Here is a sample of the input csv file:
ACCOUNTNAMEDENORM,DELINQUENCYSTATUS,RETIRED,INVOICEDAYOFWEEK,ID,BEANVERSION,ACCOUNTTYPE,ORGANIZATIONTYPEDENORM,HIDDENTACCOUNTCONTAINERID,NEWPOLICYPAYMENTDISTRIBUTABLE,ACCOUNTNUMBER,PAYMENTMETHOD,INVOICEDELIVERYTYPE,DISTRIBUTIONLIMITTYPE,CLOSEDATE,FIRSTTWICEPERMTHINVOICEDOM,HELDFORINVOICESENDING,FEINDENORM,COLLECTING,ACCOUNTNUMBERDENORM,CHARGEHELD,PUBLICID
John Smith,2.0000000000,0.0000000000,5.0000000000,1234567.0000000000,69.0000000000,1.0000000000,,4321987.0000000000,1,000-000-000-00,10012.0000000000,10002.0000000000,3.0000000000,,1.0000000000,0,,0,000-000-000-00,0,bc:1234346
The json output I am getting is:
{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":"2.0000000000","RETIRED":"0.0000000000","INVOICEDAYOFWEEK":"5.0000000000","ID":"1234567.0000000000","BEANVERSION":"69.0000000000","ACCOUNTTYPE":"1.0000000000","ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":"4321987.0000000000","NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":"12345.0000000000","INVOICEDELIVERYTYPE":"98765.0000000000","DISTRIBUTIONLIMITTYPE":"3.0000000000","CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":"1.0000000000","HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"xx:1234346"}
Here is the code I am using:
import csv
import json
csvfile = open('output2.csv', 'r')
jsonfile = open('output2.json', 'w')
readHeaders = csv.reader(csvfile)
fieldnames = next(readHeaders)
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile, separators=(',', ':'))
    jsonfile.write('\n')
I would like the output to have no quotes around float values, similar to the following:
{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":2.0000000000,"RETIRED":0.0000000000,"INVOICEDAYOFWEEK":5.0000000000,"ID":1234567.0000000000,"BEANVERSION":69.0000000000,"ACCOUNTTYPE":1.0000000000,"ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":4321987.0000000000,"NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":12345.0000000000,"INVOICEDELIVERYTYPE":98765.0000000000,"DISTRIBUTIONLIMITTYPE":3.0000000000,"CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":1.0000000000,"HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"xx:1234346"}

Now that I understand your question better from your comments, here's a completely different answer. Note that it doesn't use the json module and just does the needed processing "manually". Although it probably could be done using the module, getting it to format the Python data types it recognizes by default differently can be fairly involved (I know from experience), compared to the relatively simple logic used below.
Another note: like your code, this converts each row of the CSV file into a valid JSON object and writes each one to the file on a separate line. However, the contents of the resulting file technically won't be valid JSON as a whole, because the individual objects would need to be comma-separated and enclosed in [ ] brackets (i.e. become a valid JSON array).
import csv

with open('output2.csv', 'r', newline='') as csvfile, \
     open('output2.json', 'w') as jsonfile:
    for row in csv.DictReader(csvfile):
        newfmt = []
        for field, value in row.items():
            field = '"{}"'.format(field)
            try:
                float(value)
            except ValueError:
                value = 'null' if value == '' else '"{}"'.format(value)
            else:
                # Integer-looking values (e.g. "1", "0") stay quoted strings;
                # only values with a decimal part are written unquoted.
                try:
                    int(value)
                except ValueError:
                    pass
                else:
                    value = '"{}"'.format(value)
            newfmt.append((field, value))
        json_repr = '{' + ','.join(':'.join(pair) for pair in newfmt) + '}'
        jsonfile.write(json_repr + '\n')
This is the JSON written to the file:
{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":2.0000000000,"RETIRED":0.0000000000,"INVOICEDAYOFWEEK":5.0000000000,"ID":1234567.0000000000,"BEANVERSION":69.0000000000,"ACCOUNTTYPE":1.0000000000,"ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":4321987.0000000000,"NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":12345.0000000000,"INVOICEDELIVERYTYPE":98765.0000000000,"DISTRIBUTIONLIMITTYPE":3.0000000000,"CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":1.0000000000,"HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"bc:1234346"}
Shown again below with added whitespace:
{"ACCOUNTNAMEDENORM": "John Smith",
"DELINQUENCYSTATUS": 2.0000000000,
"RETIRED": 0.0000000000,
"INVOICEDAYOFWEEK": 5.0000000000,
"ID": 1234567.0000000000,
"BEANVERSION": 69.0000000000,
"ACCOUNTTYPE": 1.0000000000,
"ORGANIZATIONTYPEDENORM": null,
"HIDDENTACCOUNTCONTAINERID": 4321987.0000000000,
"NEWPOLICYPAYMENTDISTRIBUTABLE": "1",
"ACCOUNTNUMBER": "000-000-000-00",
"PAYMENTMETHOD": 12345.0000000000,
"INVOICEDELIVERYTYPE": 98765.0000000000,
"DISTRIBUTIONLIMITTYPE": 3.0000000000,
"CLOSEDATE": null,
"FIRSTTWICEPERMTHINVOICEDOM": 1.0000000000,
"HELDFORINVOICESENDING": "0",
"FEINDENORM": null,
"COLLECTING": "0",
"ACCOUNTNUMBERDENORM": "000-000-000-00",
"CHARGEHELD": "0",
"PUBLICID": "bc:1234346"}
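For testing, the same rules can be wrapped in a function and fed an in-memory CSV. row_to_json_line() is a hypothetical name, and it is a variation on the answer: field names and string values are encoded with json.dumps() instead of '"{}"'.format(), so an embedded double quote or backslash is escaped rather than producing broken JSON. The sample uses a few columns from the question's data.

```python
import csv
import io
import json

def row_to_json_line(row):
    """Hypothetical helper: the answer's rules, with json.dumps() for strings."""
    parts = []
    for field, value in row.items():
        try:
            float(value)
        except ValueError:
            # Non-numeric: empty -> null, anything else -> escaped JSON string
            text = 'null' if value == '' else json.dumps(value)
        else:
            try:
                int(value)
            except ValueError:
                text = value              # decimal text written unquoted, zeros kept
            else:
                text = json.dumps(value)  # integer-looking values stay quoted
        parts.append(json.dumps(field) + ':' + text)
    return '{' + ','.join(parts) + '}'

sample = io.StringIO(
    'ACCOUNTNAMEDENORM,DELINQUENCYSTATUS,CLOSEDATE,COLLECTING,PUBLICID\n'
    'John Smith,2.0000000000,,0,bc:1234346\n'
)
for row in csv.DictReader(sample):
    line = row_to_json_line(row)
    print(line)
# {"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":2.0000000000,"CLOSEDATE":null,"COLLECTING":"0","PUBLICID":"bc:1234346"}
```

Like the original loop, this trusts float(): strings such as nan or inf also pass that check and would be written unquoted, which is not valid JSON, so data containing them would need an extra test.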

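Might be a bit of overkill, but with pandas it would be pretty simple (note that pandas parses these columns as floats, so trailing zeros such as those in 2.0000000000 are not preserved):
import pandas as pd
data = pd.read_csv('output2.csv')
# one JSON object per line, like the question's output
data.to_json('output2.json', orient='records', lines=True)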

One solution is to use a regular expression to see if the string value looks like a float, and convert it to a float if it is.
import re
null = None
j = {"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":"2.0000000000",
"RETIRED":"0.0000000000","INVOICEDAYOFWEEK":"5.0000000000",
"ID":"1234567.0000000000","BEANVERSION":"69.0000000000",
"ACCOUNTTYPE":"1.0000000000","ORGANIZATIONTYPEDENORM":null,
"HIDDENTACCOUNTCONTAINERID":"4321987.0000000000",
"NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00",
"PAYMENTMETHOD":"12345.0000000000","INVOICEDELIVERYTYPE":"98765.0000000000",
"DISTRIBUTIONLIMITTYPE":"3.0000000000","CLOSEDATE":null,
"FIRSTTWICEPERMTHINVOICEDOM":"1.0000000000","HELDFORINVOICESENDING":"0",
"FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00",
"CHARGEHELD":"0","PUBLICID":"xx:1234346"}
for key in j:
    if j[key] is not None:
        if re.match(r"^\d+?\.\d+?$", j[key]):
            j[key] = float(j[key])
I used null = None here to deal with the "null"s that show up in the JSON. But you can replace 'j' here with each CSV row you're reading, then use this to update the row before writing it back with the floats replacing the strings.
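If you're OK with converting any numeric string into a float, you can skip the regular expression (the re.match() call). Note, though, that j[key].isnumeric() is not a substitute here: it returns False for any string containing a decimal point, such as "2.0000000000". Wrapping float(j[key]) in a try/except ValueError accepts any numeric string.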
EDIT: I don't think floats in Python handle the "precision" in a way you might think. It may look like 2.0000000000 is being "truncated" to 2.0, but I think this is more of a formatting and display issue, rather than losing information. Consider the following examples:
>>> float(2.0000000000)
2.0
>>> float(2.00000000001)
2.00000000001
>>> float(1.00) == float(1.000000000)
True
>>> float(3.141) == float(3.140999999)
False
>>> float(3.141) == float(3.1409999999999999)
True
>>> print('%.10f' % 3.14)
3.1400000000
It's possible though to get the JSON to have those zeroes, but in that case it comes down to treating the number as a string, namely a formatted one.
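The precision point can be checked directly. This sketch (the three-column row is a made-up subset of the question's data) applies the regex conversion from above, dumps the result, and then formats the float back to ten decimals. It also shows why str.isnumeric() cannot replace the regex for these values.

```python
import json
import re

row = {"DELINQUENCYSTATUS": "2.0000000000", "COLLECTING": "0", "PUBLICID": "bc:1234346"}

# Same regex as above: convert only strings that look like digits.digits
converted = {
    key: float(value) if re.match(r"^\d+\.\d+$", value) else value
    for key, value in row.items()
}
dumped = json.dumps(converted, separators=(',', ':'))
print(dumped)  # {"DELINQUENCYSTATUS":2.0,"COLLECTING":"0","PUBLICID":"bc:1234346"}

# The float stores no trailing zeros; they only reappear when formatting
print('%.10f' % converted["DELINQUENCYSTATUS"])  # 2.0000000000

# isnumeric() is False as soon as a decimal point is present
print("2.0000000000".isnumeric())  # False
print("0".isnumeric())             # True
```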

Interestingly, I was looking for the opposite of what you want: results with the quotes.
Actually, it's very easy to remove them automatically: just remove the parameter separators=(',', ':').
For me, just adding this parameter was enough.
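For reference, separators only controls the whitespace json.dumps() puts after commas and colons; whether a value gets quotes depends on its Python type. In this sketch with a made-up two-key row, the str value is quoted and the float is not in both outputs, and only the spacing changes.

```python
import json

# One value stored as a string, one as a float
row = {"RETIRED": "0.0000000000", "COLLECTING": 0.0}

default = json.dumps(row)                         # default separators: ', ' and ': '
compact = json.dumps(row, separators=(',', ':'))  # no whitespace after , and :

print(default)  # {"RETIRED": "0.0000000000", "COLLECTING": 0.0}
print(compact)  # {"RETIRED":"0.0000000000","COLLECTING":0.0}
```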

Python: remove double quotes from JSON dumps

I have a database which returns the rows as lists in following format:
data = ['(1000,"test value",0,0.00,0,0)', '(1001,"Another test value",0,0.00,0,0)']
After that, I use json_str = json.dumps(data) to get a JSON string. After applying json.dumps(), I get the following output:
json_str = ["(1000,\"test value\",0,0.00,0,0)", "(1001,\"Another test value\",0,0.00,0,0)"]
However, I need the JSON string in the following format:
json_str = [(1000,\"test value\",0,0.00,0,0), (1001,\"Another test value\",0,0.00,0,0)]
So basically, I want to remove the surrounding double quotes. I tried to accomplish this with json_str = json_str.strip('"') but this doesn't work. Then, I tried json_str = json_str.replace('"', '') but this also removes the escaped quotes.
Does anybody know a way to accomplish this, or is there a function in Python similar to json.dumps() which produces the same result, but without the surrounding double quotes?

You are dumping a list of strings, so json.dumps() does exactly what you are asking for. A rather ugly solution to your problem could be something like the following:
import json

def split_and_convert(s):
    # Drop the surrounding parentheses and split on commas
    # (this breaks if the text field itself contains a comma)
    bits = s[1:-1].split(',')
    return (
        int(bits[0]), bits[1].strip('"'), float(bits[2]),
        float(bits[3]), float(bits[4]), float(bits[5])
    )

data_to_dump = [split_and_convert(s) for s in data]
json.dumps(data_to_dump)
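A sketch of a sturdier parse, assuming every row string has the shape shown in the question: the csv module's reader understands the double-quoted text field, so a comma inside the text does not break the split, and it strips the quotes. parse_row() is a hypothetical name, and the column types (int, text, int, float, int, int) are guessed from the sample values. JSON has no tuple syntax, so the rows come out as arrays in [ ] rather than ( ); the parenthesized output asked for in the question would not be valid JSON.

```python
import csv
import json

data = ['(1000,"test value",0,0.00,0,0)', '(1001,"Another test value",0,0.00,0,0)']

def parse_row(s):
    """Hypothetical helper: parse one '(...)' row string into a list of values."""
    # Drop the parentheses; csv.reader handles the double-quoted text field
    fields = next(csv.reader([s[1:-1]]))
    # Column types are assumed from the sample rows
    return [int(fields[0]), fields[1], int(fields[2]),
            float(fields[3]), int(fields[4]), int(fields[5])]

rows = [parse_row(s) for s in data]
json_str = json.dumps(rows)
print(json_str)
# [[1000, "test value", 0, 0.0, 0, 0], [1001, "Another test value", 0, 0.0, 0, 0]]
```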

Python program treating dictionary like a string

I set up a dictionary, and filled it from a file, like so:
import ast

filedusers = {} # cheap way to keep track of users, not for production
FILE = open(r"G:\School\CS442\users.txt", "r")
filedusers = ast.literal_eval("\"{" + FILE.readline().strip() + "}\"")
FILE.close()
then later I did a test on it, like this:
if not filedusers.get(words[0]):
where words[0] is a string for a username, but I get the following error:
'str' object has no attribute 'get'
but I verified already that after the FILE.close() I had a dictionary, and it had the correct values in it.
Any idea what's going on?

literal_eval takes a string, and converts it into a python object. So, the following is true...
ast.literal_eval('{"a" : 1}')
>> {'a' : 1}
However, you are adding in some quotations that aren't needed. If your file simply contained an empty dictionary ({}), then the string you create would look like this...
ast.literal_eval('"{}"') # The quotes that are here make it return the string "{}"
>> '{}'
So, the solution would be to change the line to...
ast.literal_eval("{" + FILE.readline().strip() + "}")
...or...
ast.literal_eval(FILE.readline().strip())
...depending on your file layout. As written, literal_eval sees your string as an ACTUAL string because of the quotes.
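A minimal sketch of the fix, assuming each line of users.txt holds dictionary entries without the braces, such as 'alice': 'pw1', 'bob': 'pw2' (a made-up layout; load_users() is likewise a hypothetical helper). With the question's file it would be called as load_users(FILE.readline()).

```python
import ast

def load_users(line):
    """Hypothetical helper: wrap the line in braces (no extra quotes) and parse it."""
    return ast.literal_eval("{" + line.strip() + "}")

users = load_users("'alice': 'pw1', 'bob': 'pw2'\n")
print(type(users).__name__)  # dict
print(users.get('alice'))    # pw1

# With the extra quotes from the question, the result is just a string
wrapped = ast.literal_eval("\"{" + "'alice': 'pw1'" + "}\"")
print(type(wrapped).__name__)  # str
```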

>>> import ast
>>> username = "asd: '123'"
>>> filedusers = ast.literal_eval("\"{" + username + "}\"")
>>> print filedusers, type(filedusers)
{asd: '123'} <type 'str'>
You don't have a dictionary, it just looks like one. You have a string.

Python is dynamically typed: it does not require you to define variables as a specific type. And it lets you define variables implicitly. What you are doing is defining filedusers as a dictionary, and then redefining it as a string by assigning the result of ast.literal_eval to it.
EDIT: You need to remove those quotes. ast.literal_eval('"{}"') evaluates to a string. ast.literal_eval('{}') evaluates to a dictionary.
