I'm connecting to a MySQL database, using utf8mb4, with the following code:
def name():
    with conn.cursor() as cursor:
        sql = "select name from fake_user where id = 147951"
        cursor.execute(sql)
        interentname = cursor.fetchall()
        for i in interentname:
            i = str(i)
            new_name = i.strip("',)")
            new_name = new_name.strip("('")
            # return new_name.encode('utf8').decode('unicode_escape')
            return re.sub("[\u3000]", "", new_name)

print(name())
This keeps printing ♚\u3000\u3000 恏😊, and I want to know how to get rid of the \u3000 part.
The above code doesn't get rid of the \u3000, though. Why is that?
interentname is a tuple, and new_name is a str.
How do I decode this properly?
You are turning each row, a tuple, into a string representation:
for i in interentname:
    i = str(i)
Don't do that. A tuple is a sequence of values, and for your specific query, there will be only a single value in it, the value for the name column. Index into the tuple to get the single value:
for row in interentname:
    name = row[0]
You can also use tuple assignment:
for row in interentname:
    name, = row
Note the comma after name there; it tells Python that row must be a sequence with exactly one value, and that that one value should be assigned to name. You can even do this in the for loop target:
for name, in interentname:
    print(name)
interentname is a sequence of tuples, not just a single tuple, so on each iteration you get a value like:
>>> row = ('♚\u3000\u3000 恏😊',)
The \u3000 codepoints in there are U+3000 IDEOGRAPHIC SPACE characters, which Python will always echo as \uxxxx escapes when the string is represented (as anything will be inside the standard containers).
By turning a tuple into a string, you then capture the representation as a string:
>>> str(row)
"('♚\\u3000\\u3000 恏😊',)"
Python represents tuples using valid Python syntax, and uses valid Python syntax for the strings inside them too. But removing the tuple syntax from that output (the "(' at the start and the ',) at the end) does not give you the proper string value back.
Indexing the tuple object gives you the value in it:
>>> row[0]
'♚\u3000\u3000 恏😊'
>>> print(row[0])
♚ 恏😊
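To see the difference concretely, here is a small self-contained sketch using the literal from above; no database is needed:

```python
row = ('♚\u3000\u3000 恏😊',)

# str(row) captures the tuple's *representation*; the U+3000 characters
# show up as literal backslash-u escapes inside that text
assert "\\u3000" in str(row)

# indexing gives the actual string back, and the ideographic spaces can
# simply be removed with str.replace -- no regex or decoding needed
name = row[0].replace("\u3000", "")
print(name)  # ♚ 恏😊
```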
I have dict data that I am looping through to form INSERT commands for PostgreSQL, where the parent keys of the dicts are the column names and the parent values are the column values:
data = {
    'id': 45,
    'col1': "foo's",
    'col2': {'dict_key': 5}
}

columns = ', '.join(data.keys())

# replace single quotes with double quotes to form a json value for the psql column
data['col2'] = str(data['col2']).replace("'", '"')

with open("file.sql", "w") as f:
    command = "INSERT INTO table1({}) VALUES {};"
    f.write(command.format(columns, tuple(data.values())))
The problem is that the output of this is not formatted correctly for SQL to execute. This is the output of the above:
INSERT INTO table1(id, col1, col2) VALUES (45, "foo's", '{"dict_key": 5}');
The json field is formatted correctly, with single quotes around the value. But col1 keeps the double quotes because the string contains a single quote. This is a problem because PostgreSQL requires single quotes to identify TEXT input.
Is there a better way to parse data into psql insert commands?
Did you try using json.dumps() and repr()?
columns = ', '.join(data.keys())
data['col1'] = repr(data['col1'])
data['col2'] = json.dumps(data['col2'])
...
This appears to be a limitation (or rather an implementation detail) in Python, in how __repr__() is defined for str.
Try this sample code out:
value = 'wont fix'; assert f'{value!r}' == "'wont fix'"
value = 'won\'t fix'; assert f'{value!r}' == '"won\'t fix"'
As can be seen, single quotes are preferred in the repr for strings, unless the string itself contains a single quote - in that case, double quotes are used to wrap the repr for the string.
A "quick and dirty" solution is to implement a custom string subclass, SQStr, which effectively overrides the default repr to always wrap a string with single quotes:
class SQStr(str):
    def __repr__(self):
        value = self.replace("'", r"\'")
        return f"'{value}'"
If you want to also support double-escaped single quotes like r"\\\'", then something like this:
class SQStr(str):
    def __repr__(self, _escaped_sq=r"\'", _tmp_symbol="|+*+|",
                 _default_repr=str.__repr__):
        if "'" in self:
            if _escaped_sq in self:
                value = (self
                         .replace(_escaped_sq, _tmp_symbol)
                         .replace("'", _escaped_sq)
                         .replace(_tmp_symbol, r"\\\'"))
            else:
                value = self.replace("'", _escaped_sq)
            return f"'{value}'"
        # else, the string doesn't contain single quotes, so we
        # can use the default str.__repr__()
        return _default_repr(self)
Now it appears to work as expected:
value = 'wont fix'; assert f'{SQStr(value)!r}' == "'wont fix'"
value = 'won\'t fix'; assert f'{SQStr(value)!r}' == r"'won\'t fix'"
# optional
value = r"won't fix, won\'t!"; assert f'{SQStr(value)!r}' == r"'won\'t fix, won\\\'t!'"
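Applying this to the earlier INSERT example, here is a rough sketch; the wrapping of values by type is my own illustration, not part of the original answer, and note that standard SQL escapes an embedded quote as '' rather than \', so treat this as a demonstration of the repr trick, not a production-ready quoting scheme:

```python
import json

class SQStr(str):
    # always wrap the repr in single quotes, escaping embedded ones
    def __repr__(self):
        value = self.replace("'", r"\'")
        return f"'{value}'"

data = {'id': 45, 'col1': "foo's", 'col2': {'dict_key': 5}}
columns = ', '.join(data.keys())
values = tuple(
    SQStr(json.dumps(v)) if isinstance(v, dict)   # dicts become json text
    else SQStr(v) if isinstance(v, str)           # strings get single-quote repr
    else v                                        # numbers pass through
    for v in data.values()
)
sql = "INSERT INTO table1({}) VALUES {};".format(columns, values)
print(sql)
# INSERT INTO table1(id, col1, col2) VALUES (45, 'foo\'s', '{"dict_key": 5}');
```

Formatting a tuple calls repr() on each element, which is why the SQStr wrapper changes the quoting of the whole VALUES clause.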
Saving a string into a sqlite table, retrieving it again, and comparing it to the original requires some filters to work, and I don't know exactly why.
tl;dr
How can I retrieve string data from the SQLite DB without requiring filter no. 3, as it is dangerous for more complex strings?
import sqlite3
RAWSTRING = 'This is a DB Teststing'
# create database and table
currentdb = sqlite3.connect('test.db')
currentdb.execute('''CREATE TABLE tickertable (teststring text)''')
# enter RAWSTRING into database
currentdb.execute('''INSERT INTO tickertable VALUES(?);''', (RAWSTRING,))
# get RAWSTRING from database
cursorObj = currentdb.cursor()
cursorObj.execute('SELECT * FROM tickertable')
DB_RAWSTRING = cursorObj.fetchall()
currentdb.commit()
currentdb.close()
# Prints This is a DB Teststing
print('originalstring : ', RAWSTRING)
# Prints [('This is a DB Teststing',)]
print('retrieved from DB: ', DB_RAWSTRING)
# Get first entry from the list, because fetchall gives a list
FILTER1_DB_RAWSTRING = DB_RAWSTRING[0]
# Convert the list element to a string, because it's still a tuple and comparing it to a string fails
FILTER2_DB_RAWSTRING = str(FILTER1_DB_RAWSTRING)
# Remove annoying db extra characters (I don't know why they exist anyway)
FILTER3_DB_RAWSTRING = FILTER2_DB_RAWSTRING.replace("'", "").replace("(", "").replace(")", "").replace(",", "")

if RAWSTRING == FILTER3_DB_RAWSTRING:
    print('Strings are the same, as they should be')
else:
    print('Strings are not the same because of db weirdness')
So here's your problem: fetchall returns a list of tuples. This means that casting them to a string puts pesky parentheses around each row and commas between the elements of each row. If you'd like to retrieve the raw information from each column, that can be done by indexing the tuples:
entries = cursorObj.fetchall()
first_row = entries[0]
first_item = first_row[0]
print(first_item)
This ought to print just the content of the first row and column in the DB. If not, let me know!
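For completeness, here is the whole round trip with the indexing fix, sketched against an in-memory database so it can run standalone (':memory:' stands in for test.db):

```python
import sqlite3

RAWSTRING = 'This is a DB Teststing'

currentdb = sqlite3.connect(':memory:')  # in-memory stand-in for test.db
currentdb.execute('CREATE TABLE tickertable (teststring text)')
currentdb.execute('INSERT INTO tickertable VALUES (?)', (RAWSTRING,))

cursorObj = currentdb.cursor()
cursorObj.execute('SELECT * FROM tickertable')
rows = cursorObj.fetchall()   # a list of tuples, e.g. [('This is a DB Teststing',)]
retrieved = rows[0][0]        # first row, first column -- no string mangling needed
currentdb.close()

assert retrieved == RAWSTRING
```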
I have tried to use the .replace or .strip method but have been unsuccessful. I am trying to print out a single string from a list, separated by commas.
Does anyone know a way to make it print out with no [] or single quotes ''?
def get_format(header1):
    format_lookup = "SELECT ID, FormatName, HeaderRow, ColumnStatus, ColumnMobileID, ColumnVendorID, ColumnTechID, " \
                    "ColumnCallType, ColumnCallDate, ColumnCallTime, ColumnCallTo, ColumnQty, ColumnQtyLabel " \
                    "from dynamic_format WHERE HeaderRow=%s"
    header1 = (str(header1),)
    cursor = connection.cursor()
    cursor.execute(format_lookup, header1)
    record = cursor.fetchone()
    return record
I suppose I'll post my comment as an answer:
In [1]: header1 = ['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']
In [2]: ", ".join(header1)
Out[2]: 'ESN, ORIGINAL_QUANTITY, INVOICE_DATE'
In [3]: print(", ".join(header1))
ESN, ORIGINAL_QUANTITY, INVOICE_DATE
The reason you're getting those errors is that header1 is a list object and .replace() is a string method.
sbabtizied's answer is what you'd use if header1 were a string:
# a single string, what sbabti assumed you had
"['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']"
# a list of strings, what you actually have
['ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE']
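Putting the two together: if get_format() returns a tuple from fetchone(), the fields can be joined into one comma-separated string. The sample record below is hypothetical, just shaped like a fetchone() result:

```python
# hypothetical record, shaped like a fetchone() result
record = (12, 'CSV Export', 'ESN', 'ORIGINAL_QUANTITY', 'INVOICE_DATE')

# str() each field first, since the row can mix ints and strings
line = ", ".join(str(field) for field in record)
print(line)  # 12, CSV Export, ESN, ORIGINAL_QUANTITY, INVOICE_DATE
```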
I have the following code that reads lines from a database and writes the matching results to a csv file. The problem I am running into is that there are occasionally carriage returns / line feeds in some fields in different rows, which cause the csv file to be unusable because there are bogus rows.
For example, here is a sample of what happens when there are carriage returns / line feeds in the SQL data and the affect it has on the file. ... Sample content of messed up file:
field1|field2|field3|field4|field5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|val
ue 3|value 4|value 5
value 1|value 2|value 3|va
lue 4|value 5
Here is the code that writes the results of a SQL query to the output file. What I am trying to do is strip any results that have carriage returns / line feeds.
'''
While loop to read each row; compares row[2] (updated) against the last record processed
'''
latest = params  # declare 'latest' variable for consumption by the while loop
while row:
    if row[2] > latest:
        latest = row[2]
    logger.debug("[%s] - Writing row %s", correlationId, row)
    writer.writerow(row)
    row = cursor.fetchone()
logger.info("[%s] - last letter date %s " % (correlationId, lastProcessed))
lastProcessedLog = open(LAST_PROCESSED_LOGFILE, 'wt')
lastProcessedString = str(latest)
lastProcessedString = lastProcessedString[0:19]
lastProcessedLog.write(lastProcessedString)
lastProcessedLog.close()
conn.close()
ofile.close()
logger.info("[%s] - Copying %s to root for loadBackflow as %s", correlationId, writeFile, outfile)
shutil.copyfile(writeFile, outfile)
logger.info("[%s] - Moving %s to completion folder %s", correlationId, writeFile, completionFolder)
shutil.move(writeFile, completionFolder)
I have tried changing the writer.writerow(row) line to include a replace, but I get an error. Similarly, I get errors when trying to use replace with row = row.replace("\r\n", "") ... I have pasted my attempts and the corresponding errors below.
Any insights on how I can strip carriage returns / line feeds at the time they are being read from the SQL query results into the data file are much appreciated.
Thanks in advance! :)
# Attempt1:
writer.writerow(row).replace("\r\n", "")
# Error:
Unexpected error: 'NoneType' object has no attribute 'replace'
# Attempt2:
row = row.replace("\r\n", "")
#Error:
Unexpected error: 'tuple' object has no attribute 'replace'
#Attempt3:
row = row.replace("\r", "")
row = row.replace("\n", "")
#Error:
Unexpected error: 'tuple' object has no attribute 'replace'
Programming by permutation is a well-known antipattern... db.cursor.fetchone() returns a tuple, which - as the error message tells you - has no replace() method (what would be the semantics of (1, 2, 3).replace("a string", "another string")?), and csv.writer.writerow() returns None (and since it has already written the row by the time it returns, what would be the point of trying to modify the result afterward anyway?).
So, to make a long story short, you have two options:
modify the strings in the row before passing it to writerow()
use a csv format that allows you to escape newlines
I don't know exactly how you plan on using the generated csv, but solution #2 is still the best approach if possible - these newlines might be there for a reason - and it only requires passing the right arguments when instantiating your csv.writer.
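As a sketch of option #2: the csv module's default QUOTE_MINIMAL behavior already quotes any field containing the delimiter, the quote character, or a line break, so an embedded newline stays inside one logical record:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter='|')  # QUOTE_MINIMAL is the default

# the middle field contains a carriage return / line feed
writer.writerow(["value 1", "value 2\r\nstill value 2", "value 3"])

# the field with the embedded newline comes out quoted, so any csv reader
# configured with the same dialect will reassemble it as a single field
print(buf.getvalue())
```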
If that's definitely not an option, solution #1 needs a bit more work, as tuples and strings are immutable:
def preprocess_item(item):
    if isinstance(item, str):
        return item.replace("\n", " ").replace("\r", " ")
    return item

def preprocess_row(row):
    return tuple(preprocess_item(item) for item in row)

def yourfunction(whatever):
    # some code here
    row = cursor.fetchone()
    row = preprocess_row(row)
    writer.writerow(row)
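A quick check of those helpers on a hypothetical row shows the newlines replaced while non-string fields pass through untouched:

```python
def preprocess_item(item):
    # only strings can carry embedded newlines; other types pass through
    if isinstance(item, str):
        return item.replace("\n", " ").replace("\r", " ")
    return item

def preprocess_row(row):
    return tuple(preprocess_item(item) for item in row)

row = (42, "value 2\r\nstill value 2", None)
print(preprocess_row(row))  # (42, 'value 2  still value 2', None)
```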
The fetchone() method returns the row as a tuple. It may contain more than one field value, depending on the table. You cannot perform the string method replace on a tuple, which explains the error you see. I don't know the structure of the tuple returned by fetchone, so I'll assume the value you want is the first element.
stripped_row = row[0].replace("\r", "")
stripped_row = stripped_row.replace("\n", "")
I have a file like this
SEQ_NUM|ICS_ORIG_STRT_DT|EDW_FIRST_OUT_IFP_DT|CURR_DT|DEV_GE_NUM_DAYS|DEV_LE_NUM_DAYS|FILENAME|CAMPAIGN_NAME_DESC| CAMPAIGN_WAVE|MARKET_SEGMENT|CAMPAIGN_NAME|CAMPAIGN_WAVE_RUN|EFFORT_TYPE|EFFORT_NUM|UU_ID|PRINT_ACCT_NUM|PRINT_PUB_CD|PREFIX|SUFFIX|FIRST_NAME|LAST_NAME|EMAIL|PHONE_NUM|BUS_PHONE|CO_NAME|STREET_NUM|ADDR|ADDR2|CITY|STATE_PROVINCE|ZIP_POSTAL|ZIP4|TRACK_CD|VANITY_URL|BILL_FORM|LETTER_TEXT|OUTER
130|20140401|00010101|20140728|85||Apr14WSJ_CNYR_NOEMAIL_CAP_TM_20140728.txt|Apr14WSJ_CNYR_NOEMAIL_CAP_TM|WSJ_CNYR_NOEMAIL_CAP_TM|CNYR|WSJ_CNYR_NOEMAIL_CAP|Apr14|TM|||032714296269|J|||ARTHUR|MURPHY||9784255147|||46|LANTERN###WAY||SHIRLEY|MA|01464|2136|aaqecw0c||||
I am trying to get PRINT_PUB_CODE = 130, PRINT_ACCT_NUM = 20140401, CO_NAME = 00010101, PREFIX = 20140728, and so on.
I am new to python and tried this code along with other code, but the results come out like:
130
20140401
None
aaqecw0c
Please let me know what I am doing wrong and what I can do to fix this.
The code is:
#!/usr/bin/env python
import csv

csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)

with open('temp1.txt', 'rb') as csvfile:
    for row in csv.DictReader(csvfile, dialect='piper'):
        print row['PRINT_PUB_CODE']
        print row['PRINT_ACCT_NUM']
        print row['CO_NAME']
        print row['PREFIX']
The original file is at http://pastebin.com/QFvLwcHu
print is writing out exactly what you tell it to, namely the values themselves. If you want name/value pairs, you'll need to arrange to format it yourself.
Try this in place of the existing print statements:
names = ('PRINT_PUB_CODE', 'PRINT_ACCT_NUM', 'CO_NAME', 'PREFIX')
print(", ".join("%s=%r" % (n, row[n]) for n in names))
Here's how the above code works:
", ".join(sequence) takes a sequence of strings and concatenates them together with ", " between each value.
% is the string formatting operator, which takes the string on the left and replaces patterns like %s and %r with formatted values from the tuple on the right
%s is the format code for a string, and it's used for the name in front of the = sign.
%r is the format code for "string representation of," or repr, and it's used to convert the value in the row to a string.
for n in names is part of a generator expression, making Python evaluate the % operator once for each element in the names tuple.
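With a hypothetical row dict matching the output shown in the question, the line above produces all the name/value pairs on one line:

```python
# hypothetical subset of one DictReader row
row = {'PRINT_PUB_CODE': '130', 'PRINT_ACCT_NUM': '20140401',
       'CO_NAME': None, 'PREFIX': 'aaqecw0c'}

names = ('PRINT_PUB_CODE', 'PRINT_ACCT_NUM', 'CO_NAME', 'PREFIX')
print(", ".join("%s=%r" % (n, row[n]) for n in names))
# PRINT_PUB_CODE='130', PRINT_ACCT_NUM='20140401', CO_NAME=None, PREFIX='aaqecw0c'
```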