Formatting string for readability and printing - python

I am trying to format the string below, which is built from variables, so that it is easier to read. Right now it takes up 199 characters on a single line of the script, and every attempt I make to break it up leaves large gaps in the printed output. Can anyone shed some light? I tried wrapping it in """ triple quotes and ending lines with \, but it still has extra spaces when printed or logged.
copy_sql = "COPY {0} FROM 's3://{1}/{2}' CREDENTIALS 'aws_access_key_id={3};aws_secret_access_key={4}' {5}; ".format(table_name,bucket,key,aws_access_key_id,aws_secret_access_key,options)
The desired result would be something to this effect:
copy_sql = "COPY {0} FROM 's3://{1}/{2}' \
CREDENTIALS 'aws_access_key_id={3};aws_secret_access_key={4}' {5}; \
".format(table_name,bucket,key, \
aws_access_key_id,aws_secret_access_key,options)
However when I print it I get large spaces between .gz and credentials:
COPY analytics.table FROM 's3://redshift-fake/storage/2017-11-02/part-00000.gz' CREDENTIALS 'aws_access_key_id=SECRET;aws_secret_access_key=SECRET' DELIMITER '\t' dateformat 'auto' fillrecord removequotes gzip;
I am thinking this would still work but I would like to clean it up for logging readability.

You can use string literal concatenation:
Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld".
In your case, something like this:
copy_sql = ("COPY {0} FROM 's3://{1}/{2}' "
"CREDENTIALS 'aws_access_key_id={3};aws_secret_access_key={4}' {5};"
).format(table_name,bucket,key,
aws_access_key_id,aws_secret_access_key,options)
Note the extra parentheses to make it parse correctly. As long as a line ending is inside at least one pair of parentheses, Python will always treat it as a line continuation, without the need for backslashes.
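To see where the gaps in the backslash attempt come from, here is a small comparison. It is only a sketch: the statement is shortened and the variable values are made up for illustration.

```python
table_name, bucket, key = "analytics.table", "redshift-fake", "storage/part-00000.gz"

# Backslash inside the literal: the indentation of the next source line
# becomes part of the string, which shows up as a gap when printed.
with_backslash = "COPY {0} FROM 's3://{1}/{2}' \
    CREDENTIALS".format(table_name, bucket, key)

# Adjacent literals inside parentheses: only the quoted text is joined,
# so the source can be indented freely.
with_parens = (
    "COPY {0} FROM 's3://{1}/{2}' "
    "CREDENTIALS"
).format(table_name, bucket, key)

print(repr(with_backslash))
print(repr(with_parens))
```

The first string contains the four spaces of source indentation in addition to the intended single space; the second does not.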

Related

EOL SyntaxError in python [duplicate]

I have the above-mentioned error in s1="some very long string............"
Does anyone know what I am doing wrong?
You are not putting a " before the end of the line.
Use """ if you want to do this:
""" a very long string ......
....that can span multiple lines
"""
I had this problem - I eventually worked out that the reason was that I'd included \ characters in the string. If you have any of these, "escape" them with \\ and it should work fine.
(Assuming you don't have/want line breaks in your string...)
How long is this string really?
I suspect there is a limit to how long a line read from a file or from the command line can be. Because the end of the line gets chopped off, the parser sees something like s1="some very long string.......... (without an ending ") and thus throws a parsing error?
You can split long lines up in multiple lines by escaping linebreaks in your source like this:
s1="some very long string.....\
...\
...."
In my situation, I had \r\n in my single-quoted dictionary strings. I replaced all instances of \r with \\r and \n with \\n and it fixed my issue, properly returning escaped line breaks in the eval'ed dict.
ast.literal_eval(my_str.replace('\r','\\r').replace('\n','\\n'))
.....
I faced a similar problem. I had a string which contained path to a folder in Windows e.g. C:\Users\ The problem is that \ is an escape character and so in order to use it in strings you need to add one more \.
Incorrect: C:\Users\
Correct: C:\\Users\\
You can try this:
s = r'long\annoying\path'
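A quick check that the two spellings give the same string, as a minimal sketch (the path is made up). One caveat worth knowing: a raw string literal cannot end in a single backslash, so a path like C:\Users\ needs its trailing backslash added separately.

```python
# Both literals produce the same characters: one backslash between components.
escaped = 'C:\\Users\\name'
raw = r'C:\Users\name'
print(escaped == raw)

# A raw literal ending in a lone backslash is a SyntaxError,
# so append the final backslash as a normal escaped string.
with_trailing = r'C:\Users' + '\\'
print(with_trailing)
```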
I had this problem too. Although there are answers here, I want to add an important point: after the line-continuation backslash (\) there should not be any trailing spaces. Be aware of it.
I also had this exact error message, for me the problem was fixed by adding an " \"
It turns out that my long string, broken into about eight lines with " \" at the very end, was missing a " \" on one line.
Python IDLE didn't specify a line number that this error was on, but it red-highlighted a totally correct variable assignment statement, throwing me off. The actual misshapen string statement (multiple lines long with " \") was adjacent to the statement being highlighted. Maybe this will help someone else.
In my case, I use Windows so I have to use double quotes instead of single.
C:\Users\Dr. Printer>python -mtimeit -s"a = 0"
100000000 loops, best of 3: 0.011 usec per loop
In my case with Mac OS X, I had the following statement:
model.export_srcpkg(platform, toolchain, 'mymodel_pkg.zip', 'mymodel.dylib’)
I was getting the error:
File "<stdin>", line 1
model.export_srcpkg(platform, toolchain, 'mymodel_pkg.zip', 'mymodel.dylib’)
^
SyntaxError: EOL while scanning string literal
After I change to:
model.export_srcpkg(platform, toolchain, "mymodel_pkg.zip", "mymodel.dylib")
It worked...
David
In my case, I forgot (' or ") at the end of string. E.g 'ABC' or "ABC"
I was getting this error in postgresql function. I had a long SQL which I broke into multiple lines with \ for better readability. However, that was the problem. I removed all and made them in one line to fix the issue. I was using pgadmin III.
Your variable(s1) spans multiple lines. In order to do this (i.e you want your string to span multiple lines), you have to use triple quotes(""").
s1="""some very long
string............"""
In this case, three single quotations or three double quotations both will work!
For example:
"""Parameters:
...Type something.....
.....finishing statement"""
OR
'''Parameters:
...Type something.....
.....finishing statement'''
I had faced the same problem while accessing any hard drive directory.
Then I solved it in this way.
import os
os.startfile(r"D:\folder_name\file_name")  # running shortcut; raw string so \f is not read as a form feed
os.startfile("F:") #accessing directory
The picture above shows an error and resolved output.
All code below was tested with Python 3.8.3
Simplest -- just use triple quotes.
Either single:
long_string = '''some
very
long
string
............'''
or double:
long_string = """some
very
long
string
............"""
Note: triple quoted strings retain indentation, it means that
long_string = """some
very
long
string
............"""
and
long_string = """some
    very
    long
    string
    ............"""
or even just
long_string = """
some
very
long
string
............"""
are not the same.
There is a textwrap.dedent function in standard library to deal with this, though working with it is out of question's scope.
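For completeness, a minimal sketch of textwrap.dedent inside an indented block; the function name build_message is made up for illustration.

```python
import textwrap

def build_message():
    # The backslash after the opening quotes drops the first newline;
    # every following line carries the same 8 spaces of indentation.
    text = """\
        some
        very
        long
        string"""
    # dedent removes the whitespace common to all lines.
    return textwrap.dedent(text)

message = build_message()
print(message)
```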
You can, as well, use \n inside a string, residing on single line:
long_string = "some \nvery \nlong \nstring \n............"
Also, if you don't need any linefeeds (i.e. newlines) in your string, you can use \ inside regular string:
long_string = "some \
very \
long \
string \
............"
Most of the previous answers are correct, and my answer is very similar to aaronasterling's. You could also use three single quotes:
s1='''some very long string............'''

Escape space in filepath

I'm trying to write a python tool that will read a logfile and process it
One thing it should do is use the paths listed in the logfile (it's a logfile for a backup tool)
/Volumes/Live_Jobs/Live_Jobs/*SCANS\ and\ LE\ Docs/_LE_PROOFS_DOCS/JEM_lj/JEM/0002_OXO_CorkScrew/3\ Delivery/GG_Double\ Lever\ Waiters\ Corkscrew_072613_Mike_RETOUCHED/gg_3110200_2_V3_Final.tif
Unfortunately the paths that I'm provided with aren't appropriately escaped and I've had trouble properly escaping them in python. Perhaps python isn't the best tool for this, but I like its flexibility - it will allow me to extend whatever I write
Using the regex escape function escapes too many characters, pipes.quote method doesn't escape the spaces, and if I use a regex to replace ' ' with '\ ' I end up getting
/Volumes/Live_Jobs/Live_Jobs/*SCANS\\ and\\ LE\\ Docs/_LE_PROOFS_DOCS/JEM_lj/JEM/0002_OXO_CorkScrew/3\\ Delivery/GG_Double\\ Lever\\ Waiters\\ Corkscrew_072613_Mike_RETOUCHED/gg_3110200_2_V3_Final.tif
which are double escaped and won't pass to python functions like os.path.getsize().
What am I doing wrong??
If you're reading paths out of a file, and passing them to functions like os.path.getsize, you don't need to escape them. For example:
>>> import os
>>> with open('name with spaces', 'w') as f:
...     f.write('abc\n')
...
>>> os.path.getsize('name with spaces')
4
In fact, there are only a handful of functions in Python that need spaces escaped, either because they're passing a string to the shell (like os.system) or because they're trying to do shell-like parsing on your behalf (like subprocess.foo with an arg string instead of an arg list).
So, let's say logfile.txt looks like this:
/Volumes/My Drive/My Scans/Batch 1/foo bar.tif
/Volumes/My Drive/My Scans/Batch 1/spam eggs.tif
/Volumes/My Drive/My Scans/Batch 2/another long name.tif
… then something like this will work fine:
with open('logfile.txt') as logf:
    for line in logf:
        with open(line.rstrip()) as f:
            do_something_with_tiff_file(f)
Noticing those * characters in your example, if these are glob patterns, that's fine too:
import glob

with open('logfile.txt') as logf:
    for line in logf:
        for path in glob.glob(line.rstrip()):
            with open(path) as f:
                do_something_with_tiff_file(f)
If your problem is the exact opposite of what you described, and the file is full of strings that are escaped, and you want to unescape them, decode('string_escape') (Python 2 only) will undo Python-style escaping. There are different functions to undo different kinds of escaping, but without knowing what kind of escaping you want to undo, it's hard to say which function you want…
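If the log lines do carry shell-style escaping, such as the backslash-escaped spaces in the question, one option is the standard library's shlex module. This is a sketch under the assumption that each line holds exactly one POSIX-style escaped path; the helper name unescape_shell_path and the sample path are made up, and paths containing quote characters would need more care.

```python
import shlex

def unescape_shell_path(escaped):
    """Undo POSIX shell-style escaping for a line holding a single path."""
    parts = shlex.split(escaped)
    if len(parts) != 1:
        raise ValueError("expected exactly one path, got %d" % len(parts))
    return parts[0]

plain = unescape_shell_path(r"/Volumes/My\ Drive/My\ Scans/foo\ bar.tif")
print(plain)

# Going the other way, only when a string really must be handed to a shell:
quoted = shlex.quote(plain)
print(quoted)
```

The unescaped result can be passed straight to functions such as os.path.getsize, which expect the plain path.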
Try this:
myfile = open(r'c:\tmp\junkpythonfile','w')
The 'r' stands for a raw string.
You could also use \ like
myfile = open('c:\\tmp\\junkpythonfile','w')
This command will escape the spaces in a string.
# sample_string = sample_string.replace(key, value)
file_path = file_path.replace(' ', '\\ ')
For more details see https://thispointer.com/python-replace-multiple-characters-in-a-string

Own pretty print option in python script

I'm outputting pretty huge XML structure to file and I want user to be able to enable/disable pretty print.
I'm working with approximately 150MB of data. When I tried xml.etree.ElementTree and built a tree structure from its element objects, it used an awful lot of memory, so I do this manually by storing raw strings and outputting them with .write(). My output sequence looks like this:
ofile.write(pretty_print(u'\
\t\t<LexicalEntry id="%s">\n\
\t\t\t<feat att="languageCode" val="cz"/>\n\
\t\t\t<Lemma>\n\
\t\t\t\t<FormRepresentation>\n\
\t\t\t\t\t<feat att="writtenForm" val="%s"/>\n\
\t\t\t\t</FormRepresentation>\n\
\t\t\t</Lemma>\n\
\t\t\t<Sense>%s\n' % (str(lex_id), word['word'], '' if word['pos']=='' else '\n\t\t\t\t<feat att="partOfSpeech" val="%s"/>' % word['pos'])))
Inside the .write() call I use my function pretty_print which, depending on a command-line option, SHOULD strip all tab and newline characters:
o_parser = OptionParser()
# ....
o_parser.add_option("-p", "--prettyprint", action="store_true", dest="pprint", default=False)
# ....
def pretty_print(string):
    if not options.pprint:
        return string.strip('\n\t')
    return string
I wrote 'should' because it does not: in this particular case it does not strip any of the characters.
BUT in this case, it works fine:
for ss in word['synsets']:
    ofile.write(pretty_print(u'\t\t\t\t<Sense synset="%s-synset"/>\n' % ss))
The first thing that came to my mind was that there might be some issue with the substitution, but when I print the passed string inside the pretty_print function it looks perfectly fine.
Any suggestions as to why .strip() does not work?
Or if there is any better way to do this, I'll accept any advice.
Your issue is that str.strip() only removes from the beginning and end of a string.
You either want str.replace() to remove all instances, or to split it into lines and strip each line, if you want to remove them from the beginning and end of lines.
Also note that for your massive string, Python supports multi-line strings with triple quotes that will make it a lot easier to type out, and the old style string formatting with % has been superseded by str.format() - which you probably want to use instead in new code.
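The difference between the two approaches can be sketched as follows. The function name compact and the sample fragment are made up for illustration; this is not the asker's actual code.

```python
def compact(text):
    """Drop the layout-only tabs and newlines from every line of text."""
    return "".join(line.strip("\t") for line in text.splitlines())

fragment = "\t\t<Lemma>\n\t\t\t<feat/>\n\t\t</Lemma>\n"

# str.strip only touches the two ends of the whole string.
ends_only = fragment.strip("\n\t")

# Splitting into lines and stripping each one removes the inner layout too.
flattened = compact(fragment)

print(repr(ends_only))
print(repr(flattened))
```

If no tab or newline inside the data itself needs to survive, chaining fragment.replace("\n", "").replace("\t", "") gives the same flattened result.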

Python, trying to run a program from the command prompt

I am trying to run a program from the command prompt in windows. I am having some issues. The code is below:
commandString = "'C:\Program Files\WebShot\webshotcmd.exe' //url '" + columns[3] + "' //out '"+columns[1]+"~"+columns[2]+".jpg'"
os.system(commandString)
time.sleep(10)
So with the single quotes I get "The filename, directory name, or volume label syntax is incorrect." If I replace the single quotes with \" then it says something to the effect of "'C:\Program' is not a valid executable."
I realize it is a syntax error, but I am not quite sure how to fix this....
column[3] contains a full url copy pasted from a web browser (so it should be url encoded). column[1] will only contain numbers and periods. column[2] contains some text, double quotes and colons are replaced. Mentioning just in case...
Thanks!
Windows requires double quotes in this situation, and you used single quotes.
Use the subprocess module rather than os.system, which is more robust and avoids calling the shell directly, making you not have to worry about confusing escaping issues.
Don't use + to put together long strings. Use string formatting ("string %s" % (formatting,)), which is more readable, efficient, and idiomatic.
In this case, don't form a long string as a shell command anyhow, make a list and pass it to subprocess.call.
As best as I can tell you are escaping your forward slash but not your backslashes, which is backwards. A string literal with // has both slashes in the string it makes. In any event, rather than either you should use the os.path module which avoids any confusion from parsing escapes and often makes scripts more portable.
Use the subprocess module for calling system commands. Also, try removing the single quotes and using double quotes.
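A minimal sketch of the list-based approach follows. The helper build_webshot_command is hypothetical, and the //url and //out switches are simply copied from the question, not checked against the tool. Because webshotcmd.exe is not assumed to be installed, the demo runs the Python interpreter instead to show that an argument with spaces and quotes arrives intact without any manual escaping.

```python
import subprocess
import sys

def build_webshot_command(exe_path, url, out_path):
    # Hypothetical helper: one list element per argument, no quoting needed.
    return [exe_path, "//url", url, "//out", out_path]

command = build_webshot_command(
    r"C:\Program Files\WebShot\webshotcmd.exe",
    "http://example.com/a%20page",
    "1.2~some text.jpg",
)
print(command)

# Stand-in demo: the child process echoes back its first argument unchanged.
awkward = "value with spaces & quotes'"
demo = [sys.executable, "-c", "import sys; print(sys.argv[1])", awkward]
result = subprocess.run(demo, capture_output=True, text=True, check=True)
echoed = result.stdout.strip()
print(echoed)
```

With the real tool installed, subprocess.call(command) would take the place of the stand-in demo.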

Verify CSV against given format

I am expecting users to upload a CSV file of max size 1MB to a web form that should fit a given format similar to:
"<String>","<String>",<Int>,<Float>
That will be processed later. I would like to verify the file fits a specified format so that the program that later uses the file doesn't receive unexpected input and so that there are no security concerns (say, some injection attack against the parsing script that does some calculations and a db insert).
(1) What would be the best way to go about doing this that would be fast and thorough? From what I've researched I could go the path of regex or something more like this. I've looked at the python csv module but that doesn't appear to have any built in verification.
(2) Assuming I go for a regex, can anyone direct me towards the best way to do this? Do I match for illegal characters and reject on that (eg. no '/' '\' '<' '>' '{' '}' etc.), or match on all legal characters, eg. [a-zA-Z0-9]{1,10} for the string component? I'm not too familiar with regular expressions so pointers or examples would be appreciated.
EDIT:
Strings should contain no commas or quotes it would just contain a name (ie. first name, last name). And yes I forgot to add they would be double quoted.
EDIT #2:
Thanks for all the answers. Cutplace is quite interesting but is a standalone. Decided to go with pyparsing in the end because it gives more flexibility should I add more formats.
Pyparsing will process this data, and will be tolerant of unexpected things like spaces before and after commas, commas within quotes, etc. (csv module is too, but regex solutions force you to add "\s*" bits all over the place).
from pyparsing import *
integer = Regex(r"-?\d+").setName("integer")
integer.setParseAction(lambda tokens: int(tokens[0]))
floatnum = Regex(r"-?\d+\.\d*").setName("float")
floatnum.setParseAction(lambda tokens: float(tokens[0]))
dblQuotedString.setParseAction(removeQuotes)
COMMA = Suppress(',')
validLine = dblQuotedString + COMMA + dblQuotedString + COMMA + \
            integer + COMMA + floatnum + LineEnd()
tests = """\
"good data","good2",100,3.14
"good data" , "good2", 100, 3.14
bad, "good","good2",100,3.14
"bad","good2",100,3
"bad","good2",100.5,3
""".splitlines()
for t in tests:
    print t
    try:
        print validLine.parseString(t).asList()
    except ParseException, pe:
        print pe.markInputline('?')
        print pe.msg
    print
Prints
"good data","good2",100,3.14
['good data', 'good2', 100, 3.1400000000000001]
"good data" , "good2", 100, 3.14
['good data', 'good2', 100, 3.1400000000000001]
bad, "good","good2",100,3.14
?bad, "good","good2",100,3.14
Expected string enclosed in double quotes
"bad","good2",100,3
"bad","good2",100,?3
Expected float
"bad","good2",100.5,3
"bad","good2",100?.5,3
Expected ","
You will probably be stripping those quotation marks off at some future time, pyparsing can do that at parse time by adding:
dblQuotedString.setParseAction(removeQuotes)
If you want to add comment support to your input file, say a '#' followed by the rest of the line, you can do this:
comment = '#' + restOfLine
validLine.ignore(comment)
You can also add names to these fields, so that you can access them by name instead of index position (which I find gives more robust code in light of changes down the road):
validLine = dblQuotedString("key") + COMMA + dblQuotedString("title") + COMMA + \
            integer("qty") + COMMA + floatnum("price") + LineEnd()
And your post-processing code can then do this:
data = validLine.parseString(t)
print "%(key)s: %(title)s, %(qty)d in stock at $%(price).2f" % data
print data.qty*data.price
I'd vote for parsing the file, checking you've got 4 components per record, that the first two components are strings, the third is an int (checking for NaN conditions), and the fourth is a float (also checking for NaN conditions).
Python would be an excellent tool for the job.
I'm not aware of any libraries in Python to deal with validation of CSV files against a spec, but it really shouldn't be too hard to write.
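import csv
import math

def check_csv(path):
    # Return True only if every row has 4 fields with a valid int and float.
    with open(path) as f:
        for row in csv.reader(f):
            if len(row) != 4:
                print('Invalid row length.')
                return False
            try:
                int(row[2])
            except ValueError:
                print('Bad int found')
                return False
            try:
                my_float = float(row[3])
            except ValueError:
                print('Bad float found')
                return False
            # float() accepts 'nan', so reject it explicitly
            if math.isnan(my_float):
                print('Bad float found')
                return False
    print('All good!')
    return True

check_csv('data.csv')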
Here's a small snippet I made:
import csv
f = csv.reader(open("test.csv"))
for value in f:
    value[0] = str(value[0])
    value[1] = str(value[1])
    value[2] = int(value[2])
    value[3] = float(value[3])
If you run that with a file that doesn't have the format you specified, you'll get an exception:
$ python valid.py
Traceback (most recent call last):
File "valid.py", line 8, in <module>
i[2] = int(i[2])
ValueError: invalid literal for int() with base 10: 'a3'
You can then make a try-except ValueError to catch it and let the users know what they did wrong.
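One way that try-except could look, as a sketch: the function name find_bad_rows and the sample rows are made up, and the input is an in-memory file so the example runs on its own.

```python
import csv
import io

def find_bad_rows(lines):
    """Return (line_number, reason) pairs for rows that are not str,str,int,float."""
    problems = []
    for number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != 4:
            problems.append((number, "expected 4 fields, got %d" % len(row)))
            continue
        try:
            int(row[2])
            float(row[3])
        except ValueError as exc:
            # Keep the conversion error text so the user can see the bad value.
            problems.append((number, str(exc)))
    return problems

sample = io.StringIO('"Ann","Lee",3,1.5\n"Bob","Ray",a3,2.0\n"Cy","Fox",1\n')
report = find_bad_rows(sample)
for number, reason in report:
    print("line %d: %s" % (number, reason))
```

Collecting every problem instead of stopping at the first one lets the upload form report all bad lines in a single pass.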
There can be a lot of corner-cases for parsing CSV, so you probably don't want to try doing it "by hand". At least start with a package/library built-in to the language that you're using, even if it doesn't do all the "verification" you can think of.
Once you get there, then examine the fields for your list of "illegal" chars, or examine the values in each field to determine they're valid (if you can do so). You also don't even need a regex for this task necessarily, but it may be more concise to do it that way.
You might also disallow embedded \r or \n, \0 or \t. Just loop through the fields and check them after you've loaded the data with your csv lib.
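The field loop described above might be sketched like this; the name has_forbidden_chars is made up, and the rows are assumed to be lists of strings as produced by the csv module.

```python
FORBIDDEN = ("\r", "\n", "\0", "\t")

def has_forbidden_chars(row):
    """True if any field in the row contains an embedded control character."""
    return any(ch in field for field in row for ch in FORBIDDEN)

clean_row = ["Ann", "Lee", "3", "1.5"]
dirty_row = ["Ann", "Lee\tSmith", "3", "1.5"]
print(has_forbidden_chars(clean_row))
print(has_forbidden_chars(dirty_row))
```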
Try Cutplace. It verifies that tabular data conforms to an interface control document.
Ideally, you want your filtering to be as restrictive as possible - the fewer things you allow, the fewer potential avenues of attack. For instance, a float or int field has a very small number of characters (and very few configurations of those characters) which should actually be allowed. String filtering should ideally be restricted to only what characters people would have a reason to input - without knowing the larger context it's hard to tell you exactly which you should allow, but at a bare minimum the string match regex should require quoting of strings and disallow anything that would terminate the string early.
Keep in mind, however, that some names may contain things like single quotes ("O'Neil", for instance) or dashes, so you couldn't necessarily rule those out.
Something like...
/"[a-zA-Z' -]+"/
...would probably be ideal for double-quoted strings which are supposed to contain names. You could replace the + with a {x,y} length min/max if you wanted to enforce certain lengths as well.
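In Python the same pattern could be applied with re.fullmatch, so that the whole field has to match. This is a sketch: the 1 to 40 length bound and the function name are assumptions chosen for illustration.

```python
import re

# Double-quoted field containing only letters, apostrophes, spaces and hyphens.
NAME_FIELD = re.compile(r'"[a-zA-Z\' -]{1,40}"')

def is_valid_name_field(field):
    """True if the raw, still-quoted field looks like a plain name."""
    return NAME_FIELD.fullmatch(field) is not None

for candidate in ['"O\'Neil"', '"Mary-Jane Smith"', '"Robert\\"); DROP"', 'Ann']:
    print(candidate, is_valid_name_field(candidate))
```

Note that the csv module removes the surrounding quotes while parsing, so when checking already-parsed fields the two literal " characters would be dropped from the pattern.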
