EOL while concatenating string + path - python

I need to concatenate specific folder path with a string, for example:
mystring = "blablabla"
path = "C:\folder\whatever\"
printing (path + mystring) should return:
C:\folder\whatever\blablabla
However, I always get the EOL error, and the path must use backslashes (\), not forward slashes (/).
Please show me the way. I tried the r' prefix and it's not working; I tried doubling the quotes (""), and nothing works. I can't figure it out.

Always use os.path.join() to join paths, and use the r prefix to allow single backslashes as Windows path separators:
r"C:\folder\whatever"
Now, no trailing backslash is needed:
>>> import os
>>> mystring = "blablabla"
>>> path = r"C:\folder\whatever"
>>> os.path.join(path, mystring)
'C:\\folder\\whatever\\blablabla'
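The same join can also be sketched with pathlib's PureWindowsPath, which applies Windows separator rules on any operating system. This is an alternative to the os.path.join call above, not something the answer itself uses, and the variable names are only illustrative.

```python
from pathlib import PureWindowsPath

mystring = "blablabla"
path = PureWindowsPath(r"C:\folder\whatever")

# The / operator joins path components using Windows separators,
# so no trailing backslash is needed on the folder part.
joined = str(path / mystring)
print(joined)
```

Converting with str() gives an ordinary string that can be passed to code expecting a backslash-separated path.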

Either use the escape sequence \\ for \:
mystring = "blablabla"
path = "C:\\folder\\whatever\\"
conc = path + mystring
print(conc)
# C:\folder\whatever\blablabla
Or make use of raw strings, moving the last backslash from the end of path to the start of mystring:
mystring = r"\blablabla"
path = r"C:\folder\whatever"
conc = path + mystring
print(conc)
# C:\folder\whatever\blablabla
The reason why your own raw string approach didn't work is that a raw string may not end with a single backslash:
Specifically, a raw literal cannot end in a single backslash (since
the backslash would escape the following quote character).
From
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
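The restriction quoted above can be checked without crashing the script by compiling the literal from source text. The sketch below does that and then shows one possible workaround, appending the trailing separator as a normal escaped string. The helper name literal_compiles is made up for this illustration.

```python
def literal_compiles(source):
    """Return True if the given source text is a valid Python expression."""
    try:
        compile(source, "<literal>", "eval")
        return True
    except SyntaxError:
        return False

# Source text is a raw literal that ends in one backslash before the quote.
ends_in_backslash = literal_compiles('r"C:\\folder\\whatever\\"')
# Source text is the same raw literal without the trailing backslash.
no_trailing = literal_compiles('r"C:\\folder\\whatever"')
print(ends_in_backslash, no_trailing)

# Workaround: keep the raw string and append the separator as "\\".
path = r"C:\folder\whatever" + "\\"
result = path + "blablabla"
print(result)
```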

Two things.
First, with regard to the EOL error, my best guess (without access to the actual Python session) is that Python was complaining because you have an unterminated string caused by the final " character being escaped, which will happen even if the string is prefixed with r. My opinion is that you should drop the prefix and just correctly escape all backslashes like so: \\.
In your example, path then becomes path = "C:\\folder\\whatever\\"
Secondly, instead of manually concatenating paths, you should use os.path.join:
import os
mystring = "blablabla"
path = "C:\\folder\\whatever"
print(os.path.join(path, mystring))
# prints C:\folder\whatever\blablabla
Note that os.path will use the path conventions for the operating system where the application is running, so the above code will produce erroneous/unexpected results if you run it on, say, Linux. Check the notes on the top of the page that I have linked for details.
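To see the platform difference without switching machines, the standard library exposes both flavours of os.path directly: ntpath implements the Windows rules and posixpath the Linux/macOS rules. A small sketch, reusing the values from the example above:

```python
import ntpath
import posixpath

path = "C:\\folder\\whatever"
mystring = "blablabla"

# Windows rules: components are joined with a backslash.
windows_style = ntpath.join(path, mystring)
# POSIX rules: the backslashes are ordinary characters and "/" is the separator.
posix_style = posixpath.join(path, mystring)

print(windows_style)
print(posix_style)
```

Calling ntpath.join explicitly is one way to always get backslash-separated output, regardless of where the script runs.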

Related

How to escape filenames for Google Drive API

I discovered that if searching for a filename with an apostrophe(') in Google Drive API, I needed to escape the apostrophe with a \. e.g:
# file_name is "tim's file"
file_name = file_name.replace("'", "\\'")
# file_name is "tim\'s file"
response = service.files().list(q = "name='" + file_name + "'").execute() #works
The docs mention that the backslash also needs special treatment.
My question is what the general solution is to this problem of special characters in the filename. Are there other characters that similarly need to be escaped?
TL;DR: No, there isn't a generic way to handle escaping ' and \ in Google Drive queries (and possibly other Google APIs). Each API provider (Microsoft, Amazon, Twitter, etc.) would have its own filename/string-escaping rules, so creating one for each would be tedious. However, it should have been part of the API client they provided.
My question is what the general solution is to this problem of special characters in the filename
This is separate from the issue of sanitising strings for actual filenames because local filesystems don't follow the same rules as GDrive.
are there other characters that similarly needed to be escaped?
As far as I can tell, GDrive only needs the apostrophe (') and backslash (\) escaped, as you pointed out. As for the actual request, there's:
Note: These examples use the unencoded q parameter, where name = 'hello' is encoded as name+%3d+%27hello%27. Client libraries handle this encoding automatically.
That part is probably being handled by google-api-python-client.
As for the two specific replacements you need:
file_name = r"tim's file\has slashes"
print(file_name)
# tim's file\has slashes
print(file_name.replace('\\', '\\\\').replace("'", "\\'"))
# tim\'s file\\has slashes
# or, better
print(file_name.replace('\\', '\\\\').replace("'", r"\'"))
# tim\'s file\\has slashes
# using raw strings also for the backslash replacement
print(file_name.replace('\\', r'\\').replace("'", r"\'"))
# tim\'s file\\has slashes
Note that there's no point using a raw string for the find part of the first replacement, because the trailing backslash before the closing quote needs to be escaped anyway, and r'\' is not a valid Python string (SyntaxError: EOL while scanning string literal). However, r'\\' means two backslashes, because in a raw string the first backslash doesn't escape the second one. I.e. '\\' vs r'\\' == 1 backslash vs 2 backslashes. The same restriction applies if you want 3 or any other odd number of backslashes: a raw string cannot end with them.
Btw, replacement order is important because if you did it in reverse, then the backslash added for the apostrophe would then get escaped further:
print(file_name.replace("'", r"\'").replace('\\', r'\\')) # WRONG!
# tim\\'s file\\has slashes
And do use f-strings for the query, it's much more readable:
f"name='{file_name}'"
# "name='tim\\'s file\\\\has slashes'"
print(f"name='{file_name}'")
# name='tim\'s file\\has slashes'
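The two replacements and the f-string can be wrapped together. The sketch below assumes only what is stated above, namely that backslash and apostrophe are the characters needing escapes; the helper name escape_drive_value is hypothetical and not part of google-api-python-client.

```python
def escape_drive_value(value):
    """Escape a string for use inside single quotes in a Drive query.

    Hypothetical helper: backslashes are doubled first, then apostrophes
    are prefixed with a backslash, so the added backslashes are not
    escaped a second time.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")

file_name = r"tim's file\has slashes"
query = f"name='{escape_drive_value(file_name)}'"
print(query)
```

The resulting query string would then be passed as the q argument, as in the question's service.files().list(...) call.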
If you are passing filename in apostrophes, you should replace only them. This is the proper way to do it:
file_name = file_name.replace("'", "\\'")  # "\'" alone is just an apostrophe, so the backslash is doubled
Here's why:
>>> print('\'')
'
>>> print('\\'')
File "<stdin>", line 1
print('\\'')
^
SyntaxError: EOL while scanning string literal
You can also do something like this:
response = service.files().list(q = f"name='{file_name}'").execute()
It is a lot easier and cleaner.
EDIT: I have read in docs that characters like \ should be replaced as well. So you can just replace \ with \\.

Python replace Double Backslash with Single Backslash

I am trying to convert a Windows path from pathlib to a string.
However, I can't convert the \\ to \
The code I ran
fileDir = pathlib.Path(self.CURRENTDATAPATH)
fileExt = r"*.xlsx"
for item in list(pathlib.Path(fileDir).glob(fileExt)):
    self.XLSXLIST.append( str(item).replace( '\\\\', "\\") )
Got the result:
['D:\\data\\test.xlsx']
I would like to get this result
['D:\data\test.xlsx']
A backslash is used to escape special characters in a string. To escape a backslash, you should put another backslash in front of it: '\\'
When constructing a string, you can put a leading r before the literal to make it a raw string and avoid escaping.
print(r'\a\b\c')
the output is
\a\b\c
The echoed output will always be displayed in the escaped style, but this will not affect your use.
# echo of string s=r'\a\b\c'
'\\a\\b\\c'
So, your code is running as you wish, and the output is correct, just with another displaying format.
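The difference between the stored string and its displayed form can be checked directly. In this sketch the path is a stand-in for one item of the question's list:

```python
s = r"D:\data\test.xlsx"

# print() shows the characters actually stored in the string.
print(s)
# repr() shows how the string would be written as a literal, with
# each backslash doubled. Printing a list uses repr() for its items.
print(repr(s))
print([s])

# The string itself contains only two backslash characters.
backslashes = s.count("\\")
print(backslashes)
```

So there is nothing to replace: the doubled backslashes appear only when the list is displayed, not in the data.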

Python path string adds an extra slash except for the last one

I have a path to a directory where I want to iterate over all of the files. It looks like this myPath = "D:\workspace\test\main\test docs" and when I print out myPath it looks like "D:\\workspace\test\\main\test docs" . As you can see it added a slash to every slash except the last one.
When I do for path, dirs, files in os.walk(myPath): it doesn't work if there isn't the extra slash. Why isn't python adding the extra slash to the last slash?
It was working on a different computer.
Because '\t' is an escape sequence that makes sense: it is a tab, as specified in the 2.4.1 String literals section. The others happen not to be recognized escape sequences here, so Python keeps the backslash and escapes it for you (for free).
You can thus add the extra backslash like:
myPath = "D:\\workspace\\test\\main\\test docs"
or you can use a raw string, by prefixing it with r:
myPath = r"D:\workspace\test\main\test docs"
In that case:
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
So that means the backslash (\) is not interpreted as something special, but only as a backslash.
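A short check makes the tab visible. The sketch uses only the last part of the path from the question, since that is where the recognized \t sequence occurs:

```python
plain = "main\test docs"    # \t is read as a single tab character
raw = r"main\test docs"     # raw string: backslash and t stay separate

print(repr(plain))
print(repr(raw))

# The plain version is one character shorter because backslash + t
# collapsed into a single tab.
print(len(plain), len(raw))
```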

split a file based on string

I am trying to split one big file into individual entries. Each entry ends with the string “//”. This is what I tried:
#!/usr/bin/python
import sys,os
uniprotFile=open("UNIPROT-data.txt") #read original alignment file
uniprotFileContent=uniprotFile.read()
uniprotFileList=uniprotFileContent.split("//")
for items in uniprotFileList:
    seqInfoFile=open('%s.dat'%items[5:14],'w')
    seqInfoFile.write(str(items))
But I realised that there is another string containing “//” (http://www.uniprot.org/terms), hence it splits there as well and I don’t get the result I want. I tried using a regex but was not able to figure it out.
Use a regex that only splits on // if it's not preceded by :
import re
myre = re.compile("(?<!:)//")
uniprotFileList = myre.split(uniprotFileContent)
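A quick way to convince yourself that the lookbehind works is to run it on a small made-up sample. The text below is invented for illustration and is not real UniProt data:

```python
import re

# Invented two-entry sample; the first entry contains a URL with "//".
sample = (
    "ID   ABC\n"
    "DR   http://www.uniprot.org/terms\n"
    "//\n"
    "ID   DEF\n"
    "//\n"
)

# Split on "//" only when it is not directly preceded by a colon.
pattern = re.compile(r"(?<!:)//")
entries = [part for part in pattern.split(sample) if part.strip()]

print(len(entries))
print(entries[0])
```

The list comprehension drops the whitespace-only piece left after the final terminator.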
I am using the code with modified split pattern and it works fine for me:
#!/usr/bin/python
import sys,os
uniprotFile = open("UNIPROT-data.txt")
uniprotFileContent = uniprotFile.read()
uniprotFileList = uniprotFileContent.split("//\n")
for items in uniprotFileList:
    seqInfoFile = open('%s.dat' % items[5:17], 'w')
    seqInfoFile.write(str(items))
You're confusing \ (backslash) and / (slash). You don't need to escape a slash, just use "/". For a backslash, you do need to escape it, so use "\\".
Secondly, if you split with a backslash it will not split on a slash or vice-versa.
Split using a regular expression that doesn't permit the "http:" part before your // marker.
For example: "([^:])\/\/"
You appear to be splitting on the wrong characters. Based on your question, you should split on "\\", not "//". Open a prompt and inspect the strings you're using. You'll see something like:
>>> "\\"
'\\'
>>> "\"
SyntaxError
>>> r"\"
SyntaxError
>>> "//"
'//'
So, you can use "\\" for a single backslash. Note that r"\" is not valid, because a raw string cannot end in a single backslash.

How can I put an actual backslash in a string literal (not use it for an escape sequence)?

I have this code:
import os
path = os.getcwd()
final = path +'\xulrunner.exe ' + path + '\application.ini'
print(final)
I want output like:
C:\Users\me\xulrunner.exe C:\Users\me\application.ini
But instead I get an error that looks like:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape
I don't want the backslashes to be interpreted as escape sequences, but as literal backslashes. How can I do it?
Note that if the string should only contain a backslash (more generally, should have an odd number of backslashes at the end), then raw strings cannot be used.
To answer your question directly, put r in front of the string.
final= path + r'\xulrunner.exe ' + path + r'\application.ini'
But a better solution would be os.path.join:
final = os.path.join(path, 'xulrunner.exe') + ' ' + \
        os.path.join(path, 'application.ini')
(the backslash there is escaping a newline, but you could put the whole thing on one line if you want)
I will mention that you can use forward slashes in file paths; Windows generally accepts them as path separators alongside backslashes. So
final = path + '/xulrunner.exe ' + path + '/application.ini'
should work. But it's still preferable to use os.path.join because that makes it clear what you're trying to do.
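The raw-string, escaped and join-based variants produce the same text, which can be verified side by side. In this sketch a fixed path stands in for os.getcwd(), and ntpath (the Windows implementation behind os.path) is used so the result is the same on any operating system:

```python
import ntpath

path = r"C:\Users\me"  # stand-in for os.getcwd()

# Raw strings: backslashes are taken literally.
raw = path + r"\xulrunner.exe " + path + r"\application.ini"
# Normal strings: each backslash is escaped with another one.
escaped = path + "\\xulrunner.exe " + path + "\\application.ini"
# Join-based: no separators written by hand.
joined = (ntpath.join(path, "xulrunner.exe") + " "
          + ntpath.join(path, "application.ini"))

print(raw)
print(raw == escaped == joined)
```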
You can escape the backslash. Use \\ and you get just one backslash.
You can escape the backslash with another backslash (\\), but it won’t look nicer. To solve that, put an r in front of the string to signal a raw string. A raw string will ignore all escape sequences, treating backslashes as literal text. It cannot contain the closing quote unless it is preceded by a backslash (which will be included in the string), and it cannot end with a single backslash (or odd number of backslashes).
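The raw-string rules described above can be seen by measuring a few literals. A minimal sketch with illustrative variable names:

```python
# In a raw string, a backslash before the quote keeps the quote from
# ending the literal, but the backslash itself stays in the string.
with_quote = r'it\'s'

# r"\\" is two backslash characters; "\\" is a single one.
two = r"\\"
one = "\\"

print(with_quote)
print(len(with_quote), len(two), len(one))
```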
Another simple (and arguably more readable) approach is to use a raw string with format() replacements, like so:
import os
path = os.getcwd()
final = r"{0}\xulrunner.exe {0}\application.ini".format(path)
print(final)
or using the os path method (and a microfunction for readability):
import os
def add_cwd(path):
    return os.path.join( os.getcwd(), path )
xulrunner = add_cwd("xulrunner.exe")
inifile = add_cwd("application.ini")
# in production you would use xulrunner+" "+inifile
# but the purpose of this example is to show a version where you could use any character
# including backslash
final = r"{} {}".format( xulrunner, inifile )
print(final)
