Using raw literal representations when working with variable strings - python

I am trying to use a for loop to copy files in different source folders into the same destination folder. I like to use literal representations to avoid issues with multiple backslashes in the file paths. I could not find the proper way to get a literal representation of a variable. Any tip would be appreciated. The code is below:
import os
import shutil
destination_folder=DF
for i in range (1,3):
    new_folder='folder_'+str(i)
    new_path=os.path.join('C:\foo', new_folder, file_to_copy)
    source_file= r(new_path) #WRONG
    destination= r(destination_folder) #WRONG
    shutil.copy(source_file, destination)

r is not a function that applies to string objects, it's a modifier that applies to string literals. It changes how the literal gets interpreted as a value. But once it's done, the value is just a plain old string value. In particular:
>>> a = '\n'
>>> b = '''
... '''
>>> a == b
True
So, if a and b are the same value, how can Python possibly know that you want to turn it into r'\n'?
For that matter, imagine this:
>>> c = sys.stdin.readline()
>>> c == a
True
Or this:
>>> d = chr(10)
>>> d == a
True
You can't go back and re-interpret the string literal as a raw string in any of these other cases—in b it would be unchanged, and in c and d there was no string literal in the first place.
If you want to escape all special characters in a string value, without caring where they came from, you can do that by asking Python to escape the string. For example:
>>> e = a.encode('unicode-escape').decode('ascii')
But you definitely don't want to do that for constructing filenames to pass to the shutil.copy function.
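To make the effect of that escaping concrete, here is a minimal sketch (Python 3 assumed; the variable names simply follow the session above). It only shows that the escaped value is a different, longer string, which is exactly why it is the wrong tool for building file paths:

```python
a = '\n'                                         # one character: a line feed
e = a.encode('unicode-escape').decode('ascii')   # two characters: a backslash and 'n'

print(len(a), len(e))
print(e == r'\n')    # the escaped value equals the raw literal r'\n'
print(e == a)        # but it is no longer the same string as a
```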
If you have a string literal in your code, and you want it to be treated as a raw string literal, just write it as a raw string literal. So:
new_path=os.path.join(r'C:\foo', new_folder, file_to_copy)
source_file= new_path
destination= destination_folder
You could instead manually escape the backslash in your literal, or use forward slashes instead of backslashes, etc. But those are all things you do to the literal before it gets evaluated by Python, not to the string after the fact.

The concept of a "literal representation" of a variable string doesn't really make sense.
If you have a variable called new_path, the value of this variable is simply a string value. The r prefix only applies to string literals.
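As a rough illustration of the point made in both answers, the sketch below compares the different ways of writing the same path literal. The file name data.txt is hypothetical, and ntpath is used instead of os.path only so that the Windows-style join behaves the same on any platform:

```python
import ntpath  # the Windows flavour of os.path, importable on every platform

plain = 'C:\foo'      # \f is parsed as a form feed: not the intended path
raw = r'C:\foo'       # raw literal: the backslash stays a backslash
escaped = 'C:\\foo'   # manually escaped backslash: same value as the raw literal

print(repr(plain))
print(raw == escaped)

# Once the literal is right, the resulting variables need no further treatment.
for i in range(1, 3):
    new_folder = 'folder_' + str(i)
    new_path = ntpath.join(raw, new_folder, 'data.txt')  # 'data.txt' is a made-up name
    print(new_path)
```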

Related

Python: How do string variables prevent escape?

>>> m = "\frac{7x+5}{1+y^2}"
>>> print(m)
rac{7x+5}{1+y^2}
>>> print(r""+m)
rac{7x+5}{1+y^2}
>>> print(r"{}".format(m))
rac{7x+5}{1+y^2}
>>> print(repr(m))
'\x0crac{7x+5}{1+y^2}'
I want the result: "\frac{7x+5}{1+y^2}"
It must be a string variable!
You need the string literal that contains the slash to be a raw string.
m = r"\frac{7x+5}{1+y^2}"
Raw strings are just another way of writing strings. They aren't a different type. For example, r"" is exactly the same as "" because there are no characters to escape. It doesn't produce some kind of raw empty string, and adding it to another string changes nothing.
Another option is to escape the backslash with a second backslash, so that it is read as a literal backslash rather than as the start of an escape sequence:
m = "\\frac{7x+5}{1+y^2}"
print(m)
print(r""+m)
print(r"{}".format(m))
print(repr(m))
A good place to start is to read the docs on string literals. You can either escape the backslash with the escape character "\", as here:
>>> m = "\\frac{7x+5}{1+y^2}"
>>> print(m)
\frac{7x+5}{1+y^2}
or use a raw string literal, which takes the string as is:
>>> m = r"\frac{7x+5}{1+y^2}"
>>> print(m)
\frac{7x+5}{1+y^2}
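One more point that may help with the "must be a string variable" requirement: escape sequences are only processed in literals written in source code. A string that arrives at runtime (from a file or user input, say) already contains its backslashes as ordinary characters. The sketch below simulates a file with io.StringIO; the fake file and its content are assumptions made purely for illustration:

```python
import io

# Stand-in for a real file whose single line is the LaTeX fragment.
fake_file = io.StringIO(r"\frac{7x+5}{1+y^2}" + "\n")

m = fake_file.readline().rstrip("\n")   # no escape processing happens here

print(m)
print(m == "\\frac{7x+5}{1+y^2}")       # same value as the escaped literal
```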

How can I replace "\\" with "\" in a list of strings in Python

In my code I have a list of locations, and the output of the list looks like this:
['D:\\Todo\\VLC\\Daft_Punk\\One_more_time.mp4"', ...
I want to replace "\\" with "\".
(listcancion is a list with all the strings.)
I tried to replace it with this code:
remplacement = [listcancion.replace('\\', '\') for listcancion in listcancion]
or this:
remplacement = [listcancion.replace('\\\\', '\\') for listcancion in listcancion]
or also this:
remplacement = [listcancion.replace('\\', 'X') for listcancion in listcancion]
listrandom = [remplacement.replace('X', '\') for remplacement in remplacement]
I need to change only the \ character. I can't do things like ("\\Todo", "\Todo") because I have more characters to replace.
If it can be solved without imports, that would be great.
It is just a matter of string representations.
First, you have to differentiate between a string's "real" content and its representation.
A string's "real" content might be letters, digits, punctuation and so on, which makes displaying it quite easy. But imagine a string which contains a, a line break and a b. If you print that string, you get the output
a
b
which is what you expect.
But in order to make it more compact, this string's representation is a\nb: the line break is represented as \n, the \ serving as an escape character. Compare the output of print(a) (which is the same as print(str(a))) and of print(repr(a)).
Now, in order not to confuse this with a string which contains a, \, n and b, a "real" backslash in a string has a representation of \\ and the same string, which prints as a\nb, has a representation of a\\nb in order to distinguish that from the first example.
If you print a list of anything, it is displayed as a comma-separated list of the representations of its elements, even if they are strings.
If you do
for element in listcancion:
    print(element)
you'll see that the string actually contains only one \ where its representation shows \\.
(Oh, and BTW, I am not sure that things like [listcancion.<something> for listcancion in listcancion] work as intended; better use another name for the loop variable, such as [element.<something> for element in listcancion].)
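The difference between content and representation can be checked directly. This is only a sketch with a one-element list modelled on the path from the question (the stray quote from the question's output is left out):

```python
listcancion = [r'D:\Todo\VLC\Daft_Punk\One_more_time.mp4']

# Printing the list shows repr() of each element: backslashes appear doubled.
print(listcancion)

# Printing an element shows its real content: single backslashes.
for element in listcancion:
    print(element)

# Counting confirms there are four backslash characters, not eight.
print(listcancion[0].count('\\'))
```

So there is nothing to replace: the doubled backslashes exist only in the list's display.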

How to copy changing substring in string?

How can I copy data from a changing string?
I tried slicing, but the length of the slice changes.
For example in one case I should copy number 128 from string '"edge_liked_by":{"count":128}', in another I should copy 15332 from "edge_liked_by":{"count":15332}
You could use a regular expression:
import re
string = '"edge_liked_by":{"count":15332}'
number = re.search(r'{"count":(\d*)}', string).group(1)
Really depends on the situation, however I find regular expressions to be useful.
To grab the numbers from the string without caring about their location, you would do as follows:
import re
def get_string(string):
    return re.search(r'\d+', string).group(0)
>>> get_string('"edge_liked_by":{"count":128}')
'128'
To only get numbers from the end of the string, you can use an anchor to ensure the result is pulled from the far end. The following example will grab any unbroken sequence of digits that is both preceded by a colon and ends within 5 characters of the end of the string:
import re
def get_string(string):
    rval = None
    string_match = re.search(r':(\d+).{0,5}$', string)
    if string_match:
        rval = string_match.group(1)
    return rval
>>> get_string('"edge_liked_by":{"count":128}')
'128'
>>> get_string('"edge_liked_by":{"1321":1}')
'1'
In the above example, adding the colon will ensure that we only pick values and don't match keys such as the "1321" that I added in as a test.
If you just want anything after the last colon, but excluding the bracket, try combining split with slicing:
>>> '"edge_liked_by":{"count":128}'.split(':')[-1][0:-1]
'128'
Finally, considering this looks like a JSON object, you can add curly brackets to the string and treat it as such. Then it becomes a nested dict you can query:
>>> import json
>>> string = '"edge_liked_by":{"count":128}'
>>> string = '{' + string + '}'
>>> string = json.loads(string)
>>> string.get('edge_liked_by').get('count')
128
The first three return a string and the final one returns a number, because the input is parsed as a JSON object.
It looks like the type of string you are working with is read from JSON, maybe you are getting it as the output of some API you are working with?
If it is JSON, you've probably gone one step too far in atomizing it to a string like this. I'd work with the original output, if possible, if I were you.
If not, to make it more JSON-like, I'd wrap it in {} and then parse it with json.loads.
import json
string = '"edge_liked_by":{"count":15332}'
string = "{"+string+"}"
json_obj = json.loads(string)
count = json_obj['edge_liked_by']['count']
count will have the desired output. I prefer this option to using regular expressions because you can rely on the structure of the data and reuse the code in case you wish to parse out other attributes, in a very intuitive way. With regular expressions, the code you use will change if the data are decimal, or negative, or contain non-numeric characters.
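If you go the JSON route, it may be worth wrapping it in a small helper that copes with fragments that do not parse. The following is only a sketch: the function name extract_count and its key parameter are made up for illustration, and it assumes the fragment always has the "key":{"count":N} shape shown in the question:

```python
import json

def extract_count(fragment, key="edge_liked_by"):
    """Return the value stored under "count" for the given key, or None."""
    try:
        # Wrap the fragment in braces so it becomes a complete JSON object.
        data = json.loads("{" + fragment + "}")
    except json.JSONDecodeError:
        return None
    inner = data.get(key)
    if isinstance(inner, dict):
        return inner.get("count")
    return None

print(extract_count('"edge_liked_by":{"count":128}'))
print(extract_count('"edge_liked_by":{"count":15332}'))
print(extract_count('not json at all'))
```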
Does this help ?
a='"edge_liked_by":{"count":128}'
import re
b=re.findall(r'\d+', a)[0]
b
Out[16]: '128'

Backward slash added when assigning in dictionary. how to avoid it

When i assign a windows path as a value in dictionary, the backward slash gets added.
I did try using raw string.
p = "c:\windows\pat.exe"
print p
c:\windows\pat.exe
d = {"p": p}
print d
{'p': 'c:\\windows\\pat.exe'}
Tried it as raw string
d = {"p": r"%s" % p}
print d
{'p': 'c:\\windows\\pat.exe'}
I don't want the extra backslashes to be added when the path is assigned as a value in the dictionary.
This is a mistake that's very common among people new to Python.
TL;DR:
>>> print "c:\windows\pat.exe" == 'c:\\windows\\pat.exe'
True
Explanation:
In the first instance, where you're assigning a value to the string p and then printing p, Python gets the string to print itself and it does so by outputting its literal value. In your example:
>>> p = "c:\windows\pat.exe"
>>> print p
c:\windows\pat.exe
In Python 3, the same:
>>> p = "c:\windows\pat.exe"
>>> print(p)
c:\windows\pat.exe
In the second instance, since you're creating and then printing a dictionary, Python asks the dictionary to print itself. It does so by printing a short Python code representation of itself, since there is no standard simple way of printing a dictionary, like there is for variables with simple types like strings or numbers.
In your example (slightly modified to work by itself):
>>> d = {"p": "c:\windows\pat.exe"}
>>> print d
{'p': 'c:\\windows\\pat.exe'}
So, why does the value of p in the Python code representation have the double backslashes? Because a single backslash in a string literal has an ambiguous meaning. In your example, it just so happens that \w and \p don't have special meanings in Python. However, you've maybe seen things like \n and perhaps \t used in strings to represent a new line or a tab character.
For example:
>>> print "Hello\nworld!"
Hello
world!
So how does Python know when to print a new line and when to print \n literally, when you want to? It doesn't. It just assumes that if the character after the \ doesn't make for a special character, you probably wanted to write a \ and if it is, you wanted to write the special character. If you want to literally write a \, regardless of what follows, you need to follow up the escape character (that's what the \ is called in this context) with another one.
For example:
>>> print "I can see \\n"
I can see \n
That way, there is no ambiguity and Python knows exactly what is intended. You should always write backslashes as double backslashes in normal string literals, instead of relying on luck in avoiding control characters like \n or \t. And that's why Python, when printing its code version of your string "c:\windows\pat.exe", prefers to write it as 'c:\\windows\\pat.exe': using single quotes (which it prefers, even though double quotes are fine too) and double backslashes.
It's just how it is written in code. "Really", your string has single backslashes, and the quotes are of course not part of it at all.
If you don't like having to write double backslashes, you can consider using 'raw strings', which is prefixing a string with r or R, telling Python to ignore special characters and take the string exactly as written in code:
>>> print r"This won't have \n a line break"
This won't have \n a line break
But watch out! This doesn't work if you want the string to end in an odd number of \, because even in a raw string literal a backslash directly before the closing quote stops that quote from ending the literal. In that case, you have no other recourse than writing the string with double backslashes:
>>> print r"Too bad\"
File "<stdin>", line 1
print r"Too bad\"
^
SyntaxError: EOL while scanning string literal
>>> print r"Too bad\\"
Too bad\\
>>> print "Too bad\\"
Too bad\
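If most of the string is easier to read as a raw literal, one possible workaround is to write only the trailing backslash as a normal literal and let Python join the two. This is a sketch in Python 3 syntax, relying on the fact that adjacent string literals are concatenated; the path is just an example:

```python
# Raw literal for the body, normal literal for the final backslash.
folder = r"C:\windows" "\\"

# The same value written entirely with doubled backslashes.
same = "C:\\windows\\"

print(folder)
print(folder == same)
```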
Maybe it is not a problem, because when you print the values (not the whole dictionary) the string will have one backslash:
p = "c:\windows\pat.exe"
d = {"p": p}
print(d)
for i in d:
    print("key:", i, " value:", d[i])
Output
{'p': 'c:\\windows\\pat.exe'}
key: p  value: c:\windows\pat.exe

Replace string content with each others

I have a string: 1x22x1x.
I need to replace all 1 to 2 and vice versa. So example line would be 2x11x2x. Just wondering how is it done. I tried
a = "1x22x1x"
b = a.replace('1', '2').replace('2', '1')
print b
output is 1x11x1x
Maybe i should forget about using replace..?
Here's a way using the translate method of a string:
>>> a = "1x22x1x"
>>> a.translate({ord('1'):'2', ord('2'):'1'})
'2x11x2x'
>>>
>>> # Just to explain
>>> help(str.translate)
Help on method_descriptor:
translate(...)
S.translate(table) -> str
Return a copy of the string S, where all characters have been mapped
through the given translation table, which must be a mapping of
Unicode ordinals to Unicode ordinals, strings, or None.
Unmapped characters are left untouched. Characters mapped to None
are deleted.
>>>
Note however that I wrote this for Python 3.x. In 2.x, you will need to do this:
>>> from string import maketrans
>>> a = "1x22x1x"
>>> a.translate(maketrans('12', '21'))
'2x11x2x'
>>>
Finally, it is important to remember that the translate method is for interchanging characters with other characters. If you want to interchange substrings, you should use the replace method as Rohit Jain demonstrated.
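For what it's worth, Python 3 also has a maketrans counterpart as a static method on str, which saves spelling out the ord() calls. A minimal sketch, using the same input as above:

```python
a = "1x22x1x"

# str.maketrans maps each character of the first argument
# to the character at the same position in the second.
table = str.maketrans("12", "21")

print(a.translate(table))
```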
One way is to use some temporary string as an intermediate replacement:
b = a.replace('1', '#temp_replace#').replace('2', '1').replace('#temp_replace#', '2')
But this may fail if your string already contains #temp_replace#. This technique is also described in PEP 378.
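If the risk of a clashing placeholder is a concern, a different technique is to do all replacements in a single pass with re.sub and a lookup dictionary, so that no intermediate string is needed. This is only a sketch: swap_all is a made-up helper name, and it assumes a non-empty mapping of plain (non-regex) substrings:

```python
import re

def swap_all(text, mapping):
    """Replace every key of mapping found in text with its value, in one pass."""
    # Try longer keys first so a long key is not shadowed by a shorter prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once, so replaced text is never replaced again.
    return pattern.sub(lambda match: mapping[match.group(0)], text)

print(swap_all("1x22x1x", {"1": "2", "2": "1"}))
print(swap_all("cat chases dog", {"cat": "dog", "dog": "cat"}))
```

Unlike the translate approach, this also works when the things being swapped are longer than one character.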
If the "sources" are all one character, you can make a new string:
>>> a = "1x22x1x"
>>> replacements = {"1": "2", "2": "1"}
>>> ''.join(replacements.get(c,c) for c in a)
'2x11x2x'
IOW, make a new string using the get method which accepts a default parameter. somedict.get(c,c) means something like somedict[c] if c in somedict else c, so if the character is in the replacements dictionary you use the associated value otherwise you simply use the character itself.
