How is this weird behaviour of escaping special characters explained? - python

Just for fun, I wrote this simple function to reverse a string in Python:
def reverseString(s):
    ret = ""
    for c in s:
        ret = c + ret
    return ret
Now, if I pass in the following two strings, I get interesting results.
print(reverseString("Pla\net"))
print(reverseString("Plan\et"))
The output of this is
te
alP
te\nalP
My question is: Why does the special character \n get translated into a new line when passed into the function, but not when the function parses it together by reversing n\? Also, how could I stop the function from parsing \n and instead return n\?

You should take a look at the individual character sequences to see what happens:
>>> list("Pla\net")
['P', 'l', 'a', '\n', 'e', 't']
>>> list("Plan\et")
['P', 'l', 'a', 'n', '\\', 'e', 't']
So as you can see, \n is a single character, while \e is two characters: it is not a valid escape sequence, so the backslash is kept literally.
To prevent this from happening, escape the backslash itself, or use raw strings:
>>> list("Pla\\net")
['P', 'l', 'a', '\\', 'n', 'e', 't']
>>> list(r"Pla\net")
['P', 'l', 'a', '\\', 'n', 'e', 't']
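To tie this back to the reversal in the question, here is a minimal sketch. The name reverse_string and the slicing approach are illustrative choices, not the asker's original code. It shows that an escaped backslash and a raw string literal give the same reversed result.

```python
def reverse_string(s):
    # Slicing with a step of -1 walks the string backwards.
    return s[::-1]

escaped = reverse_string("Pla\\net")   # backslash escaped by hand
raw = reverse_string(r"Pla\net")       # raw string literal

print(escaped)                          # ten\alP
print(escaped == raw)                   # True
print(len("Pla\net"), len(r"Pla\net"))  # 6 7
```

The length difference in the last line is the whole story: "Pla\net" holds one newline character where r"Pla\net" holds a backslash followed by an n.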

The reason is that '\n' is a single character in the string. I'm guessing \e isn't a valid escape, so it's treated as two characters.
Look into raw strings for what you want, or just use '\\' wherever you actually want a literal '\'.

The translation is a function of python's syntax, so it only occurs during python's parsing of input to python itself (i.e. when python parses code). It doesn't occur at other times.
In the case of your programme, you have a string which by the time it is constructed as an object, contains the single character denoted by '\n', and a string which when constructed contains the sub-string '\e'. After you reverse them, python doesn't reparse them.
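A small sketch of that point, assuming Python 3: a backslash and an n joined at run time stay two characters, and escape processing only happens again if it is requested explicitly. The unicode_escape codec is one way to do that; the variable names are made up for the illustration.

```python
# A backslash followed by "n", built at run time from two separate pieces.
pieces = "\\" + "n"

print(len(pieces))      # 2
print(pieces == "\n")   # False: nothing re-parses the string

# Escape sequences are only interpreted again on explicit request.
decoded = pieces.encode("ascii").decode("unicode_escape")
print(decoded == "\n")  # True
```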

Related

Can you add characters from a string to a list?

I'm wondering if it's possible to take a string e.g. str(input()) and split it into individual chars, then add them to a list. I'm trying to make a simple script (something similar to a hangman game) and at the beginning I wrote this:
x=input('Choose word: ').lower()
letters=[]
letters.append(list(x))
print(letters)
But this code appends the whole list as a single item instead of appending the individual chars.
Edit: this outputs [['o', 'u', 't', 'p', 'u', 't']], meaning that the whole list got appended as one item, but I want it to output ['o', 'u', 't', 'p', 'u', 't']. How do I make it append individual chars and not the whole list?
You are simply wrapping the char list in another list.
Try this one-liner instead:
print(list(x))
If you want to remove a character:
letters = list(x)
letters.remove('o')
print(letters)
Use extend instead of append function.
#Extend
x=input('Choose word: ').lower()
letters=[]
letters.extend(list(x))
print(letters)
# ['p', 'y', 't', 'h', 'o', 'n']
And to remove a character from the list while keeping its position as an empty string, call replace on each element in a list comprehension:
y=input("Choose a letter to remove: ").lower()
removed=[s.replace(y,'') for s in letters]
print(removed)
#['p', '', 't', 'h', 'o', 'n']
I hope this helps. If it's different from what you want, let me know. Otherwise, happy coding!
You don't need to create an empty list and then populate it with individual letters. Simply apply the list() function directly to the user input:
letters = list(input('Choose word: ').lower())
print(letters)
To add letters from another user input, use the same approach with the .extend() method:
letters.extend(input('Choose word: ').lower()) # No need to use list() here
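The difference between the two methods can be seen side by side in a short sketch. A fixed string stands in for the input() call so the snippet runs without a prompt, and the variable names are only illustrative.

```python
word = "output"  # stand-in for input('Choose word: ').lower()

appended = []
appended.append(list(word))  # adds ONE item: the whole list

extended = []
extended.extend(word)        # adds each character as its own item

print(appended)  # [['o', 'u', 't', 'p', 'u', 't']]
print(extended)  # ['o', 'u', 't', 'p', 'u', 't']
```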
A simple one liner:
x = input().lower().split()
print(x)
Here we take the input, convert it to lowercase, and then call split(), which splits the string on whitespace. Note that this yields whole words, not individual characters. You can split on any other string by passing it as the argument to split(), for example:
x = input().lower().split(',')
print(x)
This splits on ',', so you can give the input in CSV format.
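For comparison, a quick sketch of what split() and list() each return. The sample strings are made up for the illustration.

```python
text = "hello world"

by_whitespace = text.split()      # whole words
by_comma = "a,b,c".split(",")     # pieces between commas
single_word = "hello".split()     # one word stays one item
characters = list("hello")        # individual characters

print(by_whitespace)  # ['hello', 'world']
print(by_comma)       # ['a', 'b', 'c']
print(single_word)    # ['hello']
print(characters)     # ['h', 'e', 'l', 'l', 'o']
```

So for a hangman-style game that needs one list entry per letter, list() (or extend) is the fitting tool, while split() suits input that is already separated by a delimiter.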
You may use the augmented assignment operator += to extend the list with an iterable. (Plain + only works when both operands are lists.)
No need to use the list() function here, because the string is iterable:
letters = []
letters += input('Choose word: ').lower()
print(letters)
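A brief sketch of how += and plain + differ on a list. The variable names are illustrative only.

```python
letters = []
letters += "abc"                 # same effect as letters.extend("abc")
print(letters)                   # ['a', 'b', 'c']

try:
    combined = letters + "de"    # list + str is not allowed
except TypeError as exc:
    print("TypeError:", exc)

combined = letters + list("de")  # convert first, then concatenate
print(combined)                  # ['a', 'b', 'c', 'd', 'e']
```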
this outputs [['o', 'u', 't', 'p', 'u', 't']] meaning that the whole
list got appended as one item, but i want this to output ['o', 'u', 't', 'p', 'u', 't']
Based on your comment, you can use:
x = [*input('Choose word: ').lower()]
print(x)
# ['p', 'y', 't', 'h', 'o', 'n']

How to convert a string to a list if the string has wild characters for a group of characters like [] or {}, ()

I have a string of this sort
s = 'a,s,[c,f],[f,t]'
I want to convert this to a list
S = ['a','s',['c','f'],['f','t']]
I tried using strip()
d = s.strip('][').split(',')
But it is not giving me the desired output:
output = ['a', 's', '[c', 'f]', '[f', 't']
You could use ast.literal_eval(), having first enclosed each element in quotes:
>>> import ast, re
>>> qs = re.sub(r'(\w+)', r'"\1"', s) # add quotes
>>> ast.literal_eval('[' + qs + ']') # enclose in brackets & safely eval
['a', 's', ['c', 'f'], ['f', 't']]
You may need to tweak the regex if your elements can contain non-word characters.
This only works if your input string follows Python expression syntax or is sufficiently close to be mechanically converted to Python syntax (as we did above by adding quotes and brackets). If this assumption does not hold, you might need to look into using a parsing library. (You could also hand-code a recursive descent parser, but that'll probably be more work to do correctly than just using a parsing library.)
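For a sense of what the hand-coded route involves, here is a minimal recursive descent sketch. It assumes the input uses only commas and square brackets, treats everything else as element text, and silently drops empty elements. The function name parse_nested is made up for this illustration.

```python
def parse_nested(text):
    """Parse a string such as 'a,s,[c,f]' into nested lists of strings."""
    pos = 0

    def parse_list(inside_brackets):
        nonlocal pos
        items, token = [], ""
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == "[":
                # Recurse: the nested call consumes input up to the matching ']'.
                items.append(parse_list(True))
            elif ch == "]":
                if not inside_brackets:
                    raise ValueError("unexpected ']'")
                if token.strip():
                    items.append(token.strip())
                return items
            elif ch == ",":
                if token.strip():
                    items.append(token.strip())
                token = ""
            else:
                token += ch
        if inside_brackets:
            raise ValueError("missing ']'")
        if token.strip():
            items.append(token.strip())
        return items

    return parse_list(False)

print(parse_nested('a,s,[c,f],[f,t]'))
# ['a', 's', ['c', 'f'], ['f', 't']]
```

Supporting () and {} as well would mean tracking which closing character each level expects, which is where a parsing library starts to pay off.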
As an alternative to ast.literal_eval, you can use the json package, with more or less the same restrictions as in NPE's answer:
import re
import json
qs = re.sub(r'(\w+)', r'"\1"', s) # add quotes
ls = json.loads('[' + qs + ']')
print(ls)
# ['a', 's', ['c', 'f'], ['f', 't']]

How do I split up the following Python string in to a list of strings?

I have a string 'Predicate(big,small)'
How do I derive the following list of strings from that, ['Predicate','(','big',',','small',')']
The names can potentially be anything, and there can also be spaces between elements like so (I need to have the whitespace taken out of the list), Predicate (big, small)
So far I've tried this, but this is clearly not the result that I want
>>> str1 = 'Predicate(big,small)'
>>> list(map(str,str1))
Output:
['P', 'r', 'e', 'd', 'i', 'c', 'a', 't', 'e', '(', 'b', 'i', 'g', ',', 's', 'm', 'a', 'l', 'l', ')']
You can use re.split() to split your string on (, ) or a comma. Wrapping the delimiters in a capture group keeps them in the result. Combined with str.strip() to handle spaces and a filter that drops empty pieces, you get something like:
import re
s = 'Predicate ( big ,small )'
tokens = [tok.strip() for tok in re.split(r'([(),])', s) if tok.strip()]
print(tokens)
# ['Predicate', '(', 'big', ',', 'small', ')']
You can use re here.
import re
text='Predicate(big,small)'
parsed = re.findall(r'\w+|[^a-zA-Z,\s]', text)
# ['Predicate', '(', 'big', 'small', ')']
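\w+ matches one or more word characters (for ASCII text, the same as [a-zA-Z0-9_]).
[^a-zA-Z,\s] matches any single character that is not a letter, a comma, or whitespace, which is why the comma is missing from the result.
\s matches any whitespace character.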
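If the commas should be kept as tokens, as the question asks, one possible variation is to list the punctuation explicitly. This is a sketch that assumes names consist of word characters only; the function name tokenize is made up here.

```python
import re

def tokenize(text):
    # \w+ grabs each name; [(),] grabs a single parenthesis or comma.
    # Whitespace matches neither alternative, so it is skipped.
    return re.findall(r"\w+|[(),]", text)

print(tokenize("Predicate(big,small)"))
# ['Predicate', '(', 'big', ',', 'small', ')']
print(tokenize("Predicate ( big ,small )"))
# ['Predicate', '(', 'big', ',', 'small', ')']
```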

Converting to lower-case: every letter gets tokenized

I have a text document that I want to convert to lower case, but when I do it in the following way every letter of my document gets tokenized. Why does it happen?
with open('assign_1.txt') as g:
    assign_1 = g.read()
assign_new = [word.lower() for word in assign_1]
What I get:
assign_new
['b',
'a',
'n',
'g',
'l',
'a',
'd',
'e',
's',
'h',]
You iterated through the entire input one character at a time, lower-cased each character, and collected the results in a list. It's simpler than that. Inside the with block, lower-case the whole string at once:
assign_lower = g.read().lower()
Naming the loop variable "word" doesn't make you iterate over words: assign_1 is still a sequence of characters.
If you want to break this into words, use the split method, which is independent of the lower-case operation.
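A short sketch of both behaviours, with a made-up string standing in for the contents of assign_1.txt:

```python
text = "Bangladesh Is A Country"  # stand-in for the file contents

chars = [ch.lower() for ch in text]  # iterates character by character
words = text.lower().split()         # lower-case once, then split on whitespace

print(chars[:4])  # ['b', 'a', 'n', 'g']
print(words)      # ['bangladesh', 'is', 'a', 'country']
```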

Char list instead of a string in Python

I'm trying to run a command using subprocess.check_call(), but it appears from the output that the parameters I'm giving are not interpreted as a string but as a char list.
The command I'm trying to run: 7z x test.rar.
What actually is running:
subprocess.CalledProcessError: Command '['7z', 'x', 't', 'e', 's', 't', '.', 'r', 'a', 'r']' returned non-zero exit status 2.
For some reason, the file name is split into individual characters. What am I missing?
The code:
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("input", help="File/Folder to extract")
    args = parser.parse_args()
    extract_file(args.input)

def extract_file(file):
    extract_cmd = ['7z']
    extract_cmd.extend('x')
    extract_cmd.extend(file)
    subprocess.check_call(extract_cmd)
The documentation for lists and strings covers this. extend adds each element of its argument to the list, and iterating over a string yields its characters, so extend(file) appended every character of the file name separately. Use append for a single string, or build the argument list in one go:
def extract_file(file_name):
    extract_cmd = ["7z", "x", file_name]
    subprocess.check_call(extract_cmd)
Note: I changed your parameter name, because file is a built-in type in Python 2.
You can take one of two approaches here:
subprocess.check_call(['7z', 'x', file])
or, if you are not too worried about shell injection and this is local code, just pass the entire string:
subprocess.check_call('{} {} {}'.format('7z', 'x', file), shell=True)
Others have already chimed in on why you are seeing "chars" in your list. extend will listify your string (file_name in this case) and add it to the end of your list.
a = [11]
a.extend('apples')
print(a)
output:
[11, 'a', 'p', 'p', 'l', 'e', 's']
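Putting the answers together, here is a sketch of how the command could be built in each style. The helper names build_extract_cmd and build_shell_cmd are hypothetical, and shlex.quote targets POSIX shells, so treat the second helper as an assumption that only applies when shell=True is used there. Neither helper runs 7z; they only build the command.

```python
import shlex

def build_extract_cmd(file_name):
    # One list element per argument; the file name stays a single string.
    return ["7z", "x", file_name]

def build_shell_cmd(file_name):
    # Only relevant with shell=True: quote the name so spaces and
    # shell metacharacters are not interpreted by the shell.
    return "7z x " + shlex.quote(file_name)

print(build_extract_cmd("test.rar"))   # ['7z', 'x', 'test.rar']
print(build_shell_cmd("my file.rar"))  # 7z x 'my file.rar'
```

The list form can be passed straight to subprocess.check_call and avoids quoting questions altogether, which is why it is usually the safer default.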
