lexical analysis python

Recently I made the following observation:
>>> x= "\'"
>>> x
"'"
>>> y="'"
>>> y
"'"
>>> print x
'
>>> print y
'
Can anyone please explain why this is so? I am using Python 2.7.x, and I am familiar with escape sequences.
I want to do the following:
I have a string with single quotes in it that I have to insert into a database, so I need to replace each single quote (') with a backslash followed by a single quote (\'). How can I achieve this?

Inside a pair of "", you don't need to escape the ' character. You can, of course, but as you've seen it's unnecessary and has no effect whatsoever.
It'd be necessary to escape if you were to write a ' inside a pair of '' or a " inside a pair of "":
x = '\''
y = "\""
EDIT:
Regarding the last part of the question, added after the edit:
I have a string with single quotes in it that I have to insert into a database, so I need to replace each single quote (') with a backslash followed by a single quote (\'). How can I achieve this?
Any of the following literals produces a backslash followed by a single quote. Note that the raw strings (r prefix) avoid the need to escape the backslash itself:
v = "\\'"
w = '\\\''
x = r'\''
y = r"\'"
print v, w, x, y
> \' \' \' \'
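To apply that replacement to an existing string, str.replace() is enough. The following is a minimal sketch written for Python 3; the sample string, the in-memory SQLite database and the notes table are assumptions made only for illustration. It also shows a common alternative, passing the value through a query placeholder so the driver handles quoting and no manual escaping is needed.

```python
import sqlite3

s = "It's a dog's life"  # sample input, not from the question

# Replace every single quote with a backslash followed by a single quote.
escaped = s.replace("'", "\\'")
print(escaped)  # It\'s a dog\'s life

# Alternative sketch: let the database driver do the quoting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")  # hypothetical table
conn.execute("INSERT INTO notes (body) VALUES (?)", (s,))
stored = conn.execute("SELECT body FROM notes").fetchone()[0]
print(stored == s)  # True
conn.close()
```

Which approach fits depends on the database library in use; most Python database drivers offer some form of placeholder.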

Related

Convert string to unicode view

I have a string
s = "Санкт-Петербург"
I want to convert the string to
\u0421\u0430\u043D\u043A\u0442-\u041F\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433
My code
x = "Санкт-Петербург"
y = str(x.encode('unicode-escape')) # I want y to be string
print(y) # b'\\u0421\\u0430\\u043d\\u043a\\u0442-\\u041f\\u0435\\u0442\\u0435\\u0440\\u0431\\u0443\\u0440\\u0433'
What is the best way to get rid of b' and \\ ?

Use a slice of [2:-1], which cuts off the leading b' and the trailing ', then replace each doubled backslash with a single one. Here is how:
x = "Санкт-Петербург"
y = str(x.encode('unicode-escape'))
print(y[2:-1].replace('\\\\', '\\'))
Output:
\u0421\u0430\u043d\u043a\u0442-\u041f\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433

All you need to do is replace each doubled backslash (written '\\\\' in source code) with a single backslash (written '\\') using the replace() method. After that you can strip away the b' at the start and the ' at the end using the strip() method.
Here's how you can proceed:
y = y.replace('\\\\', '\\') # Replace backslashes
y = y.strip("b'") # Strip the unnecessary parts
print(y) # Print out the result
Output
\u0421\u0430\u043d\u043a\u0442-\u041f\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433
PS: strip("b'") also removes the single quote at the end of the string.
EDIT
As pointed out by @Aplet123, strip("b'") might cause issues with any string that contains an ASCII b or ' at the start or end. Thus, instead of strip(), a string slice could be used. All you need to do is replace the line y = y.strip("b'") with:
y = y[2:][:-1]
Here, 2: will strip away the b' at the start and :-1 will strip away the trailing '
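The b' prefix and the doubled backslashes come from calling str() on a bytes object, so another way to get the text is to decode the bytes instead of slicing their repr. A minimal sketch, assuming Python 3:

```python
x = "Санкт-Петербург"

# encode() yields ASCII-only bytes; decode() turns them back into a str
# without the b'...' wrapper or doubled backslashes.
y = x.encode("unicode-escape").decode("ascii")

print(y)        # \u0421\u0430\u043d\u043a\u0442-\u041f\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433
print(type(y))  # <class 'str'>
```

The hex digits come out in lowercase; call y.upper() only if uppercase is acceptable for the whole string, since it would also change the \u prefix.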

How do I trim a string after a certain character appears more than once in Python?

I am trying to scan a string and, once a certain character has been read 3 times, cut off the rest of the string.
For example:
The string "C:\Temp\Test\Documents\Test.doc" would turn into "C:\Temp\Test\"
Once the string hits "\" 3 times, it should be trimmed.
Here is the code that I am working on:
prefix = ["" for x in range(size)]
num = 0
...
...
for char in os.path.realpath(src):
    for x in prefix:
        x = char
        if x =='\': # I get an error here
            num = num + 1
        if num == 3:
            break
print (num)
print(prefix)
...
...
os.path.realpath(src) is the string with the file path. The "prefix" variable is the string array in which I want to store the trimmed string.
Please let me know what I need to fix or if there is a simpler way to perform this.

Split on the backslash, slice the list to keep the first three parts, and join them again:
s = r'C:\Temp\Test\Documents\Test.doc'  # raw string keeps the backslashes literal
print('\\'.join(s.split('\\')[:3]) + '\\')
# C:\Temp\Test\
Note that \ (backslash) is the escape character in string literals. To get a literal backslash, write it twice (\\); the first backslash removes the special meaning of the second.

In Python the backslash is the escape character: \n is a newline, \t is a tab, and \" puts a double quote inside a double-quoted string. For a literal backslash, write "\\".

Try:
s = "C:\\Temp\\Test\\Documents\\Test.doc"
answer = '\\'.join(s.split('\\', 3)[:3]) + '\\'  # append the trailing backslash the question asks for

Something like this would do:
x = r"C:\Temp\Test\Documents\Test.doc"  # raw string, so \T and \D stay as written
print('\\'.join(x.split("\\")[:3])+"\\")
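The split-and-join idea in these answers can be wrapped in a small helper. This is a sketch only; the function name trim_after_nth and its parameters are made up for illustration, and it returns the input unchanged when the separator occurs fewer than n times.

```python
def trim_after_nth(text, sep, n):
    """Return text up to and including the n-th occurrence of sep."""
    parts = text.split(sep, n)  # at most n splits
    if len(parts) <= n:
        return text  # fewer than n separators: nothing to trim
    return sep.join(parts[:n]) + sep


path = r"C:\Temp\Test\Documents\Test.doc"
trimmed = trim_after_nth(path, "\\", 3)
print(trimmed)  # C:\Temp\Test\
```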

Text stripping issue

Apologies in advance if this turns out to be a PEBKAC issue, but I can't see what I'm doing wrong.
Python 3.5.1 (FWIW)
I've pulled data from an online source, each line of the page is .strip() 'ed of \r\n, etc. and converted to a utf-8 string. The lines I'm looking for are reduced further below.
I want to take two strings, join them and strip out all the non-alphanumerics.
> x = "ABC"
> y = "Some-text as an example."
> z = x+y.lower()
> type(z)
<class 'str'>
So here's the problem.
> z = z.strip("'-. ")
> print(z)
Why is the result:
ABCsome-text as an example.
and not, as I would like:
ABCsometextasanexample
I can get it to work with four .replace() commands, but strip really doesn't want to work here. I've also tried separate strip commands:
> y = y.strip("-")
> print(y)
some-text as an example.
Whereas
> y = y.replace("-", '')
> print(y)
sometext as an example.
Any thoughts on what I might be doing wrong with .strip()?

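Since you wish to remove all the non-alphanumeric characters, let's make it more generic with a regular expression (note that \W treats the underscore as a word character, so underscores are kept):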
import re
x = "ABC"
y = "Some-text as an example."
z = x+y.lower()
z = re.sub(r'\W+', '', z)

strip() doesn't remove every occurrence of the given characters; it only removes them from the two ends of the string.
From the official documentation:
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped
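A short sketch of that behaviour, using made-up sample strings: strip() stops at the first character on each side that is not in the given set, while replace() touches every occurrence.

```python
s = "--some-text--"

# Only the leading and trailing hyphens go; the inner one stays.
stripped = s.strip("-")
print(stripped)  # some-text

# replace() removes every occurrence, wherever it is.
replaced = s.replace("-", "")
print(replaced)  # sometext

# The argument is a set of characters, not a prefix or suffix.
print("xyabcyx".strip("xy"))  # abc
```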

Another solution would be using Python's filter(). In Python 3 filter() returns an iterator, so join the result back into a string:
x = "ABC"
y = "Some-text as an example."
z = x + y.lower()
z = ''.join(filter(str.isalnum, z))

As others have pointed out, the problem with strip() is that it only operates on characters at the beginning and end of strings, so calling replace() several times would be the way to accomplish what you want using just string methods.
Although not the question you asked, here's how to do it with a single call to the re.sub() function in the re regular-expression module. The arbitrary characters to be replaced are defined by the contents of the string variable named chars.
import re
x = "ABC"
y = "Some-text as an example."
z = x + y.lower()
print('before: {!r}'.format(z)) # -> before: 'ABCsome-text as an example.'
chars = "'-. " # Characters to be replaced.
z = re.sub('(' + '|'.join(re.escape(ch) for ch in chars) + ')', '', z)
print('after: {!r}'.format(z)) # -> after: 'ABCsometextasanexample'
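For a fixed set of single characters, str.translate() is another option that avoids both regular expressions and chained replace() calls. A minimal sketch, assuming Python 3, where the three-argument form of str.maketrans() lists characters to delete:

```python
chars = "'-. "  # characters to remove
z = "ABCsome-text as an example."

# Third argument of str.maketrans: every character in it maps to None.
table = str.maketrans("", "", chars)
cleaned = z.translate(table)
print(cleaned)  # ABCsometextasanexample
```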

Python replace/delete special characters

character = (%.,-();'0123456789-—:`’)
character.replace(" ")
character.delete()
I want to delete or replace all the special characters and numbers in my program's input. I know it can be done with a single string, but I'm not sure how to wrap all the special characters in quotes. Somehow I'm supposed to list all the special characters shown in the parentheses, and I'm not sure how to write them so that they are all stored in the variable.

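The translate method is my preferred way of doing this. Create a mapping between the characters you want replaced and their replacements, then apply that table to your input string. (In Python 3 the table is built with str.maketrans; in Python 2 it was string.maketrans.)
special = "%.,-();'0123456789-—:`’"
blanks = " " * len(special)  # one space per special character
table = str.maketrans(special, blanks)
input_string = "It's 10-20% (approx.)"  # example input
print(input_string.translate(table))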

Seems like a good application for filter (in Python 3, join the filtered characters back into a string):
>>> s = 'This is a test! It has #1234 and letters?'
>>> ''.join(filter(str.isalpha, s))
'ThisisatestIthasandletters'

You can write a function with an optional fill value. If it is not set, the function simply removes the non-alphabetic characters; otherwise it replaces each of them with the value you pass:
def delete_replace(s, fill_char=""):
    return "".join([x if x.isalpha() else fill_char for x in s])
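A quick usage sketch of such a function, with a sample string chosen for illustration. Note that str.isalpha() is also True for non-ASCII letters, so accented characters are kept.

```python
def delete_replace(s, fill_char=""):
    # Keep alphabetic characters; swap everything else for fill_char.
    return "".join(x if x.isalpha() else fill_char for x in s)


sample = "abc123-def"  # made-up input
deleted = delete_replace(sample)
filled = delete_replace(sample, "_")
print(deleted)  # abcdef
print(filled)   # abc____def
```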

Why are Python strings behaving funny?

I'm so confused... why/how is a different from b?! Why don't they print the same thing?
>>> a = '"'
>>> a
'"'
>>> b = "'"
>>> b
"'"

The two strings are displayed by the same rule: the interpreter shows their repr(), which picks whichever delimiter avoids having to escape the quote character inside the string. Both ' and " are legal string literal delimiters.
Note that the contents of the two strings are different: a holds a double-quote character and b holds a single-quote character, so a == b is False.
Python would have to use a \ backslash for the " or ' character otherwise. If a string contains both characters, Python is forced to escape one of them:
>>> '\'"'
'\'"'
>>> """Triple quoted means you can use both without escaping them: "'"""
'Triple quoted means you can use both without escaping them: "\''
As you can see, the string representation used by Python still uses single quotes and a backslash to represent that last string.
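The difference between what the interactive prompt echoes and what print shows can be checked directly with repr(). A small sketch, assuming Python 3:

```python
a = '"'
b = "'"

# repr() is what the interactive prompt echoes; it adds delimiters.
print(repr(a))  # '"'
print(repr(b))  # "'"

# print() shows the contents only: one character each.
print(a, b, len(a), len(b))  # " ' 1 1

# With both quote characters present, repr() has to escape one of them.
both = a + b
print(repr(both))  # '"\''
```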
