How to search and get rid of this character? - python

Bulmaca-Zeka Oyunu<200f>
I have a lot of strings in a text file, and I noticed that one has this <200f> char. I want to find all entries that have this char and remove it. But in Vim I can't find it by searching '<200f>' using the search string '<200f>'. Probably it is one char not 6 individual chars.
In Python or VIM, how to remove or search them?

Those literal characters in angle brackets are how Vim handles some problematic unprintable characters. They are quite puzzling at first but they are really easy to figure out because they simply spell out the character's hexadecimal code.
In this case, <200f> is the literal representation of U+200F, which you search like this:
/\%u200f
So, if you want to get rid of the occurrences of <200f> in the current buffer, all you have to do is:
:%s/\%u200f//g
See :help \%u.

You can't find it in vim because you're not searching for the correct string.
Check your vim documentation: angle brackets are meta-characters. Simply escape them, and remove the undesired strings.
:%s/\<200f\>//g

In Python you could perhaps use string replacement :
line_of_text = "Bulmaca-Zeka Oyunu<200f>"
print(line_of_text.replace("<200f>", ""))
Output :
Bulmaca-Zeka Oyunu
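If the file really contains the single character U+200F (RIGHT-TO-LEFT MARK), which is what Vim's <200f> display indicates, the text to replace in Python would be "\u200f" rather than the six visible characters. A minimal sketch, assuming the lines are already decoded to str; the helper name strip_rlm is made up for illustration:

```python
RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK, a single invisible character

def strip_rlm(lines):
    """Return the cleaned lines and the indexes of lines that contained U+200F."""
    hits = [i for i, line in enumerate(lines) if RLM in line]
    cleaned = [line.replace(RLM, "") for line in lines]
    return cleaned, hits

sample = ["Bulmaca-Zeka Oyunu\u200f", "no hidden mark here"]
cleaned, hits = strip_rlm(sample)
print(hits)     # indexes of the affected lines
print(cleaned)  # the same lines without the mark
```

The list of indexes covers the "find all entries" part of the question, and the cleaned list covers the removal.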

Related

Replace string with quotes, brackets, braces, and slashes in python

I have a string where I am trying to replace ["{\" with [{" and all \" with ".
I am struggling to find the right syntax in order to do this, does anyone have a solid understanding of how to do this?
I am working with JSON, and I am inserting a string into the JSON properties. This caused it to put single quotes around the data inserted from my variable, and I need those single quotes gone. I tried to do json.dumps() on the data and do a string replace, but it does not work.
Any help is appreciated. Thank you.
You can use the replace method.
See the Python documentation for str.replace() for examples.
I would recommend maybe posting more of your code below so we can suggest a better answer. Just based on the information you have provided, I would say that what you are looking for are escape characters. I may be able to help more once you provide us with more info!
Use the target/replacement strings as arguments to replace().
The general format is mystring = mystring.replace("old_text", "new_text")
Since your target strings have backslashes, you also probably want to use raw strings to prevent them from being interpreted as special characters.
mystring = "something"
mystring = mystring.replace(r'["{\"', '[{"')
mystring = mystring.replace(r'\"', '"')
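To see the two replace() calls at work, here is a small sketch on a made-up sample string (the question does not show the real data, so the value of raw is an assumption). The third replace(), which fixes the closing }"], is also an assumption about what the tail of the data looks like; it is not part of the answer above.

```python
import json

# Hypothetical input: a JSON object that was embedded as an escaped string.
raw = r'["{\"name\": \"x\"}"]'

cleaned = raw.replace(r'["{\"', '[{"')   # opening ["{\"  becomes  [{"
cleaned = cleaned.replace(r'\"', '"')    # every remaining \"  becomes  "
cleaned = cleaned.replace('}"]', '}]')   # assumed: closing }"]  becomes  }]

print(cleaned)
print(json.loads(cleaned))               # parses once the stray quotes are gone
```

Raw strings are only needed for the targets that contain a backslash; the replacement strings are ordinary literals.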
If you want to do the replacement by hand, scan for the first character of the target and check that the rest of the target follows immediately after it. When it does, remove the unwanted characters and shift the remaining ones down: in the first case the sequence gets shorter, and in the second case you only delete the \ character.
You can also locate a particular substring with a built-in function such as find() and then use replace() to put the string you want in its place.

Escape Windows's Path Delimiter

I need to change this string by escaping the windows path delimiters. I don't define the original string myself, so I can't pre-pend the raw string 'r'.
I need this:
s = 'C:\foo\bar'
to be this:
s = 'C:\\foo\\bar'
Everything I can find here and elsewhere says to do this:
s.replace( r'\\', r'\\\\' )
(Why I should have to escape the character inside a raw string I can't imagine)
But printing the string results in this. Obviously something has decided to re-interpret the escapes in the modified string:
C:♀oar
This would be so simple in Perl. How do I solve this in Python?
After a bunch of questions back and forth, the actual problem is this:
You have a file with contents like this:
C:\foo\bar
C:\spam\eggs
You want to read the contents of that file, and use it as pathnames, and you want to know how to escape things.
The answer is that you don't have to do anything at all.
Backslash sequences are processed in string literals, not in string objects that you read from a file, or from input (in 3.x; in 2.x that's raw_input), etc. So, you don't need to escape those backslash sequences.
If you think about it, you don't need to add quotes around a string to turn it into a string. And this is exactly the same case. The quotes and the escaping backslashes are both part of the string's representation, not the string itself.
In other words, if you save that example file as paths.txt, and you run the following code:
with open('paths.txt') as f:
    file_paths = [line.strip() for line in f]
literal_paths = [r'C:\foo\bar', r'C:\spam\eggs']
print(file_paths == literal_paths)
… it will print out True.
Of course if your file was generated incorrectly and is full of garbage like this:
C:♀oar
Then there is no way to "escape the backslashes", because they're not there to escape. You can try to write heuristic code to reconstruct the original data that was supposed to be there, but that's the best you can do.
For example, you could do something like this:
backslash_map = { '\a': r'\a', '\b': r'\b', '\f': r'\f',
                  '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v' }
def reconstruct_broken_string(s):
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s
But this won't help if there were any hex, octal, or Unicode escape sequences to undo. For example, 'C:\foo\x08' and 'C:\foo\b' both represent the exact same string, so if you get that string, there's no way to know which one you're supposed to convert to. That's why the best you can do is a heuristic.
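A short, self-contained check of both claims, reusing the heuristic from above: the broken literal contains no backslashes at all, the heuristic can restore this particular one, and two different spellings collapse into the same string.

```python
backslash_map = {'\a': r'\a', '\b': r'\b', '\f': r'\f',
                 '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def reconstruct_broken_string(s):
    # Swap each control character back to its two-character spelling.
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s

broken = 'C:\foo\bar'   # the literal turns \f and \b into control characters
print(len(broken))      # 8 characters, none of them a backslash
print(reconstruct_broken_string(broken) == r'C:\foo\bar')

# Two spellings, one string: the original spelling cannot be recovered.
print('C:\foo\x08' == 'C:\foo\b')
```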
Don't do s.replace(anything). Just stick an r in front of the string literal, before the opening quote, so you have a raw string. Anything based on string replacement would be a horrible kludge, since s doesn't actually have backslashes in it; your code has backslashes in it, but those don't become backslashes in the actual string.
If the string actually has backslashes in it, and you want the string to have two backslashes wherever there once was one, you want this:
s = s.replace('\\', r'\\')
That'll replace any single backslash with two backslashes. If the string literally appears in the source code as s = 'C:\foo\bar', though, the only reasonable solution is to change that line. It's broken, and anything you do to the rest of the code won't make it not broken.
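A quick sketch of that replacement on a string that really does contain backslashes (written here as a raw literal purely for the demonstration):

```python
s = r'C:\foo\bar'                  # really contains two backslashes
doubled = s.replace('\\', r'\\')   # '\\' is one backslash, r'\\' is two

print(s)
print(doubled)
print(s.count('\\'), doubled.count('\\'))
```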

python replace backslashes to slashes

How can I escape the backslashes in the string: 'pictures\12761_1.jpg'?
I know about raw string. How can I convert str to raw if I take 'pictures\12761_1.jpg' value from xml file for example?
You can use the string .replace() method along with a raw string.
Python 2:
>>> print r'pictures\12761_1.jpg'.replace("\\", "/")
pictures/12761_1.jpg
Python 3:
>>> print(r'pictures\12761_1.jpg'.replace("\\", "/"))
pictures/12761_1.jpg
There are two things to notice here:
First, the text is written as a raw string by putting r before the string. If you leave that out, \127 is interpreted as an octal escape (the character W), so the backslash never reaches the string.
Second, the first argument of the replace method contains two backslashes. A backslash normally starts an escape sequence: a sequence of two or more characters, beginning with a backslash, that stands for something other than itself inside a string literal, the way '\n' stands for a newline. So to get a literal backslash, which would otherwise start an escape sequence, we escape it with another backslash.
I know the second part is bit confusing but I hope it made some sense.
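To make the difference concrete, here is a small sketch comparing the plain literal with the raw one. In the plain literal, \127 is read as an octal escape, so the backslash is already gone before replace() ever runs.

```python
plain = 'pictures\12761_1.jpg'   # \127 is an octal escape, i.e. chr(0o127)
raw = r'pictures\12761_1.jpg'    # the raw literal keeps the backslash

print(plain)                     # no backslash left to replace
print(raw)
print(raw.replace('\\', '/'))
```

A value read from an XML file behaves like the raw version, because escape processing only happens for literals in source code.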
You can also use split/join:
print "/".join(r'pictures\12761_1.jpg'.split("\\"))
EDITED:
Another way is to prepare the data while retrieving it (the idea is to update the string before assigning it to a variable), for example:
f = open('c:\\tst.txt', "r")
print f.readline().replace('\\','/')
>>>'pictures/12761_1.jpg\n'
I know it is not what you asked exactly, but I think this will work better.
It's better to just have the names of your directories and use os.path.join(directory, filename).
"os.path.join(path, *paths)
Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component"
https://docs.python.org/2/library/os.path.html
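A brief sketch of that suggestion, using the directory and file name from the question. The second half uses pathlib.PureWindowsPath, which is not mentioned in the answers above; it is shown only as one possible way to turn an existing backslash path into a forward-slash one without calling replace().

```python
import os.path
from pathlib import PureWindowsPath

# Build the path from its parts; the separator is whatever the current OS uses.
joined = os.path.join('pictures', '12761_1.jpg')
print(joined)

# Alternative (not from the answers above): parse a Windows-style path
# and render it with forward slashes.
as_posix = PureWindowsPath(r'pictures\12761_1.jpg').as_posix()
print(as_posix)
```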

Weird Python Regex Issues

whitespace_pattern = u"\s" # bug: tried to use unicode \u0020, broke regex
time_sig_pattern = \
"""^%(ws)s*time signature:%(ws)s*(?P<top>\d+)%(ws)s*\/%(ws)s*(?P<bottom>\d+)%(ws)s*$""" %{"ws": whitespace_pattern}
time_sig = compile(time_sig_pattern, U|M)
For some reason, adding the Verbose flag, X, to compile breaks the pattern.
Also, I wanted to use unicode for whitespace_pattern recognition (supposedly, we'll get patterns that use non-unicode spaces and we need to explicitly check for that one unicode character as a valid space), but the pattern keeps breaking.
VERBOSE gives you the ability to write comments in your regex to document it.
In order to do so, it ignores spaces, since you need to use line breaks to write comments.
Replace all spaces in your regex by \s to specify they are spaces you want to match in your pattern, and not just some spaces to format your comments.
What's more, you may want to use the r prefix for the string you use as a pattern. It tells Python not to interpret special notations such as \n and let you use backslashes without escaping them.
Always define regexes with the r prefix to indicate they are raw strings.
r"""^%(ws)s*time signature:%(ws)s*(?P<top>\d+)%(ws)s*\/%(ws)s*(?P<bottom>\d+)%(ws)s*$""" %{"ws": whitespace_pattern}
When creating a regex to match Unicode characters, you do not want to use a Python unicode string. In your example, the regular expression engine needs to see the literal characters \u0020 (the re module understands \uXXXX escapes from Python 3.3 on), so you should use whitespace_pattern = r"\u0020" instead of u"\u0020".
As other answers have mentioned, you should also use the r prefix for time_sig_pattern. After those two changes your code should work fine.
For VERBOSE to work correctly you need to escape all whitespace in the pattern, so towards the beginning of the pattern replace the space in time signature with "\ " (quotes for clarity), \s, or [ ], as described in the re module documentation for the VERBOSE flag.
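Putting the advice from these answers together, the pattern might look like the sketch below. It is written for Python 3, spells out \s directly instead of interpolating whitespace_pattern, and uses the sample inputs only as illustrations.

```python
import re

time_sig = re.compile(r"""
    ^ \s* time\ signature: \s*   # backslash-space is a literal space
    (?P<top>\d+) \s* / \s*       # numerator, then the slash
    (?P<bottom>\d+) \s* $        # denominator
""", re.UNICODE | re.MULTILINE | re.VERBOSE)

m = time_sig.search("title: demo\n  time signature: 6 / 8  \n")
print(m.group("top"), m.group("bottom"))

# Without the space between the two words there is no match.
print(time_sig.search("timesignature: 3/4"))
```

Unescaped spaces and the comments are there only for layout; the escaped space is the one place where the pattern actually requires a space character.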

Python Regex (Search Multiple values in one string)

In Python regex, how would I match against a large string of text and flag whether any one of several regex values matches? I have tried this with "|" (or) statements and I have tried making a regex list; neither worked for me. Here is an example of what I am trying to do with the or.
I think my "or" gets commented out:
patterns=re.compile(r'[\btext String1\b] | [\bText String2\b]')
if(patterns.search(MyTextFile)):
    print ("YAY one of your text patterns is in this file")
The above code always says it matches regardless if the string appears and if I change it around a bit I get matches on the first regex but never checks the second.... I believe this is because the "Raw" is commenting out my or statement but how would I get around this??
I also tried to get around this by taking out the "Raw" statement and putting double slashes on my \b for escaping but that didn't work either :(
patterns=re.compile('\\btext String1\\b | \\bText String2\\b')
if(patterns.search(MyTextFile)):
    print ("YAY one of your text patterns is in this file")
I then tried to do 2 separate raw statements with the or and the interpreter complains about unsupported str operands...
patterns=re.compile(r'\btext String1\b' | r'\bText String2\b')
if(patterns.search(MyTextFile)):
    print ("YAY one of your text patterns is in this file")
patterns=re.compile(r'(\btext String1\b)|(\bText String2\b)')
You want a group (optionally capturing), not a character class. Technically, you don't need a group here:
patterns=re.compile(r'\btext String1\b|\bText String2\b')
will also work (without any capture).
The way you had it, it checked for either one of the characters between the first square brackets, or one of those between the second pair. You may find a regex tutorial helpful.
It should be clear where the "unsupported str operands" error comes from. You can't OR strings, and you have to remember the | is processed before the argument even gets to compile.
This part, [\btext String1\b], asks whether a backspace character (inside square brackets \b no longer means a word boundary) or any one of the characters in "text String1" is present. So it matches almost any non-empty line, I think.
In a RE pattern, square brackets [ ] indicate a "character class" (depending on what's inside them, "any one of these characters" or "any character except one of these", the latter indicated by a caret ^ as the first character after the opening [). This is what you're expressing and it has absolutely nothing to do with what you want -- just remove the brackets and you should be fine;-).
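The difference is easy to check on a line that contains neither phrase. The sketch below compares the bracketed pattern from the question with the alternation from the answers; any_phrase is a hypothetical helper, not from the thread, that builds the same alternation from a list.

```python
import re

broken = re.compile(r'[\btext String1\b] | [\bText String2\b]')
fixed = re.compile(r'\btext String1\b|\bText String2\b')

def any_phrase(phrases):
    """Hypothetical helper: match any of the phrases as whole words."""
    return re.compile('|'.join(r'\b' + re.escape(p) + r'\b' for p in phrases))

unrelated = "nothing relevant here"
relevant = "this line has Text String2 in it"

print(bool(broken.search(unrelated)))   # character classes match stray letters
print(bool(fixed.search(unrelated)))
print(bool(fixed.search(relevant)))
print(bool(any_phrase(["text String1", "Text String2"]).search(relevant)))
```

re.escape() keeps phrases that contain regex metacharacters from being misread as pattern syntax.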
