trying to search and replace "[" opening square bracket - python

I'm new to python, so please forgive me if this is a stupid question:
I have an XML file on which I'm carrying out a number of search-and-replace operations. One of the things I need to replace is a space followed by [ with just [ (no space before). I've tried the following but keep getting an error:
line = re.sub(' [','[',line)
and I assume it's because square brackets are used for wildcards, but I don't know what the syntax should be to get this to work.
Any help appreciated
Thanks

No need for regex here at all. This will do fine:
line = line.replace(' [', '[')

You need to escape the [ with a backslash:
line = re.sub(r' \[','[',line)

>>> line=' [stuff]'
>>> re.sub(r' \[','[',line)
'[stuff]'

Since you are using sub from the regular expression package, you need to escape the bracket, because it is used to express character classes in regular expressions (e.g. [a-z]):
line = re.sub(r' \[','[',line)
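For comparison, here is a minimal sketch of both approaches from the answers above, side by side. The function names are made up for illustration, and re.escape is used as an alternative to typing the backslash by hand:

```python
import re


def strip_space_before_bracket(line: str) -> str:
    """Plain string replacement; no regex is involved, so '[' is not special."""
    return line.replace(" [", "[")


def strip_space_before_bracket_re(line: str) -> str:
    """Regex version; re.escape() escapes '[' so it is matched literally."""
    pattern = re.escape(" [")
    return re.sub(pattern, "[", line)


def unescaped_pattern_fails() -> bool:
    """Show that the pattern from the question is rejected by the regex compiler."""
    try:
        re.compile(" [")  # '[' opens a character class that is never closed
    except re.error:
        return True
    return False


print(strip_space_before_bracket("value [1] and [2]"))
print(strip_space_before_bracket_re("value [1] and [2]"))
```

If the replacement really is a fixed string, the plain replace version is probably the simpler choice.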

Related

Find and print the number of non-alphanumeric characters in python

I'm doing a project and I'm having issues getting the re.findall to work properly. Here's the code this far (short and sweet):
pattern = ['^a-zA-Z0-9_']
results = re.findall(pattern, (str(lorem_ipsum))
print(len(results))
I'm getting a syntax error printing this way. Any help will be greatly appreciated. I'm crunched for time, and will be tweaking tomorrow when I have some more time.
You actually don't need any regex to do this. Simply use .isalnum():
text = "hello 23232#"
for character in text:
    if not character.isalnum():
        print("found: '{}'".format(character))
output:
found: ' '
found: '#'
You're missing a closing parenthesis after lorem_ipsum, and pattern must be a string, not a list. Making it a raw string (the r prefix) ensures that backslashes reach the regex engine literally rather than needing to be escaped. The underscore does not need a backslash inside the character class.
pattern = r'[^a-zA-Z0-9_]'
results = re.findall(pattern, (str(lorem_ipsum)))
print(len(results))
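A small sketch comparing the two answers, assuming (as in the question's pattern) that the underscore should not be counted. The function names are hypothetical. Note that str.isalnum() is Unicode-aware, so the two versions can disagree on non-ASCII letters:

```python
import re

# Anything that is not an ASCII letter, digit, or underscore.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9_]")


def count_non_alnum_regex(text: str) -> int:
    """Count matches of the negated character class."""
    return len(NON_ALNUM.findall(text))


def count_non_alnum_str(text: str) -> int:
    """Same count using str.isalnum(), treating '_' as allowed."""
    return sum(1 for ch in text if not (ch.isalnum() or ch == "_"))


sample = "hello 23232#"
print(count_non_alnum_regex(sample), count_non_alnum_str(sample))
```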

Simple regex in python

I'm trying to simply get everything after the colon in the following:
hfarnsworth:204b319de6f41bbfdbcb28da724dda23
And then everything before the space in the following:
29ca0a80180e9346295920344d64d1ce ::: 25basement
Here's what I have:
for line in f:
    line = line.rstrip() #to remove \n
    line = re.compile('.* ',line) #everything before space.
    print line
Any tips to point me in the correct direction? Thanks!
Also, is re.compile the correct function to use if I want the matched string returned? I'm pretty new at python too.
Thanks!!
string = "hfarnsworth:204b319de6f41bbfdbcb28da724dda23"
print(string.split(":", 1)[1])
string = "29ca0a80180e9346295920344d64d1ce ::: 25basement"
print(string.split(" ")[0])
First, take a careful look at the documentation for re.compile. Its second parameter is a set of flags, not the string to search. Try re.search or re.findall instead, e.g.:
>>> s = "29ca0a80180e9346295920344d64d1ce ::: 25basement"
>>> re.findall(r'(\S*) ', s)[0]
'29ca0a80180e9346295920344d64d1ce'
>>> re.search(r'(\S*) ', s).groups()
('29ca0a80180e9346295920344d64d1ce',)
BTW, this is not really a task for regular expressions. Consider using some simple string operations (like split).
this regular expression seems to work
r"^(?:[^:]*\:)?([^:]*)(?::::.*)?$"
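As a sketch of the "simple string operations" route, str.partition splits on the first occurrence of a separator and never raises if the separator is missing. The helper names below are invented for illustration:

```python
def after_first_colon(s: str) -> str:
    """Return everything after the first ':' (empty string if there is none)."""
    _, _, tail = s.partition(":")
    return tail


def before_first_space(s: str) -> str:
    """Return everything before the first space (the whole string if there is none)."""
    head, _, _ = s.partition(" ")
    return head


print(after_first_colon("hfarnsworth:204b319de6f41bbfdbcb28da724dda23"))
print(before_first_space("29ca0a80180e9346295920344d64d1ce ::: 25basement"))
```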

dealing with \n characters at end of multiline string in python

I have been using python with regex to clean up a text file. I have been using the following method and it has generally been working:
mystring = compiledRegex.sub("replacement",mystring)
The string in question is an entire text file that includes many embedded newlines. Some of the compiled regex's cover multiple lines using the re.DOTALL option. If the last character in the compiled regex is a \n the above command will substitute all matches of the regex except the match that ends with the final newline at the end of the string. In fact, I have had several other no doubt related problems dealing with newlines and multiple newlines when they appear at the very end of the string. Can anyone give me a pointer as to what is going on here? Thanks in advance.
If I understood you correctly, all you need is the text without the newline at the end of each line, so that you can then iterate over it to find a required word. In that case you can try the following:
data = (line for line in text.split('\n') if line.strip())  # yields all non-empty lines without '\n' at the end
Now you can search or replace any text you need, line by line, using string methods or regex functionality.
Or you can use replace to replace every '\n' with whatever you want:
text = text.replace('\n', '')
My bet is that your file does not end with a newline...
>>> content = open('foo').read()
>>> print content
TOTAL:.?C2
abcTOTAL:AC2
defTOTAL:C2
>>> content
'TOTAL:.?C2\nabcTOTAL:AC2\ndefTOTAL:C2'
...so the last line does not match the regex:
>>> regex = re.compile('TOTAL:.*?C2\n', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefTOTAL:C2'
If that is the case, the solution is simple: just match either a newline or the end of the file (with $):
>>> regex = re.compile('TOTAL:.*?C2(\n|$)', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefXXX'
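The same session as a runnable sketch, using the sample content from the transcript above. One small change that is mine, not the answer's: the lenient pattern uses \Z (end of string only) instead of $, since $ can also match just before a trailing newline:

```python
import re

# Same sample as in the transcript: the last line has no trailing newline.
content = "TOTAL:.?C2\nabcTOTAL:AC2\ndefTOTAL:C2"

# Requires a newline after C2, so the final record is skipped.
strict = re.compile(r"TOTAL:.*?C2\n", re.DOTALL)

# Accepts either a newline or the very end of the string after C2.
lenient = re.compile(r"TOTAL:.*?C2(?:\n|\Z)", re.DOTALL)

strict_result = strict.sub("XXX", content)
lenient_result = lenient.sub("XXX", content)

print(repr(strict_result))
print(repr(lenient_result))
```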
I can't get a good handle on what is going on from your explanation, but you may be able to fix it by replacing all multiple newlines with a single newline as you read in the file. Another option might be to just remove the \n at the end of the regex, unless you need it for something.
Is the question mark there to prevent the regex from matching more than one line at a time? If so, then you probably want to use the MULTILINE flag instead of the DOTALL flag. With MULTILINE, ^ matches at the beginning of the string and just after each newline, and $ matches at the end of the string and just before each newline.
e.g.
regex = re.compile('^TOTAL:.*$', re.MULTILINE)
content = regex.sub('', content)
However, this still leaves the problem of empty lines. But why not just run one additional regex at the end that removes blank lines?
regex = re.compile('\n{2,}')
content = regex.sub('\n', content)

Python: pattern matching for a string

I'm trying to check a file line by line for any_string=any_string. It must be in that format, with no spaces or anything else. The line must contain a string, then a "=", and then another string and nothing else. Could someone help me with the syntax in python to find this please? =]
pattern='*\S\=\S*'
I have this, but I'm pretty sure it's wrong haha.
Don't know if you are looking for lines with the same value on both = sides. If so then use:
the_same_re = re.compile(r'^(\S+)=(\1)$')
if values can differ then use
the_same_re = re.compile(r'^(\S+)=(\S+)$')
In these regexes:
^ is the beginning of line
$ is the end of line
\S+ is one or more non space character
\1 is first group
r before regex string means "raw" string so you need not escape backslashes in string.
pattern = r'\S+=\S+'
If you want to be able to grab the left and right-hand sides, you could add capture groups:
pattern = r'(\S+)=(\S+)'
If you don't want to allow multiple equals signs in the line (which would do weird things), you could use this:
pattern = r'[^\s=]+=[^\s=]+'
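Since the question asks for lines that contain this and nothing else, one way to use the last pattern is with fullmatch, which only succeeds if the whole string matches. This is a sketch; the names KEY_VALUE and parse_key_value are made up here:

```python
import re

# One or more characters that are neither whitespace nor '=', on each side of a single '='.
KEY_VALUE = re.compile(r"([^\s=]+)=([^\s=]+)")


def parse_key_value(line: str):
    """Return (left, right) if the whole line is left=right, otherwise None."""
    match = KEY_VALUE.fullmatch(line.rstrip("\n"))  # drop the newline kept by file iteration
    if match is None:
        return None
    return match.group(1), match.group(2)


for candidate in ["name=value", "name = value", "a=b=c", "=value"]:
    print(repr(candidate), parse_key_value(candidate))
```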
I don't know what task you want to use this pattern for. Maybe you want to parse a configuration file.
If so, you can use the ConfigParser module (named configparser in Python 3).
Ok, so you want to find anystring=anystring and nothing else. Then there's no need for a regex.
>>> s="anystring=anystring"
>>> sp=s.split("=")
>>> if len(sp)==2:
...     print("ok")
...
ok
Since Python 2.5 I prefer this to split. If you don't like spaces, just check for them.
left, _, right = any_string.partition("=")
if right and " " not in any_string:
    pass  # proceed
Also it never hurts to learn regular expressions.

What's a quick one-liner to remove empty lines from a python string?

I have some code in a python string that contains extraneous empty lines. I would like to remove all empty lines from the string. What's the most pythonic way to do this?
Note: I'm not looking for a general code re-formatter, just a quick one or two-liner.
Thanks!
How about:
text = os.linesep.join([s for s in text.splitlines() if s])
where text is the string with the possible extraneous lines?
"\n".join([s for s in code.split("\n") if s])
Edit2:
text = "".join([s for s in code.splitlines(True) if s.strip("\r\n")])
I think that's my final version. It should work well even with code mixing line endings. I don't think that line with spaces should be considered empty, but if so then simple s.strip() will do instead.
LESSON ON REMOVING NEWLINES AND EMPTY LINES WITH SPACES
"t" is the variable holding the text. You will also see an "s" variable; it's a temporary variable that only exists while the list comprehension (the expression inside the square brackets) is being evaluated.
First let's set the "t" variable so it has newlines:
>>> t='hi there here is\na big line\n\nof empty\nline\neven some with spaces\n \nlike that\n\n \nokay now what?\n'
Note there is another way to set the variable, using triple quotes:
somevar="""
asdfas
asdf
asdf
asdf
asdf
"""
Here is how it looks when we view it without "print":
>>> t
'hi there here is\na big line\n\nof empty\nline\neven some with spaces\n \nlike that\n\n \nokay now what?\n'
To see with actual newlines, print it.
>>> print t
hi there here is
a big line

of empty
line
even some with spaces
 
like that

 
okay now what?
COMMAND TO REMOVE ALL BLANK LINES (INCLUDING ONES WITH SPACES):
Some blank-looking lines are just newlines, and some contain spaces, so they only look empty.
If you want to get rid of all blank-looking lines (whether they contain just a newline, or spaces as well):
>>> print "".join([s for s in t.strip().splitlines(True) if s.strip()])
hi there here is
a big line
of empty
line
even some with spaces
like that
okay now what?
OR:
>>> print "".join([s for s in t.strip().splitlines(True) if s.strip("\r\n").strip()])
hi there here is
a big line
of empty
line
even some with spaces
like that
okay now what?
NOTE: the strip in t.strip().splitlines(True) can be removed so it's just t.splitlines(True), but then your output can end with an extra newline (that strip is what removes the final newline). The strip() in the last part, s.strip("\r\n").strip() or s.strip(), is what actually filters out the empty lines and the lines containing only spaces.
COMMAND TO REMOVE ALL BLANK LINES (BUT NOT ONES WITH SPACES):
Technically lines with spaces should NOT be considered empty, but it all depends on the use case and what you're trying to achieve.
>>> print "".join([s for s in t.strip().splitlines(True) if s.strip("\r\n")])
hi there here is
a big line
of empty
line
even some with spaces
 
like that
 
okay now what?
** NOTE ABOUT THAT MIDDLE strip **
That middle strip, the one attached to the "t" variable, just removes the last newline (as the previous note stated). Here is how it looks without that strip (notice the last newline).
With the 1st example (removing empty lines and lines with only spaces):
>>> print "".join([s for s in t.splitlines(True) if s.strip("\r\n").strip()])
hi there here is
a big line
of empty
line
even some with spaces
like that
okay now what?
.Without the strip, an extra newline appears here (Stack Overflow won't let me format it in).
With the 2nd example (removing empty lines only):
>>> print "".join([s for s in t.splitlines(True) if s.strip("\r\n")])
hi there here is
a big line
of empty
line
even some with spaces
 
like that
 
okay now what?
.Without the strip, an extra newline appears here (Stack Overflow won't let me format it in).
The END!
filter(None, code.splitlines())
filter(str.strip, code.splitlines())
are equivalent to
[s for s in code.splitlines() if s]
[s for s in code.splitlines() if s.strip()]
and might be useful for readability. In Python 3, filter returns an iterator, so wrap the call in list() if you need a list.
By using re.sub function
re.sub(r'^$\n', '', s, flags=re.MULTILINE)
Here is a one line solution:
print("".join([s for s in mystr.splitlines(True) if s.strip()]))
This code removes empty lines (with or without whitespaces).
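To make the two behaviours discussed in this thread easy to compare, here is a sketch of a helper with a switch for whitespace-only lines. The function name and the keep_whitespace_only parameter are invented for this example; it joins with "\n", so mixed line endings are normalised rather than preserved:

```python
def remove_blank_lines(text: str, keep_whitespace_only: bool = False) -> str:
    """Drop empty lines; by default also drop lines that contain only whitespace."""
    if keep_whitespace_only:
        kept = [line for line in text.splitlines() if line]
    else:
        kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


t = ('hi there here is\na big line\n\nof empty\nline\n'
     'even some with spaces\n \nlike that\n\n \nokay now what?\n')

print(remove_blank_lines(t))
print(remove_blank_lines(t, keep_whitespace_only=True))
```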
import re
re.sub(r'\n\s*\n', '\n', text, flags=re.MULTILINE)
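IMHO shortest and most Pythonic would be:
str(textWithEmptyLines).replace('\n\n','\n')
Note that this collapses each pair of newlines only once, so a run of three or more newlines still leaves an empty line behind.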
This one will remove lines of spaces too.
re.sub(r'(?imu)^\s*\n', '', code)
using regex
re.sub(r'^$\n', '', somestring, flags=re.MULTILINE)
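A variation on the regex above, as a sketch: allowing spaces and tabs before the newline also removes whitespace-only lines, and an optional \r covers Windows line endings. The function name is hypothetical. A final whitespace-only line with no newline after it is left alone by this pattern:

```python
import re

# At the start of any line: optional spaces/tabs, then the line ending itself.
BLANK_LINE = re.compile(r"^[ \t]*\r?\n", re.MULTILINE)


def drop_blank_lines_re(text: str) -> str:
    """Remove empty and whitespace-only lines, keeping the other line endings as they are."""
    return BLANK_LINE.sub("", text)


print(repr(drop_blank_lines_re("a\n\n \nb\n")))
print(repr(drop_blank_lines_re("a\r\n\r\nb")))
```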
And now for something completely different:
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string, re
>>> tidy = lambda s: string.join(filter(string.strip, re.split(r'[\r\n]+', s)), '\n')
>>> tidy('\r\n \n\ra\n\n b \r\rc\n\n')
'a\012 b \012c'
Episode 2:
This one doesn't work on 1.5 :-(
BUT not only does it handle universal newlines and blank lines, it also removes trailing whitespace (good idea when tidying up code lines IMHO) AND does a repair job if the last meaningful line is not terminated.
import re
tidy = lambda c: re.sub(
    r'(^\s*[\r\n]+|^\s*\Z)|(\s*\Z|\s*[\r\n]+)',
    lambda m: '\n' if m.lastindex == 2 else '',
    c)
expanding on ymv's answer, you can use filter with join to get desired string,
"".join(filter(str.strip, sample_string.splitlines(True)))
I wanted to remove a bunch of empty lines and what worked for me was:
if len(line) > 2:
    myfile.write(output)
I went with 2 since that covered the \r\n.
I did want a few empty rows just to make my formatting look better so in those cases I had to use:
print(" \n")
