python: how to remove '$'?

All I want to do is remove the dollar sign '$'. This seems simple, but I really don't know why my code isn't working.
import re
input = '$5'
if '$' in input:
    input = re.sub(re.compile('$'), '', input)
print input
input is still '$5' instead of just '5'! Can anyone help?

Try using replace instead:
input = input.replace('$', '')
As Madbreaks has stated, $ means match the end of the line in a regular expression.
Here is a handy link to regular expressions: http://docs.python.org/2/library/re.html

In this case, I'd use str.translate (this two-argument form is Python 2 syntax):
>>> '$$foo$$'.translate(None,'$')
'foo'
And for benchmarking purposes:
>>> def repl(s):
...     return s.replace('$','')
...
>>> def trans(s):
...     return s.translate(None,'$')
...
>>> import timeit
>>> s = '$$foo bar baz $ qux'
>>> print timeit.timeit('repl(s)','from __main__ import repl,s')
0.969965934753
>>> print timeit.timeit('trans(s)','from __main__ import trans,s')
0.796354055405
There are a number of differences between str.replace and str.translate. The most notable is that str.translate maps single characters to other characters (or deletes them), whereas str.replace replaces one substring with another. So, for problems like "I want to delete all characters a, b, c" or "I want to change a to d", I suggest str.translate. Conversely, problems like "I want to replace the substring abc with def" are well suited for str.replace.
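The two-argument translate(None, '$') call above only works on Python 2 strings. In Python 3, str.translate takes a single mapping table, which str.maketrans can build; the third argument of maketrans lists characters to delete. A small sketch of both the "delete characters" and the "change a to d" cases (the table names are just illustrative):

```python
# Python 3: str.translate takes one table, built here with str.maketrans.
# The third argument of maketrans lists characters to delete.
delete_dollars = str.maketrans('', '', '$')
print('$$foo$$'.translate(delete_dollars))  # foo

# Map 'a' to 'd' and delete 'b' and 'c' in a single pass.
a_to_d_drop_bc = str.maketrans('a', 'd', 'bc')
print('abcabc'.translate(a_to_d_drop_bc))  # dd
```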
Note that your example doesn't work because $ has special meaning in regex (it matches at the end of a string). To get it to work with regex you need to escape the $:
>>> re.sub(r'\$','',s)
'foo bar baz  qux'
works OK.
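If you would rather not escape special characters by hand, re.escape does it for you. A small sketch; the helper name remove_literal is hypothetical:

```python
import re

def remove_literal(text, literal):
    """Remove every occurrence of `literal`, treated as plain text, not a regex."""
    # re.escape backslash-escapes regex metacharacters such as $, so '$'
    # matches a literal dollar sign instead of the end of the string.
    return re.sub(re.escape(literal), '', text)

print(remove_literal('$5', '$'))                   # 5
print(remove_literal('$$foo bar baz $ qux', '$'))  # foo bar baz  qux
```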

$ is a special character in regular expressions that means 'end of the string'.
You need to escape it if you want to match it literally.
Try this:
import re
input = "$5"
if "$" in input:
    input = re.sub(re.compile(r'\$'), '', input)
print input

You need to escape the dollar sign; otherwise Python's re module treats it as an anchor: http://docs.python.org/2/library/re.html
import re
fred = "$hdkhsd%$"
print re.sub(r"\$", "!", fred)
# prints: !hdkhsd%!

Aside from the other answers, you can also use strip():
input = input.strip('$')
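Note that strip() only removes the character from the ends of the string, so it behaves differently from replace() when a $ appears in the middle. A quick comparison on an illustrative string:

```python
s = '$5$5$'
# strip() removes leading and trailing '$' characters only.
print(s.strip('$'))        # 5$5
# lstrip() removes leading ones only, which fits a single prefix like '$5'.
print(s.lstrip('$'))       # 5$5$
# replace() removes every '$', wherever it occurs.
print(s.replace('$', ''))  # 55
```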

Related

how to remove tokens that contains number followed by character using regular expression in python?

As the question says, how do I replace a token like '23abc' with '' using a regular expression in Python? It shouldn't affect purely alphabetic tokens like 'hello', 'jimmy', 'trip', 'travel', etc.
my code:
import re
str="23abcd"
print re.sub(r"[0-9a-z]","",str)
But the code doesn't work when a string like 'hello' is passed: it still gets replaced with ''. Please help. Thanks.
Try this pattern:
re.sub(r"[0-9]+[a-z]+","",str)
It should be:
>>> import re
>>> pattern="23abcd"
>>> _str = "a mlmsm 23abcd smo jimmy"
>>> re.sub(pattern, "", _str)
'a mlmsm  smo jimmy'
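Without anchoring, [0-9]+[a-z]+ also matches the digit-letter tail inside longer tokens (for example the '2abc' in 'x2abc'). A sketch that removes only whole tokens made of digits followed by lowercase letters, assuming tokens are separated by whitespace; it uses \b word boundaries and collapses the double space left behind. The function name is hypothetical:

```python
import re

def drop_digit_letter_tokens(text):
    # \b on both sides means the whole token must be digits followed by
    # lowercase letters; 'hello' and 'x2abc' are left alone.
    cleaned = re.sub(r'\b[0-9]+[a-z]+\b', '', text)
    # Removing a token leaves two spaces behind; collapse runs of whitespace.
    return ' '.join(cleaned.split())

print(drop_digit_letter_tokens('a mlmsm 23abcd smo jimmy'))  # a mlmsm smo jimmy
print(drop_digit_letter_tokens('hello x2abc 7up trip'))      # hello x2abc trip
```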

How to read access-log hosts with regex?

I have such entries:
e179206120.adsl.alicedsl.de
safecamp-plus-2098.unibw-hamburg.de
p5B30EBFE.dip0.t-ipconnect.de
and I would like to match only the main domain names like
alicedsl.de
unibw-hamburg.de
t-ipconnect.de
I tried this \.\w+\.\w+\.\w{2,3} but that matches .adsl.alicedsl.de
How about [^.]+\.\w+$
Or, in Python:
import re
tgt='''\
e179206120.adsl.alicedsl.de
safecamp-plus-2098.unibw-hamburg.de
p5B30EBFE.dip0.t-ipconnect.de'''
print re.findall(r'([^.]+\.\w+$)', tgt, re.M | re.S)
# ['alicedsl.de', 'unibw-hamburg.de', 't-ipconnect.de']
Regex explanation:
[^.]+ 1 or more characters EXCEPT a literal .
\. a literal . (the \ is needed because an unescaped . matches any character)
\w+ 1 or more characters in the ranges [a-z], [A-Z], [0-9] and _. Potentially a better regex for TLDs in ASCII is [a-zA-Z]+, since there aren't any old TLDs that are not ASCII. If you want to handle newer internationalized TLDs, you need a different regex.
$ assertion for the end of the line
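The re.M flag is what makes $ work per line here: without it, $ only matches at the very end of the whole string, so only the last host is found. (re.S only changes how a bare . behaves, and this pattern has none, so it is not needed.) A quick check:

```python
import re

tgt = '''\
e179206120.adsl.alicedsl.de
safecamp-plus-2098.unibw-hamburg.de
p5B30EBFE.dip0.t-ipconnect.de'''

# Without re.M, $ anchors only at the end of the entire string.
without_m = re.findall(r'[^.]+\.\w+$', tgt)
print(without_m)  # ['t-ipconnect.de']
# With re.M, $ also matches just before each newline, so every line counts.
with_m = re.findall(r'[^.]+\.\w+$', tgt, re.M)
print(with_m)     # ['alicedsl.de', 'unibw-hamburg.de', 't-ipconnect.de']
```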
You should know that your definition of TLDs is incomplete. For example, this regex approach will break on a legitimate URL such as bbc.co.uk and many others that include a common SLD. Use a library if you can for more general applicability. You can also use Mozilla's Public Suffix List of TLDs and SLDs to know when it is appropriate to include two periods in the definition of a host.
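To see that limitation concretely: the pattern keeps only the last two labels, so a host under a multi-part suffix such as co.uk comes out wrong. A quick check with the same regex (the hostname is just an illustrative example):

```python
import re

pattern = re.compile(r'[^.]+\.\w+$')
result = pattern.findall('news.bbc.co.uk')
print(result)  # ['co.uk'], not the intended 'bbc.co.uk'
```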
You could use the following with your given data.
[^.]+\.[^.]+$
If you don't have restrictions on using external libraries, check out the tldextract library:
https://pypi.python.org/pypi/tldextract
import tldextract
for input in ["e179206120.adsl.alicedsl.de", "safecamp-plus-2098.unibw-hamburg.de", "p5B30EBFE.dip0.t-ipconnect.de"]:
    input_tld = tldextract.extract(input)
    print input_tld.domain + "." + input_tld.suffix
You actually do not need Regex for this. A list comprehension will be far more efficient:
>>> mystr = """
... e179206120.adsl.alicedsl.de
... safecamp-plus-2098.unibw-hamburg.de
... p5B30EBFE.dip0.t-ipconnect.de
... """
>>> [".".join(line.rsplit(".", 2)[-2:]) for line in mystr.splitlines() if line]
['alicedsl.de', 'unibw-hamburg.de', 't-ipconnect.de']
>>>
Also, if you want a reference, Python's documentation on string methods explains str.splitlines, str.rsplit, and str.join.
A speed test with timeit.timeit shows the list comprehension running roughly four times faster than the regex here:
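Step by step for a single line: rsplit('.', 2) splits at most twice, starting from the right, so the last two labels always land in the final two slots, and [-2:] picks them out:

```python
host = 'e179206120.adsl.alicedsl.de'

parts = host.rsplit('.', 2)  # split from the right, at most 2 times
print(parts)                 # ['e179206120.adsl', 'alicedsl', 'de']
domain = '.'.join(parts[-2:])
print(domain)                # alicedsl.de
```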
>>> from timeit import timeit
>>> mystr = """
... e179206120.adsl.alicedsl.de
... safecamp-plus-2098.unibw-hamburg.de
... p5B30EBFE.dip0.t-ipconnect.de
... """
>>> def func():
...     import re
...     re.findall(r'([^.]+\.\w+$)', mystr, re.M | re.S)
...
>>> timeit("func()", "from __main__ import func") # Regex's time
51.85605544838802
>>> def func():
...     [".".join(line.rsplit(".", 2)[-2:]) for line in mystr.splitlines() if line]
...
>>> timeit("func()", "from __main__ import func") # List comp.'s time
12.113929004943316
>>>

Using regular expression to extract string

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the dot needs to be returned. The following works as expected.
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases, for example:
mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How do I use regular expressions in Python for this?
No need for regex... Just use str.split
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'
If you wanted to use regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub("ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print result
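If you want the four octets themselves rather than a rewritten string, re.match with the same kind of groups returns them directly. A sketch that assumes the ec2-A-B-C-D prefix shown in the question and returns None for anything else; the helper name is hypothetical:

```python
import re

EC2_HOST = re.compile(r'ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.')

def ip_from_ec2_hostname(hostname):
    match = EC2_HOST.match(hostname)
    if match is None:
        return None  # not an ec2-A-B-C-D... hostname
    # groups() returns the four captured octets as strings.
    return '.'.join(match.groups())

print(ip_from_ec2_hostname('ec2-54-196-170-182.compute-1.amazonaws.com'))  # 54.196.170.182
```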
The regex ^[^.]+ matches everything from the start of the string up to (but not including) the first dot.
Try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall(r'^[^.]+', string)[0]
print ip
Output:
ec2-54-196-170-182
The nice thing is that this still matches if the prefix is ec3 (or anything else) rather than ec2, so this regex behaves very much like mgilson's str.split answer.

regex and replace on string using python

I am rather new to Python Regex (regex in general) and I have been encountering a problem. So, I have a few strings like so:
str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
str2 = r'''/bkyhi/oiukj/game/?mytag=a_17014b_82c&'''
str3 = r'''lkjsd/image/game/mytag=a_17014b_82c$'''
the & and the $ could be any symbol.
I would like to have a single regex (and replace) which replaces:
mytag=a_17014b_82c
to:
mytag=myvalue
from any of the above 3 strings. Would appreciate any guidance on how I can achieve this.
UPDATE: the string to be replaced is not always the same, so a_17014b_82c could be anything in reality.
If the string to be replaced is constant you don't need a regex. Simply use replace:
>>> str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
>>> str1.replace('a_17014b_82c','myvalue')
'hfo/gfbi/mytag=myvalue'
Use re.sub:
>>> import re
>>> r = re.compile(r'(mytag=)(\w+)')
>>> r.sub(r'\1myvalue', str1)
'hfo/gfbi/mytag=myvalue'
>>> r.sub(r'\1myvalue', str2)
'/bkyhi/oiukj/game/?mytag=myvalue&'
>>> r.sub(r'\1myvalue', str3)
'lkjsd/image/game/mytag=myvalue$'
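A variant that needs no group reference in the replacement uses a lookbehind, (?<=mytag=), so only the value itself is matched and replaced. Like the answer above, this assumes the value consists of word characters (\w):

```python
import re

strings = [
    r'''hfo/gfbi/mytag=a_17014b_82c''',
    r'''/bkyhi/oiukj/game/?mytag=a_17014b_82c&''',
    r'''lkjsd/image/game/mytag=a_17014b_82c$''',
]

# (?<=mytag=) requires 'mytag=' right before the match without consuming it,
# so the replacement string is just the new value.
value_after_tag = re.compile(r'(?<=mytag=)\w+')
results = [value_after_tag.sub('myvalue', s) for s in strings]
for line in results:
    print(line)
```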
import re
r = re.compile(r'(mytag=)\w+(?=\W?$)')
r.sub(r'\1myvalue', str1)
This is based on Ashwini's answer, with two small changes. First, it requires the mytag=a_17014b_82c part to be at the end of the input (optionally followed by one trailing symbol such as & or $), so that even inputs such as
str1 = r'''/bkyhi/mytag=blah/game/?mytag=a_17014b_82c&'''
work fine, substituting the last mytag instead of the first.
Second, it does not needlessly capture the \w+, since the replacement never uses it. This is just for a bit of code clarity.

Python - Use a Regex to Filter Data

Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:
>> key = "cd baz ; ls -l"
=> "cd baz ; ls -l"
>> newkey = key.gsub(/[^\w\d]/, "")
=> "cdbazlsl"
What would the equivalent function be in Python?
import re
re.sub(pattern, '', s)
The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for
a simple way to remove all characters from a given string that fail to match
For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:
>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'
Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands).
I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)
import re
old = "cd baz ; ls -l"
regex = r"[^\w\d]" # which is the same as \W btw
pat = re.compile( regex )
new = pat.sub('', old )
re.subn() is your friend:
>>> import re
>>> key = "cd baz ; ls -l"
>>> re.subn(r'\W', "", key)
('cdbazlsl', 6)
>>> re.subn(r'\W', "", key)[0]
'cdbazlsl'
Returns a tuple. Take the first element if you only want the resulting string. Or just call re.sub(), as SilentGhost notes. (Which is to say, his answer is more exact.)
Maybe the shortest way:
In [32]: pattern='[-0-9.]'
....: price_str="¥-607.6B"
....: ''.join(re.findall(pattern,price_str))
Out[32]: '-607.6'
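One caveat: the character-class approach keeps every '-', digit and '.', even scattered ones. If the string holds a single number, re.search with a number-shaped pattern is a stricter alternative; the pattern below (optional minus sign, digits, optional decimal part) is an assumption about the input format:

```python
import re

price_str = "¥-607.6B"
# Optional leading minus, one or more digits, optional fractional part.
match = re.search(r'-?\d+(?:\.\d+)?', price_str)
print(match.group() if match else None)  # -607.6
```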
