Replace ",**" with a linebreak using RegEx (or something else) - python

I'm getting started with RegEx and I was wondering if anyone could help me craft a statement to convert coordinates as follows:
145.00694,-37.80421,9 145.00686,-37.80382,9 145.00595,-37.8035,16 145.00586,-37.80301,16
to
145.00694,-37.80421
145.00686,-37.80382
145.00595,-37.8035
145.00586,-37.80301
(Strip off the last comma and value and turn it into a line break.)
I can't figure out how to use wildcards to do something like that. Any help would be greatly appreciated! Thanks.

"Some people, when confronted with a
problem, think 'I know, I'll use
regular expressions.' Now they have
two problems." --Jamie Zawinski
Avoid that problem and use string methods:
s="145.00694,-37.80421,9 145.00686,-37.80382,9 145.00595,-37.8035,16 145.00586,37.80301,16"
lines = s.split(' ') # each line is separated by ' '
for line in lines:
a,b,c=line.split(',') # three parts, separated by ','
print a,b
Regex have their uses, but this is not one of them.

>>> import re
>>> s="145.00694,-37.80421,9 145.00686,-37.80382,9 145.00595,-37.8035,16 145.00586,-37.80301,16"
>>> print re.sub(",\d*\w","\n",s)
145.00694,-37.80421
145.00686,-37.80382
145.00595,-37.8035
145.00586,-37.80301

String methods seem to suffice here, regex are overkill:
>>> s='145.00694,-37.80421,9 145.00686,-37.80382,9 145.00595,-37.8035,16 145.00586,-37.80301,16'
>>> print('\n'.join(line.rpartition(',')[0] for line in s.split()))
145.00694,-37.80421
145.00686,-37.80382
145.00595,-37.8035
145.00586,-37.80301

>>> s = '145.00694,37.80421,9 145.00686,-37.80382,9 145.00595,-37.8035,16 145.00586,-37.80301,16
>>> patt = '(%s,%s),%s' % (('[+-]?\d+\.?\d*', )*3)
>>> m = re.findall(patt, s)
>>> m
['145.00694,37.80421', '145.00686,-37.80382', '145.00595,-37.8035', '145.00586,-37.80301']
>>> print '\n'.join(m)
145.00694,37.80421
145.00686,-37.80382
145.00595,-37.8035
145.00586,-37.80301
but I prefer not use regular expressions in this case
I like SilentGhost solution

Related

python 3 remove before and after on string

I have this string /1B5DB40?full and I want to convert it to 1B5DB40.
I need to remove the ?full and the front /
My site won't always have ?full at the end so I need something that will still work even if the ?full is not there.
Thanks and hopefully this isn't too confusing to get some help :)
EDIT:
I know I could slice at 0 and 8 or whatever, but the 1B5DB40 could be longer or shorter. For example it could be /1B5DB4000?full or /1B5
Using str.lstrip (to remove leading /) and str.split (to remove optinal part after ?):
>>> '/1B5DB40?full'.lstrip('/').split('?')[0]
'1B5DB40'
>>> '/1B5DB40'.lstrip('/').split('?')[0]
'1B5DB40'
or using urllib.parse.urlparse:
>>> import urllib.parse
>>> urllib.parse.urlparse('/1B5DB40?full').path.lstrip('/')
'1B5DB40'
>>> urllib.parse.urlparse('/1B5DB40').path.lstrip('/')
'1B5DB40'
You can use lstrip and rstrip:
>>> data.lstrip('/').rstrip('?full')
'1B5DB40'
This only works as long as you don't have the characters f, u, l, ?, / in the part that you want to extract.
You can use regular expressions:
>>> import re
>>> extract = re.compile('/?(.*?)\?full')
>>> print extract.search('/1B5DB40?full').group(1)
1B5DB40
>>> print extract.search('/1Buuuuu?full').group(1)
1Buuuuu
What about regular expressions?
import re
re.search(r'/(?P<your_site>[^\?]+)', '/1B5DB40?full').group('your_site')
In this case it matches everything that is between '/' and '?', but you can change it to your specific requirements
>>> '/1B5DB40?full'split('/')[1].split('?')[0]
'1B5DB40'
>>> '/1B5'split('/')[1].split('?')[0]
'1B5'
>>> '/1B5DB40000?full'split('/')[1].split('?')[0]
'1B5DB40000'
Split will simply return a single element list containing the original string if the separator is not found.

Complex regex in Python

I am trying to write a generic pattern using regex so that it fetches only particular things from the string. Let's say we have strings like GigabitEthernet0/0/0/0 or FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the numerals. Therefore, the fetched result should be something like Gi0000 or Fa04 or Et00222 depending on the above cases.
x = 'GigabitEthernet0/0/0/2
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I am not able to understand how shall I write the regular expression. The values can be fetched in the form of a list also. I write few more patterns but it isn't helping.
In regex, you may use re.findall function.
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'
z="Ethernet0/0.222."
print z[:2]+"".join(re.findall(r"(\d+)(?=[\d\W]*$)",z))
You can try this.This will make sure only digits from end come into play .
Here is another option:
s = 'Ethernet0/0.222'
"".join(re.findall('^\w{2}|[\d]+', s))

In Python how to strip dollar signs and commas from dollar related fields only

I'm reading in a large text file with lots of columns, dollar related and not, and I'm trying to figure out how to strip the dollar fields ONLY of $ and , characters.
so say I have:
a|b|c
$1,000|hi,you|$45.43
$300.03|$MS2|$55,000
where a and c are dollar-fields and b is not.
The output needs to be:
a|b|c
1000|hi,you|45.43
300.03|$MS2|55000
I was thinking that regex would be the way to go, but I can't figure out how to express the replacement:
f=open('sample1_fixed.txt','wb')
for line in open('sample1.txt', 'rb'):
new_line = re.sub(r'(\$\d+([,\.]\d+)?k?)',????, line)
f.write(new_line)
f.close()
Anyone have an idea?
Thanks in advance.
Unless you are really tied to the idea of using a regex, I would suggest doing something simple, straight-forward, and generally easy to read:
def convert_money(inval):
if inval[0] == '$':
test_val = inval[1:].replace(",", "")
try:
_ = float(test_val)
except:
pass
else:
inval = test_val
return inval
def convert_string(s):
return "|".join(map(convert_money, s.split("|")))
a = '$1,000|hi,you|$45.43'
b = '$300.03|$MS2|$55,000'
print convert_string(a)
print convert_string(b)
OUTPUT
1000|hi,you|45.43
300.03|$MS2|55000
A simple approach:
>>> import re
>>> exp = '\$\d+(,|\.)?\d+'
>>> s = '$1,000|hi,you|$45.43'
>>> '|'.join(i.translate(None, '$,') if re.match(exp, i) else i for i in s.split('|'))
'1000|hi,you|45.43'
It sounds like you are addressing the entire line of text at once. I think your first task would be to break up your string by columns into an array or some other variables. Once you've don that, your solution for converting strings of currency into numbers doesn't have to worry about the other fields.
Once you've done that, I think there is probably an easier way to do this task than with regular expressions. You could start with this SO question.
If you really want to use regex though, then this pattern should work for you:
\[$,]\g
Demo on regex101
Replace matches with empty strings. The pattern gets a little more complicated if you have other kinds of currency present.
I Try this regex take if necessary.
\$(\d+)[\,]*([\.]*\d*)
SEE DEMO : http://regex101.com/r/wM0zB6/2
Use the regexx
((?<=\d),(?=\d))|(\$(?=\d))
eg
import re
>>> x="$1,000|hi,you|$45.43"
re.sub( r'((?<=\d),(?=\d))|(\$(?=\d))', r'', x)
'1000|hi,you|45.43'
Try the below regex and then replace the matched strings with \1\2\3
\$(\d+(?:\.\d+)?)(?:(?:,(\d{2}))*(?:,(\d{3})))?
DEMO
Defining a black list and checking if the characters are in it, is an easy way to do this:
blacklist = ("$", ",") # define characters to remove
with open('sample1_fixed.txt','wb') as f:
for line in open('sample1.txt', 'rb'):
clean_line = "".join(c for c in line if c not in blacklist)
f.write(clean_line)
\$(?=(?:[^|]+,)|(?:[^|]+\.))
Try this.Replace with empty string.Use re.M option.See demo.
http://regex101.com/r/gT6kI4/6

Using regular expression to extract string

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the dot needs to be returned. The following works as expected.
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases. For e.g.
mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How to I use regular expressions in python?
No need for regex... Just use str.split
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'
If you wanted to use regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub("ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print result
This regex will match (^[^.]+:
So Try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall('^[^.]+',string)[0]
print ip
Output:
ec2-54-196-170-182
Best thing is this will match even if the instance was ec2,ec3 so this regex is actually very much similar to the code of #mgilson

Python regular words cut

I have string: './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
I need string: '27-10-2011 17:07:02'
How can i do this in python?
There are many ways to do this, one way is to use str.partition:
text='./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
before,_,after = text.partition('[')
print(after[:-1])
# 27-10-2011 17:07:02
Another is to use str.split:
before,after = text.split('[',1)
print(after[:-1])
# 27-10-2011 17:07:02
or str.find and str.rfind:
ind1 = text.find('[')+1
ind2 = text.rfind(']')
print(text[ind1:ind2])
All these methods rely on the desired substring immediately following the first left-bracket [.
The first two methods also rely on the desired substring ending at the next-to-last character in text. The last method (using rfind) searches from the right for the index of the right-bracket, so it is a little more general, and does not depend on quite so many (potential off-by-one) constants.
If your string has always the same structure this is probably the simplest solution:
s = r'./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
s[s.find("[")+1:s.find("]")]
Update:
After seeing some of the other answers this is a slight improvement:
s[s.find("[")+1:-1]
Exploiting the fact that the closing square bracket is the last character in your string.
If the format is "fixed", you can also use this
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[-20:-1:]
'27-10-2011 17:07:02'
>>>
You can also use regular expression:
import re
s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
print re.search(r'\[(.*?)\]', s).group(1)
Try with a regex :
import re
re.findall(".*\[(.*)\]", './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]')
>>> ['27-10-2011 17:07:02']
Probably the easiest way(if you know the string will always be in this format
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[s.index('[') + 1:-1]
'27-10-2011 17:07:02'

Categories

Resources