Extract substring between specific characters


I have some strings like:
\i{}Agrostis\i0{} <L.>
I would like to get rid of the '\i{}' and '\i0{}' markers, so that I get just:
Agrostis <L.>
I've tried the following code (adapted from here):
m = re.search('\i{}(.+?)\i0', item_name)
if m:
    name = m.group(1).strip('\\')
else:
    name = item_name
It works in part, because when I run it I get just:
Agrostis
without the
<L.>
part (which I want to keep).
Any hints?
Thanks in advance for any assistance you can provide!

Use s.replace(r'\i{}', '') and s.replace(r'\i0{}', '').
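A minimal sketch of the chained replace, assuming the markers always appear literally as \i{} and \i0{}. The sample string comes from the question; the raw strings are a choice made here so that the backslashes stay literal.

```python
# Raw strings keep the backslash literal without relying on "\i"
# being an unrecognised escape sequence.
s = r'\i{}Agrostis\i0{} <L.>'

# str.replace returns a new string, so the two calls can be chained.
cleaned = s.replace(r'\i{}', '').replace(r'\i0{}', '')
print(cleaned)  # Agrostis <L.>
```

Strings that contain neither marker come back unchanged, so no existence check is needed.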

You can do this in different ways.
The simplest one is to use str.replace
s = '''\i{}Agrostis\i0{} <L.>'''
s2 = s.replace('''\i{}''', '').replace('''\i0{}''', '')
Another way is to use re.sub()

You need to use the re.sub function.
In [34]: import re
In [35]: s = "\i{}Agrostis\i0{} <L.>"
In [36]: re.sub(r'\\i\d*{}', '', s)
Out[36]: 'Agrostis <L.>'

You could use a character class along with re.sub()
import re
regex = r'\\i[\d{}]+'
string = r"\i{}Agrostis\i0{} <L.>"
string = re.sub(regex, '', string)
print(string)
See a demo on ideone.com.

You can either use s.replace(r'\i{}', '') and s.replace(r'\i0{}', ''), as Julien said, or, continuing with the regex approach, change your pattern so that the backslashes and braces are escaped and the rest of the string is captured in a second group:
m = re.search(r'\\i\{\}(.+?)\\i0\{\}(.*)', item_name)
And use m.group(1) + m.group(2) as the result.
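A sketch of the two-group regex approach wrapped in a function. The names MARKED and strip_markers are hypothetical, and the pattern assumes exactly one \i{} ... \i0{} pair per string.

```python
import re

# Escape the backslashes and braces so they are matched literally.
# Group 1 is the text between the markers, group 2 is everything after.
MARKED = re.compile(r'\\i\{\}(.+?)\\i0\{\}(.*)')

def strip_markers(item_name):
    # Hypothetical helper: fall back to the input when nothing matches.
    m = MARKED.search(item_name)
    if m:
        return m.group(1) + m.group(2)
    return item_name

print(strip_markers(r'\i{}Agrostis\i0{} <L.>'))  # Agrostis <L.>
```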

Related

replace before and after a string using re in python

I have a string like this: 'approved:rakeshc#IAD.GOOGLE.COM'
I would like to extract the text after ':' and before '#'.
In this case the text to be extracted is rakeshc.
It can be done using the split method: 'approved:rakeshc#IAD.GOOGLE.COM'.split(':')[1].split('#')[0]
But I want this to be done using a regular expression.
This is what I have tried so far:
import re
iptext = 'approved:rakeshc#IAD.GOOGLE.COM'
re.sub('^(.*approved:)', "", iptext)  # gives everything after ':'
re.sub('(#IAD.GOOGLE.COM)$', "", iptext)  # gives everything before '#'
I want the result from a single expression. The expression would be used to replace the string with only the middle part.

Here is a regex one-liner:
import re
inp = "approved:rakeshc#IAD.GOOGLE.COM"
output = re.sub(r'^.*:|#.*$', '', inp)
print(output) # rakeshc
The above approach strips all text from the start up to and including the :, and also strips all text from the # until the end. This leaves behind just the ID.

Use a capture group to copy the part between the matches to the result.
result = re.sub(r'.*approved:(.*)#IAD\.GOOGLE\.COM$', r'\1', iptext)
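One property of the capture-group substitution worth checking: when the pattern does not match, re.sub returns the input unchanged rather than an empty string. A small sketch, using the pattern from the answer above and a second made-up input:

```python
import re

pattern = r'.*approved:(.*)#IAD\.GOOGLE\.COM$'

# The whole match is replaced by group 1, i.e. the middle part.
matched = re.sub(pattern, r'\1', 'approved:rakeshc#IAD.GOOGLE.COM')

# Made-up input without "approved:"; nothing matches, nothing is replaced.
unmatched = re.sub(pattern, r'\1', 'rejected:rakeshc#IAD.GOOGLE.COM')

print(matched)    # rakeshc
print(unmatched)  # rejected:rakeshc#IAD.GOOGLE.COM
```

If a non-matching input should be reported instead of passed through, re.search with an explicit check on the result is probably the safer choice.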

Hope this works for you:
import re
input_text = "approved:rakeshc#IAD.GOOGLE.COM"
out = re.search(':(.+?)#', input_text)
if out:
    found = out.group(1)
    print(found)

You can use this one-liner:
re.sub(r'^.*:(\w+)#.*$', r'\1', iptext)
Output:
rakeshc

how to remove comma at the end from the below string in python code

input string
str = "(\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
output string should look like:
("Cardinal", "Tom B. Erichsen", "Skagen 21")
The comma at the end should be removed. How can I do this in Python?
I tried str.rstrip(","), but it didn't work.

You can use a regex; for example, you can replace (.*),([^,]+)$ with \1\2:
result = re.sub(r"(.*),([^,]+)$", r"\1\2", yourstring)
here is a regex demo

Check this code
str = str.replace('",)', '")')

You can chain several str.replace() calls:
str.replace(", )",")").replace(",)",")")
That will work for your string.

You can do this in the following way, dropping the second-to-last character:
str = "(\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
str = str[:-2] + str[-1]
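The slicing approach removes the second-to-last character whatever it is. A guarded variant, sketched here with a hypothetical helper name, only changes strings that really end with ",)":

```python
def drop_trailing_comma(text):
    # Hypothetical helper: only touch strings that end with ",)".
    if text.endswith(',)'):
        return text[:-2] + ')'
    return text

row = '("Cardinal", "Tom B. Erichsen", "Skagen 21",)'
print(drop_trailing_comma(row))  # ("Cardinal", "Tom B. Erichsen", "Skagen 21")
```

This sketch does not handle whitespace between the comma and the parenthesis; the regex answers cover that case.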

You could use the re module:
import re
s = "INSERT INTO Customers (CustomerName, ContactName, Address, ) VALUES (\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
print(re.sub(r',\s*\)', ')', s))

How to define a re pattern that matches parenthesis in pair or no parenthesis but not single parenthesis using Python?

I want to define a re pattern that matches telephone numbers like
(514) 123-4567
514-123-4567
But it should not accept a single, unpaired parenthesis, like
(514 123-4567
514) 123-4567
I know I can handle this with a stack, but how can I solve it with a pure regular expression? Thanks a lot.

I think this should do it: (?:(?:\([0-9]*?\))|(?:[0-9]*))*

Use the following regex pattern:
^((?=.*\().*(?=.*\)).*|[^()]+)$
https://regex101.com/r/E1yHVY/3

^(.*(?<=\().*\).*)$|^[^\(\)]+$
This one does not check phone numbers specifically, but applies your parenthesis rules to any string.
https://regex101.com/r/7RhcTq/1

You can try this:
import re
s1 = "(514) 123-4567"
s2 = "514-123-4567"
a = [s1, s2]
numbers = [re.findall(r"\(\d{3}\)\s\d{3}-\d{4}|\d{3}-\d{3}-\d{4}", i) for i in a]
final_numbers = [i[0] for i in numbers if i]  # first match per string
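For validating a whole string rather than finding numbers inside it, an alternation on the area-code part combined with re.fullmatch may be enough. This is a sketch, assuming only the two formats from the question are valid; PHONE and is_valid_phone are hypothetical names.

```python
import re

# Either "(ddd) " with both parentheses, or "ddd-" with none,
# followed by "ddd-dddd".
PHONE = re.compile(r'(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}')

def is_valid_phone(text):
    # fullmatch requires the pattern to cover the entire string.
    return PHONE.fullmatch(text) is not None

for candidate in ['(514) 123-4567', '514-123-4567',
                  '(514 123-4567', '514) 123-4567']:
    print(candidate, is_valid_phone(candidate))
```

Because each branch of the alternation spells out one complete form of the area code, a single parenthesis cannot satisfy either branch.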

In Python how to strip dollar signs and commas from dollar related fields only

I'm reading in a large text file with lots of columns, dollar related and not, and I'm trying to figure out how to strip the dollar fields ONLY of $ and , characters.
so say I have:
a|b|c
$1,000|hi,you|$45.43
$300.03|$MS2|$55,000
where a and c are dollar-fields and b is not.
The output needs to be:
a|b|c
1000|hi,you|45.43
300.03|$MS2|55000
I was thinking that regex would be the way to go, but I can't figure out how to express the replacement:
f = open('sample1_fixed.txt', 'wb')
for line in open('sample1.txt', 'rb'):
    new_line = re.sub(r'(\$\d+([,\.]\d+)?k?)',????, line)
    f.write(new_line)
f.close()
Anyone have an idea?
Thanks in advance.

Unless you are really tied to the idea of using a regex, I would suggest doing something simple, straight-forward, and generally easy to read:
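def convert_money(inval):
    # Only fields that start with '$' are candidates.
    if inval.startswith('$'):
        test_val = inval[1:].replace(",", "")
        try:
            float(test_val)
        except ValueError:
            pass  # not a number (e.g. '$MS2'): leave the field unchanged
        else:
            inval = test_val
    return inval
def convert_string(s):
    # Convert each '|'-separated field independently.
    return "|".join(map(convert_money, s.split("|")))
a = '$1,000|hi,you|$45.43'
b = '$300.03|$MS2|$55,000'
print(convert_string(a))
print(convert_string(b))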
OUTPUT
1000|hi,you|45.43
300.03|$MS2|55000

A simple approach:
>>> import re
>>> exp = r'\$\d+(,|\.)?\d+'
>>> s = '$1,000|hi,you|$45.43'
>>> '|'.join(i.replace('$', '').replace(',', '') if re.match(exp, i) else i for i in s.split('|'))
'1000|hi,you|45.43'

It sounds like you are addressing the entire line of text at once. I think your first task would be to break up your string by columns into an array or some other variables. Once you've done that, your solution for converting strings of currency into numbers doesn't have to worry about the other fields.
Once you've done that, I think there is probably an easier way to do this task than with regular expressions. You could start with this SO question.
If you really want to use regex though, then this pattern should work for you:
/[$,]/g
Demo on regex101
Replace matches with empty strings. The pattern gets a little more complicated if you have other kinds of currency present.
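The "split into columns first" idea can be sketched with the standard csv module. This assumes the file has a header row and that the dollar columns are known by name; DOLLAR_COLUMNS and clean_rows are hypothetical names, and the sample data is the one from the question.

```python
import csv
import io

# Assumption: the dollar columns are known in advance; here 'a' and 'c'.
DOLLAR_COLUMNS = {'a', 'c'}

def clean_rows(lines):
    # DictReader uses the first line as the header row.
    reader = csv.DictReader(lines, delimiter='|')
    for row in reader:
        for name in DOLLAR_COLUMNS:
            row[name] = row[name].replace('$', '').replace(',', '')
        yield row

sample = io.StringIO('a|b|c\n$1,000|hi,you|$45.43\n$300.03|$MS2|$55,000\n')
for row in clean_rows(sample):
    print('|'.join(row.values()))
```

Because only the named columns are touched, values such as hi,you and $MS2 in column b pass through unchanged.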

Try this regex and adapt it if necessary.
\$(\d+)[\,]*([\.]*\d*)
See demo: http://regex101.com/r/wM0zB6/2

Use the regex
((?<=\d),(?=\d))|(\$(?=\d))
e.g.
>>> import re
>>> x = "$1,000|hi,you|$45.43"
>>> re.sub(r'((?<=\d),(?=\d))|(\$(?=\d))', r'', x)
'1000|hi,you|45.43'
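re.sub also accepts a function as the replacement, which is one way to fill in the ???? from the question. The sketch below is not taken from any of the answers: the pattern and the names DOLLAR_FIELD and strip_dollar_fields are assumptions, and it expects to be applied one line at a time.

```python
import re

# A dollar amount that fills a whole '|'-delimited field:
# the lookarounds require a field boundary on both sides.
DOLLAR_FIELD = re.compile(r'(?<![^|])\$\d[\d,]*(?:\.\d+)?(?![^|\n])')

def strip_dollar_fields(line):
    # The replacement function receives each match object and returns
    # the matched text without '$' and ','.
    return DOLLAR_FIELD.sub(
        lambda m: m.group(0).replace('$', '').replace(',', ''), line)

print(strip_dollar_fields('$1,000|hi,you|$45.43'))  # 1000|hi,you|45.43
print(strip_dollar_fields('$300.03|$MS2|$55,000'))  # 300.03|$MS2|55000
```

Fields such as $MS2 are left alone because the pattern requires a digit right after the dollar sign.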

Try the below regex and then replace the matched strings with \1\2\3
\$(\d+(?:\.\d+)?)(?:(?:,(\d{2}))*(?:,(\d{3})))?
DEMO

Defining a blacklist and checking whether each character is in it is an easy way to do this. Note that it removes the characters from every field, not only from the dollar fields:
blacklist = ("$", ",")  # characters to remove
with open('sample1.txt') as src, open('sample1_fixed.txt', 'w') as f:
    for line in src:
        clean_line = "".join(c for c in line if c not in blacklist)
        f.write(clean_line)

\$(?=(?:[^|]+,)|(?:[^|]+\.))
Try this. Replace with an empty string. Use the re.M option. See the demo.
http://regex101.com/r/gT6kI4/6

Using regular expression to extract string

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the dot needs to be returned. The following works as expected.
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases. For example:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How do I use regular expressions for this in Python?

No need for regex... Just use str.split
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'

If you wanted to use regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub(r"ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print(result)
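The same extraction can be sketched with re.match and the captured groups, which makes the no-match case explicit. EC2_HOST and ec2_ip are hypothetical names; the pattern assumes the ec2-a-b-c-d. host name layout shown in the question.

```python
import re

# Four dash-separated groups of 1 to 3 digits after the "ec2-" prefix,
# followed by the first dot of the host name.
EC2_HOST = re.compile(r'ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.')

def ec2_ip(hostname):
    # Returns the dotted address, or None when the name has another shape.
    m = EC2_HOST.match(hostname)
    return '.'.join(m.groups()) if m else None

print(ec2_ip('ec2-54-196-170-182.compute-1.amazonaws.com'))  # 54.196.170.182
```

Like the patterns above, this does not check that each group is in the range 0 to 255.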

This regex will match: ^[^.]+
So try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall('^[^.]+', string)[0]
print(ip)
Output:
ec2-54-196-170-182
The best thing is that this will match whether the prefix is ec2, ec3, and so on, so this regex is actually very similar to the code of @mgilson.
