Python regular words cut

I have the string: './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
I need the string: '27-10-2011 17:07:02'
How can I do this in Python?

There are many ways to do this, one way is to use str.partition:
text='./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
before,_,after = text.partition('[')
print(after[:-1])
# 27-10-2011 17:07:02
Another is to use str.split:
before,after = text.split('[',1)
print(after[:-1])
# 27-10-2011 17:07:02
or str.find and str.rfind:
ind1 = text.find('[')+1
ind2 = text.rfind(']')
print(text[ind1:ind2])
All these methods rely on the desired substring immediately following the first left-bracket [.
The first two methods also rely on the desired substring ending at the next-to-last character in text. The last method (using rfind) searches from the right for the index of the right-bracket, so it is a little more general, and does not depend on quite so many (potential off-by-one) constants.
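The find/rfind variant can be wrapped in a small helper that also copes with a missing bracket. This is only a sketch: the name extract_bracketed is made up for illustration, and returning None when a bracket is absent is an assumption about what the caller wants.

```python
def extract_bracketed(text):
    """Return the text between the first '[' and the last ']', or None."""
    start = text.find('[')
    end = text.rfind(']')
    # str.find and str.rfind return -1 when the character is absent.
    if start == -1 or end == -1 or end < start:
        return None
    return text[start + 1:end]


text = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
print(extract_bracketed(text))
# 27-10-2011 17:07:02
```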

If your string always has the same structure, this is probably the simplest solution:
s = r'./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
s[s.find("[")+1:s.find("]")]
Update:
After seeing some of the other answers this is a slight improvement:
s[s.find("[")+1:-1]
Exploiting the fact that the closing square bracket is the last character in your string.

If the format is "fixed", you can also use this:
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[-20:-1]
'27-10-2011 17:07:02'

You can also use a regular expression:
import re
s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
print(re.search(r'\[(.*?)\]', s).group(1))

Try with a regex:
>>> import re
>>> re.findall(r".*\[(.*)\]", './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]')
['27-10-2011 17:07:02']
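The two regex answers behave differently once a line contains more than one bracketed part: the non-greedy search returns the first one, while the pattern with a leading greedy .* returns the last one. A small sketch, using a made-up sample line that is not from the question:

```python
import re

# Hypothetical sample with two bracketed parts, for illustration only.
line = 'app.log:[27-10-2011 17:07:02] restart [ok]'

# Non-greedy: stops at the first closing bracket after the first '['.
first = re.search(r'\[(.*?)\]', line).group(1)

# Leading greedy .* pushes the match to the last '[' in the line.
last = re.findall(r'.*\[(.*)\]', line)

print(first)  # 27-10-2011 17:07:02
print(last)   # ['ok']
```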

Probably the easiest way (if you know the string will always be in this format):
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[s.index('[') + 1:-1]
'27-10-2011 17:07:02'

Related

Python3: Replace splitted string

This is my string:
VISA1240129006|36283354A|081016860665
I need to replace the first field. For example, I need to get the following string:
FIXED_REPLACED_STRING|36283354A|081016860665
Is there any elegant way to do this using Python 3?

You can do this way:
>>> l = 'VISA1240129006|36283354A|081016860665'.split('|')
>>> l[0] = 'FIXED_REPLACED_STRING'
>>> l
['FIXED_REPLACED_STRING', '36283354A', '081016860665']
>>> '|'.join(l)
'FIXED_REPLACED_STRING|36283354A|081016860665'
Explanation: first, you split a string into a list. Then, you change what you need in the position(s) you want. Finally, you rebuild the string from such a modified list.
If you need a complete replacement of all the occurrences regardless of their position, check out also the other answers here.
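If only the first field ever changes, the split can be limited to a single cut so the rest of the string is left untouched. A minimal sketch; the helper name replace_first_field is hypothetical:

```python
def replace_first_field(record, new_value, sep='|'):
    """Replace everything before the first separator with new_value."""
    # maxsplit=1 yields at most two parts: the first field and the rest.
    parts = record.split(sep, 1)
    parts[0] = new_value
    return sep.join(parts)


print(replace_first_field('VISA1240129006|36283354A|081016860665',
                          'FIXED_REPLACED_STRING'))
# FIXED_REPLACED_STRING|36283354A|081016860665
```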

You can use the .replace() method:
l="VISA1240129006|36283354A|081016860665"
l=l.replace("VISA1240129006","FIXED_REPLACED_STRING")

You can use re.sub() from the re module.
My solution using regex is:
import re
l = "VISA1240129006|36283354A|081016860665"
new_l = re.sub(r'^\w*', 'FIXED_REPLACED_STRING', l)
It replaces the first string before the "|" character.

Why is the split() returning list objects that are empty? [duplicate]

I have the following file names that exhibit this pattern:
000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...
I want to extract the middle two time stamp parts after the second underscore '_' and before '.txt'. So I used the following Python regex string split:
time_info = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
But this gives me two extra empty strings in the returned list:
time_info=['', '20111007T084734', '20111008T023142', '']
How do I get only the two time stamp information? i.e. I want:
time_info=['20111007T084734', '20111008T023142']

I'm no Python expert, but maybe you could just remove the empty strings from your list?
str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
time_info = list(filter(None, str_list))

Don't use re.split(), use the groups() method of regex Match/SRE_Match objects.
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')
You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
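A sketch of the named-group variant described above. The group names start and end are chosen here purely for illustration:

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# The group names "start" and "end" are illustrative, not from the question.
match = re.search(r'[LU]_(?P<start>\w+)-(?P<end>\w+)\.', f)
time_info = match.groupdict() if match else {}
print(time_info)
# {'start': '20111007T084734', 'end': '20111008T023142'}
```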

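If the timestamps are always after the second _ then you can use str.split, after cutting off the extension with str.rsplit (str.strip(".txt") would strip any of the characters ., t and x from both ends rather than the suffix):
>>> strs = "000014_L_20111007T084734-20111008T023142.txt"
>>> strs.rsplit(".", 1)[0].split("_", 2)[-1].split("-")
['20111007T084734', '20111008T023142']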

Since this came up on Google, and for completeness: try using re.findall as an alternative!
This does require a little rethinking, but it still returns a list of matches, as split does. That makes it a nice drop-in replacement in some existing code and gets rid of the unwanted text. Pair it with lookaheads and/or lookbehinds and you get very similar behavior.
Yes, this is a bit of a "you're asking the wrong question" answer and doesn't use re.split(). It does solve the underlying issue: your list of matches suddenly has zero-length strings in it and you don't want that.
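One way the re.findall approach might look for the file names in the question. This is a sketch that assumes every timestamp is 8 digits, a literal T, then 6 digits:

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# Assumes each timestamp is 8 digits, a literal "T", then 6 digits.
time_info = re.findall(r'\d{8}T\d{6}', f)
print(time_info)
# ['20111007T084734', '20111008T023142']
```

Because findall only returns what the pattern matches, there are no empty strings to filter out afterwards.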

>>> f='000014_L_20111007T084734-20111008T023142.txt'
>>> f[9:-4].split('-')
['20111007T084734', '20111008T023142']
or, somewhat more general:
>>> f[f.rfind('_')+1:-4].split('-')
['20111007T084734', '20111008T023142']

Complex regex in Python

I am trying to write a generic pattern using regex so that it fetches only particular things from the string. Let's say we have strings like GigabitEthernet0/0/0/0 or FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the numerals. Therefore, the fetched result should be something like Gi0000 or Fa04 or Et00222 depending on the above cases.
x = 'GigabitEthernet0/0/0/2'
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I am not able to understand how I should write the regular expression. The values can also be fetched in the form of a list. I have written a few more patterns, but they aren't helping.

In regex, you may use re.findall function.
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'
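The first variant can be wrapped in a function and checked against all three names from the question. A minimal sketch: the name abbreviate is hypothetical, and it assumes the interface name starts with two letters (leading digits would be counted twice).

```python
import re

def abbreviate(name):
    """Return the first two characters followed by every digit in name."""
    return name[:2] + ''.join(re.findall(r'\d', name))


for name in ('GigabitEthernet0/0/0/0', 'FastEthernet0/4', 'Ethernet0/0.222'):
    print(abbreviate(name))
# Gi0000
# Fa04
# Et00222
```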

import re
z = "Ethernet0/0.222."
print(z[:2] + "".join(re.findall(r"(\d+)(?=[\d\W]*$)", z)))
You can try this. It makes sure that only the digits at the end come into play.

Here is another option:
import re
s = 'Ethernet0/0.222'
"".join(re.findall(r'^\w{2}|\d+', s))

Using regular expression to extract string

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the dot needs to be returned. The following works as expected.
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases. For example:
mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How do I use regular expressions in Python for this?

No need for regex... Just use str.split
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'

If you wanted to use regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub(r"ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print(result)

This regex will match it: ^[^.]+
So try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall(r'^[^.]+', string)[0]
print(ip)
Output:
ec2-54-196-170-182
The best thing is that this will match whether the prefix is ec2, ec3 and so on, so this regex behaves much like the code from @mgilson.
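Indexing the findall result with [0] raises IndexError when nothing matches, for example on an empty string. A hedged sketch of a variant that returns None instead; the function name host_label is made up for illustration:

```python
import re

def host_label(hostname):
    """Return the leading run of non-dot characters.

    Returns None if the string is empty or starts with a dot.
    """
    match = re.match(r'[^.]+', hostname)
    return match.group(0) if match else None


print(host_label('ec2-666-777-888-999.compute-1.amazonaws.com'))
# ec2-666-777-888-999
print(host_label(''))
# None
```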

regex and replace on string using python

I am rather new to Python Regex (regex in general) and I have been encountering a problem. So, I have a few strings like so:
str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
str2 = r'''/bkyhi/oiukj/game/?mytag=a_17014b_82c&'''
str3 = r'''lkjsd/image/game/mytag=a_17014b_82c$'''
The & and the $ could be any symbol.
I would like to have a single regex (and replace) which replaces:
mytag=a_17014b_82c
to:
mytag=myvalue
from any of the above 3 strings. Would appreciate any guidance on how I can achieve this.
UPDATE: the string to be replaced is not always the same. So, a_17014b_82c could be anything in reality.

If the string to be replaced is constant you don't need a regex. Simply use replace:
>>> str1 = r'''hfo/gfbi/mytag=a_17014b_82c'''
>>> str1.replace('a_17014b_82c','myvalue')
'hfo/gfbi/mytag=myvalue'

Use re.sub:
>>> import re
>>> r = re.compile(r'(mytag=)(\w+)')
>>> r.sub(r'\1myvalue', str1)
'hfo/gfbi/mytag=myvalue'
>>> r.sub(r'\1myvalue', str2)
'/bkyhi/oiukj/game/?mytag=myvalue&'
>>> r.sub(r'\1myvalue', str3)
'lkjsd/image/game/mytag=myvalue$'
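The same substitution can be packaged as a reusable function. This is a sketch under assumptions: the name replace_tag and its tag parameter are hypothetical, and the value is taken to be a run of word characters, as in the pattern above.

```python
import re

def replace_tag(text, new_value, tag='mytag'):
    """Replace the run of word characters after 'tag=' with new_value."""
    pattern = re.compile(r'(' + re.escape(tag) + r'=)\w+')
    # A function replacement avoids backslash processing of new_value.
    return pattern.sub(lambda m: m.group(1) + new_value, text)


for s in ('hfo/gfbi/mytag=a_17014b_82c',
          '/bkyhi/oiukj/game/?mytag=a_17014b_82c&',
          'lkjsd/image/game/mytag=a_17014b_82c$'):
    print(replace_tag(s, 'myvalue'))
# hfo/gfbi/mytag=myvalue
# /bkyhi/oiukj/game/?mytag=myvalue&
# lkjsd/image/game/mytag=myvalue$
```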

import re
r = re.compile(r'(mytag=)\w+(?=\W?$)')
r.sub(r'\1myvalue', str1)
This is based on @Ashwini's answer, with two small changes. First, the lookahead (?=\W?$) requires the mytag=a_17014b_82c part to sit at the end of the input, followed by at most one trailing symbol, so that even inputs such as
str1 = r'''/bkyhi/mytag=blah/game/?mytag=a_17014b_82c&'''
will work fine, substituting the last mytag instead of the first. (A plain \w+$ would not match here, because the trailing & is not a word character.)
Second, the \w+ is not captured, since the replacement doesn't use it. This is just for a bit of code clarity.
