I have
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
I need a Python regex to identify xxx-zzzzzzzzz.eeeeeeeeeee.fr so that I can remove that substring.
Expected output:
string: 'Server:PIPELININGSIZE'
The URL is inside a string. I tried a lot of regex expressions.
Not sure if this helps, because your question was quite vaguely formulated. :)
import re
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
string_1 = re.search('[a-z.-]+([A-Z]+)', string).group(1)
print(f'string: Server:{string_1}')
Output:
string: Server:PIPELININGSIZE
No regex needed. Just split on your target word.
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
last = string.split("fr", 1)[1]  # everything after the first "fr" (assumes "fr" does not occur earlier)
first = string[:string.index(":")]  # everything before the first ":"
print(f'{first}:{last}')
Gives:
Server:PIPELININGSIZE
The wording of the question suggests that you wish to find the hostname in the string, but the expected output suggests that you want to remove it. The following regular expression will create a tuple and allow you to do either.
import re
s = "Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE"  # not named str, which would shadow the built-in
p = re.compile(r'^([A-Za-z]+:)(.*?)([A-Z]+)$')
m = p.search(s)
result = m.groups()
# ('Server:', 'xxx-zzzzzzzzz.eeeeeeeeeee.fr', 'PIPELININGSIZE')
Remove the hostname:
print(f'{result[0]} {result[2]}')
# Output: 'Server: PIPELININGSIZE'
Extract the hostname:
print(result[1])
# Output: 'xxx-zzzzzzzzz.eeeeeeeeeee.fr'
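If removing the hostname is all that is needed, a single re.sub call may be enough. This is only a sketch: it assumes the hostname consists of lowercase letters, digits, dots and hyphens, and that it directly follows the first colon. The names s and cleaned are just for illustration.

```python
import re

s = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'

# Keep the "Server:" prefix (group 1) and drop the run of hostname
# characters that follows it. Assumption: the hostname contains only
# lowercase letters, digits, dots and hyphens.
cleaned = re.sub(r'^([A-Za-z]+:)[a-z0-9.-]+', r'\1', s)
print(cleaned)
```

Because the character class excludes uppercase letters, the match stops right before PIPELININGSIZE.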
Related
I have a string like this
ABC/AAAA DEF/78kkk OBJ/89KKK KLE/67899
and I pass in a substring to find and replace. So if I pass DEF/00012, the original string
should become
ABC/AAAA DEF/00012 OBJ/89KKK KLE/67899
I have tried string.replace('DEF', 'DEF/00012'),
but I get this output:
ABC/AAAA DEF/00012/78kkk OBJ/89KKK KLE/67899
Any suggestions would be highly appreciated.
Thanks
I would do:
txt = 'ABC/AAAA DEF/78kkk OBJ/89KKK KLE/67899'
change = 'DEF'
changeto = 'DEF/00012'
newtxt = ' '.join(changeto if i.startswith(change) else i for i in txt.split(' '))
print(newtxt)
Output:
ABC/AAAA DEF/00012 OBJ/89KKK KLE/67899
I split at spaces and replaced the part beginning with DEF.
string.replace('DEF/78kkk', 'DEF/00012')
If by "substring" you mean that the characters following "DEF" are not fixed to a specific value, use a regular expression instead.
result = re.sub(r"DEF/\w+", "DEF/00012", string)
Assuming every "substring" is delimited by a blank space, you can use re to match everything up to the next whitespace:
import re
your_string = re.sub(r"DEF/\S*", "DEF/00012", your_string)
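The approaches above can be folded into a small helper. This is a sketch, not a definitive solution: replace_token is a made-up name, and it assumes tokens are separated by whitespace and that the part before the first "/" in the new token is the key to look for.

```python
import re

def replace_token(text, new_token):
    """Replace the whitespace-delimited token that shares new_token's key."""
    # Assumption: the key is everything before the first "/".
    key = new_token.split('/', 1)[0]
    # (?<!\S) requires the key to start a token, so "XDEF/1" is not touched.
    pattern = r'(?<!\S)' + re.escape(key) + r'/\S*'
    # A function replacement avoids backslash processing in new_token.
    return re.sub(pattern, lambda m: new_token, text)

txt = 'ABC/AAAA DEF/78kkk OBJ/89KKK KLE/67899'
print(replace_token(txt, 'DEF/00012'))
```

re.escape keeps keys containing regex metacharacters from being misread as pattern syntax.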
Input string:
str = "(\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
The output string should look like:
("Cardinal", "Tom B. Erichsen", "Skagen 21")
The comma at the end should be removed. How can I do this in Python?
I tried str.rstrip(","), but it didn't work.
You can use a regex, for example replacing (.*),([^,]+)$ with \1\2:
result = re.sub(r"(.*),([^,]+)$", r"\1\2", yourstring)
Check this code
str = str.replace('",)', '")')
You can chain several str.replace() calls:
str.replace(", )",")").replace(",)",")")
That will work for your string.
You can do this in the following way:
str = "(\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
str = str[:-2] + str[-1]  # drop the second-to-last character (the trailing comma)
You could use the re module:
import re
s = "INSERT INTO Customers (CustomerName, ContactName, Address, ) VALUES (\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
print(re.sub(r',\s*\)', ')', s))
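As for why the rstrip(",") attempt in the question had no effect: rstrip only removes characters from the very end of the string, and here the last character is ")", not ",". The sketch below illustrates this and shows one way to make a strip-based approach work. It assumes the string ends with exactly one closing parenthesis; the names unchanged and fixed are just for illustration.

```python
s = '("Cardinal", "Tom B. Erichsen", "Skagen 21",)'

# rstrip(',') stops immediately because the string ends with ')'.
unchanged = s.rstrip(',')
print(unchanged == s)

# Peel off the ')', strip trailing commas and spaces, then put ')' back.
# Assumption: the string ends with a single closing parenthesis.
fixed = s.rstrip(')').rstrip(', ') + ')'
print(fixed)
```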
I have an input text like this (the actual text file contains tons of garbage characters surrounding these 2 strings too):
(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)
I am trying to parse the text to store something like this:
value1="xxx" and value2="yyy".
I wrote python code as follows:
value1_start = content.find('value')
value1_end = content.find(';', value1_start)
value2_start = content.find('value')
value2_end = content.find(';', value2_start)
print "%s" %(content[value1_start:value1_end])
print "%s" %(content[value2_start:value2_end])
But it always returns:
value=xxx
value=xxx
Could anyone tell me how I can parse the text so that the output is:
value=xxx
value=yyy
Use a regex approach:
re.findall(r'\bvalue=[^;]*', s)
Or, if value can be any sequence of one or more word (letter/digit/underscore) chars:
re.findall(r'\b\w+=[^;]*', s)
Details:
\b - word boundary
value= - a literal char sequence value=
[^;]* - zero or more chars other than ;.
See the Python demo:
import re
rx = re.compile(r"\bvalue=[^;]*")
s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"
res = rx.findall(s)
print(res)
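To get the bare values (xxx and yyy) into separate variables, as the question asks, a capturing group can be added to the same pattern. A minimal sketch, assuming exactly two matches are present; value1 and value2 are the names used in the question.

```python
import re

s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"

# With one capturing group, findall returns only the captured part.
values = re.findall(r"\bvalue=([^;]*)", s)

# Assumption: the text contains exactly two matches.
value1, value2 = values
print(value1, value2)
```

If the number of matches can vary, keep the list instead of unpacking it.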
Use regex to filter the data you want from the "junk characters":
>>> import re
>>> _input = '#4#5%value=xxx;3^&^*(^%$value=yyy#%$#^&*^%;$#%$#^'
>>> matches = re.findall(r'[a-zA-Z0-9]+=[a-zA-Z0-9]+', _input)
>>> matches
['value=xxx', 'value=yyy']
>>> for match in matches:
...     print(match)
...
value=xxx
value=yyy
>>>
Summary of the regular expression:
[a-zA-Z0-9]+: One or more alphanumeric characters
=: literal equal sign
[a-zA-Z0-9]+: One or more alphanumeric characters
For this input:
content = '(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)'
use a simple regex and manually strip off the first and last two characters:
import re
values = [x[2:-2] for x in re.findall(r'\*\*value=.*?\*\*', content)]
for value in values:
print(value)
Output:
value=xxx
value=yyy
Here the assumption is that there are always two leading and two trailing * as in **value=xxx**.
You already have good answers based on the re module. That would certainly be the simplest way.
If for any reason (performance?) you prefer to use str methods, that is indeed possible. But you must search for the second string past the end of the first one:
value2_start = content.find('value', value1_end)
value2_end = content.find(';', value2_start)
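The same idea extends to any number of occurrences with a loop that restarts each search where the previous match ended. This is a sketch; find_values and its parameters are made-up names, and the sample text is simplified.

```python
def find_values(content, key='value', terminator=';'):
    """Collect every 'key...' chunk up to (not including) the terminator."""
    results = []
    start = content.find(key)
    while start != -1:
        end = content.find(terminator, start)
        if end == -1:
            # No terminator left: take the rest of the text and stop.
            results.append(content[start:])
            break
        results.append(content[start:end])
        # Resume searching after the end of the previous match.
        start = content.find(key, end)
    return results

print(find_values('junk value=xxx;junk value=yyy;junk'))
```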
I'm trying to get certain results out of the response from Blogger. I wanna get my blog names. How would I go about something like that with Regex? I've tried Googling my issue but none of the answers helped me in my case unfortunately.
So my response looks something like this:
\\x22http://emyblog.blogspot.com/
So it's always starting with the \\x22http:// and ending with .blogspot.com/
I've tried the following re:
regEx = re.findall(b"""\x22http://(.*)\.blogspot\.com""", r)
But unfortunately it returned an empty list. Any ideas on how to solve this problem?
Thanks,
Use a raw string; otherwise \x22 is interpreted as the character " instead of as literal text. Also, re.findall may be more than you need here: re.search should suffice.
Assuming your byte-string is:
>>> r = rb'\\x22http://emyblog.blogspot.com/'
With byte-strings:
>>> res = re.search(rb'\\x22http://(.*)\.blogspot\.com/', r)
>>> res.group(1)
b'emyblog'
With normal strings:
>>> res = re.search(r'\\\\x22http://(.*)\.blogspot\.com/', r.decode('utf-8'))
>>> res.group(1)
'emyblog'
Use r'' (the string is taken as a raw string literal) instead of b'':
import re
pattern = re.compile(r'\x22http://(.*)\.blogspot\.com')
match = pattern.match('\x22http://emyblog.blogspot.com/')
match.group(1)
# 'emyblog'
This seems to be working!
import re
text = "\x22http://emyblog.blogspot.com/"
regex = re.compile(r'\x22http://(.*)\.blogspot\.com')
print(regex.findall(text))
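If the response contains several blog URLs, a greedy (.*) can swallow everything between the first and the last one. The sketch below uses a narrower character class with re.findall instead. It assumes the response is a byte string that contains the literal characters \\x22 (two backslashes), as in the question; the second blog name, other, is made up for illustration.

```python
import re

# Hypothetical response with two blog URLs.
response = rb'\\x22http://emyblog.blogspot.com/ junk \\x22http://other.blogspot.com/'

# [^./]+ stops at the first dot or slash, so each match is a single name.
names = re.findall(rb'\\x22http://([^./]+)\.blogspot\.com/', response)

# Decode the byte strings into ordinary text.
blog_names = [name.decode('ascii') for name in names]
print(blog_names)
```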
I have the following line :
CommonSettingsMandatory = #<Import Project="[\\.]*Shared(\\vc10\\|\\)CommonSettings\.targets," />#,true
and I want the following output:
['commonsettingsmandatory', '<Import Project="[\\\\.]*Shared(\\\\vc10\\\\|\\\\)CommonSettings\\.targets," />', 'true']
If I do a simple regex split on the comma, it will also split wherever a value contains a comma; for example, I wrote a comma after targets, and it splits there.
So I want to ignore the text between the two # characters to make sure there's no splitting there.
I really don't know how to do this!
http://docs.python.org/library/re.html#re.split
import re
string = 'CommonSettingsMandatory = #toto,tata#, true'
splitlist = re.split(r'\s?=\s?#(.*?)#,\s?', string)
Then splitlist contains ['CommonSettingsMandatory', 'toto,tata', 'true'].
While you might be able to use split with a lookbehind, I would use the groups captured by this expression.
(\S+)\s*=\s*#([^#]+)#,\s*(.*)
m = re.search(expression, myString). Use m.group(1) for the first string, m.group(2) for the second, etc.
If I understand you correctly, you're trying to split the string using spaces as delimiters, but you want to also remove any text between pound signs?
If that's correct, why not simply remove the pound sign-delimited text before splitting the string?
import re
myString = re.sub(r'#.*?#', '', myString)
myArray = myString.split(' ')
EDIT: (based on revised question)
import re
myArray = re.findall(r'^(.*?) = #(.*?)#,(.*?)$', myString)
That will actually return a list of tuples containing your matches (the first element keeps its original case, so call .lower() on it if you need it lowercased), in the form of:
[
    (
        'CommonSettingsMandatory',
        '<Import Project="[\\\\.]*Shared(\\\\vc10\\\\|\\\\)CommonSettings\\.targets," />',
        'true'
    )
]
(spacing added to illustrate the format better)
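Putting the pieces together, here is a sketch that produces the three-element list from the question, including the lowercased name. It assumes the line has the form name = #value#,flag and that the value runs up to the last # on the line; the variable names are just for illustration.

```python
import re

line = r'CommonSettingsMandatory = #<Import Project="[\\.]*Shared(\\vc10\\|\\)CommonSettings\.targets," />#,true'

# Greedy (.*) runs to the last '#', so commas inside the value are kept.
m = re.match(r'\s*(\S+)\s*=\s*#(.*)#\s*,\s*(.*)$', line)
name, value, flag = m.groups()

# Lowercase the name to match the expected output in the question.
result = [name.lower(), value, flag.strip()]
print(result)
```

Using match groups instead of splitting means the comma after targets never acts as a delimiter.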