string= "im fine.gds how are you"
if '.gds' or '.cdl' in string :
a=string.split("????????")
the above string may contain .gds or .cdl extension. I want to split the string based on the extension.
here how the parameters can be passed to split function.(EX if .gds is present in string then it should take as split(".gds")
if .cdl is there in string then it should get split(".cdl"))
I think you have to split the if statements:
if '.gds' in string:
a = string.split('.gds')
elif '.cdl' in string:
a = string.split('.cdl')
else:
a = string # this is a fallback in case none of the patterns is in the string
Furthermore, your in statement is incorrect; it should have been
if '.gds' in string or '.cdl' in string:
Note that this solution assumes that only one of the patterns will be in the string. If both patterns can occur on the same string, see Vikas's answer.
Use regular expression module re to split either by pattern1 or pattern2
import re
re.split('\.gds|\.cdl', your_string)
Example:
>>> re.split('\.gds|\.cdl', "im fine.gds how are you")
['im fine', ' how are you']
>>> re.split('\.gds|\.cdl', "im fine.cdl how are you")
['im fine', ' how are you']
>>> re.split('\.gds|\.cdl', "im fine.cdl how are.gds you")
['im fine', ' how are', ' you']
You can try to define a function like:
def split_on_extensions(string, *extensions):
for ext in extensions:
if ext in string:
return string.split(ext)
return string
Of course, the order in which you give the extensions is critical, as you'll split on the first one...
Are you guaranteed that one of the two of them will be there?
a = next( string.split(v) for v in ('.gds','.cdl') if v in string )
If you're not positive it will be there, you can catch the StopIteration that is raised in next:
try:
a = next( string.split(v) for v in ('.gds','.cdl') if v in string )
except StopIteration:
a = string #????
Another option is to use the BIF str.partition. This is how it works:
sring= "im fine.gds how are you"
three_parts_of_sring = sring.partition('.gds')
>>> three_parts_of_sring
('im fine', '.gds', ' how are you')
Put it into a little function and your set.
The tags is captured into the first backreference. The question mark in the regex makes the star lazy, to make sure it stops before the first closing tag, rather than before the last, like a greedy star would do.
This regex will not properly match tags nested inside themselves, like in <TAG>one<TAG>two</TAG>one</TAG>.
You can iter separators:
string= "im fine.gds how are you"
separators = ['.gds', '.cdl']
for separator in separators:
if separator in string:
a = string.split(separator)
break
else:
a = []
Related
I'm not familiar with regex, and it would be great to have some help.
I have a string: string = "You get #{turns} guesses."
and I would like to remove #{turns} in order to have string = "You get guesses."
I've tried it with:
string = re.sub(re.compile('#{.*?}'),"",string)
Any suggestions?
For this specific question you can also do it like so:
import re
string = "You get #{turns} guesses."
re.sub(r'#\S+ ', r'', string)
Output:
'You get guesses.'
Regex:
'#\S+ ' Match # and match as many non space characters and a single space.
Your code works, except that it does not remove a sufficient amount of spaces and that compilation is rather useless if you only use it once:
>>> string = "You get #{turns} guesses."
>>> string = re.sub(re.compile('#{.*?}'),"",string)
>>> string
'You get guesses.'
So you probably want to compile the regex once, and then use it, and you better alter it to - for instance - remove tailing spaces:
rgx = re.compile('#{.*?}\s*')
string = rgx.sub('',string)
Note the \s* which will match with an arbitrary amount of spaces after the tag, and thus remove these as well:
>>> string = "You get #{turns} guesses."
>>> rgx = re.compile('#{.*?}\s*')
>>> string = rgx.sub('',string)
>>> string
'You get guesses.'
In case it is one word between the curly brackets ({}), you better use \w to exclude spaces:
rgx = re.compile('#{\w*}\s*')
I have to write a code to do 2 things:
Compress more than one occurrence of the space character into one.
Add a space after a period, if there isn't one.
For example:
input> This is weird.Indeed
output>This is weird. Indeed.
This is the code I wrote:
def correction(string):
list=[]
for i in string:
if i!=" ":
list.append(i)
elif i==" ":
k=i+1
if k==" ":
k=""
list.append(i)
s=' '.join(list)
return s
strn=input("Enter the string: ").split()
print (correction(strn))
This code takes any input by the user and removes all the extra spaces,but it's not adding the space after the period(I know why not,because of the split function it's taking the period and the next word with it as one word, I just can't figure how to fix it)
This is a code I found online:
import re
def correction2(string):
corstr = re.sub('\ +',' ',string)
final = re.sub('\.','. ',corstr)
return final
strn= ("This is as .Indeed")
print (correction2(strn))
The problem with this code is I can't take any input from the user. It is predefined in the program.
So can anyone suggest how to improve any of the two codes to do both the functions on ANY input by the user?
Is this what you desire?
import re
def corr(s):
return re.sub(r'\.(?! )', '. ', re.sub(r' +', ' ', s))
s = input("> ")
print(corr(s))
I've changed the regex to a lookahead pattern, take a look here.
Edit: explain Regex as requested in comment
re.sub() takes (at least) three arguments: The Regex search pattern, the replacement the matched pattern should be replaced with, and the string in which the replacement should be done.
What I'm doing here is two steps at once, I've been using the output of one function as input of another.
First, the inner re.sub(r' +', ' ', s) searches for multiple spaces (r' +') in s to replace them with single spaces. Then the outer re.sub(r'\.(?! )', '. ', ...) looks for periods without following space character to replace them with '. '. I'm using a negative lookahead pattern to match only sections, that don't match the specified lookahead pattern (a normal space character in this case). You may want to play around with this pattern, this may help understanding it better.
The r string prefix changes the string to a raw string where backslash-escaping is disabled. Unnecessary in this case, but it's a habit of mine to use raw strings with regular expressions.
For a more basic answer, without regex:
>>> def remove_doublespace(string):
... if ' ' not in string:
... return string
... return remove_doublespace(string.replace(' ',' '))
...
>>> remove_doublespace('hi there how are you.i am fine. '.replace('.', '. '))
'hi there how are you. i am fine. '
You try the following code:
>>> s = 'This is weird.Indeed'
>>> def correction(s):
res = re.sub('\s+$', '', re.sub('\s+', ' ', re.sub('\.', '. ', s)))
if res[-1] != '.':
res += '.'
return res
>>> print correction(s)
This is weird. Indeed.
>>> s=raw_input()
hee ss.dk
>>> s
'hee ss.dk'
>>> correction(s)
'hee ss. dk.'
I am working with a text where all "\n"s have been deleted (which merges two words into one, like "I like bananasAnd this is a new line.And another one.") What I would like to do now is tell Python to look for combinations of a small letter followed by capital letter/punctuation followed by capital letter and insert a whitespace.
I thought this would be easy with reg. expressions, but it is not - I couldnt find an "insert" function or anything, and the string commands seem not to be helpful either. How do I do this?
Any help would be greatly appreciated, I am despairing over here...
Thanks, patrick
Try the following:
re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", your_string)
For example:
import re
lines = "I like bananasAnd this is a new line.And another one."
print re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", lines)
# I like bananas And this is a new line. And another one.
If you want to insert a newline instead of a space, change the replacement to r"\1\n\2".
Using re.sub you should be able to make a pattern that grabs a lowercase and uppercase letter and substitutes them for the same two letters, but with a space in between:
import re
re.sub(r'([a-z][.?]?)([A-Z])', '\\1\n\\2', mystring)
You're looking for the sub function. See http://docs.python.org/library/re.html for documentation.
Hmm, interesting. You can use regular expressions to replace text with the sub() function:
>>> import re
>>> string = 'fooBar'
>>> re.sub(r'([a-z][.!?]*)([A-Z])', r'\1 \2', string)
'foo Bar'
If you really don't have any caps except at the beginning of a sentence, it will probably be easiest to just loop through the string.
>>> import string
>>> s = "a word endsA new sentence"
>>> lastend = 0
>>> sentences = list()
>>> for i in range(0, len(s)):
... if s[i] in string.uppercase:
... sentences.append(s[lastend:i])
... lastend = i
>>> sentences.append(s[lastend:])
>>> print sentences
['a word ends', 'A new sentence']
Here's another approach, which avoids regular expressions and does not use any imported libraries, just built-ins...
s = "I like bananasAnd this is a new line.And another one."
with_whitespace = ''
last_was_upper = True
for c in s:
if c.isupper():
if not last_was_upper:
with_whitespace += ' '
last_was_upper = True
else:
last_was_upper = False
with_whitespace += c
print with_whitespace
Yields:
I like bananas And this is a new line. And another one.
I have a list of regexes in python, and a string. Is there an elegant way to check if the at least one regex in the list matches the string? By elegant, I mean something better than simply looping through all of the regexes and checking them against the string and stopping if a match is found.
Basically, I had this code:
list = ['something','another','thing','hello']
string = 'hi'
if string in list:
pass # do something
else:
pass # do something else
Now I would like to have some regular expressions in the list, rather than just strings, and I am wondering if there is an elegant solution to check for a match to replace if string in list:.
import re
regexes = [
"foo.*",
"bar.*",
"qu*x"
]
# Make a regex that matches if any of our regexes match.
combined = "(" + ")|(".join(regexes) + ")"
if re.match(combined, mystring):
print "Some regex matched!"
import re
regexes = [
# your regexes here
re.compile('hi'),
# re.compile(...),
# re.compile(...),
# re.compile(...),
]
mystring = 'hi'
if any(regex.match(mystring) for regex in regexes):
print 'Some regex matched!'
Here's what I went for based on the other answers:
raw_list = ["some_regex","some_regex","some_regex","some_regex"]
reg_list = map(re.compile, raw_list)
mystring = "some_string"
if any(regex.match(mystring) for regex in reg_list):
print("matched")
A mix of both Ned's and Nosklo's answers. Works guaranteed for any length of list... hope you enjoy
import re
raw_lst = ["foo.*",
"bar.*",
"(Spam.{0,3}){1,3}"]
reg_lst = []
for raw_regex in raw_lst:
reg_lst.append(re.compile(raw_regex))
mystring = "Spam, Spam, Spam!"
if any(compiled_reg.match(mystring) for compiled_reg in reg_lst):
print("something matched")
If you loop over the strings, the time complexity would be O(n). A better approach would be combine those regexes as a regex-trie.
I've got a file whose format I'm altering via a python script. I have several camel cased strings in this file where I just want to insert a single space before the capital letter - so "WordWordWord" becomes "Word Word Word".
My limited regex experience just stalled out on me - can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?
You could try:
>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'
If there are consecutive capitals, then Gregs result could
not be what you look for, since the \w consumes the caracter
in front of the captial letter to be replaced.
>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'
A look-behind would solve this:
>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'
Perhaps shorter:
>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?
Edit: Maybe better to include it here.
re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)
For example:
"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
Maybe you would be interested in one-liner implementation without using regexp:
''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()
With regexes you can do this:
re.sub('([A-Z])', r' \1', str)
Of course, that will only work for ASCII characters, if you want to do Unicode it's a whole new can of worms :-)
If you have acronyms, you probably do not want spaces between them. This two-stage regex will keep acronyms intact (and also treat punctuation and other non-uppercase letters as something to add a space on):
re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))
The output will be: 'Dave Is AFK Right Now! Cool'
I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.
How about:
text = 'WordWordWord'
new_text = ''
for i, letter in enumerate(text):
if i and letter.isupper():
new_text += ' '
new_text += letter
I think regexes are the way to go here, but just to give a pure python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:
def splitCaps(s):
result = []
for ch, next in window(s+" ", 2):
result.append(ch)
if next.isupper() and not ch.isspace():
result.append(' ')
return ''.join(result)
window() is a utility function I use to operate on a sliding window of items, defined as:
import collections, itertools
def window(it, winsize, step=1):
it=iter(it) # Ensure we have an iterator
l=collections.deque(itertools.islice(it, winsize))
while 1: # Continue till StopIteration gets raised.
yield tuple(l)
for i in range(step):
l.append(it.next())
l.popleft()
To the old thread - wanted to try an option for one of my requirements. Of course the re.sub() is the cool solution, but also got a 1 liner if re module isn't (or shouldn't be) imported.
st = 'ThisIsTextStringToSplitWithSpace'
print(''.join([' '+ s if s.isupper() else s for s in st]).lstrip())