regex between some character - python

input:
[1] string1 [2] string 2[3]string3 [5]string4
output:
string1
string 2
string4
How can I turn the "input" above into the "output" using a regex in Python?

Your requirements are a bit vague. That said, using the example input you provided:
import re
line = '[1] string1 [2] string 2[3]string3 [5]string4'
matches = re.match(r'\[1\] (.*?)\[2\] (.*?)\[3\](.*?)\[5\](.*)', line)
print(matches.group(1))  # prints string1
print(matches.group(2))  # prints string 2
print(matches.group(3))  # prints string3
print(matches.group(4))  # prints string4

It seems you don't want the value after an index that is preceded by anything other than whitespace or the start of the string (note that this uses the third-party regex module, not re):
>>> s = "[1] string1 [2] string 2[3]string3 [5]string4"
>>> import regex
>>> m = regex.findall(r'(?<= |^)\[\d+\]\s*([^\[]*)', s)
>>> for i in m:
...     print(i)
...
string1
string 2
string4
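The same result can be had with the standard library alone, since re rejects the variable-width lookbehind `(?<= |^)` used above. The following is only a sketch, assuming "preceded by whitespace or at the start of the string" is the intended rule; the negative lookbehind `(?<!\S)` and the helper name `extract_fields` are illustrative choices, not part of the answers above.

```python
import re

def extract_fields(line):
    """Return the text after each [n] marker that starts the line or follows whitespace."""
    # (?<!\S) succeeds at the start of the string or right after a whitespace character,
    # so a marker glued to the previous word (like "2[3]") is skipped.
    pattern = re.compile(r'(?<!\S)\[\d+\]\s*([^\[]*)')
    return [field.strip() for field in pattern.findall(line)]

line = '[1] string1 [2] string 2[3]string3 [5]string4'
for field in extract_fields(line):
    print(field)
```

The `strip()` call removes the trailing space that the capture group would otherwise keep before the next marker.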

Related

Replacing spaces in one string with characters of other string

Say I have two strings, string1="A B C " and string2="abc". How do I combine these two strings so that string1 becomes "AaBbCc"? Basically, I want each space in string1 to be replaced by the next character of string2. I tried using two for-loops like this:
string1="A B C "
string2="abc"
for char1 in string1:
    if char1==" ":
        for char2 in string2:
            string1.replace(char1,char2)
    else:
        pass
print(string1)
But that doesn't work. I'm fairly new to Python, so could somebody help me? I'm using Python 3. Thank you in advance.
You can create an iterator over string2 with iter and replace each ' ' in string1 with its next character:
>>> string1 = "A B C "
>>> string2 = "abc"
>>> itrStr2 = iter(string2)
>>> ''.join(st if st!=' ' else next(itrStr2) for st in string1)
'AaBbCc'
If the two strings may differ in length, you can use itertools.cycle to repeat string2 as often as needed:
>>> from itertools import cycle
>>> string1 = "A B C A B C "
>>> string2 = "abc"
>>> itrStr2 = cycle(string2)
>>> ''.join(st if st!=' ' else next(itrStr2) for st in string1)
'AaBbCcAaBbCc'
Alternatively, pop the replacement characters off a list:
string1 = "A B C "
string2 = "abc"
out, repl = '', list(string2)
for s in string1:
    out += s if s != " " else repl.pop(0)
print(out) #AaBbCc
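The loop in the question prints the unchanged string because Python strings are immutable: `str.replace` returns a new string, and that result was never stored. A minimal sketch that stays close to the original idea follows; the function name `fill_spaces` is hypothetical, and it assumes the filler string contains no spaces itself.

```python
def fill_spaces(template, fillers):
    """Replace each space in `template` with the next character from `fillers`."""
    result = template
    for ch in fillers:
        # str.replace returns a new string, so the result must be reassigned.
        # The third argument (count=1) limits the replacement to the first
        # space that is still left in the string.
        result = result.replace(' ', ch, 1)
    return result

print(fill_spaces("A B C ", "abc"))  # AaBbCc
```

If `fillers` has more characters than there are spaces, the extra characters are simply ignored; if it has fewer, the remaining spaces stay in place.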

Splitting string after certain amount of characters

I have a lengthy string and want to split it after a certain number of characters. I already have done this:
if len(song.lyrics) > 2048:
    string1 = string[:2048]
    string2 = string[2048:]
The problem is that this sometimes breaks in the middle of a line of text, which I don't want. Is there a way to find the last line break before the character limit is reached and split there?
Thanks
Does this give you the result you're looking for? If not, could you please provide an example string with expected output?
import re
CHARACTER_LIMIT = 2048
for m in re.finditer(r'.{,%s}(?:\n|$)' % CHARACTER_LIMIT, string, re.DOTALL):
    print(m.group(0))
Find the index of the last newline before your length limit, then split there:
if len(song.lyrics) > 2048:
    index = string[:2048].rfind('\n')
    string1 = string[:index]
    string2 = string[index+1:]
Example:
>>> s = 'aaaaaaa\nbbbbbbbbbbbbbbbb\nccccccc\ndddddddddddddddd'
>>> limit = 31 # falls inside the 'ccccccc' line
>>> index = s[:limit].rfind('\n')
>>> index
24
>>> s1,s2 = s[:index],s[index+1:]
>>> s1
'aaaaaaa\nbbbbbbbbbbbbbbbb'
>>> s2
'ccccccc\ndddddddddddddddd'
>>>
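For text that needs more than one cut, the same `rfind` idea can be applied in a loop. This is a rough sketch, not a drop-in solution: the function name `split_at_linebreaks` is made up, `limit` is assumed to be at least 1, and the hard cut used when a chunk contains no newline is my own fallback, which the answers above do not cover.

```python
def split_at_linebreaks(text, limit=2048):
    """Split `text` into chunks of at most `limit` characters, preferring line breaks."""
    chunks = []
    while len(text) > limit:
        # Position of the last newline within the first `limit` characters (-1 if none).
        index = text.rfind('\n', 0, limit)
        if index <= 0:
            # No usable newline: fall back to a hard cut at the limit.
            chunks.append(text[:limit])
            text = text[limit:]
        else:
            chunks.append(text[:index])
            text = text[index + 1:]  # drop the newline we split on
    chunks.append(text)
    return chunks

s = 'aaaaaaa\nbbbbbbbbbbbbbbbb\nccccccc\ndddddddddddddddd'
print(split_at_linebreaks(s, limit=31))
```

With the example string and a limit of 31, this yields the same two pieces as the interactive session above.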

How to check if individual characters in a string exist in another string in Python

Is there any library that lets me check whether all the individual characters of one string exist in another string? When I use in, the characters have to form a contiguous substring, so it works for '1234' and '123' but not for '1234' and '24'. I want something that checks individual characters and prints "string2 is in string1" for the following code:
string1 = '1234'
string2 = '24'
if string2 in string1:
    print('string2 is in string1')
else:
    print('string2 is not in string1')
You can use all() with a generator expression. It returns True only if every condition is true, and False otherwise:
string1 = '1234'
string2 = '24'
if all(x in string1 for x in string2):
    print('string2 is in string1')
else:
    print('string2 is not in string1')
Or, you can use set's issubset:
set(string2).issubset(string1)
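Both approaches above ignore how often a character occurs, so '224' counts as contained in '1234'. The question does not say whether duplicates matter; in case they do, here is a sketch that compares character counts with `collections.Counter`. The function names are hypothetical.

```python
from collections import Counter

def contains_all_chars(haystack, needle):
    """True if every distinct character of `needle` occurs somewhere in `haystack`."""
    return set(needle).issubset(haystack)

def contains_all_chars_counted(haystack, needle):
    """Like contains_all_chars, but repeated characters must occur at least as often."""
    # Counter subtraction keeps only positive counts, so an empty result means
    # `haystack` has enough copies of every character in `needle`.
    return not (Counter(needle) - Counter(haystack))

print(contains_all_chars('1234', '24'))           # True
print(contains_all_chars('1234', '224'))          # True
print(contains_all_chars_counted('1234', '224'))  # False
```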

python regex matching "ab" or "ba" words

I tried matching words that contain the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:
r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"
But it also matches words that start or end with non-alphanumeric characters such as ", (, ), or /. How can I exclude those characters? I just want a list of the matching words.
import sys
import re
word=[]
dict={}
f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')
data = f.read()
word = data.split() # word is list
f.close()
for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1
for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())
I don't know how to combine this with the re.compile approach suggested in the first comment.
To match all the words with ab or ba (case insensitive):
import re
text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)
# to print all the matches
for match in pattern.finditer(text):
    print(match.group(0))
# to print the first match
print(pattern.search(text).group(0))
https://regex101.com/r/uH3xM9/1
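To tie the compiled pattern back to the counting loop in the question, here is a sketch in Python 3. It assumes that a "word" is a run of `\w` characters, that counting should ignore case, and that the sample text stands in for the file contents; the name `count_ab_words` is hypothetical.

```python
import re
from collections import Counter

# \b\w*(?:ab|ba)\w*\b matches a whole word containing "ab" or "ba".
# Quotes, parentheses and slashes are not \w characters, so they are left out.
PATTERN = re.compile(r'\b\w*(?:ab|ba)\w*\b', re.IGNORECASE)

def count_ab_words(text):
    """Count each word in `text` that contains 'ab' or 'ba', ignoring case."""
    return Counter(match.group(0).lower() for match in PATTERN.finditer(text))

text = 'The "abolition" of (probable) fabrics/banks; Abolition again.'
counts = count_ab_words(text)
for word in sorted(counts):
    print("%s: %d" % (word, counts[word]))
print(len(counts))
```

Running the pattern over the whole text, rather than over `data.split()`, is what keeps the surrounding punctuation out of the dictionary keys.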
Regular expressions are not the best tool for the job in this case. They'll complicate stuff way too much for such simple circumstances. You can instead use Python's builtin in operator (works for both Python 2 and 3)...
sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]
for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))
As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is (that is, exactly) in word, and to False otherwise. For example, 'ba' in 'probable' is True and 'ab' in 'Abolition' is False, because the comparison is case-sensitive. The second line takes care of dividing the sentence into words and stripping any punctuation characters. word = word.lower() makes word lowercase before the comparisons, so that 'Abolition' becomes 'abolition' and 'ab' in word is True.
I would do it this way:
First, strip the unwanted characters from your string using either of the two techniques below:
a - By building a translation dictionary and using the translate method:
>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
b - Using the re.sub method:
>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]' % re.escape(string.punctuation), '', s)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
Next, find the words containing 'ab' or 'ba':
a - Splitting on whitespace and checking each word for the desired substrings, which is the approach I recommend:
>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
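b - Using the re.finditer method:
>>> pat
re.compile('\\b.*?(ab|ba).*?\\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
...     print(m.group())
...
abolition
fabrics
probable
test case bank
halfback
1ablution
Note that .*? can span several words, which is why 'test case bank' comes back as a single match; using \w* in place of .*? keeps each match within one word.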
string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)
Try this one
[(),/]*([a-z]|(ba|ab))+[(),/]*

Python: Modify Part of a String

I am taking an input string that is all one continuous group of letters and splitting it into a sentence. The problem is that as a beginner I can't figure out how to modify the string to ONLY capitalize the first letter and convert the others to lowercase. I know the string.lower but that converts everything to lowercase. Any ideas?
# This program asks user for a string run together
# with each word capitalized and gives back the words
# separated and only the first word capitalized
import re
def main():
    # ask the user for a string
    string = input( 'Enter some words each one capitalized, run together without spaces ')
    for ch in string:
        if ch.isupper() and not ch.islower():
            newstr = re.sub('[A-Z]',addspace,string)
    print(newstr)
def addspace(m) :
    return ' ' + m.group(0)
#call the main function
main()
You can use capitalize():
Return a copy of the string with its first character capitalized and
the rest lowercased.
>>> s = "hello world"
>>> s.capitalize()
'Hello world'
>>> s = "hello World"
>>> s.capitalize()
'Hello world'
>>> s = "hELLO WORLD"
>>> s.capitalize()
'Hello world'
Unrelated example. To capitalize only the first letter you can do:
>>> s = 'hello'
>>> s = s[0].upper()+s[1:]
>>> print(s)
Hello
>>> s = 'heLLO'
>>> s = s[0].upper()+s[1:]
>>> print(s)
HeLLO
For a whole string, you can do
>>> s = 'what is your name'
>>> print(' '.join(i[0].upper()+i[1:] for i in s.split()))
What Is Your Name
[EDIT]
You can also do:
>>> s = 'Hello What Is Your Name'
>>> s = ''.join(j.lower() if i>0 else j for i,j in enumerate(s))
>>> print(s)
Hello what is your name
If you only want to capitalize the start of sentences (and your string has multiple sentences), you can do something like:
>>> sentences = "this is sentence one. this is sentence two. and SENTENCE three."
>>> split_sentences = sentences.split('.')
>>> '. '.join([s.strip().capitalize() for s in split_sentences])
'This is sentence one. This is sentence two. And sentence three. '
If you don't want to change the case of the letters that don't start the sentence, then you can define your own capitalize function:
>>> def my_capitalize(s):
...     if s:  # check that s is not ''
...         return s[0].upper() + s[1:]
...     return s
...
and then:
>>> '. '.join([my_capitalize(s.strip()) for s in split_sentences])
'This is sentence one. This is sentence two. And SENTENCE three. '
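Putting the pieces together for the original task (splitting a run-together string and capitalizing only the first word), one possible sketch is shown below. It assumes the input consists of ASCII words that each start with an uppercase letter; the function name `split_run_together` and the sample input are made up for illustration.

```python
import re

def split_run_together(text):
    """Turn 'StopAndSmellTheRoses' into 'Stop and smell the roses'."""
    # Insert a space before every uppercase letter except one at the very start.
    spaced = re.sub(r'(?<!^)(?=[A-Z])', ' ', text)
    # capitalize() uppercases the first character and lowercases the rest.
    return spaced.capitalize()

print(split_run_together('StopAndSmellTheRoses'))  # Stop and smell the roses
```

Using a zero-width match avoids the leading space that the question's `addspace` callback would put in front of the first word.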
