Capture repeated characters and split using Python [duplicate] - python

This question already has answers here:
How can I tell if a string repeats itself in Python?
(13 answers)
Closed 3 years ago.
I need to split a string by using repeated characters.
For example:
My string is "howhowhow"
I need output as 'how,how,how'.
I can't use 'how' directly in my regular expression because my input varies. I need to check whether the string consists of a repeated sequence of characters and, if so, split it on that sequence.

import re
string = "howhowhow"
print(','.join(re.findall(re.search(r"(.+?)\1", string).group(1), string)))
OUTPUT
howhowhow -> how,how,how
howhowhowhow -> how,how,how,how
testhowhowhow -> how,how,how # not clearly defined by OP
The pattern is non-greedy so that howhowhowhow doesn't map to howhow,howhow, which is also a legitimate split. Remove the ? if you prefer the longest repeated unit.
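A slightly more defensive variant of the one-liner above, as a sketch: re.search returns None when nothing repeats back to back, and the captured unit may contain regex metacharacters, so this version checks for None and escapes the unit with re.escape. The function name split_repeated and its greedy flag are made up for illustration.

```python
import re

def split_repeated(string, greedy=False):
    """Join the occurrences of the first repeated unit with commas.

    Returns the input unchanged when no unit repeats back to back.
    """
    pattern = r"(.+)\1" if greedy else r"(.+?)\1"
    match = re.search(pattern, string)
    if match is None:
        return string
    unit = match.group(1)
    # Escape the unit so characters like '.' or '+' are matched literally.
    return ",".join(re.findall(re.escape(unit), string))

print(split_repeated("howhowhow"))                  # how,how,how
print(split_repeated("howhowhowhow", greedy=True))  # howhow,howhow
print(split_repeated("abc"))                        # abc
```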

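length_of_repeated_unit = 3
str1 = 'howhowhow'
# Integer division gives the number of repetitions directly.
how_many_times_repeated = len(str1) // length_of_repeated_unit
# Append a comma to the unit, repeat it, then drop the trailing comma.
((str1[:length_of_repeated_unit] + ',') * how_many_times_repeated)[:-1]
'how,how,how'
This works when you know the length of the repeated unit.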
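When the length is not known in advance, it can be derived. A minimal sketch, assuming the whole string is one unit repeated (the helper name repeated_unit is hypothetical): a string is periodic exactly when it occurs inside its own doubling at an offset other than 0 and len(s), and the first such offset is the unit length.

```python
def repeated_unit(s):
    """Return the shortest unit whose repetition forms s, or None."""
    # Search for s inside s + s, skipping offset 0 and the trivial offset len(s).
    i = (s + s).find(s, 1, -1)
    return None if i == -1 else s[:i]

str1 = 'howhowhow'
unit = repeated_unit(str1)
if unit is not None:
    # Repeat the unit as many times as it fits and join with commas.
    print(','.join([unit] * (len(str1) // len(unit))))  # how,how,how
```

Unlike the regex approach, this returns None for an input such as 'testhowhowhow', because the string as a whole is not a pure repetition.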

Related

RegEx returns empty list when searching for words which begin with a number [duplicate]

This question already has answers here:
What do ^ and $ mean in a regular expression?
(2 answers)
Closed 2 years ago.
I've got a problem with carets and dollar signs in Python.
I want to find every word which starts with a number and ends with a letter.
Here is what I've tried already:
import re
text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"
phoneNumRegex = re.compile(r'^\d+\w+$')
print(phoneNumRegex.findall(text))
Result is an empty list:
[]
The result I want:
415kkk, 9999ll, 555jjj
Where is the problem?
Problems with your regex:
^...$ anchors the pattern to the start and the end of the whole string, so it can only match if the entire string fits the pattern. Get rid of the anchors.
\w+ means "one or more word characters": letters (upper and lower case), digits, and the underscore '_'. So \d+\w+ would also match a pure number such as '555', with '55' consumed by \d+ and the last '5' by \w+, and add it to the result.
You need this instead:
import re
text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"
phoneNumRegex = re.compile(r'\b\d+[a-zA-Z]+\b')
print(phoneNumRegex.findall(text))
Output:
['415kkk', '9999ll', '555jjj']
The \b are word boundaries, so you do not match '123def' inside 'abc123def'.
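To make the differences concrete, here is a small side-by-side sketch of the three patterns discussed in this answer, run against the text from the question. The string 'abc123def' is an extra made-up input used only to show the effect of \b.

```python
import re

text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"

# Anchored: the whole string would have to be digits followed by word chars.
print(re.findall(r'^\d+\w+$', text))
# []

# Unanchored, but \w also matches digits, so pure numbers slip through.
print(re.findall(r'\d+\w+', text))
# ['415kkk', '555', '9999ll', '212', '555jjj', '0000']

# Digits, then letters only, delimited by word boundaries.
print(re.findall(r'\b\d+[a-zA-Z]+\b', text))
# ['415kkk', '9999ll', '555jjj']

# Without \b the pattern can start in the middle of a longer word.
print(re.findall(r'\d+[a-zA-Z]+', 'abc123def'))      # ['123def']
print(re.findall(r'\b\d+[a-zA-Z]+\b', 'abc123def'))  # []
```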
Further reading:
re-syntax
regex101.com - a regex tester website that can handle Python syntax

How to get the numbers from a string (contains no spaces between letters and numbers)? [duplicate]

This question already has answers here:
How to extract numbers from a string in Python?
(19 answers)
Closed 3 years ago.
So, I have a string "AB256+74POL". I want to extract only the numbers into a list, say num = [256, 74]. How can I do this in Python?
I have tried string.split('+') followed by iterating over the two parts and collecting the characters which satisfy isdigit(). But is there an easier way to do that?
import re
a = 'AB256+74POL'
array = re.findall(r'[0-9]+', a)  # ['256', '74']

mystring = 'AB256+74POL'
"".join([c if c.isdigit() else " " for c in mystring]).split()
Explanation
Strings are iterable in Python. So we iterate over each character in the string, replace non-digits with spaces, then split the result to get all sequences of digits in a list.
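The question asked for integers, while the snippets above return digit strings. A small sketch of the extra conversion step, using the same sample string:

```python
import re

a = 'AB256+74POL'
# re.findall returns strings; int() converts each run of digits.
num = [int(n) for n in re.findall(r'\d+', a)]
print(num)  # [256, 74]
```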

How to remove a character from a string until certain index? [duplicate]

This question already has answers here:
Remove characters from beginning and end or only end of line
(5 answers)
Closed 4 years ago.
So, I have the following string "........my.python.string" and I want to remove all the "." characters up to the first alphanumeric character. Is there a way to achieve this other than converting the string to a list and working from there?
You can use re.sub:
import re
s = "........my.python.string"
new_s = re.sub(r'^\.+', '', s)
print(new_s)
Output:
my.python.string
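For this particular case a regular expression is not strictly needed. As an alternative sketch (not part of the answer above), str.lstrip removes every leading character that appears in its argument and stops at the first character that does not:

```python
s = "........my.python.string"
# lstrip('.') strips only the leading dots; dots inside the string are kept.
new_s = s.lstrip('.')
print(new_s)  # my.python.string
```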

NLTK RegexpTokenizer: Regex to retain just characters in Random text [duplicate]

This question already has answers here:
Using explicitly numbered repetition instead of question mark, star and plus
(4 answers)
Closed 5 years ago.
I used tokenizer = RegexpTokenizer(r'\w+'), which retains alphanumeric characters.
But how do I write a regular expression that drops everything else and retains only tokens longer than 2 characters?
Below is one row of the dataframe, which contains random text:
0 [ANOTHER 2'' F/P SAMPLE 01:52 ...A13232 / AS OUTPUT MSG...
I think you need to find words with len > 2:
RegexpTokenizer(r'\w{3,}')
Or, if you need only letters:
RegexpTokenizer(r'[a-zA-Z]{3,}')
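To see what the two patterns keep without installing NLTK, the same expressions can be tried with re.findall, which should give the same tokens as RegexpTokenizer in its default (non-gap) mode for simple patterns like these. The sample string is made up, loosely based on the row shown in the question.

```python
import re

# Hypothetical sample text, loosely modelled on the row in the question.
sample = "ANOTHER 2'' F/P SAMPLE 01:52 A13232 / AS OUTPUT MSG"

# Runs of three or more word characters (letters, digits, underscore).
print(re.findall(r'\w{3,}', sample))
# ['ANOTHER', 'SAMPLE', 'A13232', 'OUTPUT', 'MSG']

# Runs of three or more ASCII letters only.
print(re.findall(r'[a-zA-Z]{3,}', sample))
# ['ANOTHER', 'SAMPLE', 'OUTPUT', 'MSG']
```

Note that 'A13232' disappears with the letters-only pattern because it contains just one letter.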

Python: What is the Best way to split a string of 9 characters into 3 characters each and join them using delimiters? [duplicate]

This question already has answers here:
How to iterate over a list in chunks
(39 answers)
Closed 8 years ago.
I have a string "111222333" inside a CSV file. I would like to convert this into something like "\111\222\333".
Currently my Python code is:
refcode = "111222333"
returnstring = "\\" + refcode[:3] + "\\" + refcode[3:6] + "\\" + refcode[-3:] + "\\"
I know there must be a better way to do this. What are the better ways to do the same thing? Please help.
You could use re for that:
import re
refcode = "111222333"
returnstring = '\\'.join(re.match(r'()(\S{3})(\S{3})(\S{3})()', refcode).groups())
Explanation:
You have a string of 9 characters (let's say none of them is a whitespace character, so we can represent each one with \S).
We build a regexp from that: (\S{3}) is a group of three consecutive non-whitespace characters (letters, digits, exclamation marks, etc.).
(\S{3})(\S{3})(\S{3}) is three groups with 3 characters in each one.
If we call .groups() on the match, we get a tuple of the matched groups:
In [1]: re.match(r'(\S{3})(\S{3})(\S{3})', refcode).groups()
Out[1]: ('111', '222', '333')
If we join it using a backslash, we get:
In [29]: print("\\".join(re.match(r'(\S{3})(\S{3})(\S{3})', refcode).groups()))
111\222\333
But you want backslashes on both sides of the string as well!
So we add an empty group, (), on each side of the regular expression.
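The regex above is tied to exactly nine characters. A more general sketch, not from the answer, slices the string into fixed-size chunks and joins them; the helper name wrap_chunks and its parameters are hypothetical.

```python
def wrap_chunks(text, size=3, sep="\\"):
    """Split text into size-character chunks and surround each with sep."""
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Empty strings at both ends make join emit a leading and a trailing sep.
    return sep.join([""] + chunks + [""])

refcode = "111222333"
returnstring = wrap_chunks(refcode)
print(returnstring)
```

For the refcode above this prints the three groups 111, 222 and 333, each preceded by a backslash, plus a trailing backslash, which is the same result as the code in the question. If the length is not a multiple of size, the last chunk is simply shorter.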
