This question already has answers here:
Using explicitly numbered repetition instead of question mark, star and plus
(4 answers)
Closed 5 years ago.
I used tokenizer = RegexpTokenizer(r'\w+'), which keeps alphanumeric characters.
But how do I write a regular expression that drops all other tokens and keeps only those longer than 2 characters?
Below is one row of the DataFrame, which contains random text:
0 [ANOTHER 2'' F/P SAMPLE 01:52 ...A13232 / AS OUTPUT MSG...
I think you need to find words with len > 2:
RegexpTokenizer(r'\w{3,}')
Or, if you need only letters:
RegexpTokenizer(r'[a-zA-Z]{3,}')
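To see what each pattern keeps, here is a minimal sketch using plain re.findall. It assumes that RegexpTokenizer(pattern).tokenize(text) with the default gaps=False returns the same tokens as re.findall(pattern, text). The sample string is hypothetical, assembled only from the visible (non-elided) parts of the row above.

```python
import re

# Hypothetical sample built from the visible parts of the row shown above
text = "ANOTHER 2'' F/P SAMPLE 01:52 A13232 / AS OUTPUT MSG"

# \w{3,}: runs of 3+ word characters (letters, digits, underscore)
words = re.findall(r'\w{3,}', text)

# [a-zA-Z]{3,}: runs of 3+ ASCII letters only, so a digit ends the token
letters = re.findall(r'[a-zA-Z]{3,}', text)

print(words)    # ['ANOTHER', 'SAMPLE', 'A13232', 'OUTPUT', 'MSG']
print(letters)  # ['ANOTHER', 'SAMPLE', 'OUTPUT', 'MSG']
```

Note that the letters-only pattern can also pull a letter run out of the middle of a mixed token: for "AB1CDE2" it returns ['CDE'].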
This question already has answers here:
How to extract numbers from a string in Python?
(19 answers)
Closed 3 years ago.
I have a string "AB256+74POL". I want to extract only the numbers into a list, say num = [256, 74]. How do I do this in Python?
I have tried string.split('+'), followed by iterating over the two parts and collecting the characters that satisfy isdigit(). But is there an easier way to do that?
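import re
a = 'AB256+74POL'
# findall returns the digit runs as strings ('256', '74'), so convert them to int
array = [int(n) for n in re.findall(r'[0-9]+', a)]  # [256, 74]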
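mystring = 'AB256+74POL'
# Replace every non-digit with a space, then split on whitespace
"".join([c if c.isdigit() else " " for c in mystring]).split()  # ['256', '74']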
Explanation
Strings are iterable in Python, so we iterate over each character in the string, replace non-digits with spaces, and then split the result to get all sequences of digits as a list of strings.
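Both answers produce the digit runs as strings, while the question asks for integers. Here is a minimal sketch that wraps each approach in a helper (the names extract_numbers and digit_runs_isdigit are hypothetical) and converts with int(). It also shows one reason to prefer the explicit [0-9] class: str.isdigit() is True for characters such as the superscript '²', which int() cannot parse.

```python
import re


def extract_numbers(s):
    """Return each run of ASCII digits in s as an int, in order of appearance."""
    return [int(run) for run in re.findall(r'[0-9]+', s)]


def digit_runs_isdigit(s):
    """The join/split approach: replace non-digits with spaces, then split."""
    return "".join(c if c.isdigit() else " " for c in s).split()


print(extract_numbers('AB256+74POL'))     # [256, 74]
print(digit_runs_isdigit('AB256+74POL'))  # ['256', '74']

# '²' passes isdigit(), so it lands in a "digit run" that int() would reject
print(digit_runs_isdigit('x²'))           # ['²']
print(extract_numbers('x²'))              # []
```

Neither approach understands signs or decimal points, so extract_numbers('-3.5') returns [3, 5].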
This question already has answers here:
Escaping regex string
(4 answers)
Closed 3 years ago.
I am trying to stem the words in a text column of a DataFrame.
data is a DataFrame, KARMA is the text column, and zargan is a dict that maps each word to its root.
import re

for a in range(1, 100000):
    for j in data.KARMA[a].split():
        pattern = r'\b' + j + r'\b'
        data.KARMA[a] = re.sub(pattern, str(zargan.get(j, j)), data.KARMA[a])
print(data.KARMA[1])
I want to replace each word in the texts with its root.
It looks like j contains a regular-expression special character such as *. If you want it to be interpreted as literal text, you can write
pattern = r'\b'+re.escape(j)+r'\b'
and possibly make sure the replacement string, str(zargan.get(j, j)), is treated as literal text as well.
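A small sketch of why re.escape matters, plus one way to keep the replacement literal too. The helper replace_word and the sample strings are hypothetical. When re.sub is given a function instead of a string as the replacement, it inserts the function's return value as-is, so backslashes in a root are not processed as escapes; whether that matters depends on what zargan actually contains.

```python
import re


def replace_word(text, word, root):
    """Replace whole-word occurrences of word in text with root, both literally."""
    # re.escape makes '.', '*', ')' etc. in word match only themselves
    pattern = r'\b' + re.escape(word) + r'\b'
    # A function replacement is used verbatim: no backslash processing
    return re.sub(pattern, lambda m: root, text)


# Unescaped, '.' matches any character, so 'axb' is replaced as well
print(re.sub(r'\ba.b\b', 'ROOT', 'axb a.b'))   # ROOT ROOT
print(replace_word('axb a.b', 'a.b', 'ROOT'))   # axb ROOT

# A token such as 'yeni)' is not even a valid pattern without escaping
try:
    re.compile(r'\b' + 'yeni)' + r'\b')
except re.error as exc:
    print('re.error:', exc)
```

One caveat: \b only matches between a word and a non-word character, so a token that starts or ends with punctuation (like 'yeni)') stops raising an error once escaped but still only matches where a word character sits on the other side.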
This question already has answers here:
Convert a list with strings all to lowercase or uppercase
(13 answers)
Closed 4 years ago.
I have a list with 12 elements. I read an input and match it against the value of another variable, so case sensitivity will be a problem. I know how to loop through the list, but how can I convert every character in each element to lowercase?
for i in sa:
    # something here to convert element in sa to lowercase
A simple one-liner:
lowercase_list = [i.lower() for i in input_list]
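If the real goal is case-insensitive matching of the input against the list, a minimal sketch is to normalize both sides. The list contents and the helper name matches_any are hypothetical. str.casefold() is a more aggressive variant of lower() intended for caseless comparison; for example, 'ß' casefolds to 'ss', while lower() leaves it unchanged.

```python
def matches_any(user_input, items):
    """Return True if user_input equals some item, ignoring case."""
    needle = user_input.casefold()
    return any(needle == item.casefold() for item in items)


# Hypothetical list standing in for the 12-element list in the question
sa = ["Apple", "Banana", "Straße"]

lowercase_list = [i.lower() for i in sa]
print(lowercase_list)                # ['apple', 'banana', 'straße']
print(matches_any("BANANA", sa))     # True
print("strasse" in lowercase_list)   # False: lower() keeps 'ß'
print(matches_any("STRASSE", sa))    # True: casefold() maps 'ß' to 'ss'
```

For many lookups, building a set of casefolded items once and testing membership in it avoids rescanning the list on every input.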
This question already has answers here:
How can I tell if a string repeats itself in Python?
(13 answers)
Closed 3 years ago.
I need to split a string into its repeated characters.
For example, my string is "howhowhow" and I need the output 'how,how,how'.
I can't use 'how' directly in my regular expression because my input varies. I need to check whether the string repeats a sequence of characters and, if so, split it at each repeat.
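import re
string = "howhowhow"
# Shortest substring that is immediately repeated, e.g. "how"
unit = re.search(r"(.+?)\1", string).group(1)
# re.escape keeps the unit literal in case it contains regex metacharacters
print(','.join(re.findall(re.escape(unit), string)))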
OUTPUT
howhowhow -> how,how,how
howhowhowhow -> how,how,how,how
testhowhowhow -> how,how,how # not clearly defined by OP
The pattern is non-greedy, so howhowhowhow is split into how,how,how,how rather than howhow,howhow, which would also be a legitimate answer. Remove the ? if you prefer the longest repeating unit.
lengthofRepeatedChar = 3
str1 = 'howhowhow'
HowmanyTimesRepeated = len(str1) // lengthofRepeatedChar
print(','.join([str1[:lengthofRepeatedChar]] * HowmanyTimesRepeated))  # how,how,how
This works when you know the length of the repeated unit.
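When the whole string consists of copies of one unit but neither the unit nor its length is known in advance, another option is the rotation check sketched below: s is a repetition exactly when it reappears inside s + s at some position i with 0 < i < len(s), and the smallest such i is the length of the shortest repeating unit. The function name split_repeats is hypothetical. Unlike the regex answer, it leaves a string that is not entirely a repetition unsplit, so 'testhowhowhow' comes back whole.

```python
def split_repeats(s):
    """Split s into copies of its shortest repeating unit.

    Returns [s] when s is not a repetition of a shorter substring.
    """
    # Search s + s for s, skipping the trivial matches at 0 and len(s);
    # the end bound -1 limits candidate start positions to 1..len(s)-1.
    i = (s + s).find(s, 1, -1)
    if i == -1:
        return [s]
    # i is the unit length and always divides len(s)
    return [s[j:j + i] for j in range(0, len(s), i)]


print(','.join(split_repeats('howhowhow')))      # how,how,how
print(','.join(split_repeats('howhowhowhow')))   # how,how,how,how
print(','.join(split_repeats('testhowhowhow')))  # testhowhowhow
```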
This question already has answers here:
Find index of last occurrence of a substring in a string
(12 answers)
Closed 8 years ago.
How would I find the last occurrence of a character in a string?
string = "abcd}def}"
string = string.find('}',last) # Want something like this
You can use rfind:
>>> "abcd}def}".rfind('}')
8
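A few related details of the built-in str methods, shown as a small sketch: rfind returns -1 when the character is absent, rindex performs the same search but raises ValueError instead, and the optional start/end arguments restrict the search, which lets you step back to the previous occurrence.

```python
s = "abcd}def}"

last = s.rfind('}')               # index of the last '}'
previous = s.rfind('}', 0, last)  # last '}' strictly before index `last`
missing = s.rfind('x')            # -1 because 'x' does not occur

print(last, previous, missing)    # 8 4 -1

try:
    s.rindex('x')                 # same search, but raises when not found
except ValueError:
    print("'x' not found")
```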