splitting a string by specific letters while preserving them in the string

I'm trying to split a string by specific letters (in this case 'r', 'g' and 'b') so that I can later append the pieces to a list. The catch here is that I want the letters to be copied over to the list as well.
string = '1b24g55r44r'
What I want:
[[1b], [24g], [55r], [44r]]

You can use findall:
import re
print([match for match in re.findall('[^rgb]+?[rgb]', '1b24g55r44r')])
Output
['1b', '24g', '55r', '44r']
How the regex matches:
[^rgb]+? one or more characters other than r, g or b, matched lazily,
followed by one of [rgb].
If you need the result to be singleton lists you can do it like this:
print([[match] for match in re.findall('[^rgb]+?[rgb]', '1b24g55r44r')])
Output
[['1b'], ['24g'], ['55r'], ['44r']]
Also if the string is only composed of digits and rgb you can do it like this:
import re
print([[match] for match in re.findall(r'\d+?[rgb]', '1b24g55r44r')])
The only change in the above regex is \d+?, which matches one or more digits (lazily). The pattern is written as a raw string so that \d reaches the regex engine unchanged.
Output
[['1b'], ['24g'], ['55r'], ['44r']]
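The same idea can be wrapped in a small helper when the set of delimiter letters varies. This is only a minimal sketch: the name split_keep and its parameters are hypothetical, it assumes a non-empty delimiter string, and any trailing text that does not end with a delimiter is dropped.

```python
import re

def split_keep(text, delimiters="rgb"):
    """Return chunks of text, each ending with one of the delimiter characters."""
    # re.escape guards against characters that are special inside a character class
    cls = re.escape(delimiters)
    # one or more non-delimiters, then exactly one delimiter
    return re.findall(f"[^{cls}]+[{cls}]", text)

print(split_keep("1b24g55r44r"))
# ['1b', '24g', '55r', '44r']
```

Wrapping each chunk as [chunk] gives the nested form asked for in the question.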

Related

How to get all the string after and before two specific words?

I want to replace all of the text after "my;encoded;image:" (which is the base64 data of the image) and stop before the word "END", but the following code also replaces the two strings "my;encoded;image:" and "END". Any suggestions?
import re
re.sub("my;encoded;image:.*END","random_words",image,flags=re.DOTALL)
NB: a simple way could be to use plain string replacement, but I want to use a regex in my case. Thanks.

You can use a non-greedy regex to split the string into three groups, then replace the second group with your string. Pass flags=re.DOTALL, as in the question, so that . also matches newlines inside the data:
import re
x = re.sub(r'(.*my;encoded;image:)(.*?)(END.*)', r"\1my string\3", image, flags=re.DOTALL)
print(x)
With Python 3.6 and higher you can build the replacement with an f-string:
replacement = "hello"
x = re.sub(r'(.*my;encoded;image:)(.*?)(END.*)', fr'\1{replacement}\3', image, flags=re.DOTALL)
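One caveat with building the replacement from \1 and \3: the replacement string is itself parsed for escapes, so a replacement that starts with a digit or contains a backslash can be misread. A possible way around this is to pass a function to re.sub. The sketch below assumes that approach; the helper name replace_between and the sample value of image are made up for illustration, and unlike the pattern above it replaces the text between each marker and the first "END" that follows it.

```python
import re

def replace_between(text, start, end, replacement):
    """Replace whatever lies between start and end, keeping both markers."""
    pattern = re.compile(
        "(" + re.escape(start) + ").*?(" + re.escape(end) + ")",
        flags=re.DOTALL,  # let . cross newlines in the payload
    )
    # A function replacement is inserted literally, so digits or
    # backslashes in replacement are not parsed as group references.
    return pattern.sub(lambda m: m.group(1) + replacement + m.group(2), text)

# hypothetical sample data, not taken from the question
image = "header my;encoded;image:QUJD\nREVG END footer"
print(replace_between(image, "my;encoded;image:", "END", "123"))
# header my;encoded;image:123END footer
```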

Python: how to split string by groups Alpha-numeric vs numeric

Let's say I have strings like:
"H39_M1", "H3_M15", "H3M19", "H3M11", "D363_H3", "D_128_H17_M50"
How can I split every single one of them into a list of substrings?
Like this:
["H39", "M1"], ["H3", "Min15"], ["H3","M19"], ["H3","M11"], ["D363","H3"], ["D128","H17","M50"]
And afterwards switch the places of the alphanumeric group and the numeric group,
like this:
["39H", "1M"], ["3H", "15Min"], ["3H","19M"], ["3H","11M"], ["363D","3H"],["128D","17H","50M"]
The length of the number group and of the alphanumeric group varies, as you can see.
Also, "_" underscores can divide them.

I might suggest using re.findall here with re.sub:
import re
inp = "H3M19"
inp = re.sub(r'([A-Z]+)([0-9]+)', r'\2\1', inp)
parts = re.findall(r'[0-9]+[A-Z]+', inp)
print(parts)
This prints:
['3H', '19M']
The first re.sub step converts H3M19 into 3H19M, by capturing the letter and numeric pairs and then swapping them. Then, we use re.findall to find all number/letter pairs in the swapped input.
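The question also has inputs where an underscore sits between a letter group and its number (D_128). A single re.findall with two capture groups and an optional underscore could handle all of the sample strings in one pass. This is a sketch under the assumption that letter groups are uppercase A-Z only; the function name swap_groups is hypothetical.

```python
import re

def swap_groups(label):
    """Split label into letter/number pairs and return them as number + letters."""
    # [A-Z]+ letters, an optional underscore, then the digits that belong to them
    pairs = re.findall(r"([A-Z]+)_?([0-9]+)", label)
    return [digits + letters for letters, digits in pairs]

for label in ["H39_M1", "H3_M15", "H3M19", "H3M11", "D363_H3", "D_128_H17_M50"]:
    print(label, "->", swap_groups(label))
# the last line printed is: D_128_H17_M50 -> ['128D', '17H', '50M']
```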

Parsing Korean text into a list using regex

I have some data stored as pandas data frame and one of the columns contains text strings in Korean. I would like to process each of these text strings as follows:
my_string = '모질상태불량(피부상태불량, 심하게 야윔), 치석심함, 양측 수정체 백탁, 좌측 화농성 눈곱심함(7/22), 코로나음성(활력저하)'
Into a list like this:
parsed_text = '모질상태불량, 피부상태불량, 심하게 야윔, 치석심함, 양측 수정체 백탁, 좌측 화농성 눈곱심함(7/22), 코로나음성, 활력저하'
So the problem is to identify cases where a word (or several words) is followed by parentheses containing text only (one word or several words separated by commas) and to replace them with all the words (before and inside the parentheses) separated by commas, for later processing. If a word is followed by parentheses containing numbers (as in this case 7/22), it should be kept as it is. If a word is not followed by any parentheses, it should also be kept as it is. Furthermore, I would like to preserve the order of the words (as they appeared in the original string).
I can extract text in parentheses by using regex as follows:
corrected_string = re.findall(r'(\w+)\((\D.*?)\)', my_string)
which yields this:
[('모질상태불량', '피부상태불량, 심하게 야윔'), ('코로나음성', '활력저하')]
But I'm having difficulty creating my resulting string, i.e. replacing my original text with the pattern I've matched. Any suggestions? Thank you.

You can use re.findall with a pattern that optionally matches a number enclosed in parentheses:
corrected_string = re.findall(r'[^,()]+(?:\([^)]*\d[^)]*\))?', my_string)

It's a little bit clumsy, but you can try:
my_string_list = [x.strip() for x in re.split(r"\((?!\d)|(?<!\d)\)|,", my_string) if x]
# you can then build a string from the list, e.g. with ', '.join(my_string_list)
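To get from the matched pieces to the single comma-separated string shown in the question, the pieces can be stripped and joined. The sketch below uses the re.findall pattern from the first answer; the names items and cleaned are just illustrative.

```python
import re

my_string = '모질상태불량(피부상태불량, 심하게 야윔), 치석심함, 양측 수정체 백탁, 좌측 화농성 눈곱심함(7/22), 코로나음성(활력저하)'

# An item is a run without commas or parentheses, optionally followed by
# a parenthesised group that contains at least one digit, e.g. (7/22).
items = re.findall(r'[^,()]+(?:\([^)]*\d[^)]*\))?', my_string)

# Strip the padding left over from ", " and drop anything that ends up empty.
cleaned = [item.strip() for item in items if item.strip()]
parsed_text = ', '.join(cleaned)
print(parsed_text)
# 모질상태불량, 피부상태불량, 심하게 야윔, 치석심함, 양측 수정체 백탁, 좌측 화농성 눈곱심함(7/22), 코로나음성, 활력저하
```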

Add a number to the beginning of a string in particular locations

I have this string:
abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123
What can I use to add a 0 to the beginning of each number in this string? So how can I turn that string into something like:
abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123
I've tried playing around with Regex but I'm unsure how I can use that effectively to yield the result I want. I need it to match with a string of numbers rather than a positive integer, but with only numbers in the string, so not something like:
1234abc567 into 01234abc567 as it has letters in it. Each value is always separated by a comma.

Use re.sub,
re.sub(r'(^|,)(\d)', r'\g<1>0\2', s)
or
re.sub(r'(^|,)(?=\d)', r'\g<1>0', s)
or
re.sub(r'\b(\d)', r'0\1', s)
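These patterns look only at the first character of a field, so a mixed value such as 1234abc567 would also get a leading 0, which the question wants to avoid. A stricter variant could require the whole comma-separated field to be digits. This is a sketch; the sample string is the one from the question with the mixed value appended as an extra field.

```python
import re

s = "abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123,1234abc567"

# (?<![^,])  the field starts at the beginning of the string or right after a comma
# (?![^,])   the field ends at the end of the string or right before a comma
result = re.sub(r"(?<![^,])(\d+)(?![^,])", r"0\1", s)
print(result)
# abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123,1234abc567
```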

Try the following:
re.sub(r'\b(\d+)\b', r'0\g<1>', s)

If the numbers are always separated by commas in your string, you can use basic list methods to achieve the result you want.
Let's say your string is called x
y = x.split(',')
x = ''
for i in y:
    if i.isdigit():
        i = '0' + i   # prepend the zero to all-digit pieces
    x = x + i + ','
x = x[:-1]            # drop the trailing comma added by the loop
What this piece of code does is the following:
Splits your string into pieces depending on where you have commas and returns a list of the pieces.
Checks if the pieces are actually numbers, and if they are a 0 is added using string concatenation.
Finally your string is rebuilt by concatenating the pieces along with the commas.
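The same split, check and rebuild steps can be condensed with str.join, which places the separator only between pieces and so leaves no trailing comma to deal with. A minimal sketch; the function name pad_numeric_fields is hypothetical.

```python
def pad_numeric_fields(text, sep=","):
    """Prefix every all-digit field of text with '0'."""
    fields = text.split(sep)
    # str.join puts the separator only between fields
    return sep.join("0" + f if f.isdigit() else f for f in fields)

print(pad_numeric_fields("abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123"))
# abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123
```

Fields that mix digits and letters, such as 1234abc567, fail the isdigit() check and are left unchanged.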

grep prefix python strings

I'm having problems using findall in python.
I have a text such as:
the name of 33e4853h45y45 is one of the 33e445a64b65 and we want all the 33e5c44598e46 to be matched
So I'm trying to find all occurrences of those alphanumeric strings in the text. The thing is, I know they all have the "33e" prefix.
Right now, I have strings = re.findall(r"(33e+)+", stdout_value) but it doesn't work.
I want to be able to return 33e445a64b65, 33e5c44598e46

Try this:
>>> x="the name of 33e4853h45y45 is one of the 33e445a64b65 and we want all the 33e5c44598e46 to be matched"
>>> re.findall(r"33e\w+",x)
['33e4853h45y45', '33e445a64b65', '33e5c44598e46']

Here's a slight variation:
>>> string = '''the name of 33e4853h45y45 is one of the 33e445a64b65 and we want all the 33e5c44598e46 to be matched'''
>>> re.findall(r"(33e[a-z0-9]+)", string)
['33e4853h45y45', '33e445a64b65', '33e5c44598e46']
Instead of matching any word characters, it will only match digits and lowercase letters after the 33e -- that's what the [a-z0-9]+ means.
If you wanted to also match capital letters, you could replace that part with [a-zA-Z0-9]+ instead.
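The pattern from the question, (33e+)+, only ever matches the prefix itself, because e+ just repeats the letter e. If the prefix must also start a word, a \b anchor can be added in front of it. The sketch below assumes that requirement; the trailing x33e999 token is an invented example of something that should not match.

```python
import re

text = ("the name of 33e4853h45y45 is one of the 33e445a64b65 "
        "and we want all the 33e5c44598e46 to be matched, but not x33e999")

# \b requires 33e to start a word; [a-zA-Z0-9]+ needs at least one more character
ids = re.findall(r"\b33e[a-zA-Z0-9]+", text)
print(ids)
# ['33e4853h45y45', '33e445a64b65', '33e5c44598e46']
```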
