Split a string into segments in python [closed] - python

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I'm trying to split a molecule, given as a string, into its individual atom components. Each atom starts at a capital letter and ends at the last number.
For example, 'SO4' would become ['S', 'O4'].
And 'C6H12O6' would become ['C6', 'H12', 'O6'].
Pretty sure I need to use the regex module. This answer is close to what I'm looking for: Split a string at uppercase letters

Use re.findall() with the pattern:
[A-Z][a-z]?\d*
[A-Z] matches any uppercase character
[a-z]? matches zero or one lowercase character
\d* matches zero or more digits
Based on your examples this should work, though you may want to look for a library dedicated to this purpose.
Example:
>>> import re
>>> re.findall(r'[A-Z][a-z]?\d*', 'C6H12O6')
['C6', 'H12', 'O6']
>>> re.findall(r'[A-Z][a-z]?\d*', 'SO4')
['S', 'O4']
>>> re.findall(r'[A-Z][a-z]?\d*', 'HCl')
['H', 'Cl']
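If the atom counts are needed as numbers, the same pattern can be split into two capture groups. The following is a minimal sketch, not part of the original answer: the helper name atom_counts is hypothetical, and treating a missing count as 1 is an assumption about the desired behavior.

```python
import re

def atom_counts(formula):
    """Return (element, count) pairs; a missing count is assumed to mean 1."""
    # Group 1 captures the element symbol, group 2 the optional digits.
    pairs = re.findall(r'([A-Z][a-z]?)(\d*)', formula)
    return [(element, int(count) if count else 1) for element, count in pairs]

print(atom_counts('C6H12O6'))
print(atom_counts('SO4'))
```

Like the pattern above, this sketch does not handle parentheses or nested groups such as Ca(OH)2.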

Related

negative lookbehind not working as expected [closed]

Closed 2 years ago.
I have strings of this form:
FPLBX(2x3)ZE(53x13)(4x7)ZGQO
I want to find the blocks in parentheses, but only when they're not preceded by another such block.
The other way around (blocks not followed by another block) works perfectly fine, but I can't make it work for preceding blocks.
Current regex:
(\(\d*x\d*\))(?<!\))
You simply need to put the negative lookbehind assertion, i.e. the (?<!\)) part, in front of the rest of your pattern:
>>> import re
>>> txt = "FPLBX(2x3)ZE(53x13)(4x7)ZGQO"
>>> re.findall(r"(?<!\))(\(\d*x\d*\))", txt)
['(2x3)', '(53x13)']
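To see why the position matters, the two patterns can be compared side by side. This is a small sketch using only the strings from the question; the variable names after and before are chosen for illustration.

```python
import re

txt = "FPLBX(2x3)ZE(53x13)(4x7)ZGQO"

# Lookbehind placed after the group: it inspects the ")" that the group
# itself has just consumed, so the assertion can never succeed.
after = re.findall(r"(\(\d*x\d*\))(?<!\))", txt)

# Lookbehind placed before the group: it inspects the character just
# before the opening "(".
before = re.findall(r"(?<!\))(\(\d*x\d*\))", txt)

print(after)   # []
print(before)  # ['(2x3)', '(53x13)']
```

A lookbehind always checks the text immediately before the point where it appears in the pattern, which is why it has to come first here.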

Replace every caret with a superscript in a python string [closed]

Closed 4 years ago.
I want to replace every caret character with a unicode superscript, for nicer printing of equations in python. My problem is, every caret may be followed by a different exponent value, so in the unicode string u'\u00b*', the * wildcard needs to be the exponent I want to print in the string. I figured some regex would work for this, but my experience with that is very little.
For example, suppose I have the string
"x^3-x^2"
I would then want it converted to the unicode string
u"x\u00b3-x\u00b2"
You can use re.sub and str.translate to catch exponents and change them to unicode superscripts.
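import re
def to_superscript(num):
    # Map each ASCII digit to its Unicode superscript counterpart.
    transl = str.maketrans(dict(zip('1234567890', '¹²³⁴⁵⁶⁷⁸⁹⁰')))
    return num.translate(transl)
s = 'x^3-x^2'
# Replace each caret, any whitespace after it, and the digits that follow
# with the superscript form of those digits.
out = re.sub(r'\^\s*(\d+)', lambda m: to_superscript(m[1]), s)
print(out)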
Output
x³-x²

How to extract groups contains desired string from between quotes using regex? [closed]

Closed 4 years ago.
I would like to extract some strings from between quotes using regular expression. The text is shown below:
CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', '0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', '/qqwweq2.asmx/qqq')
Expected result must be:
test.asmx/asdasd
/qqwweq2.asmx/qqq
How can I do it? Here is the platform for testing:
https://regexr.com/3n142
The criterion: a string between quotes must contain the word "asmx". The real text is much longer than shown above. Think of it as searching for asmx URLs in a website's source code.
See regex in use here
'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'
' Match this literally
((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*) Capture the following into capture group 1
(?:[^'\\]|\\.)* This is a beautiful trick gathered from PhiLho's answer to Regex for quoted string with escaping quotes. It matches any number of items that are either an escaped character (such as \') or any character other than ' and \.
asmx The OP's search string/criterion
(?:[^'\\]|\\.)* This again
' Match this literally
The result is in capture group 1:
test.asmx/asdasd
/qqwweq2.asmx/qqq
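The question tests the pattern on regexr; as an illustration only, here is a minimal sketch of the same pattern applied with Python's re module. The variable names text, pattern and urls are hypothetical, and the sample text is the one from the question.

```python
import re

text = ("CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', "
        "'0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', "
        "'/qqwweq2.asmx/qqq')")

# A raw string keeps the backslashes intact for the regex engine.
pattern = re.compile(r"'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'")

# With a single capture group, findall returns the contents of group 1.
urls = pattern.findall(text)
for url in urls:
    print(url)
```

Note that the pattern does not track which quotes open and which close a string, so text between two quoted strings that happens to contain "asmx" would also be reported.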

regex for a character pattern that is scattered throughout the text [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm a Python and regex noob. I managed to get a full page of html source into the command line by the following statement.
print (driver.page_source).encode('utf-8')
Cool. But there are some predictable strings in that text that I need to extract and store in an array. The pattern I'm looking for is four digits, followed by a hyphen, followed by between one and five digits, e.g.:
2013-80324 or 2013-03 but not 2013-832888
Thanks for any help.
(?:^|(?<=\D))\d{4}-\d{1,5}(?=\D|$)
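(?:...) denotes a non-capturing group
^ matches at the start of the string (though unlikely for HTML input)
(?<=\D) is a lookbehind that requires the preceding character to be a non-digit
(?=\D|$) is a lookahead that requires the next character to be a non-digit or the end of the string
$ matches at the end of the string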
\d denotes a digit [0-9] and \D a non-digit
{n} is a quantifier for length n
{m,n} quantifies a length of range m to n (both inclusive)
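As a rough sketch of how this pattern could be applied in Python, the snippet below runs it over the sample text from the question with re.findall. The helper name find_ids is hypothetical; in practice the input would be the page source rather than a short literal.

```python
import re

# Four digits, a hyphen, then one to five digits, not adjacent to other digits.
ID_PATTERN = re.compile(r'(?:^|(?<=\D))\d{4}-\d{1,5}(?=\D|$)')

def find_ids(text):
    """Return every match of the pattern in text as a list of strings."""
    return ID_PATTERN.findall(text)

sample = "2013-80324 or 2013-03 but not 2013-832888"
print(find_ids(sample))  # ['2013-80324', '2013-03']
```

Because the pattern has no capturing groups, findall returns the full matches directly.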

find all possible overlapping prefixes in a word using python [closed]

Closed 9 years ago.
Many natural languages have prefixes that add some meaning to a word.
For example: anti for antivirus, co for coordinator, counter for counterpart.
Detecting the stem requires separating these prefixes. Suppose we have a list of prefixes for a certain language:
prefix_list = ['c', 'ca', 'ata', 'de']
How can I match all possible overlapping prefixes of the word "catastrophic"?
The result should be:
['c', 'ca']
What I tried:
The | (alternation) operator doesn't support overlapping matches
Otto's solution doesn't match overlaps at the beginning of the word
I tried using a look-behind assertion in the previous solution instead, but look-behind requires a fixed-width pattern
Notes:
ata can't be a result, as the word doesn't start with ata
Don't use a regular expression. Use a list comprehension instead:
[prefix for prefix in prefix_list if word.startswith(prefix)]
This creates a list of all entries in prefix_list that are a prefix of word.
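Put together with the data from the question, the comprehension can be run as follows. This is a minimal sketch; the function name matching_prefixes is hypothetical.

```python
def matching_prefixes(word, prefix_list):
    """Return the entries of prefix_list that word starts with, in list order."""
    return [prefix for prefix in prefix_list if word.startswith(prefix)]

prefix_list = ['c', 'ca', 'ata', 'de']
print(matching_prefixes('catastrophic', prefix_list))  # ['c', 'ca']
```

Each prefix is tested independently against the start of the word, so overlapping prefixes such as 'c' and 'ca' are both reported.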
