Replace every caret with a superscript in a python string [closed] - python

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I want to replace every caret character with a unicode superscript, for nicer printing of equations in python. My problem is, every caret may be followed by a different exponent value, so in the unicode string u'\u00b*', the * wildcard needs to be the exponent I want to print in the string. I figured some regex would work for this, but my experience with that is very little.
For example, suppose I have the string
"x^3-x^2"
I would then want it converted to the unicode string
u"x\u00b3-x\u00b2"

You can use re.sub and str.translate to catch exponents and change them to unicode superscripts.
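import re

def to_superscript(num):
    # Map each ASCII digit to its Unicode superscript counterpart.
    transl = str.maketrans('1234567890', '¹²³⁴⁵⁶⁷⁸⁹⁰')
    return num.translate(transl)

s = 'x^3-x^2'
# Raw string so the backslashes reach the regex engine unchanged.
out = re.sub(r'\^\s*(\d+)', lambda m: to_superscript(m[1]), s)
print(out)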
Output
x³-x²
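One detail worth spelling out: the u'\u00b*' wildcard idea from the question cannot work for every digit, because only ¹, ² and ³ live at U+00B9, U+00B2 and U+00B3; the remaining superscript digits sit in the U+2070 block. The sketch below writes the code points out explicitly and, as an assumption that goes beyond the question, also accepts a leading sign on the exponent. The name carets_to_superscripts is made up for illustration.

```python
import re

# Superscript code points for 0-9, then '-' and '+'.
# Only 1, 2 and 3 are in the U+00Bx range; the rest are in U+207x.
SUPERSCRIPTS = str.maketrans(
    "0123456789-+",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207b\u207a",
)

def carets_to_superscripts(expr):
    """Replace '^<integer>' with the integer written in superscript characters."""
    return re.sub(
        r"\^\s*([-+]?\d+)",
        lambda m: m.group(1).translate(SUPERSCRIPTS),
        expr,
    )

print(carets_to_superscripts("x^3-x^2"))
print(carets_to_superscripts("x^-12 + y^10"))
```

Exponents that are not plain integers (for example x^n or x^(a+b)) are left untouched by this pattern.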

Related

negative lookbehind not working as expected [closed]

Closed 2 years ago.
I have strings of this form:
FPLBX(2x3)ZE(53x13)(4x7)ZGQO
I want to find the blocks in parentheses, but only when they're not preceded by another such block.
The other way around (not followed by another block) works perfectly fine, but I can't make it work for the preceding case.
Current regex:
(\(\d*x\d*\))(?<!\))
You simply need to put the so-called negative lookbehind assertion, i.e. the (?<!\))-part, in front of your search re:
>>> import re
>>> txt = "FPLBX(2x3)ZE(53x13)(4x7)ZGQO"
>>> re.findall(r"(?<!\))(\(\d*x\d*\))", txt)
['(2x3)', '(53x13)']
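To see why the position matters: a lookbehind inspects the character just before the point where it appears in the pattern. Placed after the group, that character is always the group's own closing parenthesis, so the assertion can never succeed. A small sketch comparing both placements on the string from the question (the variable names are illustrative only):

```python
import re

txt = "FPLBX(2x3)ZE(53x13)(4x7)ZGQO"
group = r"\(\d*x\d*\)"

# Lookbehind after the group: it looks at the ')' the group just matched,
# so "not preceded by ')'" is always false and nothing matches.
trailing = re.findall(group + r"(?<!\))", txt)

# Lookbehind before the group: it looks at the character in front of '('.
leading = re.findall(r"(?<!\))" + group, txt)

print(trailing)  # []
print(leading)   # ['(2x3)', '(53x13)']
```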

How to extract groups contains desired string from between quotes using regex? [closed]

Closed 4 years ago.
I would like to extract some strings from between quotes using regular expression. The text is shown below:
CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', '0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', '/qqwweq2.asmx/qqq')
Expected result must be:
test.asmx/asdasd
/qqwweq2.asmx/qqq
How can I do it? Here is the platform for testing:
https://regexr.com/3n142
The criterion: the string between the quotes must contain the word "asmx". The real text is much longer than what is shown above; think of it as searching for asmx URLs in a website's source code.
See regex in use here
'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'
': Match this literally
((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*): Capture the following into capture group 1
(?:[^'\\]|\\.)*: This is a beautiful trick gathered from PhiLho's answer to Regex for quoted string with escaping quotes. It matches an escaped ' or any other character.
asmx: The OP's search string/criterion
(?:[^'\\]|\\.)*: This again
': Match this literally
The result is in capture group 1:
test.asmx/asdasd
/qqwweq2.asmx/qqq
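The question does not name a language, so as an assumption here is how the same pattern could be applied in Python. With a single capture group, re.findall returns the contents of group 1 for every match. The variable names are illustrative only.

```python
import re

text = (
    "CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', "
    "'0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', '/qqwweq2.asmx/qqq')"
)

# Same pattern as above, written as a raw string delimited by double quotes
# so the single quotes inside it need no escaping.
pattern = r"'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'"

urls = re.findall(pattern, text)
for url in urls:
    print(url)
```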

How do I escape '\x' in Python? [closed]

Closed 5 years ago.
I'm using pymysql to query a database that has an entry like 'name':'Te\xtCorp'; it's a name that I need to preserve. I'm sending it somewhere else with json.dumps(), and when it hits this entry it fails to escape the \x.
What's the proper way to escape the \x without double escaping everything else?
Two options here:
You escape the backslash, like:
'Te\\xtCorp'
You can use a raw string:
r'Te\xtCorp'
Both generate:
>>> 'Te\\xtCorp'
'Te\\xtCorp'
>>> r'Te\xtCorp'
'Te\\xtCorp'
Or printed:
>>> print(r'Te\xtCorp')
Te\xtCorp
Note that to inspect the actual content of the string you should print(..) it; otherwise the interactive prompt shows its repr(..), in which backslashes appear doubled. For example:
>>> import json
>>> print(json.dumps(r'te\xt'))
"te\\xt"
>>> print(json.loads(json.dumps(r'te\xt')))
te\xt
As one can read in the documentation on String literals:
\xhh...: ASCII character with hex value hh...
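So \x introduces a character given by its hexadecimal code, and it must be followed by hex digits. In 'Te\xtCorp' the characters after \x are not hex digits, so the literal is rejected unless the backslash is escaped or a raw string is used.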
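A short round-trip sketch may help separate the literal from the data. It assumes the value read from the database already holds a single backslash (the raw string below only stands in for that value); json.dumps then escapes it on its own and json.loads restores it.

```python
import json

# Stand-in for the value coming from the database: one real backslash.
name = r"Te\xtCorp"
print(len(name))   # 9 characters, the backslash counts once

# json.dumps escapes the backslash in the JSON text ...
encoded = json.dumps({"name": name})
print(encoded)     # {"name": "Te\\xtCorp"}

# ... and json.loads turns it back into the original string.
decoded = json.loads(encoded)
print(decoded["name"] == name)  # True
```

In other words, the doubled backslash exists only in the serialized JSON text, not in the Python string itself.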

Split a string into segments in python [closed]

Closed 6 years ago.
I'm trying to split a molecule, given as a string, into its individual atom components. Each atom starts at a capital letter and ends at the last number.
For example, 'SO4' would become ['S', 'O4'].
And 'C6H12O6' would become ['C6', 'H12', 'O6'].
Pretty sure I need to use the regex module. This answer is close to what I'm looking for: Split a string at uppercase letters
Use re.findall() with the pattern:
[A-Z][a-z]?\d*
[A-Z] matches any uppercase character
[a-z]? matches zero or one lowercase character
\d* matches zero or more digits
Based on your examples this should work, though a dedicated chemistry library may be a better fit if you need more than simple formulas.
Example:
>>> import re
>>> re.findall(r'[A-Z][a-z]?\d*', 'C6H12O6')
['C6', 'H12', 'O6']
>>> re.findall(r'[A-Z][a-z]?\d*', 'SO4')
['S', 'O4']
>>> re.findall(r'[A-Z][a-z]?\d*', 'HCl')
['H', 'Cl']
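If the goal is to work with the atoms afterwards, the same pattern can be given two capture groups so that each match yields the element symbol and its count separately. This is a minimal sketch under that assumption; count_atoms is a hypothetical helper, and it does not handle parentheses or charges.

```python
import re
from collections import Counter

# Group 1: element symbol, group 2: optional count.
ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")

def count_atoms(formula):
    """Return a dict mapping each element symbol to its total count."""
    counts = Counter()
    for symbol, digits in ATOM.findall(formula):
        # A missing number means a single atom.
        counts[symbol] += int(digits) if digits else 1
    return dict(counts)

print(count_atoms("C6H12O6"))
print(count_atoms("SO4"))
print(count_atoms("CH3COOH"))
```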

Regex not working when used on a Chinese text [closed]

Closed 6 years ago.
I created a small python function to remove some undesired elements from strings written in Chinese.
Those undesired elements feature an ampersand at the beginning (&Something).
The function uses a regex to spot them, remove them and return the longest part of the string without those undesired elements, but for some reason it's not working as expected.
I tested the function on strings in other languages and alphabets and it works as expected.
# -*- coding: utf-8 -*-
import re

def clean_sentence(my_text):
    split_the_text = re.split(r'([&].*?\s)', my_text)
    longest_sentence = max(split_the_text, key=len)
    return longest_sentence

my_string = "一个神奇的鸭子飞在与&SOMETHING然后唱支歌给&PERSON"
print clean_sentence(my_string)
That's the output:
õ©Çõ©¬þÑ×ÕÑçþÜäÚ©¡Õ¡ÉÚú×Õ£¿õ©Ä&SOMETHINGþäÂÕÉÄÕö▒µö»µ¡îþ╗Ö&PERSON
Pretty simple:
The pattern requires a whitespace character (\s) after each tag, but your string contains none, so nothing is split. If your SOMETHING or PERSON consist only of English letters, digits or underscores, you might be able to get along with:
import re

def clean_sentence(my_text):
    # ASCII-only class: in Python 3, \w would also match the Chinese characters.
    split_the_text = re.split(r'&[A-Za-z0-9_]+', my_text)
    longest_sentence = max(split_the_text, key=len)
    return longest_sentence

my_string = "一个神奇的鸭子飞在与&SOMETHING然后唱支歌给&PERSON"
print(clean_sentence(my_string))
# 一个神奇的鸭子飞在与
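One caveat when running this on Python 3: for str patterns, \w is Unicode-aware by default, so a pattern such as &\w+ also consumes the Chinese characters that follow the tag. The sketch below compares the default behaviour with the re.ASCII flag on the string from the question; the variable names are illustrative only.

```python
import re

my_string = "一个神奇的鸭子飞在与&SOMETHING然后唱支歌给&PERSON"

# Default (Unicode) \w: the first tag swallows the Chinese text after it,
# so the middle segment comes back empty.
unicode_parts = re.split(r"&\w+", my_string)

# re.ASCII restricts \w to [a-zA-Z0-9_], so only the tag itself is removed.
ascii_parts = re.split(r"&\w+", my_string, flags=re.ASCII)

print(unicode_parts)  # ['一个神奇的鸭子飞在与', '', '']
print(ascii_parts)    # ['一个神奇的鸭子飞在与', '然后唱支歌给', '']
```

For this particular input the longest segment is the same either way, but the middle segment is only preserved with the ASCII-restricted match.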
