Python 3 multiple replacements [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
I don't see why this has been placed on hold as "off topic." I am asking for programming help, not a reference (which is what the explanation of why it was closed said). Here's my original question:
I have a python 3 script to send emails in HTML to a list of about 600 people. Sendmail apparently can't send non-ascii characters above 127 (decimal) unless I jump through hoops with MIME. So I'm considering doing a bulk replace of all accented characters with their HTML &#...; equivalents.
I'd rather not use regex, since I'm not proficient at them. Is there way to do this without using a loop, or at least not a complicated one?

Googled "python encode html entities", first result: https://wiki.python.org/moin/EscapingHtml:
Builtin HTML/XML escaping via ASCII encoding
A very easy way to transform non-ASCII characters like German umlauts or letters with accents into their HTML equivalents is simply encoding them from unicode to ASCII and use the xmlcharrefreplace encoding error handling:
>>> a = u"äöüßáà"
>>> a.encode('ascii', 'xmlcharrefreplace')
'äöüßáà'

You can use str.translate() and the entities from the html package:
import html
text = "a text with ä and ö"
ent = {k: '&{};'.format(v) for k, v in html.entities.codepoint2name.items()}
print(text.translate(ent))
Output:
a text with ä and ö

Related

Python command/package for researching substrings? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Improve this question
Does anyone knows how to do similar in Python to Bash' seq ?
For example, I would like to search for a line in a document that contains the chain of characters "measurement = abc" and replace "abc" with something of my own (using "{}".format() or something like it).
Do I need to check each string/substring in the document with an iterative loop or is there a built-in command or package in Python that already does that ?
Thank you!
I would do this with the regex implementation in Python (built-in re module).
This can be done with a positive lookbehind assertion.
(?<=...) Matches if the current position in the string is preceded by a match for ... that ends at the current position.
import re
data = "measurement = abc some other text measurement = abc some other text"
data = re.sub(r'(?<=measurement\s=\s)abc', '123', data) # positive lookbehind assertion
print(data) # "measurement = 123 some other text measurement = 123 some other text"

I want to make \ into a string [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
Edit= Some moderators recommended me to make my self more clear, so here we go.
As a personal project in python, I'm making a very simple software that asks the user for an email address and then checks if the syntaxis of the email is correct.
I made a tuple of special characters that are not allowed in an email address, one of those characters is "\". I was looking online like crazy for how to make \ into a str with no result. I try looking online for the use of the function \ with no result either.
V = "\" doesn't work, it gives me a syntax error. I know it is possible to make it into a string because I've done it with an Input() command.
Please help.
It's not clear to me what language you're using - but in most cases you need to escape the backslash, as it is an escape character itself.
V="\\"
This functionality exists that you can include special characters (in this case, a double quote) in the string:
V="The following will be in quotes: \"Hello, World\""
In this case, the escaped double quotes will be treated as literal characters in the string, and will not signal the end of the string as they would without the escape character.

python newspaper3k return a article_html by string, i want to delete matched string [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
Below is the code and output.
article = Article(url,keep_article_html=True)
article.download()
article.parse()
print(article.article_html)
<div><p class="tt_adsense_top">
</p>
<p> test test </p>
I want to delete this part from the string
<div><p class="tt_adsense_top">
</p>
<p>
only leave
<p> test test </p>
when i use python re to match it, i can only match this line i don't know how to match blank line and blank space.
<div><p class="tt_adsense_top">
who can give me a example to delete it
The 'replace' function for python strings would be the easiest way. So for example you would do
a = "some string removethis"
a = a.replace("removethis", "")
You could also use 'remove' function, and the important distinction between that and replace is that 'remove' only removes the first string it finds while replace will replace all found substrings with the 2nd argument you pass to the function. I would advise reading the python documentation on string methods it has a lot interesting and important functions you can play around with.

How do I escape '\x' in Python? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
Improve this question
I'm using pymysql to query a database that has an entry like 'name':'Te\xtCorp', it's a name that I need to preserve. I'm sending it somewhere else with json.dumps() and when it hits this it fails to escape the \x.
What's the proper way to escape the \x without double escaping everything else?
Two options here:
You escape the backslash, like:
'Te\\xtCorp'
You can use a raw string:
r'Te\xtCorp'
Both generate:
>>> 'Te\\xtCorp'
'Te\\xtCorp'
>>> r'Te\xtCorp'
'Te\\xtCorp'
Or printed:
>>> print(r'Te\xtCorp')
Te\xtCorp
Note that in order to inspect the content of the string, you should use a print(..) statement, otherwise you get the repr(..)esentation of that string. For example:
>>> print(json.dumps(r'te\xt'))
"te\\xt"
>>> print(json.loads(json.dumps(r'te\xt')))
te\xt
As one can read in the documentation on String literals:
\xhh...: ASCII character with hex value hh...
So it is used to encode any ASCII character, by specifying the code as a hexadecimal value.

How to read Unicode file as Unicode string in Python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
Improve this question
I have a file that is encoded in Unicode or UTF-8 (I don't know which). When I read the file in Python 3.4, the resulting string is interpreted as an ASCII string. How do I convert it to a Unicode string like u"text"?
The term "Unicode" refers to the standard, not to a particular encoding.
Since files in computers are binary, there exist different ways of encoding Unicode data in binary files. One of them is "UTF-8".
You can consult https://docs.python.org/3/howto/unicode.html
An example taken from this document (in the section "Reading and Writing Unicode Data")
with open('unicode.txt', encoding='utf-8') as f:
for line in f:
print(repr(line))
In python 3, unlike python2, unicode string constants are not written with a "u".

Categories

Resources