Python command/package for researching substrings? [closed]

Python command/package for researching substrings? [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Improve this question
Does anyone knows how to do similar in Python to Bash' seq ?
For example, I would like to search for a line in a document that contains the chain of characters "measurement = abc" and replace "abc" with something of my own (using "{}".format() or something like it).
Do I need to check each string/substring in the document with an iterative loop or is there a built-in command or package in Python that already does that ?
Thank you!

I would do this with the regex implementation in Python (built-in re module).
This can be done with a positive lookbehind assertion.
(?<=...) Matches if the current position in the string is preceded by a match for ... that ends at the current position.
import re
data = "measurement = abc some other text measurement = abc some other text"
data = re.sub(r'(?<=measurement\s=\s)abc', '123', data) # positive lookbehind assertion
print(data) # "measurement = 123 some other text measurement = 123 some other text"

Related

Python to detect dipthongs in English [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Improve this question
Is there a Python library that can detect/highlight all the dipthongs (in normal spelling not IPA) in a given English text?

If you are defining 'dipthongs' as just letter pairs/combinations that typically indicate double vowel sounds ('ee', 'ou', etc.), then something like the following would work to go through and search for letter combinations from a pre-defined set:
while len(text) > 0:
# Iterate over dipthongs
for d in DIPTHONGS:
# If dipthong is in remaining text
if d in text:
# Partition remaining text
before, dip, after = text.partition(d)
# Append the part before and the highlighted dipthong to highlighted_text
highlighted_text = highlighted_text + before
highlighted_text = highlighted_text + f'*{d}*'
# Update text to the remaining text
text = after
else:
# No dipthongs found, so append remainder of text to highlighted_text
highlighted_text = highlighted_text + text
text = ''
print(highlighted_text)
Output:
I have used asterisks for the highlight as it's quick and easy but you could easily adapt this use colours, or whatever you need for your use case.
I can't think of any examples off the top of my head, but I suspect there are cases where the spelling is dipthong-like but the pronuncation is not a dipthong or vice-versa (because English is like that). To actually take the pronunciation into consideration you could use something like the NLTK CMUdict corpus - see section 4.2 of https://www.nltk.org/book/ch02.html to get started.

python newspaper3k return a article_html by string, i want to delete matched string [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
Below is the code and output.
article = Article(url,keep_article_html=True)
article.download()
article.parse()
print(article.article_html)
<div><p class="tt_adsense_top">
</p>
<p> test test </p>
I want to delete this part from the string
<div><p class="tt_adsense_top">
</p>
<p>
only leave
<p> test test </p>
when i use python re to match it, i can only match this line i don't know how to match blank line and blank space.
<div><p class="tt_adsense_top">
who can give me a example to delete it

The 'replace' function for python strings would be the easiest way. So for example you would do
a = "some string removethis"
a = a.replace("removethis", "")
You could also use 'remove' function, and the important distinction between that and replace is that 'remove' only removes the first string it finds while replace will replace all found substrings with the 2nd argument you pass to the function. I would advise reading the python documentation on string methods it has a lot interesting and important functions you can play around with.

Python 3 multiple replacements [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
I don't see why this has been placed on hold as "off topic." I am asking for programming help, not a reference (which is what the explanation of why it was closed said). Here's my original question:
I have a python 3 script to send emails in HTML to a list of about 600 people. Sendmail apparently can't send non-ascii characters above 127 (decimal) unless I jump through hoops with MIME. So I'm considering doing a bulk replace of all accented characters with their HTML &#...; equivalents.
I'd rather not use regex, since I'm not proficient at them. Is there way to do this without using a loop, or at least not a complicated one?

Googled "python encode html entities", first result: https://wiki.python.org/moin/EscapingHtml:
Builtin HTML/XML escaping via ASCII encoding
A very easy way to transform non-ASCII characters like German umlauts or letters with accents into their HTML equivalents is simply encoding them from unicode to ASCII and use the xmlcharrefreplace encoding error handling:
>>> a = u"äöüßáà"
>>> a.encode('ascii', 'xmlcharrefreplace')
'äöüßáà'

You can use str.translate() and the entities from the html package:
import html
text = "a text with ä and ö"
ent = {k: '&{};'.format(v) for k, v in html.entities.codepoint2name.items()}
print(text.translate(ent))
Output:
a text with ä and ö

how to replace specific characters with different rules with python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
I have a file and some substitution is needed: replace "，" with "," and all other characters not in rule2 with a whitespace, how can I do that？

What about this?
text = text.replace("，", ",")

You can use the regular expressions module:
text = re.sub('，', ',', text)
text = re.sub(negated_rule2, ' ', text)
where your negation of "rule2" is formatted using the regular expressions syntax (see link above).

Xpath builder in Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Improve this question
I'm building relatively complicated xpath expressions in Python, in order to pass them to selenium. However, its pretty easy to make a mistake, so I'm looking for a library that allows me to build the expressions without messing about with strings. For example, instead of writing
locator='//ul[#class="comment-contents"][contains(., "West")]/li[contains(., "reply")]
I could write something like:
import xpathbuilder as xpb
locator = xpb.root("ul")
.filter(attr="class",value="comment-contents")
.filter(xpb.contains(".", "West")
.subclause("li")
.filter(xpb.contains (".", "reply"))
which is maybe not as readable, but is less error-prone. Does anything like this exist?

though this is not exactly what you want.. you can use css selector
...
import lxml.cssselect
csssel = 'div[class="main"]'
selobj = lxml.cssselect.CSSSelector(csssel)
elements = selobj(documenttree)
generated XPath expression is in selobj.path
>>> selobj.path
u"descendant-or-self::div[#class = 'main']"

You can use lxml.etree that allows to write code as the following:
from lxml.builder import ElementMaker # lxml only !
E = ElementMaker(namespace="http://my.de/fault/namespace", nsmap={'p' : "http://my.de/fault/namespace"})
DOC = E.doc
TITLE = E.title
SECTION = E.section
PAR = E.par
my_doc = DOC(
TITLE("The dog and the hog"),
SECTION(
TITLE("The dog"),
PAR("Once upon a time, ..."),
PAR("And then …")
),
SECTION(
TITLE("The hog"),
PAR("Sooner or later …")
)
)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python command/package for researching substrings? [closed] - python

Related

Python to detect dipthongs in English [closed]

python newspaper3k return a article_html by string, i want to delete matched string [closed]

Python 3 multiple replacements [closed]

how to replace specific characters with different rules with python? [closed]

Xpath builder in Python [closed]

Categories

Resources