How to get pronunciation (phonetics) of text (not speech, only text)? - python

I want to get the pronunciation of short messages using Python. For example, the message 'text' should be transformed to 'tekst' and the message 'привет' (Russian) should be transformed to 'privet'.
I have tried to use googletrans for this, but in fact it returns no pronunciation (pronunciation is None; see my issue).
Does anybody know a package for this task? I have googled for it but found nothing. I've found more than 5 packages that convert text to speech, or translate text and then convert it to speech, but I don't need an audio file; I need only the text of the pronunciation. The phonemizer is a very good solution, but I cannot run its backends on Windows.
Maybe somebody knows how to use some 'API' of this, this, this or this?

You can use selenium to get the text from macmillandictionary.com.
With selenium you can navigate the page, click, enter text, etc. So your job will be to type the word into the search bar and scrape the result using selenium.
You may use oxfordlearnersdictionaries.com too.
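A rough sketch of that approach (untested; the URL pattern and the span.PRON CSS selector are assumptions about the site's markup and may need adjusting):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # or Firefox(), Edge(), ...
try:
    word = "text"
    # Assumption: Macmillan serves entries at /dictionary/british/<word> and
    # marks the pronunciation with class "PRON"; verify both in the page source.
    driver.get("https://www.macmillandictionary.com/dictionary/british/" + word)
    pron = driver.find_element(By.CSS_SELECTOR, "span.PRON").text
    print(pron)  # e.g. /tekst/
finally:
    driver.quit()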

Well, there is a Python module named pronouncing which includes functions to get the pronunciations of words, for example:
>>> import pronouncing
>>> pronouncing.phones_for_word("permit")
[u'P ER0 M IH1 T', u'P ER1 M IH2 T']
The pronouncing.phones_for_word() function returns a list of all pronunciations for the given word found in the CMU pronouncing dictionary.
Pronunciations are given using a special phonetic alphabet known as ARPAbet.
Here’s a list of ARPAbet symbols and what English sounds they stand for. Each token in a pronunciation string is called a “phone.”
The numbers after the vowels indicate the vowel’s stress. The number 1 indicates primary stress; 2 indicates secondary stress; and 0 indicates unstressed.
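As a quick illustration of working with the stress markers (a minimal sketch; pronouncing.stresses() is described in the same cookbook):
import pronouncing

phones = pronouncing.phones_for_word("permit")[0]  # 'P ER0 M IH1 T'
print(pronouncing.stresses(phones))                # '01': unstressed, then primary stress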
I got this from the tutorial and cookbook page of pronouncing.
There is another module named pysle which may also help you.

You might want to check out the epitran Python library.
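A minimal sketch of how it can be used (assuming the Epitran class and its transliterate() method; note that some language modes, such as English, need extra backends installed):
import epitran

# 'rus-Cyrl' selects Russian written in Cyrillic script
epi = epitran.Epitran('rus-Cyrl')
print(epi.transliterate('привет'))  # IPA transcription, roughly 'prʲivʲet'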

from googletrans import Translator

translator = Translator()
k = translator.translate("who are you", dest='hindi')
print(k)
print(k.text)
# translate the text back into the same language to get the pronunciation
p = translator.translate(k.text, dest='hindi')
print(p)
print(p.pronunciation)
Please try this to get the pronunciation.
Output:
Translated(src=en, dest=hi, text=तुम कौन हो, pronunciation=None, extra_data="{'translat...")
तुम कौन हो
Translated(src=hi, dest=hi, text=तुम कौन हो, pronunciation=tum kaun ho, extra_data="{'translat...")
tum kaun ho
Requirement: install the googletrans package.

Related

Is it possible to set multiple strings in the query for tweepy's search method? - python

What I want is to search for tweets on Twitter that contain several words of my choosing, using Python.
The official doc does not say anything about it, but it seems that the search method only takes one query.
Source code:
import tweepy

CK = ""  # consumer key
CS = ""  # consumer secret
AT = ""  # access token
AS = ""  # access token secret

auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)

for status in api.search(q='word', count=100):  # I want to set multiple words in q
    print(status.user.id)
    print(status.user.screen_name)
    print(status.user.name)
    print(status.text)
    print(status.created_at)
What I have tried is below. It didn't raise any error, but it searched only with the last word in the query; in this case the results were only tweets with the word "Python", and it did not get tweets with both words.
for status in api.search(q='Java' and 'Python',count=100,)
Official doc
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
So my question is: is it possible to set multiple words in the query?
Is the way I wrote it simply wrong?
If so, please let me know.
If it can't take multiple words, I would appreciate it if you could share simple Python code that does what I want.
Thank you in advance.
Use:
for status in api.search(q='Java Python', count=100)
From the Search Tweets: Standard v1.1 section Standard search operators:
watching now - containing both “watching” and “now”. This is the default operator.
As explained by Vlad Siv, just put each word you wish to look for inside the quotation marks of the query param. This will then search for tweets containing all of those words.
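For completeness, a small sketch of how the standard search operators map onto the q param (the operator behaviour is taken from the Twitter docs quoted above; the tweepy setup is assumed to be the same as in the question):
for status in api.search(q='Java Python', count=100):    # tweets containing both words
    print(status.text)

for status in api.search(q='"Java Python"', count=100):  # tweets containing the exact phrase
    print(status.text)

for status in api.search(q='Java OR Python', count=100): # tweets containing either word
    print(status.text)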

Use Python docx to create phonetic guide / 'Ruby text' in Word?

I want to add phonetic guides to words in MS Word. Inside of MS Word the original word is called 'Base text' and the phonetic guide is called 'Ruby text.'
Here's what I'm trying to create in Word: the phonetic guide (ruby text) rendered in small type above the base text.
The python-docx documentation has a page that talks about run-level content, with a reference to ruby: <xsd:element name="ruby" type="CT_Ruby"/>, located here:
https://python-docx.readthedocs.io/en/latest/dev/analysis/features/text/run-content.html
I can not figure out how to access these in my code.
Here's an example of one of my attempts:
import docx
from docx import Document
document = Document()
base_text = '光栄'
ruby_text = 'こうえい'
p = document.add_paragraph(base_text)
p.add_run(ruby_text).ruby = True
document.save('ruby.docx')
But this code only returns the following:
光栄こうえい
I've tried to use ruby on the paragraph and on p.text, and removing the = True, but I keep getting the error message 'X object has no attribute 'ruby''.
Can someone please show me how to accomplish this?
Thank you!
The <xsd:element name="ruby" ... excerpt you mention is from the XML Schema type for a run. This means a child element of type CT_Ruby can be present on a run element (<w:r>) with the tag name <w:ruby>.
There is not yet any API support for this element in python-docx, so if you want to use it you'll need to manipulate the XML using low-level lxml calls. You can get access to the run element on run._r. If you search on "python-docx workaround function" and also perhaps "python-pptx workaround function" you'll find some examples of doing this to extend the functionality.
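A rough, untested sketch of such a workaround function (the <w:ruby>/<w:rt>/<w:rubyBase> structure and the <w:rubyPr> values below are my reading of the WordprocessingML schema, not an API that python-docx provides, so treat it as a starting point only):
from docx import Document
from docx.oxml import parse_xml
from docx.oxml.ns import nsdecls

def add_ruby_run(paragraph, base_text, ruby_text):
    # Build a <w:r> containing a <w:ruby> element by hand and append it to the
    # paragraph's XML. The rubyPr sizes are in half-points and the language id
    # is a guess; both may need tuning once you open the document in Word.
    ruby_xml = (
        '<w:r %s><w:ruby>'
        '<w:rubyPr>'
        '<w:rubyAlign w:val="center"/>'
        '<w:hps w:val="12"/>'
        '<w:hpsRaise w:val="18"/>'
        '<w:hpsBaseText w:val="24"/>'
        '<w:lid w:val="ja-JP"/>'
        '</w:rubyPr>'
        '<w:rt><w:r><w:t>%s</w:t></w:r></w:rt>'
        '<w:rubyBase><w:r><w:t>%s</w:t></w:r></w:rubyBase>'
        '</w:ruby></w:r>' % (nsdecls('w'), ruby_text, base_text)
    )
    paragraph._p.append(parse_xml(ruby_xml))

document = Document()
p = document.add_paragraph()
add_ruby_run(p, '光栄', 'こうえい')
document.save('ruby.docx')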

How to get the base form of an adj or adverb using lemma in spacy

For a project, I would like to be able to use NLP to get the noun form of an adjective or adverb, if there is one.
For example, "deathly" would return "death" and "dead" would return "death".
"lively" would return "life".
I've tried using the spaCy lemmatizer, but it does not manage to get the base root form.
For example, if I'd do:
import spacy

nlp = spacy.load('en_core_web_sm')
z = nlp("deathly lively")
for token in z:
    print(token.lemma_)
It would return:
>>> deathly lively
instead of:
>>> death life
Does anyone have any ideas?
Any answer is appreciated.
From what I've seen so far, SpaCy is not super-great at doing what you want it to do. Instead, I am using a 3rd party library called pyinflect, which is intended to be used as an extension to SpaCy.
While it isn't perfect, I think it will work better than your current approach.
I'm also considering another 3rd-party library called inflect, which might be worth checking out, as well.
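A minimal sketch of the pyinflect usage (it registers a token._.inflect extension on spaCy tokens; whether it covers derivational pairs like "deathly" -> "death" is something you would have to check against your word list):
import spacy
import pyinflect  # registers the ._.inflect extension on Token

nlp = spacy.load('en_core_web_sm')
doc = nlp("death")
# ask for a specific target POS tag, e.g. plural noun
print(doc[0]._.inflect('NNS'))  # 'deaths'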

Using custom POS tags for NLTK chunking?

Is it possible to use non-standard part of speech tags when making a grammar for chunking in the NLTK? For example, I have the following sentence to parse:
complication/patf associated/qlco with/prep breast/noun surgery/diap
independent/adj of/prep the/det use/inpr of/prep surgical/diap device/medd ./pd
Locating the phrases I need from the text is greatly assisted by specialized tags such as "medd" or "diap". I thought that because you can use RegEx for parsing, it would be independent of anything else, but when I try to run the following code, I get an error:
grammar = r'TEST: {<diap>}'
cp = nltk.RegexpParser(grammar)
cp.parse(sentence)
ValueError: Transformation generated invalid chunkstring:
<patf><qlco><prep><noun>{<diap>}<adj><prep><det><inpr><prep>{<diap>}<medd><pd>
I think this has to do with the tags themselves, because the NLTK can't generate a tree from them, but is it possible to skip that part and just get the chunked items returned? Maybe the NLTK isn't the best tool, and if so, can anyone recommend another module for chunking text?
I'm developing in python 2.7.6 with the Anaconda distribution.
Thanks in advance!
Yes, it is possible to use custom tags for NLTK chunking; I have done the same.
Refer to: How to parse custom tags using nltk.Regexp.parser()
The ValueError and its description suggest that there is an error in the formation of your grammar, and you need to check that. You can update the question with your grammar to get suggestions on corrections.
# POS Tagging
import nltk
from nltk.tokenize import word_tokenize

example_sent = "The quick brown fox jumps over the lazy dog."  # any sample sentence
words = word_tokenize(example_sent)
pos = nltk.pos_tag(words)
print(pos)

# Chunking
chunk = r'Chunk: {<JJ.?>+<NN.?>+}'
par = nltk.RegexpParser(chunk)
par2 = par.parse(pos)
print('Chunking - ', par2)

print('------------------------------ Parsing the filtered chunks')
# printing only the required chunks
for i in par2.subtrees():
    if i.label() == 'Chunk':
        print(i)

print('------------------------------ NER')
# NER
ner = nltk.ne_chunk(pos)
print(ner)
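And to address the custom tags directly, a minimal sketch (it assumes the sentence has already been split into (word, tag) tuples, which is the input format RegexpParser.parse() expects; the tuples below are taken from the question, and if this still raises the ValueError it points back to the grammar/NLTK-version issue mentioned above):
import nltk

# (word, tag) tuples using the custom tag set from the question
sentence = [('complication', 'patf'), ('associated', 'qlco'), ('with', 'prep'),
            ('breast', 'noun'), ('surgery', 'diap'), ('independent', 'adj'),
            ('of', 'prep'), ('the', 'det'), ('use', 'inpr'), ('of', 'prep'),
            ('surgical', 'diap'), ('device', 'medd'), ('.', 'pd')]

grammar = r'TEST: {<diap>}'
cp = nltk.RegexpParser(grammar)
print(cp.parse(sentence))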

Using Pattern.web to search all Wikipedia is raising "'NoneType' object is not iterable" error

I am trying to use Pattern.web to search all of Wikipedia for words and phrases that include an apostrophe. This is my latest attempt:
from pattern.web import Wikipedia, plaintext
from pattern.web import SEARCH

engine = Wikipedia(language="en")
q = "\"cat's\""
for i in range(1, 2):
    for result in engine.search(q, start=i, count=10, type=SEARCH, cached=True):
        print plaintext(result.text)
        print result.url
        print result.date
        print
But I get this error message:
for result in engine.search(q, start=i, count=10, type=SEARCH, cached=True):
TypeError: 'NoneType' object is not iterable
Question:
Is it even possible to do what I'm trying to do?
If it is, how do I fix this?
If you refer to the Wikipedia SearchEngine documentation, you'll notice that your attempt to iterate is misguided and your query may be erroneous as well:
Wikipedia.search() returns a single WikipediaArticle for the given (case-sensitive) query, which is the title of an article.
(Note that this means that start and count can only be 1.)
I would venture to guess, without downloading the pattern library and trying this myself, that since there is no Wikipedia article entitled "cat's", you get None back.
So, is it possible to do what you are trying to do? Yes. Refer again to the documentation:
Wikipedia.index() returns an iterator over all article titles on Wikipedia.
You might do something like this:
for title in engine.index():
    article = engine.search(title)
    # do your string pattern searching here
I answered this to the best of my ability without downloading pattern and trying it myself, so YMMV.
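Fleshing that out a little (a sketch only; it assumes WikipediaArticle exposes a plaintext() method as described in the pattern.web documentation, and iterating over every title will be very slow):
import re
from pattern.web import Wikipedia

engine = Wikipedia(language="en")
apostrophe = re.compile(r"\b\w+'\w+\b")  # words containing an apostrophe

for title in engine.index():
    article = engine.search(title)
    if article is None:
        continue
    matches = apostrophe.findall(article.plaintext())
    if matches:
        print title, matches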

Categories

Resources