I want to convert Hindi-English code-mixed text (popularly known as Hinglish) to English. I tried the Google Transliterate API, but it does not give correct results. Is there an alternative? For example: "kya haal hai?" should become "how are you?"
I know about the Google Translate API in Python. However, in my data frame only a few entries are in Hindi. How do I recognize the language of these records and then translate them to English?
Basically, I want to do the following:
if an entry is not Hindi, skip it; otherwise translate it from Hindi to English.
I am using this - https://pypi.org/project/googletrans/
So, I realized that the default target language is English. The translate command automatically converts other languages to English, and text that is already English stays as it is.
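If you would rather not send every row to the translation service, one option is to check the script of each entry first. The following is only a minimal sketch, assuming the Hindi entries are written in Devanagari (it will not catch romanized Hindi). The names `looks_hindi`, `translate_hindi_only` and `fake_translate` are made up for illustration, and `fake_translate` is a stand-in so the example runs without network access.

```python
import re

# Devanagari Unicode block (used for Hindi, but also Marathi, Nepali, etc.).
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def looks_hindi(text):
    """Return True if the text contains at least one Devanagari character."""
    return bool(DEVANAGARI.search(text))


def translate_hindi_only(texts, translate):
    """Apply `translate` to Devanagari entries and pass the rest through."""
    return [translate(t) if looks_hindi(t) else t for t in texts]


def fake_translate(text):
    """Hypothetical stand-in for a real translation call."""
    return "<translated: " + text + ">"


rows = ["hello world", "क्या हाल है?"]
result = translate_hindi_only(rows, fake_translate)
```

With a googletrans release whose `translate` call is synchronous, the stand-in could probably be swapped for something like `lambda t: Translator().translate(t, dest="en").text`, and the function applied to a data frame column with `df["col"].map(...)`.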
I have one problem. Let me try to explain it.
I use the transliterate library in my Django project. The user can type English (Latin) or Russian (Cyrillic) letters in a field. If the user types Russian words, the library converts them to Latin letters, but if the user types English words I get the following error:
LanguageDetectionError: Can't detect language for the text "document" given.
I use this code:
transliterate.translit(field_value, reversed=True)
I also noticed that the library cannot detect English at all, can it?
transliterate.detect_language(field_value) returns None when the user enters an English word.
My aim is to transliterate only if the user wrote a Russian word, and to leave the value untouched if the user wrote an English word. What can you advise?
I have just found a library that can help me detect the language: https://pypi.python.org/pypi/langdetect
Has anyone worked with this library?
Could you try detecting English and then moving on to assume Russian? I put some Russian news articles into the Python code listed below. It clearly detected that it is not English.
It's pretty simple code that can be easily applied.
isEnglish github
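The idea of checking for English first and only then treating the text as Russian can be sketched roughly as below. This is not the code from the linked project, just an illustration under the assumption that "Russian" can be approximated by "contains Cyrillic characters". The helper names `has_cyrillic` and `to_latin_if_russian` are hypothetical, and the transliteration function is passed in so the sketch does not depend on any particular library.

```python
import re

# Basic Cyrillic Unicode block; covers Russian and several other alphabets.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def has_cyrillic(text):
    """Return True if the text contains at least one Cyrillic character."""
    return bool(CYRILLIC.search(text))


def to_latin_if_russian(text, translit):
    """Call `translit` only for Cyrillic input; return other input unchanged."""
    return translit(text) if has_cyrillic(text) else text


# Stand-in transliterator for the demo (assumption, not a library call).
def demo_translit(text):
    return "dokument" if text == "документ" else text


english_result = to_latin_if_russian("document", demo_translit)
russian_result = to_latin_if_russian("документ", demo_translit)
```

In the Django project, `translit` could be `lambda s: transliterate.translit(s, "ru", reversed=True)`. Passing the language code explicitly should also avoid the automatic detection that raises LanguageDetectionError.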
I'm trying to test stemming Arabic text with the ISRIStemmer tool, but the GNOME terminal doesn't properly render Arabic text, which is RTL.
I assume this means I need to keep the texts in external documents and reference them from the code.
Can anyone show me an example of how I might go about doing this?
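One way to keep the Arabic text out of the terminal is to read it from a UTF-8 file and write the stems to another file that you open in an editor with proper RTL support. The sketch below assumes whitespace-separated words. The function name `stem_file` and the file name are made up for the example, and the demo uses an identity function in place of the real stemmer so it runs without NLTK installed.

```python
import tempfile
from pathlib import Path


def stem_file(path, stem):
    """Read UTF-8 text from `path` and return a list of (word, stem) pairs."""
    text = Path(path).read_text(encoding="utf-8")
    return [(word, stem(word)) for word in text.split()]


def demo():
    """Write a small Arabic sample to a temporary file and process it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "arabic.txt"  # hypothetical file name
        src.write_text("الكتاب المدرسة", encoding="utf-8")
        # Identity function used as a stand-in for the stemmer.
        return stem_file(src, lambda word: word)


pairs = demo()
```

With NLTK available, the stand-in can be replaced by the real stemmer: `from nltk.stem.isri import ISRIStemmer` and then `stem_file("arabic.txt", ISRIStemmer().stem)`.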
I'm using this method exactly, but when I try to restrict it to English with lang="en" (and every other variation I could think of), it doesn't work. This is what I'm running (even with keywords to limit it further), and it still isn't giving me only English. I've tried with and without keywords. I'm trying to build a searchable control corpus of 200,000+ English-only Tweets for a research project, and I do not want to go through that many Tweets by hand. Ideas?
>>> from nltk.twitter import Twitter
>>> tw = Twitter()
>>> tw.tweets(keywords='Delicacy, reptile, death, hold, dark, column, gifted, surgeon, brave, fashion, pearl, diamond, bent, sparkle, present, missing, shadow, holiday, glide, scanner, luster, immunity, devour, discipline, barbaric, fortunate, heart, puzzle, ache, crystal',
limit=10000, lang="en", to_screen=False)
Writing to /Users/rhiannalavalla/twitter-files/tweets.20170521-235221.json
Written 10000 Tweets
The lang option is passed to the Twitter search API, so you're requesting "English" tweets. But have you used Twitter? You don't have to declare the language of each and every tweet, so Twitter can't restrict your results with accuracy. The lang option evidently matches the authors' choice of language for their UI, not the language of the individual tweets.
To restrict your results to tweets in English, search by hashtags and/or user ids that are likely to be of interest to English speakers only (the specifics will depend on what your corpus is for). Alternately (or perhaps in addition), you can try an automated language identification algorithm to filter out suspect tweets. The nltk comes with the langid corpus of language trigram statistics, which you could use to train a recognizer.
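As a rough illustration of the filtering step, here is a very crude stopword-ratio heuristic. It is not the trigram recognizer mentioned above, only a simpler stand-in to show where such a filter would sit. It assumes the tweets file holds one JSON object per line with a "text" field. The stopword list, the threshold and the function names are all arbitrary choices for the example.

```python
import json
import re

# Small, hand-picked list of common English function words (illustrative only).
EN_STOPWORDS = {
    "the", "and", "is", "are", "to", "of", "in", "that", "it",
    "for", "you", "with", "this", "was", "on", "have", "not", "but",
}


def english_score(text):
    """Fraction of word tokens that are common English stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in EN_STOPWORDS for word in words) / len(words)


def filter_english(lines, threshold=0.2):
    """Keep tweets whose text scores at or above the threshold."""
    kept = []
    for line in lines:
        tweet = json.loads(line)
        if english_score(tweet.get("text", "")) >= threshold:
            kept.append(tweet)
    return kept


sample = [
    json.dumps({"text": "The diamond is in the dark column"}),
    json.dumps({"text": "El diamante está en la columna oscura"}),
]
kept = filter_english(sample)
```

For a real corpus you would pass the open file object instead of `sample`, and short tweets with few function words will be misjudged, so a trained language identifier is likely to do better.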
I am new to programming, and I am trying to understand transliteration, like Google Input Tools, which lets the user type in one language and get text in another.
How does transliteration work? Specifically, if I am converting from English to Hindi or English to Russian, do I need to incorporate a dictionary of words for the English, Hindi and Russian languages?
Does anyone know of any tutorials showing how to write the code for transliteration? I have tried searching, but no luck.
Also, does the code have to be in JavaScript/jQuery (client-side code)? My project is Python/Django. Can I write the transliteration code in Python/Django?
Thanks.
Direct dictionary-to-dictionary automatic translation produces poor results because of differences in grammar and the presence of idiomatic sentences. In my experience, the starting point in Python should be the NLTK (Natural Language Toolkit) libraries and tutorials.
For a working example, you may start from here:
Machine Translation using babelize_shell() in NLTK
Translating human languages in Python
Google is your friend
Bing is your friend
Whether you use JavaScript/jQuery depends on the UI you are planning. Maybe you want to trigger an automatic translation after a few key presses, or on blur or on change in an input tag, but that is not relevant to the translation itself.
Translation is also really resource-consuming, so I would advise against doing it inside a Django view. My suggestion is not to reinvent the wheel and to use an existing API such as Google's or Bing's.
I found that a better search term is "Input Method Editor", not "transliteration".
There is a project on GitHub, https://github.com/wikimedia/jquery.ime, that deals with IMEs and transliteration.
I hope that this helps someone.
The typical way of implementing transliteration is to use a mapping dictionary. An example of this can be seen in the mapping.py file for the CyrTranslit Python package.
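A minimal sketch of the mapping-dictionary approach follows. The table is a tiny, made-up subset for lowercase Russian letters, not the mapping used by CyrTranslit, and the names `CYR_TO_LAT` and `transliterate` are only for illustration.

```python
# Illustrative subset of a Cyrillic-to-Latin table (not a complete standard).
CYR_TO_LAT = {
    "а": "a", "в": "v", "д": "d", "е": "e", "и": "i", "к": "k",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "т": "t",
    "у": "u", "ш": "sh", "я": "ya",
}


def transliterate(text, mapping=CYR_TO_LAT):
    """Replace each character using the mapping; keep unmapped characters."""
    return "".join(mapping.get(ch, ch) for ch in text)


latin = transliterate("документ")
```

A character-by-character table like this is enough for scripts that map fairly directly onto each other. Typing Hindi from Latin input usually needs multi-character rules as well (for example, matching the longest key first), which is the kind of thing an input method editor handles.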
Word translation uses a database to convert an English word into a Hindi word.
Some apps are based on this concept, for example:
English to Hindi Dictionary