I want to use the DeepL translate API for my university project, but I can't parse the response. I could use it with either PHP or Python; the argument will end up being passed to a Python script anyway, so it doesn't matter to me which end does the translation. I tried it in PHP like this:
$original = $_GET['searchterm'];
$deeplTranslateURL = 'https://api-free.deepl.com/v2/translate?auth_key=MYKEY&text='.urlencode($original).'&target_lang=EN';
if (get_headers($deeplTranslateURL)[0] == 'HTTP/1.1 200 OK') {
    $translated = str_replace(' ', '', json_decode(file_get_contents($deeplTranslateURL))["translations"][0]["text"]);
} else {
    echo("translate error");
}
$output = passthru("python search.py $original $translated");
and I also tried this in search.py, based on this answer:
#!/usr/bin/env python
import sys
import requests
r = requests.post(url='https://api.deepl.com/v2/translate',
                  data={
                      'target_lang': 'EN',
                      'auth_key': 'MYKEY',
                      'text': str(sys.argv)[1]
                  })
print 'Argument:', sys.argv[1]
print 'Argument List:', str(sys.argv)
print 'translated to: ', str(r.json()["translations"][0]["text"])
But neither got me any answer. How can I do this correctly? I also know I could do it somehow with cURL, but I've never used that library.
DeepL now has a Python library that makes translation with Python much easier and eliminates the need to use requests and parse a response.
Get started as such:
import deepl
translator = deepl.Translator(auth_key)
result = translator.translate_text(text_you_want_to_translate, target_lang="EN-US")
print(result)
Looking at your question, it looks like search.py might have a couple of problems, namely that the shell splits your sentence into individual arguments, so sys.argv is a list of single words and you end up passing only one word to DeepL. This is a problem because DeepL is a contextual translator: it builds a translation based on the words in a sentence; it doesn't simply act as a dictionary for individual words. If you want to translate single words, the DeepL API probably isn't what you want to go with.
However, if you are actually trying to pass a sentence to DeepL, I have built out this new search.py that should work for you:
import sys
import deepl
auth_key="your_auth_key"
translator = deepl.Translator(auth_key)
"""
" ".join(sys.argv[1:]) converts all list items after item [0]
into a string separated by spaces
"""
result = translator.translate_text(" ".join(sys.argv[1:]), target_lang = "EN-US")
print('Argument:', sys.argv[1])
print('Argument List:', str(sys.argv))
print("String to translate: ", " ".join(sys.argv[1:]))
print("Translated String:", result)
I ran the program by entering this:
search.py Der Künstler wurde mit einem Preis ausgezeichnet.
and received this output:
Argument: Der
Argument List: ['search.py', 'Der', 'Künstler', 'wurde', 'mit', 'einem',
'Preis', 'ausgezeichnet.']
String to translate: Der Künstler wurde mit einem Preis ausgezeichnet.
Translated String: The artist was awarded a prize.
I hope this helps, and that it's not too far past the end of your University Project!
Related
import wikipedia
import os

while True:
    input = raw_input("Ques: ")
    # To get output in a particular language,
    # this prints the results in Spanish:
    # wikipedia.set_lang("es")
    wiki = wikipedia.summary(input, sentences = 2).encode('utf-8').strip()
    os.system("say " + wiki)
    print wiki
On the output console, it asks for
Ques:
When I type Cristiano, it says "Cristiano is a Portuguese footballer".
But when I type anything other than Cristiano (say, Chelsea FC), it says
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
OR
sh: -c: line 0: syntax error near unexpected token `('
The return value of wikipedia.summary() may contain characters that the shell interprets with special meaning. You can escape such characters with shlex.quote() (on Python 2, where raw_input lives, the equivalent is pipes.quote()):
import wikipedia
import os
import shlex

while True:
    input = raw_input("Ques: ")
    # To get output in a particular language,
    # this prints the results in Spanish:
    # wikipedia.set_lang("es")
    wiki = wikipedia.summary(input, sentences = 2).encode('utf-8').strip()
    os.system("say " + shlex.quote(wiki))
    print wiki
I haven't worked with the wikipedia third-party library before, but when I tried your code I found that I just needed to delete .encode('utf-8'), and it worked for me.
wiki = wikipedia.summary(i, sentences=2).strip()
import wikipedia
import os

while True:
    i = input("Ques: ")
    # To get output in a particular language,
    # this prints the results in Spanish:
    # wikipedia.set_lang("es")
    wiki = wikipedia.summary(i, sentences=2).strip()
    os.system("say " + wiki)
    print(wiki)
Result: Chelsea Football Club is a professional football club in London, England, that competes in the Premier League. Founded in 1905, the club's home ground since then has been Stamford Bridge.Chelsea won the First Division title in 1955, ....
Or you can use another third-party library like pyttsx3: pip install pyttsx3.
And the code will be like this:
import wikipedia
import pyttsx3

engine = pyttsx3.init()
while True:
    i = input("Ques: ")
    wiki = wikipedia.summary(i, sentences=2).strip()
    # os.system("say " + wiki)
    print(wiki)
    engine.say(wiki)
    engine.runAndWait()
I hope it can help.
I'm using Python 2.7.11 on Windows to get JSON data from an API (data on trees in Warsaw, Poland, but never mind that). I want to generate an output CSV file with all the data provided by the API, for further analysis. I started with a script I used for another project (also discussed here on Stack Overflow and corrected for me by @Martin Taylor). That script didn't work, so I tried to modify it using my very basic understanding, googling around and applying the pdb debugger. At the moment, the result looks like this:
import pdb
import json
import urllib2
import csv

pdb.set_trace()

url = "https://api.um.warszawa.pl/api/action/datastore_search/?resource_id=ed6217dd-c8d0-4f7b-8bed-3b7eb81a95ba"
myfile = 'C:/dane/drzewa.csv'

csv_myfile = csv.writer(open(myfile, 'wb'))
cols = ['numer_adres', 'stan_zdrowia', 'y_wgs84', 'dzielnica', 'adres', 'lokalizacja', 'wiek_w_dni', 'srednica_k', 'pnie_obwod', 'miasto', 'jednostka', 'x_pl2000', 'wysokosc', 'y_pl2000', 'numer_inw', 'x_wgs84', '_id', 'gatunek_1', 'gatunek', 'data_wyk_pom']
csv_myfile.writerow(cols)

def api_iterate(myfile):
    while True:
        global url
        print url
        json_page = urllib2.urlopen(url)
        data = json.load(json_page)
        json_page.close()
        for data_object in data['result']['records']:
            csv_myfile.writerow([data_object[col] for col in cols])
        try:
            url = data['_links']['next']
        except KeyError as e:
            break

with open(myfile, 'wb'):
    api_iterate(myfile)
I'm a very fresh Python user, so I get confused all the time. Now I've got to the point where, while reading the objects in the JSON dictionary, I get a KeyError associated with the 'x_wgs84' element. I suppose it has something to do with the fact that in the source URL this element is preceded by a U+FEFF Unicode character. I tried to get around this but I got stuck and would appreciate assistance.
I suspect the code may be corrupt in several other ways - as I mentioned, I'm a very unskilled programmer (yet).
You need to look up the key with the Unicode character included.
To see how to do that, one easy way is to print the keys:
>>> import requests
>>> res = requests.get('https://api.um.warszawa.pl/api/action/datastore_search/?resource_id=ed6217dd-c8d0-4f7b-8bed-3b7eb81a95ba')
>>> data = res.json()
>>> records = data['result']['records']
>>> records[0]
{u'numer_adres': u'', u'stan_zdrowia': u'dobry', u'y_wgs84': u'52.21865', u'y_pl2000': u'5787241.04475524', u'adres': u'ul. ALPEJSKA', u'x_pl2000': u'7511793.96937063', u'lokalizacja': u'Ulica ALPEJSKA', u'wiek_w_dni': u'60', u'miasto': u'Warszawa', u'jednostka': u'Dzielnica Wawer', u'pnie_obwod': u'73', u'wysokosc': u'14', u'data_wyk_pom': u'20130709', u'dzielnica': u'Wawer', u'\ufeffx_wgs84': u'21.172584', u'numer_inw': u'D386200', u'_id': 125435, u'gatunek_1': u'Quercus robur', u'gatunek': u'd\u0105b szypu\u0142kowy', u'srednica_k': u'7'}
>>> records[0].keys()
[u'numer_adres', u'stan_zdrowia', u'y_wgs84', u'y_pl2000', u'adres', u'x_pl2000', u'lokalizacja', u'wiek_w_dni', u'miasto', u'jednostka', u'pnie_obwod', u'wysokosc', u'data_wyk_pom', u'dzielnica', u'\ufeffx_wgs84', u'numer_inw', u'_id', u'gatunek_1', u'gatunek', u'srednica_k']
>>> records[0][u'\ufeffx_wgs84']
u'21.172584'
As you can see, to get your key, you need to write it as '\ufeffx_wgs84' with the unicode character that is causing trouble.
Note: I don't know if you are using Python 2 or 3, but in Python 2 you might need to put a u before your string declaration to declare it as a Unicode string.
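Alternatively, a minimal sketch (assuming the records structure shown above) that strips the BOM from every key, so the plain 'x_wgs84' entry in your cols list keeps working:

# strip a leading BOM (U+FEFF) from every key of a record dict
def strip_bom_keys(record):
    return {k.lstrip(u'\ufeff'): v for k, v in record.items()}

for data_object in data['result']['records']:
    row = strip_bom_keys(data_object)
    csv_myfile.writerow([row[col] for col in cols])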
This isn't my code, it is a module I found on the internet which performs (or is supposed to perform) the task I want.
print '{'
for page in range(1, 4):
    rand = random.random()
    id = str(long(rand * 1000000000000000000))
    query_params = {'q': 'a',
                    'include_entities': 'true', 'lang': 'en',
                    'show_user': 'true',
                    'rpp': '100', 'page': page,
                    'result_type': 'mixed',
                    'max_id': id}
    r = requests.get('http://search.twitter.com/search.json',
                     params=query_params)
    tweets = json.loads(r.text)['results']
    for tweet in tweets:
        if tweet.get('text'):
            print tweet
print '}'
print
The Python shell seems to indicate that the error is on Line 1. I know very little Python, so I have no idea why it isn't working.
This snippet is written for Python 2.x, but you're running it on Python 3.x (where print is now a proper function). Replace print SomeExpr with print(SomeExpr) to solve this.
Here's a detailed description of this difference (along with other changes in 3.x).
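For example, each print statement in the snippet becomes a function call under Python 3:

print('{')       # was: print '{'
print(tweet)     # was: print tweet
print('}')       # was: print '}'
print()          # a bare print becomes an empty call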
I'm faced with a situation where I'm reading a string of text and I need to detect the language code (en, de, fr, es, etc).
Is there a simple way to do this in python?
If you need to detect language in response to a user action, then you could use the Google AJAX Language API:
#!/usr/bin/env python
import json
import urllib, urllib2

def detect_language(text,
                    userip=None,
                    referrer="http://stackoverflow.com/q/4545977/4279",
                    api_key=None):
    query = {'q': text.encode('utf-8') if isinstance(text, unicode) else text}
    if userip: query.update(userip=userip)
    if api_key: query.update(key=api_key)

    url = 'https://ajax.googleapis.com/ajax/services/language/detect?v=1.0&%s' % (
        urllib.urlencode(query))
    request = urllib2.Request(url, None, headers=dict(Referer=referrer))
    d = json.load(urllib2.urlopen(request))
    if d['responseStatus'] != 200 or u'error' in d['responseData']:
        raise IOError(d)
    return d['responseData']['language']

print detect_language("Python - can I detect unicode string language code?")
Output
en
Google Translate API v2
Default limit 100000 characters/day (no more than 5000 at a time).
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
import urllib, urllib2
from operator import itemgetter

def detect_language_v2(chunks, api_key):
    """
    chunks: either string or sequence of strings
    Return list of corresponding language codes
    """
    if isinstance(chunks, basestring):
        chunks = [chunks]

    url = 'https://www.googleapis.com/language/translate/v2'
    data = urllib.urlencode(dict(
        q=[t.encode('utf-8') if isinstance(t, unicode) else t
           for t in chunks],
        key=api_key,
        target="en"), doseq=1)

    # the request length MUST be < 5000
    if len(data) > 5000:
        raise ValueError("request is too long, see "
                         "http://code.google.com/apis/language/translate/terms.html")

    # NOTE: use POST to allow more than 2K characters
    request = urllib2.Request(url, data,
                              headers={'X-HTTP-Method-Override': 'GET'})
    d = json.load(urllib2.urlopen(request))
    if u'error' in d:
        raise IOError(d)
    return map(itemgetter('detectedSourceLanguage'), d['data']['translations'])
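A hypothetical call, mirroring the example further down (with your API key stored in api_key.txt):

print detect_language_v2("Python - can I detect unicode string language code?",
                         api_key=open('api_key.txt').read().strip())
# expected: [u'en']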
Alternatively, you can request language detection explicitly, using the dedicated /detect endpoint:
def detect_language_v2(chunks, api_key):
    """
    chunks: either string or sequence of strings
    Return list of corresponding language codes
    """
    if isinstance(chunks, basestring):
        chunks = [chunks]

    url = 'https://www.googleapis.com/language/translate/v2/detect'
    data = urllib.urlencode(dict(
        q=[t.encode('utf-8') if isinstance(t, unicode) else t
           for t in chunks],
        key=api_key), doseq=True)

    # the request length MUST be < 5000
    if len(data) > 5000:
        raise ValueError("request is too long, see "
                         "http://code.google.com/apis/language/translate/terms.html")

    # NOTE: use POST to allow more than 2K characters
    request = urllib2.Request(url, data,
                              headers={'X-HTTP-Method-Override': 'GET'})
    d = json.load(urllib2.urlopen(request))
    return [sorted(L, key=itemgetter('confidence'))[-1]['language']
            for L in d['data']['detections']]
Example:
print detect_language_v2(
    ["Python - can I detect unicode string language code?",
     u"матрёшка",
     u"打水"], api_key=open('api_key.txt').read().strip())
Output
[u'en', u'ru', u'zh-CN']
In my case I only need to determine two languages, so I just check the first character:

import unicodedata

def is_greek(term):
    return 'GREEK' in unicodedata.name(term.strip()[0])

def is_hebrew(term):
    return 'HEBREW' in unicodedata.name(term.strip()[0])
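For instance, a quick sanity check (assuming the terms are unicode strings, which unicodedata.name() requires):

print is_greek(u'λόγος')    # True
print is_hebrew(u'שלום')    # True
print is_greek(u'hello')    # False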
Have a look at guess-language:
Attempts to determine the natural language of a selection of Unicode (utf-8) text.
But as the name says, it guesses the language. You can't expect 100% correct results.
Edit:
guess-language is unmaintained. But there is a fork (that supports Python 3): guess_language-spirit
Look at Natural Language Toolkit and Automatic Language Identification using Python for ideas.
I would like to know if a Bayesian filter can get language right but I can't write a proof of concept right now.
A useful article here suggests that this open source project named CLD is the best bet for detecting language in Python.
The article shows a comparison of speed and accuracy between 3 solutions:
language-detection or its python port langdetect
Tika
Chromium Language Detection (CLD)
I wasted my time with langdetect; now I am switching to CLD, which is 16x faster than langdetect and has 98.8% accuracy.
Try Universal Encoding Detector; it's a port of the chardet module from Firefox to Python.
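For illustration, a minimal sketch of typical chardet usage (the file name is hypothetical; note that chardet detects byte encodings, which only indirectly hints at the language):

import chardet

# chardet works on raw bytes, not decoded strings
raw = open('some_file.txt', 'rb').read()   # hypothetical input file
print(chardet.detect(raw))
# e.g. {'encoding': 'utf-8', 'confidence': 0.99, ...}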
If you only have a limited number of possible languages, you could use a set of dictionaries (possibly only including the most common words) of each language and then check the words in your input against the dictionaries.
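A minimal sketch of that idea, with tiny hypothetical stopword sets standing in for real dictionaries:

# Hypothetical mini-dictionaries of very common words per language;
# real ones would be much larger (e.g. the top few thousand words).
STOPWORDS = {
    'en': {'the', 'and', 'is', 'in', 'it', 'of', 'to'},
    'de': {'der', 'die', 'und', 'ist', 'in', 'von', 'zu'},
    'fr': {'le', 'la', 'et', 'est', 'dans', 'de', 'un'},
}

def guess_by_dictionary(text):
    words = set(text.lower().split())
    # score each language by how many of its common words appear
    scores = {lang: len(words & vocab) for lang, vocab in STOPWORDS.items()}
    return max(scores, key=scores.get)

print(guess_by_dictionary("the cat is in the garden"))   # en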
I'm a college student, and at my college, any homework you hand in has to have a standard cover page (with the college logo, course name, professor's name, my name, and bla bla bla).
So, I have a .tex document which generates my standard cover page PDFs. It goes something like:
...
\begin{document}
%% College logo
\vspace{5cm}
\begin{center}
\textbf{\huge "School and Program Name" \\}
\vspace{1cm}
\textbf{\Large "Homework Title" \\}
\vspace{1cm}
\textbf{\Large "Course Name" \\}
\end{center}
\vspace{2.5cm}
\begin{flushright}
{\large "My name" }
\end{flushright}
...
So, I was wondering if there's a way to make a Python script that asks me for the title of my homework, the course name and the rest of the strings, and uses them to generate the cover page. After that, it should compile the .tex file and generate the PDF with the given information.
Any opinions, advice, snippets, or libraries are welcome.
You can start by defining the template tex file as a string:
content = r'''\documentclass{article}
\begin{document}
...
\textbf{\huge %(school)s \\}
\vspace{1cm}
\textbf{\Large %(title)s \\}
...
\end{document}
'''
Next, use argparse to accept values for the course, title, name and school:
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--course')
parser.add_argument('-t', '--title')
parser.add_argument('-n', '--name',)
parser.add_argument('-s', '--school', default='My U')
A bit of string formatting is all it takes to stick the args into content:
args = parser.parse_args()
content%args.__dict__
After writing the content out to a file, cover.tex,
with open('cover.tex','w') as f:
    f.write(content%args.__dict__)
you could use subprocess to call pdflatex cover.tex.
proc = subprocess.Popen(['pdflatex', 'cover.tex'])
proc.communicate()
You could add an lpr command here too to add printing to the workflow.
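For example (assuming an lpr print spooler is available, as on most Unix-like systems):

# send the finished PDF straight to the default printer
subprocess.call(['lpr', 'cover.pdf'])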
Remove unneeded files:
os.unlink('cover.tex')
os.unlink('cover.log')
The script could then be called like this:
make_cover.py -c "Hardest Class Ever" -t "Theoretical Theory" -n Me
Putting it all together,
import argparse
import os
import subprocess
content = r'''\documentclass{article}
\begin{document}
... P \& B
\textbf{\huge %(school)s \\}
\vspace{1cm}
\textbf{\Large %(title)s \\}
...
\end{document}
'''
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--course')
parser.add_argument('-t', '--title')
parser.add_argument('-n', '--name',)
parser.add_argument('-s', '--school', default='My U')
args = parser.parse_args()
with open('cover.tex','w') as f:
    f.write(content%args.__dict__)
cmd = ['pdflatex', '-interaction', 'nonstopmode', 'cover.tex']
proc = subprocess.Popen(cmd)
proc.communicate()
retcode = proc.returncode
if retcode != 0:
    os.unlink('cover.pdf')
    raise ValueError('Error {} executing command: {}'.format(retcode, ' '.join(cmd)))
os.unlink('cover.tex')
os.unlink('cover.log')
There are of course templating systems like Jinja, but they're probably overkill for what you're asking. You can also format the page using RST and use that to generate LaTeX, but again that's probably overkill. Heck, auto-generating the page is probably overkill for the number of fields you've got to define, but since when did overkill stop us! :)
I've done something similar with Python's string formatting. Take your LaTeX document above and "tokenize" the file by placing %(placeholder_name1)s tokens into the document. For example, where you want your class name to go, use %(course_name)s
\textbf{\Large "%(homework_title)s" \\}
\vspace{1cm}
\textbf{\Large "%(course_name)s" \\}
Then, from Python, you can load in that template and format it as:
template = file('template.tex', 'r').read()
page = template % {'course_name' : 'Computer Science 500',
                   'homework_title' : 'NP-Complete'}
file('result.tex', 'w').write(page)
If you want to find those tokens automatically, the following should do pretty well:
import sys
import re
import subprocess

template = file('template.tex', 'r').read()

# find all %(token)s style placeholders in the template
pattern = re.compile(r'%\(([^)]+)\)[bcdeEfFgGnosxX%]')
tokens = pattern.findall(template)

token_values = dict()
for token in tokens:
    sys.stdout.write('Enter value for ' + token + ': ')
    token_values[token] = sys.stdin.readline().strip()

page = template % token_values
file('result.tex', 'w').write(page)

subprocess.call(['pdflatex', 'result.tex'])
The code will iterate across the tokens and print a prompt to the console asking you for an input for each token. In the above example, you'll get two prompts (with example answers):
Enter value for homework_title: NP-Complete
Enter value for course_name: Computer Science 500
The final line calls pdflatex on the resulting file and generates a PDF from it. If you want to go further, you could also ask the user for an output file name or take it as a command line option.
There is also a Template class in the string module (since Python 2.4) that lets you use $this style tokens instead of %(this)s style ones.
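A quick sketch of the same idea with string.Template (the field names are just illustrative; note that literal $ signs in your LaTeX would need to be doubled as $$):

from string import Template

tmpl = Template(r'\textbf{\Large $title \\} in $course')
page = tmpl.substitute(title='NP-Complete', course='Computer Science 500')
print(page)  # \textbf{\Large NP-Complete \\} in Computer Science 500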
There's a Python library exactly for that: PyLaTeX. The following code was taken directly from the documentation:
from pylatex import Document, Section, Subsection, Command
from pylatex.utils import italic, NoEscape
def fill_document(doc):
    """Add a section, a subsection and some text to the document.

    :param doc: the document
    :type doc: :class:`pylatex.document.Document` instance
    """
    with doc.create(Section('A section')):
        doc.append('Some regular text and some ')
        doc.append(italic('italic text. '))

        with doc.create(Subsection('A subsection')):
            doc.append('Also some crazy characters: $&#{}')


if __name__ == '__main__':
    # Basic document
    doc = Document('basic')
    fill_document(doc)

    doc.generate_pdf(clean_tex=False)
    doc.generate_tex()

    # Document with `\maketitle` command activated
    doc = Document()

    doc.preamble.append(Command('title', 'Awesome Title'))
    doc.preamble.append(Command('author', 'Anonymous author'))
    doc.preamble.append(Command('date', NoEscape(r'\today')))
    doc.append(NoEscape(r'\maketitle'))

    fill_document(doc)

    doc.generate_pdf('basic_maketitle', clean_tex=False)

    # Add stuff to the document
    with doc.create(Section('A second section')):
        doc.append('Some text.')

    doc.generate_pdf('basic_maketitle2', clean_tex=False)
    tex = doc.dumps()  # The document as string in LaTeX syntax
It's particularly useful for generating automatic reports or slides.