I'm trying to use the Google Speech API in Python. I load a .flac file like this:
url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"
audio = open('temp_voice.flac','rb').read()
headers = {'Content-Type': 'audio/x-flac; rate=44100', 'User-Agent':'Mozilla/5.0'}
req = urllib2.Request(url, data=audio, headers=headers)
resp = urllib2.urlopen(req)
system("rm temp_voice.wav; rm temp_voice.flac")
print resp.read()
Output:
{"status":0,"id":"","hypotheses":[{"utterance":"Today is Wednesday","confidence":0.75135982}]}
Can someone please teach me how I can extract and save the text "Today is Wednesday" as a variable and print it?
You can use json.loads to convert the JSON data to a dict, like this:
data = '{"status":0,"id":"","hypotheses":[{"utterance":"Today is Wednesday","confidence":0.75135982}]}'
import json
data = json.loads(data)
print data["hypotheses"][0]["utterance"]
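A slightly more defensive version of the same idea, as a sketch in Python 3 syntax. The helper name extract_utterance is made up for illustration, and it assumes the response always has the shape shown in the question, possibly with an empty hypotheses list:

```python
import json

def extract_utterance(raw):
    """Return the first utterance found in the JSON text, or None if there is none."""
    data = json.loads(raw)
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    return hypotheses[0].get("utterance")

# Sample response copied from the question
sample = '{"status":0,"id":"","hypotheses":[{"utterance":"Today is Wednesday","confidence":0.75135982}]}'
text = extract_utterance(sample)
print(text)
```

Returning None instead of raising on an empty list is only one possible choice; raising an exception may suit a script that cannot continue without a result.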
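If the response arrives as a string, you can eval it into a dictionary (for safety, prefer literal_eval from the ast library). Note that this only works while the JSON happens to be valid Python too: true, false and null make both fail, so json.loads is the more robust choice: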
>>> d=eval('{"status":0,"id":"","hypotheses":[{"utterance":"Today is Wednesday","confidence":0.75135982}]}')
>>> d
{'status': 0, 'hypotheses': [{'confidence': 0.75135982, 'utterance': 'Today is Wednesday'}], 'id': ''}
>>> h=d.get('hypotheses')
>>> h
[{'confidence': 0.75135982, 'utterance': 'Today is Wednesday'}]
>>> for i in h:
... print i.get('utterance')
...
Today is Wednesday
Of course, if it is already a dictionary you do not need to evaluate it. Check with print type(response), where response is the result you are getting.
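To see where evaluating the string stops working, here is a small sketch in Python 3 syntax. The sample string is made up for illustration; it only needs to contain the JSON literals true and null, which are not Python literals:

```python
import ast
import json

# Hypothetical JSON text containing literals that are not valid Python
raw = '{"ok": true, "value": null}'

# json.loads maps true/null to True/None
parsed = json.loads(raw)

# ast.literal_eval rejects the same text
try:
    ast.literal_eval(raw)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False

print(parsed, literal_eval_ok)
```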
Retrieving the output is a bit more complicated than it looks. At first, resp is an instance (a file-like object), not a string, although the printed output looks like dictionary -> list -> dictionary. If you assign resp.read() to a new variable after you have already read it once, you get a string of length 0. This happens because the response body is consumed by the first read() (for example, the one inside print). Therefore the JSON has to be decoded from the first read() of the Google API response, as follows:
resp = urllib2.urlopen(req)
text = json.loads(resp.read())["hypotheses"][0]["utterance"]
Works like a charm in my case ;)
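The read-once behaviour can be reproduced without any network access. This is a sketch in Python 3 syntax that uses io.BytesIO as a stand-in for the response object (an assumption for illustration; the real object comes from urlopen):

```python
import io
import json

# Stand-in for the HTTP response: a byte stream that can be read only once
fake_resp = io.BytesIO(
    b'{"status":0,"id":"","hypotheses":[{"utterance":"Today is Wednesday","confidence":0.75135982}]}'
)

first = fake_resp.read()   # consumes the whole body
second = fake_resp.read()  # nothing left to read

# Decode from the first read, which is the only one that holds data
text = json.loads(first.decode("utf-8"))["hypotheses"][0]["utterance"]
print(text, len(second))
```

Storing the result of the first read() in a variable lets you both print it and parse it.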
I have been trying to figure out how to use python-requests to send a request whose URL looks like:
http://example.com/api/add.json?name='hello'&data[]='hello'&data[]='world'
Normally I can build a dictionary and do:
data = {'name': 'hello', 'data': 'world'}
response = requests.get('http://example.com/api/add.json', params=data)
That works fine for most everything that I do. However, I have hit the url structure from above, and I am not sure how to do that in python without manually building strings. I can do that, but would rather not.
Is there something in the requests library I am missing or some python feature I am unaware of?
Also what do you even call that type of parameter so I can better google it?
All you need to do is put the values in a list and use a key that ends in [], as a plain string:
data = {'name': 'hello', 'data[]': ['hello', 'world']}
response = requests.get('http://example.com/api/add.json', params=data)
What you are doing is correct. The resulting URL is the same as what you expect.
>>> payload = {'name': 'hello', 'data': 'hello'}
>>> r = requests.get("http://example.com/api/params", params=payload)
You can see the resulting URL:
>>> print(r.url)
http://example.com/api/params?name=hello&data=hello
According to the URL format, encoding the query string uses the following rules:
Letters (A–Z and a–z), numbers (0–9) and the characters ., -, ~ and _ are left as-is
SPACE is encoded as + or %20
All other characters are encoded as %HH hex representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)
So data[] will not appear as written; the brackets are automatically percent-encoded according to these rules.
If you try to build a URL like:
http://example.com/api/add.json?name='hello'&data[]='hello'&data[]='world'
the output will be:
>>> payload = {'name': 'hello', "data[]": 'hello','data[]':'world'}
>>> r = requests.get("http://example.com/api/params", params=payload)
>>> r.url
u'http://example.com/api/params?data%5B%5D=world&name=hello'
This is because a duplicated key in a dict keeps only its last value, and data[] is percent-encoded as data%5B%5D.
If data%5B%5D is not a problem (i.e. the server is able to parse it correctly), then you can go ahead with it.
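Both effects can be checked with the standard library alone. A sketch in Python 3 syntax (urllib.parse is the Python 3 home of these functions):

```python
from urllib.parse import quote, quote_plus, unquote

# A dict literal with a repeated key keeps only the last value
payload = {'name': 'hello', 'data[]': 'hello', 'data[]': 'world'}

# Square brackets are percent-encoded, and decode back to the same key
encoded_key = quote('data[]')
decoded_key = unquote(encoded_key)

# A space becomes + with quote_plus and %20 with quote
space_plus = quote_plus('hello world')
space_pct = quote('hello world')

print(payload, encoded_key, decoded_key, space_plus, space_pct)
```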
One solution, if using the requests module is not compulsory, is the urllib/urllib2 combination:
payload = [('name', 'hello'), ('data[]', ('hello', 'world'))]
params = urllib.urlencode(payload, doseq=True)
sampleRequest = urllib2.Request('http://example.com/api/add.json?' + params)
response = urllib2.urlopen(sampleRequest)
It's a little more verbose and uses the doseq (sequence) trick to encode the URL parameters, but I used it before I knew about the requests module.
For the requests module, the answer provided by @Tomer should work.
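In Python 3 the same function lives in urllib.parse. A sketch of the doseq encoding without sending a request, reusing the example.com URL from the question:

```python
from urllib.parse import urlencode, parse_qs

# Same payload as above: a tuple value is expanded into repeated keys
payload = [('name', 'hello'), ('data[]', ('hello', 'world'))]
query = urlencode(payload, doseq=True)
url = 'http://example.com/api/add.json?' + query

# Parsing the query back shows both values under the one key
parsed = parse_qs(query)
print(url)
print(parsed)
```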
Some API servers expect a JSON array as a value in the URL query string. The requests params argument doesn't create a JSON array as a parameter value.
The way I fixed this on a similar problem was to use urllib.parse.urlencode to encode the query string, append it to the URL and pass the result to requests, e.g.:
import requests
from urllib.parse import urlencode
query_str = urlencode(params)
# base_url is the endpoint URL (it is not shown in the original snippet)
url = base_url + "?" + query_str
response = requests.get(url, headers=headers)
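For the JSON-array case specifically, one possible approach is to serialise the list with json.dumps before encoding. This is only a sketch: the parameter names are borrowed from the question, and whether a given server accepts this form is an assumption to verify against its documentation:

```python
import json
from urllib.parse import urlencode, parse_qs

# Serialise the list to JSON text so it travels as a single parameter value
params = {'name': 'hello', 'data': json.dumps(['hello', 'world'])}
query_str = urlencode(params)

# Round trip: decode the query string and parse the JSON value again
round_trip = json.loads(parse_qs(query_str)['data'][0])
print(query_str)
print(round_trip)
```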
The solution is simply to use the well-known urlencode function:
>>> import urllib.parse
>>> params = {'q': 'Python URL encoding', 'as_sitesearch': 'www.urlencoder.io'}
>>> urllib.parse.urlencode(params)
'q=Python+URL+encoding&as_sitesearch=www.urlencoder.io'
I have data from the ngrok tunnel API (http://127.0.0.1//api/tunnels) and I want to print only the 'public_url' value ('https://.....ngrok.io') from it. The data looks like this:
{'tunnels': [{'name': 'command_line', 'uri': '/api/tunnels/command_line', 'public_url': 'https://a28e4c77.ngrok.io', 'proto': 'https', 'config': {'addr': 'http://localhost:80', 'inspect': True}....Something more
This is only part of the data.
I used this code to collect it:
import requests
url = "http://127.0.0.1:4040/api/tunnels"
r = requests.get(url)
data = r.json()
I have also saved this into ngrok.txt, but I have absolutely no idea how to find the value. To write the data I use this code:
import requests
url = "http://127.0.0.1:4040/api/tunnels"
r = requests.get(url)
data = r.json()
f = open('ngrok.txt', 'w')
f.write(data)
f.close()
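If your data is still a JSON string (for example r.text, or the text read back from ngrok.txt), convert it to a Python object with the loads() function from the json library. Note that r.json() already returns a parsed dict, so in that case skip json.loads and index data directly.
Here is the code for your example: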
import json
json.loads(data)["tunnels"][0]["public_url"]
json.loads(data) converts a string to a Python object
["tunnels"] gets the value associated with the key "tunnels"
The resulting value is a list, so you need to get the first element with [0]
Finally, ["public_url"] gets the URL
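Put together as a runnable sketch in Python 3 syntax. The raw string below is a trimmed, hand-written stand-in for the ngrok response (only fields visible in the question are included), not the full output:

```python
import json

# Trimmed sample of the JSON text, based on the fields shown in the question
raw = (
    '{"tunnels": [{"name": "command_line", '
    '"uri": "/api/tunnels/command_line", '
    '"public_url": "https://a28e4c77.ngrok.io", "proto": "https"}]}'
)

data = json.loads(raw)                          # str -> dict
first_url = data["tunnels"][0]["public_url"]    # first tunnel only

# If there may be several tunnels, collect every public_url
all_urls = [tunnel["public_url"] for tunnel in data["tunnels"]]

print(first_url)
```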
I am unable to parse JSON data using Python.
A web page URL returns JSON data:
import requests
import json
BASE_URL = "https://www.codechef.com/api/ratings/all"
data = {'page': page, 'sortBy':'global_rank', 'order':'asc', 'itemsPerPage':'40' }
r = requests.get(BASE_URL, data = data)
receivedData = (r.text)
print ((receivedData))
When I printed this I got a large block of text, and when I validated it using https://jsonlint.com/ it showed VALID JSON.
Later I used
import requests
import json
BASE_URL = "https://www.codechef.com/api/ratings/all"
data = {'page': page, 'sortBy':'global_rank', 'order':'asc', 'itemsPerPage':'40' }
r = requests.get(BASE_URL, data = data)
receivedData = (r.text)
print (json.loads(receivedData))
When I validated the large printed text using https://jsonlint.com/ it showed INVALID JSON
Even if I don't print it and use the data directly, it is not working properly, so I am sure that even internally it is not loading correctly.
Is Python unable to parse the text to JSON properly?
In short, json.loads converts JSON text (an object, an array, whatever) into a Python object, in this case a dict. When you print that dict, Python shows its own representation, which uses single quotes, so the printed text is not valid JSON.
Effectively your code can be expanded:
some_dictionary = json.loads(a_string_which_is_a_json_object)
print(some_dictionary)
To make sure that what you print is valid JSON, re-encode it with json.dumps.
When you use python's json.loads(text) it returns a python dictionary. When you print that dictionary out it is not in json format.
If you want a json output you should use json.dumps(json_object).
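The difference is easy to see on a small input. A sketch in Python 3 syntax; the field names in the sample are invented for illustration and are not taken from the CodeChef response:

```python
import json

# Hypothetical JSON text standing in for the API response
text = '{"page": 1, "list": [{"name": "x", "ok": true}]}'
obj = json.loads(text)

as_repr = str(obj)         # what print(obj) shows: single quotes, True
as_json = json.dumps(obj)  # valid JSON again: double quotes, true

# The repr form cannot be parsed as JSON
try:
    json.loads(as_repr)
    repr_is_json = True
except ValueError:
    repr_is_json = False

print(as_repr)
print(as_json)
```

The parsed object itself is fine in both cases; only its printed form differs.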
I was wondering how to use the requests library to pull the text from a field in a Json? I wouldn't need beautiful soup for that right?
If your response is indeed JSON, you can simply use the .json() method of the requests response to access the fields, for example:
import requests
url = 'http://time.jsontest.com/'
r = requests.get(url)
# use .json() for json response data
r.json()
{u'date': u'03-28-2015',
u'milliseconds_since_epoch': 1427574682933,
u'time': u'08:31:22 PM'}
# to access the field
r.json()['date']
u'03-28-2015'
This will automatically parse the json response into Python's dictionary:
type(r.json())
dict
You can read more about response.json in the requests documentation.
Alternatively just use Python's json module:
import json
d = json.loads(r.content)
print d['date']
03-28-2015
type(d)
dict
What version of Python are you using? From 2.6 onwards you can do this:
import json
# file_directory is the path to the JSON file; the with block closes it afterwards
with open(file_directory) as f:
    data = json.load(f)
print(data)
When I send some data to the host:
r = urllib2.Request(url, data = data, headers = headers)
page = urllib2.urlopen(r)
soup = BeautifulSoup(page.read(), fromEncoding="cp-1251")
print page.read()
I get something like this:
[{"command":"settings","settings":{"basePath":"\/","ajaxPageState":{"theme":"spsr","theme_token":"kRHUhchUVpxAMYL8Y8IoyYIcX0cPrUstziAi8gSmMYk","css":[]},"ajax":{"edit-submit":{"callback":"spsr_calculator_form_ajax","wrapper":"calculator_form","method":"replaceWith","event":"mousedown","keypress":true,"url":"\/ru\/system\/ajax","submit":{"_triggering_element_name":"submit"}}}},"merge":true},{"command":"insert","method":null,"selector":null,"data":"\u003cdiv id=\"calculator_form\"\u003e\u003cform action=\"\/ru\/service\/calculator\" method=\"post\" id=\"spsr-calculator-form\" accept-charset=\"UTF-8\"\u003e\u003cdiv\u003e\u003cinput id=\"edit-from-ship-region-id\" type=\"hidden\" name=\"from_ship_region_id\" value=\"\" \/\u003e\n\u003cinput type=\"hidden\" name=\"form_build_id\" value=\"form-0RK_WFli4b2kUDTxpoqsGPp14B_0yf6Fz9x7UK-T3w8\" \/\u003e\n\u003cinput type=\"hidden\" name=\"form_id\" value=\"spsr_calculator_form\" \/\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"bg_p\"\u003e \n\u0421\u0435\u0439\u0447\u0430\u0441 \u0412\u044b... bla bla bla
but I want something like this:
<html><h1>bla bla bla</h1></html>
How can I do it?
The answer you are getting is very likely encoded as JSON. If so, using BeautifulSoup doesn't make sense (it is an HTML/XML parser); for JSON data you need a JSON parser. Calling page.read() twice doesn't make sense either, since the first call consumes the response and the second returns nothing useful.
Rewriting your request part we get:
r = urllib2.Request(url, data = data, headers = headers)
page = urllib2.urlopen(r)
data = page.read()
Now, instead of an HTML parser, we need to use a JSON parser. This can be done with the json library (in Python since 2.6):
import json
decoded_data = json.loads(data)
Now just locate the part of the structure you want to extract. Going by your example, and given that you want to print out the section with "blabla", you can write:
result = unicode(decoded_data[1][u'data'])
For debugging try:
print result
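The same steps as a self-contained sketch in Python 3 syntax. The raw string is a shortened, hand-written stand-in for the server response (an assumption for illustration), and the entry is selected by its "command" value rather than by position, in case the order of the list changes:

```python
import json

# Shortened stand-in for the response body; a raw string keeps the JSON escapes intact
raw = (
    r'[{"command":"settings","merge":true},'
    r'{"command":"insert","method":null,"selector":null,'
    r'"data":"\u003chtml\u003e\u003ch1\u003ebla bla bla\u003c\/h1\u003e\u003c\/html\u003e"}]'
)

decoded_data = json.loads(raw)

# Pick the entry whose command is "insert"; its data field holds the HTML
html = next(item["data"] for item in decoded_data if item.get("command") == "insert")
print(html)
```

The JSON parser turns the \u003c, \u003e and \/ escapes back into <, > and /, so the resulting string is plain HTML that can then be handed to BeautifulSoup if needed.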