I'm using Tornado Webserver and the jQuery Webcam Plugin.
Everything is going fine except that I don't think i'm getting the raw data properly. I'm getting "FFD8FFE000104A46494600010100000100010000FFDB0084000503040404030504040405050506070C08070707070F0B0B090C110F1212110F111113161C1713141A1511111821181A1D1D1F1F1F13172224221E241C1E1F1E010505050706070E08080E1E1411141E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1E1" for my data.
frontend:
$("#camera").webcam({width: 320,
height: 240,
mode: "save",
swffile: "/static/js/jscam.swf",
onTick: function() {
alert('OnTick');},
onCapture: function() {
webcam.capture();
var x = webcam.save('/saveimage');
},
onDebug: function(type, string) {
alert('error');
alert(type + ": " + string);},
});
backend:
filecontent = self.request.body
f = open('static/studentphotos/'+ filename +'.jpg','w')
f.write(filecontent)
f.close()"
Using your data as x, notice the JFIF in the output from unhexlify:
In [88]: binascii.unhexlify(x[:-1])
Out[88]: '\xff\xd8\xff\xe0\x00\x10JFIF...'
So it appears the data is a JPEG that needs to be unhexlified. Therefore try:
import binascii
filecontent = self.request.body
with open('static/studentphotos/'+ filename +'.jpg','w') as f:
f.write(binascii.unhexlify(filecontent))
Related
I have html document embeded with pdf document in base64 encoded format. I like to extract the string and save it as pdf file. using below code to save it as pdf file.
but its on opening in adobe reader, saying invalid format. looking to fix this issue.
I think pdf file encoded using Javascript encodeURIComponent function. need to convert using Python.
sample embed tag
<embed type="application/pdf" src="data:application/pdf;base64,JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC9D">
Code
import base64
def decode_b64():
b64 = "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC9D"
buffer = BytesIO.BytesIO()
content = base64.b64decode(b64)
buffer.write(content)
with open(Path(Path.home(), 'Downloads', 'mytest.pdf'), "wb") as f:
f.write(buffer.getvalue())
if __name__ == "__main__":
decode_b64()
=== Update 1:
found the way to convert using JavaScript: It will be nice if we can port this code to Python.
const {readFileSync, writeFile, promises: fsPromises} = require('fs');
var data=readFileSync("pdf-file.html", 'utf-8')
var DOMParser = require('xmldom').DOMParser;
var parser = new DOMParser();
const virtualDoc = parser.parseFromString(data, 'text/html');
var elem = virtualDoc.getElementsByTagName('embed')[0];
for (var i = 0; i < elem.attributes.length; i++) {
var attrib = elem.attributes[i];
if (attrib.specified) {
if( attrib.name == "src") {
var result =attrib.value
result=result.replace('data:application/pdf;base64,','');
let buff = Buffer.from(decodeURIComponent(result), 'base64');
writeFile('pdf-file.pdf', buff, err => {
if (err) {
console.error(err);
}
});
}
}
}
This is a situation that you should have been able to chase down yourself. I wasn't 100% sure how Javascript encoded those two characters, so I wrote up a simple HTML page:
<script>
var s = "abcde++defgh//";
alert(encodeURIComponent(s));
</script>
When I ran that page, the result was "abcde%2B%2Bdefgh%2F%2F", and that is all the information you need to fix up those strings.
def decode_b64():
b64 = "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC9D......"
b64 = b64.replace('%2B','+').replace('%2F','/')
content = base64.b64decode(b64)
with open(Path(Path.home(), 'Downloads', 'mytest.pdf'), "wb") as f:
f.write(content)
The problem is I'm unable to access the information from config.json file to my python file
I have provided the JSON data and python code bellow
I have tried everything in the request module
but I can access the response without the config file but,
I need with config file
The following is a json file
{
"api_data": {
"request_url": "https://newapi.zivame.com/api/v1/catalog/list",
"post_data" : {"category_ids" : "948",
"limit" : "10000"},
"my_headers":{"Content-Type": "application/json"}
},
"redshift":{
"host":"XXX.XXXX.XXX",
"user":"XXXX",
"password":"XXXXXXXX",
"port": 8080,
"db":"XXXX"
},
"s3":{
"access_key":"XXXXXXXXX",
"secret_key":"XXXXXXXXXX",
"region":"XX-XXXXX-1",
"path":"XXXXXXXXXXXX/XXX",
"table":"XXXXXX",
"bucket":"XXXX",
"file": "XXXXXX",
"copy_column": "XXX",
"local_path": "XXXXX"
},
"csv_file": {
"promo_zivame": ""
}
}
and this is the program
#!/usr/bin/python
import json
import psycopg2
import requests
import os
BASE_PATH = os.path.dirname(os.path.realpath(__file__))
with open(BASE_PATH+'/config.json') as json_data_file:
data = json.load(json_data_file)
#api_config = data['api_data']
#redshift = data['redshift']
s3_config = data['s3']
#x = print(api_config.get('request_url'))
class ApiResponse:
#api response
def api_data(self, api_config):
print("starting api_data")
try:
self.ApiResponse = requests.post(api_config['request_url'], api_config['post_data'], api_config['my_headers'])
data_1 = self.ApiResponse
#data = json.dump(self.ApiResponse)
print("API Result Response")
print(())
print(self.ApiResponse)
return (self.ApiResponse)
except Exception:
print("response not found")
return False
def redshift_connect(self, redshift):
try:
# Amazon Redshift connect string
self.con = psycopg2.connect(
host=redshift['host'],
user=redshift['user'],
port=redshift['port'],
password=redshift['password'],
dbname=redshift['db'])
print(self.con)
return self.con
except Exception:
print("Error in Redshift connection")
return False
def main():
c1 = ApiResponse()
api_config = data['api_data']
redshift = data['redshift']
c1.api_data(api_config)
c1.api_data(data)
c1.redshift_connect(redshift)
if __name__=='__main__':
main()
Third argument to requests.post() is json. To provide headers, you need to use the name of the argument explicitly as #JustinEzequiel suggested. See the requests doc here: 2.python-requests.org/en/v1.1.0/user/quickstart/#custom-headers
requests.post(api_config['request_url'], json=api_config['post_data'], headers=api_config['my_headers'])
Borrowing code from https://stackoverflow.com/a/16696317/5386938
import requests
api_config = {
"request_url": "https://newapi.zivame.com/api/v1/catalog/list",
"post_data" : {"category_ids" : "948", "limit" : "10000"},
"my_headers":{"Content-Type": "application/json"}
}
local_filename = 'the_response.json'
with requests.post(api_config['request_url'], json=api_config['post_data'], headers=api_config['my_headers'], stream=True) as r:
r.raise_for_status()
with open(local_filename, 'wb') as f:
for chunk in r.iter_content(chunk_size=8192):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
saves the response into a file ('the_response.json') you can then pass around. Note the stream=True passed to requests.post
I have a django app, I received binary audio data from a javascript client, and I'm trying to send it to the google cloud speech to text API. The problem is that, python is not writing the binary audio data to a file. So I'm getting
with io.open(file_name, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '.........\\gcp_cloud\\blog\\audio_file.wav'
I replaced the first part of the path with ...........
Here is the client side code
rec.ondataavailable = e => {
audioChunks.push(e.data);
if (rec.state == "inactive"){
let blob = new Blob(audioChunks,{type:'audio/wav; codecs=MS_PCM'});
recordedAudio.src = URL.createObjectURL(blob);
recordedAudio.controls=true;
recordedAudio.autoplay=true;
sendData(blob)
}
}
and here is my sendData function
function sendData(data) {
let csrftoken = getCookie('csrftoken');
let response=fetch("/voice_request", {
method: "post",
body: data,
headers: { "X-CSRFToken": csrftoken },
})
console.log('got a response from the server')
console.log(response)
}
and here is the DJango view that handles the binary audio data from the client
def voice_request(request):
#print(request.body)
fw = open('audio_file.wav', 'wb')
fw.write(request.body)
file_name = os.path.join(current_folder, 'audio_file.wav')
#file_name = os.path.join(current_folder, 'Recording.m4a')
client = speech.SpeechClient()
# The language of the supplied audio
language_code = "en-US"
# Sample rate in Hertz of the audio data sent
sample_rate_hertz = 16000
encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
config = {
"language_code": language_code,
#"sample_rate_hertz": sample_rate_hertz,
#"encoding": 'm4a',
}
with io.open(file_name, "rb") as f:
content = f.read()
audio = {"content": content}
fw.close()
response = client.recognize(config, audio)
print('response')
print(response)
for result in response.results:
# First alternative is the most probable result
alternative = result.alternatives[0]
print(u"Transcript: {}".format(alternative.transcript))
return HttpResponse(response)
you should use with io.open(file_name, "wb") as f: or with io.open(file_name, "ab") as f:
because r mean read-only, w mean "rewrite"
you can refer: https://docs.python.org/3.7/library/functions.html?highlight=open#open
I've been having some trouble sending files via python's rest module. I can send emails without attachments just fine but as soon as I try and add a files parameter, the call fails and I get a 415 error.
I've looked through the site and found out it was maybe because I wasn't sending the content type of the files when building that array of data so altered it to query the content type with mimetypes; still 415.
This thread: python requests file upload made a couple of more edits but still 415.
The error message says:
"A supported MIME type could not be found that matches the content type of the response. None of the supported type(s)"
Then lists a bunch of json types e.g: "'application/json;odata.metadata=minimal;odata.streaming=true;IEEE754Compatible=false"
then says:
"matches the content type 'multipart/form-data; boundary=0e5485079df745cf0d07777a88aeb8fd'"
Which of course makes me think I'm still not handling the content type correctly somewhere.
Can anyone see where I'm going wrong in my code?
Thanks!
Here's the function:
def send_email(access_token):
import requests
import json
import pandas as pd
import mimetypes
url = "https://outlook.office.com/api/v2.0/me/sendmail"
headers = {
'Authorization': 'Bearer '+access_token,
}
data = {}
data['Message'] = {
'Subject': "Test",
'Body': {
'ContentType': 'Text',
'Content': 'This is a test'
},
'ToRecipients': [
{
'EmailAddress':{
'Address': 'MY TEST EMAIL ADDRESS'
}
}
]
}
data['SaveToSentItems'] = "true"
json_data = json.dumps(data)
#need to convert the above json_data to dict, otherwise it won't work
json_data = json.loads(json_data)
###ATTACHMENT WORK
file_list = ['test_files/test.xlsx', 'test_files/test.docx']
files = {}
pos = 1
for file in file_list:
x = file.split('/') #seperate file name from file path
files['file'+str(pos)] = ( #give the file a unique name
x[1], #actual filename
open(file,'rb'), #open the file
mimetypes.MimeTypes().guess_type(file)[0] #add in the contents type
)
pos += 1 #increase the naming iteration
#print(files)
r = requests.post(url, headers=headers, json=json_data, files=files)
print("")
print(r)
print("")
print(r.text)
I've figured it out! Took a look at the outlook API documentation and realised I should be adding attachments as encoded lists within the message Json, not within the request.post function. Here's my working example:
import requests
import json
import pandas as pd
import mimetypes
import base64
url = "https://outlook.office.com/api/v2.0/me/sendmail"
headers = {
'Authorization': 'Bearer '+access_token,
}
Attachments = []
file_list = ['test_files/image.png', 'test_files/test.xlsx']
for file in file_list:
x = file.split('/') #file the file path so we can get it's na,e
filename = x[1] #get the filename
content = open(file,'rb') #load the content
#encode the file into bytes then turn those bytes into a string
encoded_string = ''
with open(file, "rb") as image_file:
encoded_string = base64.b64encode(image_file.read())
encoded_string = encoded_string.decode("utf-8")
#append the file to the attachments list
Attachments.append({
"#odata.type": "#Microsoft.OutlookServices.FileAttachment",
"Name": filename,
"ContentBytes": encoded_string
})
data = {}
data['Message'] = {
'Subject': "Test",
'Body': {
'ContentType': 'Text',
'Content': 'This is a test'
},
'ToRecipients': [
{
'EmailAddress':{
'Address': 'EMAIL_ADDRESS'
}
}
],
"Attachments": Attachments
}
data['SaveToSentItems'] = "true"
json_data = json.dumps(data)
json_data = json.loads(json_data)
r = requests.post(url, headers=headers, json=json_data)
print(r)
Trying to collect data on book price fluctuations for a school project. I'm using Python to scrape from a book buyback aggregator (in this case, bookscouter), but I find that since the site has to load in the data, grabbing the source code through the urllib2 package gives me the source code from before the data is loaded. How do I pull from after the data is loaded?
Example: http://bookscouter.com/prices.php?isbn=9788498383621&searchbutton=Sell
You cannot this with Python only. You need a JavaScript engine API like PhantomJS
With Phantom, will be very easy to setup the web scraping of all the page contents, static and dynamic JavaScript contents (like Ajax calls results in your case). Infact you can register page event handlers to your page parser like (this is a node.js + phantom.js example)
/*
* Register Page Handlers as functions
{
onLoadStarted : onLoadStarted,
onLoadFinished: onLoadFinished,
onError : onError,
onResourceRequested : onResourceRequested,
onResourceReceived : onResourceReceived,
onNavigationRequested : onNavigationRequested,
onResourceError : onResourceError
}
*/
registerHandlers : function(page, handlers) {
if(handlers.onLoadStarted) page.set('onLoadStarted',handlers.onLoadStarted)
if(handlers.onLoadFinished) page.set('onLoadFinished',handlers.onLoadFinished)
if(handlers.resourceError) page.set('onResourceError', handlers.resourceError)
if(handlers.onResourceRequested) page.set('onResourceRequested',handlers.onResourceRequested)
if(handlers.onResourceReceived) page.set('onResourceReceived',handlers.onResourceReceived)
if(handlers.onNavigationRequested) page.set('onNavigationRequested',handlers.onNavigationRequested)
if(handlers.onError) page.set('onError',handlers.onError)
}
At this point you have full control of what is going on and when in the page you have to download like:
var onResourceError = function(resourceError) {
var errorReason = resourceError.errorString;
var errorPageUrl = resourceError.url;
}
var onResourceRequested = function (request) {
var msg = ' request: ' + JSON.stringify(request, undefined, 4);
};
var onResourceReceived = function(response) {
var msg = ' id: ' + response.id + ', stage: "' + response.stage + '", response: ' + JSON.stringify(response);
};
var onNavigationRequested = function(url, type, willNavigate, main) {
var msg = ' destination_url: ' + url;
msg += ' type (cause): ' + type;
msg += ' will navigate: ' + willNavigate;
msg += ' from page\'s main frame: ' + main;
};
page.onResourceRequested(
function(requestData, request) {
//request.abort()
//request.changeUrl(url)
//request.setHeader(key,value)
var msg = ' request: ' + JSON.stringify(request, undefined, 4);
//console.log( msg )
},
function(requestData) {
//console.log(requestData.url)
})
PageHelper.registerHandlers(page,
{
onLoadStarted : onLoadStarted,
onLoadFinished: onLoadFinished,
onError : null, // onError THIS HANDLER CRASHES PHANTOM-NODE
onResourceRequested : null, // MUST BE ON PAGE OBJECT
onResourceReceived : onResourceReceived,
onNavigationRequested : onNavigationRequested,
onResourceError : onResourceError
});
As you can see you can define you page handlers and take control of the flow and so of the resources loaded on that page. So you can be sure that all data are ready and set, before you take the whole page source like:
var Parser = {
parse : function(page) {
var onSuccess = function (page) { // page loaded
var pageContents=page.evaluate(function() {
return document.body.innerText;
});
}
var onError = function (page,elapsed) { // error
}
page.evaluate(function(func) {
return func(document);
}, function(dom) {
return true;
});
}
} // Parser
Here you can see the whole page contents loaded in the onSuccess callback:
var pageContents=page.evaluate(function() {
return document.body.innerText;
});
The page comes from Phantomjs directly like in the following snippet:
phantom.create(function (ph) {
ph.createPage(function (page) {
Parser.parse(page)
})
},options)
Of course this to give you and idea of what you can do with node.js + Phantomjs, that are super powerful when combined together.
You can run phantomjs in a Python env, calling it like
try:
output = ''
for result in runProcess([self.runProcess,
self.runScript,
self.jobId,
self.protocol,
self.hostname,
self.queryString]):
output += '' + result
print output
except Exception as e:
print e
print(traceback.format_exc())
where you use subprocess Popen to execute the binary:
def runProcess(exe):
p = subprocess.Popen(exe, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
while(True):
retcode = p.poll() #returns None while subprocess is running
line = p.stdout.readline()
yield line
if(retcode is not None):
break
Of course the process to run is node.js in this case
self.runProcess='node'
with the args you need as params.
The challenge is reading the data once its been rendered by a web browser, which will require some extra tricks to do. If you can see if the site has a pre-rendered version* or an API.
This article (linked from the Web archive) has a pretty good breakdown of what you'll need to do. It can be summed up however as:
Pick a good python-webkit renderer (in the case of the article PyQT)
Use a windowing widget to fetch and render the page
Fetch the rendered HTML from the widget
Parse this HTML as normal using a library like lXML or BeautifulSoup.
* Minor rant - the idea of having to hope for a pre-rendered version ofwhat should be a static webpage angers me.