I am doing a POST request to an API that returns an Excel file.
When I try the process without Python, in Postman, it works just fine: I see the garbled output, but if I click on Save response and Save to a file, it saves an xlsx file that I can open without problems.
When I try to do the same in Python, I can also print the (garbled) response, but I do not manage to save the file as something that I can open.
First part of code (runs without issue):
import requests
for i in range(1, 3):
    url = "myurl"
    payload = {}
    headers = {}
    response = requests.request("POST", url, headers=headers, data=payload)
And now for the crucial part of the code.
If I do A:
with open('C:\\Users\\mypath\\exportdata.xlsx', "w") as o:
    o.write(response.text)
print(response.text)
...then I get this error when I run the code:
File "C:\Users\Username\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 10-11: character maps to <undefined>
If I do B:
with open('C:\\Users\\mypath\\exportdata.xlsx', "w", encoding="utf-8") as o:
    o.write(response.text)
print(response.text)
...then the code runs without error, but I get an extension/format error in Excel when I open the file.
How do I save the Excel file with Python so that I can open and view it correctly afterwards?
This is not a standard text/CSV-to-Excel conversion issue: you can see from the garbled output that all the XML hallmarks of an Excel file are there.
An Excel file is binary, not text. Use response.content (the raw bytes) instead of response.text, and open the file in binary mode:
with open(filename, "wb") as o:
    o.write(response.content)
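To see why variant B runs but produces a file Excel rejects, here is a minimal sketch that needs no network access. The bytes in `raw` are made up for illustration, and the decode step only approximates what `response.text` does (requests chooses its own encoding). The point is that decoding arbitrary binary data to text and encoding it again does not give back the original bytes.

```python
# Hypothetical payload: the ZIP signature that starts every .xlsx file,
# followed by a few bytes that are not valid UTF-8.
raw = b"PK\x03\x04\x14\x00\xff\xfe\x80\x81"

# Roughly what happens on the text path: the bytes are decoded to str
# (undecodable bytes become U+FFFD) and encoded again when written in "w" mode.
as_text = raw.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

print(round_tripped == raw)  # False: the data was altered on the way
```

Writing `response.content` in "wb" mode skips both steps, so the file on disk matches what the server sent.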
Related
My Jupyter notebook saves a styled DataFrame to an Excel file. I then create a link to download this file:
df.to_excel('ABC.xlsx', index=True)
filename = 'ABC.xlsx'
file_link = "<a href='{href}' download='ABC.xlsx'> Download ABC.xlsx</a>"
html = HTML(file_link.format(href=filename))
display(html)
But when I click the link "Download ABC.xlsx", I get: Failed: Network error.
By contrast, downloading a CSV file the same way works fine.
Here is the CSV code. It adds some base64 encoding, without which the CSV download does not work either:
def func(df, title="Download csv file", filename="ABC.csv"):
    csv = df.to_csv(index=True)
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = "{title}"
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)
I tried editing this function for an Excel file:
def func(df, title="Download excel file", filename="ABC.xlsx"):
    xls = df.to_excel("xyz.xlsx", index=True)
    b64 = base64.b64encode(xls.encode())
    payload = b64.decode()
    html = "{title}"
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)
For the Excel code it gives this error: 'NoneType' object has no attribute 'encode'
In your CSV code you call csv = df.to_csv(index=True). According to the docs:
If path_or_buf is None, returns the resulting csv format as a string.
Otherwise returns None.
Here you didn't specify path_or_buf, so the return value is the CSV content. This is why the CSV download works.
The to_excel docs, on the other hand, don't mention any return value: the method writes the file and returns None, so xls is None and xls.encode() fails.
To solve this, you can open the written file again and read it as a base64 string:
def file_to_base64(file):
    # file should be the actual file name you wrote
    with open(file, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    return encoded_string.decode()
Replace the two lines
b64 =base64.b64encode(xls.encode())
payload=b64.decode()
with:
payload = file_to_base64("xyz.xlsx")  # the file name passed to df.to_excel
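The link template itself is not visible in the code above, so the following is only a sketch of how the Excel bytes could be embedded in a download link. It assumes a base64 data URI is the mechanism intended; the function name `excel_download_link` and its defaults are hypothetical, and the file name and title are inserted without HTML escaping.

```python
import base64

# MIME type for .xlsx workbooks.
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def excel_download_link(data: bytes, filename: str = "ABC.xlsx",
                        title: str = "Download excel file") -> str:
    """Return an <a> tag that embeds the workbook bytes as a base64 data URI."""
    payload = base64.b64encode(data).decode("ascii")
    return (
        f'<a download="{filename}" '
        f'href="data:{XLSX_MIME};base64,{payload}">{title}</a>'
    )


# Usage sketch inside a notebook, after df.to_excel("xyz.xlsx") has run:
# with open("xyz.xlsx", "rb") as f:
#     display(HTML(excel_download_link(f.read())))
```

Because the whole workbook travels inside the page, this approach suits small files; large workbooks make the notebook output correspondingly large.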
I am trying to use an OCR API with Python to convert PDF to text. The API I'm using is https://www.convertapi.com/pdf-to-txt . When I upload the file through the website it works perfectly, but the API call has the following issue:
Python code:
import requests
url ='https://v2.convertapi.com/convert/pdf/to/txt?Secret=mykey'
files = {'file': open(r'C:\<some_url>\filename.pdf', 'rb')}  # raw string, so \f is not read as an escape
r = requests.post(url, files=files)
The API call works fine, but when I try to access the response through
r.text
it returns gibberish (notice the FileData section):
'{"ConversionCost":4,"Files":[{"FileName":"stateoftheartKWextraction.txt","FileExt":"txt","FileSize":60179,"FileData":"QXV0b21hdGljIEtleXBocmFzZSBFeHRyYWN0aW9uOiBBIFN1cnZleSBvZiB0aGUgU3RhdGUgb2YgdGhlIEFydA0KDQpLYXppIFNhaWR1bCBIYXNhbiAgYW5kICBWaW5jZW50IE5nDQpIdW1hbiBMYW5ndWFnZSBUZWNobm9sb2d5IFJlc2VhcmNoIEluc3RpdHV0ZSBVbml2ZXJzaXR5IG9mIFRleGFzIGF0IERhbGxhcyBSaWNoYXJkc29uLCBUWCA3NTA4My0wNjg4DQp7c2FpZHVsLHZpbmNlfUBobHQudXRkYWxsYXMuZW...
Even if I use json.loads to convert it into a dict, it still prints the text as gibberish.
I've tried uploading the file in non-binary mode, but that doesn't work (it throws an exception).
I've tried many PDF files, all of them in English.
Thank you.
The text is Base64-encoded, so you need to decode it. Let's take the first file as an example.
import base64
r = r.json()
text = r['Files'][0]['FileData']
print(base64.b64decode(text))
By the way, they seem to have a Python library as well, you might want to check that out: https://github.com/ConvertAPI/convertapi-python
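As a self-contained illustration of the decoding step, the sketch below runs against a made-up response body with the same shape as the one in the question, so it needs no API key. The helper name `decode_files` is hypothetical, and treating the decoded bytes as UTF-8 text is an assumption about what the service returns.

```python
import base64
import json

# Made-up response body in the same shape as the one shown in the question.
sample_body = json.dumps({
    "ConversionCost": 1,
    "Files": [{
        "FileName": "example.txt",
        "FileExt": "txt",
        "FileData": base64.b64encode("Hello, world\r\n".encode("utf-8")).decode("ascii"),
    }],
})


def decode_files(body: str) -> dict:
    """Map each FileName to its Base64-decoded bytes."""
    parsed = json.loads(body)
    return {f["FileName"]: base64.b64decode(f["FileData"]) for f in parsed["Files"]}


for name, data in decode_files(sample_body).items():
    # Assumption: the converted text is UTF-8; adjust if the API says otherwise.
    print(name, repr(data.decode("utf-8")))
```

With a real response, `decode_files(r.text)` would take the place of `decode_files(sample_body)`.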
I want to download the config file from my router via web scraping. The procedure I want to achieve is this:
Save the config file to disk
Send a factory reset
Load the config file previously downloaded.
So far, I have this code:
with requests.Session() as s:  # To log in to the modem
    pagePostBackUp = 'https://192.168.1.1/goform/BackUp'
    s.post(urlLogin, data=loginCredentials, verify=False, timeout=5)
    dataBackUp = {'dir': 'admin/', 'file': 'cmconfig.cfg'}
    resultBackUp = s.post(pagePostBackUp, data=dataBackUp, verify=False, timeout=10)
    print(resultBackUp.text)
The output of the last line is what I want to save to a file. But when I try to do it with this code:
f = open('/Users/user/Desktop/file.cfg', 'w')
it throws an error saying the ascii codec can't encode a character. If I save the file with, for example, encoding='utf16', the result differs from the file I download manually.
So the question is: how can I save this file with the same encoding the router gives me via the web (as Unicode)? The content of the file looks like this:
�����g���m��� ������Z������ofpqJ
U\V,.o/����zf��v���~W3=,�D};y�tL�cJ
Change the last line of your code to the following:
with open('/Users/user/Desktop/file.cfg', 'wb') as f:
    f.write(resultBackUp.content)
This will treat the payload as data (bytes), not text: the file is opened in binary mode, and the content is taken as-is.
There's no encoding/decoding happening.
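One way to confirm that the saved file really matches the manually downloaded one is to compare checksums. This is a sketch using only the standard library; the helper names `sha256_of` and `same_bytes` are made up for illustration, and the path of the manual download in the usage line is a placeholder.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a: str, path_b: str) -> bool:
    """True when both files contain exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage sketch (second path is hypothetical):
# same_bytes('/Users/user/Desktop/file.cfg', '/Users/user/Downloads/cmconfig.cfg')
```

If the router embeds a timestamp or counter in each backup, two downloads can differ even when both were saved correctly, so a mismatch is a hint rather than proof of corruption.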
When I hit a POST API, it returns the content of a zip file as output (in Unicode form), and I want to save that content to a local zip file.
How can I do that?
Trials:
Try 1:
# The variable data contains the API response (i.e. data = response.text)
f = open('test.zip', 'wb')
f.write(data.encode('utf8'))
f.close()
The above code creates a zip file, but the file is corrupted.
Try 2:
with zipfile.ZipFile('spam.zip', 'w') as myzip:
    myzip.write(data.decode("utf8"))
The above code gives me an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 97: ordinal not in range(128)
Can anyone help me resolve this?
I found the answer to the above problem. Someone may want the same in the future, so I am answering my own question.
Using response.content instead of response.text resolved my problem.
import requests
response = requests.request("POST", <<url>>, <<payload>>, <<headers>>, verify=False)
data = response.content
f = open('test.zip', 'wb')  # binary mode: response.content is bytes
f.write(data)
f.close()
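After saving, the archive can be checked with the standard zipfile module. The sketch below builds a small archive in memory as a stand-in for `response.content` (the member name and contents are made up), writes it in binary mode, and then verifies it.

```python
import io
import os
import tempfile
import zipfile

# Stand-in for response.content: a small ZIP archive built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
data = buffer.getvalue()

# Save the bytes unchanged, in binary mode.
path = os.path.join(tempfile.mkdtemp(), "test.zip")
with open(path, "wb") as f:
    f.write(data)

# testzip() returns None when every member passes its CRC check.
with zipfile.ZipFile(path) as archive:
    print(archive.namelist())
    print(archive.testzip())
```

A file written from `response.text` would typically fail here, either when `ZipFile` opens it or at the CRC check.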
I'm using Requests to upload a PDF to an API. The API's reply is stored as "response" below, and I'm trying to write it out to an Excel file.
import requests
files = {'f': ('1.pdf', open('1.pdf', 'rb'))}
response = requests.post("https://pdftables.com/api?&format=xlsx-single",files=files)
response.raise_for_status() # ensure we notice bad responses
file = open("out.xls", "w")
file.write(response)
file.close()
I'm getting the error:
file.write(response)
TypeError: expected a character buffer object
I believe all the existing answers contain the relevant information, but I would like to summarize.
The response object that is returned by requests get and post operations contains two useful attributes:
Response attributes
response.text - Contains str with the response text.
response.content - Contains bytes with the raw response content.
You should choose one or other of these attributes depending on the type of response you expect.
For text-based responses (html, json, yaml, etc) you would use response.text
For binary-based responses (jpg, png, zip, xls, etc) you would use response.content.
Writing response to file
When writing responses to file you need to use the open function with the appropriate file write mode.
For text responses you need to use "w" - plain write mode.
For binary responses you need to use "wb" - binary write mode.
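The two rules above can be folded into one small helper that picks the mode from the type of the payload. This is only a sketch: the name `save_payload` is hypothetical, and the explicit UTF-8 encoding for text is my choice, made so that the result does not depend on the platform default (such as cp1252 on Windows).

```python
def save_payload(path: str, payload) -> None:
    """Write str payloads in text mode and bytes payloads in binary mode."""
    if isinstance(payload, str):
        # Text: state the encoding instead of relying on the platform default.
        with open(path, "w", encoding="utf-8") as f:
            f.write(payload)
    elif isinstance(payload, (bytes, bytearray)):
        # Binary: the bytes are written exactly as received.
        with open(path, "wb") as f:
            f.write(payload)
    else:
        # E.g. a Response object passed by mistake instead of .text/.content.
        raise TypeError(f"expected str or bytes, got {type(payload).__name__}")


# Usage sketch:
# save_payload("response.txt", response.text)
# save_payload("out.xlsx", response.content)
```

Passing the response object itself, as in the question, raises the TypeError branch rather than writing a broken file.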
Examples
Text request and save
# Request the HTML for this web page:
response = requests.get("https://stackoverflow.com/questions/31126596/saving-response-from-requests-to-file")
with open("response.txt", "w", encoding="utf-8") as f:  # explicit encoding avoids platform-default codec errors
    f.write(response.text)
Binary request and save
# Request the profile picture of the OP:
response = requests.get("https://i.stack.imgur.com/iysmF.jpg?s=32&g=1")
with open("response.jpg", "wb") as f:
    f.write(response.content)
Answering the original question
The original code should work by using wb and response.content:
import requests
files = {'f': ('1.pdf', open('1.pdf', 'rb'))}
response = requests.post("https://pdftables.com/api?&format=xlsx-single",files=files)
response.raise_for_status() # ensure we notice bad responses
file = open("out.xls", "wb")
file.write(response.content)
file.close()
But I would go further and use the with context manager for open.
import requests
with open('1.pdf', 'rb') as file:
    files = {'f': ('1.pdf', file)}
    response = requests.post("https://pdftables.com/api?&format=xlsx-single",files=files)
response.raise_for_status() # ensure we notice bad responses
with open("out.xls", "wb") as file:
    file.write(response.content)
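You can use response.text to write to a file, although this only suits text responses; a binary format such as XLSX will not survive the text round trip: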
import requests
files = {'f': ('1.pdf', open('1.pdf', 'rb'))}
response = requests.post("https://pdftables.com/api?&format=xlsx-single",files=files)
response.raise_for_status() # ensure we notice bad responses
with open("resp_text.txt", "w") as file:
    file.write(response.text)
As Peter already pointed out:
In [1]: import requests
In [2]: r = requests.get('https://api.github.com/events')
In [3]: type(r)
Out[3]: requests.models.Response
In [4]: type(r.content)
Out[4]: str
The session above is from Python 2; on Python 3, type(r.content) is bytes rather than str. You may also want to check r.text.
Also: https://2.python-requests.org/en/latest/user/quickstart/