Save unicode text from response without encoding into file - python

I want to download the config file from my router via web scraping. The procedure I want to achieve is this:
1. Save the config file to disk.
2. Send a factory reset.
3. Load the previously downloaded config file.
So far, I have this code:
with requests.Session() as s:  # To log in to the modem
    pagePostBackUp = 'https://192.168.1.1/goform/BackUp'
    s.post(urlLogin, data=loginCredentials, verify=False, timeout=5)
    dataBackUp = {'dir': 'admin/', 'file': 'cmconfig.cfg'}
    resultBackUp = s.post(pagePostBackUp, data=dataBackUp, verify=False, timeout=10)
    print(resultBackUp.text)
The output of the last line is what I want to save into a file. But when I try to do it with a file opened like this:
f = open('/Users/user/Desktop/file.cfg', 'w')
it throws an error saying the ascii codec can't encode a character. If I save the file with, for example, encoding='utf16', the result differs from the file I download manually.
So the question is: how can I save this file with the same encoding the router gives me via the web (as unicode)? The content of the file looks like this:
�����g���m��� ������Z������ofpqJ
U\V,.o/����zf��v���~W3=,�D};y�tL�cJ

Change the last line of your code to the following:
with open('/Users/user/Desktop/file.cfg', 'wb') as f:
    f.write(resultBackUp.content)
This will treat the payload as data (bytes), not text: the file is opened in binary mode, and the content is taken as-is.
There's no encoding/decoding happening.
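As a minimal offline sketch of the same idea, the snippet below writes a bytes payload in binary mode and reads it back unchanged. The `payload` value is a made-up stand-in for `resultBackUp.content`, and the temporary path is only for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for resultBackUp.content: bytes that are not valid text.
payload = bytes([0xFF, 0xFE, 0x00, 0x67, 0x9A, 0x6D, 0x0D, 0x0A, 0x80])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.cfg")
    # Binary mode: the bytes go to disk as-is, with no encoding step.
    with open(path, "wb") as f:
        f.write(payload)
    # Read the file back in binary mode to compare byte for byte.
    with open(path, "rb") as f:
        saved = f.read()

print(saved == payload)  # True
```

Because no codec is involved, even bytes such as 0x0D 0x0A or 0xFF survive the round trip.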

Related

Search for a word in webpage and save to TXT in Python

I am trying to load links from a .txt file, search each page for a specific word, and, if the word exists on that page, save the link to another .txt file. But I am getting this error: No scheme supplied. Perhaps you meant http://<_io.TextIOWrapper name='import.txt' mode='r' encoding='cp1250'>?
Note: the links start with HTTPS://
The code:
import requests
list_of_pages = open('import.txt', 'r+')
save = open('output.txt', 'a+')
word = "Word"
save.truncate(0)
for page_link in list_of_pages:
    res = requests.get(list_of_pages)
    if word in res.text:
        response = requests.request("POST", url)
        save.write(str(response) + "\n")
Can anyone explain why? Thank you in advance!
Try putting http:// in front of the links.
When you call res = requests.get(list_of_pages), you are asking requests to open an HTTP connection to list_of_pages. But requests.get takes a URL string as its parameter (e.g. http://localhost:8080/static/image01.jpg), and look at what list_of_pages is: an already opened file, not a string. You have to use either the requests library or the file I/O API, not both.
If you already have an opened file, you don't need to create an HTTP request at all. You don't need this requests.get(). Parse list_of_pages like a normal, local file.
Or, if you would like to go the other way, don't open this text file in list_of_pages; make it a string with the URL of that file.
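If the goal is, as the question describes, to request each link listed in the file, one possible sketch passes each stripped line to the fetch call instead of the file object. The names `fetch_text` and `links_containing` are hypothetical helpers, and the demonstration uses a fake fetcher so it runs without network access.

```python
from typing import Callable, Iterable, List


def fetch_text(url: str) -> str:
    """Default fetcher; assumes the third-party requests package is installed."""
    import requests  # imported lazily so the helper below also works without it
    return requests.get(url, timeout=10).text


def links_containing(lines: Iterable[str], word: str,
                     fetch: Callable[[str], str] = fetch_text) -> List[str]:
    """Return the links whose page body contains the given word."""
    matches = []
    for line in lines:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # skip blank lines
        if word in fetch(url):
            matches.append(url)
    return matches


# Offline demonstration: a dict lookup stands in for real HTTP requests.
fake_pages = {
    "https://example.com/a": "some Word here",
    "https://example.com/b": "nothing relevant",
}
found = links_containing(
    ["https://example.com/a\n", "\n", "https://example.com/b\n"],
    "Word",
    fetch=fake_pages.__getitem__,
)
print(found)  # ['https://example.com/a']
```

With real files, an opened import.txt could be passed as `lines` (a file object yields one line per iteration), and the returned links written to output.txt one per line.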

Generate file from Base64 encoded django file read

During a file upload, I decided to read the file and save it as base64 until S3 becomes available to our team. I use the code below to convert the file to base64.
def upload_file_handler(file):
    """file -> uploaded bytestream from django"""
    bs4 = base64.b64encode(file.read())
    return {'binary': bs4, 'name': file.name}
I store the binary derived from the above as a str in a db. Now the challenge is getting the file back and uploading it to S3.
I attempted to run base64.b64decode on the file string from the db and write the result to a file. But when I open the file, it seems broken. I've tried without a breakthrough.
q = Report.objects.first()
data = q.report_binary
f = base64.b64decode(data)
content_file = ContentFile(f, name="hello.docx")
instance = TemporaryFile(image=content_file)
instance.save()
This is one of the files I am trying to recreate from the binary.
https://gist.github.com/saviour123/38300b3ff2c7a0d1a01c15332c583e20
How can I generate the file from the base64 binary?
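For reference, a base64 round trip through a text column can be sketched as below. The helper names are hypothetical, and the "pitfall" at the end is only an assumption about what might have gone wrong (the question does not show how the bytes were turned into a str); it is not a confirmed diagnosis.

```python
import base64


def encode_for_storage(raw: bytes) -> str:
    """Base64-encode bytes and return plain ASCII text for a text column."""
    return base64.b64encode(raw).decode("ascii")


def decode_from_storage(stored: str) -> bytes:
    """Recover the original bytes from the stored base64 text."""
    return base64.b64decode(stored)


# Made-up sample bytes standing in for an uploaded file.
original = b"PK\x03\x04 example file bytes \x00\xff"
stored = encode_for_storage(original)
restored = decode_from_storage(stored)
print(restored == original)  # True

# Possible pitfall (an assumption): str() on bytes keeps the b'...' wrapper,
# so the stored text is no longer clean base64.
wrapped = str(base64.b64encode(original))
print(wrapped[:2])  # b'
```

If the stored value does carry such a wrapper, the decoded output would not match the uploaded file, which could explain a document that looks broken.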

Cannot save Excel response from API

I am doing a POST request to an API that returns an Excel file.
When I try the process without Python, in Postman, it works just fine: I see the garbled output, but if I click on Save response and Save to a file, it saves the file as an xlsx file that I can open without problems.
When I try to do the same in Python, I can also print the (garbled) response, but I cannot save the file as something that I can open.
First part of code (runs without issue):
import requests
for i in range(1, 3):
    url = "myurl"
    payload = {}
    headers = {}
    response = requests.request("POST", url, headers=headers, data=payload)
And now for the crucial part of the code.
If I do A:
with open('C:\\Users\\mypath\\exportdata.xlsx', "w") as o:
    o.write(response.text)
print(response.text)
...then I get this error when I run the code:
File "C:\Users\Username\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 10-11: character maps to <undefined>
If I do B:
with open('C:\\Users\\mypath\\exportdata.xlsx', "w", encoding="utf-8") as o:
    o.write(response.text)
print(response.text)
...then the code runs without error, but I get an extension/format error in Excel when I open the file.
How do I save the Excel file with Python so that I can open and view it correctly afterwards?
This is not a standard text/csv to Excel conversion issue; you can see from the garbled output that all the XML hallmarks of an Excel file are there.
Excel isn't text. Excel is binary. Try response.content:
with open(filename, "wb") as o:
    o.write(response.content)
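To illustrate why the text route damages the file, here is a small offline sketch. An .xlsx file is a ZIP archive, so the example builds a tiny ZIP in memory as a stand-in for `response.content` and pushes it through a decode/encode cycle. The use of UTF-8 with replacement is an assumption; the encoding actually picked for `response.text` may differ.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory as a stand-in for response.content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sheet.xml", "hello")
payload = buffer.getvalue()

# Roughly what happens when binary data is treated as text and written back:
# bytes that are not valid UTF-8 are replaced and cannot be recovered.
as_text = payload.decode("utf-8", errors="replace")
damaged = as_text.encode("utf-8")

print(payload[:2])         # b'PK'
print(damaged == payload)  # False
```

Writing `payload` itself in "wb" mode skips the decode step entirely, which is why the saved file then opens.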

File corrupted when using send_file() from flask, data from pymongo gridfs

My English is not good, so the title may look weird.
Anyway, I'm using Flask to build a website that can store files, with MongoDB as the database.
The file upload and document insert functions have no problems. The weird thing is that the file sent by Flask's send_file() is truncated for no apparent reason. Here's my code:
from flask import ..., send_file, ...
import pymongo
import gridfs
#...
@app.route("/record/download/<record_id>")
def api_softwares_record_download(record_id):
    try:
        #...
        file = files_gridfs.find_one({"_id": record_id})
        file_ext = filetype.guess_extension(file.read(2048))
        filename = "{}-{}{}".format(
            app["name"],
            record["version"],
            ".{}".format(file_ext) if file_ext else "",
        )
        response = send_file(file, as_attachment=True, attachment_filename=filename)
        return response
    except ...
The original image file, for example, is 553KB, but the response body is 549.61KB and the image is broken. But if I write the file directly to my disk:
#...
with open('test.png', 'wb+') as file:
    file.write(files_gridfs.find_one({"_id": record_id}).read())
The image file size is 553KB and the image is readable.
When I compare the two files in VS Code's text editor, I find that the correct file starts with �PNG, but the corrupted file starts with �ϟ8���>�L�y
[Image: searching for the corrupted file's head in the correct file]
I also tried to use a Blob object and download it from the browser. No difference.
Is there anything wrong with my code, or did I misuse send_file()? Or should I use flask_pymongo?
Interestingly, I have found what was wrong with my code.
This is how I solved it:
...file.read(2048)
file.seek(0)
...
file.read(2048)
file.seek(0)
...
response = send_file(file, ...)
return response
And here's why:
For various reasons, I use filetype to detect the file's extension and MIME type, so I pass the first 2048 bytes to filetype for detection.
file_ext = filetype.guess_extension(file.read(2048))
file_mime = filetype.guess_mime(file.read(2048)) #this line wasn't copied in my question. My fault.
I have just learned from the pymongo API docs that Python (or pymongo, or gridfs; this was completely unknown to me before) reads a file through a cursor. When I checked the cursor's position after the two reads, it was 4096. So when read() is called again inside send_file(), reading starts 4096 bytes past the head of the file. 549 + 4 = 553, and there is the problem.
Finally, I set the cursor back to position 0 after every read() operation, and the correct file is returned.
I hope this helps if you make the same mistake I did.
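The cursor behaviour can be reproduced with any file-like object. The sketch below uses `io.BytesIO` as a stand-in for the GridFS file (an assumption for illustration; it is not the pymongo API itself) and an invented 8192-byte payload.

```python
import io

payload = bytes(range(256)) * 32  # 8192 bytes standing in for the stored file
stream = io.BytesIO(payload)      # stand-in for the GridFS file object

stream.read(2048)  # first sniff, e.g. to guess the extension
stream.read(2048)  # second sniff, e.g. to guess the MIME type
position_after_sniffing = stream.tell()
truncated = stream.read()  # what a later full read sees without a rewind

stream.seek(0)  # rewind to the head of the file
complete = stream.read()

print(position_after_sniffing)        # 4096
print(len(truncated), len(complete))  # 4096 8192
```

An alternative to rewinding twice would be to read the first 2048 bytes once, pass that same bytes object to both detection calls, and rewind a single time before sending.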

Python: Serving files, All carriage returns lost in text file

I'm using the method described in the link https://stackoverflow.com/a/8601118/2497977
import os
import mimetypes
from django.core.servers.basehttp import FileWrapper
from django.http import HttpResponse

def download_file(request):
    the_file = '/some/file/name.png'
    filename = os.path.basename(the_file)
    response = HttpResponse(FileWrapper(open(the_file)),
                            content_type=mimetypes.guess_type(the_file)[0])
    response['Content-Length'] = os.path.getsize(the_file)
    response['Content-Disposition'] = "attachment; filename=%s" % filename
    return response
Initially I get data in a form; when it is submitted, I process the data to generate a "config" and write it out to a file. Then, when it is valid, I pass the file back to the user as a download.
It works great, except that in my situation the file is text, so when the file is downloaded it comes as a blob of text without CR/LF.
Any suggestions on how to address this?
Open the file in binary mode.
open(the_file, 'rb')
http://docs.python.org/2/library/functions.html#open
The default is to use text mode, which may convert '\n' characters to
a platform-specific representation on writing and back on reading.
Thus, when opening a binary file, you should append 'b' to the mode
value to open the file in binary mode, which will improve portability.
(Appending 'b' is useful even on systems that don’t treat binary and
text files differently, where it serves as documentation.)
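The effect of the mode on line endings can be checked offline. The sketch below shows Python 3 behaviour, where text mode translates \r\n to \n on reading by default; whether this is exactly what happened in the asker's setup is an assumption. The file name and contents are made up.

```python
import os
import tempfile

raw = b"line one\r\nline two\r\n"  # made-up config text with CR/LF endings

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.txt")
    with open(path, "wb") as f:
        f.write(raw)
    # Text mode: universal newlines turn \r\n into \n while reading.
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()
    # Binary mode: the bytes come back exactly as stored.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(repr(as_text))    # 'line one\nline two\n'
print(as_bytes == raw)  # True
```

Serving the bytes read in binary mode also keeps the payload length equal to os.path.getsize(), which the Content-Length header in the question relies on.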
