Generate file from Base64 encoded django file read - python

During a file upload, I decided to read the file and save it as base64 until S3 becomes available to our team. I use the code below to convert the file to base64:
import base64

def upload_file_handler(file):
    """file -> uploaded bytestream from Django"""
    b64 = base64.b64encode(file.read())
    return {'binary': b64, 'name': file.name}
I store the base64 value derived from the above as a str in a db. Now the challenge is getting the file back and uploading it to S3.
I attempted to run base64 decode on the string from the db and write the result to a file, but when I open the file it seems broken. I've attempted this without a breakthrough.
from django.core.files.base import ContentFile

q = Report.objects.first()
data = q.report_binary
f = base64.b64decode(data)
content_file = ContentFile(f, name="hello.docx")
instance = TemporaryFile(image=content_file)
instance.save()
This is one of the files I am trying to recreate from the binary:
https://gist.github.com/saviour123/38300b3ff2c7a0d1a01c15332c583e20
How can I generate the file from the base64 binary?
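For reference, a minimal round-trip sketch (an assumption about the likely culprit, not a confirmed diagnosis): base64.b64encode returns bytes, and saving str() of that bytes object stores the literal characters b'...' in the database, which breaks b64decode later. Decoding to ASCII on the way in and writing raw bytes on the way out keeps the round trip lossless:
import base64

def upload_file_handler(file):
    # Decode the base64 bytes to a plain ASCII str before storing;
    # str(b64_bytes) would store the b'...' repr and corrupt the data.
    b64_str = base64.b64encode(file.read()).decode('ascii')
    return {'binary': b64_str, 'name': file.name}

def rebuild_file(b64_str, out_path="hello.docx"):
    # b64decode accepts the stored str directly; write raw bytes in
    # binary mode so nothing is re-encoded on the way to disk.
    with open(out_path, 'wb') as fh:
        fh.write(base64.b64decode(b64_str))
If the strings already in the db start with the two characters b', they were saved via str(bytes) and need that wrapper stripped before decoding.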

Related

how to save file as zip without saving it to local folder

I'm trying to create a download function for my Streamlit app. What I currently have lets me download a zip file via a button on my Streamlit app, but unfortunately it also saves it to my local folder, which I don't want. The problem is when I initialize the file_zip object. I want the zip file to have a specific name, ideally the same name as the file the user uploads, with a '.zip' extension (i.e. datafile, which contains the string file name, as a parameter in the function). But every time I do that it keeps saving the zip file in my local folder. Is there an alternative to this? BTW, I'm trying to save a list of pandas DataFrames into one zip file.
def downloader(list_df, datafile, file_type):
    file = datafile.name.split(".")[0]
    # create zip file
    with zipfile.ZipFile("{}.zip".format(file), 'w', zipfile.ZIP_DEFLATED) as file_zip:
        for i in range(len(list_df)):
            file_zip.writestr(file + "_group_{}".format(i) + ".csv", pd.DataFrame(list_df[i]).to_csv())
    # pass it to front end for download
    zip_name = "{}.zip".format(file)
    with open(zip_name, "rb") as f:
        bytes = f.read()
    b64 = base64.b64encode(bytes).decode()
    href = f'<a href="data:application/zip;base64,{b64}" download="{zip_name}">Click Here To Download</a>'
    st.markdown(href, unsafe_allow_html=True)
It sounds like you want to create the zip file in memory and use it later to build a base64 encoding. You can use an io.BytesIO() object with ZipFile, rewind it, and read the data back for base64 encoding.
import io

def downloader(list_df, datafile, file_type):
    file = datafile.name.split(".")[0]
    # create the zip file in memory
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, 'w', zipfile.ZIP_DEFLATED) as file_zip:
        for i in range(len(list_df)):
            file_zip.writestr(file + "_group_{}".format(i) + ".csv", pd.DataFrame(list_df[i]).to_csv())
    zip_buf.seek(0)  # rewind the buffer before reading it back
    # pass it to the front end for download
    zip_name = "{}.zip".format(file)
    b64 = base64.b64encode(zip_buf.read()).decode()
    del zip_buf
    href = f'<a href="data:application/zip;base64,{b64}" download="{zip_name}">Click Here To Download</a>'
    st.markdown(href, unsafe_allow_html=True)
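As a side note beyond the original answer: recent Streamlit releases also provide st.download_button, which accepts raw bytes directly, so the base64 data URI (and the del zip_buf step) can be dropped entirely. A sketch, assuming a Streamlit version that ships this widget:
# zip_buf is the io.BytesIO buffer built above; pass its bytes directly
st.download_button(
    label="Click Here To Download",
    data=zip_buf.getvalue(),
    file_name=zip_name,
    mime="application/zip",
)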

failed: Network error while downloading Excel file generated by jupyter notebook

My Jupyter notebook saves a DataFrame (having styles) to an Excel file. Then I create a link to download this Excel file:
df = df.to_excel('ABC.xlsx', index=True)
filename = 'ABC.xlsx'
file_link = "<a href='{href}' download='ABC.xlsx'> Download ABC.xlsx</a>"
html = HTML(file_link.format(href=filename))
display(html)
But when I click on the Download ABC.xlsx link, I get: Failed: Network error.
On the contrary, downloading a CSV file the same way works fine.
Adding the CSV code; there is some base64 encoding in the CSV code, without which the CSV code also does not work:
def func(df,title="Download csv file",filename="ABC.csv"):
csv=df.to_csv(index=True)
b64 =base64.b64encode(csv.encode())
payload=b64.decode()
html = "{title}"
html = html.format(payload=payload,title=title,filename=filename)
return HTML(html)
I tried editing this function for an Excel file:
def func(df,title="Download excel file",filename="ABC.xlsx"):
xls=df.to_excel("xyz.xlsx",index=True)
b64 =base64.b64encode(xls.encode())
payload=b64.decode()
html = "{title}"
html = html.format(payload=payload,title=title,filename=filename)
return HTML(html)
For the Excel code it gives the error: 'NoneType' object has no attribute 'encode'.
In your CSV code you use csv = df.to_csv(index=True). According to the docs:
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
Here you didn't specify path_or_buf, so the return value is the CSV content; this is why you can download the CSV.
The to_excel doc, however, doesn't mention any return value: it always writes to the given path and returns None, so your payload doesn't contain anything at all.
To solve this, you can manually open the file again and read it back as a base64 string:
def file_to_base64(file):
    # file should be the actual file name you wrote, e.g. "xyz.xlsx"
    with open(file, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    return encoded_string.decode()
Then replace the two lines
b64 = base64.b64encode(xls.encode())
payload = b64.decode()
with:
payload = file_to_base64("xyz.xlsx")
passing the name of the file that to_excel actually wrote.
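Put together, the fixed function would look roughly like this (a sketch assembled from the steps above; the intermediate file name "xyz.xlsx" comes from the question, and the xlsx MIME type is my assumption):
import base64
from IPython.display import HTML

def func(df, title="Download excel file", filename="ABC.xlsx"):
    # to_excel writes to disk and returns None, so ignore its return value
    df.to_excel("xyz.xlsx", index=True)
    # read the file just written and base64-encode it for the data URI
    payload = file_to_base64("xyz.xlsx")
    html = ('<a download="{filename}" '
            'href="data:application/vnd.openxmlformats-officedocument.'
            'spreadsheetml.sheet;base64,{payload}" '
            'target="_blank">{title}</a>')
    return HTML(html.format(payload=payload, title=title, filename=filename))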

python - read csv from s3 and identify its encoding info for pandas dataframe

I am making a service to download CSV files from an S3 bucket.
The bucket contains CSVs with various encodings (which I may not know beforehand), since users are uploading these files.
This is what I am trying:
...
obj = s3c.get_object(Bucket= BUCKET_NAME , Key = KEY)
content = io.BytesIO(obj['Body'].read())
df_s3_file = pd.read_csv(content)
...
This works fine for utf-8, but it fails for other encodings (obviously!).
I have found an independent piece of code which can identify the encoding of a csv file on a network drive.
It looks like this:
...
def find_encoding(fname):
    r_file = open(fname, 'rb').read()
    result = chardet.detect(r_file)
    charenc = result['encoding']
    return charenc

my_encoding = find_encoding(content)
print('detected csv encoding: ', my_encoding)
df_s3_file = pd.read_csv(content, encoding=my_encoding)
...
This snippet works absolutely fine for a file on a local drive, but how do I do this for a file in an S3 bucket, since I am reading the S3 file as an io.BytesIO object?
I think if I write the file to a drive and then run find_encoding, it's going to work, since that function takes a csv file name as input as opposed to a BytesIO object.
Is there a way to do this in memory, without having to download the file to a drive?
Note: the file sizes are not very big (<10 MB).
According to the docs, s3c.get_object(Bucket=BUCKET_NAME, Key=KEY) returns a dict where one of the keys is ContentEncoding, so I would try:
obj = s3c.get_object(Bucket= BUCKET_NAME , Key = KEY)
print(obj["ContentEncoding"])
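Note that ContentEncoding is an HTTP-level header and is only present if it was set at upload time, so it may well be missing here. An alternative not in the original answer: chardet.detect operates on raw bytes, so the whole detection can happen in memory without touching the disk. A sketch reusing the question's variable names:
import io
import chardet
import pandas as pd

obj = s3c.get_object(Bucket=BUCKET_NAME, Key=KEY)
raw = obj['Body'].read()                     # raw bytes from S3

result = chardet.detect(raw)                 # {'encoding': ..., 'confidence': ...}
my_encoding = result['encoding'] or 'utf-8'  # fall back if detection fails
print('detected csv encoding:', my_encoding)

df_s3_file = pd.read_csv(io.BytesIO(raw), encoding=my_encoding)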

Save unicode text from response without encoding into file

I want to download the config file from my router via web scraping. The procedure I want to achieve is this:
Save the config file to disk
Send a factory reset
Load the config file previously downloaded.
So far, I have this code:
with requests.Session() as s:  # to log in to the modem
    pagePostBackUp = 'https://192.168.1.1/goform/BackUp'
    s.post(urlLogin, data=loginCredentials, verify=False, timeout=5)
    dataBackUp = {'dir': 'admin/', 'file': 'cmconfig.cfg'}
    resultBackUp = s.post(pagePostBackUp, data=dataBackUp, verify=False, timeout=10)
    print(resultBackUp.text)
The last line is what I want to save into a file. But when I try to do it with this code:
f = open('/Users/user/Desktop/file.cfg', 'w')
it throws an error that the ascii codec can't encode a character. If I save the file with, for example, encoding='utf16', the result differs from what I download manually.
So the question is: how can I save this file with the same encoding the router gives me via the web (as Unicode)? The content of the file looks like this:
�����g���m��� ������Z������ofpqJ
U\V,.o/����zf��v���~W3=,�D};y�tL�cJ
Change the last line of your code to the following:
with open('/Users/user/Desktop/file.cfg', 'wb') as f:
f.write(resultBackUp.content)
This will treat the payload as data (bytes), not text: the file is opened in binary mode, and the content is taken as-is.
There's no encoding/decoding happening.

GAE/python: How to parse multipart data file from Cloud Storage (GCS)?

I am uploading a file to GCS (the input file comes from a web form) with the following GAE/Python code:
fx = self.request.body_file
gcs_file = gcs.open(_GCS_BUCKET_NAME + "new_file_name", 'w')
gcs_file.write(fx.read())
I am able to retrieve this uploaded data from GCS with the following code:
gcs_file = gcs.open(_GCS_BUCKET_NAME + "new_file_name", 'r')
self.response.write(gcs_file.read())
Since the uploaded data is multipart form data, how do I extract the original file name and the original file (binary data) itself from gcs_file.read()?
You can save the file in GCS with its original file name, or you can create a small Datastore entity holding the original file name and a reference to your file in GCS.
Why should the original binary be different? If you have managed to download it from GCS, you can verify the binary is actually the same.
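If the goal is to avoid storing the multipart envelope in the first place, another option (a sketch, not part of the original answer; the form field name "file" is an assumption) is to let webapp2 parse the upload before writing to GCS, since file fields arrive as cgi.FieldStorage objects with the original file name attached:
# Inside a webapp2 request handler; assumes the form field is named "file"
upload = self.request.POST['file']  # cgi.FieldStorage for the uploaded file
original_name = upload.filename     # client-side file name
file_bytes = upload.value           # raw file contents, no multipart headers

gcs_file = gcs.open(_GCS_BUCKET_NAME + original_name, 'w')
gcs_file.write(file_bytes)
gcs_file.close()
This way what lands in GCS is the bare file, and reading it back needs no multipart parsing at all.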
