I am reading an Excel file (.xlsx) with pysmb.
import tempfile
from smb.SMBConnection import SMBConnection
conn = SMBConnection(userID, password, client_machine_name, server_name, use_ntlm_v2 = True)
conn.connect(server_ip, 139)
file_obj = tempfile.TemporaryFile()
file_attributes, filesize = conn.retrieveFile(service_name, 'test.xlsx', file_obj)
This step works; I am able to load the file into a pandas.DataFrame:
import pandas as pd
pd.read_excel(file_obj)
Next, I want to save the file. The file is saved, but when I try to open it with Excel I get the error message "Excel has run into an error".
Here is the code to save the file:
conn.storeFile(service_name, 'test_save.xlsx', file_obj)
file_obj.close()
How can I save the file correctly so that it opens in Excel?
Thank you
I tried with a .txt file and it works. The error occurs with .xlsx, .xls and .pdf files. I have also tried without an extension: same issue, it is impossible to open the file.
I would like to save the file with .pdf and .xlsx extension, and open it.
Thank you.
I found a solution and I will post it here in case someone faces a similar issue.
The Excel file can be saved as a binary stream.
from io import BytesIO
df = pd.read_excel(file_obj)
output = BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='data', index = False)
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() writes the workbook
output.seek(0)
conn.storeFile(service_name, 'test_save.xlsx', output)
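The output.seek(0) call matters here. As far as I can tell, storeFile reads from the file object's current position until EOF, so a buffer that has just been written (or just read by pandas) would be uploaded as an empty or truncated file. The sketch below illustrates the effect with a plain io.BytesIO; fake_store is a hypothetical stand-in for an uploader, not part of pysmb, and the payload is made up.

```python
import io

def fake_store(file_obj):
    """Hypothetical stand-in for an uploader: reads from the current position to EOF."""
    return file_obj.read()

payload = b"PK\x03\x04 pretend these are workbook bytes"  # made-up content

buf = io.BytesIO()
buf.write(payload)              # the position is now at the end of the buffer

not_rewound = fake_store(buf)   # nothing is left to read from the current position
buf.seek(0)                     # rewind to the start
rewound = fake_store(buf)       # now the whole payload is read

print(len(not_rewound), len(rewound))
```

By the same reasoning, calling file_obj.seek(0) right before conn.storeFile(...) in the question's code might be enough when the file only needs to be copied unchanged.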
Related
I have this dataframe, and I want to save it as an Excel file in a SharePoint folder.
This is my code:
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
# auth
client_credentials = ClientCredential(var_client_id, var_client_secret)
ctx = ClientContext(var_sp_site).with_credentials(client_credentials)
df = pd.DataFrame(sql_table)
var_relative_url = "sharepoint_path/sharepoint_path"
target_folder = ctx.web.get_folder_by_server_relative_url(var_relative_url)
target_folder.upload_file(content=df.to_excel(excel_writer='teste.xlsx'), file_name='teste.xlsx').execute_query() # Here is my problem
When I execute this code, the Excel file is created in the folder, but when I try to open the file in the SharePoint interface it raises an error ("cannot be opened").
This code will run on a cloud function, so I can't use local files to upload.
I'm investigating this issue right now. Not solved yet, but I can give you a workaround: use .save()
wb = pd.ExcelWriter(outputFile, mode='w', engine="openpyxl")
myDataFrame.to_excel(wb, sheet_name='sheet1', index=False)
wb.save()  # on pandas 2.0+, where ExcelWriter.save() was removed, use wb.close() instead
From error to warning ;)
Right now my final output is in Excel format. I want to compress my Excel file using gzip. Is there a way to do it?
import pandas as pd
import gzip
import re
def renaming_ad_unit():
    with gzip.open('weekly_direct_house.xlsx.gz') as f:
        df = pd.read_excel(f)
    result = df['Ad unit'].to_list()
    for index, a_string in enumerate(result):
        modified_string = re.sub(r"\([^()]*\)", "", a_string)
        df.at[index,'Ad unit'] = modified_string
    return df.to_excel('weekly_direct_house.xlsx',index=False)
Yes, this is possible.
To create a gzip file, you can open the file like this:
with gzip.open('filename.xlsx.gz', 'wb') as f:
    ...
Unfortunately, when I tried this, I found that I get the error OSError: Negative seek in write mode. This is because the Pandas excel writer moves backwards in the file when writing, and uses multiple passes to write the file. This is not allowed by the gzip module.
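The error can be reproduced without pandas. The sketch below uses only the standard library, with an in-memory io.BytesIO as a hypothetical target instead of a file on disk. It writes a few bytes to a gzip stream and then tries to seek back to the start, which is the kind of backward move described above.

```python
import gzip
import io

target = io.BytesIO()  # in-memory stand-in for 'filename.xlsx.gz'
error_message = None

with gzip.GzipFile(fileobj=target, mode="wb") as gz:
    gz.write(b"some workbook bytes")
    gz.flush()          # push pending data through so the stream position is up to date
    try:
        gz.seek(0)      # moving backwards is not supported in write mode
    except OSError as exc:
        error_message = str(exc)

print(error_message)
```

Seeking forward is allowed in write mode (the gap is padded with zero bytes), so only writers that need to go back and patch earlier bytes run into this.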
To fix this, I created a temporary file and wrote the Excel file there. Then I read the file back and wrote it to the compressed archive.
I wrote a short program to demonstrate this. It reads an excel file from a gzip archive, prints it out, and writes it back to another gzip file.
import pandas as pd
import gzip
import tempfile
def main():
    with gzip.open('apportionment-2020-table02.xlsx.gz') as f:
        df = pd.read_excel(f)
        print(df)
    with tempfile.TemporaryFile() as excel_f:
        df.to_excel(excel_f, index=False)
        with gzip.open('output.xlsx.gz', 'wb') as gzip_f:
            excel_f.seek(0)
            gzip_f.write(excel_f.read())
if __name__ == '__main__':
    main()
Here's the file I'm using to demonstrate this: Link
You could also use io.BytesIO to create a file in memory, write the Excel data to it, and then write that buffer to disk as gzip.
I used the link to the Excel file from Nick ODell's answer.
import pandas as pd
import gzip
import io
df = pd.read_excel('https://www2.census.gov/programs-surveys/decennial/2020/data/apportionment/apportionment-2020-table02.xlsx')
buf = io.BytesIO()
df.to_excel(buf)
buf.seek(0) # move to the beginning of file
with gzip.open('output.xlsx.gz', 'wb') as f:
    f.write(buf.read())
A version similar to Nick ODell's answer, with the buffer in a with block:
import pandas as pd
import gzip
import io
df = pd.read_excel('https://www2.census.gov/programs-surveys/decennial/2020/data/apportionment/apportionment-2020-table02.xlsx')
with io.BytesIO() as buf:
    df.to_excel(buf)
    buf.seek(0) # move to the beginning of file
    with gzip.open('output.xlsx.gz', 'wb') as f:
        f.write(buf.read())
Tested on Linux
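If the whole workbook is already in memory, gzip.compress might be a slightly shorter route than opening a gzip file and copying the buffer into it. This is a minimal sketch using only the standard library. The buffer content is a placeholder for what df.to_excel(buf) would produce, and the output file name in the trailing comment is just the one used above.

```python
import gzip
import io

# Placeholder for an in-memory workbook (what buf would hold after df.to_excel(buf)).
buf = io.BytesIO(b"PK\x03\x04" + b"placeholder workbook content " * 100)
workbook_bytes = buf.getvalue()   # getvalue() ignores the current position, so no seek(0) is needed

compressed = gzip.compress(workbook_bytes)
restored = gzip.decompress(compressed)   # round trip to check that nothing was lost

print(len(workbook_bytes), len(compressed), restored == workbook_bytes)

# Writing the archive to disk would then be a plain binary write, e.g.:
# with open('output.xlsx.gz', 'wb') as f:
#     f.write(compressed)
```

Since an .xlsx file is itself a zip container, the extra gzip layer may not shrink a real workbook by much; the repetitive placeholder above compresses far better than real data would.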
I have a problem converting an .xlsx file to .csv using the pandas library.
Here is the code:
import pandas as pd
# If pandas is not installed: pip install pandas
class Program:
    def __init__(self):
        # file = input("Insert file name (without extension): ")
        file = "Daty"
        self.namexlsx = "D:\\" + file + ".xlsx"
        self.namecsv = "D:\\" + file + ".csv"
        Program.export(self.namexlsx, self.namecsv)
    def export(namexlsx, namecsv):
        try:
            read_file = pd.read_excel(namexlsx, sheet_name='Sheet1', index_col=0)
            read_file.to_csv(namecsv, index=False, sep=',')
            print("Conversion to .csv file has been successful.")
        except FileNotFoundError:
            print("File not found, check file name again.")
            print("Conversion to .csv file has failed.")
Program()
After running the code, the console shows the error ValueError: File is not a recognized excel file
The file I have in that directory is "Daty.xlsx". I tried a couple of things, like looking through the documentation and other examples around the internet, but most had similar code.
Edit&Update
What I intend to do afterwards is use the created CSV file for conversion to a .db file, so in the end the import chain will be .xlsx -> .csv -> .db. The idea for this program came up as a training exercise, but I can't get past the point described above.
You can do it like this:
import pandas as pd
data_xls = pd.read_excel('excelfile.xlsx', 'Sheet1', index_col=None)
data_xls.to_csv('csvfile.csv', encoding='utf-8', index=False)
I checked the .xlsx file itself, and apparently for some reason it was corrupted, with the columns in the initial file merged into one column. After opening the file and correcting the cells, everything runs smoothly.
Thank you for your time, and I apologise for the inconvenience.
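For anyone hitting the same ValueError, here is a quick way to check whether a file is at least a plausible .xlsx container before handing it to pandas. It is only a rough sanity check, under the assumption that a normal .xlsx is a zip archive with a [Content_Types].xml entry and the workbook stored at xl/workbook.xml. looks_like_xlsx is a hypothetical helper, not a pandas function, and the test archive below is a stand-in rather than a real workbook.

```python
import io
import zipfile

def looks_like_xlsx(source):
    """Return True if source (a path or a binary file object) looks like an .xlsx container."""
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names

# Build a tiny stand-in archive in memory (not a real workbook, only the expected entry names).
fake_xlsx = io.BytesIO()
with zipfile.ZipFile(fake_xlsx, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")

plain_text = io.BytesIO(b"col_a,col_b,col_c\n1,2,3\n4,5,6\n")  # e.g. a CSV renamed to .xlsx

print(looks_like_xlsx(fake_xlsx))
print(looks_like_xlsx(plain_text))
```

A file that passes this check can still contain bad data, so it only helps to rule out the case where the file is not an Excel container at all.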
I'd like to upload an excel file in my web app, read the contents of it and display some cells. So basically I don't need to save the file as it's a waste of time.
Relevant code:
if form.validate_on_submit():
    f = form.xml_file.data.stream
    xml = f.read()
    workbook = xlrd.open_workbook(xml)
    sheet = workbook.sheet_by_index(0)
I can't wrap my mind around this as I keep getting filetype errors no matter what I try. I'm using Flask Uploads, WTF.file and xlrd for reading the file.
Reading the file works okay if I save it previously with f.save
To answer my own question, I solved it with
if form.validate_on_submit():
    # Put the file object (stream) into a variable
    xls_object = form.xml_file.data.stream
    # Open it as a workbook, passing the raw bytes instead of a file name
    workbook = xlrd.open_workbook(file_contents=xls_object.read())
I have created a Flask application that takes an Excel file, cleans the data, and returns the output as an Excel file. The user uploads the Excel file, and after submitting, the browser should download the filtered Excel file.
Can someone suggest references? I need to know how to set the path. I tried converting the data to HTML with the code below, but it doesn't trigger a download; it just saves the cleaned file as HTML.
data1 = df.to_html()
#write html to file
text_file = open("data1.html", "w")
text_file.write(data1)
text_file.close()
return render_template("success.html", name = text_file)
I have an app that receives an input file, reads it with pandas, processes it (with a make_processing() function I created), and returns it as a .csv. It is almost the same for an Excel file.
file = request.files['file']
content = file.read()
df = pd.read_csv(io.BytesIO(content))
df2 = make_processing(df)
si = io.StringIO()
df2.to_csv(si, index=False, encoding='utf8')
output = flask.make_response(si.getvalue())
output.headers["Content-Disposition"] = "attachment; filename=periodicidad.csv"
output.headers["Content-type"] = "text/csv"
return output