I have a huge string that contains CSV data. I want to convert it to an Excel file (.xlsx) and save it as an UploadedFile/SimpleUploadedFile. I searched as best I could and came up with the following, where result_data is the huge string.
from io import StringIO
import pandas
from django.core.files.uploadedfile import SimpleUploadedFile
### irrelevant code
result_data = StringIO(result_data)
df = pandas.DataFrame.from_csv(result_data, sep=';')
writer = pandas.ExcelWriter('file.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
result_file = writer.book
result_data.seek(0)
mimetype = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
object.xls_file = SimpleUploadedFile('filename.xlsx', result_data.read(), content_type=mimetype)
object.save()
I've tried numerous replacements for result_data.read() such as result_data, result_file, result_file.read(), but so far none of them has worked.
EDIT: I modified my code according to jmcnamara's suggestions, but got an error from writer.save().
output = StringIO()
result_data = StringIO(result_data)
df = pandas.DataFrame.from_csv(result_data, sep=';')
writer = pandas.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
Traceback:
Traceback (most recent call last):
File "manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "/venv/lib/python3.4/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
utility.execute()
File "/venv/lib/python3.4/site-packages/django/core/management/__init__.py", line 330, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/venv/lib/python3.4/site-packages/django/core/management/base.py", line 390, in run_from_argv
self.execute(*args, **cmd_options)
File "/venv/lib/python3.4/site-packages/django/core/management/base.py", line 441, in execute
output = self.handle(*args, **options)
File "/commands/create.py", line 67, in handle
writer.save()
File "/venv/lib/python3.4/site-packages/pandas/io/excel.py", line 1413, in save
return self.book.close()
File "/venv/lib/python3.4/site-packages/xlsxwriter/workbook.py", line 296, in close
self._store_workbook()
File "/venv/lib/python3.4/site-packages/xlsxwriter/workbook.py", line 541, in _store_workbook
xlsx_file.write(os_filename, xml_filename)
File "/usr/lib/python3.4/zipfile.py", line 1373, in write
self.fp.write(zinfo.FileHeader(zip64))
TypeError: string argument expected, got 'bytes'
Exception ignored in: <bound method ZipFile.__del__ of <zipfile.ZipFile object at 0x7fe5fa2077f0>>
Traceback (most recent call last):
File "/usr/lib/python3.4/zipfile.py", line 1466, in __del__
self.close()
File "/usr/lib/python3.4/zipfile.py", line 1573, in close
self.fp.write(endrec)
TypeError: string argument expected, got 'bytes'
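The TypeError at the bottom of this traceback can be reproduced without pandas. An .xlsx file is a zip archive, and zipfile writes bytes, which a text buffer such as io.StringIO rejects. A minimal sketch using only the standard library (the helper name `can_hold_zip` and the archive member name are made up for illustration):

```python
import io
import zipfile


def can_hold_zip(buffer):
    """Return True if a zip archive can be written into `buffer`."""
    try:
        with zipfile.ZipFile(buffer, "w") as archive:
            archive.writestr("sheet1.xml", "<worksheet/>")
    except TypeError:
        # io.StringIO only accepts str, so writing zip bytes into it fails.
        return False
    return True


print(can_hold_zip(io.StringIO()))
print(can_hold_zip(io.BytesIO()))
```

This suggests that on Python 3 the output buffer handed to the Excel writer should be an io.BytesIO rather than an io.StringIO.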
You probably need to close/save the xlsx file created by pandas before trying to read the data:
writer.save()
Also, with Pandas 0.17+ you can use a StringIO/BytesIO object as a filehandle to pd.ExcelWriter. For example:
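import io
import pandas as pd
# An .xlsx file is binary (a zip archive), so on Python 3 use BytesIO.
# On Python 2 the equivalent was StringIO.StringIO().
output = io.BytesIO()
# Use the BytesIO object as the filehandle.
writer = pd.ExcelWriter(output, engine='xlsxwriter')
# Write the data frame to the BytesIO object.
pd.DataFrame().to_excel(writer, sheet_name='Sheet1')
# Newer pandas releases removed writer.save(); use writer.close() there.
writer.save()
xlsx_data = output.getvalue()
# Do something with the data...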
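Putting these points together for the original question, a minimal sketch might look like the following. It assumes Python 3, a pandas version that accepts a file-like object in ExcelWriter, and an installed Excel writer engine (xlsxwriter or openpyxl). `csv_to_xlsx_bytes` is a hypothetical helper name, and `pd.read_csv` stands in for the older `DataFrame.from_csv` used in the question.

```python
import io

import pandas as pd


def csv_to_xlsx_bytes(csv_text, sep=";"):
    """Convert CSV text into the raw bytes of an .xlsx workbook."""
    df = pd.read_csv(io.StringIO(csv_text), sep=sep)
    output = io.BytesIO()  # binary buffer, because .xlsx is a zip archive
    # Leaving the `with` block closes the writer, which finishes the workbook.
    with pd.ExcelWriter(output) as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
    return output.getvalue()


# Tiny made-up CSV string standing in for the "huge string".
xlsx_data = csv_to_xlsx_bytes("name;qty\napple;3\npear;5\n")
print(len(xlsx_data) > 0)
```

The returned bytes can then be used as the content argument, for example SimpleUploadedFile('filename.xlsx', xlsx_data, content_type=mimetype), with the mimetype from the question.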
Related
I'm trying to use some data that I have in an Excel file. However, I'm getting an error saying that the file cannot be found. I've checked, and the directory and the file name are correct. What am I doing wrong?
Here is the code:
import os
import pandas as pd
print(os.getcwd())
df = pd.read_excel(r'C:/Users/Eder/Desktop/TFG/Data/Interpolation_sample.xlsx',
index_col =0,parse_dates=True, sheet_name='sheet3')
And the answer from the console:
runcell(0, 'C:/Users/Eder/untitled0.py')
C:\Users\Eder\Desktop\TFG\Data
Traceback (most recent call last):
File "C:\Users\Eder\untitled0.py", line 14, in <module>
index_col =0,parse_dates=True, sheet_name='sheet3')
File "E:\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "E:\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "E:\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1072, in __init__
content=path_or_buffer, storage_options=storage_options
File "E:\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 950, in inspect_excel_format
content_or_path, "rb", storage_options=storage_options, is_text=False
File "E:\Anaconda3\lib\site-packages\pandas\io\common.py", line 651, in get_handle
handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Eder\\Desktop\\TFG\\Data\\Interpolation_sample.xlsx'
I've figured out a way to solve the problem. I just changed the name of the file from 'Interpolation_sample' to 'Interpolation sample'. I don't know why, but the underscore in the file name is what was causing this error.
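An underscore is a legal character in Windows file names, so one possible explanation is that the name on disk did not match the string in the code character for character (a space, a different extension, or a look-alike character). A small diagnostic sketch using only the standard library; `describe_path` is a hypothetical helper, and the demo uses a temporary directory rather than the poster's real folder:

```python
import tempfile
from pathlib import Path


def describe_path(path):
    """Report whether `path` exists and list the real names in its folder."""
    path = Path(path)
    folder = path.parent
    # repr() makes stray spaces and unusual characters visible.
    siblings = sorted(repr(p.name) for p in folder.iterdir()) if folder.is_dir() else []
    return path.exists(), siblings


with tempfile.TemporaryDirectory() as tmp:
    # The file on disk has a space; the code asks for an underscore.
    (Path(tmp) / "Interpolation sample.xlsx").touch()
    exists, names = describe_path(Path(tmp) / "Interpolation_sample.xlsx")
    print(exists)
    print(names)
```

Comparing the printed names with the string passed to pd.read_excel shows any mismatch directly.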
All,
I've been trying to use pandas in Python to load a table from Access and then write the data to an Excel file, using the code below. When I run it (Python 3.5.2), I get the output shown after the code:
import pandas as pd
import pypyodbc
conn = 'DSN=MyDSNTest'
cnxn = pypyodbc.connect(conn)
crsr = cnxn.cursor()
qy = """select * from mytbl;"""
df = pd.read_sql(qy, cnxn)
cnxn.commit()
crsr.close()
cnxn.close()
print ("read into dataframe")
#writer = pd.ExcelWriter('c:/tmp/test.xlsx')
#df.to_excel(writer, 'Data')
df.to_excel('E:/Reports/AnalyticsInput/tblHistoryAC.xlsx', 'Data', index=False)
# Close the Pandas Excel writer and output the Excel file.
#writer.save()
read into dataframe 199966
Traceback (most recent call last):
File "C:\Users\jeff\test.py", line 23, in
df.to_excel('c:/tmp/MyTest.xlsx', 'Data', index=False)
File "C:\Python35-32\lib\site-packages\pandas\core\frame.py", line 1466, in to_excel
excel_writer.save()
File "C:\Python35-32\lib\site-packages\pandas\io\excel.py", line 790, in save
return self.book.save(self.path)
File "C:\Python35-32\lib\site-packages\openpyxl\workbook\workbook.py", line 345, in save
save_workbook(self, filename)
File "C:\Python35-32\lib\site-packages\openpyxl\writer\excel.py", line 266, in save_workbook
writer.save(filename)
File "C:\Python35-32\lib\site-packages\openpyxl\writer\excel.py", line 248, in save
self.write_data()
File "C:\Python35-32\lib\site-packages\openpyxl\writer\excel.py", line 81, in write_data
self._write_worksheets()
File "C:\Python35-32\lib\site-packages\openpyxl\writer\excel.py", line 197, in _write_worksheets
xml = ws._write()
File "C:\Python35-32\lib\site-packages\openpyxl\worksheet\worksheet.py", line 870, in _write
return write_worksheet(self)
File "C:\Python35-32\lib\site-packages\openpyxl\writer\worksheet.py", line 107, in write_worksheet
write_rows(xf, ws)
MemoryError
Although the table has about 200,000 rows, I have to believe there is another way to produce the xlsx file without running into a memory error.
Any ideas? Thanks!
Jeff
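Two details in the traceback are worth noting: the interpreter lives under Python35-32, which suggests a 32-bit build with a limited address space, and the file is written by openpyxl in its default mode, which keeps the whole worksheet in memory. One option to try is openpyxl's write-only mode, which streams rows out as they are appended. A minimal sketch, assuming openpyxl is installed; `write_rows_streaming` and the sample rows are hypothetical, and in the question the rows would come from the DataFrame (for example via `df.itertuples(index=False)`):

```python
import os
import tempfile
import zipfile

from openpyxl import Workbook


def write_rows_streaming(path, header, rows):
    """Write rows to an .xlsx file using openpyxl's write-only mode."""
    wb = Workbook(write_only=True)
    ws = wb.create_sheet(title="Data")
    ws.append(list(header))
    count = 0
    for row in rows:
        ws.append(list(row))  # each row is written out, not kept as cell objects
        count += 1
    wb.save(path)
    return count


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "streamed.xlsx")
    # A generator stands in for the 200,000-row table.
    written = write_rows_streaming(target, ["id", "value"], ((i, i * 2) for i in range(1000)))
    is_xlsx_zip = zipfile.is_zipfile(target)
    print(written, is_xlsx_zip)
```

XlsxWriter offers a comparable constant_memory workbook option. Whether either is enough for this table on a 32-bit interpreter would need to be tested.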
I am getting the following error from the command prompt:
Traceback (most recent call last):
File "C:\Python27\Scripts\read_csv.py", line 14, in <module>
df3.to_excel(writer, sheet_name="Section 3")
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1464, in to_excel
startrow=startrow, startcol=startcol)
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 1517, in write_cells
for cell in cells:
File "C:\Python27\lib\site-packages\pandas\formats\format.py", line 1893, in get_formatted_cells
self._format_body()):
File "C:\Python27\lib\site-packages\pandas\formats\format.py", line 1864, in _format_hierarchical_rows
fill_value=True)
File "C:\Python27\lib\site-packages\pandas\indexes\base.py", line 1515, in take
na_value=self._na_value)
File "C:\Python27\lib\site-packages\pandas\indexes\base.py", line 1534, in _assert_take_fillable
taken = values.take(indices)
IndexError: cannot do a non-empty take from an empty axes.
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x024BF5B0>> ignored
What I'm trying to do is take data from 3 different CSV files and write each to a respective tab on an Excel file I saved. Current code below:
import pandas as pd
df1=pd.read_csv("File1.csv")
df2=pd.read_csv("File2.csv")
df3=pd.read_csv("File3.csv")
# Creates a Pandas Excel writer using XlsxWriter as the engine
writer = pd.ExcelWriter("Excel File", engine="xlsxwriter")
# Convert data frames to an XlsxWriter Excel Object
df1.to_excel(writer, sheet_name="Section 1")
df2.to_excel(writer, sheet_name="Section 2")
df3.to_excel(writer, sheet_name="Section 3")
# Close the Pandas Excel writer and output the Excel file.
writer.save()
writer.close()
I need to read a few xls files into Python. The sample data file can be found through this link: data.file. I tried:
import pandas as pd
pd.read_excel('data.xls',sheet=1)
But it gives an error message:
ERROR *** codepage 21010 -> encoding 'unknown_codepage_21010' -> LookupError: unknown encoding: unknown_codepage_21010
Traceback (most recent call last):
File "", line 1, in
pd.read_excel('data.xls',sheet=1)
File "C:\Anaconda3\lib\site-packages\pandas\io\excel.py", line 113, in read_excel
return ExcelFile(io, engine=engine).parse(sheetname=sheetname, **kwds)
File "C:\Anaconda3\lib\site-packages\pandas\io\excel.py", line 150, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Anaconda3\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 116, in open_workbook_xls
bk.parse_globals()
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 1170, in parse_globals
self.handle_codepage(data)
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 794, in handle_codepage
self.derive_encoding()
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 775, in derive_encoding
_unused = unicode(b'trial', self.encoding)
File "C:\Anaconda3\lib\site-packages\xlrd\timemachine.py", line 30, in
unicode = lambda b, enc: b.decode(enc)
LookupError: unknown encoding: unknown_codepage_21010
Could anyone help with this problem?
PS: I know that if I open the file in Excel on Windows and re-save it, the code works, but I am looking for a solution without manual adjustment.
Using the ExcelFile class, I was able to read the file into Python.
Let me know if this helps!
import xlrd
import pandas as pd
xls = pd.ExcelFile(r'C:\data.xls')
xls.parse('Index Constituents Data', index_col=None, na_values=['NA'])
The below worked for me.
import xlrd
my_xls = xlrd.open_workbook('//myshareddrive/something/test.xls',encoding_override="gb2312")
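The LookupError in the traceback is raised by Python's codec registry: xlrd turns the workbook's codepage number into a codec name, and 'unknown_codepage_21010' is not a codec Python knows. That would explain why supplying a real codec name through encoding_override gets past the failure. A minimal standard-library sketch of the same lookup (`codec_is_known` is a hypothetical helper; whether gb2312 is the right encoding for a given file is an assumption to verify against its contents):

```python
import codecs


def codec_is_known(name):
    """Return True if Python has a codec registered under `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in ("unknown_codepage_21010", "gb2312"):
    print(name, codec_is_known(name))

# The same trial decode that xlrd attempts, with a codec that exists.
decoded = b"trial".decode("gb2312")
print(decoded)
```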
I'm trying to use openpyxl to open and modify an existing Excel workbook, but I can't even open the file without getting an error.
from openpyxl import load_workbook
ws = load_workbook('PO-Copy.xlsx')
I get a long TypeError as a result:
Traceback (most recent call last):
File "<module1>", line 6, in <module>
File "C:\Python27\Lib\site-packages\openpyxl\reader\excel.py", line 151, in load_workbook
_load_workbook(wb, archive, filename, read_only, keep_vba)
File "C:\Python27\Lib\site-packages\openpyxl\reader\excel.py", line 224, in _load_workbook
keep_vba=keep_vba)
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 308, in read_worksheet
fast_parse(ws, xml_source, shared_strings, style_table, color_index)
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 296, in fast_parse
parser.parse()
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 84, in parse
dispatcher[tag_name](element)
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 282, in parse_data_validation
dv = parser(tag)
File "C:\Python27\Lib\site-packages\openpyxl\worksheet\datavalidation.py", line 179, in parser
dv = DataValidation(**element.attrib)
TypeError: __init__() got an unexpected keyword argument 'errorStyle'
Has anyone else run into this error? Is there a fix I can use to keep going?
The ability to read DataValidation in existing files was added in openpyxl 2.1 but was limited to what DataValidation in Python supported. Work has started on supporting DataValidation fully and is available in the 2.2 branch at https://bitbucket.org/habub68/openpyxl
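The traceback shows the mechanism: the parser passes every XML attribute of the element to the constructor as a keyword argument (DataValidation(**element.attrib)), so any attribute the constructor does not declare, such as errorStyle, raises TypeError. A toy reproduction in plain Python follows. The class `ToyValidation` and the attribute values are invented for illustration and are not openpyxl's code; the filtering step is only one way a parser could tolerate unknown attributes.

```python
import inspect


class ToyValidation:
    """Stand-in for a class whose constructor declares only some attributes."""

    def __init__(self, type=None, sqref=None):
        self.type = type
        self.sqref = sqref


# Attributes as they might appear on an XML element (made-up values).
attrib = {"type": "list", "sqref": "A1:A10", "errorStyle": "warning"}

try:
    ToyValidation(**attrib)
    error = None
except TypeError as exc:
    error = str(exc)

# Keep only the attributes the constructor actually declares.
accepted = set(inspect.signature(ToyValidation.__init__).parameters) - {"self"}
dv = ToyValidation(**{k: v for k, v in attrib.items() if k in accepted})
print(error)
print(dv.type, dv.sqref)
```

For the real file, moving to an openpyxl release with fuller DataValidation support, as the answer suggests, is the more direct route.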