Python: Iteratively Writing to Excel Files - python

Python 2.7. I'm using xlsxwriter.
Let's say I have myDict = {1: 'One', 2: 'Two', 3: 'Three'}
I need to perform some transformation on the value and write the result to a spreadsheet.
So I write a function to create a new file and put some headers in there and do formatting, but don't close it so I can write further with my next function.
Then I write another function for transforming my dict values and writing them to the worksheet.
I'm a noob when it comes to classes so please forgive me if this looks silly.
import xlsxwriter
class ReadWriteSpreadsheet(object):
def __init__(self, outputFile=None, writeWorkbook=None, writeWorksheet=None):
self.outputFile = outputFile
self.writeWorksheet = writeWorksheet
self.writeWorkbook = writeWorkbook
# This function works fine
def setup_new_spreadsheet(self):
self.writeWorkbook = xlsxwriter.Workbook(self.outputFile)
self.writeWorksheet = self.writeWorkbook.add_worksheet('My Worksheet')
self.writeWorksheet.write('A1', 'TEST')
# This one does not
def write_data(self):
# Forget iterating through the dict for now
self.writeWorksheet.write('A5', myDict[1])
x = ReadWriteSpreadsheet(outputFile='test.xlsx')
x.setup_new_spreadsheet()
x.write_data()
I get:
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x00000000023FDF28>> ignored
The docs say this error is due to not closing the workbook, but if I close it then I can't write to it further...
How do I structure this class so that the workbook and worksheet from setup_new_spreadsheet() is able to be written to by write_data()?

The exception mentioned in your question is triggered when python realises you will not need to use your Workbook any more in the rest of your code and therefore decides to delete it from his memory (garbage collection). When doing so, it will realise you haven't closed your workbook yet and so will not have persisted your excel spreadsheet at all on the disk (only happen on close I assume) and will raise that exception.
If you had another method close on your class that did: self.writeWorkbook.close() and made sure to call it last you would not have that error.

When you do ReadWriteSpreadsheet() you get a new instance of the class you've defined. That new instance doesn't have any knowledge of any workbooks that were set up in a different instance.
It looks like what you want to do is get a single instance, and then issue the methods on that one instance:
x = ReadWriteSpreadsheet(outputFile='test.xlsx')
x.setup_new_spreadsheet()
x.write_data()
To address your new concern:
The docs say this error is due to not closing the workbook, but if I close it then I can't write to it further...
Yes, that's true, you can't write to it further. That is one of the fundamental properties of Excel files. At the level we're working with here, there's no such thing as "appending" or "updating" an Excel file. Even the Excel program itself cannot do it. You only have two viable approaches:
Keep all data in memory and only commit to disk at the very end.
Reopen the file, reading the data into memory; modify the in-memory data; and write all the in-memory data back out to a new disk file (which can have the same name as the original if you want to overwrite).
The second approach requires using a package that can read Excel files. The main choices there are xlrd and OpenPyXL. The latter will handle both reading and writing, so if you use that one, you don't need XlsxWriter.

Related

Saving list of many python variables into excel sheet while simultaneously keeping variable types defined?

It's possible with xlsxwriter to save variables to existing excel files and read them after, though the problem is that the variables are stored as strings in my excel file.
Let's say I have a list of many different variables with various types (pd.datetimerange, pd.df, np.arrays, etc.), if I save them to the excel file the variable type would be lost.
Also the Python script is called from my Excel file so I can't change anything in it without writing a VBA script. Which would temporarily close my workbook, dump the chars (with say pickle strings) and reopen it.
Is it possible to retrieve the types from the excel file without writing a parsing function first (which would take the excel string and yield me with the equivalent variable type)?
As per your comment, you can get eval to correctly process the symbols that are local to some module by passing the appropriate dict of locals into eval, along with your string. Here's a workable solution:
import pandas as pd
def getlocals(obj, lcls=None):
if lcls is None: lcls = dict(locals().items())
objlcls = {k:v for k,v in obj.__dict__.items() if not k.startswith('_')}
lcls.update(objlcls)
return lcls
x = "[123,DatetimeIndex(['2018-12-04','2018-12-05', '2018-12-06'],dtype='datetime64[ns]', freq='D')]"
lcls = getlocals(pd)
result = eval(x, globals(), lcls)
print(result)
Output:
[123, DatetimeIndex(['2018-12-04', '2018-12-05', '2018-12-06'], dtype='datetime64[ns]', freq='D')]
As a Responsible Person, it is also my duty to warn you that using eval for your application is ridiculously unsafe. There are many discussions of the dangers of eval, and none of them suggest there's a way to completely mitigate those dangers. Be careful if you choose to use this code.

Best Way To Save/Load Data To/From A .CSV File

I'm trying to save and load a list of tuples of 2 ndarrays and an int to and from a .csv file.
In my current implementation, when I save and load a list l, there is some error in the recovered list on the order of 10^-10. Is there a way to save and recover values more precisely? I would also appreciate comments on my code in general. Thanks!
This is what I have now:
def save_l(l,path):
tup=()
for X in l:
u=X[0].reshape(784*9)
v=X[2]*np.ones(1)
w=np.concatenate((u,X[1],v))
tup+=(w,)
L=np.row_stack(tup)
df=pd.DataFrame(L)
df.to_csv(path)
def load_l(path):
df=pd.read_csv(path)
L=df.values
l=[]
for v in L:
tup=()
for i in range(784):
tup+=(v[9*i+1:9*(i+1)+1],)
T=np.row_stack(tup)
Q=v[9*784+1:10*784+1]
i=v[7841]
l.append((T,Q,i))
return(l)
It may be possible that the issue you are experiencing is due to absence of .csv file protection during save and load.
a good way to make sure that your file is locked until all data are saved/loaded completely is using a context manager. In this way, you won't lose any data in case your system stops execution due to whatever reason, because all results are saved in the moment when they are available.
I recommend to use the with-statement, whose primary use is an exception-safe cleanup of the object used inside (in this case your .csv). In other words, with makes sure that files are closed, locks released, contexts restored etc.
with open("myfile.csv", "a") as reference: # Drop to csv w/ context manager
df.to_csv(reference, sep = ",", index = False) # Same goes for read_csv
# As soon as you are here, reference is closed
If you try this and still see your error, it's not due to save/load issues.

Constant first row of a .csv file?

I have a Python code which is logging some data into a .csv file.
logging_file = 'test.csv'
dt = datetime.datetime.now()
f = open(logging_file, 'a')
f.write('\n "{:%H:%M:%S}",{},{}'.format(dt,x,y,))
The above code is the core part and this produces continuous data in .csv file as
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
Now I wish to add the following lines in first row of this data. time, data1,data2.I expect output as
time, data1, data2
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
I tried many ways. Those ways not produced me the result as preferred format.But I am unable to get my expected result.
Please help me to solve the problem.
I would recommend writing a class specifically for creating and managing logs.Have it initialize a file, on creation, with the expected first line (don't forget a \n character!), and keep track of any necessary information about that log(the name of the log it created, where it is, etc). You can then have the class 'write' to the log (append the log, really), you can create new logs as necessary, and, you can have it check for existing logs, and make decisions about either updating what is existing, or scrapping it and starting over.

change keyword value in the header of a FITS file

I am trying to change the value of a keyword in the header of a FITS file.
Quite simple, this is the code:
import pyfits
hdulist = pyfits.open('test.fits') # open a FITS file
prihdr = hdulist[1].header
print prihdr['AREASCAL']
effarea = prihdr['AREASCAL']/5.
print effarea
prihdr['AREASCAL'] = effarea
print prihdr['AREASCAL']
I print the steps many times to check the values are correct. And they are.
The problem is that, when I check the FITS file afterwards, the keyword value in the header is not changed. Why does this happen?
You are opening the file in read-only mode. This won't prevent you from modifying any of the in-memory objects, but closing or flushing to the file (as suggested in other answers to this question) won't make any changes to the file. You need to open the file in update mode:
hdul = pyfits.open(filename, mode='update')
Or better yet use the with statement:
with pyfits.open(filename, mode='update') as hdul:
# Make changes to the file...
# The changes will be saved and the underlying file object closed when exiting
# the 'with' block
You need to close the file, or explicitly flush it, in order to write the changes back:
hdulist.close()
or
hdulist.flush()
Interestingly, there's a tutorial for that in the astropy tutorials github. Here is the ipython notebook viewer version of that tutorial that explains it all.
Basically, you are noticing that the python instance does not interact with disk instance. You have to save a new file or overwrite the old one explicitly.

Formatting a single row as CSV

I'm creating a script to convert a whole lot of data into CSV format. It runs on Google AppEngine using the mapreduce API, which is only relevant in that it means each row of data is formatted and output separately, in a callback function.
I want to take advantage of the logic that already exists in the csv module to convert my data into the correct format, but because the CSV writer expects a file-like object, I'm having to instantiate a StringIO for each row, write the row to the object, then return the content of the object, each time.
This seems silly, and I'm wondering if there is any way to access the internal CSV formatting logic of the csv module without the writing part.
The csv module wraps the _csv module, which is written in C. You could grab the source for it and modify it to not require the file-like object, but poking around in the module, I don't see any clear way to do it without recompiling.
One option could be having your own "file-like" object. Actually, cvs.writer requires for the object only to have a write method, so:
class PseudoFile(object):
def write(self, string):
# Do whatever with your string
csv.writer(PseudoFile()).writerow(row)
You're skipping a couple steps in there, but maybe it's just what you want.

Categories

Resources