PyFITS: hdulist.writeto() - python

I'm extracting extensions from a multi-extension FITS file, manipulate the data, and save the data (with the extension's header information) to a new FITS file.
To my knowledge pyfits.writeto() does the task. However, when I give it a data parameter in the form of an array, it gives me the error:
'AttributeError: 'numpy.ndarray' object has no attribute 'lower''
Here is a sample of my code:
'file = 'hst_11166_54_wfc3_ir_f110w_drz.fits'
hdulist = pyfits.open(dir + file)'
sci = hdulist[1].data # science image data
exp = hdulist[5].data # exposure time data
sci = sci*exp # converts electrons/second to electrons
file = 'test_counts.fits'
hdulist.writeto(file,sci,clobber=True)
hdulist.close()
I appreciate any help with this. Thanks in advance.

You're confusing the HDUList.writeto method, and the writeto function.
What you're calling is a method on the HDUList object that is returned when you call pyfits.open. You can think of this object as something like a file handle to your original drizzled FITS file. You can manipulate this object in place and either write it out to a new file or save updates in place (if you open the file in mode='update').
The writeto function on the other hand is not tied to any existing file. It's just a high-level function for writing an array out to a file. In your example you could write your array of electron counts out like:
pyfits.writeto(filename, data)
This will create a single-HDU FITS file with the array data in the PRIMARY HDU.
Do be aware of the admonishment at the top of this section of the docs: http://docs.astropy.org/en/v1.0.3/io/fits/index.html#convenience-functions
The functions like pyfits.writeto are there for convenience in interactive work, but are not recommendable for use in code that will be run repeatedly, as in a script. Instead have a look at these instructions to start.

It is probably because you should use hdulist.writeto(file, clobber=True). There is only one required argument:
https://pythonhosted.org/pyfits/api_docs/api_hdulists.html#pyfits.HDUList.writeto
If you give a second argument, it is used for output_verify which should be a string, not a numpy array. This probably explains your AttributeError ....

Related

h5py file subset taking more space than parent file?

I have an existing h5py file that I downloaded which is ~18G in size. It has a number of nested datasets within it:
h5f = h5py.File('input.h5', 'r')
data = h5f['data']
latlong_data = data['lat_long'].value
I want to be able to some basic min/max scaling of the numerical data within latlong, so i want to put it in its own h5py file for easier use and lower memory usage.
However, when i try to write it out to its own file:
out = h5py.File('latlong_only.h5', 'w')
out.create_dataset('latlong', data=latlong)
out.close()
The output file is incredibly large. It's still not done writing to disk and is ~85GB in space. Why is the data being written to the new file not compressed?
Could be h5f['data/lat_long'] is using compression filters (and you aren't). To check the original dataset's compression settings, use this line:
print (h5f['data/latlong'].compression, h5f['data/latlong'].compression_opts)
After writing my answer, it occurred to me that you don't need to copy the data to another file to reduce the memory footprint. Your code reads the dataset into an array, which is not necessary in most use cases. A h5py dataset object behaves similar to a NumPy array. Instead, use this line: ds = h5f1['data/latlong'] to create a dataset object (instead of an array) and use it "like" it's a NumPy array. FYI, .value is a deprecated method to return the dataset as an array. Use this syntax instead arr = h5f1['data/latlong'][()]. Loading the dataset into an array also requires more memory than using an h5py object (which could be an issue with large datasets).
There are other ways to access the data. My suggestion to use dataset objects is 1 way. Your method (extracting data to a new file) is another way. I am not found of that approach because you now have 2 copies of the data; a bookkeeping nightmare. Another alternative is to create external links from the new file to the existing 18GB file. That way you have a small file that links to the big file (and no duplicate data). I describe that method in this post: [How can I combine multiple .h5 file?][1] Method 1: Create External Links.
If you still want to copy the data, here is what I would do. Your code reads the dataset into an array then writes the array to the new file (uncompressed). Instead, copy the dataset using h5py's group .copy() method, it will retain compression settings and attributes.
See below:
with h5py.File('input.h5', 'r') as h5f1, \
h5py.File('latlong_only.h5', 'w') as h5f2:
h5f1.copy(h5f1['data/latlong'], h5f2,'latlong')

Storing data globally in Python

Django and Python newbie here. Ok, so I want to make a webpage where the user can enter a number between 1 and 10. Then, I want to display an image corresponding to that number. Each number is associated with an image filename, and these 10 pairs are stored in a list in a .txt file.
One way to retrieve the appropriate filename is to create a NumToImage model, which has an integer field and a string field, and store all 10 NumToImage objects in the SQL database. I could then retrieve the filename for any query number. However, this does not seem like such a great solution for storing a simple .txt file which I know is not going to change.
So, what is the way to do this in Python, without using a database? I am used to C++, where I would create an array of strings, one for each of the numbers, and load these from the .txt file when the application starts. This vector would then lie within a static object such that I can access it from anywhere in my application.
How can a similar thing be done in Python? I don't know how to instantiate a Python object and then enable it to be accessible from other Python scripts. The only way I can think of doing this is to pass the object instance as an argument for every single function that I call, which is just silly.
What's the standard solution to this?
Thank you.
The Python way is quite similar: you run code at the module level, and create objects in the module namespace that can be imported by other modules.
In your case it might look something like this:
myimage.py
imagemap = {}
# Now read the (image_num, image_path) pairs from the
# file one line at a time and do:
# imagemap[image_num] = image_path
views.py
from myimage import imagemap
def my_view(image_num)
image_path = imagemap[image_num]
# do something with image_path

How to set win32clipboard data on CF_HDROP format?

I am confronted to the loss of alpha channel when I try to send image to clipboard, none of the solutions described here worked with the software I am working with but when I copy paste png files into this software, the alpha channel seems to be preserved.
Under this consideration, I want to simulate the Ctrl+C on files allowed by Windows Explorer. Using Clipview I found that the field 15 : CF_HDROP is relevant to my goal. tried to set this field using win32clipboard
import win32clipboard
win32clipboard.OpenClipboard(0)
file1="C:\\Users\\User\\Desktop\\test.png"
win32clipboard.SetClipboardData(15, file1)
win32clipboard.CloseClipboard()
I don't get any error doing this, but it does not work when I try to use this new clipboard content, because as described there tuple of unicode filenames must be stored in the CF_HDROP field.
I have no clue how to proceed. I also tried with
file1= (unicode('C:\\Users\\User\\Desktop\\CANEVAS\\test.png'),)
but I got this error:
TypeError: expected a readable buffer object.
The documentation for CF_HDROP says
The data consists of an STGMEDIUM structure that contains a global memory object. The structure's hGlobal member points to a DROPFILES structure as its hGlobal member.
win32clipboard.GetClipboardData has built-in support for CF_HDROP. It decodes the STGMEDIUM and DROPFILES structures to produce a tuple of file names.
The documentation does not state that SetClipboardData has the corresponding code to construct the STGMEDIUM and DROPFILES structures from a tuple of file names.
I don't know enough about Python or its FFI to know how straightforward it is to construct the structures and pass them to the SetClipboardData function. Or if there is an existing library that will do this for you.

How to output a Numpy array to a file object for Picloud

I have a matrix-factorization process that I'm running on picloud. The output is a set of numpy arrays (ndarray).
Now, I want to save it to my bucket, but I'm not able to zero in on the right way to do it. Let's assume that the array to be saved is P.
I tried:
cloud.bucket.putf(P,'p.csv')
but that returned an error: "IOError: File object is not seekable. Cannot transmit".
I tried
numpy.ndarray.tofile(P,f, sep=",", format="%s") #outputing the array to a file object f
cloud.bucket.putf(f,'p.csv') #saving the file object f in the bucket.
I tried a couple of other things, including using using numpy.savetext (as I would if I ran it locally) but I'm not able to solve this between the picloud documentation and stackexchange questions. I haven't tried pickle yet, though. I felt this was something straightforward, but I'm feeling quite silly after spending a few hours on this.
As you guessed, you want to pickle the array as follows:
import cloud
import cPickle as pickle
# to write
cloud.bucket.putf(pickle.dumps(P), 'p.csv')
# to read
obj = pickle.loads(cloud.bucket.getf('p.csv').read())
This is a general way to serialize and store any Python object in your PiCloud Bucket. I also recommend that you store your csv files under a prefix to keep it organized [1].
[1] http://docs.picloud.com/bucket.html#namespacing-with-prefix

Saving python data for an application

I need to save multiple numpy arrays along with the user input that was used to compute the data these arrays contain in a single file. I'm having a hard time finding a good procedure to use to achieve this or even what file type to use. The only thing i can think of is too put the computed arrays along with the user input into one single array and then save it using numpy.save. Does anybody know any better alternatives or good file types for my use?
You could try using Pickle to serialize your arrays.
How about using pickle and then storing pickled array objects in a storage of your choice, like database or files?
I had this problem long ago so i dont have the code near to show you, but i used a binary write in a tmp file to get that done.
EDIT: Thats is, pickle is what i used. Thanks SpankMe and RoboInventor
Numpy provides functions to save arrays to files, e.g. savez():
outfile = '/tmp/data.dat'
x = np.arange(10)
y = np.sin(x)
np.savez(outfile, x=x, y=y)
npzfile = np.load(outfile)
print npzfile['x']
print npzfile['y']

Categories

Resources