Unable to load pickle file - python

I was previously able to load a pickle file. I saved a new file under a different name, and now I am unable to load either the old or the new file, which is a bummer, as they contain data I have worked hard to scrub.
Here is the code that I use to save:
def pickleStore():
    pickle.dump(store, open("...shelf3.p", "wb"))
Here is the code that I use to re-load:
def pickleLoad():
    store = pickle.load(open(".../shelf3.p", "rb"))
The created file exists. When I run pickleLoad(), no errors come up, but no variables show up in the Variable Explorer panel either. If I try to load a non-existent file, I get an error message.
I am running Spyder with Python 3.5.
Any suggestions?

If you want to write to a module-level variable from a function, you need to use the global keyword:
store = None

def pickleLoad():
    global store
    store = pickle.load(open(".../shelf3.p", "rb"))
...or return the value and perform the assignment from module-level code:
store = None

def pickleLoad():
    return pickle.load(open(".../shelf3.p", "rb"))

store = pickleLoad()

As a general and more versatile approach I would suggest something like this:
def load(file_name):
    with open(file_name, 'rb') as pickle_file:
        return pickle.load(pickle_file)

def save(file_name, data):
    with open(file_name, 'wb') as f:
        pickle.dump(data, f)
I have added this snippet to several projects to avoid rewriting the same code over and over.
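For example, a minimal usage sketch of these helpers (the file name and data are arbitrary):
import pickle

data = {"cleaned": [1, 2, 3]}
save("shelf3.p", data)    # write the object to disk
store = load("shelf3.p")  # read it back into a variable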

Related

function execution problem when called, python

I have a small Python program to edit a single row in a CSV file. I do this by getting all the rows and columns of the file and replacing the selected row with what the user entered.
The point is that I have a function that should be activated depending on which row has been selected, but it never runs. I have tried many ways and called it explicitly, and the function still does not execute.
I already have the code but I just need to activate the function.
This is a small example of my code:
import pandas as pd

my_variable = "ninguna"
new_variable1 = "hello"
new_variable2 = "How are you"
new_variable3 = "brother"

if my_variable == "none":
    with open("priv/productos.csv", 'r') as edit:
        edit = edit.read()

    def edit_row(self):
        def tryloc(df, col, idx, default=None):
            try:
                return df.iloc[col, idx]
            except IndexError:
                return default

        print("the variable if it is running")
        edit = pd.read_csv("priv/products.csv")
        product_2 = tryloc(edit, 1, 0)
        brand_2 = tryloc(edit, 1, 1)
        price_2 = tryloc(edit, 1, 2)

        with open("priv/products.csv", 'w') as login_main:
            login_main.write("PRODUCTS" + ",")
            login_main.write("BRANDS" + ",")
            login_main.write("PRICES" + "\n")
            login_main.write(str(new_variable1) + ",")
            login_main.write(str(new_variable2) + ",")
            login_main.write(str(new_variable3) + "\n")
            login_main.write(str(product_2) + ",")
            login_main.write(str(brand_2) + ",")
            login_main.write(str(price_2) + "\n")

    edit_row(self)
When I run the program everything works except the function :(
Does anyone know what this problem is?
Thank you
Your function has to be defined before your condition. Inside the condition there should be only edit_row(data), which is the call to your function. Also, self is something you use when you define methods in classes (I don't see any classes in your code). I also don't understand why you included one function inside another. What will it return in case of an error? What is default?
The call to a function has to be outside of the function's own definition.
I'd recommend laying out your code more like this:
import pandas as pd

def edit_row(filename):
    with open(filename, 'r') as edit:
        data = edit.read()
    # lines to edit data here
    # lines to save edited data here

edit_row("priv/productos.csv")
What you have right now is that my_variable contains the string 'ninguna' and the lines to define and call edit_row() only execute if my_variable contains the string 'none'. So when you run your code, the definition and call to edit_row() are all skipped.
Your code also assumes that edit_row() is a method on an object named edit whose class is never defined. For that to work, you'd need to define a class with edit_row() as a method. For now, just go procedural.
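As a concrete sketch of that procedural approach (the row index and replacement values here are assumptions based on the snippets above, not the asker's actual data):
import pandas as pd

def edit_row(filename, row_index, new_values):
    df = pd.read_csv(filename)
    df.iloc[row_index] = new_values   # replace the selected row
    df.to_csv(filename, index=False)  # write the edited table back

edit_row("priv/products.csv", 0, ["hello", "How are you", "brother"])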

How to create a variable whose value persists across file reload?

Common Lisp has defvar which
creates a global variable but only sets it if it is new: if it already
exists, it is not reset. This is useful when reloading a file from a long running interactive process, because it keeps the data.
I want the same in Python.
I have file foo.py which contains something like this:
cache = {}

def expensive(x):
    try:
        return cache[x]
    except KeyError:
        # do a lot of work
        cache[x] = res
        return res
When I do imp.reload(foo), the value of cache is lost, which I want to avoid.
How do I keep cache across reload?
PS. I guess I can follow How do I check if a variable exists? :
if 'cache' not in globals():
    cache = {}
but it does not look "Pythonic" for some reason...
If it is TRT (the right thing), please tell me so!
Answering comments:
I am not interested in cross-invocation persistence; I am already handling that.
I am painfully aware that reloading changes class meta-objects and I am already handling that.
The values in cache are huge, I cannot go to disk every time I need them.
Here are a couple of options. One is to use a temporary file as persistent storage for your cache, and try to load every time you load the module:
# foo.py
import tempfile
import pathlib
import pickle

_CACHE_TEMP_FILE_NAME = '__foo_cache__.pickle'
_CACHE = {}

def expensive(x):
    try:
        return _CACHE[x]
    except KeyError:
        # do a lot of work
        _CACHE[x] = res
        _save_cache()
        return res

def _save_cache():
    tmp = pathlib.Path(tempfile.gettempdir(), _CACHE_TEMP_FILE_NAME)
    with tmp.open('wb') as f:
        pickle.dump(_CACHE, f)

def _load_cache():
    global _CACHE
    tmp = pathlib.Path(tempfile.gettempdir(), _CACHE_TEMP_FILE_NAME)
    if not tmp.is_file():
        return
    try:
        with tmp.open('rb') as f:
            _CACHE = pickle.load(f)
    except pickle.UnpicklingError:
        pass

_load_cache()
The only issue with this is that you need to trust the environment not to write anything malicious in place of the temporary file (the pickle module is not secure against erroneous or maliciously constructed data).
Another option is to use another module for the cache, one that does not get reloaded:
# foo_cache.py
Cache = {}
And then:
# foo.py
import foo_cache

def expensive(x):
    try:
        return foo_cache.Cache[x]
    except KeyError:
        # do a lot of work
        foo_cache.Cache[x] = res
        return res
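A quick sketch of why this works: reloading foo re-executes foo.py only, so foo_cache and its Cache dictionary survive:
import importlib
import foo

foo.expensive(1)       # populates foo_cache.Cache
importlib.reload(foo)  # re-executes foo.py; foo_cache is untouched
foo.expensive(1)       # cache hit: the value survived the reload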
Since the whole point of a reload is to ensure that the executed module's code is run a second time, there is essentially no way to avoid some kind of "reload detection."
The code you use appears to be the best answer from those given in the question you reference.
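For reference, an equivalent idiom to the globals() check is to catch the NameError directly:
try:
    cache
except NameError:
    cache = {}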

Install issues with 'lr_utils' in python

I am trying to complete some homework in a DeepLearning.ai course assignment.
When I try the assignment on the Coursera platform everything works fine; however, when I try to do the same imports on my local machine I get an error:
ModuleNotFoundError: No module named 'lr_utils'
I have tried resolving the issue by installing lr_utils, but to no avail.
There is no mention of this module online, and now I am starting to wonder whether it is proprietary to deeplearning.ai?
Or can we resolve this issue in some other way?
You will be able to find the lr_utils.py and all the other .py files (and thus the code inside them) required by the assignments:
Go to the first assignment (i.e. Python Basics with numpy), which you can always access whether you are a paid user or not.
Then click on the 'Open' button in the menu bar above.
Then you can include the code of the modules directly in your code.
As per the answer above, lr_utils is a part of the deep learning course and is a utility to download the data sets. It should readily work with the paid version of the course but in case you 'lost' access to it, I noticed this github project has the lr_utils.py as well as some data sets
https://github.com/andersy005/deep-learning-specialization-coursera/tree/master/01-Neural-Networks-and-Deep-Learning/week2/Programming-Assignments
Note: the Chinese website links did not work when I looked at them; maybe the server storing the files expired. I did see that this GitHub project had some datasets, though, as well as the lr_utils file.
EDIT: The link no longer seems to work. Maybe this one will do?
https://github.com/knazeri/coursera/blob/master/deep-learning/1-neural-networks-and-deep-learning/2-logistic-regression-as-a-neural-network/lr_utils.py
Download the datasets from the answer above, and use this code (it's better than the above since it closes the files after use):
import h5py
import numpy as np

def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])

    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])
        classes = np.array(test_dataset["list_classes"][:])

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
"lr_utils" is not official library or something like that.
Purpose of "lr_utils" is to fetch the dataset that is required for course.
option (didn't work for me): go to this page and there is a python code for downloading dataset and creating "lr_utils"
I had a problem with fetching data from provided url (but at least you can try to run it, maybe it will work)
option (worked for me): in the comments (at the same page 1) there are links for manually downloading dataset and "lr_utils.py", so here they are:
link for dataset download
link for lr_utils.py script download
Remember to extract the dataset when you download it, and you have to put the dataset folder and "lr_utils.py" in the same folder as the Python script that uses it (the script with the line import lr_utils).
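For reference, the resulting layout should look something like this (names taken from the snippets in this thread):
your_script.py     # contains "import lr_utils"
lr_utils.py
datasets/
    train_catvnoncat.h5
    test_catvnoncat.h5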
The way I fixed this problem was by:
clicking File -> Open; you will see the lr_utils.py file (it does not matter whether you have the paid or free version of the course).
opening the lr_utils.py file in Jupyter Notebook and clicking File -> Download (store it in your own folder), then rerunning the module imports. It will work like magic.
I did the same process for the datasets folder.
You can download train and test dataset directly here: https://github.com/berkayalan/Deep-Learning/tree/master/datasets
And you need to add this code to the beginning:
import numpy as np
import h5py
import os

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
I faced a similar problem and followed these steps:
1. Import the following libraries:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
2. Download train_catvnoncat.h5 and test_catvnoncat.h5 from either of the links below:
https://github.com/berkayalan/Neural-Networks-and-Deep-Learning/tree/master/datasets
or
https://github.com/JudasDie/deeplearning.ai/tree/master/Improving%20Deep%20Neural%20Networks/Week1/Regularization/datasets
3. Create a folder named datasets and paste these two files into it.
[Note: the datasets folder and your source code file should be in the same directory.]
4. Run the following code:
def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])

    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])
        classes = np.array(test_dataset["list_classes"][:])

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
5. Load the data:
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
Check the datasets:
print(len(train_set_x_orig))
print(len(test_set_x_orig))
Your dataset is ready; you can check the lengths of the train_set_x_orig and test_set_x_orig variables. For mine, they were 209 and 50.
I could download the dataset directly from the Coursera page.
Once you open the Coursera notebook, go to File -> Open and a file browser will be displayed.
There the notebooks and datasets are listed; you can go to the datasets folder and download the required data for the assignment. The lr_utils.py file is also available for download.
Below is the code; just save it to a file named "lr_utils.py" and you can use it.
import numpy as np
import h5py

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
If your code cannot find your newly created lr_utils.py file, just add this:
import sys
sys.path.append("full path of the directory where you saved lr_utils.py file")
Here is the way to get the dataset, as per @ThinkBonobo:
https://github.com/andersy005/deep-learning-specialization-coursera/tree/master/01-Neural-Networks-and-Deep-Learning/week2/Programming-Assignments/datasets
Write an lr_utils.py file, as in @StationaryTraveller's answer above, and put it in any directory on sys.path:
def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        ....
BUT make sure that you delete 'datasets/', because now the name of your data file is just train_catvnoncat.h5.
Restart the kernel, and good luck.
I may add to the answers that you can save the lr_utils script to disk and import it as a module using importlib in the following way. The code below came from the general thread about importing functions from external files into the current session: How to import a module given the full path?
### Source the load_dataset() function from a file
import importlib.util

# Specify a name (I think it can be whatever) and the path to the lr_utils.py script locally on your PC:
util_script = importlib.util.spec_from_file_location("utils function", "D:/analytics/Deep_Learning_AI/functions/lr_utils.py")
# Make a module
load_utils = importlib.util.module_from_spec(util_script)
# Execute it on the fly
util_script.loader.exec_module(load_utils)
# Then you can use the load_dataset() function coming from the above specified 'module' called load_utils
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_utils.load_dataset()

# This could be a general way of calling user-specified modules, so I did the same for the rest of the
# neural network functions and put them into a separate file to keep my script clean.
# Just remember that Python treats it like a module, so you need to prefix a function name with the 'module' name, e.g.:
# d = nnet_utils.model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=1000, learning_rate=0.005, print_cost=True)
nnet_script = importlib.util.spec_from_file_location("utils function", "D:/analytics/Deep_Learning_AI/functions/lr_nnet.py")
nnet_utils = importlib.util.module_from_spec(nnet_script)
nnet_script.loader.exec_module(nnet_utils)
That was the most convenient way for me to source functions/methods from different files in Python so far. I come from an R background, where you can call the one-line function source() to bring an external script's contents into your current session.
The above answers didn't help me, and some links had expired.
lr_utils is not a pip library; it is a file in the same notebook workspace on the Coursera website.
You can click on "Open", and it'll open the explorer where you can download everything that you would want to run in another environment.
(I used this on a browser.)
This is how I solved mine: I copied the lr_utils file and pasted it into my notebook, and thereafter I downloaded the dataset by zipping the files and extracting the archive, using the following code. Note: run the code in the Coursera notebook and select only the zipped file in the directory to download.
!pip install zipfile36
import zipfile

zf = zipfile.ZipFile('datasets/train_catvnoncat_h5.zip', mode='w')
try:
    zf.write('datasets/train_catvnoncat.h5')
    zf.write('datasets/test_catvnoncat.h5')
finally:
    zf.close()
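Once the archive has been downloaded locally, it can be unpacked with the standard library (a small sketch; extracting to '.' recreates the datasets/ folder because the files were added under that path):
import zipfile

with zipfile.ZipFile('train_catvnoncat_h5.zip') as zf:
    zf.extractall('.')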

pickle.load - EOFError: Ran out of input

I have an .obj file in which, previously, I transformed an image to base64 and saved it with pickle.
The problem occurs when I try to load the .obj file with pickle, convert the base64 string back into an image, and load that with pygame.
The function that loads the image:
def mainDisplay_load(self):
    main_folder = path.dirname(__file__)
    img_main_folder = path.join(main_folder, "sd_graphics")
    # loadImg
    self.mainTerminal = pg.image.load(path.join(img_main_folder, self.main_uncode("tr.obj"))).convert_alpha()
The function that decodes the file:
def main_uncode(self, object):
    openFile = open(object, "rb")
    str = pickle.load(openFile)
    openFile.close()
    fileData = base64.b64decode(str)
    return fileData
The error I get when the code is run:
str = pickle.load(openFile)
EOFError: Ran out of input
How can I fix it?
Python version: 3.6.2
Pygame version: 1.9.3
Update 1
This is the code I used to create the .obj file:
import base64, pickle

with open("terminal.png", "rb") as imageFile:
    str = base64.b64encode(imageFile.read())
    print(str)

file_pi = open("tr.obj", "wb")
pickle.dump(str, file_pi)
file_pi.close()

file_pi2 = open("tr.obj", "rb")
str2 = pickle.load(file_pi2)
file_pi2.close()

imgdata = base64.b64decode(str2)
filename = 'some_image.jpg'  # I assume you have a way of picking unique filenames
with open(filename, 'wb') as f:
    f.write(imgdata)
Once the file is created, it is loaded again and a second image is created, to check whether the image is the same or there were errors in the conversion.
As you can see, I used part of that code to load the image, but instead of saving it, it is loaded into pygame. And that's where the error occurs.
Update 2
I finally managed to solve it.
In the main code:
def mainDisplay_load(self):
    self.all_graphics = pg.sprite.Group()
    self.graphics_menu = pg.sprite.Group()
    # loadImg
    self.img_mainTerminal = mainGraphics(self, 0, 0, "sd_graphics/tr.obj")
In the library containing graphics classes:
import pygame as pg
import base64 as bs
import pickle as pk
from io import BytesIO as by
from lib.setting import *

class mainGraphics(pg.sprite.Sprite):
    def __init__(self, game, x, y, object):
        self.groups = game.all_graphics, game.graphics_menu
        pg.sprite.Sprite.__init__(self, self.groups)
        self.game = game
        self.object = object
        self.outputGraphics = by()
        self.x = x
        self.y = y
        self.eventType()
        self.rect = self.image.get_rect()
        self.rect.x = self.x * tilesizeDefault
        self.rect.y = self.y * tilesizeDefault

    def eventType(self):
        openFile = open(self.object, "rb")
        str = pk.load(openFile)
        openFile.close()
        self.outputGraphics.write(bs.b64decode(str))
        self.outputGraphics.seek(0)
        self.image = pg.image.load(self.outputGraphics).convert_alpha()
For the question of why I should do such a thing, it is simple:
any attacker with sufficient motivation can still get to it easily
Python is free and open.
On the one hand, we have a person who intentionally goes and modifies and recovers the hidden data; since Python is an open language, the most motivated people are able to crack a game or program and retrieve the same data even in more complicated and protected languages.
On the other hand, we have a person who knows only the basics, or not even that: a person who cannot access the files without knowing more about the language or how to decode them.
So you can understand that, from my point of view, encoding the files does not need to protect against a motivated person, because even in a more complex and protected language that person would be able to get what he wants. The protection is aimed at people who have no knowledge of the language.
So, if the error you get is indeed "EOFError: Ran out of input", that probably means you messed up your directories in the code above and are trying to read an empty file with the same name as your .obj file.
Actually, as it is, this line in your code:
self.mainTerminal = pg.image.load(path.join(img_main_folder, self.main_uncode("tr.obj"))).convert_alpha()
is completely messed up. Just read it and you can see the problem: you are passing the main_uncode method just the file name, without directory information. And then, even if that had by chance worked, as I pointed out in the comments a while ago, you would be trying to use the unserialized image data as a filename from which to read your image. (You or someone else probably thought that main_uncode should write a temporary image file and write the image data to it, so that pygame could read it, but as it is, it just returns the raw image data as a string.)
Therefore, fixing the above call to pass an actual path to main_uncode, and further modifying main_uncode to write the image data to a temporary file and return that file's path, would fix the snippets of code above.
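A minimal sketch of that fix (the tempfile approach is one assumption of how to do it; the asker's own solution above uses io.BytesIO instead, which avoids the temporary file entirely):
import base64
import pickle
import tempfile

def main_uncode(self, obj_path):
    # unpickle the base64 string and decode it back into raw image bytes
    with open(obj_path, "rb") as f:
        encoded = pickle.load(f)
    # write the bytes to a temporary file and return its path
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    tmp.write(base64.b64decode(encoded))
    tmp.close()
    return tmp.name  # a real path that pg.image.load() can open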
Second thing: I can't figure out why you need this ".obj" file at all. If it is just for "security through obscurity", hoping that people who get your bundled files can't open the images, that is far from recommended practice. To sum up just one thing: it will hinder legitimate uses of your file (you yourself do not seem to be able to use it), while any attacker with sufficient motivation can still get to it easily. By opening an image, base64-encoding it, pickling it, and then doing the reverse, you are essentially performing a no-op. Moreover, while a pickle file can serialize complex Python objects to disk, a base64 serialization of an image could be written directly to a file, with no need for pickle.
Third thing: just use with to open all files, not only the ones you read with the imaging library. Take your time to learn a little more about Python.

python netcdf: making a copy of all variables and attributes but one

I need to process a single variable in a netCDF file that actually contains many attributes and variables.
I think it is not possible to update a netcdf file (see question How to delete a variable in a Scientific.IO.NetCDF.NetCDFFile?)
My approach is the following:
1. get the variable to process from the original file
2. process the variable
3. copy all data from the original netCDF BUT the processed variable to the final file
4. copy the processed variable to the final file
My problem is coding step 3. I started with the following:
def processing(infile, variable, outfile):
    fileH = NetCDFFile(infile, mode="r")
    data = fileH.variables[variable][:]
    # do processing on data...
    # and now save the result
    outfile = NetCDFFile(outfile, mode='w')
    # build a list of variables without the processed variable
    listOfVariables = list(itertools.ifilter(lambda x: x != variable, fileH.variables.keys()))
    for ivar in listOfVariables:
        # here I need to write each variable and each attribute
How can I save all the data and attributes in a handful of lines without having to rebuild the whole data structure?
Here's what I just used and it worked: @arne's answer, updated for Python 3 and extended to copy variable attributes:
import netCDF4 as nc

toexclude = ['ExcludeVar1', 'ExcludeVar2']

with nc.Dataset("in.nc") as src, nc.Dataset("out.nc", "w") as dst:
    # copy global attributes all at once via dictionary
    dst.setncatts(src.__dict__)
    # copy dimensions
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst[name][:] = src[name][:]
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)
If you just want to copy the file while picking out variables, nccopy is a great tool, as submitted by @rewfuss.
Here's a Pythonic (and more flexible) solution with python-netcdf4. This allows you to open it for processing and other calculations before writing to file.
import netCDF4

with netCDF4.Dataset(file1) as src, netCDF4.Dataset(file2) as dst:
    # note: iteritems() is Python 2; see the Python 3 update in the answer above
    for name, dimension in src.dimensions.iteritems():
        dst.createDimension(name, len(dimension) if not dimension.isunlimited() else None)
    for name, variable in src.variables.iteritems():
        # take out the variable you don't want
        if name == 'some_variable':
            continue
        x = dst.createVariable(name, variable.datatype, variable.dimensions)
        dst.variables[name][:] = src.variables[name][:]
This does not take into account variable attributes, such as fill_values. You can do that easily by following the documentation.
Do be careful: netCDF4 files, once written/created this way, cannot be undone. The moment you modify the variable, it is written to file at the end of the with statement, or when you call .close() on the Dataset.
Of course, if you wish to process the variables before writing them, you have to be careful about which dimensions to create. In a new file, never write to variables without creating them, and never create variables without having defined their dimensions, as noted in python-netcdf4's documentation.
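For example, variable attributes can be copied in one line right after createVariable() (a sketch, mirroring the dictionary trick used elsewhere in this thread):
x.setncatts({k: variable.getncattr(k) for k in variable.ncattrs()})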
This answer builds on the one from Xavier Ho (https://stackoverflow.com/a/32002401/7666), but with the fixes I needed to complete it:
import netCDF4 as nc
import numpy as np

toexclude = ["TO_REMOVE"]

with nc.Dataset("orig.nc") as src, nc.Dataset("filtered.nc", "w") as dst:
    # copy global attributes
    for name in src.ncattrs():
        dst.setncattr(name, src.getncattr(name))
    # copy dimensions (note: isunlimited is a method, so it needs parentheses)
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst.variables[name][:] = src.variables[name][:]
The nccopy utility in C netCDF versions 4.3.0 and later includes an option to list which variables are to be copied (along with their attributes). Unfortunately, it doesn't include an option for which variables to exclude, which is what you need.
However, if the list of (comma-delimited) variables to be included doesn't cause the nccopy command-line to exceed system limits, this would work. There are two variants for this option:
nccopy -v var1,var2,...,varn input.nc output.nc
nccopy -V var1,var2,...,varn input.nc output.nc
The first (-v) includes all the variable definitions, but only data for the named variables.
The second (-V) doesn't include definitions or data for unnamed variables.
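For example, to copy everything except a variable named temp from a file whose other variables are lat, lon, time and pressure (hypothetical names), you would list the variables to keep:
nccopy -V lat,lon,time,pressure input.nc output.nc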
I know this is an old question, but as an alternative, you can use the library netcdf and shutil:
import shutil
from netcdf import netcdf as nc

def processing(infile, variable, outfile):
    shutil.copyfile(infile, outfile)
    with nc.loader(infile) as in_root, nc.loader(outfile) as out_root:
        data = nc.getvar(in_root, variable)
        # do your processing with data and save it as in-memory "values"...
        values = data[:] * 3
        new_var = nc.getvar(out_root, variable, source=data)
        new_var[:] = values
All of the recipes so far (except for the one from @rewfuss, which works fine but is not exactly Pythonic) produce a plain NetCDF3 file, which can be punishing for highly compressed NetCDF4 datasets. Here is an attempt to cope with the issue.
import netCDF4

infname = "Inf.nc"
outfname = "outf.nc"
skiplist = "var1 var2".split()

with netCDF4.Dataset(infname) as src:
    with netCDF4.Dataset(outfname, "w", format=src.file_format) as dst:
        # copy global attributes all at once via dictionary
        dst.setncatts(src.__dict__)
        # copy dimensions
        for name, dimension in src.dimensions.items():
            dst.createDimension(
                name, (len(dimension) if not dimension.isunlimited() else None))
        # copy all file data except for the excluded
        for name, variable in src.variables.items():
            if name in skiplist:
                continue
            createattrs = variable.filters()
            if createattrs is None:
                createattrs = {}
            else:
                chunksizes = variable.chunking()
                print(createattrs)
                if chunksizes == "contiguous":
                    createattrs["contiguous"] = True
                else:
                    createattrs["chunksizes"] = chunksizes
            x = dst.createVariable(name, variable.datatype, variable.dimensions, **createattrs)
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)
            dst[name][:] = src[name][:]
This seems to work fine and stores the variables the way they are in the original file, except that it does not copy some variable attributes that start with an underscore and are not known to the netCDF library.
