My goal is to extract a large subimage from an even larger uncompressed image (30000, 65536), without reading the whole image into memory, then save the subimage in a compressed format. At the moment I only care about how well the compression works, as an indicator of image complexity; I don't need to save the image in a viewable format, but I would love to. This is my first Python script and I am getting stuck on the size limits of some function calls.
I get two related errors based on two alternate attempts (with boring lines removed):
Version 1:
fd = open(fname, 'rb')
h5file = h5py.File(h5fname, "w")
data = h5file.create_dataset("data", (Height, Width), dtype='i', maxshape=(None, None))  # try i8?
data = fromfile(file=fd, dtype='h', count=Height*Width)  # FAIL
fd.close()
h5file.close()
outfilez = gzip.open(outfilename, 'wb')
outfilez.write(data)
outfilez.close()
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:...\sitecustomize.py", line 523, in runfile
execfile(filename, namespace)
File "C:...\Script_v3.py", line 183, in <module>
data = fromfile(file=fd, dtype=datatype, count=BandHeight_tracks*Width)
ValueError: array is too big.
Version 2 (for loop to reduce fromfile usage):
fd = open(fname, 'rb')
h5file = h5py.File(h5fname, "w")
data = h5file.create_dataset("data", (Height, Width), dtype='i', maxshape=(None, None))  # try i8?
for i in range(0, Height-1):
    data[i:] = fromfile(file=fd, dtype='h', count=Width)
fd.close()
h5file.close()
outfilez = gzip.open(outfilename, 'wb')
outfilez.write(data)
outfilez.close()
Error (I do not get this with the other version):
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:...\sitecustomize.py", line 523, in runfile
execfile(filename, namespace)
File "C:...\Script_v4.py", line 195, in <module>
outfilez.write(data)
File "C:...\gzip.py", line 235, in write
self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
TypeError: must be string or read-only buffer, not Dataset
I am running the code using Spyder on a 64-bit Win7 machine with 16 GB RAM. The images are at most 4 GB.
You should look at whether NumPy arrays suit your needs:
NumPy is a general-purpose array-processing package designed to efficiently manipulate large multi-dimensional arrays of arbitrary records without sacrificing too much speed for small multi-dimensional arrays. NumPy is built on the Numeric code base and adds features introduced by numarray as well as an extended C-API and the ability to create arrays of arbitrary type which also makes NumPy suitable for interfacing with general-purpose data-base applications.
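For the use case in the question, a memory-mapped array is the natural fit: it lets you slice the subimage straight out of the file without reading the rest. This is only a minimal sketch, assuming the file holds raw int16 pixels with no header; fname and the slice bounds are placeholders:
import numpy as np

# Map the raw file into virtual memory; only the pages the slice
# actually touches are read from disk.
img = np.memmap(fname, dtype=np.int16, mode='r', shape=(30000, 65536))
sub = np.array(img[1000:2000, 5000:9000])  # materialize just the subimage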
Solution, as I eventually found out:
Error 1:
Ignore it. If I need to operate on this much memory, switch to C++.
Error 2:
The type was somehow not set. From the terminal we see:
>>> data[0]
array([2, 2, 0, ..., 3, 4, 2])
It should be:
>>> data[0]
array([2, 2, 0, ..., 3, 4, 2], dtype=int16)
The fix is that I added the following line after the for loop:
data = int16(data)
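In hindsight, a cleaner route (my own sketch, not what the script above does) is to let HDF5 do the compression itself: write the image into a chunked, gzip-compressed dataset band by band, so nothing close to 4 GB is ever held in memory, and then use the resulting file size as the complexity indicator. Height, Width, fname and h5fname are the same placeholders as above:
import numpy as np
import h5py

band = 256  # rows per read; tune freely
with open(fname, 'rb') as fd, h5py.File(h5fname, 'w') as h5file:
    # h5py compresses each chunk with gzip as it is written
    data = h5file.create_dataset("data", (Height, Width), dtype='int16',
                                 chunks=(band, Width), compression="gzip")
    for row in range(0, Height, band):
        n = min(band, Height - row)  # last band may be short
        buf = np.fromfile(fd, dtype=np.int16, count=n * Width)
        data[row:row + n, :] = buf.reshape(n, Width)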
I am using the Python library at this link (https://github.com/BluCodeGH/bedrock), which allows you to place blocks in any Minecraft world with Python code. I used the example code they provide, only changing the save path to point at my world.
My code:
import bedrock

path_to_save = "C:\\Users\\Harris\\AppData\\Local\\Packages\\Microsoft.MinecraftUWP_8wekyb3d8bbwe\\LocalState\\games\\com.mojang\\minecraftWorlds\\Z1I3YuYlAgA=\\"

with bedrock.World(path_to_save) as world:
    block = bedrock.Block("minecraft:stone")  # Name must include `minecraft:`
    world.setBlock(0, 0, 0, block)
    block = bedrock.Block("minecraft:stone", 2)  # Data value
    world.setBlock(0, 1, 0, block)
    if world.getBlock(0, 2, 0).name == "minecraft:stone":
        print("More stone!")
    # Autosave on close.
print("Done!")
Using this code gives this error message:
Traceback (most recent call last):
File "C:\Users\Harris\Documents\minecraftpythontest.py", line 7, in <module>
world.setBlock(0, 0, 0, block)
File "C:\Users\Harris\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\bedrock\bedrock.py", line 47, in setBlock
chunk = self.getChunk(cx, cz, dimension)
File "C:\Users\Harris\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\bedrock\bedrock.py", line 30, in getChunk
chunk = Chunk(self.db, x, z, dimension)
File "C:\Users\Harris\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\bedrock\bedrock.py", line 98, in __init__
self.version = self._loadVersion(db)
File "C:\Users\Harris\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\bedrock\bedrock.py", line 125, in _loadVersion
raise NotImplementedError("Unexpected chunk version {} at chunk {} {} (Dim {}).".format(version, self.x, self.z, self.dimension))
NotImplementedError: Unexpected chunk version 40 at chunk 0 0 (Dim 0).
I have tried looking through the files that ship with the library and are mentioned in the error message, but I didn't see anything that looked like it would cause this error.
I would greatly appreciate it if anyone could help work out what the problem is, or could suggest a different Python library that can do similar things.
I have the following problem. I am following this example about spatial regression in Python:
import numpy
import libpysal
import spreg
import pickle
# Read spatial data
ww = libpysal.io.open(libpysal.examples.get_path("baltim_q.gal"))
w = ww.read()
ww.close()
w_name = "baltim_q.gal"
w.transform = "r"
The example above works. But I would like to read my own spatial weights matrix, which I currently have as a list of lists. Here is my approach:
ww = libpysal.io.open(matrix)
But I got this error message:
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/libpysal/io/fileio.py", line 90, in __new__
cls.__registry[cls.getType(dataPath, mode, dataFormat)][mode][0]
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/libpysal/io/fileio.py", line 105, in getType
ext = os.path.splitext(dataPath)[1]
File "/usr/lib/python3.8/posixpath.py", line 118, in splitext
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not list
This is what the matrix looks like:
[[0, 2, 1], [2, 0, 4], [1, 4, 0]]
EDIT:
If I try to pass my matrix to GM_Lag like this:
model = spreg.GM_Lag(
    y,
    X,
    w=matrix,
)
I get the following error:
warn("w must be API-compatible pysal weights object")
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 2, in <module>
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/spreg/twosls_sp.py", line 469, in __init__
USER.check_weights(w, y, w_required=True)
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/spreg/user_output.py", line 444, in check_weights
if w.n != y.shape[0] and time == False:
AttributeError: 'list' object has no attribute 'n'
EDIT 2:
This is how I read the list of lists:
import pickle

with open("weighted_matrix.pkl", "rb") as f:
    matrix = pickle.load(f)
How can I pass a list of lists into spreg.GM_Lag? Thanks.
Why do you want to pass it to the libpysal.io.open method? If I understand this code correctly, you first open a file and then read it (and the read method seems to return a list). So in your case, where you already have the matrix, you don't need to open or read any file.
What is needed, though, is to know what w is supposed to look like after w = ww.read(). If it is a simple matrix, then you can just set w = matrix. If the read method also formats the data in a certain way, you'll need to replicate that. If you could describe the expected behavior of the read method (e.g. what the input file contains and what is returned), it would be useful.
As mentioned, since the data is formatted into a libpysal.weights object, you must build one yourself. This can supposedly be done with the libpysal.weights.W class. (I read the doc too fast.)
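For instance, here is a minimal sketch, assuming your libpysal version provides the full2W utility (which builds a W from a dense array; if it doesn't, the W class can be fed neighbor and weight dicts directly):
import numpy as np
from libpysal.weights import full2W  # dense matrix -> W object (assumption: available in your version)

matrix = [[0, 2, 1], [2, 0, 4], [1, 4, 0]]
w = full2W(np.asarray(matrix, dtype=float))  # neighbors are taken from the nonzero entries
w.transform = "r"  # row-standardize, as in the example above

# w now exposes the attributes spreg checks for (w.n, etc.), so it can be
# passed as spreg.GM_Lag(y, X, w=w).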
I am trying to implement my own version of the MATLAB function imhmin() in Python using OpenCV and (naturally) NumPy. If you are not familiar with this MATLAB function, it's extremely useful for segmentation. MATLAB's documentation can explain it much better than I can:
https://it.mathworks.com/help/images/ref/imhmin.html
Here is what I have so far:
(For the sake of keeping this short, I did not include the local_min function. It takes one image parameter and returns an image of the same size where local minima are 1s and everything else is 0.)
from volume import show
import cv2
import numpy

def main():
    arr = numpy.array([[5,5,5,5,5,5,5],
                       [5,0,3,1,4,2,5],
                       [5,5,5,5,5,5,5]]) + 1
    res = imhmin(arr, 3)
    print(res)

def imhmin(src, h):
    # TODO: speed up function by cropping image
    edm = src.copy()
    # d is the domain / all values contained in the array
    d = numpy.unique(edm)
    # for the index of each local minima (sorted gtl)
    indices = numpy.nonzero(local_min(edm))  # get indices
    indices = numpy.dstack((indices[0], indices[1]))[0].tolist()  # zip
    # sort based on the value of edm[] at that index
    indices.sort(key=lambda _: edm[_[0], _[1]], reverse=True)
    for (x, y) in indices:
        start = edm[x, y]  # remember original value of minima
        # for each in a list of heights greater than the starting height
        for i in range(*numpy.where(d == edm[x, y])[0], d.shape[0] - 1):
            # prevent exceeding target height
            step = start + h if (d[i+1] - start > h) else d[i+1]
            #-------------- WORKS UNTIL HERE --------------#
            # complete floodFill syntax:
            # cv2.floodFill(image, mask, seed, newVal[, loDiff[, upDiff[, flags]]]) → retval, rect
            # fill UPWARD onto image (and onto mask?)
            cv2.floodFill(edm, None, (y, x), step, 0, step - d[i], 4)
            # fill DOWNWARD NOT onto image
            # have you overflowed?

if __name__ == "__main__":
    main()
This works fine until it gets to the floodFill line. It barks this error back:
Traceback (most recent call last):
File "edm.py", line 94, in <module>
main()
File "edm.py", line 14, in main
res = imhmin(arr, 3)
File "edm.py", line 66, in imhmin
cv2.floodFill(edm, None, (y,x), step, 0, step-d[i], 4)
TypeError: Layout of the output array image is incompatible with cv::Mat (step[ndims-1] != elemsize or step[1] != elemsize*nchannels)
At first I thought maybe the way I laid out the parameters was wrong because of the stuff about step in the traceback, but I tried changing that variable's name and have come to the conclusion that step is some variable name in OpenCV's code. It's talking about the output array, and I'm not using a mask, so something must be wrong with the array edm.
I can suppress this error by replacing the floodFill line with this one:
cv2.floodFill(edm.astype(numpy.double), None, (y,x), step, 0, step-d[i], 4)
The difference being that I am typecasting the numpy array to a float array. Then I am left with this error:
Traceback (most recent call last):
File "edm.py", line 92, in <module>
main()
File "edm.py", line 14, in main
res = imhmin(arr, 3)
File "edm.py", line 64, in imhmin
cv2.floodFill(edm.astype(numpy.double), None, (y,x), step, 0, step-d[i], 4)
TypeError: Scalar value for argument 'newVal' is not numeric
This is where I started suspecting something was seriously wrong, because step is "obviously" going to be an integer here (maybe it isn't obvious, but I did try printing it and it looks like it's just an integer, not an array of one integer or anything weird like that).
To entertain the error message, I typecast the newVal parameter to a float. I got pretty much the exact same error message about the upDiff parameter, so I just typecast that too, resulting in this line of code:
cv2.floodFill(edm.astype(numpy.double), None, (y,x), float(step), 0, float(step-d[i]), 4)
I know this isn't how I want to be doing things, but I just wanted to see what would happen. What happened was I got this scary looking error:
Traceback (most recent call last):
File "edm.py", line 92, in <module>
main()
File "edm.py", line 14, in main
res = imhmin(arr, 3)
File "edm.py", line 64, in imhmin
cv2.floodFill(edm.astype(numpy.double), None, (y,x), float(step), 0, float(step-d[i]), 4)
cv2.error: OpenCV(3.4.2) /opt/concourse/worker/volumes/live/9523d527-1b9e-48e0-7ed0-a36adde286f0/volume/opencv-suite_1535558719691/work/modules/imgproc/src/floodfill.cpp:587: error: (-210:Unsupported format or combination of formats) in function 'floodFill'
I don't even know where to start with this. I've used OpenCV's floodFill function many times before and have never run into problems like this. Can anyone provide any insight?
Thanks in advance
Antonio
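For what it's worth, here is a guess at the underlying cause (mine, not confirmed in the thread): floodFill only works on 1- or 3-channel images that are 8-bit or 32-bit float, while numpy.array(...) built from Python ints typically defaults to int64, and .astype(numpy.double) produces the equally unsupported float64. On top of that, .astype() returns a temporary copy, so any fill applied to it is discarded. A minimal sketch of the workaround is to convert once, up front, to float32:
import cv2
import numpy

# Toy image in a floodFill-supported dtype (float32); in imhmin this would
# be `edm = src.astype(numpy.float32)` done once at the top of the function.
edm = numpy.array([[5, 5, 5],
                   [5, 0, 5],
                   [5, 5, 5]], dtype=numpy.float32) + 1
# Fill from the centre pixel; plain scalars for newVal/loDiff/upDiff are fine.
cv2.floodFill(edm, None, (1, 1), 9.0, loDiff=0, upDiff=3.0, flags=4)
print(edm)  # the centre value 1 becomes 9; the 6s are outside upDiff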
What am I trying to do?
pd.read_csv(... nrows=###) can read the top nrows of a file. I'd like to do the same while using pd.read_hdf(...).
What is the problem?
I am confused by the documentation. start and stop look like what I need, but when I try them, a ValueError is raised. The second thing I tried was nrows=10, thinking it might be an allowable **kwargs entry. When I do that, no errors are thrown, but the full dataset is returned instead of just 10 rows.
Question: How does one correctly read a smaller subset of rows from an HDF file? (edit: without having to read the whole thing into memory first!)
Below is my interactive session:
>>> import pandas as pd
>>> df = pd.read_hdf('storage.h5')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
df = pd.read_hdf('storage.h5')
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 367, in read_hdf
raise ValueError('key must be provided when HDF5 file '
ValueError: key must be provided when HDF5 file contains multiple datasets.
>>> import h5py
>>> f = h5py.File('storage.h5', mode='r')
>>> list(f.keys())[0]
'table'
>>> f.close()
>>> df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 370, in read_hdf
return store.select(key, auto_close=auto_close, **kwargs)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 740, in select
return it.get_result()
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 1447, in get_result
results = self.func(self.start, self.stop, where)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 733, in func
columns=columns, **kwargs)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 2890, in read
return self.obj_type(BlockManager(blocks, axes))
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 2795, in __init__
self._verify_integrity()
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 3006, in _verify_integrity
construction_error(tot_items, block.shape[1:], self.axes)
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 4280, in construction_error
passed, implied))
ValueError: Shape of passed values is (614, 593430), indices imply (614, 10)
>>> df = pd.read_hdf('storage.h5', key='table', nrows=10)
>>> df.shape
(593430, 614)
Edit:
I just attempted to use where:
mylist = list(range(30))
df = pd.read_hdf('storage.h5', key='table', where='index=mylist')
I received a TypeError indicating a Fixed format store (the default format of df.to_hdf(...)):
TypeError: cannot pass a where specification when reading from a
Fixed format store. this store must be selected in its entirety
Does this mean I can't select a subset of rows if the format is Fixed format?
I ran into the same problem. I am pretty certain by now that https://github.com/pandas-dev/pandas/issues/11188 tracks this very problem. It is a ticket from 2015 and it contains a repro. Jeff Reback suggested that this is actually a bug, and he even pointed us towards a solution back in 2015. It's just that nobody has built that solution yet. I might give it a try.
Seems like this now works, at least with pandas 1.0.1. Just provide start and stop arguments:
df = pd.read_hdf('test.h5', '/floats/trajectories', start=0, stop=5)
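And for the Fixed-format limitation raised in the question: writing with format='table' makes both start/stop and where-clauses work. A quick sketch, assuming a reasonably recent pandas (the file and key names here are made up):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 4), columns=list("abcd"))
df.to_hdf('storage_t.h5', key='table', format='table')  # not the default 'fixed'

head = pd.read_hdf('storage_t.h5', 'table', start=0, stop=10)   # first 10 rows
rows = pd.read_hdf('storage_t.h5', 'table', where='index < 5')  # where works too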
In the example below, there is a 3D NumPy array of shape (4, 3, 3), plus a solution for calculating the pinv of each of those four 3x3 matrices in NumPy. I also tried to use the same approach that worked in NumPy in Theano, hoping it was implemented the same way, but it failed. Any idea how to do it in Theano?
import numpy as np
import theano.tensor as T

dt = np.dtype(np.float32)
a = [[[12, 3, 1],
      [2, 4, 1],
      [2, 4, 2]],
     [[12, 3, 3],
      [2, 4, 4],
      [2, 4, 5]],
     [[12, 3, 6],
      [2, 4, 5],
      [2, 4, 4]],
     [[12, 3, 3],
      [2, 4, 5],
      [2, 4, 6]]]
a = np.asarray(a, dtype=dt)
print(a.shape)
apinv = np.zeros((4, 3, 3))
print(np.linalg.pinv(a[0, :, :]).shape)

# numpy solution
apinv = map(lambda n: np.linalg.pinv(n), a)
apinv = np.asarray(apinv, dtype=dt)

# theano solution (not working)
at = T.tensor3('a')
apinvt = map(lambda n: T.nlinalg.pinv(n), at)
The error is:
Original exception was:
Traceback (most recent call last):
File "pydevd.py", line 2403, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "pydevd.py", line 1794, in run
launch(file, globals, locals) # execute the script
File "exp_thn_pinv_map.py", line 35, in <module>
apinvt = map(lambda n: T.nlinalg.pinv(n), at)
File "theano/tensor/var.py", line 549, in __iter__
raise TypeError(('TensorType does not support iteration. '
TypeError: TensorType does not support iteration. Maybe you are using builtin.sum instead of theano.tensor.sum? (Maybe .max?)
The error message is
Traceback (most recent call last):
File "D:/Dropbox/source/intro_theano/pinv.py", line 32, in <module>
apinvt = map(lambda n: T.nlinalg.pinv(n), at)
File "d:\dropbox\source\theano\theano\tensor\var.py", line 549, in __iter__
raise TypeError(('TensorType does not support iteration. '
TypeError: TensorType does not support iteration. Maybe you are using builtin.sum instead of theano.tensor.sum? (Maybe .max?)
This is occurring because, as the error message indicates, the symbolic variable at is not iterable.
The fundamental problem here is that you're incorrectly mixing immediately executed Python code with delayed execution Theano symbolic code.
You need to use a symbolic loop, not a Python loop. The correct solution is to use Theano's scan operator:
import theano

at = T.tensor3('a')
# scan applies the body to each slice along the leading axis of `at`
apinvt, _ = theano.scan(lambda n: T.nlinalg.pinv(n), at, strict=True)
f = theano.function([at], apinvt)
print(np.allclose(f(a), apinv))