Converting files to digital image failing with "tile cannot extend outside image" - python

I am trying to recreate some of the work from the blog posting http://sarvamblog.blogspot.com/2013/04/clustering-malware-corpus.html
import itertools
import glob
import numpy,scipy, os, array
from scipy.misc import imsave
for filename in list(glob.glob('file/*.file')):
    f = open(filename,'rb');
    # just want to make sure I get the right file
    print filename
    ln = os.path.getsize(filename); # length of file in bytes
    width = 256;
    rem = ln%width;
    a = array.array("B"); # uint8 array
    a.fromfile(f,ln-rem);
    f.close();
    g = numpy.reshape(a,(len(a)/width,width));
    g = numpy.uint8(g);
    fpng = filename + ".png"
    # make sure the png process and everything else is going
    print fpng
    scipy.misc.imsave(fpng,g);
Although this runs fine on one or two files, I run into problems once I expand to dozens:
Traceback (most recent call last):
File "<stdin>", line 14, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 120, in imsave
im = toimage(arr)
File "/usr/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 183, in toimage
image = Image.fromstring('L',shape,bytedata.tostring())
File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1797, in fromstring
im.fromstring(data, decoder_name, args)
File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 590, in fromstring
d.setimage(self.im)
ValueError: tile cannot extend outside image
I assume that my issue is either (A) not closing scipy.misc.imsave or (B) not resetting the arrays. Any help would be greatly appreciated.

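Managed to figure it out by wrapping the loop body in try/except. Once I did that, I was able to determine that only certain files were failing. These files were extremely small (125 bytes). With width = 256, any file shorter than 256 bytes has rem equal to its whole length, so ln-rem is 0, the array is empty, and the reshape yields an image with zero rows. My assumption is that this is too little data for scipy to build an image from.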
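A minimal sketch of guarding against such files, written for Python 3 and NumPy only. The helper name bytes_to_rows is hypothetical, and saving the image is left out because scipy.misc.imsave is specific to the old SciPy used in the question:

```python
import numpy as np

WIDTH = 256  # bytes per image row, as in the question


def bytes_to_rows(data, width=WIDTH):
    """Return a 2-D uint8 array of full rows, or None if data is too short."""
    usable = len(data) - (len(data) % width)  # drop the trailing partial row
    if usable == 0:
        return None  # fewer than `width` bytes: the image would have zero rows
    return np.frombuffer(data[:usable], dtype=np.uint8).reshape(-1, width)


small = bytes_to_rows(b"\x00" * 125)                   # like the 125-byte files
large = bytes_to_rows(bytes(range(256)) * 2 + b"abc")  # two full rows plus 3 stray bytes
print(small)
print(large.shape)
```

In the loop from the question, a file for which this returns None could simply be skipped with continue instead of being passed on to the image writer.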

im.crop(box) ⇒ image
The box is a 4-tuple defining the left, upper, right, and lower pixel coordinate.
In my code, this error occurred when lower was smaller than upper.

spatial regression in Python - read matrix from list

I have the following problem. I am following this example of spatial regression in Python:
import numpy
import libpysal
import spreg
import pickle
# Read spatial data
ww = libpysal.io.open(libpysal.examples.get_path("baltim_q.gal"))
w = ww.read()
ww.close()
w_name = "baltim_q.gal"
w.transform = "r"
The example above works, but I would like to read my own spatial matrix, which I currently have as a list of lists. My approach:
ww = libpysal.io.open(matrix)
But I got this error message:
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/libpysal/io/fileio.py", line 90, in __new__
cls.__registry[cls.getType(dataPath, mode, dataFormat)][mode][0]
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/libpysal/io/fileio.py", line 105, in getType
ext = os.path.splitext(dataPath)[1]
File "/usr/lib/python3.8/posixpath.py", line 118, in splitext
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not list
This is what matrix looks like:
[[0, 2, 1], [2, 0, 4], [1, 4, 0]]
EDIT:
If I try to insert my matrix into the GM_Lag like this:
model = spreg.GM_Lag(
    y,
    X,
    w=matrix,
)
I got the following error:
warn("w must be API-compatible pysal weights object")
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 2, in <module>
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/spreg/twosls_sp.py", line 469, in __init__
USER.check_weights(w, y, w_required=True)
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/spreg/user_output.py", line 444, in check_weights
if w.n != y.shape[0] and time == False:
AttributeError: 'list' object has no attribute 'n'
EDIT 2:
This is how I read the list of lists:
import pickle
with open("weighted_matrix.pkl", "rb") as f:
    matrix = pickle.load(f)
How can I pass a list of lists to spreg.GM_Lag? Thanks
Why do you want to pass it to the libpysal.io.open method? If I understand this code correctly, you first open a file and then read it (and the read method seems to return a list). In your case, where you already have the matrix, you need neither to open nor to read any file.
What matters, though, is what w is supposed to look like after w = ww.read(). If it is a simple matrix, you can set w = matrix. If the read method also formats the data in a certain way, you'll need to do it differently. It would help if you could describe the expected behavior of the read method (e.g. what the input file contains and what is returned).
As mentioned, since the data is formatted into a libpysal.weights object, you must build one yourself. This can supposedly be done with libpysal.weights.W. (I read the docs too fast.)
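A rough sketch of that conversion step, in plain Python so it runs without libpysal. The helper name dense_to_neighbors is hypothetical, and treating every nonzero off-diagonal entry as a neighbor link is an assumption about what the matrix means:

```python
def dense_to_neighbors(matrix):
    """Split a square list-of-lists into neighbor and weight dictionaries."""
    neighbors, weights = {}, {}
    for i, row in enumerate(matrix):
        # every nonzero off-diagonal entry is treated as a neighbor link
        links = [(j, value) for j, value in enumerate(row) if j != i and value != 0]
        neighbors[i] = [j for j, _ in links]
        weights[i] = [value for _, value in links]
    return neighbors, weights


matrix = [[0, 2, 1], [2, 0, 4], [1, 4, 0]]
neighbors, weights = dense_to_neighbors(matrix)
print(neighbors)  # {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(weights)    # {0: [2, 1], 1: [2, 4], 2: [1, 4]}
```

With libpysal installed, these two dictionaries should be accepted as libpysal.weights.W(neighbors, weights), and setting w.transform = "r" on the result would row-standardize it as in the original example. Check this against the libpysal documentation for your version before relying on it.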

problem related to h5py and create_dataset

Maybe the question is dumb, but so far I have not been able to find a solution.
I was handed code from another person who was probably working with a different setup than mine (e.g. Python 2 instead of 3).
I have made some small changes to get things working, but I am stuck on a probably simple problem related to h5py.
The part of the code where it crashes looks like this:
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')
base.create_dataset('Labels', data=labels_ALL)
base.create_dataset('Units', data=units_ALL)
The problem seems to be in base.create_dataset:
Traceback (most recent call last):
File "C:\Users\DaniJ\Documents\PostDoc_Jena\Trips, Conf, etc\Sinfonia Workshop\Exercise_1\exercise_1_SINFONIA_for_One\NR_chem_SINGLE_NoEu.py", line 252, in <module>
base.create_dataset('Labels', data=labels_ALL)
File "C:\Users\DaniJ\anaconda3\lib\site-packages\h5py\_hl\group.py", line 136, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "C:\Users\DaniJ\anaconda3\lib\site-packages\h5py\_hl\dataset.py", line 118, in make_new_dset
tid = h5t.py_create(dtype, logical=1)
File "h5py\h5t.pyx", line 1634, in h5py.h5t.py_create
File "h5py\h5t.pyx", line 1656, in h5py.h5t.py_create
File "h5py\h5t.pyx", line 1717, in h5py.h5t.py_create
TypeError: No conversion path for dtype: dtype('<U10')
The variable base seems to be an h5py._hl.files.File object.
Does somebody know how I can solve this problem?
Thanks
Best regards,
Dani
Did you solve your problem? I'm 99.9% sure it's related to your Labels data: most likely it is a NumPy array instead of a list. I wrote 3 short examples to demonstrate the difference.
The first code segment uses a list and successfully creates the datasets in file SO_69900543_1.h5.
The second code segment reproduces your error. It converts the list to a NumPy array, then fails when attempting to create the datasets in file SO_69900543_2.h5. Notice that it gives the same error message you encountered: TypeError: No conversion path for dtype: dtype('<U10').
The third code segment shows how to convert numpy.str_ elements to str (which solves the problem in segment #2). Note that each Labels value is converted with str() before it is added to labels_ALL.
Maybe this will help you find (and fix) your problem with Unicode data.
Code segment 1 (works):
# imports used by all three segments
import h5py
import numpy as np
Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')
with h5py.File('SO_69900543_1.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
Code segment 2 (returns TypeError):
Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
# Convert Labels List to NumPy array
# This will trigger the error when creating the dataset
Labels = np.array(Labels)
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')
for i in range(len(labels_ALL)):
    print(i, type(labels_ALL[i]), type(units_ALL[i]))
with h5py.File('SO_69900543_2.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
Code segment 3 (works):
Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
# Convert Labels List to NumPy array
# This will trigger the error when creating the dataset if not modified
Labels = np.array(Labels)
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    # use str() to convert from 'numpy.str_' to 'str'
    labels_ALL.append(str(Labels[i]))
    units_ALL.append('(mol/L)')
for i in range(len(labels_ALL)):
    print(i, type(labels_ALL[i]), type(units_ALL[i]))
with h5py.File('SO_69900543_2.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
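One detail that makes this kind of mix-up hard to spot: numpy.str_ is a subclass of str, so isinstance(x, str) is True for both kinds of element, and only an exact type check tells them apart. A small sketch of that check, using NumPy only (the helper name to_plain_str is hypothetical, and h5py is deliberately not called here):

```python
import numpy as np


def to_plain_str(items):
    """Return a new list in which every element is a built-in str."""
    return [str(item) for item in items]


# A mixed list like labels_ALL in segment 2: plain str plus numpy.str_ elements
labels = ["ionic_str", "psi0"] + list(np.array(["H+", "Na+"]))

print([isinstance(x, str) for x in labels])  # isinstance cannot tell them apart
print([type(x) is str for x in labels])      # an exact type check can

clean = to_plain_str(labels)
print([type(x) is str for x in clean])
```

The cleaned list contains only built-in str objects, which is the form segment 3 passes to create_dataset.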

TypeError: '(slice(0, 15, None), 15)' is an invalid key

I have Python code that looks something like the code pasted below. For context, all of the CSV files print as [15 rows x 16 columns]; I just changed the file names for privacy purposes.
import numpy as np
import pandas as pd
C = pd.read_csv('/Users/name/Desktop/filename1.csv')
Chome = pd.read_csv('/Users/name/Desktop/filename2.csv')
Cwork = pd.read_csv('/Users/name/Desktop/filename3.csv')
Cschool = pd.read_csv('/Users/name/Desktop/filename4.csv')
Cother = pd.read_csv('/Users/name/Desktop/filename5.csv')
Cf = np.zeros([17,17])
Cf = C
Cf[0:15,16] = C[0:15,15]
Cf[16,0:15] = C[15,0:15]
Cf[16,16] = C[15,15]
print(Cf)
When I run the code I get the following error:
runfile('/Users/name/.spyder-py3/untitled12.py', wdir='/Users/name/.spyder-py3')
Traceback (most recent call last):
File "/Users/name/.spyder-py3/untitled12.py", line 23, in <module>
Cf[0:15,16] = C[0:15,15]
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 116, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(slice(0, 15, None), 15)' is an invalid key
I am not exactly sure what this error means. I am pretty new to python, so debugging is a skill I am trying to better understand. So any advice on what I can do to fix this error, or what it means would be helpful. Thank you.
Note the following sequence in your code sample:
C = pd.read_csv(...)
... # Other cases of pd.read_csv
Cf = np.zeros([17,17])
So, at least up to this point, C is a DataFrame and Cf is a NumPy array.
Then Cf = C is probably a logical error, since it overwrites the NumPy array (full of zeroes) with another reference to C.
As for the offending instruction (Cf[0:15,16] = C[0:15,15]):
Note that C[0:15,15] is wrong (run this code on its own to see it).
For pandas DataFrames you can use positional addressing, including slices, through iloc.
The [rows, columns] notation itself, on the other hand, is allowed for NumPy arrays.
So, assuming that Cf = C is not needed and Cf should remain a NumPy array, you should probably correct this instruction to:
Cf[0:15,16] = C.iloc[0:15,15]
Make analogous corrections to the remaining instructions in your code.
Edit
Another option is to refer to the NumPy array underlying the C DataFrame, using the values attribute.
In this case you can use NumPy-style addressing, e.g.:
C.values[0:15,15]
causes no error.
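Put together, the corrected assignments might look like the sketch below. The 16x16 DataFrame is a hypothetical stand-in for the CSV data; the question reports 15 rows, but its code indexes row 15, so at least 16 rows are assumed here:

```python
import numpy as np
import pandas as pd

# Hypothetical 16x16 stand-in for the CSV data in the question
C = pd.DataFrame(np.arange(256, dtype=float).reshape(16, 16))

Cf = np.zeros([17, 17])            # stays a NumPy array: no `Cf = C`
Cf[0:15, 16] = C.iloc[0:15, 15]    # positional slice via iloc
Cf[16, 0:15] = C.iloc[15, 0:15]
Cf[16, 16] = C.iloc[15, 15]

# The same column slice read through the underlying NumPy array
same = C.to_numpy()[0:15, 15]
print(np.array_equal(Cf[0:15, 16], same))
```

Everything else in Cf keeps the zeros from np.zeros, because Cf is no longer rebound to C.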

Python-OpenCV floodfill function; strange type errors

I am trying to implement my own version of the MATLAB function imhmin() in Python using OpenCV and (naturally) NumPy. If you are not familiar with this MATLAB function, it's extremely useful for segmentation. MATLAB's documentation can explain it much better than I can:
https://it.mathworks.com/help/images/ref/imhmin.html
Here is what I have so far:
(For the sake of keeping this short, I did not include the local_min function. It takes one image parameter and returns an image of the same size where local minima are 1s and everything else is 0.)
from volume import show
import cv2
import numpy

def main():
    arr = numpy.array( [[5,5,5,5,5,5,5],
                        [5,0,3,1,4,2,5],
                        [5,5,5,5,5,5,5]] ) + 1
    res = imhmin(arr, 3)
    print(res)

def imhmin(src, h):
    # TODO: speed up function by cropping image
    edm = src.copy()
    # d is the domain / all values contained in the array
    d = numpy.unique(edm)
    # for the index of each local minima (sorted gtl)
    indices = numpy.nonzero(local_min(edm)) # get indices
    indices = numpy.dstack((indices[0], indices[1]))[0].tolist() # zip
    # sort based on the value of edm[] at that index
    indices.sort(key = lambda _: edm[_[0],_[1]], reverse = True)
    for (x,y) in indices:
        start = edm[x,y] # remember original value of minima
        # for each in a list of heights greater than the starting height
        for i in range(*numpy.where(d==edm[x,y])[0], d.shape[0]-1):
            # prevent exceeding target height
            step = start + h if (d[i+1] - start > h) else d[i+1]
            #-------------- WORKS UNTIL HERE --------------#
            # complete floodFill syntax:
            # cv2.floodFill(image, mask, seed, newVal[, loDiff[, upDiff[, flags]]]) → retval, rect
            # fill UPWARD onto image (and onto mask?)
            cv2.floodFill(edm, None, (y,x), step, 0, step-d[i], 4)
            # fill DOWNWARD NOT onto image
            # have you overflowed?

if __name__ == "__main__":
    main()
This works fine until it gets to the floodFill line, which barks this error back:
Traceback (most recent call last):
File "edm.py", line 94, in <module>
main()
File "edm.py", line 14, in main
res = imhmin(arr, 3)
File "edm.py", line 66, in imhmin
cv2.floodFill(edm, None, (y,x), step, 0, step-d[i], 4)
TypeError: Layout of the output array image is incompatible with cv::Mat (step[ndims-1] != elemsize or step[1] != elemsize*nchannels)
At first I thought maybe the way I laid out the parameters was wrong because of the stuff about step in the traceback, but I tried changing that variable's name and have come to the conclusion that step is some variable name in OpenCV's code. It's talking about the output array, and I'm not using a mask, so something must be wrong with the array edm.
I can suppress this error by replacing the floodfill line with this one:
cv2.floodFill(edm.astype(numpy.double), None, (y,x), step, 0, step-d[i], 4)
The difference being that I am typecasting the numpy array to a float array. Then I am left with this error:
Traceback (most recent call last):
File "edm.py", line 92, in <module>
main()
File "edm.py", line 14, in main
res = imhmin(arr, 3)
File "edm.py", line 64, in imhmin
cv2.floodFill(edm.astype(numpy.double), None, (y,x), step, 0, step-d[i], 4)
TypeError: Scalar value for argument 'newVal' is not numeric
This is where I started suspecting something was seriously wrong, because step is "obviously" going to be an integer here (maybe it isn't obvious, but I did try printing it and it looks like it's just an integer, not an array of one integer or anything weird like that).
To entertain the error message, I typecast the newVal parameter to a float. I got pretty much the exact same error message about the upDiff parameter, so I just typecast that too, resulting in this line of code:
cv2.floodFill(edm.astype(numpy.double), None, (y,x), float(step), 0, float(step-d[i]), 4)
I know this isn't how I want to be doing things, but I just wanted to see what would happen. What happened was I got this scary looking error:
Traceback (most recent call last):
File "edm.py", line 92, in <module>
main()
File "edm.py", line 14, in main
res = imhmin(arr, 3)
File "edm.py", line 64, in imhmin
cv2.floodFill(edm.astype(numpy.double), None, (y,x), float(step), 0, float(step-d[i]), 4)
cv2.error: OpenCV(3.4.2) /opt/concourse/worker/volumes/live/9523d527-1b9e-48e0-7ed0-a36adde286f0/volume/opencv-suite_1535558719691/work/modules/imgproc/src/floodfill.cpp:587: error: (-210:Unsupported format or combination of formats) in function 'floodFill'
I don't even know where to start with this. I've used OpenCV's floodfill function many times before and have never run into problems like this. Can anyone provide any insight?
Thanks in advance
Antonio

Python Binary files.

Hi, I am having an issue using struct.unpack in Python:
fileID= open('B1b1_t100000.beam','r');
npart = 1E6;
ncoord = 7;
coords = np.reshape(struct.unpack('d'*int(ncoord*npart),fileID.read()),(npart,ncoord));
fileID.close()
And I am getting the error
Traceback (most recent call last):
File "transfer_lev_B1.py", line 30, in <module>
coords = np.reshape(struct.unpack('d'*int(ncoord*npart),fileID.read()),(npart,ncoord));
struct.error: unpack requires a string argument of length 56000000
I can't really see where the problem is. The file size is 56000000 bytes. In a previous attempt with npart=1E4 the code worked for a different file with the same format (fewer total lines), but I hit the problem when I go to a larger file with more lines.
OK, I solved my problem:
import struct
import numpy as np
import matplotlib.pyplot as plt
if __name__ == '__main__':
    npart = int(1E6)  # must be an integer for slicing and reshaping
    ncoord = 7
    # np.fromfile opens the file itself, so no separate open() is needed
    coords = np.fromfile('B1b1_t100000.beam', dtype=np.float64)
    coords = coords[:(npart*ncoord)]
    coords = np.reshape(coords, (npart, ncoord))
    # Beam 1
    b1_x = coords[:,0]
    b1_y = coords[:,2]
    b1_z = coords[:,4]
    b1_px = coords[:,1]
    b1_py = coords[:,3]
    b1_deltap = coords[:,5]
    beam1 = np.array([b1_x,b1_px,b1_y,b1_py,b1_z,b1_deltap,coords[:,6]], np.float64)
    beam1 = beam1.T
    # Map applied and new coordinates calculated.
    # foc and defoc are not defined in this snippet; they must be set beforehand.
    x_mod = np.sqrt(foc)*coords[:,0]
    y_mod = np.sqrt(foc)*coords[:,2]
    px_mod = np.sqrt(defoc)*coords[:,1]
    py_mod = np.sqrt(defoc)*coords[:,3]
    beam1_mod = np.array([x_mod,px_mod,y_mod,py_mod,b1_z,b1_deltap,coords[:,6]], np.float64)
    beam1_mod = beam1_mod.T
    #---------------Check shape of matrix----------------
    # print coords.shape
    # print (beam1_mod).shape
    # print beam1.shape
    # print 'beam1= \n', beam1
    # print 'modified \n', beam1_mod
    #----------------------------------------------------
    # New coordinates written to binary file.
    fileMod = open("B1b1_t100000_mod.beam", "wb")
    beam1_mod.tofile(fileMod)
    fileMod.close()
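One plausible cause of the original struct.error, not confirmed in the thread, is that the file was opened in text mode ('r') rather than binary mode ('rb'), which on some platforms can return fewer bytes than the file holds. np.fromfile sidesteps this because it reads the raw bytes directly. A minimal Python 3 sketch of the same reading step, with a hypothetical helper read_beam and a small temporary file standing in for the real beam file:

```python
import os
import tempfile

import numpy as np


def read_beam(path, npart, ncoord=7):
    """Read npart*ncoord float64 values and return them as (npart, ncoord)."""
    count = int(npart) * int(ncoord)
    data = np.fromfile(path, dtype=np.float64, count=count)
    if data.size != count:
        raise ValueError("expected %d values, found %d" % (count, data.size))
    return data.reshape(int(npart), int(ncoord))


# Round trip on a small hypothetical file instead of the real beam file
original = np.arange(21, dtype=np.float64).reshape(3, 7)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.beam")
    original.tofile(path)          # raw float64 values, no header
    coords = read_beam(path, npart=3)

print(coords.shape)
```

Converting npart to int inside the helper means a value such as 1E6 can still be passed in without breaking the reshape.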
