Maybe the question is dumb, but so far I have not been able to find a solution.
I have been handed code from another person, who was probably working with a different setup than mine (e.g. Python 2 instead of 3, etc.).
So I have made some small changes to get things working, but I am stuck on a probably simple problem related to h5py.
The part of the code where it crashes looks like:
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')

base.create_dataset('Labels', data=labels_ALL)
base.create_dataset('Units', data=units_ALL)
The problem seems to be in base.create_dataset:
Traceback (most recent call last):
File "C:\Users\DaniJ\Documents\PostDoc_Jena\Trips, Conf, etc\Sinfonia Workshop\Exercise_1\exercise_1_SINFONIA_for_One\NR_chem_SINGLE_NoEu.py", line 252, in <module>
base.create_dataset('Labels', data=labels_ALL)
File "C:\Users\DaniJ\anaconda3\lib\site-packages\h5py\_hl\group.py", line 136, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "C:\Users\DaniJ\anaconda3\lib\site-packages\h5py\_hl\dataset.py", line 118, in make_new_dset
tid = h5t.py_create(dtype, logical=1)
File "h5py\h5t.pyx", line 1634, in h5py.h5t.py_create
File "h5py\h5t.pyx", line 1656, in h5py.h5t.py_create
File "h5py\h5t.pyx", line 1717, in h5py.h5t.py_create
TypeError: No conversion path for dtype: dtype('<U10')
The variable base seems to be an h5py._hl.files.File object.
Does somebody know how I can solve this problem?
Thanks
Best regards,
Dani
Did you solve your problem? I'm 99.9% sure it's related to your Labels data -- likely it's in a NumPy array instead of a List. I wrote 3 short examples to demonstrate the difference.
The first code segment uses a List and successfully creates the
datasets in file SO_69900543_1.h5.
The second code segment reproduces your error. It converts the List to a NumPy array, then fails when attempting to create the datasets in file SO_69900543_2.h5. Notice that it gives the same error message you encountered: TypeError: No conversion path for dtype: dtype('<U10').
The third code segment shows how to modify numpy.str_ elements to str (this solves the problem in segment #2). Note that each Labels value is converted with str() before it is added to labels_ALL.
Maybe this will help you find (and fix) your problem with Unicode data.
Code segment 1 (works):
import h5py

Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']

for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')

with h5py.File('SO_69900543_1.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
Code segment 2 (returns TypeError):
import h5py
import numpy as np

Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
# Convert Labels List to NumPy array
# This will trigger the error when creating the dataset
Labels = np.array(Labels)

labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']

for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')

for i in range(len(labels_ALL)):
    print(i, type(labels_ALL[i]), type(units_ALL[i]))

with h5py.File('SO_69900543_2.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
Code segment 3 (works):
import h5py
import numpy as np

Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
# Convert Labels List to NumPy array
# This will trigger the error when creating the dataset if not modified
Labels = np.array(Labels)

labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']

for i in range(len(Labels)):
    # use str() to convert from 'numpy.str_' to 'str'
    labels_ALL.append(str(Labels[i]))
    units_ALL.append('(mol/L)')

for i in range(len(labels_ALL)):
    print(i, type(labels_ALL[i]), type(units_ALL[i]))

with h5py.File('SO_69900543_2.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
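By the way, if you'd rather keep Labels as a NumPy array, h5py can store it once the strings are given a dtype it understands. A minimal sketch (my addition, not from your code; it assumes h5py 2.10+ for h5py.string_dtype() and uses a hypothetical output filename):

import h5py
import numpy as np

Labels = np.array(['H+', 'Na+', 'Cl-'])  # small hypothetical example array

with h5py.File('SO_69900543_alt.h5', 'w') as base:
    # Option 1: encode the unicode array as fixed-width bytes ('S' dtype)
    base.create_dataset('Labels_bytes', data=Labels.astype('S'))
    # Option 2: store variable-length UTF-8 strings via h5py's string dtype
    base.create_dataset('Labels_vlen', data=Labels.astype(object),
                        dtype=h5py.string_dtype(encoding='utf-8'))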
I am trying to implement the trend-following labeling from an asset management book. I found the following code that I wanted to implement; however, I am getting an error:
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use n * obj.freq
!pip install yfinance
!pip install mplfinance
import yfinance as yf
import mplfinance as mpf
import numpy as np
import pandas as pd
# get the data from yfinance
df=yf.download('BTC-USD',start='2008-01-04',end='2021-06-3',interval='1d')
#code snippet 5.1
# Fit linear regression on close
# Return the t-statistic for a given parameter estimate.
def tValLinR(close):
    #tValue from a linear trend
    x = np.ones((close.shape[0],2))
    x[:,1] = np.arange(close.shape[0])
    ols = sm1.OLS(close, x).fit()
    return ols.tvalues[1]
#code snippet 5.2
'''
# search for the maximum absolute t-value, to identify the trend
# - molecule - index of observations we wish to label
# - close - the time series of x_t
# - span - the set of values of L (look-forward period) that the algorithm will try (window_size)
# The L that maximizes |tHat_B_1| (t-value) is chosen - this is the look-forward period
# with the most significant trend (optimization)
'''
def getBinsFromTrend(molecule, close, span):
    #Derive labels from the sign of t-value of trend line
    #output includes:
    # - t1: End time for the identified trend
    # - tVal: t-value associated with the estimated trend coefficient
    # - bin: Sign of the trend (1,0,-1)
    #The t-statistic for each tick has a different look-back window.
    # - idx: start time in look-forward window
    # - dt1: stop time in look-forward window
    # - df1: the look-forward window (window-size)
    # - iloc ?
    out = pd.DataFrame(index=molecule, columns=['t1', 'tVal', 'bin', 'windowSize'])
    hrzns = range(*span)
    windowSize = span[1] - span[0]
    maxWindow = span[1] - 1
    minWindow = span[0]
    for idx in close.index:
        idx += maxWindow
        if idx >= len(close):
            break
        df_tval = pd.Series(dtype='float64')
        iloc0 = close.index.get_loc(idx)
        if iloc0 + max(hrzns) > close.shape[0]:
            continue
        for hrzn in hrzns:
            dt1 = close.index[iloc0 - hrzn + 1]
            df1 = close.loc[dt1:idx]
            df_tval.loc[dt1] = tValLinR(df1.values)  #calculates t-statistics on period
        dt1 = df_tval.replace([-np.inf, np.inf, np.nan], 0).abs().idxmax()  #get largest t-statistic calculated over span period
        print(df_tval.index[-1])
        print(dt1)
        print(abs(df_tval.values).argmax() + minWindow)
        out.loc[idx, ['t1', 'tVal', 'bin', 'windowSize']] = df_tval.index[-1], df_tval[dt1], np.sign(df_tval[dt1]), abs(df_tval.values).argmax() + minWindow  #prevent leakage
    out['t1'] = pd.to_datetime(out['t1'])
    out['bin'] = pd.to_numeric(out['bin'], downcast='signed')
    #deal with massive t-Value outliers - they don't provide more confidence and they ruin the scatter plot
    tValueVariance = out['tVal'].values.var()
    tMax = 20
    if tValueVariance < tMax:
        tMax = tValueVariance
    out.loc[out['tVal'] > tMax, 'tVal'] = tMax  #cutoff tValues > 20
    out.loc[out['tVal'] < (-1)*tMax, 'tVal'] = (-1)*tMax  #cutoff tValues < -20
    return out.dropna(subset=['bin'])
if __name__ == '__main__':
    #snippet 5.3
    idx_range_from = 3
    idx_range_to = 10
    df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from, idx_range_to, 1])  #[3,10,1] = range(3,10) This is the issue
    tValues = df1['tVal'].values  #tVal

    doNormalize = False
    #normalise t-values to -1, 1
    if doNormalize:
        np.min(tValues)
        minusArgs = [i for i in range(0, len(tValues)) if tValues[i] < 0]
        tValues[minusArgs] = tValues[minusArgs] / (np.min(tValues)*(-1.0))
        plus_one = [i for i in range(0, len(tValues)) if tValues[i] > 0]
        tValues[plus_one] = tValues[plus_one] / np.max(tValues)
        #+(idx_range_to-idx_range_from+1)

    plt.scatter(df1.index, df0.loc[df1.index].values, c=tValues, cmap='viridis')  #df1['tVal'].values, cmap='viridis')
    plt.plot(df0.index, df0.values, color='gray')
    plt.colorbar()
    plt.show()
    plt.savefig('fig5.2.png')
    plt.clf()
    plt.df['Close']()
    plt.scatter(df1.index, df0.loc[df1.index].values, c=df1['bin'].values, cmap='vipridis')

#Test methods
ols_tvalue = tValLinR(np.array([3.0, 3.5, 4.0]))
There are at least several issues with the code that you "found" (not just the one issue that you posted).
Before I go through some of the issues, let me say the following, which I may very well be wrong about, but based on my experience, and on the way your question is worded (and the fact that you "wanted to implement" code that you "found"), it seems to me that you have minimal experience with coding and debugging.
Stackoverflow is not a place to ask others to debug your code. That said, I will try to walk you through some of the steps that I took in trying to figure out what's going on with this code, and then perhaps point you to some resources where you can learn the same skills.
Step 1:
I took the code as you posted it, and copy/pasted it into a file that I named so68871906.py; I then commented out the two lines at the top that are installing yfinance and mplfinance because I don't want to try to install them every time I run the code; rather I will install them once before running the code.
I then ran the code with the following result (similar to what you posted) ...
dino@DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 50, in getBinsFromTrend
idx += maxWindow
File "pandas/_libs/tslibs/timestamps.pyx", line 310, in pandas._libs.tslibs.timestamps._Timestamp.__add__
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
The key to successful debugging, especially in python, is to recognize that the Traceback gives you a lot of very important information. You just have to read through it very carefully. In the above case, the Traceback tells me:
The problem is with this line of code: idx += maxWindow. This line adds idx + maxWindow and reassigns the result back to idx.
The error is a TypeError which tells me there is a problem with the types of the variables. Since there are two variables on that line of code (idx and maxWindow) one may guess that one or both of those variables is the wrong type or otherwise incompatible with what the code is trying to do with the variable.
Based on the error message "Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported", and the fact that we are doing addition of idx and maxWindow, you can guess that one of the variables is of type integer or integer-array, while the other is of type Timestamp.
You can verify the types by adding print statements just before the error occurs. The code looks like this:
maxWindow = span[1]-1
minWindow = span[0]
for idx in close.index:
    print('type(idx)=', type(idx))
    print('type(maxWindow)=', type(maxWindow))
    idx += maxWindow
Now the output looks like this:
dino@DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
type(idx)= <class 'pandas._libs.tslibs.timestamps.Timestamp'>
type(maxWindow)= <class 'int'>
Traceback (most recent call last):
File "so68871906.py", line 86, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 52, in getBinsFromTrend
idx += maxWindow
File "pandas/_libs/tslibs/timestamps.pyx", line 310, in pandas._libs.tslibs.timestamps._Timestamp.__add__
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
Notice that indeed type(maxWindow) is int and type(idx) is Timestamp
The TypeError exception message further states "Instead of adding/subtracting n, use n * obj.freq" from which one may infer that n is intended to represent the integer. It seems that the error is suggesting that we multiply the integer by some frequency before adding it to the Timestamp variable. This is not entirely clear, so I Googled "pandas add integer to Timestamp" (because clearly that is what the code is trying to do). All of the top answers suggest using pandas.to_timedelta() or pandas.Timedelta().
At this point I thought to myself: it makes sense that you can't just add an integer to a Timestamp, because what are you adding? minutes? seconds? days? weeks?
However, you can add an integer number of one of these frequencies; in fact the pandas.Timedelta() constructor takes a value argument that indicates a number of days, weeks, etc.
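For example, a quick check of my own (not from your code) shows the pattern the error message is hinting at:

import pandas as pd

ts = pd.Timestamp('2021-01-04')
# ts + 3 raises the TypeError above; an explicit number of days works:
print(ts + 3 * pd.Timedelta('1 day'))  # 2021-01-07 00:00:00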
The data from yf.download() is daily (interval='1d'), which suggests that the integer should be multiplied by a pandas.Timedelta of 1 day. I cannot be certain of this, because I don't have your textbook, so I am not 100% sure what the code is trying to accomplish there, but it is a reasonable guess, so I will change idx += maxWindow to
idx += (maxWindow*pd.Timedelta('1 day'))
and see what happens:
dino@DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 51, in getBinsFromTrend
if idx >= len(close):
TypeError: '>=' not supported between instances of 'Timestamp' and 'int'
Step 2:
The code gets past the modified line of code (line 50), but now it's failing on the next line with a similar TypeError, in that comparing ('>=') an integer and a Timestamp is not supported. So next I try similarly modifying line 51 to:
if idx >= len(close)*pd.Timedelta('1 day'):
The result:
dino@DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 51, in getBinsFromTrend
if idx >= len(close)*pd.Timedelta('1 day'):
TypeError: '>=' not supported between instances of 'Timestamp' and 'Timedelta'
This doesn't work either, as you can see (can't compare Timestamp and Timedelta).
Step 3:
Looking more closely at the code, it seems the code is trying to determine if adding maxWindow to idx has moved idx past the end of the data.
Looking a couple lines higher in the code, you can see that the variable idx comes from the list of Timestamp objects in close.index, so perhaps the correct comparison would be:
if idx >= close.index[-1]:
that is, comparing idx to the last possible idx value. The result:
dino@DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 60, in getBinsFromTrend
df_tval.loc[dt1] = tValLinR(df1.values) #calculates t-statistics on period
File "so68871906.py", line 18, in tValLinR
ols = sm1.OLS(close, x).fit()
NameError: name 'sm1' is not defined
Step 4:
Wow! Great, we got past the errors on lines 50 and 51. But now we have an error on line 18, indicating that the name sm1 is not defined. This suggests that either you did not copy all of the code that you should have, or perhaps there is something else that needs to be imported that would define sm1 for the python interpreter.
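If I had to guess (and it is only a guess, since I don't have the book), sm1 was probably meant to be statsmodels, whose OLS matches the ols = sm1.OLS(close, x).fit() call:

# my assumption: the original script likely had an import like this
import statsmodels.api as sm1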
So you see, this is the basic process of debugging your code. Again the key, at least with python, is carefully reading the Traceback. And again, Stackoverflow is not intended as a place to ask others to debug your code. A little bit of searching online for things like "learn python" and "python debugging techniques" will yield a wealth of helpful information.
I hope this is pointing you in the right direction. If for some reason I am way off base with this answer, let me know and I will delete it.
I have code in Python that looks something like the code pasted below. For context, all the csv files print [15 rows x 16 columns]; I just changed the names for privacy purposes.
import numpy as np
import pandas as pd
C = pd.read_csv('/Users/name/Desktop/filename1.csv')
Chome = pd.read_csv('/Users/name/Desktop/filename2.csv')
Cwork = pd.read_csv('/Users/name/Desktop/filename3.csv')
Cschool = pd.read_csv('/Users/name/Desktop/filename4.csv')
Cother = pd.read_csv('/Users/name/Desktop/filename5.csv')
Cf = np.zeros([17,17])
Cf = C
Cf[0:15,16] = C[0:15,15]
Cf[16,0:15] = C[15,0:15]
Cf[16,16] = C[15,15]
print(Cf)
When I run the code I get the following error:
runfile('/Users/name/.spyder-py3/untitled12.py', wdir='/Users/name/.spyder-py3')
Traceback (most recent call last):
File "/Users/name/.spyder-py3/untitled12.py", line 23, in <module>
Cf[0:15,16] = C[0:15,15]
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 116, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(slice(0, 15, None), 15)' is an invalid key
I am not exactly sure what this error means. I am pretty new to python, so debugging is a skill I am trying to better understand. So any advice on what I can do to fix this error, or what it means would be helpful. Thank you.
Note the following sequence in your code sample:
C = pd.read_csv(...)
... # Other cases of pd.read_csv
Cf = np.zeros([17,17])
So, at least till now, C is a DataFrame and Cf is a Numpy array.
Then Cf = C is probably a logical error, since it overwrites
the Numpy array (full of zeroes) with another reference to C.
And now, as far as the offending instruction (Cf[0:15,16] = C[0:15,15]) is concerned:
Note that C[0:15,15] is wrong (run this code on your own to see it).
In case of pandasonic DataFrames you can use "positional addressing",
including slices, using iloc.
On the other hand, this notation is allowed for Numpy arrays.
So, assuming that Cf = C is not needed and Cf should remain a
Numpy array, you probably should correct this instruction to:
Cf[0:15,16] = C.iloc[0:15,15]
And make analogous corrections in remaining instructions in your code.
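Putting it together, a minimal sketch of the corrected block (with a toy 16x16 DataFrame standing in for your CSV data, since I don't have the actual files):

import numpy as np
import pandas as pd

# toy stand-in for C = pd.read_csv(...); a 16x16 frame so row/column 15 exists
C = pd.DataFrame(np.arange(256).reshape(16, 16))

Cf = np.zeros([17, 17])
Cf[0:15, 16] = C.iloc[0:15, 15]  # positional addressing via .iloc
Cf[16, 0:15] = C.iloc[15, 0:15]
Cf[16, 16] = C.iloc[15, 15]
print(Cf)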
Edit
Another option is to refer to the underlying Numpy array in the C DataFrame, using the values attribute.
In this case you can use Numpythonic addressing style, e.g.:
C.values[0:15,15]
causes no error.
I am new to PyTorch and am running into an unexpected error. The overall context is trying to build a building segmentation model off of SpaceNet imagery. I forked this repo from someone at Microsoft AI who built a segmentation model, and I am just trying to re-run her training scripts.
I've been able to download the data, and do the pre-processing. My issue comes when trying to actually train the model, I am trying to iterate through my DataLoader, and I get the following error message:
RuntimeError: Expected object of scalar type unsigned char but got
scalar type float for sequence element 9.
Snippets of code that are useful:
I have a dataset.py that creates the SpaceNetDataset class and looks like:
import os
# Ignore warnings
import warnings

import numpy as np
from PIL import Image

import torch
from torch.utils.data import Dataset

warnings.filterwarnings('ignore')


class SpaceNetDataset(Dataset):
    """Class representing a SpaceNet dataset, such as a training set."""

    def __init__(self, root_dir, splits=['trainval', 'test'], transform=None):
        """
        Args:
            root_dir (string): Directory containing folder annotations and .txt files with the
                train/val/test splits
            splits: ['trainval', 'test'] - the SpaceNet utilities code would create these two
                splits while converting the labels from polygons to mask annotations. The two
                splits are created after chipping larger images into the required input size with
                some overlaps. Thus to have splits that do not have overlapping areas, we manually
                split the images (not chips) into train/val/test using utils/split_train_val_test.py,
                followed by using the SpaceNet utilities to annotate each folder, and combine the
                trainval and test splits it creates inside each folder.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.root_dir = root_dir
        self.transform = transform
        self.image_list = []
        self.xml_list = []

        data_files = []
        for split in splits:
            with open(os.path.join(root_dir, split + '.txt')) as f:
                data_files.extend(f.read().splitlines())

        for line in data_files:
            line = line.split(' ')
            image_name = line[0].split('/')[-1]
            xml_name = line[1].split('/')[-1]
            self.image_list.append(image_name)
            self.xml_list.append(xml_name)

    def __len__(self):
        return len(self.image_list)

    def __getitem__(self, idx):
        img_path = os.path.join(self.root_dir, 'RGB-PanSharpen', self.image_list[idx])
        target_path = os.path.join(self.root_dir, 'annotations', self.image_list[idx].replace('.tif', 'segcls.tif'))

        image = np.array(Image.open(img_path))
        target = np.array(Image.open(target_path))
        target[target == 100] = 1  # building interior
        target[target == 255] = 2  # border

        sample = {'image': image, 'target': target, 'image_name': self.image_list[idx]}
        if self.transform:
            sample = self.transform(sample)
        return sample
To create the DataLoader, I have something like:
dset_train = SpaceNetDataset(data_path_train, split_tags, transform=T.Compose([ToTensor()]))
loader_train = DataLoader(dset_train, batch_size=train_batch_size, shuffle=True,
                          num_workers=num_workers)
I then iterate over the data loader by doing something like:
for batch in loader_train:
    image_tensors = batch['image']
    images = batch['image'].cpu().numpy()
    break  # take the first shuffled batch
but then I get the error:
Traceback (most recent call last):
  File "training/train_aml.py", line 137, in <module>
    sample_images_train, sample_images_train_tensors = get_sample_images(which_set='train')
  File "training/train_aml.py", line 123, in get_sample_images
    for i, batch in enumerate(loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: Expected object of scalar type unsigned char but got scalar type float for sequence element 9.
The error seems quite similar to this one, although I did try a similar solution by casting:
dtype = torch.cuda.CharTensor if torch.cuda.is_available() else torch.CharTensor
for batch in loader:
    batch['image'] = batch['image'].type(dtype)
    batch['target'] = batch['target'].type(dtype)
but I end up with the same error.
A couple of other things that are weird:
This seems to be non-deterministic. Most of the time I get this error, but sometimes the code keeps running (not sure why).
The "sequence element" number at the end of the error message keeps changing. In this case it was "sequence element 9"; sometimes it's "sequence element 2", etc. Not sure why.
Ah, never mind.
It turns out unsigned char comes from C++, where it gives you 0 to 255, so it makes sense that that's what it expects from image data.
So I actually fixed this by doing:
image = np.array(Image.open(img_path)).astype(np.int)
target = np.array(Image.open(target_path)).astype(np.int)
inside the SpaceNetDataset class and it seemed to work!
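For anyone wondering why the failure was intermittent: default_collate ends up calling torch.stack on the per-sample arrays, and stacking tensors of different dtypes fails, so the error only shows up when a shuffled batch happens to mix dtypes. A toy sketch of the idea (my own illustration, not from the repo):

import numpy as np
import torch

# two samples whose arrays disagree on dtype, as mixed image chips might
mixed = [np.zeros((2, 2), dtype=np.uint8), np.ones((2, 2), dtype=np.float32)]

# torch.stack([torch.from_numpy(a) for a in mixed])  # fails: dtypes differ

# normalizing every sample to one dtype before collation fixes it
same = [a.astype(np.int64) for a in mixed]
batch = torch.stack([torch.from_numpy(a) for a in same])
print(batch.dtype)  # torch.int64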
I have a large text file with labels 0 or 1 in each line like this:
1
0
0
1
...
I load it, convert it into a numpy array, and then I want to convert the array into dtype=int64 (as I assume these are strings). I do it like this:
def load_data(infile):
    text_file = open(infile, 'r')
    text = text_file.readlines()
    text = map(str.strip, text)
    return text

labels = load_data('labels.txt')
labels_encoded = np.array(labels)
labels_encoded = labels_encoded.astype(int)
It works fine in Python 2.7, and I can work with the array later in my code, but for now I'm stuck with Python 3.6, and when I run my code, I get an error:
Traceback (most recent call last):
File "dText.py", line 77, in <module>
labels_encoded = labels_encoded.astype(int)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'
Can anyone help me figure out what is going on here and how to get it to work on Python 3.6? I also tried:
labels_encoded = np.int_(labels_encoded)
but I got the same error. I'm using numpy version 1.13.3. Thanks.
You are passing a map object into the array and trying to convert it. Take a look at the array once it has been created. It looks like this:
array(<map object at 0x127680cf8>, dtype=object)
Try using list(map(...)) instead.
def load_data(infile):
    text_file = open(infile, 'r')
    text = text_file.readlines()
    text = list(map(str.strip, text))
    return text

labels = load_data('labels.txt')
labels_encoded = np.array(labels)
labels_encoded = labels_encoded.astype(int)

labels_encoded
array([1, 0, 1, 0])
If you are just making the jump from 2.7, you should note that map no longer returns a list but an iterator.
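As an aside (my suggestion, not part of the fix above): for a file that is purely 0/1 labels, NumPy can read and convert in one step, assuming the same labels.txt file:

import numpy as np

# np.loadtxt parses the numbers and returns the requested dtype directly
labels_encoded = np.loadtxt('labels.txt', dtype=np.int64)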
I had the same problem. My code did not work:
train_size = np.ceil(len(dataset) * 0.8).astype(int)
print(type(train_size)) # --> numpy.int32
But this works great:
train_size = int(np.ceil(len(dataset) * 0.8))
print(type(train_size)) # --> int
I am trying to take the inverse Fourier transform of a list, and for some reason I keep getting the following error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "simulating_coherent_data.py", line 238, in <module>
exec('ift%s = np.fft.ifft(nd.array(FTxSQRT_PS%s))'(x,x))
TypeError: 'str' object is not callable
And I can't figure out where I have a string. The part of my code it relates to is as follows
def FTxSQRT_PS(FT, PS):
    # Import: The Fourier Transform and the Power Spectrum, both as lists
    # Export: The result of FTxsqrt(PS), as a list
    # Function:
    #   Takes each element in the FT and PS and finds FTxsqrt(PS) for each,
    #   appends each result to a list called signal
    signal = []
    print type(PS)
    for x in range(len(FT)):
        indiv_signal = np.abs(FT[x])*math.sqrt(PS[x])
        signal.append(indiv_signal)
    return signal
for x in range(1, number_timesteps+1):
    exec('FTxSQRT_PS%s = FTxSQRT_PS(fshift%s,power_spectrum%s)'%(x,x,x))
    exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'(x,x))
Where the FTxSQRT_PS%s are all lists, fshift%s is a np.array, and power_spectrum%s is a list. I've also tried setting the type of FTxSQRT_PS%s to a np.array but that did not help.
I have very similar code a few lines up that works fine;
for x in range(1, number_timesteps+1):
    exec('fft%s = np.fft.fft(source%s)'%(x,x))
where source%s are all type np.array
The only thing I can think of is that maybe np.fft.ifft is not how I should be taking the inverse Fourier transform for Python 2.7.6 but I also cannot find an alternative.
Let me know if you'd like to see the whole code, there is about 240 lines up to where I'm having trouble, though a lot of that is commenting.
Thanks for any help,
Teresa
You are missing a %
exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'(x,x))
Should be:
exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'%(x,x))
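As a side note (my suggestion, not strictly part of the fix): generating variable names with exec is fragile, and this whole class of typo goes away if you keep the per-timestep results in a dict. A rough sketch with made-up stand-in data:

import numpy as np

# hypothetical stand-ins for the fshift1, fshift2, ... variables built via exec
number_timesteps = 3
fshift = {x: np.random.rand(8) for x in range(1, number_timesteps + 1)}
power_spectrum = {x: np.random.rand(8) for x in range(1, number_timesteps + 1)}

ift = {}
for x in range(1, number_timesteps + 1):
    signal = np.abs(fshift[x]) * np.sqrt(power_spectrum[x])  # same math as FTxSQRT_PS
    ift[x] = np.fft.ifft(signal)

print(ift[1])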