How to set a condition statement in a loop - Python

I am using Python 3. I have a problem setting a condition on some groups in a loop (to consider a pixel only when it has more than 5 data points available), and I expect to get a blank pixel when the condition isn't satisfied.
I tried some 'if' statements, but I keep getting a KeyError whenever the condition isn't satisfied.
Here is the code:
Xpix = 78
Ypix = 30
row = []
mean_val = []
for i in range(0, Ypix):
    for j in range(0, Xpix):
        if len(data_pixel.groupby(['lin', 'col']).get_group((i, j))[['gamma']]) >= 5:
            means = data_pixel.groupby(['lin', 'col']).get_group((i, j))[['gamma']].mean()
        else:
            means = 0
        row.append(means)
mean_val = np.array(row).reshape(Ypix, Xpix)
I expect a 78 x 30 array to plot, with blank pixels and mean pixels.
Here is the error I get:
Traceback (most recent call last):
File "map.py", line 415, in <module>
proc.process()
File "map.py", line 215, in process
if (len(data_pixel.groupby(['lin', 'col']).get_group((i,j))[['gamma']])>=5):
File "/xxx/yyy/anaconda3/envs/gnss/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 680, in get_group
raise KeyError(name)
KeyError: (10,41)
data_pixel refers to a big dataframe with a lot of data. I would really appreciate it if anyone could help with this.
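One possible way to avoid the KeyError (a sketch only, assuming data_pixel has the 'lin', 'col' and 'gamma' columns used above) is to build the groupby once and check whether the (i, j) key exists before calling get_group():
import numpy as np

groups = data_pixel.groupby(['lin', 'col'])
row = []
for i in range(Ypix):
    for j in range(Xpix):
        if (i, j) in groups.groups and len(groups.get_group((i, j))) >= 5:
            means = groups.get_group((i, j))['gamma'].mean()
        else:
            means = np.nan   # blank pixel when the group is missing or too small
        row.append(means)
mean_val = np.array(row).reshape(Ypix, Xpix)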

Trend following using meta label: issue with Timestamp - instead of adding/subtracting `n`, use `n * obj.freq`

I am trying to implement the trend-following labeling from an asset management book. I found the following code that I wanted to implement; however, I am getting this error:
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use n * obj.freq
!pip install yfinance
!pip install mplfinance

import yfinance as yf
import mplfinance as mpf
import numpy as np
import pandas as pd

# get the data from yfinance
df = yf.download('BTC-USD', start='2008-01-04', end='2021-06-3', interval='1d')

# code snippet 5.1
# Fit linear regression on close
# Return the t-statistic for a given parameter estimate.
def tValLinR(close):
    # tValue from a linear trend
    x = np.ones((close.shape[0], 2))
    x[:, 1] = np.arange(close.shape[0])
    ols = sm1.OLS(close, x).fit()
    return ols.tvalues[1]
#code snippet 5.2
'''
# search for the maximum absolute t-value, to identify the trend
# - molecule - index of observations we wish to label
# - close - the time series of x_t
# - span - the set of values of L (look-forward period) that the algorithm will try (window_size)
# The L that maximizes |tHat_B_1| (t-value) is chosen - the look-forward period
#   with the most significant trend (optimization)
'''
def getBinsFromTrend(molecule, close, span):
    # Derive labels from the sign of the t-value of the trend line
    # output includes:
    #  - t1: End time for the identified trend
    #  - tVal: t-value associated with the estimated trend coefficient
    #  - bin: Sign of the trend (1, 0, -1)
    # The t-statistic for each tick has a different look-back window.
    #  - idx: start time in look-forward window
    #  - dt1: stop time in look-forward window
    #  - df1: the look-forward window (window-size)
    #  - iloc ?
    out = pd.DataFrame(index=molecule, columns=['t1', 'tVal', 'bin', 'windowSize'])
    hrzns = range(*span)
    windowSize = span[1] - span[0]
    maxWindow = span[1] - 1
    minWindow = span[0]
    for idx in close.index:
        idx += maxWindow
        if idx >= len(close):
            break
        df_tval = pd.Series(dtype='float64')
        iloc0 = close.index.get_loc(idx)
        if iloc0 + max(hrzns) > close.shape[0]:
            continue
        for hrzn in hrzns:
            dt1 = close.index[iloc0 - hrzn + 1]
            df1 = close.loc[dt1:idx]
            df_tval.loc[dt1] = tValLinR(df1.values)  # calculates t-statistics on the period
        dt1 = df_tval.replace([-np.inf, np.inf, np.nan], 0).abs().idxmax()  # get the largest t-statistic calculated over the span period
        print(df_tval.index[-1])
        print(dt1)
        print(abs(df_tval.values).argmax() + minWindow)
        out.loc[idx, ['t1', 'tVal', 'bin', 'windowSize']] = df_tval.index[-1], df_tval[dt1], np.sign(df_tval[dt1]), abs(df_tval.values).argmax() + minWindow  # prevent leakage
    out['t1'] = pd.to_datetime(out['t1'])
    out['bin'] = pd.to_numeric(out['bin'], downcast='signed')
    # deal with massive t-value outliers - they don't provide more confidence and they ruin the scatter plot
    tValueVariance = out['tVal'].values.var()
    tMax = 20
    if tValueVariance < tMax:
        tMax = tValueVariance
    out.loc[out['tVal'] > tMax, 'tVal'] = tMax  # cut off tValues > 20
    out.loc[out['tVal'] < (-1) * tMax, 'tVal'] = (-1) * tMax  # cut off tValues < -20
    return out.dropna(subset=['bin'])
if __name__ == '__main__':
    # snippet 5.3
    idx_range_from = 3
    idx_range_to = 10
    df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from, idx_range_to, 1])  # [3,10,1] = range(3,10) This is the issue
    tValues = df1['tVal'].values  # tVal
    doNormalize = False
    # normalise t-values to -1, 1
    if doNormalize:
        np.min(tValues)
        minusArgs = [i for i in range(0, len(tValues)) if tValues[i] < 0]
        tValues[minusArgs] = tValues[minusArgs] / (np.min(tValues) * (-1.0))
        plus_one = [i for i in range(0, len(tValues)) if tValues[i] > 0]
        tValues[plus_one] = tValues[plus_one] / np.max(tValues)
        # +(idx_range_to-idx_range_from+1)
    plt.scatter(df1.index, df0.loc[df1.index].values, c=tValues, cmap='viridis')  # df1['tVal'].values, cmap='viridis')
    plt.plot(df0.index, df0.values, color='gray')
    plt.colorbar()
    plt.show()
    plt.savefig('fig5.2.png')
    plt.clf()
    plt.df['Close']()
    plt.scatter(df1.index, df0.loc[df1.index].values, c=df1['bin'].values, cmap='vipridis')

    # Test methods
    ols_tvalue = tValLinR(np.array([3.0, 3.5, 4.0]))
There are at least several issues with the code that you "found" (not just the one issue that you posted).
Before I go through some of the issues, let me say the following, which I may very well be wrong about, but based on my experience and on the way your question is worded (and the fact that you "wanted to implement" code that you "found"), it seems to me that you have minimal experience with coding and debugging.
Stack Overflow is not a place to ask others to debug your code. That said, I will try to walk you through some of the steps that I took in trying to figure out what's going on with this code, and then perhaps point you to some resources where you can learn the same skills.
Step 1:
I took the code as you posted it, and copy/pasted it into a file that I named so68871906.py; I then commented out the two lines at the top that are installing yfinance and mplfinance because I don't want to try to install them every time I run the code; rather I will install them once before running the code.
I then ran the code with the following result (similar to what you posted) ...
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 50, in getBinsFromTrend
idx += maxWindow
File "pandas/_libs/tslibs/timestamps.pyx", line 310, in pandas._libs.tslibs.timestamps._Timestamp.__add__
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
The key to successful debugging, especially in Python, is to recognize that the Traceback gives you a lot of very important information. You just have to read through it very carefully. In the above case, the Traceback tells me:
The problem is with this line of code: idx += maxWindow. This line of code is adding idx + maxWindow and reassigning the result back to idx.
The error is a TypeError, which tells me there is a problem with the types of the variables. Since there are two variables on that line of code (idx and maxWindow), one may guess that one or both of those variables is the wrong type or otherwise incompatible with what the code is trying to do with them.
Based on the error message "Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported", and the fact that we are doing addition of idx and maxWindow, you can guess that one of the variables is of type integer or integer-array, while the other is of type Timestamp.
You can verify the types by adding print statements just before the error occurs. The code looks like this:
maxWindow = span[1] - 1
minWindow = span[0]
for idx in close.index:
    print('type(idx)=', type(idx))
    print('type(maxWindow)=', type(maxWindow))
    idx += maxWindow
Now the output looks like this:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
type(idx)= <class 'pandas._libs.tslibs.timestamps.Timestamp'>
type(maxWindow)= <class 'int'>
Traceback (most recent call last):
File "so68871906.py", line 86, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 52, in getBinsFromTrend
idx += maxWindow
File "pandas/_libs/tslibs/timestamps.pyx", line 310, in pandas._libs.tslibs.timestamps._Timestamp.__add__
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
Notice that indeed type(maxWindow) is int and type(idx) is Timestamp.
The TypeError exception message further states "Instead of adding/subtracting n, use n * obj.freq", from which one may infer that n is intended to represent the integer. It seems that the error is suggesting that we multiply the integer by some frequency before adding it to the Timestamp variable. This is not entirely clear, so I Googled "pandas add integer to Timestamp" (because clearly that is what the code is trying to do). All of the top answers suggest using pandas.to_timedelta() or pandas.Timedelta().
At this point I thought to myself: it makes sense that you can't just add an integer to a Timestamp, because what are you adding? minutes? seconds? days? weeks?
However, you can add an integer number of one of these frequencies; in fact the pandas.Timedelta() constructor takes a value argument that indicates a number of days, weeks, etc.
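For instance (just an illustration, not part of the original code):
import pandas as pd

ts = pd.Timestamp('2021-01-04')
# ts + 3                               # raises the TypeError shown above
print(ts + 3 * pd.Timedelta('1 day'))  # 2021-01-07 00:00:00
print(ts + pd.Timedelta(3, unit='D'))  # same result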
The data from yf.download() is daily (interval='1d'), which suggests that the integer should be multiplied by a pandas.Timedelta of 1 day. I cannot be certain of this, because I don't have your textbook, so I am not 100% sure what the code is trying to accomplish there, but it is a reasonable guess, so I will change idx += maxWindow to
idx += (maxWindow*pd.Timedelta('1 day'))
and see what happens:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 51, in getBinsFromTrend
if idx >= len(close):
TypeError: '>=' not supported between instances of 'Timestamp' and 'int'
Step 2:
The code gets past the modified line of code (line 50), but now it's failing on the next line with a similar TypeError, in that it doesn't support comparing ('>=') an integer and a Timestamp. So next I try similarly modifying line 51 to:
if idx >= len(close)*pd.Timedelta('1 day'):
The result:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 51, in getBinsFromTrend
if idx >= len(close)*pd.Timedelta('1 day'):
TypeError: '>=' not supported between instances of 'Timestamp' and 'Timedelta'
This doesn't work either, as you can see (can't compare Timestamp and Timedelta).
Step 3:
Looking more closely at the code, it seems the code is trying to determine whether adding maxWindow to idx has moved idx past the end of the data.
Looking a couple of lines higher in the code, you can see that the variable idx comes from the list of Timestamp objects in close.index, so perhaps the correct comparison would be:
if idx >= close.index[-1]:
that is, comparing idx to the last possible idx value. The result:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 60, in getBinsFromTrend
df_tval.loc[dt1] = tValLinR(df1.values) #calculates t-statistics on period
File "so68871906.py", line 18, in tValLinR
ols = sm1.OLS(close, x).fit()
NameError: name 'sm1' is not defined
Step 4:
Wow! Great, we got past the errors on lines 50 and 51. But now we have an error on line 18, indicating that the name sm1 is not defined. This suggests that either you did not copy all of the code that you should have, or perhaps there is something else that needs to be imported that would define sm1 for the Python interpreter.
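My guess (and it is only a guess, since I don't have the book's code in front of me) is that sm1 is meant to be statsmodels, which provides the OLS(...).fit() / tvalues API used in tValLinR, so an import along these lines would probably clear this particular NameError:
import statsmodels.api as sm1   # guess: the snippet appears to expect statsmodels here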
So you see, this is the basic process of debugging your code. Again the key, at least with Python, is carefully reading the Traceback. And again, Stack Overflow is not intended as a place to ask others to debug your code. A little bit of searching online for things like "learn python" and "python debugging techniques" will yield a wealth of helpful information.
I hope this is pointing you in the right direction. If for some reason I am way off base with this answer, let me know and I will delete it.

Index 1 is out of bounds for axis 0 with size 1 in Python

For a university assignment I was asked to convert a 1-line text file into a 2D array. However, when I run the program, I get this error:
(venv) D:\Uni Stuff\Year 2\AIGP\Assignment\PYTHONASSIGNMEN>python astar.py
Input file name: Lab9TerrainFile1.txt
Traceback (most recent call last):
File "D:\Uni Stuff\Year 2\AIGP\Assignment\PYTHONASSIGNMEN\astar.py", line 129, in <module>
main()
File "D:\Uni Stuff\Year 2\AIGP\Assignment\PYTHONASSIGNMEN\astar.py", line 110, in main
number_of_rows = maze_file[1]
IndexError: index 1 is out of bounds for axis 0 with size 1
This is the code for generating the maze:
def main():
    maze_file = open(input("Input file name: "), "r").readlines()
    maze_file = np.array([maze_file])
    number_of_columns = maze_file[0]
    number_of_rows = maze_file[1]
    maze_column = np.array_split(maze_file[2:8], number_of_columns)
    maze_row = np.array_split(maze_file[2:8], number_of_rows)
    maze = np.concatenate([maze_column][maze_row])
    start = np.where(maze == 2)
    end = np.where(maze == 3)
    maze_file.close()
    path = astar(maze, start, end)
    print(path)
Any help would be appreciated and thank you!
You can test this by checking the size of your array maze_file by running the code below.
print(len(maze_file))
If it returns 1, then it means it only has 1 element.
maze_file[0] means you are getting the first element, hence the index 0 between the square brackets. When you specify maze_file[1], it's trying to get the 2nd element, which doesn't exist, hence the index-out-of-bounds error.
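As a small illustration (with made-up file content), the shape of the wrapped array shows what is going on:
import numpy as np

lines = ["5 5 0 1 2 0 3\n"]     # made-up content: readlines() on a one-line file returns one string
maze_file = np.array([lines])   # shape (1, 1): only one element along axis 0
print(maze_file[0])             # works
print(maze_file[1])             # IndexError: index 1 is out of bounds for axis 0 with size 1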
Reviewing your code, it looks like you are trying to get the number of columns and rows for the array. You can use the following code.
number_of_columns = len(maze_file)
number_of_rows = len(maze_file[0])

ValueError: could not broadcast input array from shape (2) into shape (1) when using df.apply

I have code that runs through each row/item in a series and turns it into bigrams/trigrams. The code is the following:
def splitting(txt, gram=2):
    tx1 = txt.str.replace('[^\w\s]', '').str.split().tolist()[0]
    if len(tx1) == 0:
        return np.nan
    txlis = [w for w in tx1 if w.lower() not in stop_wrds]
    if gram == 2:
        return map(tuple, set(map(frozenset, list(nltk.bigrams(txlis)))))
    else:
        return map(tuple, set(map(frozenset, list(nltk.trigrams(txlis)))))

#pdb.set_trace()
print len(namedat)
prop_data = pd.DataFrame(namedat.apply(splitting, axis=1))
The error comes on the last line, when I apply the function to data called namedat that looks something like this:
0 inter-burgo ansan
1 dogo glory condo
2 w hotel
3 onyang grand hotel
4 onyang hot spring hotel
5 onyang cheil hotel (ex. onyang palace hotel)
6 springhill suites paso robles atascadero
7 best western plus colony inn
8 hesse
9 ibis styles aachen city
10 pullman aachen quellenhof
11 mercure aachen europaplatz
12 leonardo hotel aachen
13 aquis grana cityhotel
14 buschhausen
... ...
[166295 rows x 1 columns]
ValueError: could not broadcast input array from shape (2) into shape (1) when using df.apply
I tried debugging and the texts and bigrams are all generated successfully; there seems to be no issue with the splitting function. I am out of ideas on how to go about solving this. Please help.
The complete error message:
Traceback (most recent call last):
File "data_playground.py", line 163, in <module>
main()
File "data_playground.py", line 156, in main
createparams(db.hotelbeds_properties,"hotelbeds")
File "data_playground.py", line 139, in createparams
prop_params = analyze(prop_subdf)
File "data_playground.py", line 110, in analyze
prop_data = pd.DataFrame(namedat.apply(splitting,axis=1))
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4990, in _apply_standard
result = self._constructor(data=results, index=index)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 330, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 461, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
construction_error(len(arrays), arrays[0].shape, axes, e)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4604, in construction_error
raise e
ValueError: could not broadcast input array from shape (2) into shape (1)
An example of what my code does:
It takes a row from the table shown above, for example:
name shaba boutique hotel
Name: 166278, dtype: object
and then returns bigrams made from it
[(u'shaba', u'boutique'), (u'boutique', u'hotel')]
If I do a simple for loop (using iterrows), the function works and I get a list. I do not understand why the apply function fails.
The reason for this error is that df.apply(axis=1) expects a single value back, to make a series out of it; you can read more about it here. Your code is returning the result of map(tuple, ...), which has a shape > 1 for any row that has more than two words. You can try this out on a small, fake dataframe and see that it works as is below,
namedat_s = pd.Series(['inter-burgo ansan', 'glory condo', 'w hotel'])
namedat = pd.DataFrame(namedat_s)
...but put 'dogo' back in, and you'll get the error again. This is a good example of why single long lines of code are not always useful, especially if you are just starting.
If you had tried this, you probably would have found the answer sooner:
def splitting(txt, gram=2):
    tx1 = txt.str.replace('[^\w\s]', '').str.split().tolist()[0]
    if len(tx1) == 0:
        return np.nan
    txlis = [w for w in tx1 if w.lower() not in stop_wrds]
    print 1, txlis
    print 2, find_ngrams(txlis, 2)
    print 3, list(find_ngrams(txlis, 2))
    print 4, map(frozenset, list(find_ngrams(txlis, 2)))
    print 5, set(map(frozenset, list(find_ngrams(txlis, 2))))
    print 6, map(tuple, set(map(frozenset, list(find_ngrams(txlis, 2)))))
    print len(map(tuple, set(map(frozenset, list(find_ngrams(txlis, 2))))))
    if gram == 2:
        return map(tuple, set(map(frozenset, list(find_ngrams(txlis, 2)))))
    else:
        return map(tuple, set(map(frozenset, list(find_ngrams(txlis, 2)))))
You'd see that the error happens, as you said, not in the splitting function, but in what happens after the return, and knowing what is being returned would give you a big clue as to why.
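One possible workaround (a sketch only, not from the original answer: it assumes stop_wrds and the hotel-name column shown above, and the helper name splitting_names is mine) is to apply a function to the name column as a Series and return a plain list, so each row yields exactly one object instead of a lazy map:
import re
import numpy as np
import nltk

def splitting_names(txt, gram=2):
    # strip punctuation, split into words, and drop stop words
    words = [w for w in re.sub(r'[^\w\s]', '', str(txt)).split() if w.lower() not in stop_wrds]
    if not words:
        return np.nan
    grams = nltk.bigrams(words) if gram == 2 else nltk.trigrams(words)
    # same dedup-by-frozenset trick as the original, but materialised as a list
    return list(map(tuple, set(map(frozenset, grams))))

prop_data = namedat.iloc[:, 0].apply(splitting_names).to_frame()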

Error while using sum() in Python SFrame

I'm new to Python and I'm performing a basic EDA analysis on two similar SFrames. Two of my columns contain dictionaries, and I'm trying to find out whether the max values of each dictionary are the same or not. In the end I want to sum up the Value_Match column so that I can know how many values match, but I'm getting a nasty error and I haven't been able to find the source. The weird thing is that I have used the same methodology for both SFrames and only one of them is giving me this error.
I have tried calculating max_func in different ways, as given here, but the same error has persisted: getting-key-with-maximum-value-in-dictionary
I have checked for any possible NaN values in the column but didn't find any.
I have been stuck on this for a while and any help will be much appreciated. Thanks!
Code:
def max_func(d):
    v = list(d.values())
    k = list(d.keys())
    return k[v.index(max(v))]

sf['Max_Dic_1'] = sf['Dic1'].apply(max_func)
sf['Max_Dic_2'] = sf['Dic2'].apply(max_func)
sf['Value_Match'] = sf['Max_Dic_1'] == sf['Max_Dic_2']
sf['Value_Match'].sum()
Error:
RuntimeError Traceback (most recent call last)
<ipython-input-70-f406eb8286b3> in <module>()
----> 1 x = sf['Value_Match'].sum()
2 y = sf.num_rows()
3
4 print x
5 print y
C:\Users\rakesh\Anaconda2\lib\site-
packages\graphlab\data_structures\sarray.pyc in sum(self)
2216 """
2217 with cython_context():
-> 2218 return self.__proxy__.sum()
2219
2220 def mean(self):
C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\cython\context.pyc in
__exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let
exception propagate
RuntimeError: Runtime Exception. Exception in python callback function
evaluation:
ValueError('max() arg is an empty sequence',):
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in
graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in
graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
In order to debug this problem, you have to look at the stack trace. On the last line we see:
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
Python is thus saying that you are trying to calculate the maximum of a list with no elements. This is the case if the dictionary is empty. So in one of your dataframes there is probably an empty dictionary {}.
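A quick way to check that (a sketch, assuming graphlab's usual SFrame/SArray behaviour) is to count the rows whose dictionary is empty:
print(sf[sf['Dic1'].apply(lambda d: len(d) == 0)].num_rows())
print(sf[sf['Dic2'].apply(lambda d: len(d) == 0)].num_rows())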
The question is what to do in case the dictionary is empty. You might decide to return None in that case.
Nevertheless, the code you wrote is more complicated than it needs to be. A simpler and more efficient version would be:
def max_func(d):
    if d:
        return max(d, key=d.get)
    else:
        # or return something else if there is no element in the dictionary
        return None
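For example:
max_func({'a': 3, 'b': 7})   # returns 'b'
max_func({})                 # returns None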

Index out of bounds while reading a dataframe

I have a tab-separated file that I am trying to parse, and for that I am doing this.
Header of my file:
chrom coord ref_base var_base A C G T
17 26695663 G A 1 0 1934 0
17 26695664 T A 1 0 1 1935
My code is:
counts = pd.read_csv(args.counts_file, sep='\t')
toto = counts[(counts['chrom'].astype(str) == "17") & (counts['coord'].astype(str) == "26695663")]
print toto["G"].values[0]
This returns the number wanted, which is 1934.
Now, when I try to create a function that takes the dataframe read from the file (among other arguments), I wrote this:
def get_foreground_counts(chrom, coord, counts, ref_base, var_base):
    foreground_counts = counts[(counts['chrom'] == chrom) & (counts['coord'] == coord)]
    foreground_ref_counts = foreground_counts[ref_base].values[0]
    foreground_var_counts = foreground_counts[var_base].values[0]
    return foreground_ref_counts, foreground_var_counts
I got this error that I am trying to figure out, but I still can't see why:
Traceback (most recent call last):
File "test.py", line 203, in <module>
main(args)
File "test.py", line 71, in main
foreground_ref_counts, foreground_var_counts = get_foreground_counts(chrom, coord, counts, ref_base, var_base)
File "test.py", line 137, in get_foreground_counts
foreground_ref_counts = foreground_counts[ref_base].values[0]
IndexError: index out of bounds
Any idea why?
Thanks
UPDATE
When I try to print foreground_counts[ref_base].values I get this: []
What I am passing to the function is chrom (string), coord (string), counts (pandas dataframe), ref_base (string), var_base (string).
In your function, your filter returns zero rows; that's why you get the error. It seems you forgot the .astype(str) in your function's first line.
You could either cast the column type before calling the function or modify that line. The former would be the better approach if you really need to use a string type; otherwise, why don't you use integer values for the comparison?
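For example (illustrative only; any of these should make the filter match again):
# Option 1: cast the columns once, before calling the function, and keep comparing strings
counts['chrom'] = counts['chrom'].astype(str)
counts['coord'] = counts['coord'].astype(str)

# Option 2: cast inside the function instead
foreground_counts = counts[(counts['chrom'].astype(str) == chrom) & (counts['coord'].astype(str) == coord)]

# Option 3: skip strings entirely and compare integers
foreground_counts = counts[(counts['chrom'] == int(chrom)) & (counts['coord'] == int(coord))]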
