Pyomo TypeError: unhashable type: 'EqualityExpression' - python

I am building an energy planning model in Pyomo and I am running into problems building some power grid constraints.
def grid2grid_rule(m, ts):
    return m.power['grid','grid', ts] == 0
m.const_grid2grid = Constraint(ts_i, grid2grid_rule)
def import_rule(m, ts):
    return m.gridImport[ts] == sum(m.power['grid',derIn,ts] for derIn in elIn)
m.const_import = Constraint(ts_i, rule = import_rule)
def export_rule(m, ts):
    return m.gridExport[ts] == sum(m.power[derOut,'grid',ts] for derOut in elOut)
m.const_export = Constraint(ts_i, export_rule)
Definition of Power:
m.power = Var(elOut, elIn, ts_i, within = NonNegativeReals)
Explaining the code:
m.power is a decision variable with 3 indices: The electricity source (elOut), the electricity 'usage' (elIn) and the current timestep index ts_i. elOut and elIn are numpy arrays with strings and ts_i a numpy array with integers from 0 to how many timesteps there are.
The first constraint just says that at any timestep electricity cannot flow from the grid to the grid. The import constraint says that the grid imports at each timestep are the sum over all power flows from the grid to electricity takers. The export constraint says that the grid exports at each timestep are the sum of all power flows from electricity 'givers' to the grid.
Now, my problem is: when I comment out the grid2grid and the export constraints, it works and a set of constraints is built as expected. However, when I uncomment, for example, the export rule, which is almost identical to the import rule, I get this error:
m = build_model('Input_Questionaire.xlsx', 'DER_excel', yeardivision = "repr_day")
ERROR: Constructing component 'const_export_index_1' from data=None failed:
TypeError: Problem inserting gridExport[1] == power[pv_ground,grid,1] +
power[wind_s,grid,1] + power[battery,grid,1] + power[grid,grid,1] into set
const_export_index_1
Traceback (most recent call last):
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 824, in add
if tmp in self:
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 998, in __contains__
return self._set_contains(element)
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 1302, in _set_contains
return element in self.value
TypeError: unhashable type: 'EqualityExpression'
Accompanied with this error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
...
...
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 833, in add
raise TypeError("Problem inserting "+str(tmp)+" into set "+self.name)
TypeError: Problem inserting gridExport[1] == power[pv_ground,grid,1] + power[wind_s,grid,1] + power[battery,grid,1] + power[grid,grid,1] into set const_export_index_1
I do not know how to fix it, especially since there is basically no difference in the two Constraints...
Thanks heaps for your help!
Axel

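Ugh... just saw it. It's an easy one. :)
You omitted the "rule=" portion of the constraint construction. Positional arguments to Constraint are treated as index sets, so the function is being passed in as a second index set (hence the 'const_export_index_1' in the error message) instead of as the rule.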
Anyhow. Change:
m.const_export = Constraint(ts_i, export_rule)
to:
m.const_export = Constraint(ts_i, rule=export_rule)
same for your grid2grid
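To make the positional-versus-keyword difference concrete, here is a toy sketch. `ToyConstraint` is a hypothetical stand-in written for this illustration only; it is not Pyomo's implementation, it just mimics the "positional arguments are index sets, the rule is a keyword" convention described above.

```python
class ToyConstraint:
    """Hypothetical stand-in: positional args are index sets, rule is keyword-only."""

    def __init__(self, *index_sets, rule=None):
        self.index_sets = index_sets
        self.rule = rule


def export_rule(m, ts):
    # Placeholder body; a real rule would return a constraint expression.
    return ts


timesteps = range(3)

# Function passed positionally: it is collected as a second "index set".
wrong = ToyConstraint(timesteps, export_rule)

# Function passed by keyword: it is stored as the rule.
right = ToyConstraint(timesteps, rule=export_rule)
```

In the toy version the mistake is silent; in the question, the misplaced function surfaced only later, as the unhashable `EqualityExpression` error while the index set was being built.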

Related

trend following using meta label issue with time: instead of adding/subtracting `n`, use `n * obj.freq`

I am trying to implement the trend-following labeling from an asset management book. I found the following code that I wanted to implement; however, I am getting this error:
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use n * obj.freq
!pip install yfinance
!pip install mplfinance
import yfinance as yf
import mplfinance as mpf
import numpy as np
import pandas as pd
# get the data from yfiance
df=yf.download('BTC-USD',start='2008-01-04',end='2021-06-3',interval='1d')
#code snippet 5.1
# Fit linear regression on close
# Return the t-statistic for a given parameter estimate.
def tValLinR(close):
    #tValue from a linear trend
    x = np.ones((close.shape[0],2))
    x[:,1] = np.arange(close.shape[0])
    ols = sm1.OLS(close, x).fit()
    return ols.tvalues[1]
#code snippet 5.2
'''
#search for the maximum absolutet-value. To identify the trend
# - molecule - index of observations we wish to labels.
# - close - which is the time series of x_t
# - span - is the set of values of L (look forward period) that the algorithm will #try (window_size)
# The L that maximizes |tHat_B_1| (t-value) is choosen - which is the look-forward #period
# with the most significant trend. (optimization)
'''
def getBinsFromTrend(molecule, close, span):
    #Derive labels from the sign of t-value of trend line
    #output includes:
    # - t1: End time for the identified trend
    # - tVal: t-value associated with the estimated trend coefficient
    #- bin: Sign of the trend (1,0,-1)
    #The t-statistics for each tick has a different look-back window.
    #- idx start time in look-forward window
    #- dt1 stop time in look-forward window
    #- df1 is the look-forward window (window-size)
    #- iloc ?
    out = pd.DataFrame(index=molecule, columns=['t1', 'tVal', 'bin', 'windowSize'])
    hrzns = range(*span)
    windowSize = span[1] - span[0]
    maxWindow = span[1]-1
    minWindow = span[0]
    for idx in close.index:
        idx += maxWindow
        if idx >= len(close):
            break
        df_tval = pd.Series(dtype='float64')
        iloc0 = close.index.get_loc(idx)
        if iloc0+max(hrzns) > close.shape[0]:
            continue
        for hrzn in hrzns:
            dt1 = close.index[iloc0-hrzn+1]
            df1 = close.loc[dt1:idx]
            df_tval.loc[dt1] = tValLinR(df1.values) #calculates t-statistics on period
        dt1 = df_tval.replace([-np.inf, np.inf, np.nan], 0).abs().idxmax() #get largest t-statistics calculated over span period
        print(df_tval.index[-1])
        print(dt1)
        print(abs(df_tval.values).argmax() + minWindow)
        out.loc[idx, ['t1', 'tVal', 'bin', 'windowSize']] = df_tval.index[-1], df_tval[dt1], np.sign(df_tval[dt1]), abs(df_tval.values).argmax() + minWindow #prevent leakage
    out['t1'] = pd.to_datetime(out['t1'])
    out['bin'] = pd.to_numeric(out['bin'], downcast='signed')
    #deal with massive t-Value outliers - they dont provide more confidence and they ruin the scatter plot
    tValueVariance = out['tVal'].values.var()
    tMax = 20
    if tValueVariance < tMax:
        tMax = tValueVariance
    out.loc[out['tVal'] > tMax, 'tVal'] = tMax #cutoff tValues > 20
    out.loc[out['tVal'] < (-1)*tMax, 'tVal'] = (-1)*tMax #cutoff tValues < -20
    return out.dropna(subset=['bin'])
if __name__ == '__main__':
    #snippet 5.3
    idx_range_from = 3
    idx_range_to = 10
    df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
    tValues = df1['tVal'].values #tVal
    doNormalize = False
    #normalise t-values to -1, 1
    if doNormalize:
        np.min(tValues)
        minusArgs = [i for i in range(0, len(tValues)) if tValues[i] < 0]
        tValues[minusArgs] = tValues[minusArgs] / (np.min(tValues)*(-1.0))
        plus_one = [i for i in range(0, len(tValues)) if tValues[i] > 0]
        tValues[plus_one] = tValues[plus_one] / np.max(tValues)
    #+(idx_range_to-idx_range_from+1)
    plt.scatter(df1.index, df0.loc[df1.index].values, c=tValues, cmap='viridis') #df1['tVal'].values, cmap='viridis')
    plt.plot(df0.index, df0.values, color='gray')
    plt.colorbar()
    plt.show()
    plt.savefig('fig5.2.png')
    plt.clf()
    plt.df['Close']()
    plt.scatter(df1.index, df0.loc[df1.index].values, c=df1['bin'].values, cmap='vipridis')
    #Test methods
    ols_tvalue = tValLinR( np.array([3.0, 3.5, 4.0]) )
There are several issues with the code that you "found" (not just the one issue that you posted).
Before I go through some of them, let me say the following, which I may very well be wrong about: based on my experience and on the way your question is worded (and the fact that you "wanted to implement" code that you "found"), it seems to me that you have minimal experience with coding and debugging.
Stackoverflow is not a place to ask others to debug your code. That said, I will try to walk you through some of the steps that I took in trying to figure out what's going on with this code, and then perhaps point you to some resources where you can learn the same skills.
Step 1:
I took the code as you posted it, and copy/pasted it into a file that I named so68871906.py; I then commented out the two lines at the top that are installing yfinance and mplfinance because I don't want to try to install them every time I run the code; rather I will install them once before running the code.
I then ran the code with the following result (similar to what you posted) ...
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 50, in getBinsFromTrend
idx += maxWindow
File "pandas/_libs/tslibs/timestamps.pyx", line 310, in pandas._libs.tslibs.timestamps._Timestamp.__add__
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
The key to successful debugging, especially in python, is to recognize that the Traceback gives you a lot of very important information. You just have to read through it very carefully. In the above case, the Traceback tells me:
The problem is with this line of code: idx += maxWindow. This line of code is adding idx + maxWindow and reassigning the result back to idx
The error is a TypeError which tells me there is a problem with the types of the variables. Since there are two variables on that line of code (idx and maxWindow) one may guess that one or both of those variables is the wrong type or otherwise incompatible with what the code is trying to do with the variable.
Based on the error message "Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported", and the fact that we are doing addition of idx and maxWindow, you can guess that one of the variables is of type integer or integer-array, while the other is of type Timestamp.
You can verify the types by adding print statements just before the error occurs. The code looks like this:
    maxWindow = span[1]-1
    minWindow = span[0]
    for idx in close.index:
        print('type(idx)=',type(idx))
        print('type(maxWindow)=',type(maxWindow))
        idx += maxWindow
Now the output looks like this:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
type(idx)= <class 'pandas._libs.tslibs.timestamps.Timestamp'>
type(maxWindow)= <class 'int'>
Traceback (most recent call last):
File "so68871906.py", line 86, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 52, in getBinsFromTrend
idx += maxWindow
File "pandas/_libs/tslibs/timestamps.pyx", line 310, in pandas._libs.tslibs.timestamps._Timestamp.__add__
TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
Notice that indeed type(maxWindow) is int and type(idx) is Timestamp
The TypeError exception message further states "Instead of adding/subtracting n, use n * obj.freq" from which one may infer that n is intended to represent the integer. It seems that the error is suggesting that we multiply the integer by some frequency before adding it to the Timestamp variable. This is not entirely clear, so I Googled "pandas add integer to Timestamp" (because clearly that is what the code is trying to do). All of the top answers suggest using pandas.to_timedelta() or pandas.Timedelta().
At this point I thought to myself: it makes sense that you can't just add an integer to a Timestamp, because what are you adding? minutes? seconds? days? weeks?
However, you can add an integer number of one of these units; in fact, the pandas.Timedelta() constructor takes a value argument together with a unit (days, weeks, etc.).
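The same idea can be sketched with the standard library's `datetime` and `timedelta`, used here only as stand-ins for `pandas.Timestamp` and `pandas.Timedelta` so that the snippet runs without pandas. The date and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

stamp = datetime(2021, 6, 1)
max_window = 9

# Adding a bare integer is ambiguous (9 of what?), so Python refuses.
try:
    stamp + max_window
except TypeError:
    int_addition_failed = True
else:
    int_addition_failed = False

# Stating the unit removes the ambiguity: 9 days after the 1st is the 10th.
shifted = stamp + max_window * timedelta(days=1)
```

With pandas the corresponding expression would be `idx + maxWindow * pd.Timedelta('1 day')`.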
The data from yf.download() is daily (interval='1d'), which suggests that the integer should be multiplied by a pandas.Timedelta of 1 day. I cannot be certain of this, because I don't have your textbook, so I am not 100% sure what the code is trying to accomplish there, but it is a reasonable guess, so I will change idx += maxWindow to
idx += (maxWindow*pd.Timedelta('1 day'))
and see what happens:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 51, in getBinsFromTrend
if idx >= len(close):
TypeError: '>=' not supported between instances of 'Timestamp' and 'int'
Step 2:
The code gets past the modified line (line 50), but now it fails on the next line with a similar TypeError: comparing ('>=') an integer and a Timestamp is not supported. So next I try similarly modifying line 51 to:
if idx >= len(close)*pd.Timedelta('1 day'):
The result:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 51, in getBinsFromTrend
if idx >= len(close)*pd.Timedelta('1 day'):
TypeError: '>=' not supported between instances of 'Timestamp' and 'Timedelta'
This doesn't work either, as you can see (can't compare Timestamp and Timedelta).
Step 3:
Looking more closely at the code, it seems the code is trying to determine if adding maxWindow to idx has moved idx past the end of the data.
Looking a couple lines higher in the code, you can see that the variable idx comes from the list of Timestamp objects in close.index, so perhaps the correct comparison would be:
if idx >= close.index[-1]:
that is, comparing idx to the last possible idx value. The result:
dino#DINO:~/code/mplfinance/examples/scratch_pad/issues$ python so68871906.py
[*********************100%***********************] 1 of 1 completed
Traceback (most recent call last):
File "so68871906.py", line 84, in <module>
df1 = getBinsFromTrend(df.index, df['Close'], [idx_range_from,idx_range_to,1]) #[3,10,1] = range(3,10) This is the issue
File "so68871906.py", line 60, in getBinsFromTrend
df_tval.loc[dt1] = tValLinR(df1.values) #calculates t-statistics on period
File "so68871906.py", line 18, in tValLinR
ols = sm1.OLS(close, x).fit()
NameError: name 'sm1' is not defined
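An alternative to the Timestamp comparison tried in Step 3 is to loop over integer positions and only look up the label when it is needed, so the bounds check compares int with int. This is just one possible reading of what the original code intended (I don't have the book, so that is an assumption). The list of dates below is a hypothetical stand-in for close.index.

```python
from datetime import date, timedelta

# Hypothetical stand-in for close.index: ten consecutive daily dates.
index = [date(2021, 1, 1) + timedelta(days=n) for n in range(10)]
max_window = 3

end_labels = []
for pos in range(len(index)):
    end_pos = pos + max_window      # int + int, no Timestamp arithmetic
    if end_pos >= len(index):       # int compared with int
        break
    end_labels.append(index[end_pos])
```

With a pandas index the same pattern would use `close.index[end_pos]` to recover the Timestamp label.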
Step 4:
Wow! Great, we got past the errors on lines 50 and 51. But now we have an error on line 18. Indicating that the name sm1 is not defined. This suggests that either you did not copy all of the code that you should have, or perhaps there is something else that needs to be imported that would define sm1 for the python interpreter.
So you see, this is the basic process of debugging your code. Again the key, at least with python, is carefully reading the Traceback. And again, Stackoverflow is not intended as a place to ask others to debug your code. A little bit of searching online for things like "learn python" and "python debugging techniques" will yield a wealth of helpful information.
I hope this is pointing you in the right direction. If for some reason I am way off base with this answer, let me know and I will delete it.

How to recognize float variable? getting ValueError: could not convert string to float:

I'm trying to create a GUI for a signal analysis simulation that i'm writing in Python. For that, I use AppJar. However, when I call the function that generates the signal, I get a ValueError like in the title.
I've read every single ValueError post on stackOverflow (i could have missed one maybe, but i did my best) and all of them are about extra spacings, letters that can not be parsed as a floating point number, etc. None of that seems to apply here.
Basically, i'm using this code to call a function to generate my signal:
signal_axes = app.addPlot("signal", *logic.signal(5, 2), 0, 0, 1)
And the relevant part of the function itself (in the file logic.py, which is imported)
def signal(electrodes, length):
    velocity = math.sqrt((3.2e-19 * kinetic_energy) / (mass * 1.66e-27))
    frequency = velocity / length
This is not the whole function, the variables are all declared and unused variables are used later in the function.
The error specifically points to the line with "frequency = velocity / length", telling me:
TypeError: unsupported operand type(s) for /: 'float' and 'str'
When i try to fix it by using "float(length)" i get the error:
ValueError: could not convert string to float:
In one of the answers on StackExchange someone suggested using .strip() to get rid of invisible spaces. So i tried using:
length.strip()
But that gives me the following error:
AttributeError: 'float' object has no attribute 'strip'
I am slowly descending into madness here. The following code, by the way, stand-alone, works:
import numpy as np
kinetic_energy = 9000
mass = 40
length = 2e-2
velocity = np.sqrt((3.2e-19 * kinetic_energy) / (mass * 1.66e-27))
frequency = float(velocity) / float(length)
print(frequency)
Can anyone see what could be wrong? I've included all the relevant code below, it's not my complete file but this alone should give an output, at least.
run.py
import logic
from appjar import gui
def generate(btn):
    app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"), app.getEntry("length")))
    showSignalLabels()
def showSignalLabels():
    signal_axes.set_xlabel("time (us)")
    signal_axes.set_ylabel("amplitude (uV)")
    app.refreshPlot("signal")
app = gui()
signal_axes = app.addPlot("signal", *logic.signal(5, 0.02), 0, 0, 1)
app.addLabelEntry("electrodes", 1, 0, 1)
app.addLabelEntry("length", 2, 0, 1)
showSignalLabels()
app.addButton("Generate", generate)
app.go()
logic.py
import numpy as np
import math
import colorednoise as cn
steps = 5000
amplitude = 1
offset_code = 0
kinetic_energy = 9000
mass = 40
centered = 1
def signal(electrodes, length):
    velocity = math.sqrt((3.2e-19 * kinetic_energy) / (mass * 1.66e-27))
    frequency = velocity / length
    time = 2 * (electrodes / frequency)
    --- irrelevant code ---
    return OutputTime, OutputSignal
edit: here is the full traceback.
Exception in Tkinter callback
Traceback (most recent call last):
File "E:\Internship IOM\WPy64-3720\python-3.7.2.amd64\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "E:\Internship IOM\PythonScripts\appJar\appjar.py", line 3783, in <lambda>
return lambda *args: funcName(param)
File "E:/Internship IOM/PythonScripts/appJar/testrun.py", line 12, in generate
app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"), app.getEntry("length")))
File "E:\Internship IOM\PythonScripts\appJar\logic.py", line 33, in signal
frequency = velocity / length
TypeError: unsupported operand type(s) for /: 'float' and 'str'
You should convert at the calling site, i.e. do:
def generate(btn):
    app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"),
                                           float(app.getEntry("length"))))
    ...
Otherwise your function logic.signal receives different types depending on the caller (a str from the entry field, a float elsewhere). That's why you get the other error about float having no strip: somewhere else in your code you do:
signal_axes = app.addPlot("signal", *logic.signal(5, 0.02), 0, 0, 1)
Here you pass it a float.
Since your original error was could not convert string to float with an apparently empty string, you need to take an extra measure to prevent empty values from the app. You can use a try ... except:
def generate(btn):
    try:
        length = float(app.getEntry("length"))
    except ValueError:
        # Use some default value here, or re-raise.
        length = 0.
    app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"), length))
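The conversion can also be pulled into a small helper so that every entry value goes through the same check. `parse_float` is a hypothetical name introduced for this sketch; it is not part of appJar. Presumably the "electrodes" entry arrives as a string as well (an assumption, since the traceback only shows the failure on length), in which case it would need the same treatment before `electrodes / frequency` is evaluated.

```python
def parse_float(text, default=None):
    """Return text as a float, or default when it is empty or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return default


length = parse_float("0.02")            # text from an entry field
empty = parse_float("", default=0.0)    # empty entry falls back to the default
already_float = parse_float(0.02)       # floats pass through unchanged
```

Because the helper accepts both strings and floats, logic.signal can be called the same way from the button callback and from the initial addPlot call.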

Global function is not defined

I am facing this error as I define my module. I am trying to write a program for edit distance problem via dynamic programming method.
Here is the module where I am stuck:
def cost(i,j,M,w,text,pattern,compare): #Defining the cost functions or can say recurrence formula
    M[0,j]=0
    text1=list(text)
    pattern1=list(pattern)
    for i in range(1,m+1):
        for j in range(1,n+1):
            insertions = M[i-1,j]+1
            deletions = M[i,j-1]+1
            matches=M[i-1,j-1]
            if text1[i]==patttern1[j]:
                matches = matches+1
                return matches
            else :
                return matches
and the error is :
Traceback (most recent call last):
File "/Users/sayaneshome/Documents/plschk.py", line 202, in <module>
fill(M, w, text, max) #Filling matrix M with scores
File "/Users/sayaneshome/Documents/plschk.py", line 117, in fill
c = cost(i,j,M,w,text,pattern,compare)
File "/Users/sayaneshome/Documents/plschk.py", line 95, in cost
if text1[i]==patttern1[j]:
NameError: global name 'patttern1' is not defined
Your patttern1 has three t's. Remove one to get pattern1.
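For reference, here is a minimal sketch of the textbook edit-distance (Levenshtein) recurrence with dynamic programming. It is not a corrected version of the cost function in the question, whose M[0,j]=0 initialisation suggests a different variant; the function name and the list-of-lists table are choices made for this example.

```python
def edit_distance(text, pattern):
    """Levenshtein distance between two strings via a full DP table."""
    rows, cols = len(text) + 1, len(pattern) + 1
    M = [[0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string costs one deletion per character.
    for i in range(rows):
        M[i][0] = i
    for j in range(cols):
        M[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            # The strings are 0-indexed while the table is 1-indexed.
            substitution = M[i - 1][j - 1] + (text[i - 1] != pattern[j - 1])
            M[i][j] = min(M[i - 1][j] + 1, M[i][j - 1] + 1, substitution)
    return M[-1][-1]


distance = edit_distance("kitten", "sitting")
```

Note the `text[i - 1]` indexing: looping i over range(1, m+1) and reading text1[i], as in the question, would run one past the end of the string.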

ifft function gives "'str' object is not callable" error

I am trying to take the inverse Fourier transform of a list, and for some reason I keep getting the following error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "simulating_coherent_data.py", line 238, in <module>
exec('ift%s = np.fft.ifft(nd.array(FTxSQRT_PS%s))'(x,x))
TypeError: 'str' object is not callable
And I can't figure out where I have a string. The part of my code it relates to is as follows
def FTxSQRT_PS(FT,PS):
    # Import: The Fourier Transform and the Power Spectrum, both as lists
    # Export: The result of FTxsqrt(PS), as a list
    # Function:
    # Takes each element in the FT and PS and finds FTxsqrt(PS) for each
    # appends each results to a list called signal
    signal = []
    print type(PS)
    for x in range(len(FT)):
        indiv_signal = np.abs(FT[x])*math.sqrt(PS[x])
        signal.append(indiv_signal)
    return signal
for x in range(1,number_timesteps+1):
    exec('FTxSQRT_PS%s = FTxSQRT_PS(fshift%s,power_spectrum%s)'%(x,x,x))
    exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'(x,x))
Where FTxSQRT_PS%s are all lists. fshift%s is a np.array and power_spectrum%s is a list. I've also tried setting the type for FTxSQRT_PS%s as a np.array but that did not help.
I have very similar code a few lines up that works fine;
for x in range(1,number_timesteps+1):
    exec('fft%s = np.fft.fft(source%s)'%(x,x))
where source%s are all type np.array
The only thing I can think of is that maybe np.fft.ifft is not how I should be taking the inverse Fourier transform for Python 2.7.6 but I also cannot find an alternative.
Let me know if you'd like to see the whole code, there is about 240 lines up to where I'm having trouble, though a lot of that is commenting.
Thanks for any help,
Teresa
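You are missing a % between the format string and the tuple, so Python tries to call the string as if it were a function: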
exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'(x,x))
Should be:
exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'%(x,x))
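The snippet below reproduces the error in isolation and sketches one way to avoid exec altogether by keeping per-timestep results in a dict. `transform` and `sources` are hypothetical stand-ins for np.fft.ifft and the numbered source arrays, so the sketch runs without NumPy.

```python
template = "ift%s = transform(data%s)"

# Without the % operator, Python tries to call the string like a function.
try:
    template(1, 1)
except TypeError as exc:
    error_message = str(exc)

# With the % operator the placeholders are filled in.
statement = template % (1, 1)


def transform(values):
    # Stand-in for np.fft.ifft; doubles each value so the result is easy to check.
    return [v * 2 for v in values]


# Keyed by timestep instead of generating names such as ift1, ift2, ...
sources = {1: [1, 2], 2: [3, 4]}
results = {step: transform(values) for step, values in sources.items()}
```

With a dict, a missing % can no longer hide inside a string, and `results[x]` replaces the need to build a variable name at runtime.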

py2neo rel() list indices must be integer not float

I'm trying to import nodes into Neo4j in a batch. But when I try to execute it, it throws an error: List indices must be integers, not float. I don't really understand which listitems, I do have floats, but these are cast to strings...
Partial code:
graph_db = neo4j.GraphDatabaseService("http://127.0.0.1:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)
for ngram, one_grams in data.items():
    ngram_rank = int(one_grams['_rank'])
    ngram_prob = '%.16f' % float(one_grams['_prob'])
    ngram_id = 'a'+str(n)
    ngram_node = batch.create(node({"word": ngram, "rank": str(ngram_rank), "prob": str(ngram_prob)}))
    for one_gram, two_grams in one_grams.items():
        one_rank = int(two_grams['_rank'])
        one_prob = '%.16f' % float(two_grams['_prob'])
        one_node = batch.create(node({"word": one_gram, "rank": str(one_rank), "prob": one_prob}))
        batch.create(rel((ngram_node, "FOLLOWED_BY", one_node))) #line 81 throwing error
        results = batch.submit()
Full traceback
Traceback (most recent call last):
File "Ngram_neo4j.py", line 81, in probability_items
batch.create(rel((ngram_node, "FOLLOWED_BY", one_node)))
File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2692, in create
uri = self._uri_for(entity.start_node, "relationships")
File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2537, in _uri_for
uri = "{{{0}}}".format(self.find(resource))
File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2525, in find
for i, req in pendulate(self._requests):
File "virtenv\lib\site-packages\py2neo\util.py", line 161, in pendulate
yield index, collection[index]
TypeError: list indices must be integers, not float
running neo4j 2.0, py2neo 1.6.1, Windows 7/64bit, python 3.3/64bit
--EDIT--
Did some testing, but the error is located in the referencing to nodes.
oversimplified sample code:
for key, dict in data.items(): #string, dictionary
    batch = neo4j.WriteBatch(graph_db)
    three_gram_node = batch.create(node({"word": key}))
    pprint(three_gram_node)
    batch.add_labels(three_gram_node, "3gram") # must be int, not float
    for k,v in dict.items(): #string, string
        four_gram_node = batch.create(node({"word": k}))
        batch.create_path(three_gram_node, "FOLLOWED_BY", four_gram_node)
        # cannot cast node from BatchRequest obj
    batch.submit()
When a node is created batch.create(node({props})), the pprint returns a P2Neo.neo4j. batchrequest object.
At the line add_labels(), it gives the same error as when trying to create a relation: List indices must be integers, not float.
At the batch.create_path() line it throws an error saying it can't cast a node from a P2Neo.neo4j. batchrequest object.
I'm trying the dirty-debug now to understand the indices.
--Dirty Debug Edit--
I've been meddling around with the pendulate(collection) function.
Although I don't really understand how it fits in, and how it's used, the following is happening:
Whenever it hits an odd number, the index becomes a float (which is weird, since count - ((i + 1) / 2), where i is an odd number, should be a whole number). This float then throws the list indices error. Some prints:
count: 3
i= 0
index: 0
(int)index: 0
i= 1 # i = uneven
index: 2.0 # a float appears
(int)index: 2 # this is a safe cast
This results in the list indices error. This also happens when i=0. As this is a common case, I made an additional if() to circumvent the code (possible speedup?) Although I've not unit tested this, it seems that we can safely cast index to an int...
The pendulate function as used:
def pendulate(collection):
    count = len(collection)
    print("count: ", count)
    for i in range(count):
        print("i=", i)
        if i == 0:
            index = 0
        elif i % 2 == 0:
            index = i / 2
        else:
            index = count - ((i + 1) / 2)
        print("index:", index)
        index = int(index)
        print("(int)index:", index)
        yield index, collection[index]
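A likely explanation for the float (my reading of the prints above, not something confirmed by the library's author): in Python 3 the `/` operator always performs true division and returns a float, even when the result is a whole number, whereas in Python 2 it floors when both operands are ints. Using `//` keeps the index an int without a cast. The function below is a stripped-down copy of the pendulate shown above with the prints removed, not the library's own source.

```python
def pendulate(collection):
    """Yield items alternately from the front and the back of the collection."""
    count = len(collection)
    for i in range(count):
        if i % 2 == 0:
            index = i // 2                      # floor division keeps an int
        else:
            index = count - ((i + 1) // 2)
        yield index, collection[index]


true_division = 4 / 2      # a float in Python 3
floor_division = 4 // 2    # an int

order = [index for index, _ in pendulate(["a", "b", "c", "d", "e"])]
```

For a five-element list the indices come out in the order 0, 4, 1, 3, 2, which is the same front-and-back pattern as the (int)index values in the debug output.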
Soft debug: print ngram_node and one_node to see what they contain.
Dirty debug: modify File "virtenv\lib\site-packages\py2neo\util.py", line 161, and add a line before it:
print(index)
You are accessing a collection (a Python list, given the traceback), so index must be an integer :)
Printing it will probably help you understand why the exception is raised.
(Don't forget to remove your dirty debug afterwards ;))
While it is currently possible for WriteBatch objects to be executed multiple times with edits in between, it is inadvisable to use them in this way and this will be restricted in the next version of py2neo. This is because objects created during one execution will not be available during a subsequent execution and it is not easy to detect when this is being requested.
Without looking back at the underlying code, I'm unsure why you are seeing this exact error but I would suggest refactoring your code so that each WriteBatch creation is paired with one and only one execution call (submit). You can probably achieve this by putting your batch creation within your outer loop and moving your submit call out of the inner loop into the outer loop as well.
