Error while using sum() in Python SFrame - python

I'm new to Python and I'm doing a basic EDA on two similar SFrames. Two of my columns hold dictionaries, and I'm trying to find out whether the max values of each dictionary are the same or not. In the end I want to sum up the Value_Match column so that I know how many values match, but I'm getting a nasty error and I haven't been able to find its source. The weird thing is that I used the same approach for both SFrames, and only one of them gives me this error.
I have tried computing max_func in the different ways given here, but the same error persists: getting-key-with-maximum-value-in-dictionary
I have checked for any possible NaN values in the column but didn't find any of them.
I have been stuck on this for a while and any help will be much appreciated. Thanks!
Code:
def max_func(d):
    v=list(d.values())
    k=list(d.keys())
    return k[v.index(max(v))]
sf['Max_Dic_1'] = sf['Dic1'].apply(max_func)
sf['Max_Dic_2'] = sf['Dic2'].apply(max_func)
sf['Value_Match'] = sf['Max_Dic_1'] == sf['Max_Dic_2']
sf['Value_Match'].sum()
Error :
RuntimeError Traceback (most recent call last)
<ipython-input-70-f406eb8286b3> in <module>()
----> 1 x = sf['Value_Match'].sum()
2 y = sf.num_rows()
3
4 print x
5 print y
C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\data_structures\sarray.pyc in sum(self)
2216 """
2217 with cython_context():
-> 2218 return self.__proxy__.sum()
2219
2220 def mean(self):
C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\cython\context.pyc in
__exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let
exception propagate
RuntimeError: Runtime Exception. Exception in python callback function
evaluation:
ValueError('max() arg is an empty sequence',):
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in
graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in
graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence

To debug this problem, look at the end of the stack trace. The last lines read:
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
Python is telling you that you are trying to compute the maximum of a sequence with no elements. That happens when the dictionary is empty, so one of your SFrames probably contains an empty dictionary {} in that column. The error only shows up at sum(), most likely because SFrame evaluates apply() lazily, so max_func does not actually run until a result is needed.
The question is what to do when the dictionary is empty. You might decide to return None in that case.
That said, the code you wrote is more complicated than it needs to be. A simpler and more efficient version would be:
def max_func(d):
    if d:
        return max(d,key=d.get)
    else:
        # or return something if there is no element in the dictionary
        return None
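As a minimal sketch of how the guarded version behaves, here it is run on plain Python dictionaries rather than on an SFrame. The sample rows are made up for illustration, and treating a None result as "no match" is an assumption about what you want to count.

```python
def max_func(d):
    """Return the key with the largest value, or None for an empty dict."""
    if d:
        return max(d, key=d.get)
    return None


# Hypothetical sample data standing in for the Dic1 / Dic2 columns.
rows = [
    ({"a": 1, "b": 3}, {"a": 0, "b": 9}),  # both maxima are at "b"
    ({"a": 5, "b": 2}, {"a": 1, "b": 4}),  # "a" versus "b"
    ({}, {"a": 1}),                        # empty dict: max_func gives None
]

# Count rows where both keys exist and agree; None never counts as a match.
matches = sum(
    1
    for d1, d2 in rows
    if max_func(d1) is not None and max_func(d1) == max_func(d2)
)
print(matches)
```

With this guard in place the empty dictionary no longer raises; it simply does not contribute to the count.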

Related

Pyomo TypeError: unhashable type: 'EqualityExpression'

I am building an energy planning model in Pyomo and I am running into problems building some power grid constraints.
def grid2grid_rule(m, ts):
    return m.power['grid','grid', ts] == 0
m.const_grid2grid = Constraint(ts_i, grid2grid_rule)
def import_rule(m, ts):
    return m.gridImport[ts] == sum(m.power['grid',derIn,ts] for derIn in elIn)
m.const_import = Constraint(ts_i, rule = import_rule)
def export_rule(m, ts):
    return m.gridExport[ts] == sum(m.power[derOut,'grid',ts] for derOut in elOut)
m.const_export = Constraint(ts_i, export_rule)
Definition of Power:
m.power = Var(elOut, elIn, ts_i, within = NonNegativeReals)
Explaining the code:
m.power is a decision variable with 3 indices: the electricity source (elOut), the electricity 'usage' (elIn) and the current timestep index ts_i. elOut and elIn are numpy arrays of strings, and ts_i is a numpy array of integers from 0 up to the number of timesteps.
The first constraint just says that at any timestep electricity cannot flow from the grid to the grid. The import constraint says that the grid imports at each timestep are the sum of all power flows from the grid to electricity takers. The export constraint says that the grid exports at each timestep are the sum of all power flows from electricity 'givers' to the grid.
Now, my problem is this: when I comment out the grid2grid and the export constraints, it works and a set of constraints is built as expected. However, when I uncomment, for example, the export rule, which is almost identical to the import rule, I get this error:
m = build_model('Input_Questionaire.xlsx', 'DER_excel', yeardivision = "repr_day")
ERROR: Constructing component 'const_export_index_1' from data=None failed:
TypeError: Problem inserting gridExport[1] == power[pv_ground,grid,1] +
power[wind_s,grid,1] + power[battery,grid,1] + power[grid,grid,1] into set
const_export_index_1
Traceback (most recent call last):
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 824, in add
if tmp in self:
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 998, in __contains__
return self._set_contains(element)
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 1302, in _set_contains
return element in self.value
TypeError: unhashable type: 'EqualityExpression'
Accompanied by this error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
...
...
File "C:\Users\Axel\Anaconda3\lib\site-packages\pyomo\core\base\sets.py", line 833, in add
raise TypeError("Problem inserting "+str(tmp)+" into set "+self.name)
TypeError: Problem inserting gridExport[1] == power[pv_ground,grid,1] + power[wind_s,grid,1] + power[battery,grid,1] + power[grid,grid,1] into set const_export_index_1
I do not know how to fix it, especially since there is basically no difference in the two Constraints...
Thanks heaps for your help!
Axel
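Ugh... just saw it. It's an easy one. :)
You omitted the "rule=" portion of the constraint construction, so the function is being passed in positionally. Positional arguments to Constraint are treated as indexing sets, which appears to be why the error mentions building the set 'const_export_index_1' from your rule's return value.
Anyhow. Change:
m.const_export = Constraint(ts_i, export_rule)
to:
m.const_export = Constraint(ts_i, rule=export_rule)
The same applies to your grid2grid constraint.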
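To illustrate why the missing keyword matters, here is a toy sketch in plain Python. ToyConstraint is a hypothetical stand-in written for this example, not Pyomo's actual implementation; it only assumes that positional arguments are collected as index sets and that the rule is accepted by keyword.

```python
class ToyConstraint:
    """Hypothetical stand-in for an indexed component (not Pyomo code)."""

    def __init__(self, *index_sets, rule=None):
        # Everything passed positionally is treated as an index set.
        self.index_sets = index_sets
        # The rule is only picked up when passed by keyword.
        self.rule = rule


def export_rule(m, ts):
    return ("export", ts)


ts_i = [0, 1, 2]

wrong = ToyConstraint(ts_i, export_rule)       # function lands in index_sets
right = ToyConstraint(ts_i, rule=export_rule)  # function is the rule

print(len(wrong.index_sets), wrong.rule)
print(len(right.index_sets), right.rule is export_rule)
```

In the first call the function ends up among the index sets and no rule is registered, which mirrors the behavior described in the answer.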

How to set a condition statement on a loop process

I am using Python 3. I have a problem applying a condition to some groups inside a loop (I only want to use a pixel when it has at least 5 data points), and I expect to get a blank pixel when the condition isn't satisfied.
I tried some 'if' statements, but I keep getting a KeyError, presumably when the condition isn't satisfied.
I'll show the code:
Xpix = 78
Ypix = 30
row = []
mean_val = []
for i in range (0,Ypix):
    for j in range (0,Xpix):
        if(len(data_pixel.groupby(['lin','col']).get_group((i,j))[['gamma']])>=5):
            means = data_pixel.groupby(['lin','col']).get_group((i,j))[['gamma']].mean()
        else:
            means = 0
        row.append(means)
mean_val = np.array(row).reshape(Ypix, Xpix)
I expect a 78 x 30 array to plot with blank pixels and mean pixels.
Here I show the error I got:
Traceback (most recent call last):
File "map.py", line 415, in <module>
proc.process()
File "map.py", line 215, in process
if (len(data_pixel.groupby(['lin', 'col']).get_group((i,j))[['gamma']])>=5):
File "/xxx/yyy/anaconda3/envs/gnss/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 680, in get_group
raise KeyError(name)
KeyError: (10,41)
data_pixel is a big dataframe with a lot of data. I would really appreciate it if anyone could help with this.
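The KeyError is raised by get_group when a (lin, col) pair has no rows at all, so the length check never gets a chance to run. One possible way around it, sketched here on a small made-up dataframe, is to group once, keep the mean only where the count is at least 5, and reindex over the full pixel grid. The grid size, the sample values, and the use of NaN (rather than 0) for blank pixels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small hypothetical grid and data; the real case uses Xpix = 78, Ypix = 30.
Xpix, Ypix = 4, 3
data_pixel = pd.DataFrame({
    "lin":   [0] * 5 + [1] * 2 + [2] * 6,
    "col":   [1] * 5 + [3] * 2 + [0] * 6,
    "gamma": [1.0, 2.0, 3.0, 4.0, 5.0] + [9.0, 9.0] + [2.0] * 6,
})

# Group once and compute count and mean per (lin, col) pixel.
stats = data_pixel.groupby(["lin", "col"])["gamma"].agg(["count", "mean"])

# Keep the mean only where at least 5 samples exist; others become NaN.
means = stats["mean"].where(stats["count"] >= 5)

# Reindex over every pixel so that missing pixels become NaN instead of
# raising KeyError, then reshape to (rows, columns).
full_index = pd.MultiIndex.from_product(
    [range(Ypix), range(Xpix)], names=["lin", "col"]
)
mean_val = means.reindex(full_index).to_numpy().reshape(Ypix, Xpix)
print(mean_val)
```

NaN entries are usually drawn as blank by plotting functions such as imshow, which is why NaN is used here in place of 0.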

Too many indices in array error brian2-python

I am trying to compare each array value with the previous and the next one using the code below, but I get a "too many indices for array" error. I would like to get around it, but I don't know how.
spikes=print(M.V[0])
#iterate in list of M.V[0] with three iterators to find the spikes
for i,x in enumerate(M.V[0]):
    if (i>=1):
        if x[i-1]<x[i] & x[i]>x[i+1] & x[i]>25*mV:
            spikes+=1
print(spikes)
and I get this error:
IndexError Traceback (most recent call last)
<ipython-input-24-76d7b392071a> in <module>
3 for i,x in enumerate(M.V[0]):
4 if (i>=1):
----> 5 if x[i-1]<x[i] & x[i]>x[i+1] & x[i]>25*mV:
6 spikes+=1
7 print(spikes)
~/anaconda3/lib/python3.6/site-packages/brian2/units/fundamentalunits.py in __getitem__(self, key)
1306 single integer or a tuple of integers) retain their unit.
1307 '''
-> 1308 return Quantity(np.ndarray.__getitem__(self, key), self.dim)
1309
1310 def __getslice__(self, start, end):
IndexError: too many indices for array
Do note that M.V[0] is itself an array.
You said that "M.V[0] is an array by itself", but you need to say more about it. M is probably a StateMonitor object, as described in https://brian2.readthedocs.io/en/stable/user/recording.html#recording-spikes . Is this correct?
If so, please post complete but minimal code so that the details are clear. For instance, what is the neuron model inside your NeuronGroup object? More importantly, instead of detecting spike events on your own, why not use the SpikeMonitor class, which makes what you are planning much easier?
SpikeMonitor class in Brian2 : https://brian2.readthedocs.io/en/stable/reference/brian2.monitors.spikemonitor.SpikeMonitor.html
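As for the IndexError itself: inside `for i, x in enumerate(M.V[0])`, x is a single element of the array, so x[i-1] tries to index a scalar. The array has to be indexed, not the element. The following is a minimal sketch of the neighbor comparison on a plain Python list, with no Brian2 units; the trace values and the threshold of 25.0 are made up for illustration, and `and` is used instead of `&`, which binds more tightly than the comparisons.

```python
def count_peaks(trace, threshold):
    """Count samples that exceed both neighbors and the threshold."""
    count = 0
    # Start at 1 and stop one short of the end so both neighbors exist.
    for i in range(1, len(trace) - 1):
        if (trace[i - 1] < trace[i]
                and trace[i] > trace[i + 1]
                and trace[i] > threshold):
            count += 1
    return count


# Hypothetical voltage trace (unitless numbers for the sketch).
trace = [0.0, 10.0, 30.0, 12.0, 5.0, 28.0, 40.0, 20.0, 26.0, 24.0]
spikes = count_peaks(trace, 25.0)
print(spikes)
```

This is only a sketch of the comparison logic; SpikeMonitor remains the approach suggested in the answer above.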

Index out of bound while reading a dataframe

I have a tab separated file that I am trying to parse and for that I am doing this :
header of my file :
chrom coord ref_base var_base A C G T
17 26695663 G A 1 0 1934 0
17 26695664 T A 1 0 1 1935
my code is :
counts = pd.read_csv(args.counts_file, sep='\t')
toto = counts[(counts['chrom'].astype(str) == "17") & (counts['coord'].astype(str) == "26695663")]
print toto["G"].values[0]
This returns the number I want, which is 1934.
Next, I wrote a function that takes the dataframe read from the file as an argument:
def get_foreground_counts(chrom, coord, counts, ref_base, var_base):
    foreground_counts = counts[(counts['chrom'] == chrom) & (counts['coord'] == coord)]
    foreground_ref_counts = foreground_counts[ref_base].values[0]
    foreground_var_counts = foreground_counts[var_base].values[0]
    return foreground_ref_counts, foreground_var_counts
I get this error, and I still can't see why:
Traceback (most recent call last):
File "test.py", line 203, in <module>
main(args)
File "test.py", line 71, in main
foreground_ref_counts, foreground_var_counts = get_foreground_counts(chrom, coord, counts, ref_base, var_base)
File "test.py", line 137, in get_foreground_counts
foreground_ref_counts = foreground_counts[ref_base].values[0]
IndexError: index out of bounds
Any idea why ?
Thanks
UPDATE
When I print foreground_counts[ref_base].values, I get []
What I am passing to the function is chrom (string), coord (string), counts (pandas DataFrame), ref_base (string), var_base (string).
In your function, the filter returns zero rows, and that is why you get the error. It seems you forgot the .astype(str) in the first line of your function, so integer columns are being compared with string arguments.
You could either cast the column types before calling the function or modify that line. The former is the better approach if you really need string types; otherwise, why not use integer values for the comparison?
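A minimal sketch of the mismatch and one possible fix, using the two sample rows from the question loaded from an in-memory string. Casting both sides to str inside the function and raising a descriptive KeyError on an empty result are choices made for this example, not the only option.

```python
import io

import pandas as pd

tsv = (
    "chrom\tcoord\tref_base\tvar_base\tA\tC\tG\tT\n"
    "17\t26695663\tG\tA\t1\t0\t1934\t0\n"
    "17\t26695664\tT\tA\t1\t0\t1\t1935\n"
)
counts = pd.read_csv(io.StringIO(tsv), sep="\t")

# read_csv infers integer columns, so comparing against strings matches nothing.
empty = counts[(counts["chrom"] == "17") & (counts["coord"] == "26695663")]
print(len(empty))


def get_foreground_counts(chrom, coord, counts, ref_base, var_base):
    # Compare as strings on both sides so the argument types no longer matter.
    mask = (counts["chrom"].astype(str) == str(chrom)) & (
        counts["coord"].astype(str) == str(coord)
    )
    rows = counts[mask]
    if rows.empty:
        # Fail with a clear message instead of an IndexError on .values[0].
        raise KeyError("no row for chrom=%s, coord=%s" % (chrom, coord))
    return rows[ref_base].values[0], rows[var_base].values[0]


ref, var = get_foreground_counts("17", "26695663", counts, "G", "A")
print(ref, var)
```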

py2neo rel() list indices must be integer not float

I'm trying to import nodes into Neo4j in a batch. But when I try to execute it, it throws an error: List indices must be integers, not float. I don't really understand which listitems, I do have floats, but these are cast to strings...
Partial code:
graph_db = neo4j.GraphDatabaseService("http://127.0.0.1:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)
for ngram, one_grams in data.items():
    ngram_rank = int(one_grams['_rank'])
    ngram_prob = '%.16f' % float(one_grams['_prob'])
    ngram_id = 'a'+str(n)
    ngram_node = batch.create(node({"word": ngram, "rank": str(ngram_rank), "prob": str(ngram_prob)}))
    for one_gram, two_grams in one_grams.items():
        one_rank = int(two_grams['_rank'])
        one_prob = '%.16f' % float(two_grams['_prob'])
        one_node = batch.create(node({"word": one_gram, "rank": str(one_rank), "prob": one_prob}))
        batch.create(rel((ngram_node, "FOLLOWED_BY", one_node))) #line 81 throwing error
        results = batch.submit()
Full traceback
Traceback (most recent call last):
File "Ngram_neo4j.py", line 81, in probability_items
batch.create(rel((ngram_node, "FOLLOWED_BY", one_node))),
File "virtenv\\lib\\site-packages\\py2neo\\neo4j.py", line 2692, in create
uri = self._uri_for(entity.start_node, "relationships"),
File "virtenv\\lib\\site-packages\\py2neo\\neo4j.py", line 2537, in _uri_for
uri = "{{{0}}}".format(self.find(resource)),
File "virtenv\\lib\\site-packages\\py2neo\\neo4j.py", line 2525, in find
for i, req in pendulate(self._requests):,
File "virtenv\\lib\\site-packages\\py2neo\\util.py", line 161, in pendulate
yield index, collection[index],
TypeError: list indices must be integers, not float
running neo4j 2.0, py2neo 1.6.1, Windows 7/64bit, python 3.3/64bit
--EDIT--
I did some testing, and the error lies in how the nodes are referenced.
Oversimplified sample code:
for key, dict in data.items(): #string, dictionary
    batch = neo4j.WriteBatch(graph_db)
    three_gram_node = batch.create(node({"word": key}))
    pprint(three_gram_node)
    batch.add_labels(three_gram_node, "3gram") # must be int, not float
    for k,v in dict.items(): #string, string
        four_gram_node = batch.create(node({"word": k}))
        batch.create_path(three_gram_node, "FOLLOWED_BY", four_gram_node)
        # cannot cast node from BatchRequest obj
    batch.submit()
When a node is created batch.create(node({props})), the pprint returns a P2Neo.neo4j. batchrequest object.
At the line add_labels(), it gives the same error as when trying to create a relation: List indices must be integers, not float.
At the batch.create_path() line it throws an error saying it can't cast a node from a P2Neo.neo4j. batchrequest object.
I'm trying the dirty-debug now to understand the indices.
--Dirty Debug Edit--
I've been meddling around with the pendulate(collection) function.
Although I don't really understand how it fits in, and how it's used, the following is happening:
Whenever it hits an uneven number, it gets cast to a float (which is weird, since count - ((i + 1) / 2), where i is an uneven number.) This float then throws the list indices error. Some prints:
count: 3
i= 0
index: 0
(int)index: 0
i= 1 # i = uneven
index: 2.0 # a float appears
(int)index: 2 # this is a safe cast
This results in the list indices error. This also happens when i=0. As this is a common case, I made an additional if() to circumvent the code (possible speedup?) Although I've not unit tested this, it seems that we can safely cast index to an int...
The pendulate function as used:
def pendulate(collection):
    count = len(collection)
    print("count: ", count)
    for i in range(count):
        print("i=", i)
        if i == 0:
            index = 0
        elif i % 2 == 0:
            index = i / 2
        else:
            index = count - ((i + 1) / 2)
        print("index:", index)
        index = int(index)
        print("(int)index:", index)
        yield index, collection[index]
Soft debug: print ngram_node and one_node to see what they contain.
Dirty debug: modify File "virtenv\lib\site-packages\py2neo\util.py", line 161, and add this line before it:
print index
You are accessing a collection (a Python list, given the traceback), so index must certainly be an integer :)
Printing it will probably help you understand why the exception is raised.
(Don't forget to remove your dirty debug afterwards ;))
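On the float that shows up in the debug output: in Python 3 the / operator always performs true division and returns a float, even for two integers, whereas // performs floor division and keeps integers as integers. The sketch below rewrites the pendulate function quoted in the question with //. It is based only on that quoted version, not on py2neo's actual source, and the explanation that the original was written with Python 2 integer division in mind is an assumption.

```python
def pendulate(collection):
    """Yield (index, item) pairs alternating between the front and the back."""
    count = len(collection)
    for i in range(count):
        if i % 2 == 0:
            index = i // 2                  # floor division keeps an int
        else:
            index = count - ((i + 1) // 2)  # int as well, no cast needed
        yield index, collection[index]


print(1 / 2, 1 // 2)  # true division versus floor division
order = list(pendulate(["a", "b", "c"]))
print(order)
```

With // the index is always an int, so the explicit int(index) cast from the debug version is no longer needed.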
While it is currently possible for WriteBatch objects to be executed multiple times with edits in between, it is inadvisable to use them in this way and this will be restricted in the next version of py2neo. This is because objects created during one execution will not be available during a subsequent execution and it is not easy to detect when this is being requested.
Without looking back at the underlying code, I'm unsure why you are seeing this exact error but I would suggest refactoring your code so that each WriteBatch creation is paired with one and only one execution call (submit). You can probably achieve this by putting your batch creation within your outer loop and moving your submit call out of the inner loop into the outer loop as well.
