pandas: bad argument to internal function (in iterators.c)

Why do I get an error here? I am using Python 2.6 and pandas 0.13.1.
In [2]: df = pd.DataFrame({'x': [1, 1, 2, 2, 1, 1], 'y':[1, 2, 2, 2, 2, 1]})
In [3]: print pd.factorize(pd.lib.fast_zip([df.x, df.y]))[0]
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)
<ipython-input-3-d98d985f2794> in <module>()
----> 1 print pd.factorize(pd.lib.fast_zip([df.x, df.y]))[0]
/usr/lib64/python2.6/site-packages/pandas/lib.so in pandas.lib.fast_zip (pandas/lib.c:8026)()
SystemError: numpy/core/src/multiarray/iterators.c:370: bad argument to internal function

pd.lib.fast_zip() needs np.ndarray objects, not Series, so you have to pass df.x.values and df.y.values instead:
print(pd.factorize(pd.lib.fast_zip([df.x.values, df.y.values]))[0])
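To illustrate what the zip-then-factorize call computes, here is a minimal pure-Python sketch. It does not use pandas at all. The helper name factorize_rows is hypothetical, and the first-appearance ordering of codes is assumed to mirror the default, unsorted behaviour of pd.factorize.

```python
def factorize_rows(*columns):
    """Label each row (a tuple of column values) with an integer code.

    Codes are handed out in order of first appearance.
    """
    codes = []
    seen = {}  # row tuple -> integer code
    for row in zip(*columns):
        if row not in seen:
            seen[row] = len(seen)
        codes.append(seen[row])
    return codes, list(seen)


x = [1, 1, 2, 2, 1, 1]
y = [1, 2, 2, 2, 2, 1]
codes, uniques = factorize_rows(x, y)
print(codes)    # [0, 1, 2, 2, 1, 0]
print(uniques)  # [(1, 1), (1, 2), (2, 2)]
```

Rows with the same (x, y) pair share a code, which is the grouping that the factorized output is used for.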

Related

Errors while importing Operator (Python)

I am a little confused after a couple of attempts at importing operator, all of which raise errors. Along with a couple of examples, I've shared an excerpt from the Python docs for reference below.
What I expect is that operator.mul will compute a running product over the data list: 3 * 4 gives [3, 12, ...], then 12 is multiplied by the next element, 6, to give [3, 12, 72, ...]. However, importing operator here isn't working as expected.
The output I'm expecting for this problem is:
[3, 12, 72, 144, 144, 1296, 0, 0, 0, 0]
Running the code below in PythonTutor.com gives me an error:
ImportError: cannot import name 'operator'
from itertools import operator
data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
list(accumulate(data, operator.mul))
I've gotten the same type of error running this in Jupyter notebook:
ImportError Traceback (most recent call last)
<ipython-input-1-bc61652bebb8> in <module>
----> 1 from itertools import operator
2
3 data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
4 list(accumulate(data, operator.mul))
ImportError: cannot import name 'operator' from 'itertools' (unknown location)
I've spell-checked this about 100 times and I've run it on both PythonTutor and Jupyter Notebook, and both give me errors. Can this be an issue with itertools?
Below is from the Python docs. I'm using the first form:
operator.mul(a, b)
For reference, the docs entry reads:
----> operator.mul(a, b)
operator.__mul__(a, b)
Return a * b, for a and b numbers.
Why isn't this working, and how can I fix it?

operator is its own module, not part of itertools:
import itertools
import operator
Note that itertools.accumulate doesn't modify the iterable it is given. It returns a new object which you are not using above. Consider assigning it to a new variable:
data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
accumulated_list = list(itertools.accumulate(data, operator.mul))
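Putting the pieces together, a complete version of the snippet from the question could look like the sketch below. It imports accumulate by name so that the bare accumulate(...) call from the question works. The variable name running_product is only illustrative.

```python
from itertools import accumulate
import operator

data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]

# accumulate yields the running result of applying operator.mul pairwise;
# list() collects it, and data itself is left unchanged.
running_product = list(accumulate(data, operator.mul))
print(running_product)  # [3, 12, 72, 144, 144, 1296, 0, 0, 0, 0]
```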

comm.bcast not working properly

I am trying to test a simple MPI program in Python with the following code:
from scipy.sparse import csr_matrix
from mpi4py import MPI
comm=MPI.COMM_WORLD
rank=comm.Get_rank()
size=comm.Get_size()
if rank == 0:
    data = [1, 2, 3, 4, 5]
    indices = [1, 3, 2, 1, 0]
    indptr = [0, 2, 3, 4, 5]
#A=csr_matrix((data,indices,indptr),shape=(4,4))
data=comm.bcast(data, root=0)
indices=comm.bcast(indices, root=0)
indptr=comm.bcast(indptr, root=0)
print rank,data,indices,indptr
which returns the following error:
Traceback (most recent call last):
File "test.py", line 14, in <module>
data=comm.bcast(data, root=0)
NameError: name 'data' is not defined
Traceback (most recent call last):
File "test.py", line 14, in <module>
data=comm.bcast(data, root=0)
NameError: name 'data' is not defined
Traceback (most recent call last):
File "test.py", line 14, in <module>
data=comm.bcast(data, root=0)
NameError: name 'data' is not defined
0 [1, 2, 3, 4, 5] [1, 3, 2, 1, 0] [0, 2, 3, 4, 5]
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[10263,1],1]
Exit code: 1
It seems like the error is due to me not using comm.bcast properly, but that is exactly how it's used in the docs.

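You are defining data inside the if block. What happens when the condition is false? On every rank other than 0 the variable data is never defined, so passing it to comm.bcast raises the NameError. Define the variables on all ranks before the if block: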
from scipy.sparse import csr_matrix
from mpi4py import MPI
comm=MPI.COMM_WORLD
rank=comm.Get_rank()
size=comm.Get_size()
data = []
indices = []
indptr = []
if rank == 0:
    data = [1, 2, 3, 4, 5]
    indices = [1, 3, 2, 1, 0]
    indptr = [0, 2, 3, 4, 5]
#A=csr_matrix((data,indices,indptr),shape=(4,4))
data=comm.bcast(data, root=0)
indices=comm.bcast(indices, root=0)
indptr=comm.bcast(indptr, root=0)
print rank,data,indices,indptr
This should now work.
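The underlying issue is ordinary Python name binding rather than anything specific to MPI. The sketch below reproduces it without mpi4py. The functions payload_unsafe and payload_safe are hypothetical, and the rank argument is just a stand-in for comm.Get_rank(). Inside a function the failure shows up as UnboundLocalError, a subclass of the NameError seen at module level in the traceback above.

```python
def payload_unsafe(rank):
    # The name is bound only on the root rank ...
    if rank == 0:
        data = [1, 2, 3, 4, 5]
    # ... so every other rank fails here, before any broadcast could happen.
    return data


def payload_safe(rank):
    # Bind a placeholder on every rank first; the root then overwrites it.
    data = None
    if rank == 0:
        data = [1, 2, 3, 4, 5]
    return data


results = {}
for rank in range(3):  # stand-in for three MPI processes
    try:
        results[rank] = payload_unsafe(rank)
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        results[rank] = type(exc).__name__

print(results)
print([payload_safe(rank) for rank in range(3)])
```

Using None as the placeholder on non-root ranks is a design choice of this sketch; the empty lists used in the answer above serve the same purpose.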

Create dictionary of statistics for several lists in Python?

I would like to create some basic statistics for several lists of data and store them in a dictionary:
>>> from statistics import mean,median
>>> a,b,c=[1,2,3],[4,5,6],[7,8,9]
The following list comprehension works and outputs stats for "a":
>>> [eval("{}({})".format(op,a)) for op in ['mean','median','min','max']]
[2, 2, 1, 3]
Assigning the list's variable name (a) to another object (dta) and evaluating "dta" in a list comprehension also works:
>>> dta="a"
>>> [eval("{}({})".format(op,eval("dta"))) for op in ['mean','median','min','max']]
[2, 2, 1, 3]
But when I try to tie this all together in a dictionary comprehension, it does not work:
>>> {k:[eval("{}({})".format(op,eval("k"))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
File "<stdin>", line 1, in <listcomp>
File "<string>", line 1, in <module>
NameError: name 'k' is not defined
My guess is that the eval is processed before the comprehension, which is why 'k' is not yet defined? Any suggestions for how to get this to work, or for a different routine that would produce the same output?

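Do not quote the k in the inner eval. k already holds the string 'a', 'b', or 'c', so eval(k) looks up the list of that name. The quoted form eval("k") fails because the inner list comprehension has its own scope, and k, which appears there only inside a string, is not carried over from the enclosing dict comprehension:
{k:[eval("{}({})".format(op,eval(k))) for op in ['mean','median','min','max']] for k in ['a','b','c']}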
Or drop eval altogether:
[[mean(k), median(k), min(k), max(k)] for k in [a, b, c]]
To turn this into a dictionary comprehension, pair each list with its key.
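One way to pair each list with its key, sketched here without eval, is to keep the lists in a dictionary and call the functions directly. The names named_lists, operations, and stats are illustrative and do not come from the question.

```python
from statistics import mean, median

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]

# Map each label to its list explicitly instead of looking names up with eval.
named_lists = {"a": a, "b": b, "c": c}

# Functions are ordinary objects, so they can be stored and called in a loop.
operations = (mean, median, min, max)

stats = {
    name: [op(values) for op in operations]
    for name, values in named_lists.items()
}
print(stats)
```

Because values is referenced directly inside the inner list comprehension, it is visible there, which is what the string-based eval("k") version could not achieve.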

Try removing the quotation marks around k in your call to eval in the format function.
I ran the following commands:
> from statistics import mean,median
> a,b,c=[1,2,3],[4,5,6],[7,8,9]
> {k:[eval("{}({})".format(op,eval(k))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
and got the following output:
{'a': [2.0, 2, 1, 3], 'c': [8.0, 8, 7, 9], 'b': [5.0, 5, 4, 6]}

How to use qflll() in the PARI library?

I wanted to use the function qflll from the PARI library in python, so I downloaded pari-python-cygwin-0.1.zip, however when I attempted to use qflll in python, i.e.
qflll([[1,0,0],[0,1,0],[0,0,1]])
I got this error message
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Too few parameters provided: 1
So how do I invoke the function qflll in Python properly, without any error?

As you can see in these docs, the qflll function takes a PARI matrix as input. Therefore, you have to do something like:
sage: M = Matrix([[1,0,0],[0,1,0],[0,0,1]])
sage: p = pari(M)
sage: p.qflll()
[1, 0, 0; 0, 1, 0; 0, 0, 1]
Or, if you prefer, as a one-liner:
sage: pari(Matrix([[1,0,0],[0,1,0],[0,0,1]])).qflll()
[1, 0, 0; 0, 1, 0; 0, 0, 1]

Theano: Why does indexing fail in this case?

I'm trying to get the max of a vector given a boolean value.
With Numpy:
>>> this = np.arange(10)
>>> this[~(this>=5)].max()
4
But with Theano:
>>> that = T.arange(10, dtype='int32')
>>> that[~(that>=5)].max().eval()
9
>>> that[~(that>=5).nonzero()].max().eval()
Traceback (most recent call last):
File "<pyshell#146>", line 1, in <module>
that[~(that>=5).nonzero()].max().eval()
AttributeError: 'TensorVariable' object has no attribute 'nonzero'
Why does this happen? Is this a subtle nuance that I'm missing?

You are using a version of Theano that is too old. In fact, tensor_var.nonzero() isn't in any released version. You need to update to the development version.
With the development version I have this:
>>> that[~(that>=5).nonzero()].max().eval()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bad operand type for unary ~: 'tuple'
This is because you are missing parentheses in your line. Here is the corrected line:
>>> that[(~(that>=5)).nonzero()].max().eval()
array(9, dtype=int32)
But we still get an unexpected result! The problem is that Theano does not support bool. Applying ~ to an int8 performs a bitwise invert on all 8 bits, not on 1 bit. It gives this result:
>>> (that>=5).eval()
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=int8)
>>> (~(that>=5)).eval()
array([-1, -1, -1, -1, -1, -2, -2, -2, -2, -2], dtype=int8)
You can remove the ~ with this:
>>> that[(that<5).nonzero()].max().eval()
array(4, dtype=int32)
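The arithmetic behind the unexpected 9 can be checked in plain Python, without Theano. This is only a sketch of the idea: Python lists and integers stand in for the tensors, and the selection by "nonzero" is imitated with a list comprehension.

```python
values = list(range(10))

# Comparison results stored as small integers (0 or 1) rather than booleans,
# as in the int8 array shown above.
int_mask = [int(v >= 5) for v in values]

# Bitwise NOT on an integer flips every bit: ~0 == -1 and ~1 == -2.
inverted = [~m for m in int_mask]

# Both -1 and -2 are nonzero, so a nonzero-based selection keeps every element.
kept_after_invert = [v for v, m in zip(values, inverted) if m != 0]

# Writing the complementary comparison avoids the inversion entirely.
kept_with_less_than = [v for v in values if v < 5]

print(inverted)                  # [-1, -1, -1, -1, -1, -2, -2, -2, -2, -2]
print(max(kept_after_invert))    # 9
print(max(kept_with_less_than))  # 4
```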
