df_data not defined, unsure of the cause - python

I'm working through a tutorial that is supposed to help students do the assignment, but I'm running into a problem. I'm using Python in a notebook project on IBM's platform. Right now the section is simply data exploration, but this error keeps occurring and I'm not sure how to fix it. No one else in the class seems to have this problem, and the teacher is rather slow to help, so I came here!
I tried just defining the variable before it's called, but no dice either way.
All the code prior to this just imports libraries and parses the data:
# Infer the data type of each column and convert the data to the inferred data type
from ingest import *
eu = ExtensionUtils(sqlContext)
df_data_1 = eu.convertTypes(df_data_1)
df_data_1.printSchema()
The error I'm getting is:
TypeError Traceback (most recent call last)
<ipython-input-14-33250ae79106> in <module>()
2 from ingest import *
3 eu = ExtensionUtils(sqlContext)
----> 4 df_data_1 = eu.convertTypes(df_data_1)
5 df_data_1.printSchema()
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in convertTypes(self, input_obj, dictVal)
304 """
305
--> 306 checkEnrichType_or_DataFrame("input_obj",input_obj)
307 self.logger = self._jLogger.getLogger(__name__)
308 methodname = str(inspect.stack()[0][3])
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in checkEnrichType_or_DataFrame(param, paramval)
81 if not isinstance(paramval,(EnrichType ,DataFrame)):
82 raise TypeError("%s should be a EnrichType class object or DataFrame, got type %s"
---> 83 % (str(param), type(paramval)))
84
85
TypeError: input_obj should be a EnrichType class object or DataFrame, got type <class 'NoneType'>

The solution was not in the code itself but in the notebook: a code snippet generated by a built-in function (the data asset's "Insert to code" option) needed to be inserted and run first, since that is what actually defines df_data_1. Defining the variable by hand (e.g. as None) just reproduces the NoneType that the TypeError complains about.
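For anyone hitting the same wall, here is a minimal sketch of the kind of cell the notebook generates. This is a hypothetical stand-in: the real cell comes from the data panel's "Insert to code" option with your own credentials and file path.
# Hypothetical stand-in for the generated "Insert to code" cell;
# the real one is produced by the notebook UI for your data asset.
df_data_1 = sqlContext.read \
    .format('csv') \
    .option('header', 'true') \
    .option('inferSchema', 'true') \
    .load('my_dataset.csv')  # placeholder path

df_data_1.take(5)  # sanity check that the DataFrame is actually populated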

pycaret compare_models() call doesn't recognize sort

After importing PyCaret:
from pycaret.classification import *
and creating clr_default:
clr_default = setup(df_rain_definitivo_one_drop_catboost_norm_fs_dropna, fold_shuffle=True, target='RainTomorrow', session_id=123)
I tried to use the compare_models() function with the following call:
best_model = compare_models()
However, I get the following error message:
ValueError Traceback (most recent call last)
<ipython-input-228-e1d76b68915a> in <module>()
----> 1 best_model = compare_models(n_select = 5, sort='Accuracy')
1 frames
/usr/local/lib/python3.7/dist-packages/pycaret/internal/tabular.py in compare_models(include, exclude, fold, round, cross_validation, sort, n_select, budget_time, turbo, errors, fit_kwargs, groups, verbose, display)
1954 if sort is None:
1955 raise ValueError(
-> 1956 f"Sort method not supported. See docstring for list of available parameters."
1957 )
1958
ValueError: Sort method not supported. See docstring for list of available parameters.
I've tried calling compare_models() with the sort parameter set to 'Accuracy', but it didn't do any good.
Also, I'm on Google Colab.
I don't get what n_select = 5 is for; do you want to get the top 5 models? Otherwise, using your own code examples:
First import pycaret
from pycaret.classification import *
Then setup,
clr_default = setup(df_rain_definitivo_one_drop_catboost_norm_fs_dropna,fold_shuffle=True, target='RainTomorrow', session_id=123)
Last, use the compare_models method:
best_model = compare_models(sort='Accuracy')
After that you can create your models and then tune them.
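If you do want the five best models, a small sketch of how n_select combines with sort (assuming setup() has already been run as above; with n_select, compare_models returns a list instead of a single model):
# Returns the top 5 models ranked by accuracy, as a list
top5 = compare_models(sort='Accuracy', n_select=5)

# Tune the best of them
tuned_best = tune_model(top5[0])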

Unable to create a tensor using torch.Tensor

I was trying to create a tensor as below:
import torch
t = torch.tensor(2,3)
I got the following error:
TypeError                                 Traceback (most recent call last)
in ()
----> 1 a=torch.tensor(2,3)
TypeError: tensor() takes 1 positional argument but 2 were given
So, I tried the following:
import torch
t = torch.Tensor(2,3)
# No error while creating the tensor
# When I print it, I get an error
print(t)
I get the following error:
RuntimeError                              Traceback (most recent call last)
in ()
----> 1 print(a)
D:\softwares\anaconda\lib\site-packages\torch\tensor.py in __repr__(self)
     55         # characters to replace unicode characters with.
     56         if sys.version_info > (3,):
---> 57             return torch._tensor_str._str(self)
     58         else:
     59             if hasattr(sys.stdout, 'encoding'):
D:\softwares\anaconda\lib\site-packages\torch\_tensor_str.py in _str(self)
    216         suffix = ', dtype=' + str(self.dtype) + suffix
    217
--> 218     fmt, scale, sz = _number_format(self)
    219     if scale != 1:
    220         prefix = prefix + SCALE_FORMAT.format(scale) + ' ' * indent
D:\softwares\anaconda\lib\site-packages\torch\_tensor_str.py in _number_format(tensor, min_sz)
     94     # TODO: use fmod?
     95     for value in tensor:
---> 96         if value != math.ceil(value.item()):
     97             int_mode = False
     98             break
RuntimeError: Overflow when unpacking long
But according to this SO post, he was able to create a tensor. Am I missing something here? Also, why was I able to create a tensor with Tensor (capital T) and not with tensor (small t)?
torch.tensor() expects a sequence or array_like to create a tensor, whereas the torch.Tensor() class can create a tensor from just shape information.
Here's the signature of torch.tensor():
Docstring:
tensor(data, dtype=None, device=None, requires_grad=False) -> Tensor
Constructs a tensor with :attr:`data`.
Args:
    data (array_like): Initial data for the tensor. Can be a list, tuple,
        NumPy ndarray, scalar, and other types.
    dtype (:class:`torch.dtype`, optional): the desired data type of the returned tensor.
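To make the difference concrete, a short sketch of both constructors (torch.Tensor(2, 3) allocates without initializing, so its contents are arbitrary):
import torch

# torch.tensor() builds a tensor from data:
t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])  # 2x3 tensor holding those values

# torch.Tensor() treats bare ints as a shape and returns an
# uninitialized 2x3 tensor (arbitrary memory contents):
t2 = torch.Tensor(2, 3)

# Explicit, less surprising alternatives:
t3 = torch.empty(2, 3)  # uninitialized, equivalent to Tensor(2, 3)
t4 = torch.zeros(2, 3)  # initialized to zeros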
Regarding the RuntimeError: I cannot reproduce the error on Linux distros; printing the tensor works perfectly fine from the ipython terminal.
Taking a closer look at the error, this seems to be a problem only on Windows. As mentioned in the comments, have a look at issues/6339: Error when printing tensors containing large values

Using sframe.apply() causing runtime error

I am trying to use a simple apply on an SFrame full of data. This is for a simple data transform on one of the columns, applying a function that takes a text input and splits it into a list. Here is the function and its call/output:
In [1]: def count_words(txt):
            count = Counter()
            for word in txt.split():
                count[word] += 1
            return count

In [2]: products.apply(lambda x: count_words(x['review']))
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-8-85338326302c> in <module>()
----> 1 products.apply(lambda x: count_words(x['review']))
C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\data_structures\sframe.pyc in apply(self, fn, dtype, seed)
2607
2608 with cython_context():
-> 2609 return SArray(_proxy=self.__proxy__.transform(fn, dtype, seed))
2610
2611 def flat_map(self, column_names, fn, column_types='auto', seed=None):
C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\cython\context.pyc in __exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let exception propagate
RuntimeError: Runtime Exception. Unable to evaluate lambdas. Lambda workers did not start.
When I run my code I get that error. The SFrame (df) is only 10 by 2, so there should be no overload coming from there. I don't know how to fix this issue.
If you're using GraphLab Create, there is actually a built-in tool for doing this, in the "text analytics" toolkit. Let's say I have data like:
import graphlab
products = graphlab.SFrame({'review': ['a portrait of the artist as a young man',
                                       'the sound and the fury']})
The easiest way to count the words in each entry is
products['counts'] = graphlab.text_analytics.count_words(products['review'])
If you're using the sframe package by itself, or if you want to do a custom function like the one you described, I think the key missing piece in your code is that the Counter needs to be converted into a dictionary in order for the SFrame to handle the output.
from collections import Counter

def count_words(txt):
    count = Counter()
    for word in txt.split():
        count[word] += 1
    return dict(count)
products['counts'] = products.apply(lambda x: count_words(x['review']))
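For reference, a quick check of the result with the example data above:
# Each row of 'counts' is now a plain dict mapping word -> count:
print(products['counts'][1])
# {'the': 2, 'sound': 1, 'and': 1, 'fury': 1}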
For anyone who has come across this issue while using GraphLab, here is the discussion thread on the issue on Dato support:
http://forum.dato.com/discussion/1499/graphlab-create-using-anaconda-ipython-notebook-lambda-workers-did-not-start
Here is code that can be run as a case-by-case workaround for this issue.
After starting ipython or ipython notebook in the Dato/GraphLab environment, and before running any of your own code, copy and run the following:
import ctypes, inspect, os, graphlab
from ctypes import wintypes

# Point the Windows DLL search path at graphlab's install directory so
# that the lambda worker processes can locate their dependencies
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.SetDllDirectoryW.argtypes = (wintypes.LPCWSTR,)
src_dir = os.path.split(inspect.getfile(graphlab))[0]
kernel32.SetDllDirectoryW(src_dir)

# Should work now
graphlab.SArray(range(1000)).apply(lambda x: x)
If this is run, the apply function should work fine with SFrame.

IPython.parallel ValueError: cannot create an OBJECT array from memory buffer

I'm trying to write a function to be executed in several IPython engines. The function takes a pandas Series as an argument. Each element of the Series is a string, and the whole Series constitutes a corpus for TF.IDF computation.
After reading IPython parallel documentation and some tutorials, it seems to be quite straightforward to do, and I came up with the following:
import pandas as pd
from IPython.parallel import Client

def calculemus(corpus):
    from sklearn.feature_extraction.text import TfidfVectorizer
    vectorizer = TfidfVectorizer(min_df=1, stop_words='english')
    return vectorizer.fit_transform(corpus)

review = pd.read_csv('review.csv')['text']
review = review.fillna('')

client = Client()
r = client[-1].apply(calculemus, review).get()
BUT I got this error instead:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/xxx/site-packages/IPython/zmq/serialize.pyc in unpack_apply_message(bufs, g, copy)
    154             sa.data = m.bytes
    155
--> 156     args = uncanSequence(map(unserialize, sargs), g)
    157     kwargs = {}
    158     for k in sorted(skwargs.iterkeys()):
/xxx/site-packages/IPython/utils/newserialized.pyc in unserialize(serialized)
    175
    176 def unserialize(serialized):
--> 177     return UnSerializeIt(serialized).getObject()
/xxx/site-packages/IPython/utils/newserialized.pyc in getObject(self)
    159         buf = self.serialized.getData()
    160         if isinstance(buf, (bytes, buffer, memoryview)):
--> 161             result = numpy.frombuffer(buf, dtype = self.serialized.metadata['dtype'])
    162         else:
    163             raise TypeError("Expected bytes or buffer/memoryview, but got %r"%type(buf))
ValueError: cannot create an OBJECT array from memory buffer
I'm not sure what the problem is, could someone enlighten me on this?
UPDATE
Apparently the error says exactly what it says. If I do this:
r = client[-1].apply(calculemus, np.array(review, dtype=str)).get()
it kinda works.
So the next question is, is this a feature or a limitation of IPython?
This is a bug in IPython 0.13 that should be fixed in master. There is a special case for serializing numpy arrays that avoids copying data, and this behavior is triggered by an isinstance(numpy.ndarray) check. That was inappropriate, because isinstance catches subclasses, which includes pandas objects; pandas objects (and array subclasses in general) should not be treated the same way, as metadata will be lost and reconstruction on the other side will often fail.
PS:
r = client[-1].apply(calculemus, np.array(review, dtype=str)).get()
is equivalent to
r = client[-1].apply_sync(calculemus, np.array(review, dtype=str))
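Until the fix lands, a workaround sketch is to avoid sending the pandas Series itself, so the numpy fast path never fires (assuming calculemus only needs an iterable of strings):
# Send a plain Python list instead of the pandas Series:
r = client[-1].apply_sync(calculemus, review.tolist())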

Data conversion error with numpy

I am in the process of making my code nicer, and I saw that numpy has some very nifty functions built in. However, the following code throws an error that I cannot explain:
import numpy
import numpy as np
from pylab import plot  # assuming pylab-style plotting, as in the original script

data = numpy.genfromtxt('table.oout', unpack=True, names=True, dtype=None)
real_ov_data = np.float32(data['real_overlap'])
ana_ov_data = np.float32(data['Analyt_overlap'])
length_data = np.float32(data['Residues'])
plot(length_data, real_ov_data, label="overlapped Peaks, exponential function", marker="x", markeredgecolor="blue", markersize=3.0, linestyle=" ", color="blue")
plot(length_data, ana_ov_data, label="expected overlapped Peaks", marker="o", markeredgecolor="green", markersize=3.0, linestyle=" ", color="green")
throws the error
Traceback (most recent call last):
File "length_vs_overlap.py", line 52, in <module>
real_ov_data=np.float32(data['real_overlap'])
ValueError: invalid literal for float(): real_overlap
>Exit code: 1
when I am trying to read the following file:
'Residues' 'Analyt_overlap' 'anz_analyt_overlap' 'real_overlap'
21 1.2502 29 0.0000
13 1.0306 25 0.0000
56 5.8513 84 2.8741
190 68.0940 329 28.4706
54 5.4271 83 2.4999
What am I doing wrong? My piece of code should be simple enough.
You've either repeated the header line, or you're specifying the names as a list.
That causes each column to be read as a string type, starting with the column title.
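A minimal sketch of both fixes (column names and file path taken from the question; names=True reads the names from the header row itself, so the header is never parsed as data):
import numpy as np

# Option 1: let genfromtxt take the names from the header row
data = np.genfromtxt('table.oout', names=True, dtype=None)

# Option 2: if you pass names explicitly, skip the header row so it
# is not read as a data line
names = ['Residues', 'Analyt_overlap', 'anz_analyt_overlap', 'real_overlap']
data = np.genfromtxt('table.oout', names=names, skip_header=1, dtype=None)

real_ov_data = np.float32(data['real_overlap'])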
