I got errors with large dataset clustermap - python

I'm trying to create a clustermap for a large dataset with the following call:
sns.clustermap(dataset)
and display it with:
plt.show()
But I'm getting this error:
Traceback (most recent call last):
File "C:\Users\admin\PycharmProjects\pythonProject1\RNA_expression.py", line 44, in <module>
sns.clustermap(exp)
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\seaborn\_decorators.py", line 46, in inner_f
return f(**kwargs)
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\seaborn\matrix.py", line 1406, in clustermap
return plotter.plot(metric=metric, method=method,
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\seaborn\matrix.py", line 1219, in plot
self.plot_dendrograms(row_cluster, col_cluster, metric, method,
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\seaborn\matrix.py", line 1074, in plot_dendrograms
self.dendrogram_col = dendrogram(
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\scipy\cluster\hierarchy.py", line 3653, in _dendrogram_calculate_info
_dendrogram_calculate_info(
[Previous line repeated 237 more times]
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\scipy\cluster\hierarchy.py", line 3620, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\scipy\cluster\hierarchy.py", line 3653, in _dendrogram_calculate_info
_dendrogram_calculate_info(
[Previous line repeated 646 more times]
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\scipy\cluster\hierarchy.py", line 3620, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\scipy\cluster\hierarchy.py", line 3550, in _dendrogram_calculate_info
_append_singleton_leaf_node(Z, p, n, level, lvs, ivl,
File "C:\Users\admin\.virtualenvs\pythonProject1-6HarIJ1S\lib\site-packages\scipy\cluster\hierarchy.py", line 3425, in _append_singleton_leaf_node
ivl.append(str(int(i)))
RecursionError: maximum recursion depth exceeded while getting the str of an object
I tried to resolve it by increasing the recursion limit:
import sys
sys.setrecursionlimit(2000)
But then the process crashed with this message:
Process finished with exit code -1073741571 (0xC00000FD)
Do you have any other solution that I can try to get my clustermap?
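Exit code 0xC00000FD is the Windows stack-overflow status, so raising the recursion limit alone just moves the failure from a Python exception to a native crash. One workaround that is sometimes used is to run the deeply recursive call in a worker thread that has a larger stack. The sketch below is only an illustration: the helper name run_with_big_stack, the 64 MB stack size and the limit of 100000 are assumptions, not values taken from the question.

```python
import sys
import threading


def run_with_big_stack(func, *args, stack_mb=64, recursion_limit=100_000, **kwargs):
    """Run func in a worker thread that has a larger stack and recursion limit.

    Hypothetical helper: the name and the default sizes are illustrative only.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # hand the error back to the caller
            outcome["error"] = exc

    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(recursion_limit)
    # stack_size() only affects threads created after this call.
    old_stack = threading.stack_size(stack_mb * 1024 * 1024)
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        threading.stack_size(old_stack)
        sys.setrecursionlimit(old_limit)

    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def depth(n):
    """Toy recursive function standing in for the recursive dendrogram walk."""
    return 0 if n == 0 else 1 + depth(n - 1)


# Far deeper than the default limit of 1000 frames.
deep_result = run_with_big_stack(depth, 10_000)
```

In the question's setting the call would presumably look like run_with_big_stack(sns.clustermap, exp), with plt.show() left on the main thread. Interactive matplotlib backends may not behave well when figures are built off the main thread, so saving the figure to a file may be the safer route.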

Related

Pandas plotting routine fails with NoneType is not callable, but only when run inside pdb

If I run the following code in pdb (i.e. with python -m pdb):
if __name__=='__main__':
    import pandas as pd
    df=pd.DataFrame([[0,1,2],[63,146, 135]])
    df.plot.area()
it fails with a TypeError inside a numpy routine that's called by matplotlib:
> python -m pdb test_dtype.py
> /home/jhaiduce/financial/forecasting/test_dtype.py(1)<module>()
-> if __name__=='__main__':
(Pdb) r
QSocketNotifier: Can only be used with threads started with QThread
--Return--
> /home/jhaiduce/financial/forecasting/test_dtype.py(6)<module>()->None
-> df.plot.area()
(Pdb) c
Traceback (most recent call last):
File "/usr/lib64/python3.10/site-packages/numpy/core/getlimits.py", line 384, in __new__
dtype = numeric.dtype(dtype)
TypeError: 'NoneType' object is not callable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.10/pdb.py", line 1726, in main
pdb._runscript(mainpyfile)
File "/usr/lib64/python3.10/pdb.py", line 1586, in _runscript
self.run(statement)
File "/usr/lib64/python3.10/bdb.py", line 597, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/home/jhaiduce/financial/forecasting/test_dtype.py", line 6, in <module>
df.plot.area()
File "/usr/lib64/python3.10/site-packages/pandas/plotting/_core.py", line 1496, in area
return self(kind="area", x=x, y=y, **kwargs)
File "/usr/lib64/python3.10/site-packages/pandas/plotting/_core.py", line 972, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File "/usr/lib64/python3.10/site-packages/pandas/plotting/_matplotlib/__init__.py", line 71, in plot
plot_obj.generate()
File "/usr/lib64/python3.10/site-packages/pandas/plotting/_matplotlib/core.py", line 294, in generate
self._post_plot_logic_common(ax, self.data)
File "/usr/lib64/python3.10/site-packages/pandas/plotting/_matplotlib/core.py", line 473, in _post_plot_logic_common
self._apply_axis_properties(ax.xaxis, rot=self.rot, fontsize=self.fontsize)
File "/usr/lib64/python3.10/site-packages/pandas/plotting/_matplotlib/core.py", line 561, in _apply_axis_properties
labels = axis.get_majorticklabels() + axis.get_minorticklabels()
File "/usr/lib64/python3.10/site-packages/matplotlib/axis.py", line 1201, in get_majorticklabels
ticks = self.get_major_ticks()
File "/usr/lib64/python3.10/site-packages/matplotlib/axis.py", line 1371, in get_major_ticks
numticks = len(self.get_majorticklocs())
File "/usr/lib64/python3.10/site-packages/matplotlib/axis.py", line 1277, in get_majorticklocs
return self.major.locator()
File "/usr/lib64/python3.10/site-packages/matplotlib/ticker.py", line 2113, in __call__
vmin, vmax = self.axis.get_view_interval()
File "/usr/lib64/python3.10/site-packages/matplotlib/axis.py", line 1987, in getter
return getattr(getattr(self.axes, lim_name), attr_name)
File "/usr/lib64/python3.10/site-packages/matplotlib/axes/_base.py", line 781, in viewLim
self._unstale_viewLim()
File "/usr/lib64/python3.10/site-packages/matplotlib/axes/_base.py", line 776, in _unstale_viewLim
self.autoscale_view(**{f"scale{name}": scale
File "/usr/lib64/python3.10/site-packages/matplotlib/axes/_base.py", line 2932, in autoscale_view
handle_single_axis(
File "/usr/lib64/python3.10/site-packages/matplotlib/axes/_base.py", line 2895, in handle_single_axis
x0, x1 = locator.nonsingular(x0, x1)
File "/usr/lib64/python3.10/site-packages/matplotlib/ticker.py", line 1654, in nonsingular
return mtransforms.nonsingular(v0, v1, expander=.05)
File "/usr/lib64/python3.10/site-packages/matplotlib/transforms.py", line 2880, in nonsingular
if maxabsvalue < (1e6 / tiny) * np.finfo(float).tiny:
File "/usr/lib64/python3.10/site-packages/numpy/core/getlimits.py", line 387, in __new__
dtype = numeric.dtype(type(dtype))
TypeError: 'NoneType' object is not callable
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /usr/lib64/python3.10/site-packages/numpy/core/getlimits.py(387)__new__()
-> dtype = numeric.dtype(type(dtype))
(Pdb)
The error occurs only in the debugger; the program runs normally outside it.
Any idea what could be causing this?

TypeError: _get_dataset_for_single_task() got an unexpected keyword argument 'sequence_length' #790

I got the following error when evaluating a t5 model:
model.batch_size = train_batch_size * 4
model.eval(
    mixture_or_task_name="trivia_all",
    checkpoint_steps=-1 #"all"
)
Traceback (most recent call last):
File "train.py", line 140, in <module>
checkpoint_steps=-1 #"all"
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/t5/models/mtf_model.py", line 267, in eval
self._model_dir, dataset_fn, summary_dir, checkpoint_steps)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 2025, in eval_model
for d in decode(estimator, input_fn, vocabulary, checkpoint_path)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 2024, in <listcomp>
d.decode("utf-8") if isinstance(d, bytes) else d
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 1114, in decode
for i, result in enumerate(result_iter):
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3132, in predict
rendezvous.raise_errors()
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
six.reraise(typ, value, traceback)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3126, in predict
yield_single_examples=yield_single_examples):
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 611, in predict
input_fn, ModeKeys.PREDICT)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1007, in _get_features_from_input_fn
result = self._call_input_fn(input_fn, mode)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3041, in _call_input_fn
return input_fn(**kwargs)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 1182, in input_fn
ds = dataset.dataset_fn(sequence_length=sequence_length)
TypeError: _get_dataset_for_single_task() got an unexpected keyword argument 'sequence_length'
There is a similar issue, but I didn't understand its one-line solution:
https://github.com/google-research/text-to-text-transfer-transformer/issues/631
I had installed t5 v0.6.0, which wasn't the newest version. When I installed v0.9.0, the problem was resolved.
pip install t5==0.9.0
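Since the fix came down to the installed version, a quick check of what is actually installed can save time. This is a minimal sketch, assuming Python 3.8 or newer for importlib.metadata (the traceback above shows Python 3.7, where pip show t5 gives the same information). The helper names are made up for illustration, and version_tuple assumes purely numeric dotted versions such as 0.9.0.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.9.0' into (0, 9, 0); assumes only numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed, required):
    """True if the installed version string sorts before the required one."""
    return version_tuple(installed) < version_tuple(required)


if __name__ == "__main__":
    found = installed_version("t5")
    if found is None:
        print("t5 is not installed in this environment")
    elif is_older(found, "0.9.0"):
        print(f"t5 {found} is older than 0.9.0; consider upgrading")
    else:
        print(f"t5 {found} is installed")
```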

IndexError: list index out of range right after printing len(list) > 0

I literally have these two lines of code in sequence:
print(len(counter.ondas))
onda = counter.ondas[-1]
The first line printed 13, and then the program crashed with a traceback pointing at the onda = counter.ondas[-1] line and reporting IndexError: list index out of range, right after printing len(list).
It works thousands of times before crashing. I have no clue how to approach this problem.
Output for print(counter.ondas):
[Onda([<workers.mov.Mov object at 0x244EA050>], [Candle(4, 'GBP_JPY', Timestamp('2017-06-12 16:59:00'), 138.884, 138.897, 138.674, 138.76, 10957.0, True)]), Onda([<workers.mov.Mov object at 0x245073D0>], [Candle(4, 'GBP_JPY', Timestamp('2017-06-12 16:59:00'), 138.884, 138.897, 138.674, 138.76, 10957.0, True)...]
Output for print(type(counter.ondas)):
<class 'list'>
Output for print(isinstance(counter.ondas, list)):
True
Full Error Traceback
Traceback (most recent call last):
File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "C:\Users\joaoa\PycharmProjects\aquitania\general_manager.py", line 88, in load_observer_manager
observer_manager.update_load_run_data()
File "C:\Users\joaoa\PycharmProjects\aquitania\observer\management\observer_manager.py", line 78, in update_load_run_data
self.load_run_data()
File "C:\Users\joaoa\PycharmProjects\aquitania\observer\management\observer_manager.py", line 91, in load_run_data
self.feeder.exec_df(df[self.start_date:])
File "C:\Users\joaoa\PycharmProjects\aquitania\data_source\feeder.py", line 429, in exec_df
self.feed(candle)
File "C:\Users\joaoa\PycharmProjects\aquitania\data_source\feeder.py", line 98, in feed
self.make_candle(ts, candle, criteria_table)
File "C:\Users\joaoa\PycharmProjects\aquitania\data_source\feeder.py", line 126, in make_candle
self.set_values(ts, candle)
File "C:\Users\joaoa\PycharmProjects\aquitania\data_source\feeder.py", line 310, in set_values
ts_obs.feed_complete(self._candles[ts])
File "C:\Users\joaoa\PycharmProjects\aquitania\observer\management\observer_loader.py", line 100, in feed_complete
observer.update_last_candle(candle, store_candle)
File "C:\Users\joaoa\PycharmProjects\aquitania\observer\abstract\observer_abc.py", line 93, in update_last_candle
self.set_observe(self.update_method(candle))
File "C:\Users\joaoa\PycharmProjects\aquitania\observer\ondas\ondas_inside.py", line 393, in update_method
self.update_routine(counter_id=2, candle=candle)
File "C:\Users\joaoa\PycharmProjects\aquitania\observer\ondas\ondas_inside.py", line 412, in update_routine
onda = list(counter.ondas)[-1]
IndexError: list index out of range
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/joaoa/PycharmProjects/aquitania/general_manager.py", line 145, in <module>
gm.run()
File "C:/Users/joaoa/PycharmProjects/aquitania/general_manager.py", line 113, in run
list_of_observer_managers = self.load_all_observer_managers()
File "C:/Users/joaoa/PycharmProjects/aquitania/general_manager.py", line 60, in load_all_observer_managers
observer = currency_pool.map(self.load_observer_manager, list_of_currencies)
File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 644, in get
raise self._value
IndexError: list index out of range
I was able to identify the issue. The error was being raised in a different process from the one whose output was being printed, and for some reason it occurred only after the process that was printing the results had been interrupted by another, unrelated error.
Weird behavior, but now I am able to tackle the problem.
Debugging in single-process mode helped a lot.
Thank you for your help.
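For anyone hitting the same confusion, two habits make this kind of bug easier to pin down: include the process ID in diagnostics, and keep a switch that bypasses the pool. The sketch below is a generic illustration, not the code from the project in the traceback; last_item and run_jobs are hypothetical names.

```python
import os
from multiprocessing import Pool


def last_item(items):
    """Return items[-1], naming the current process if the list is empty."""
    if not items:
        # The PID shows which worker failed, so its output can be told apart
        # from prints coming from other workers.
        raise IndexError(f"empty list in process {os.getpid()}")
    return items[-1]


def run_jobs(func, jobs, single_process=False):
    """Map func over jobs; single_process=True skips the pool for debugging."""
    if single_process:
        return [func(job) for job in jobs]
    with Pool() as pool:
        return pool.map(func, jobs)


if __name__ == "__main__":
    # With the pool bypassed, prints and tracebacks appear in program order.
    print(run_jobs(last_item, [[1, 2], [3]], single_process=True))
```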

Memory error on Random Forest Classifier prediction

I have fitted a Random Forest Classifier on my dataset containing 7 features and about 1 million rows or records.
Following is my code.
randForestClassifier=RandomForestClassifier(n_estimators=10,max_depth=3)
randForestClassifier.fit(X_train,y)
pred=randForestClassifier.predict(featues_test)
I am getting a MemoryError when I call the predict method of my classifier. How can I fix it?
Here is the complete log:
randForestClassifier.predict(featues_test)
Traceback (most recent call last):
File "<ipython-input-15-0b7612d6e958>", line 1, in <module>
randForestClassifier.predict(featues_test)
File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 462, in predict
proba = self.predict_proba(X)
File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 513, in predict_proba
for e in self.estimators_)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 659, in __call__
self.dispatch(function, args, kwargs)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 406, in dispatch
job = ImmediateApply(func, args, kwargs)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 140, in __init__
self.results = func(*args, **kwargs)
File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 106, in _parallel_helper
return getattr(obj, methodname)(*args, **kwargs)
File "C:\Python27\lib\site-packages\sklearn\tree\tree.py", line 592, in predict_proba
proba = self.tree_.predict(X)
File "sklearn/tree/_tree.pyx", line 3207, in sklearn.tree._tree.Tree.predict (sklearn\tree\_tree.c:24468)
File "sklearn/tree/_tree.pyx", line 3209, in sklearn.tree._tree.Tree.predict (sklearn\tree\_tree.c:24340)
MemoryError
Yes, you are getting the MemoryError at randForestClassifier.predict(featues_test), as shown by the stack trace:
File "<ipython-input-15-0b7612d6e958>", line 1, in <module>
randForestClassifier.predict(featues_test)
The remaining lines of the stack trace show that the problem comes from sklearn, in the C code: sklearn\tree\_tree.c:24340
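If the allocation fails because the whole test set is pushed through each tree at once, predicting in smaller slices may lower the peak memory use. Whether that is enough here is an assumption, since the trace does not show how large featues_test is. The helper below is a hypothetical sketch that works with any callable and any row-sliceable input; the demo uses a stand-in function rather than a real classifier.

```python
def predict_in_chunks(predict, X, chunk_size=100_000):
    """Call predict on consecutive row slices of X and collect the results.

    predict is any callable that maps a batch of rows to one output per row,
    for example the bound predict method of a fitted classifier.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    predictions = []
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]  # only this slice is processed now
        predictions.extend(predict(chunk))
    return predictions


def row_sums(rows):
    """Stand-in for a model: one output value per input row."""
    return [sum(row) for row in rows]


demo = predict_in_chunks(row_sums, [[1, 2], [3, 4], [5, 6]], chunk_size=2)
```

With the names from the question, the call would be predict_in_chunks(randForestClassifier.predict, featues_test), and the chunk size can be lowered until the prediction fits in memory.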

MDP FANode issues

I'm trying to perform factor analysis on a distance matrix (made of distances between about 1700 points, all ranging between 0.0 and 1.0, inclusive). I'm a total FA newbie.
Anyway, this code:
fan=mdp.nodes.FANode()
far=fan.execute(a)
# a is a numpy.array, size 1780x1780
Gives me:
Traceback (most recent call last):
File "<pyshell#29>", line 1, in <module>
far=fan.execute(a)
File "/usr/lib/pymodules/python2.7/mdp/signal_node.py", line 575, in execute
self._pre_execution_checks(x)
File "/usr/lib/pymodules/python2.7/mdp/signal_node.py", line 451, in _pre_execution_checks
self._if_training_stop_training()
File "/usr/lib/pymodules/python2.7/mdp/signal_node.py", line 431, in _if_training_stop_training
self.stop_training()
File "/usr/lib/pymodules/python2.7/mdp/signal_node.py", line 556, in stop_training
self._train_seq[self._train_phase][1](*args, **kwargs)
File "/usr/lib/pymodules/python2.7/mdp/nodes/em_nodes.py", line 93, in _stop_training
A = normal(0., sqrt(scale/k), size=(d, k)).astype(typ)
File "mtrand.pyx", line 1279, in mtrand.RandomState.normal (numpy/random/mtrand/mtrand.c:6943)
ValueError: scale <= 0
I tried replacing 0 values with 0.00001, to no avail. Any idea what this might mean?
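The message says that the standard deviation passed to normal(), sqrt(scale/k), was not positive, so scale must have evaluated to zero. A possible explanation, which is an assumption and not something the traceback confirms, is that scale is derived from a product over many small quantities (for instance a determinant, which is the product of the eigenvalues) and underflows to 0.0 at this matrix size. The pure-Python sketch below uses made-up values to show that effect and why replacing individual zeros would not help.

```python
import math


def geometric_mean_naive(values):
    """Multiply everything first, then take the d-th root (can underflow)."""
    return math.prod(values) ** (1.0 / len(values))


def geometric_mean_log(values):
    """Average the logarithms instead, which stays within floating-point range."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical spectrum: 1780 values of 0.1, not taken from the question's data.
eigenvalues = [0.1] * 1780

naive = geometric_mean_naive(eigenvalues)   # the product underflows to 0.0
stable = geometric_mean_log(eigenvalues)    # stays close to 0.1
print(naive, stable)
```

If this is what is happening, changing individual entries from 0 to 0.00001 cannot help, because the product of 1780 factors below 1 still underflows. Using more observations than variables or reducing the dimensionality before the factor analysis would be the things to try.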
