I use scikit-learn's cross_validate function for a simple machine-learning model. I would like to set the function's parameter n_jobs to -1 so that more than one core is used. However, I always get an error. Does anybody have a solution to this?
Here is the error message:
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 172, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
My code looks like the following:
# Go through each alpha, save the average RMSE from cross-validation into an array
for k, i in enumerate(alpha):
    Model.set_params(Ridge__alpha=i)
    scores = cross_validate(Model, X, y, scoring='neg_mean_squared_error', cv=10, n_jobs=-1)
    avgRMSE[k] = np.mean(np.sqrt(-1 * scores['test_score']))
I use the following versions:
Python 3.6
Anaconda 5.1
scikit-learn 0.19.1
scipy 1.0.0
numpy 1.14.2
The exact same script runs on my friend's laptop (Win10, the same OS as mine) without any issues. I have no idea what the issue is here, so I really hope to get some help :)
If I execute my script in an external system terminal AND write
if __name__ == '__main__':
at the beginning of the script, my issue is solved.
I would still like to know whether there is a better solution, especially because the script runs on my friend's system without that fix (and he is using the same OS and hardware).
Example:
# Imports are here
if __name__ == '__main__':
    # All the other code starts from here
    for k, i in enumerate(alpha):
        Model.set_params(Ridge__alpha=i)
        scores = cross_validate(Model, X, y, scoring='neg_mean_squared_error', cv=10, n_jobs=-1)
        avgRMSE[k] = np.mean(np.sqrt(-1 * scores['test_score']))
    # More code
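For reference, here is a self-contained sketch of the fixed pattern (the pipeline step name, the alpha grid, and the synthetic data are my assumptions, not the asker's exact setup):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline

if __name__ == '__main__':  # required on Windows: worker processes re-import this file
    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
    Model = Pipeline([('Ridge', Ridge())])  # step name matches the Ridge__alpha parameter
    alpha = np.logspace(-3, 3, 7)
    avgRMSE = np.empty(len(alpha))
    for k, i in enumerate(alpha):
        Model.set_params(Ridge__alpha=i)
        scores = cross_validate(Model, X, y, scoring='neg_mean_squared_error', cv=10, n_jobs=-1)
        avgRMSE[k] = np.mean(np.sqrt(-1 * scores['test_score']))
    print(avgRMSE)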
I am trying to use PyAutoGUI's locateCenterOnScreen() function in PyCharm. However, it always throws an error, even though it works in VS Code. I am running on an M1 MacBook Air with macOS Monterey and have PyAutoGUI version 0.9.53 installed. Does anyone know why this happens?
Traceback (most recent call last):
File "/Users/username/Desktop/Files/Programming/Projects/Auto Mining Tool/main.py", line 3, in <module>
x, y = pyautogui.locateCenterOnScreen('image.png')
File "/Users/username/Library/Python/3.8/lib/python/site-packages/pyautogui/__init__.py", line 175, in wrapper
return wrappedFunction(*args, **kwargs)
File "/Users/username/Library/Python/3.8/lib/python/site-packages/pyautogui/__init__.py", line 207, in locateCenterOnScreen
return pyscreeze.locateCenterOnScreen(*args, **kwargs)
File "/Users/andrewwalker/Library/Python/3.8/lib/python/site-packages/pyscreeze/__init__.py", line 413, in locateCenterOnScreen
coords = locateOnScreen(image, **kwargs)
File "/Users/andrewwalker/Library/Python/3.8/lib/python/site-packages/pyscreeze/__init__.py", line 372, in locateOnScreen
screenshotIm = screenshot(region=None) # the locateAll() function must handle cropping to return accurate coordinates, so don't pass a region here.
File "/Users/andrewwalker/Library/Python/3.8/lib/python/site-packages/pyscreeze/__init__.py", line 477, in _screenshot_osx
im = Image.open(tmpFilename)
NameError: name 'Image' is not defined
I can tell you why, but I'm not smart enough to know how to fix it.
The reason seems to stem from updates. I just ran into this problem today myself; I typically build projects and then never update modules, etc.
Did you update modules prior to this happening?
It seems they've changed function names, and other modules that were built by other people either were not updated or have import errors, resulting in pyscreeze and pyautogui not playing nicely together.
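One thing that may be worth checking (an assumption on my part: this exact NameError usually means pyscreeze silently failed to import Pillow's Image): make sure Pillow is installed and importable for the interpreter PyCharm runs. A minimal sketch:

# Possible workaround sketch; assumes the NameError stems from pyscreeze
# failing to import Pillow's Image. First, for the interpreter PyCharm uses:
#   python3 -m pip install --upgrade Pillow pyscreeze pyautogui
from PIL import Image  # fails loudly here if Pillow is missing or broken
import pyautogui

point = pyautogui.locateCenterOnScreen('image.png')
print(point)  # Point(x, y) if found; None or an exception otherwise, depending on version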
I am trying to run a linear regression in parallel over 10,000,000 data points (4 features, 1 target variable) randomly generated from a normal distribution, using Python's SCOOP library. Here is the code:
import pandas as pd
import numpy as np
import random
from scoop import futures
import statsmodels.api as sm
from time import time

def linreg(vals):
    global model
    model = sm.OLS(y_vals, X_vals).fit()
    return model
    print(model.summary())  # note: unreachable, since it follows the return

if __name__ == '__main__':
    random.seed(42)
    vals = pd.DataFrame(np.random.normal(loc=3, scale=100, size=(10000000, 5)))
    vals.columns = ['dep', 'ind1', 'ind2', 'ind3', 'ind4']
    y_vals = vals['dep']
    X_vals = vals[['ind1', 'ind2', 'ind3', 'ind4']]
    bt = time()
    model_vals = list(map(linreg, [1, 2, 3]))
    mval = model_vals[0]
    print(mval.summary())
    serial_time = time() - bt
    bt1 = time()
    model_vals_1 = list(futures.map(linreg, [1, 2, 3]))
    mval_1 = model_vals_1[0]
    print(mval_1.summary())
    parallel_time = time() - bt1
    print(serial_time, parallel_time)
However, after the regression summary has been correctly produced by the serial run (via Python's built-in map function), the following error is raised:
Traceback (most recent call last):
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 302, in <module>
    b.main()
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 92, in main
    self.run()
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 290, in run
    futures_startup()
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 271, in futures_startup
    run_name="__main__"
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 64, in _startup
    result = _controller.switch(rootFuture, *args, **kargs)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\_control.py", line 253, in runController
    raise future.exceptionValue
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\_control.py", line 127, in runFuture
    future.resultValue = future.callable(*future.args, **future.kargs)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "Scoop_map_linear_regression1.py", line 33, in <module>
    model_vals_1 = list(futures.map(linreg, [1,2,3]))
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 102, in _mapGenerator
    for future in _waitAll(*futures):
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 358, in _waitAll
    for f in _waitAny(future):
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 335, in _waitAny
    raise childFuture.exceptionValue
NameError: name 'y_vals' is not defined
This means that the code stops at model_vals_1 = list(futures.map(linreg, [1,2,3]))
Please carefully note that in order to be able to run the code in parallel, it has to be launched from the command line specifying the -m scoop parameter, like this:
python -m scoop Scoop_map_linear_regression1.py
Indeed, if it is launched without the -m scoop parameter, the script is not parallelized but still runs: as reported in the warnings, futures.map is silently replaced by the built-in map, so the script just runs the serial version twice. The goal, however, is to run it in parallel with futures.map.
This clarification is meant to prevent people from answering that they solved the problem by simply launching the code without the -m scoop parameter, as has already happened here:
Python Parallel Computing - Scoop
where, as a consequence, the question was wrongly put on hold as off-topic for no longer being reproducible.
Many thanks in advance; any comment is highly appreciated and welcome.
The solution is to pass, as the second argument of futures.map (but not necessarily of map), only [1].
Indeed, even though the linreg function doesn't use the second argument passed to map, that argument still determines how many times linreg will be run. As an example, consider the following basic case:
def welcome(x):
    print('Hello world!')

if __name__ == '__main__':
    a = list(map(welcome, [1, 2]))
The welcome function doesn't actually need any argument, but the output will still be
Hello world!
Hello world!
repeated two times, i.e., once per element of the list passed as the second argument.
In this specific case, this implies that map will run the linear regression 3 times, even though the regression output appears just once, since the summary is printed outside the map.
The point is that, instead, it is not possible to run the linear regression multiple times with futures.map. The problem is that, apparently, after the first run it deletes the datasets it used, which makes it impossible to carry out the second and third runs and leads to the
NameError: name 'y_vals' is not defined
thrown at the end of the traceback. This should be visible by going through the scoop.futures source code.
I didn't go over all of it, but I guess the problem is related to the greenlet switchers.
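As an alternative workaround (my own sketch, not part of the original answer): since SCOOP workers do not share the globals defined under the if __name__ == '__main__' guard, shipping the data to the worker explicitly avoids the NameError altogether. A minimal sketch with a deliberately small dataset:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scoop import futures

def linreg(args):
    y_vals, X_vals = args  # the data arrives with the task instead of via globals
    return sm.OLS(y_vals, X_vals).fit()

if __name__ == '__main__':
    np.random.seed(42)
    vals = pd.DataFrame(np.random.normal(loc=3, scale=100, size=(1000, 5)),
                        columns=['dep', 'ind1', 'ind2', 'ind3', 'ind4'])
    y = vals['dep']
    X = vals[['ind1', 'ind2', 'ind3', 'ind4']]
    # each task carries its own copy of the data; run with: python -m scoop script.py
    results = list(futures.map(linreg, [(y, X)] * 3))
    print(results[0].summary())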
I am trying to use the TensorFlow CLI debugger in order to identify the operation which is causing a NaN during training of a network, but when I try to run the code I get an error:
_curses.error: cbreak() returned ERR
I'm running the code on an Ubuntu server, which I'm connecting to via SSH, and have tried to follow this tutorial.
I have tried using tf.add_check_numerics_ops(), but the layers in the network include while loops, so they are not compatible with it. This is the section of code where the error is raised:
import tensorflow as tf
from tensorflow.python import debug as tf_debug
...
#Prepare data
train_data, val_data, test_data = dataset.prepare_datasets(model_config)
sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
# Create iterators
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle, train_data.output_types, train_data.output_shapes)
mixed_spec, voice_spec, mixed_audio, voice_audio = iterator.get_next()
training_iterator = train_data.make_initializable_iterator()
validation_iterator = val_data.make_initializable_iterator()
testing_iterator = test_data.make_initializable_iterator()
training_handle = sess.run(training_iterator.string_handle())
...
and the full error is:
Traceback (most recent call last):
File "main.py", line 64, in <module>
@ex.automain
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/experiment.py", line 137, in automain
self.run_commandline()
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/experiment.py", line 260, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/experiment.py", line 209, in run
run()
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/run.py", line 221, in __call__
self.result = self.main_function(*args)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/config/captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "main.py", line 95, in do_experiment
training_handle = sess.run(training_iterator.string_handle())
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/wrappers/framework.py", line 455, in run
is_callable_runner=bool(callable_runner)))
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/wrappers/local_cli_wrapper.py", line 255, in on_run_start
self._run_start_response = self._launch_cli()
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/wrappers/local_cli_wrapper.py", line 431, in _launch_cli
title_color=self._title_color)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/cli/curses_ui.py", line 492, in run_ui
self._screen_launch(enable_mouse_on_start=enable_mouse_on_start)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/cli/curses_ui.py", line 445, in _screen_launch
curses.cbreak()
_curses.error: cbreak() returned ERR
I'm pretty new to using Ubuntu (and TensorFlow), but as far as I can tell the server does have ncurses installed, which should allow the required curses-based interface:
acvn728#america:~/MScFinalProject$ dpkg -l '*ncurses*' | grep '^ii'
ii libncurses5:amd64 6.0+20160213-1ubuntu1 amd64 shared libraries for terminal handling
ii libncursesw5:amd64 6.0+20160213-1ubuntu1 amd64 shared libraries for terminal handling (wide character support)
ii ncurses-base 6.0+20160213-1ubuntu1 all basic terminal type definitions
ii ncurses-bin 6.0+20160213-1ubuntu1 amd64 terminal-related programs and man pages
ii ncurses-term 6.0+20160213-1ubuntu1 all additional terminal type definitions
Problem solved! The solution was to change
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
to
sess = tf_debug.LocalCLIDebugWrapperSession(sess, ui_type="readline")
This is similar to the solution to this question, but I think it is important to note that they are different, because a) it refers to a different function and a different API, and b) I wasn't trying to run from an IDE, as mentioned in that solution.
cbreak would return ERR if you run a curses application that is not on a real terminal (i.e., something that works with POSIX termios calls).
From the description,
but the layers in the network include while loops so are not compatible
it does not seem you are running in a terminal.
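To see whether the session you are launching from looks like a real terminal to Python at all, here is a quick check (an illustrative snippet, not part of the original answer):

import os
import sys

# curses (and therefore cbreak) needs a real TTY attached to the process
print("stdin is a TTY: ", sys.stdin.isatty())
print("stdout is a TTY:", sys.stdout.isatty())
print("TERM =", os.environ.get("TERM"))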
I am building and fitting an HDBSCAN model on my data. When I run the script from within the file, it works well and quickly, but when I import the file and run it from 'outside', it goes into a weird loop that I don't understand how it got started, and I get the following error:
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information
Here is an excerpt of the code:
df_pos_raw, df_pos_training = pre_process_data(df_pos)
df_pos_training_std = standardize_df(df_pos_training) # Standardized data, column-wise
print "generating model"
pos_cls = hdbscan.HDBSCAN(min_cluster_size=10, prediction_data=True)
print "fitting model to data"
pos_cls.fit(df_pos_training_std)
print 'done fitting model'
# sns.distplot(pos_cls.labels_, bins=len(set(pos_cls.labels_)))
df_filtered = filter_cons_types(df, [3, 5])
print "Done. returning variables"
return pos_cls, df_filtered
Here is the output when running from 'outside' the file:
Traceback (most recent call last):
File "<string>", line 1, in <module>
generating model
File "C:\ProgramData\Anaconda2\Lib\multiprocessing\forking.py", line 380, in main
fitting model to data
prepare(preparation_data)
File "C:\ProgramData\Anaconda2\Lib\multiprocessing\forking.py", line 510, in prepare
'__parents_main__', file, path_name, etc
File "C:\Users\sareetn\PycharmProjects\Arad\DataImputation\ClusteringExtrapolation\Dev\run_clustering_based_prediction.py", line 4, in <module>
model, raw_df = clustering()
File "C:\Users\sareetn\PycharmProjects\Arad\DataImputation\ClusteringExtrapolation\Dev\clustering_model_constype_3_5.py", line 86, in main
pos_cls.fit(df_pos_training_std)
File "C:\Users\sareetn\PycharmProjects\Arad\venv\lib\site-packages\hdbscan\hdbscan_.py", line 816, in fit
self._min_spanning_tree) = hdbscan(X, **kwargs)
File "C:\Users\sareetn\PycharmProjects\Arad\venv\lib\site-packages\hdbscan\hdbscan_.py", line 543, in hdbscan
core_dist_n_jobs, **kwargs)
File "C:\Users\sareetn\PycharmProjects\Arad\venv\lib\site-packages\sklearn\externals\joblib\memory.py", line 362, in __call__
return self.func(*args, **kwargs)
File "C:\Users\sareetn\PycharmProjects\Arad\venv\lib\site-packages\hdbscan\hdbscan_.py", line 239, in _hdbscan_boruvka_kdtree
n_jobs=core_dist_n_jobs, **kwargs)
File "hdbscan/_hdbscan_boruvka.pyx", line 375, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm.__init__ (hdbscan/_hdbscan_boruvka.c:5195)
File "hdbscan/_hdbscan_boruvka.pyx", line 411, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm._compute_bounds (hdbscan/_hdbscan_boruvka.c:5915)
File "C:\Users\sareetn\PycharmProjects\Arad\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 749, in __call__
n_jobs = self._initialize_backend()
File "C:\Users\sareetn\PycharmProjects\Arad\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 547, in _initialize_backend
**self._backend_args)
File "C:\Users\sareetn\PycharmProjects\Arad\venv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 305, in configure
'[joblib] Attempting to do parallel computing '
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information
generating model
fitting model to data
generating model
fitting model to data
generating model
fitting model to data
Thank you very much in advance!!
A friend helped me figure it out-
Clustering uses a library called joblib that splits the job into parallel processes. When running such functions on a Windows machine, care needs to be taken to make sure we use
if __name__ == '__main__'
in order to protect the code and allow the parallel processing to work.
After adding
if __name__ == '__main__':
and placing all of the code under it, the clustering ran smoothly and quickly.
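For reference, a minimal sketch of the protected pattern (the data and function names are illustrative, not the asker's actual code):

import numpy as np
import hdbscan

def run_clustering(data):
    clusterer = hdbscan.HDBSCAN(min_cluster_size=10, prediction_data=True)
    clusterer.fit(data)  # joblib may spawn worker processes here
    return clusterer

if __name__ == '__main__':
    # everything that triggers parallel work stays behind the guard,
    # so Windows process spawning can safely re-import this file
    data = np.random.rand(1000, 4)
    model = run_clustering(data)
    print(model.labels_[:10])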
I'm new to Python programming. I am trying to learn cProfile and pyprof2calltree. I'm using Python 2.7 on Windows 7, and I installed pyprof2calltree 1.3.2 and qcachegrind074-x86. The problem is that I couldn't find any tutorial about using QCacheGrind on Windows, and all the examples are for other operating systems. I wrote the very simple code below. I just want to create a kgrind file, but an error is raised. Where is my problem in this code, and how can I create a kgrind file?
def r():
    print range(1, 1000)

if __name__ == '__main__':
    from cProfile import Profile
    profiler = Profile()
    profiler.run('r()')

    from pyprof2calltree import convert, visualize
    visualize(profiler.getstats())
    convert(profiler.getstats(), 'c:/profiling_results.kgrind')
The error is:
Traceback (most recent call last):
  File "C:/..../pyprof2example", line 11, in <module>
    visualize( profiler.getstats() )
  File "C:...\lib\pyprof2calltree.py", line 306, in visualize
    converter.visualize()
  File "C:...\lib\pyprof2calltree.py", line 145, in visualize
    self.output(f)
  File "C:...\lib\pyprof2calltree.py", line 133, in output
    self._entry(entry)
  File "C:\P...\lib\pyprof2calltree.py", line 208, in _entry
    for subentry, call_info in calls:
ValueError: too many values to unpack
Thanks
It seems you have to execute your code via the command line. Here is the discussion.
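If running from the command line alone doesn't help, here is a hedged alternative (my own sketch; it assumes the crash is inside visualize() and that convert() accepts the filename of a dumped stats file, which I believe it does):

import cProfile
from pyprof2calltree import convert

def r():
    print(range(1, 1000))

if __name__ == '__main__':
    profiler = cProfile.Profile()
    profiler.run('r()')
    profiler.dump_stats('profile.prof')  # save a pstats dump instead of using getstats()
    convert('profile.prof', 'c:/profiling_results.kgrind')
    # then open c:/profiling_results.kgrind in QCacheGrind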