Node2vec and networkx - python

I am attempting to run node2vec on a directed networkx network I have created. The network looks like this:
OutEdgeDataView([(7, 1, {'senderId': 7, 'weight': 273}), (7, 8, {'senderId': 7, 'weight': 319}), (7, 9, {'senderId': 7, 'weight': 137})....
Each node has an integer ID, and each edge carries a weight linking one node to another.
I am trying to use the node2vec module on this network as follows:
from node2vec import Node2Vec
node2vec = Node2Vec(mail_n_basic, dimensions=64, walk_length=30, num_walks=200, workers=4)
This returns the following error. Any help explaining it would be much appreciated:
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\Andrew\Anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 398, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "C:\Users\Andrew\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 561, in __call__
return self.func(*args, **kwargs)
File "C:\Users\Andrew\Anaconda3\lib\site-packages\joblib\parallel.py", line 224, in __call__
for func, args, kwargs in self.items]
File "C:\Users\Andrew\Anaconda3\lib\site-packages\joblib\parallel.py", line 224, in <listcomp>
for func, args, kwargs in self.items]
File "C:\Users\Andrew\Anaconda3\lib\site-packages\node2vec\node2vec.py", line 51, in parallel_generate_walks
walk_to = np.random.choice(walk_options, size=1)[0]
File "mtrand.pyx", line 1126, in mtrand.RandomState.choice
ValueError: a must be non-empty
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-58-3ac160061528> in <module>()
1
----> 2 node2vec = Node2Vec(mail_n_basic, dimensions=64, walk_length=30, num_walks=200, workers=4)
~\Anaconda3\lib\site-packages\node2vec\node2vec.py in __init__(self, graph, dimensions, walk_length, num_walks, p, q, weight_key, workers, sampling_strategy)
111
112 self.d_graph = self._precompute_probabilities()
--> 113 self.walks = self._generate_walks()
114
115 def _precompute_probabilities(self):
~\Anaconda3\lib\site-packages\node2vec\node2vec.py in _generate_walks(self)
178 self.NEIGHBORS_KEY,
179 self.PROBABILITIES_KEY) for idx, num_walks
--> 180 in enumerate(num_walks_lists, 1))
181
182 walks = flatten(walk_results)
~\Anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
960
961 with self._backend.retrieval_context():
--> 962 self.retrieve()
963 # Make sure that we get a last message telling us we are done
964 elapsed_time = time.time() - self._start_time
~\Anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
863 try:
864 if getattr(self._backend, 'supports_timeout', False):
--> 865 self._output.extend(job.get(timeout=self.timeout))
866 else:
867 self._output.extend(job.get())
~\Anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
513 AsyncResults.get from multiprocessing."""
514 try:
--> 515 return future.result(timeout=timeout)
516 except LokyTimeoutError:
517 raise TimeoutError()
~\Anaconda3\lib\site-packages\joblib\externals\loky\_base.py in result(self, timeout)
429 raise CancelledError()
430 elif self._state == FINISHED:
--> 431 return self.__get_result()
432 else:
433 raise TimeoutError()
~\Anaconda3\lib\site-packages\joblib\externals\loky\_base.py in __get_result(self)
380 def __get_result(self):
381 if self._exception:
--> 382 raise self._exception
383 else:
384 return self._result
ValueError: a must be non-empty

I'm the author of this library.
If you are using Windows, parallel execution won't work because of issues between joblib and Windows.
Install the updated version with pip install -U node2vec, run the same code, and pass workers=1 when constructing the Node2Vec class.
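The traceback itself shows np.random.choice being called on an empty walk_options list. One situation that could plausibly produce this, offered here as an assumption rather than something the answer confirms, is a directed graph containing nodes with no outgoing edges, since a walk that reaches such a node has nowhere to go next. A minimal pure-Python sketch for listing those nodes from an edge list (the edge data and the helper name find_dead_ends are made up for illustration):

```python
def find_dead_ends(edges):
    """Return nodes that appear in the edge list but have no outgoing edge."""
    sources = {src for src, _dst, _data in edges}
    nodes = sources | {dst for _src, dst, _data in edges}
    return sorted(nodes - sources)


# Hypothetical edges in the same (source, target, data) shape as the question.
edges = [
    (7, 1, {"senderId": 7, "weight": 273}),
    (7, 8, {"senderId": 7, "weight": 319}),
    (7, 9, {"senderId": 7, "weight": 137}),
    (8, 7, {"senderId": 8, "weight": 12}),
]

dead_ends = find_dead_ends(edges)
print(dead_ends)
```

On a networkx DiGraph the same check can be expressed by looking for nodes whose out_degree is 0.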

Related

RolloutWorker problem when try to execute PPOConfig: Exception raised in creation task: The actor died because of an error raised in its creation task

I am trying to follow the steps mentioned on "Getting Started with RLlib" (https://docs.ray.io/en/latest/rllib/rllib-training.html) along with my custom environment.
However, my run fails at the first code block shown in the guide.
This is the script I am trying to run:
import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.logger import pretty_print
from gym_sw_env.envs.Examplev2 import Example_v2 #this is my custom env
ray.init(ignore_reinit_error=True)
algo = (
    PPOConfig()
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=0)
    .environment(env=Example_v2)
    .build()
)
This is the error I get:
(RolloutWorker pid=24420) 2022-12-17 11:36:34,235 ERROR worker.py:763 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=24420, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001693BEB1C10>)
(RolloutWorker pid=24420) File "python\ray\_raylet.pyx", line 859, in ray._raylet.execute_task
(RolloutWorker pid=24420) File "python\ray\_raylet.pyx", line 863, in ray._raylet.execute_task
(RolloutWorker pid=24420) File "python\ray\_raylet.pyx", line 810, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=24420) File "C:\Users\**MYUSER**\Anaconda3\lib\site-packages\ray\_private\function_manager.py", line 674, in actor_method_executor
(RolloutWorker pid=24420) return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=24420) File "C:\Users\**MYUSER**\Anaconda3\lib\site-packages\ray\util\tracing\tracing_helper.py", line 466, in _resume_span
(RolloutWorker pid=24420) return method(self, *_args, **_kwargs)
(RolloutWorker pid=24420) File "C:\Users\**MYUSER**\Anaconda3\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 492, in __init__
(RolloutWorker pid=24420) self.env = env_creator(copy.deepcopy(self.env_context))
(RolloutWorker pid=24420) File "C:\Users\**MYUSER**\Anaconda3\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2139, in <lambda>
(RolloutWorker pid=24420) return env_id, lambda cfg: env_specifier(cfg)
(RolloutWorker pid=24420) TypeError: __init__() takes 1 positional argument but 2 were given
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [3], line 7
2 from ray.tune.logger import pretty_print
3 from gym_sw_env.envs.Examplev2 import Example_v2
6 algo = (
----> 7 PPOConfig()
8 .rollouts(num_rollout_workers=1)
9 .resources(num_gpus=0)
10 .environment(env=Example_v2)
11 .build()
12 )
File ~\Anaconda3\lib\site-packages\ray\rllib\algorithms\algorithm_config.py:311, in AlgorithmConfig.build(self, env, logger_creator)
308 if logger_creator is not None:
309 self.logger_creator = logger_creator
--> 311 return self.algo_class(
312 config=self.to_dict(),
313 env=self.env,
314 logger_creator=self.logger_creator,
315 )
File ~\Anaconda3\lib\site-packages\ray\rllib\algorithms\algorithm.py:414, in Algorithm.__init__(self, config, env, logger_creator, **kwargs)
402 # Initialize common evaluation_metrics to nan, before they become
403 # available. We want to make sure the metrics are always present
404 # (although their values may be nan), so that Tune does not complain
405 # when we use these as stopping criteria.
406 self.evaluation_metrics = {
407 "evaluation": {
408 "episode_reward_max": np.nan,
(...)
411 }
412 }
--> 414 super().__init__(config=config, logger_creator=logger_creator, **kwargs)
416 # Check, whether `training_iteration` is still a tune.Trainable property
417 # and has not been overridden by the user in the attempt to implement the
418 # algos logic (this should be done now inside `training_step`).
419 try:
File ~\Anaconda3\lib\site-packages\ray\tune\trainable\trainable.py:161, in Trainable.__init__(self, config, logger_creator, remote_checkpoint_dir, custom_syncer, sync_timeout)
159 start_time = time.time()
160 self._local_ip = ray.util.get_node_ip_address()
--> 161 self.setup(copy.deepcopy(self.config))
162 setup_time = time.time() - start_time
163 if setup_time > SETUP_TIME_THRESHOLD:
File ~\Anaconda3\lib\site-packages\ray\rllib\algorithms\algorithm.py:549, in Algorithm.setup(self, config)
536 except RayActorError as e:
537 # In case of an actor (remote worker) init failure, the remote worker
538 # may still exist and will be accessible, however, e.g. calling
539 # its `sample.remote()` would result in strange "property not found"
540 # errors.
541 if e.actor_init_failed:
542 # Raise the original error here that the RolloutWorker raised
543 # during its construction process. This is to enforce transparency
(...)
547 # - e.args[0].args[2]: The original Exception (e.g. a ValueError due
548 # to a config mismatch) thrown inside the actor.
--> 549 raise e.args[0].args[2]
550 # In any other case, raise the RayActorError as-is.
551 else:
552 raise e
File python\ray\_raylet.pyx:852, in ray._raylet.execute_task()
File python\ray\_raylet.pyx:906, in ray._raylet.execute_task()
File python\ray\_raylet.pyx:859, in ray._raylet.execute_task()
File python\ray\_raylet.pyx:863, in ray._raylet.execute_task()
File python\ray\_raylet.pyx:810, in ray._raylet.execute_task.function_executor()
File ~\Anaconda3\lib\site-packages\ray\_private\function_manager.py:674, in actor_method_executor()
672 return method(*args, **kwargs)
673 else:
--> 674 return method(__ray_actor, *args, **kwargs)
File ~\Anaconda3\lib\site-packages\ray\util\tracing\tracing_helper.py:466, in _resume_span()
464 # If tracing feature flag is not on, perform a no-op
465 if not _is_tracing_enabled() or _ray_trace_ctx is None:
--> 466 return method(self, *_args, **_kwargs)
468 tracer: _opentelemetry.trace.Tracer = _opentelemetry.trace.get_tracer(
469 __name__
470 )
472 # Retrieves the context from the _ray_trace_ctx dictionary we
473 # injected.
File ~\Anaconda3\lib\site-packages\ray\rllib\evaluation\rollout_worker.py:492, in __init__()
485 # Create a (single) env for this worker.
486 if not (
487 worker_index == 0
488 and num_workers > 0
489 and not policy_config.get("create_env_on_driver")
490 ):
491 # Run the `env_creator` function passing the EnvContext.
--> 492 self.env = env_creator(copy.deepcopy(self.env_context))
494 if self.env is not None:
495 # Validate environment (general validation function).
496 if not self._disable_env_checking:
File ~\Anaconda3\lib\site-packages\ray\rllib\algorithms\algorithm.py:2139, in Algorithm._get_env_id_and_creator.<locals>.<lambda>()
2137 return env_id, lambda cfg: _wrapper.remote(cfg)
2138 else:
-> 2139 return env_id, lambda cfg: env_specifier(cfg)
2141 # No env -> Env creator always returns None.
2142 elif env_specifier is None:
TypeError: __init__() takes 1 positional argument but 2 were given
Does anybody know how to resolve it? I am just following the first step of the guide.
I also tried removing the build() call from the PPOConfig chain, so I have:
algo = (
    PPOConfig()
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=0)
    .environment(env=Example_v2)
)
This doesn't produce any error, so the question may also be: is the build() method necessary?
I searched for other errors like this but found nothing.
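The last frame of the traceback shows the environment being created as env_specifier(cfg), that is, the class is called with one positional argument (the env config). A class whose __init__ accepts only self fails in exactly this way. The following plain-Python sketch reproduces the message with hypothetical stand-in classes (EnvWithoutConfig, EnvWithConfig and env_creator are illustrations, not part of RLlib or of the code above) and suggests that accepting a config argument would avoid it:

```python
class EnvWithoutConfig:
    """Stand-in for an env whose constructor takes no config argument."""

    def __init__(self):
        self.config = {}


class EnvWithConfig:
    """Stand-in for an env whose constructor accepts the env config."""

    def __init__(self, config=None):
        self.config = config or {}


def env_creator(env_class, cfg):
    # Mirrors the call shape in the traceback: the class receives cfg positionally.
    return env_class(cfg)


try:
    env_creator(EnvWithoutConfig, {"size": 4})
    message = ""
except TypeError as exc:
    message = str(exc)

env = env_creator(EnvWithConfig, {"size": 4})
print(message)
print(env.config)
```

If this reading is right, giving Example_v2 a constructor of the form __init__(self, config=None) would be one thing to try.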

Getting error when trying to print class definition with inspect.getsource()

I am defining a class:
class MyFirstClass:
    pass
Afterwards, I try to print the definition of the MyFirstClass class:
import inspect
print(inspect.getsource(MyFirstClass))
But I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_22132/2338486789.py in <module>
1 import inspect
----> 2 print(inspect.getsource(MyFirstClass))
C:\ProgramData\Anaconda3\lib\inspect.py in getsource(object)
971 or code object. The source code is returned as a single string. An
972 OSError is raised if the source code cannot be retrieved."""
--> 973 lines, lnum = getsourcelines(object)
974 return ''.join(lines)
975
C:\ProgramData\Anaconda3\lib\inspect.py in getsourcelines(object)
953 raised if the source code cannot be retrieved."""
954 object = unwrap(object)
--> 955 lines, lnum = findsource(object)
956
957 if istraceback(object):
C:\ProgramData\Anaconda3\lib\inspect.py in findsource(object)
766 is raised if the source code cannot be retrieved."""
767
--> 768 file = getsourcefile(object)
769 if file:
770 # Invalidate cache if needed.
C:\ProgramData\Anaconda3\lib\inspect.py in getsourcefile(object)
682 Return None if no way can be identified to get the source.
683 """
--> 684 filename = getfile(object)
685 all_bytecode_suffixes = importlib.machinery.DEBUG_BYTECODE_SUFFIXES[:]
686 all_bytecode_suffixes += importlib.machinery.OPTIMIZED_BYTECODE_SUFFIXES[:]
C:\ProgramData\Anaconda3\lib\inspect.py in getfile(object)
651 if getattr(module, '__file__', None):
652 return module.__file__
--> 653 raise TypeError('{!r} is a built-in class'.format(object))
654 if ismethod(object):
655 object = object.__func__
TypeError: <class '__main__.MyFirstClass'> is a built-in class
My expected output is:
class MyFirstClass:
    pass
How do I correctly use inspect.getsource() to get my expected output (the definition of the MyFirstClass class)?
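The TypeError is raised in inspect.getfile because the module that defines the class (here __main__, inside a notebook kernel) has no __file__ attribute, as the getattr(module, '__file__', None) check in the traceback shows. A minimal sketch of one workaround, assuming it is acceptable to keep the class in a .py file and import it from there; the module name my_first_module and the temporary directory are purely illustrative:

```python
import importlib.util
import inspect
import sys
import tempfile
from pathlib import Path

source = "class MyFirstClass:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    # Write the class to a real file so that its module has a __file__.
    path = Path(tmp) / "my_first_module.py"
    path.write_text(source)

    spec = importlib.util.spec_from_file_location("my_first_module", path)
    module = importlib.util.module_from_spec(spec)
    # inspect looks the module up in sys.modules via the class's __module__.
    sys.modules["my_first_module"] = module
    spec.loader.exec_module(module)

    # Read the source while the file still exists.
    recovered = inspect.getsource(module.MyFirstClass)

    del sys.modules["my_first_module"]

print(recovered)
```

In a normal project this amounts to putting the class in its own module and importing it into the notebook.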

Error when running gridsearchcv with pipeline

I want to create a pipeline structure that contains all the steps of the model training process. After importing the relevant libraries and making the definitions, I created the following structure to experiment with. I used the Telco churn dataset.
ohe_f = ["gender","SeniorCitizen","Partner","Dependents","PhoneService","MultipleLines",
         "InternetService","OnlineSecurity","OnlineBackup","DeviceProtection","TechSupport",
         "StreamingTV","StreamingMovies","Contract","PaperlessBilling","PaymentMethod"]
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    stratify=y,
                                                    random_state=11)
pipeline = Pipeline(steps = [['smote', SMOTE(random_state=11)],
                             ['scaler', MinMaxScaler()],
                             ['encoder', OneHotEncoder(),ohe_f],
                             ['classifier', LogisticRegression(random_state=11)]])
stratified_kfold = StratifiedKFold(n_splits=3,
                                   shuffle=True,
                                   random_state=11)
param_grid = {'classifier__C':[0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           scoring='roc_auc',
                           cv=stratified_kfold,
                           n_jobs=-1)
When I start training the model I get the following error. How can I solve it?
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\burak\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 436, in _process_worker
r = call_item()
File "C:\Users\burak\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 288, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\burak\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "C:\Users\burak\anaconda3\lib\site-packages\joblib\parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "C:\Users\burak\anaconda3\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "C:\Users\burak\anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 216, in __call__
return self.function(*args, **kwargs)
File "C:\Users\burak\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 668, in _fit_and_score
estimator = estimator.set_params(**cloned_parameters)
File "C:\Users\burak\anaconda3\lib\site-packages\sklearn\pipeline.py", line 188, in set_params
self._set_params("steps", **kwargs)
File "C:\Users\burak\anaconda3\lib\site-packages\sklearn\utils\metaestimators.py", line 54, in _set_params
super().set_params(**params)
File "C:\Users\burak\anaconda3\lib\site-packages\sklearn\base.py", line 239, in set_params
valid_params = self.get_params(deep=True)
File "C:\Users\burak\anaconda3\lib\site-packages\sklearn\pipeline.py", line 167, in get_params
return self._get_params("steps", deep=deep)
File "C:\Users\burak\anaconda3\lib\site-packages\sklearn\utils\metaestimators.py", line 33, in _get_params
out.update(estimators)
ValueError: dictionary update sequence element #2 has length 3; 2 is required
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1388/1962240236.py in <module>
23 n_jobs=-1)
24
---> 25 grid_search.fit(X_train, y_train)
26 cv_score = grid_search.best_score_
27 test_score = grid_search.score(X_test, y_test)
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
889 return results
890
--> 891 self._run_search(evaluate_candidates)
892
893 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1390 def _run_search(self, evaluate_candidates):
1391 """Search all candidates in param_grid"""
-> 1392 evaluate_candidates(ParameterGrid(self.param_grid))
1393
1394
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
836 )
837
--> 838 out = parallel(
839 delayed(_fit_and_score)(
840 clone(base_estimator),
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1054
1055 with self._backend.retrieval_context():
-> 1056 self.retrieve()
1057 # Make sure that we get a last message telling us we are done
1058 elapsed_time = time.time() - self._start_time
~\anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
933 try:
934 if getattr(self._backend, 'supports_timeout', False):
--> 935 self._output.extend(job.get(timeout=self.timeout))
936 else:
937 self._output.extend(job.get())
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
443 raise CancelledError()
444 elif self._state == FINISHED:
--> 445 return self.__get_result()
446 else:
447 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
388 if self._exception:
389 try:
--> 390 raise self._exception
391 finally:
392 # Break a reference cycle with the exception in self._exception
ValueError: dictionary update sequence element #2 has length 3; 2 is required
You need to split your pipeline into two parts: one to process the numeric features (with the min-max scaler) and another to process the categorical features (with the one-hot encoder). You can use the ColumnTransformer class from scikit-learn: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html
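The ValueError itself comes from the out.update(estimators) line in the traceback: each pipeline step is expected to be a (name, estimator) pair, but the 'encoder' step has three elements. The same message can be reproduced with a plain dict, using strings as stand-ins for the estimators (the step contents below are placeholders, not real scikit-learn objects):

```python
# Steps shaped like the ones in the question; index 2 has three elements.
steps = [
    ["smote", "SMOTE()"],
    ["scaler", "MinMaxScaler()"],
    ["encoder", "OneHotEncoder()", ["gender", "Partner"]],
    ["classifier", "LogisticRegression()"],
]

out = {}
try:
    out.update(steps)
    message = ""
except ValueError as exc:
    message = str(exc)

# Dropping the extra element makes every step a valid (name, value) pair.
fixed_steps = [step[:2] for step in steps]
fixed = dict(fixed_steps)

print(message)
print(list(fixed))
```

The column list has to live somewhere else, which is what ColumnTransformer provides: its entries are (name, transformer, columns) triples.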

RandomizedSearchCV: All estimators failed to fit

I am currently working on the "French Motor Claims Datasets freMTPL2freq" Kaggle competition (https://www.kaggle.com/floser/french-motor-claims-datasets-fremtpl2freq). Unfortunately, I get a "NotFittedError: All estimators failed to fit" error whenever I use RandomizedSearchCV, and I cannot figure out why.
Any help is much appreciated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import scipy.stats as stats
from matplotlib import pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import mean_poisson_deviance
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import VotingRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.metrics import mean_gamma_deviance
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
data_freq = pd.read_csv('freMTPL2freq.csv')
data_freq['Area'] = data_freq['Area'].str.replace('\'','')
data_freq['VehBrand'] = data_freq['VehBrand'].str.replace('\'','')
data_freq['VehGas'] = data_freq['VehGas'].str.replace('\'','')
data_freq['Region'] = data_freq['Region'].str.replace('\'','')
data_freq['frequency'] = data_freq['ClaimNb'] / data_freq['Exposure']
y = data_freq['frequency']
X = data_freq.drop(['frequency', 'ClaimNb', 'IDpol'], axis = 1)
X_train, X_val, y_train, y_val = train_test_split(X,y, test_size=0.2, shuffle = True, random_state = 42)
pt_columns = ['VehPower', 'VehAge', 'DrivAge', 'BonusMalus', 'Density']
cat_columns = ['Area', 'Region', 'VehBrand', 'VehGas']
from xgboost import XGBRegressor
ct = ColumnTransformer([('pt', 'passthrough', pt_columns),
                        ('ohe', OneHotEncoder(), cat_columns)])
pipe_xgbr = Pipeline([('cf_trans', ct),
                      ('ssc', StandardScaler(with_mean = False)),
                      ('xgb_regressor', XGBRegressor())
                      ])
param = {'xgb_regressor__n_estimators':[3, 5],
         'xgb_regressor__max_depth':[3, 5, 7],
         'xgb_regressor__learning_rate':[0.1, 0.5],
         'xgb_regressor__colsample_bytree':[0.5, 0.8],
         'xgb_regressor__subsample':[0.5, 0.8]
         }
rscv = RandomizedSearchCV(pipe_xgbr, param_distributions = param, n_iter = 2, scoring = mean_squared_error, n_jobs = -1, cv = 5, error_score = 'raise')
rscv.fit(X_train, y_train, xgbr_regressor__sample_weight = X_train['Exposure'])
The first five rows of the original dataframe data_freq look like this:
   IDpol  ClaimNb  Exposure Area  VehPower  VehAge  DrivAge  BonusMalus VehBrand   VehGas  Density Region
0    1.0        1      0.10    D         5       0       55          50      B12  Regular     1217    R82
1    3.0        1      0.77    D         5       0       55          50      B12  Regular     1217    R82
2    5.0        1      0.75    B         6       2       52          50      B12   Diesel       54    R22
3   10.0        1      0.09    B         7       0       46          50      B12   Diesel       76    R72
4   11.0        1      0.84    B         7       0       46          50      B12   Diesel       76    R72
The error I get is as follows:
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
r = call_item()
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 608, in __call__
return self.func(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in __call__
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 598, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\pipeline.py", line 340, in fit
fit_params_steps = self._check_fit_params(**fit_params)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\pipeline.py", line 261, in _check_fit_params
fit_params_steps[step][param] = pval
KeyError: 'xgbr_regressor'
"""
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-68-0c1886d1e985> in <module>
----> 1 rscv.fit(X_train, y_train, xgbr_regressor__sample_weight = X_train['Exposure'])
2 #pipe_xgbr.fit(X_train, y_train)
3 #X_train.describe(include = 'all')
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1633 evaluate_candidates(ParameterSampler(
1634 self.param_distributions, self.n_iter,
-> 1635 random_state=self.random_state))
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
807 (split_idx, (train, test)) in product(
808 enumerate(candidate_params),
--> 809 enumerate(cv.split(X, y, groups))))
810
811 if len(out) < 1:
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time
~\anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
560 AsyncResults.get from multiprocessing."""
561 try:
--> 562 return future.result(timeout=timeout)
563 except LokyTimeoutError:
564 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
433 raise CancelledError()
434 elif self._state == FINISHED:
--> 435 return self.__get_result()
436 else:
437 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
KeyError: 'xgbr_regressor'
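The failing line in this traceback, fit_params_steps[step][param] = pval, indicates that the part of the keyword before the double underscore is used as a key into the pipeline's step names. The pipeline step is called xgb_regressor, while the keyword passed to fit starts with xgbr_regressor. A simplified model of that routing (route_fit_params is a hypothetical helper written for illustration, not scikit-learn's implementation):

```python
def route_fit_params(step_names, fit_params):
    """Split 'step__param' keys and group the values by pipeline step."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        step, param = key.split("__", 1)
        routed[step][param] = value  # KeyError if `step` is not a step name
    return routed


step_names = ["cf_trans", "ssc", "xgb_regressor"]

try:
    route_fit_params(step_names, {"xgbr_regressor__sample_weight": [1.0]})
    error = None
except KeyError as exc:
    error = exc.args[0]

# With the prefix matching the step name, the parameter is routed as intended.
routed = route_fit_params(step_names, {"xgb_regressor__sample_weight": [1.0]})
print(error)
print(routed["xgb_regressor"])
```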
I also tried running fit without the sample_weight parameter. In this case the error changes to:
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
r = call_item()
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 608, in __call__
return self.func(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in __call__
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 625, in _fit_and_score
test_scores = _score(estimator, X_test, y_test, scorer, error_score)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
scores = scorer(estimator, X_test, y_test)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 74, in inner_f
return f(**kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\metrics\_regression.py", line 336, in mean_squared_error
y_true, y_pred, multioutput)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\metrics\_regression.py", line 88, in _check_reg_targets
check_consistent_length(y_true, y_pred)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 316, in check_consistent_length
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 316, in <listcomp>
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 249, in _num_samples
raise TypeError(message)
TypeError: Expected sequence or array-like, got <class 'sklearn.pipeline.Pipeline'>
"""
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-69-a9be9cc5df4a> in <module>
----> 1 rscv.fit(X_train, y_train)#, xgbr_regressor__sample_weight = X_train['Exposure'])
2 #pipe_xgbr.fit(X_train, y_train)
3 #X_train.describe(include = 'all')
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1633 evaluate_candidates(ParameterSampler(
1634 self.param_distributions, self.n_iter,
-> 1635 random_state=self.random_state))
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
807 (split_idx, (train, test)) in product(
808 enumerate(candidate_params),
--> 809 enumerate(cv.split(X, y, groups))))
810
811 if len(out) < 1:
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time
~\anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
560 AsyncResults.get from multiprocessing."""
561 try:
--> 562 return future.result(timeout=timeout)
563 except LokyTimeoutError:
564 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
433 raise CancelledError()
434 elif self._state == FINISHED:
--> 435 return self.__get_result()
436 else:
437 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
TypeError: Expected sequence or array-like, got <class 'sklearn.pipeline.Pipeline'>
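In this second traceback the scorer is invoked as scorer(estimator, X_test, y_test), whereas a metric function such as mean_squared_error expects (y_true, y_pred), so the pipeline itself ends up where an array is expected. A toy sketch of the difference between the two calling conventions (toy_mse, ConstantModel and neg_mse_scorer are hypothetical names used only for this illustration):

```python
def toy_mse(y_true, y_pred):
    """Metric convention: compares true values with predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class ConstantModel:
    """Tiny stand-in estimator that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


def neg_mse_scorer(estimator, X, y):
    """Scorer convention: receives the estimator, calls predict, negates the loss."""
    return -toy_mse(y, estimator.predict(X))


model = ConstantModel(1.0)
X = [[0], [1]]
y = [1.0, 3.0]

score = neg_mse_scorer(model, X, y)
print(score)
```

In scikit-learn, the string 'neg_mean_squared_error' or make_scorer(mean_squared_error, greater_is_better=False) gives a scorer with this calling convention.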
When setting verbose = 10 and n_jobs = 1 the following error message shows up:
Fitting 5 folds for each of 2 candidates, totalling 10 fits
[CV 1/5; 1/2] START xgb_regressor__colsample_bytree=0.5, xgb_regressor__learning_rate=0.5, xgb_regressor__max_depth=5, xgb_regressor__n_estimators=5, xgb_regressor__subsample=0.5
C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py:72: FutureWarning: Pass sample_weight=406477 1.0
393150 0.0
252885 0.0
260652 0.0
661256 0.0
...
154663 0.0
398414 0.0
42890 0.0
640774 0.0
114446 0.0
Name: frequency, Length: 108482, dtype: float64 as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
"will result in an error", FutureWarning)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-84-74435f74c470> in <module>
----> 1 rscv.fit(X_train, y_train, xgb_regressor__sample_weight = X_train['Exposure'])
2 #pipe_xgbr.fit(X_train, y_train)
3 #X_train.describe(include = 'all')
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1633 evaluate_candidates(ParameterSampler(
1634 self.param_distributions, self.n_iter,
-> 1635 random_state=self.random_state))
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
807 (split_idx, (train, test)) in product(
808 enumerate(candidate_params),
--> 809 enumerate(cv.split(X, y, groups))))
810
811 if len(out) < 1:
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1002 # remaining jobs.
1003 self._iterating = False
-> 1004 if self.dispatch_one_batch(iterator):
1005 self._iterating = self._original_iterator is not None
1006
~\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
833 return False
834 else:
--> 835 self._dispatch(tasks)
836 return True
837
~\anaconda3\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
752 with self._lock:
753 job_idx = len(self._jobs)
--> 754 job = self._backend.apply_async(batch, callback=cb)
755 # A job can complete so quickly than its callback is
756 # called before we get here, causing self._jobs to
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
207 def apply_async(self, func, callback=None):
208 """Schedule a func to be run"""
--> 209 result = ImmediateResult(func)
210 if callback:
211 callback(result)
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
588 # Don't delay the application, to avoid keeping the input
589 # arguments in memory
--> 590 self.results = batch()
591
592 def get(self):
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~\anaconda3\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~\anaconda3\lib\site-packages\sklearn\utils\fixes.py in __call__(self, *args, **kwargs)
220 def __call__(self, *args, **kwargs):
221 with config_context(**self.config):
--> 222 return self.function(*args, **kwargs)
~\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, split_progress, candidate_progress, error_score)
623
624 fit_time = time.time() - start_time
--> 625 test_scores = _score(estimator, X_test, y_test, scorer, error_score)
626 score_time = time.time() - start_time - fit_time
627 if return_train_score:
~\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _score(estimator, X_test, y_test, scorer, error_score)
685 scores = scorer(estimator, X_test)
686 else:
--> 687 scores = scorer(estimator, X_test, y_test)
688 except Exception:
689 if error_score == 'raise':
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
72 "will result in an error", FutureWarning)
73 kwargs.update(zip(sig.parameters, args))
---> 74 return f(**kwargs)
75 return inner_f
76
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
334 """
335 y_type, y_true, y_pred, multioutput = _check_reg_targets(
--> 336 y_true, y_pred, multioutput)
337 check_consistent_length(y_true, y_pred, sample_weight)
338 output_errors = np.average((y_true - y_pred) ** 2, axis=0,
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
86 the dtype argument passed to check_array.
87 """
---> 88 check_consistent_length(y_true, y_pred)
89 y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
90 y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
314 """
315
--> 316 lengths = [_num_samples(X) for X in arrays if X is not None]
317 uniques = np.unique(lengths)
318 if len(uniques) > 1:
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in <listcomp>(.0)
314 """
315
--> 316 lengths = [_num_samples(X) for X in arrays if X is not None]
317 uniques = np.unique(lengths)
318 if len(uniques) > 1:
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _num_samples(x)
247 if hasattr(x, 'fit') and callable(x.fit):
248 # Don't get num_samples from an ensembles length!
--> 249 raise TypeError(message)
250
251 if not hasattr(x, '__len__') and not hasattr(x, 'shape'):
TypeError: Expected sequence or array-like, got <class 'sklearn.pipeline.Pipeline'>
Wow, that was a mess of a traceback, but I think I've finally found it. You set scoring=mean_squared_error, and should instead use scoring="neg_mean_squared_error".
The metric function mean_squared_error has signature (y_true, y_pred, *, <kwargs>), whereas the scorer obtained by using the string "neg_mean_squared_error" has signature (estimator, X_test, y_test). So in the traceback, where you see
--> 687 scores = scorer(estimator, X_test, y_test)
it is calling mean_squared_error with y_true=estimator, y_pred=X_test, and sample_weight=y_test (the first keyword argument, which is why the FutureWarning complains about keyword arguments being passed positionally). Going deeper into the traceback, we see a check that the lengths of y_true and y_pred are consistent, but y_true is your pipeline object, hence the final error message.
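To make the signature mismatch described above concrete, here is a toy, library-free sketch. It is not scikit-learn's implementation: `MeanModel` and `make_scorer_sketch` are hypothetical names invented for illustration, and the local `mean_squared_error` is a simplified stand-in for the real metric.

```python
# Toy illustration of metric vs. scorer signatures.
# MeanModel and make_scorer_sketch are hypothetical, not scikit-learn APIs.


def mean_squared_error(y_true, y_pred):
    """Metric signature: (y_true, y_pred) -> float."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class MeanModel:
    """Stand-in estimator that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def make_scorer_sketch(metric, greater_is_better=True):
    """Wrap a metric(y_true, y_pred) into a scorer(estimator, X, y)."""
    sign = 1 if greater_is_better else -1

    def scorer(estimator, X, y):
        # The scorer, not the caller, is responsible for calling predict().
        return sign * metric(y, estimator.predict(X))

    return scorer


model = MeanModel().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
neg_mse = make_scorer_sketch(mean_squared_error, greater_is_better=False)
score = neg_mse(model, [[3], [4]], [2.0, 4.0])
```

In scikit-learn itself, the string scoring="neg_mean_squared_error" or make_scorer(mean_squared_error, greater_is_better=False) gives you a scorer with the (estimator, X, y) signature, so the search never receives the bare metric function.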
According to your error message, KeyError: 'xgbr_regressor', the code can't find a step named xgbr_regressor in your Pipeline. In your pipeline, you named the step xgb_regressor:
pipe_xgbr = Pipeline(
[('cf_trans', ct),
('ssc', StandardScaler(with_mean = False)),
('xgb_regressor', XGBRegressor())])
But when you call fit, you refer to xgbr_regressor, which is why the KeyError is thrown:
rscv.fit(X_train, y_train, xgbr_regressor__sample_weight=X_train['Exposure'])
Therefore, change xgbr_regressor__sample_weight to xgb_regressor__sample_weight in the line above, and that error should go away.
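A quick way to catch this kind of typo before a long search starts is to compare the prefix of each fit parameter against the pipeline's step names. The helper below is a minimal sketch; `check_fit_param_prefixes` is a hypothetical name, not part of scikit-learn, and the step names are copied by hand from the pipeline above.

```python
def check_fit_param_prefixes(step_names, fit_params):
    """Return the fit-param keys whose '<step>__' prefix matches no step name."""
    unknown = []
    for key in fit_params:
        step, sep, _ = key.partition("__")
        # A key with no '__' separator, or with an unknown step, cannot be routed.
        if not sep or step not in step_names:
            unknown.append(key)
    return unknown


# Step names taken from the Pipeline definition in the question.
step_names = ["cf_trans", "ssc", "xgb_regressor"]

bad = check_fit_param_prefixes(step_names, {"xgbr_regressor__sample_weight": None})
good = check_fit_param_prefixes(step_names, {"xgb_regressor__sample_weight": None})
```

With a real pipeline, the step names can be read from it with [name for name, _ in pipe_xgbr.steps] instead of typing them out.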

"Error while extracting" from tensorflow datasets

I want to train a tensorflow image segmentation model on COCO, and thought I would leverage the dataset builder already included. Download seems to be completed but it crashes on extracting the zip files.
Running TF 2.0.0 in a Jupyter Notebook under a conda environment. The computer runs 64-bit Windows 10. The Oxford-IIIT Pet dataset used in the official image segmentation tutorial works fine.
Below is the error message (my local user name replaced with %user%).
---------------------------------------------------------------------------
OutOfRangeError Traceback (most recent call last)
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\download\extractor.py in _sync_extract(self, from_path, method, to_path)
88 try:
---> 89 for path, handle in iter_archive(from_path, method):
90 path = tf.compat.as_text(path)
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\download\extractor.py in iter_zip(arch_f)
176 with _open_or_pass(arch_f) as fobj:
--> 177 z = zipfile.ZipFile(fobj)
178 for member in z.infolist():
~\.conda\envs\tf-tutorial\lib\zipfile.py in __init__(self, file, mode, compression, allowZip64)
1130 if mode == 'r':
-> 1131 self._RealGetContents()
1132 elif mode in ('w', 'x'):
~\.conda\envs\tf-tutorial\lib\zipfile.py in _RealGetContents(self)
1193 try:
-> 1194 endrec = _EndRecData(fp)
1195 except OSError:
~\.conda\envs\tf-tutorial\lib\zipfile.py in _EndRecData(fpin)
263 # Determine file size
--> 264 fpin.seek(0, 2)
265 filesize = fpin.tell()
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs)
506 instructions)
--> 507 return func(*args, **kwargs)
508
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_core\python\lib\io\file_io.py in seek(self, offset, whence, position)
166 elif whence == 2:
--> 167 offset += self.size()
168 else:
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_core\python\lib\io\file_io.py in size(self)
101 """Returns the size of the file."""
--> 102 return stat(self.__name).length
103
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_core\python\lib\io\file_io.py in stat(filename)
726 """
--> 727 return stat_v2(filename)
728
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_core\python\lib\io\file_io.py in stat_v2(path)
743 file_statistics = pywrap_tensorflow.FileStatistics()
--> 744 pywrap_tensorflow.Stat(compat.as_bytes(path), file_statistics)
745 return file_statistics
OutOfRangeError: C:\Users\%user%\tensorflow_datasets\downloads\images.cocodataset.org_zips_train20147eQIfmQL3bpVDgkOrnAQklNLVUtCsFrDPwMAuYSzF3U.zip; Unknown error
During handling of the above exception, another exception occurred:
ExtractError Traceback (most recent call last)
<ipython-input-27-887fa0198611> in <module>
1 cocoBuilder = tfds.builder('coco')
2 info = cocoBuilder.info
----> 3 cocoBuilder.download_and_prepare()
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
50 _check_no_positional(fn, args, ismethod, allowed=allowed)
51 _check_required(fn, kwargs)
---> 52 return fn(*args, **kwargs)
53
54 return disallow_positional_args_dec(wrapped) # pylint: disable=no-value-for-parameter
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in download_and_prepare(self, download_dir, download_config)
285 self._download_and_prepare(
286 dl_manager=dl_manager,
--> 287 download_config=download_config)
288
289 # NOTE: If modifying the lines below to put additional information in
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in _download_and_prepare(self, dl_manager, download_config)
946 super(GeneratorBasedBuilder, self)._download_and_prepare(
947 dl_manager=dl_manager,
--> 948 max_examples_per_split=download_config.max_examples_per_split,
949 )
950
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in _download_and_prepare(self, dl_manager, **prepare_split_kwargs)
802 # Generating data for all splits
803 split_dict = splits_lib.SplitDict()
--> 804 for split_generator in self._split_generators(dl_manager):
805 if splits_lib.Split.ALL == split_generator.split_info.name:
806 raise ValueError(
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\image\coco.py in _split_generators(self, dl_manager)
237 root_url = 'http://images.cocodataset.org/'
238 extracted_paths = dl_manager.download_and_extract({
--> 239 key: root_url + url for key, url in urls.items()
240 })
241
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\download\download_manager.py in download_and_extract(self, url_or_urls)
357 with self._downloader.tqdm():
358 with self._extractor.tqdm():
--> 359 return _map_promise(self._download_extract, url_or_urls)
360
361 @property
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\download\download_manager.py in _map_promise(map_fn, all_inputs)
393 """Map the function into each element and resolve the promise."""
394 all_promises = utils.map_nested(map_fn, all_inputs) # Apply the function
--> 395 res = utils.map_nested(_wait_on_promise, all_promises)
396 return res
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\utils\py_utils.py in map_nested(function, data_struct, dict_only, map_tuple)
127 return {
128 k: map_nested(function, v, dict_only, map_tuple)
--> 129 for k, v in data_struct.items()
130 }
131 elif not dict_only:
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\utils\py_utils.py in <dictcomp>(.0)
127 return {
128 k: map_nested(function, v, dict_only, map_tuple)
--> 129 for k, v in data_struct.items()
130 }
131 elif not dict_only:
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\utils\py_utils.py in map_nested(function, data_struct, dict_only, map_tuple)
141 return tuple(mapped)
142 # Singleton
--> 143 return function(data_struct)
144
145
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\download\download_manager.py in _wait_on_promise(p)
377
378 def _wait_on_promise(p):
--> 379 return p.get()
380
381 else:
~\.conda\envs\tf-tutorial\lib\site-packages\promise\promise.py in get(self, timeout)
508 target = self._target()
509 self._wait(timeout or DEFAULT_TIMEOUT)
--> 510 return self._target_settled_value(_raise=True)
511
512 def _target_settled_value(self, _raise=False):
~\.conda\envs\tf-tutorial\lib\site-packages\promise\promise.py in _target_settled_value(self, _raise)
512 def _target_settled_value(self, _raise=False):
513 # type: (bool) -> Any
--> 514 return self._target()._settled_value(_raise)
515
516 _value = _reason = _target_settled_value
~\.conda\envs\tf-tutorial\lib\site-packages\promise\promise.py in _settled_value(self, _raise)
222 if _raise:
223 raise_val = self._fulfillment_handler0
--> 224 reraise(type(raise_val), raise_val, self._traceback)
225 return self._fulfillment_handler0
226
~\.conda\envs\tf-tutorial\lib\site-packages\six.py in reraise(tp, value, tb)
694 if value.__traceback__ is not tb:
695 raise value.with_traceback(tb)
--> 696 raise value
697 finally:
698 value = None
~\.conda\envs\tf-tutorial\lib\site-packages\promise\promise.py in handle_future_result(future)
840 # type: (Any) -> None
841 try:
--> 842 resolve(future.result())
843 except Exception as e:
844 tb = exc_info()[2]
~\.conda\envs\tf-tutorial\lib\concurrent\futures\_base.py in result(self, timeout)
423 raise CancelledError()
424 elif self._state == FINISHED:
--> 425 return self.__get_result()
426
427 self._condition.wait(timeout)
~\.conda\envs\tf-tutorial\lib\concurrent\futures\_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
~\.conda\envs\tf-tutorial\lib\concurrent\futures\thread.py in run(self)
54
55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
57 except BaseException as exc:
58 self.future.set_exception(exc)
~\.conda\envs\tf-tutorial\lib\site-packages\tensorflow_datasets\core\download\extractor.py in _sync_extract(self, from_path, method, to_path)
92 except BaseException as err:
93 msg = 'Error while extracting %s to %s : %s' % (from_path, to_path, err)
---> 94 raise ExtractError(msg)
95 # `tf.io.gfile.Rename(overwrite=True)` doesn't work for non empty
96 # directories, so delete destination first, if it already exists.
ExtractError: Error while extracting C:\Users\%user%\tensorflow_datasets\downloads\images.cocodataset.org_zips_train20147eQIfmQL3bpVDgkOrnAQklNLVUtCsFrDPwMAuYSzF3U.zip to C:\Users\%user%\tensorflow_datasets\downloads\extracted\ZIP.images.cocodataset.org_zips_train20147eQIfmQL3bpVDgkOrnAQklNLVUtCsFrDPwMAuYSzF3U.zip : C:\Users\%user%\tensorflow_datasets\downloads\images.cocodataset.org_zips_train20147eQIfmQL3bpVDgkOrnAQklNLVUtCsFrDPwMAuYSzF3U.zip; Unknown error
The message seems cryptic to me. The folder it is trying to extract to does not exist when the notebook is started; it is created by TensorFlow, and only when that command runs. I tried deleting it completely and running the command again, to no effect.
The code that leads to the error is (everything runs fine until the last line):
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow_examples.models.pix2pix import pix2pix
import tensorflow_datasets as tfds
from IPython.display import clear_output
import matplotlib.pyplot as plt
dataset, info = tfds.load('coco', with_info=True)
I also tried breaking the last command down into creating the tfds.builder object and then running download_and_prepare, and got the same error.
There is enough space on disk: after the download, 50+ GB are still available, while the dataset is supposed to be 37 GB in its largest version (2014).
I had a similar problem on Windows 10 with COCO 2017. My solution was simple: extract the ZIP file manually into the folder path given in the error message.
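If you would rather do the manual extraction from Python than from a file manager, the standard library's zipfile module can do it. This is only a sketch of the workaround described above; `extract_zip_manually` is a hypothetical helper name, and whether tensorflow_datasets then picks up the extracted folder on the next run is what the workaround assumes, not something verified here.

```python
import zipfile
from pathlib import Path


def extract_zip_manually(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the number of archive members."""
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    # Fail early if the download is truncated or is not a ZIP file at all.
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a readable ZIP archive")
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return len(archive.namelist())
```

Call it with the two paths from the ExtractError message: the downloaded .zip file as zip_path, and the folder under downloads\extracted as dest_dir.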
