How can I use CUDA with vaex (a Python library)?

My code is as follows:
df['O_ID'] = (df.apply(get_match_id, arguments=[df['pickup_longitude'], df['pickup_latitude']])).jit_cuda()
When I first used the jit_cuda() function, there was an error: "No module named cupy".
But after installing cupy-cuda101 (matching my CUDA version),
I get a new error:
Traceback (most recent call last):
File "F:\Anaconda3\lib\site-packages\vaex\dataframe.py", line 3580, in table_part
values[name] = df.evaluate(name)
File "F:\Anaconda3\lib\site-packages\vaex\dataframe.py", line 2616, in evaluate
return self._evaluate_implementation(expression, i1=i1, i2=i2, out=out, selection=selection, filtered=filtered, internal=internal, parallel=parallel, chunk_size=chunk_size)
File "F:\Anaconda3\lib\site-packages\vaex\dataframe.py", line 5352, in _evaluate_implementation
dtypes[expression] = df.data_type(expression, internal=False)
File "F:\Anaconda3\lib\site-packages\vaex\dataframe.py", line 1998, in data_type
data = self.evaluate(expression, 0, 1, filtered=True, internal=True, parallel=False)
File "F:\Anaconda3\lib\site-packages\vaex\dataframe.py", line 2616, in evaluate
return self._evaluate_implementation(expression, i1=i1, i2=i2, out=out, selection=selection, filtered=filtered, internal=internal, parallel=parallel, chunk_size=chunk_size)
File "F:\Anaconda3\lib\site-packages\vaex\dataframe.py", line 5427, in _evaluate_implementation
value = scope.evaluate(expression)
File "F:\Anaconda3\lib\site-packages\vaex\scopes.py", line 97, in evaluate
result = self[expression]
File "F:\Anaconda3\lib\site-packages\vaex\scopes.py", line 139, in __getitem__
self.values[variable] = self.evaluate(expression) # , out=self.buffers[variable])
File "F:\Anaconda3\lib\site-packages\vaex\scopes.py", line 103, in evaluate
result = eval(expression, expression_namespace, self)
File "<string>", line 1, in <module>
File "F:\Anaconda3\lib\site-packages\vaex\expression.py", line 1073, in __call__
return self.f(*args, **kwargs)
File "F:\Anaconda3\lib\site-packages\vaex\expression.py", line 1120, in wrapper
return cupy.asnumpy(func(*args))
File "cupy\core\fusion.pyx", line 905, in cupy.core.fusion.Fusion.__call__
File "cupy\core\fusion.pyx", line 754, in cupy.core.fusion._FusionHistory.get_fusion
File "<string>", line 6, in f
NameError: name 'lambda_function_1' is not defined
How should I solve it?

My understanding is that just-in-time compilation in vaex works only for virtual columns, i.e. expressions computed with numpy-style arithmetic operations or pure-Python arithmetic.
When using apply, the function can be quite abstract, basically whatever you want, so it may not be possible to compile it.
If you can rewrite your .apply function using numpy expressions, you should be able to use the jit_cuda method to accelerate it (see the sketch below). Vaex discourages using .apply anyway, since it is hard to parallelize and should be treated as a "last resort" of sorts.
Source: https://vaex.io/docs/tutorial.html#Just-In-Time-compilation
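For illustration, here is a minimal sketch of such a rewrite, using vaex's built-in example dataset and a simple distance expression as a stand-in for the original get_match_id logic (names here are hypothetical):
import numpy as np
import vaex

df = vaex.example()  # built-in demo dataframe with numeric columns x, y, z

# build the computation as a virtual column from numpy-style arithmetic
# on vaex expressions, instead of an opaque Python function via .apply
df['r'] = np.sqrt(df.x**2 + df.y**2 + df.z**2)

# expressions built this way can be JIT-compiled for the GPU
df['r_cuda'] = df.r.jit_cuda()  # requires a working cupy installation
print(df[['r', 'r_cuda']].head(5))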

Related

Can pickle/dill `foo` but not `lambda x: foo(x)`

I am applying some preprocessing to the CIFAR100 dataset:
from datasets.load import load_dataset
from datasets import Features, Array3D
from transformers.models.vit.feature_extraction_vit import ViTFeatureExtractor

# Resampling & Normalization
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
dataset = load_dataset('cifar100', split='train[:100]')
features = Features({
    'pixel_values': Array3D(dtype="float32", shape=(3, 224, 224)),
    **dataset.features,
})
dataset = dataset.map(lambda batch, col_name: feature_extractor(batch[col_name]),
                      features=features, fn_kwargs={'col_name': 'img'}, batched=True)
I got the following warning, which means datasets cannot cache the transformed dataset.
Reusing dataset cifar100 (/home/qys/.cache/huggingface/datasets/cifar100/cifar100/1.0.0/f365c8b725c23e8f0f8d725c3641234d9331cd2f62919d1381d1baa5b3ba3142)
Parameter 'function'=<function <lambda> at 0x7f3279f3eef0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Curiously, I can pickle/dill foo, but not lambda x: foo(x), despite the fact that they have exactly the same effect. I guess that's related to the problem?
>>> def foo(x): return x + 1
...
>>> Hasher.hash(foo)
'ff7fae499aa1d820'
>>> Hasher.hash(lambda x: foo(x))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 237, in hash
return cls.hash_default(value)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 230, in hash_default
return cls.hash_bytes(dumps(value))
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 564, in dumps
dump(obj, file)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 539, in dump
Pickler(file, recurse=True).dump(obj)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 620, in dump
StockPickler.dump(self, obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 487, in dump
self.save(obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 862, in save_function
dill._dill._save_with_postproc(
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 1153, in _save_with_postproc
pickler.write(pickler.get(pickler.memo[id(dest)][0]))
KeyError: 139847629663936
I have also tried making the function accessible from the top level of a module, i.e.
preprocessor = lambda batch: feature_extractor(batch['img'])
dataset = dataset.map(preprocessor, features=features, batched=True)
However, it still doesn't work:
>>> from datasets.fingerprint import Hasher
>>> preprocessor = lambda batch: feature_extractor(batch['img'])
>>> Hasher.hash(preprocessor)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 237, in hash
return cls.hash_default(value)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 230, in hash_default
return cls.hash_bytes(dumps(value))
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 564, in dumps
dump(obj, file)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 539, in dump
Pickler(file, recurse=True).dump(obj)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 620, in dump
StockPickler.dump(self, obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 487, in dump
self.save(obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 862, in save_function
dill._dill._save_with_postproc(
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 1153, in _save_with_postproc
pickler.write(pickler.get(pickler.memo[id(dest)][0]))
KeyError: 140408024252096
In Python 3.9, pickle hashes the glob_ids dictionary in addition to the globs of a function. To make hashing deterministic when the globals are not in the same order, the order of glob_ids needs to be made deterministic. PR to fix: https://github.com/huggingface/datasets/pull/4516
(Until merged, a temporary fix is to use an older version of dill:
pip install "dill<0.3.5"
see https://github.com/huggingface/datasets/issues/4506#issuecomment-1157417219)
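Another workaround sketch in the meantime (names are hypothetical, and it assumes the extractor object itself pickles cleanly): move the logic into a named, module-level function that references no globals, passing everything through fn_kwargs:
def preprocess(batch, col_name, extractor):
    # no global references here, so there is no globals dict to hash
    return extractor(batch[col_name])

dataset = dataset.map(preprocess, features=features, batched=True,
                      fn_kwargs={'col_name': 'img', 'extractor': feature_extractor})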

Multiprocessing; How to debug: _pickle.PicklingError: Could not pickle object as excessively deep recursion required

I have a simulation which I can run using Python code, and I want to create multiple instances of it using a SubprocVecEnv from stable-baselines3. This uses subprocesses to run the simulations on different cores, and it was working before I made a number of changes to my code. However, now I receive the error below and do not know how to debug it, because I don't understand which part of my code is causing it. Is there a way to find out which object/method is causing the recursion depth to be exceeded? I don't remember writing a recursive method anywhere in my code, and researching the error message was not successful.
/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
Traceback (most recent call last):
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
return Pickler.dump(self, obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 639, in reducer_override
if sys.version_info[:2] < (3, 7) and _is_parametrized_type_hint(obj): # noqa # pragma: no branch
RecursionError: maximum recursion depth exceeded in comparison
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/philipp/Code/ba_pw/train.py", line 84, in <module>
venv = utils.make_venv(env_class, network, params, remote_ports, monitor_log_dir)
File "/home/philipp/Code/ba_pw/sumo_rl/utils/utils.py", line 170, in make_venv
return vec_env.SubprocVecEnv(env_fs)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 106, in __init__
process.start()
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/context.py", line 291, in _Popen
return Popen(process_obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/popen_forkserver.py", line 47, in _launch
reduction.dump(process_obj, buf)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 372, in __getstate__
return cloudpickle.dumps(self.var)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 570, in dump
raise pickle.PicklingError(msg) from e
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.
I finally figured out a solution using the answer to this question:
It looks like the object I want to pickle has too many layers. I called:
sys.setrecursionlimit(3000)
and now it works.
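To actually locate the culprit, one approach (a sketch, not part of the original answer; the stdlib pickle is used here as a stand-in for cloudpickle) is to try pickling each attribute of the suspect object separately:
import pickle

def find_unpicklable(obj):
    # pickle each attribute on its own to narrow down which one
    # requires excessively deep recursion
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:
            print(name, type(exc).__name__, exc)

# usage (hypothetical): find_unpicklable(my_env_instance)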

Searching for a string with Web.py

I'm trying to build a python function with web.py and SQLite that will allow users to search for a given string within a description field and will return all matching results.
Right now I've gotten to the below function, which works but only if the input is an exact match.
def getItem(params, max_display):
    query_string = 'SELECT * FROM items WHERE 1=1'
    description = params['description']
    if params['description']:
        query_string = query_string + ' AND description LIKE $description'
    result = query(query_string, {
        'description': params['description']
    })
I've tried to implement this feature with LIKE "%$description%", however I keep getting the web.py error below.
Traceback (most recent call last):
File "lib/web/wsgiserver/__init__.py", line 1245, in communicate
req.respond()
File "lib/web/wsgiserver/__init__.py", line 775, in respond
self.server.gateway(self).respond()
File "lib/web/wsgiserver/__init__.py", line 2018, in respond
response = self.req.server.wsgi_app(self.env, self.start_response)
File "lib/web/httpserver.py", line 306, in __call__
return self.app(environ, xstart_response)
File "lib/web/httpserver.py", line 274, in __call__
return self.app(environ, start_response)
File "lib/web/application.py", line 279, in wsgi
result = self.handle_with_processors()
File "lib/web/application.py", line 249, in handle_with_processors
return process(self.processors)
File "lib/web/application.py", line 246, in process
raise self.internalerror()
File "lib/web/application.py", line 478, in internalerror
return debugerror.debugerror()
File "lib/web/debugerror.py", line 305, in debugerror
return web._InternalError(djangoerror())
File "lib/web/debugerror.py", line 290, in djangoerror
djangoerror_r = Template(djangoerror_t, filename=__file__, filter=websafe)
File "lib/web/template.py", line 846, in __init__
code = self.compile_template(text, filename)
File "lib/web/template.py", line 926, in compile_template
ast = compiler.parse(code)
File "/Users/sokeefe/homebrew/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/compiler/transformer.py", line 51, in parse
return Transformer().parsesuite(buf)
File "/Users/sokeefe/homebrew/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/compiler/transformer.py", line 128, in parsesuite
return self.transform(parser.suite(text))
AttributeError: 'module' object has no attribute 'suite'
Any thoughts on what might be going wrong with this function?
Thanks in advance!
What do you think is going on with parser.py?
Here is the relevant portion of the error message:
File "/Users/sokeefe/homebrew/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/compiler/transformer.py", line 128, in parsesuite
    return self.transform(parser.suite(text))
AttributeError: 'module' object has no attribute 'suite'
So somewhere there is a module named parser that defines a function called suite(), and it is used by library code that runs when your program executes. Because you named one of your own files parser.py, Python found your file first when that library code did its import, and your file has no function named suite().
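A quick way to confirm the shadowing (a sketch; the getattr guards against the stdlib version being a built-in module without a __file__):
import parser

# if this prints a path inside your project, your own parser.py is
# shadowing the stdlib module that web.py's template compiler relies on
print(getattr(parser, '__file__', '<built-in module>'))
print(hasattr(parser, 'suite'))  # True for the stdlib parser in Python 2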

Create HDF5 file using pytables with table format and data columns

I want to read a h5 file previously created with PyTables.
The file is read using Pandas, and with some conditions, like this:
pd.read_hdf('myH5file.h5', 'anyTable', where='some_conditions')
From another question, I have been told that, in order for a h5 file to be "queryable" with read_hdf's where argument, it must be written in table format and, in addition, some columns must be declared as data columns.
I cannot find anything about it in PyTables documentation.
The documentation on PyTable's create_table method does not indicate anything about it.
So, right now, if I try to use something like that on my h5 file created with PyTables I get the following:
>>> d = pd.read_hdf('test_file.h5','basic_data', where='operation==1')
C:\Python27\lib\site-packages\pandas\io\pytables.py:3070: IncompatibilityWarning:
where criteria is being ignored as this version [0.0.0] is too old (or
not-defined), read the file in and write it out to a new file to upgrade (with
the copy_to method)
warnings.warn(ws, IncompatibilityWarning)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 323, in read_hdf
return f(store, True)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 305, in <lambda>
key, auto_close=auto_close, **kwargs)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 665, in select
return it.get_result()
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 1359, in get_result
results = self.func(self.start, self.stop, where)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 658, in func
columns=columns, **kwargs)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3968, in read
if not self.read_axes(where=where, **kwargs):
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3196, in read_axes
values = self.selection.select()
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 4482, in select
start=self.start, stop=self.stop)
File "C:\Python27\lib\site-packages\tables\table.py", line 1567, in read_where
self._where(condition, condvars, start, stop, step)]
File "C:\Python27\lib\site-packages\tables\table.py", line 1528, in _where
compiled = self._compile_condition(condition, condvars)
File "C:\Python27\lib\site-packages\tables\table.py", line 1366, in _compile_condition
compiled = compile_condition(condition, typemap, indexedcols)
File "C:\Python27\lib\site-packages\tables\conditions.py", line 430, in compile_condition
raise _unsupported_operation_error(nie)
NotImplementedError: unsupported operand types for *eq*: int, bytes
EDIT:
The traceback mentions something about IncompatibilityWarning and version [0.0.0], however if I check my versions of Pandas and Tables I get:
>>> import pandas
>>> pandas.__version__
'0.15.2'
>>> import tables
>>> tables.__version__
'3.1.1'
So, I am totally confused.
I had the same issue, and this is what I have done:
1. Create an HDF5 file with PyTables;
2. Read this HDF5 file with pandas.read_hdf, using parameters like where=where_string, columns=selected_columns.
I got a warning message like the one below, plus other error messages:
D:\Program Files\Anaconda3\lib\site-packages\pandas\io\pytables.py:3065: IncompatibilityWarning: where criteria is being ignored as this version [0.0.0] is too old (or not-defined), read the file in and write it out to a new file to upgrade (with the copy_to method)
warnings.warn(ws, IncompatibilityWarning)
I tried commands like this:
hdf5_store = pd.HDFStore(hdf5_file, mode = 'r')
h5cpt_store_new = hdf5_store.copy(hdf5_new_file, complevel=9, complib='blosc')
h5cpt_store_new.close()
Then I ran the command exactly as in step 2, and it works.
>>> pandas.__version__
'0.17.1'
>>> tables.__version__
'3.2.2'
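For completeness: if you can write the file through pandas itself rather than raw PyTables, the table format and the data columns can be declared at write time, which makes where= queries work directly. A minimal sketch (file and column names are hypothetical):
import pandas as pd

df = pd.DataFrame({'operation': [1, 2, 1], 'value': [0.1, 0.2, 0.3]})
# format='table' plus data_columns makes 'operation' queryable with where=
df.to_hdf('test_file.h5', 'basic_data', format='table',
          data_columns=['operation'])

d = pd.read_hdf('test_file.h5', 'basic_data', where='operation==1')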

pyramid + jinja2 and new GAE runtime

I am trying to run Pyramid with Jinja2 using the new Python 2.7 runtime in threadsafe mode and the GAE 1.6.0 pre-release SDK. I've made modifications to my app as outlined here, i.e. I've set runtime: python27, threadsafe: true in app.yaml and got rid of the main() function. When I generate the response myself it works fine, but when I try to bring jinja2 into the equation, I get the following exception:
ERROR 2011-11-07 00:10:34,356 wsgi.py:170]
Traceback (most recent call last):
File "/gae/google/appengine/runtime/wsgi.py", line 168, in Handle
[...]
File "/myapp/source/myapp-tip/main.py", line 29, in <module>
config.include('pyramid_jinja2')
File "/myapp/source/myapp-tip/lib/dist/pyramid/config/__init__.py", line 616, in include
c(configurator)
File "lib/dist/pyramid_jinja2/__init__.py", line 390, in includeme
_get_or_build_default_environment(config.registry)
File "/lib/dist/pyramid_jinja2/__init__.py", line 217, in _get_or_build_default_environment
_setup_environment(registry)
File "/lib/dist/pyramid_jinja2/__init__.py", line 253, in _setup_environment
package = _caller_package(('pyramid_jinja2', 'jinja2', 'pyramid.config'))
File "/lib/dist/pyramid_jinja2/__init__.py", line 136, in caller_package
for t in self.inspect.stack():
File "/usr/lib/python2.7/inspect.py", line 1056, in stack
return getouterframes(sys._getframe(1), context)
File "/usr/lib/python2.7/inspect.py", line 1034, in getouterframes
framelist.append((frame,) + getframeinfo(frame, context))
File "/usr/lib/python2.7/inspect.py", line 1009, in getframeinfo
lines, lnum = findsource(frame)
File "/usr/lib/python2.7/inspect.py", line 534, in findsource
module = getmodule(object, file)
File "/usr/lib/python2.7/inspect.py", line 506, in getmodule
main = sys.modules['__main__']
KeyError: '__main__'
I tried to mess around a bit with pyramid_jinja2 code to work around this issue, only to be left with another exception:
ERROR 2011-11-04 12:06:38,720 wsgi.py:170]
Traceback (most recent call last):
File "/gae/google/appengine/runtime/wsgi.py", line 168, in Handle
handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
[...]
File "/myapp/source/myapp-tip/main.py", line 29, in <module>
config.add_jinja2_search_path("templates")
File "/myapp/source/myapp-tip/lib/dist/pyramid/config/util.py", line 28, in wrapper
result = wrapped(self, *arg, **kw)
File "/lib/dist/pyramid_jinja2/__init__.py", line 311, in add_jinja2_search_path
env.loader.searchpath.append(abspath_from_resource_spec(d))
File "/myapp/source/myapp-tip/lib/dist/pyramid/asset.py", line 38, in abspath_from_asset_spec
return pkg_resources.resource_filename(pname, filename)
File "/myapp/source/myapp-tip/pkg_resources.py", line 840, in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "/myapp/source/myapp-tip/pkg_resources.py", line 160, in get_provider
__import__(moduleOrReq)
File "/gae/google/appengine/tools/dev_appserver_import_hook.py", line 640, in Decorate
return func(self, *args, **kwargs)
File "/gae/google/appengine/tools/dev_appserver_import_hook.py", line 1756, in load_module
return self.FindAndLoadModule(submodule, fullname, search_path)
File "/gae/google/appengine/tools/dev_appserver_import_hook.py", line 640, in Decorate
return func(self, *args, **kwargs)
File "/gae/google/appengine/tools/dev_appserver_import_hook.py", line 1628, in FindAndLoadModule
description)
File "/gae/google/appengine/tools/dev_appserver_import_hook.py", line 640, in Decorate
return func(self, *args, **kwargs)
File "/gae/google/appengine/tools/dev_appserver_import_hook.py", line 1571, in LoadModuleRestricted
description)
ImportError: Cannot re-init internal module __main__
I'd be happy if anybody could shed some light on what pyramid is trying to do under the hood. Judging by the latter stack trace it seems it's trying to resolve an asset, but why is it trying to reload __main__? I'm not even sure my problem is caused by pyramid or GAE.
Thanks for any insight on this issue.
I'm not familiar with pyramid, but the problem really does seem to be with this line:
config.include('pyramid_jinja2')
Whatever that config thing is, it seems to be doing some dynamic import magic.
Don't do that.
The App Engine environment doesn't handle imports the way normal Python does. Step through that line with a debugger and you'll wind up in the replacement version of the import system, which, you'll soon see, implements only a small part of what real Python does.
If possible, just use a normal import statement... Otherwise, you're going to have to dig into config.include and get it to play nice with the restricted importing features on GAE.
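For example, a sketch of that suggestion (untested on GAE; Pyramid's include() also accepts an already-imported module object, so no dotted-name resolution has to happen at configuration time):
from pyramid.config import Configurator
import pyramid_jinja2  # plain top-level import, visible to GAE's loader

config = Configurator()
config.include(pyramid_jinja2)  # pass the module itself, not the dotted name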
I managed to make it work using Pyramid 1.3's AssetResolver. First attempt is here. Just not sure what the lifetime/scope of the resolver should be in this case, I will figure it out later.
In pyramid_jinja2/__init__.py, add the following code before _get_or_build_default_environment():
class VirtualModule(object):
    def __init__(self, name):
        import sys
        # register this object as a stand-in module so that
        # sys.modules['__main__'] lookups (as in inspect.getmodule) succeed
        sys.modules[name] = self

    def __getattr__(self, name):
        return globals()[name]

VirtualModule("__main__")

def _get_or_build_default_environment(registry):
(http://www.inductiveautomation.com/forum/viewtopic.php?f=70&p=36917)
