Please help me write a custom partitioner function in Python for Spark.
I have a file describing the mapping between each entry's data key and a partition id. I first load it into a dict variable "data_to_partition_map" in main.py,
then in Spark
sc.parallelize(input_lines).partitionBy(numPartitions=xx, partitionFunc=lambda x : data_to_partition_map[x])
When I run this code locally, it gives this error:
Traceback (most recent call last):
File "/home/weiyu/workspace/dice/process_platform_spark/process/roadCompile/main.py", line 111, in <module>
.partitionBy(numPartitions=tile_partitioner.num_partitions, partitionFunc=lambda x: tile_tasks_in_partitions[x])
File "/home/weiyu/app/odps-spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1785, in partitionBy
File "/home/weiyu/app/odps-spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1392, in __call__
File "/home/weiyu/app/odps-spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 289, in get_command_part
AttributeError: 'function' object has no attribute '_get_object_id'
It seems Spark cannot serialize the lambda object. Does anyone have any idea about this error, and how can I fix it? Thanks very much.
Have you tried using a named function that simply returns the dict item, and passing it as the partition function?

def return_key(x):
    return your_dict[x]

Pass it as partitionFunc.
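As a sketch of that idea (the keys and mapping below are hypothetical stand-ins for the file loaded in main.py), with a fallback for keys missing from the dict so the partitioner never raises KeyError:

```python
# Hypothetical mapping loaded from the key -> partition-id file in main.py
data_to_partition_map = {"key_a": 0, "key_b": 1, "key_c": 1}

def partition_for(key):
    # Fall back to partition 0 for keys missing from the mapping
    return data_to_partition_map.get(key, 0)

# In Spark this would then be passed as:
#   sc.parallelize(pairs).partitionBy(num_partitions, partition_for)
print([partition_for(k) for k in ["key_a", "key_b", "key_x"]])  # → [0, 1, 0]
```

Note that partitionBy operates on an RDD of (key, value) pairs, so each element passed to parallelize needs to be a 2-tuple whose first element is what the partition function receives.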
I am really new to ontologies, especially owlready2. I loaded the basic Pizza example ontology and imported it, I think, successfully in Python (I checked that I can see its classes, which I can).
Then I used the following code to search for one specific class with the search() method:
from owlready2 import *
onto_path.append(r"C:/Users/AyselenKuru/Desktop/owl_docs/owlpizza.owl")
onto=get_ontology(r"C:/Users/AyselenKuru/Desktop/owl_docs/owlpizza.owl")
onto.load()
am= onto.search_one(is_a= onto.American)
for x in onto.classes():
print(x)
I want to know how I can search for/get one specific class and attribute. I get the following error message:
Traceback (most recent call last):
File "c:\Users\AyselenKuru\Desktop\pizza_ex1.py", line 6, in <module>
am= onto.search_one(is_a = onto.American)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AyselenKuru\AppData\Local\Programs\Python\Python311\Lib\site-packages\owlready2\namespace.py", line 395, in search_one
def search_one(self, **kargs): return self.search(**kargs).first()
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AyselenKuru\AppData\Local\Programs\Python\Python311\Lib\site-packages\owlready2\namespace.py", line 364, in search
else: v2 = v.storid
^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'storid'
The problem solved itself: the example OWL file had errors in its IRIs, which caused onto.American to resolve to None and broke the search.
I am trying to execute a UDF and it returns an error:
from pyspark.sql.functions import udf, col

mytab = spark.read.jdbc(url=jdbcUrl, table="mytab", properties=connectionProperties)

def buscarx(Alm_r, Pro, Data_mat):
    data_s = mytab.where(col("doc") == Data_mat).where(col("alm") != Alm_r).limit(1)
    if data_s.count() == 0:
        return Pro
    else:
        temp = "0"
        for item in data_s.collect():
            temp = item.Alm
        return temp

buscarx_udf = udf(buscarx)
df_temp = mytab.withColumn("alm_origen", buscarx_udf(mytab.Alm, mytab.Proveedor, mytab.Doc_mat))
Error:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 473, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 563, in dump
return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.RLock' object
PicklingError: Could not serialize object: TypeError: cannot pickle '_thread.RLock' object
I ran some tests and found that the problem is caused by:
data_s = mytab.where(col("doc")==Data_mat).where(col("alm")!=Alm_r).limit(1)
Any suggestions to fix this? I need to perform a query within the function.
A user-defined function operates on the data inside a DataFrame's rows, not on a DataFrame as a whole the way Spark SQL operations do. Hence you cannot call DataFrame methods such as where or filter inside the UDF: the function would have to capture mytab (and with it the SparkSession) and be pickled to the executors, which is exactly what fails with the "cannot pickle '_thread.RLock' object" error.
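One common workaround is to express the per-row query as a self-join on the document column instead of a lookup inside the UDF. Here the intended row-level semantics are sketched in plain Python over hypothetical sample rows (column names mirror the question; an actual Spark version would be a DataFrame self-join rather than this loop):

```python
# Hypothetical sample rows standing in for mytab; in Spark this logic
# would be expressed as a self-join on the doc column, not a UDF.
rows = [
    {"Doc_mat": "D1", "Alm": "A1", "Proveedor": "P1"},
    {"Doc_mat": "D1", "Alm": "A2", "Proveedor": "P2"},
    {"Doc_mat": "D2", "Alm": "A1", "Proveedor": "P3"},
]

def alm_origen(row, all_rows):
    # Find another row with the same document but a different warehouse.
    for other in all_rows:
        if other["Doc_mat"] == row["Doc_mat"] and other["Alm"] != row["Alm"]:
            return other["Alm"]
    # No alternative warehouse found: fall back to the supplier.
    return row["Proveedor"]

print([alm_origen(r, rows) for r in rows])  # → ['A2', 'A1', 'P3']
```

In Spark, the same result could be obtained by joining mytab with an aliased copy of itself on the document column, keeping matches with a different Alm, and using coalesce to fall back to Proveedor when no match exists.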
I'm trying to use statsmodels.tsa.x13 with Python 3.6 (Anaconda/Spyder). I've already installed x13as and wrote this code:
X13PATH= os.chdir("C:\\x13\WinX13\\x13as")
x13results = x13_arima_analysis(endog = mb["G"], x12path=X13PATH, outlier=True,print_stdout=True)
where mb["G"] is a pandas.core.series.Series. The result is the following:
C:\Anaconda\lib\site-packages\statsmodels\tsa\x13.py:460: IOWarning: Failed to delete resource C:\Users\SERGEY~1\AppData\Local\Temp\tmp2iwvb0uo.spc
IOWarning)
C:\Anaconda\lib\site-packages\statsmodels\tsa\x13.py:463: IOWarning: Failed to delete resource C:\Users\SERGEY~1\AppData\Local\Temp\tmp_h3vwxc9
IOWarning)
Traceback (most recent call last):
File "<ipython-input-3-8e98768a4534>", line 2, in <module>
x13results = x13_arima_analysis(endog = mb["G"], x12path=X13PATH, outlier=True,print_stdout=True)
File "C:\Anaconda\lib\site-packages\statsmodels\tsa\x13.py", line 434, in x13_arima_analysis
ftempin.write(spec)
File "C:\Anaconda\lib\tempfile.py", line 483, in func_wrapper
return func(*args, **kwargs)
TypeError: a bytes-like object is required, not 'str'
What's the problem? I will be grateful for any help.
You need to pass the path as a string: os.chdir changes the working directory and returns None, so X13PATH ends up being None. Change
X13PATH = os.chdir("C:\\x13\\WinX13\\x13as")
to
X13PATH = "C:\\x13\\WinX13\\x13as"
From the statsmodels docs: "x12path (str or None) – The path to x12 or x13 binary. If None, the program will attempt to find x13as or x12a on the PATH or by looking at X13PATH or X12PATH depending on the value of prefer_x13."
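The root cause is easy to demonstrate: os.chdir changes the working directory purely as a side effect and returns None, so the original assignment bound X13PATH to None rather than a path string. A minimal check:

```python
import os
import tempfile

cwd = os.getcwd()
ret = os.chdir(tempfile.gettempdir())  # changes directory as a side effect
print(ret)  # → None
os.chdir(cwd)  # restore the original working directory
```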
I was coding something at work and it seems that some C API functions provided by Python are not working. I mainly tried the functions that check types, for example:
import ctypes
python33_dll = ctypes.CDLL('python33.dll')
a_float = python33_dll.PyFloat_FromDouble(ctypes.c_float(2.0))
python33_dll.PyFloat_Check(a_float)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
python33_dll.PyFloat_Check(a_float)
File "C:\Python33\lib\ctypes\__init__.py", line 366, in __getattr__
func = self.__getitem__(name)
File "C:\Python33\lib\ctypes\__init__.py", line 371, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: function 'PyFloat_Check' not found
Is there anything specific I need to do to use this function, or is it a bug?
docs.python.org/3.3/c-api/float.html?highlight=double#PyFloat_Check
PyFloat_Check() is a macro, so it is not an exported symbol in the DLL. You will need to expand it manually and call a real exported function instead.
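A sketch of one such manual expansion, using ctypes.pythonapi for the running interpreter rather than CDLL('python33.dll') (pythonapi is a PyDLL, which keeps the GIL held during calls, as the Python C API requires): PyFloat_Check(x) boils down to a type check against PyFloat_Type, and the builtin float is that very same type object, so the exported PyObject_IsInstance can stand in for the macro:

```python
import ctypes

# ctypes.pythonapi is a PyDLL handle to the running interpreter's C API.
api = ctypes.pythonapi

# Without these, ctypes assumes an int return and would truncate the
# PyObject pointer returned by PyFloat_FromDouble on 64-bit builds.
api.PyFloat_FromDouble.restype = ctypes.py_object
api.PyFloat_FromDouble.argtypes = [ctypes.c_double]

val = api.PyFloat_FromDouble(2.0)

# Manual expansion of the PyFloat_Check macro: an isinstance test against
# the builtin float, which is the C-level PyFloat_Type object.
api.PyObject_IsInstance.restype = ctypes.c_int
api.PyObject_IsInstance.argtypes = [ctypes.py_object, ctypes.py_object]

print(api.PyObject_IsInstance(val, ctypes.py_object(float)))  # → 1
```

Note also that the original snippet leaves PyFloat_FromDouble with the default int restype, which silently truncates the returned pointer; setting restype and argtypes as above is needed for correctness even before the macro question comes up.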
I'm trying to get the name of a WMI win32 class, but the __name__ attribute is not defined for it.
>> import wmi
>> machine = wmi.WMI()
>> machine.Win32_ComputerSystem.__name__
I get the following error:
Traceback (most recent call last):
File "<pyshell#21>", line 1, in <module>
machine.Win32_ComputerSystem.__name__
File "C:\Python27\lib\site-packages\wmi.py", line 796, in __getattr__
return _wmi_object.__getattr__ (self, attribute)
File "C:\Python27\lib\site-packages\wmi.py", line 561, in __getattr__
return getattr (self.ole_object, attribute)
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 457, in __getattr__
raise AttributeError(attr)
AttributeError: __name__
I thought that the __name__ attribute was defined for all Python functions, so I don't know what the problem is here. How is it possible that this function doesn't have that attribute?
OK, the reason I thought it was a method is that calling machine.Win32_ComputerSystem() works, but I guess that isn't enough for something to be a method. I realise now that it isn't one.
However, this doesn't work:
>> machine.Win32_ComputerSystem.__class__.__name__
'_wmi_class'
I want it to return 'Win32_ComputerSystem'. How can I do this?
From what I can tell from the documentation (specifically, based on this snippet), wmi.Win32_ComputerSystem is a class, not a method. If you want to get its name, you could try:
machine.Win32_ComputerSystem.__class__.__name__
I've found a way to get the output I want, but it doesn't satisfy me.
repr(machine.Win32_ComputerSystem).split(':')[-1][:-1]
returns: 'Win32_ComputerSystem'
There must be a more Pythonic way to do this.
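Until a documented attribute turns up, the repr hack can at least be made less fragile. This sketch operates on a hypothetical repr string mirroring the format the split above relies on (the actual repr is an internal detail of the wmi module and may differ between versions):

```python
# Hypothetical repr of a _wmi_class object, ending in ":ClassName>"
# as the split in the question assumes.
r = r"<_wmi_class: \\MACHINE\root\cimv2:Win32_ComputerSystem>"

# rsplit on the last colon and strip the trailing ">" so that extra
# colons earlier in the moniker cannot break the extraction.
name = r.rsplit(":", 1)[-1].rstrip(">")
print(name)  # → Win32_ComputerSystem
```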