Import MongoDB data to Azure ML Studio from Python Script

I am currently running an Execute Python Script module in Azure ML Studio (Python 2.7.11) with the code below, which queries MongoDB with PyMongo and tries to return the results as a DataFrame.
I got an error like:
"C:\pyhome\lib\site-packages\pymongo\topology.py", line 97, in select_servers
self._error_message(selector))
ServerSelectionTimeoutError: ... ('The write operation timed out',)
Please let me know the cause of the error and what I should change.
My source code:
import pymongo as m
import pandas as pd

def azureml_main(dataframe1 = None, dataframe2 = None):
    # Connection string for Azure Cosmos DB (MongoDB API); credentials masked
    uri = "mongodb://xxxxx:yyyyyyyyyyyyyyy#zzz.mongodb.net:xxxxx/?ssl=true&replicaSet=globaldb"
    client = m.MongoClient(uri, connect=False)
    db = client['dbName']
    coll = db['colectionName']
    cursor = coll.find()
    df = pd.DataFrame(list(cursor))
    # Azure ML Studio expects a tuple of DataFrames to be returned
    return df,
Error Details:
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Caught exception while executing function: Traceback (most recent call last):
File "C:\server\invokepy.py", line 199, in batch
odfs = mod.azureml_main(*idfs)
File "C:\temp\55a174d8dc584942908423ebc0bac110.py", line 32, in azureml_main
result = pd.DataFrame(list(cursor))
File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 977, in next
if len(self.__data) or self._refresh():
File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 902, in _refresh
self.__read_preference))
File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 813, in __send_message
**kwargs)
File "C:\pyhome\lib\site-packages\pymongo\mongo_client.py", line 728, in _send_message_with_response
server = topology.select_server(selector)
File "C:\pyhome\lib\site-packages\pymongo\topology.py", line 121, in select_server
address))
File "C:\pyhome\lib\site-packages\pymongo\topology.py", line 97, in select_servers
self._error_message(selector))
ServerSelectionTimeoutError: xxxxx-xxx.mongodb.net:xxxxx: ('The write operation timed out',)
Process returned with non-zero exit code 1

As far as I know, there is a limitation of the Execute Python Script module which causes this issue; please refer to the Limitations section below.
Limitations
The Execute Python Script currently has the following limitations:
Sandboxed execution. The Python runtime is currently sandboxed and, as a result, does not allow access to the network or to the local file system in a persistent manner. All files saved locally are isolated and deleted once the module finishes. The Python code cannot access most directories on the machine it runs on, the exception being the current directory and its subdirectories.
Due to the limitation above, you cannot directly import data from Azure Cosmos DB via the pymongo driver inside the Execute Python Script module. However, you can use the Import Data module with the connection information and parameters of your Azure Cosmos DB account, and connect its output to the input of the Execute Python Script module to get the data.
For more information about importing data from online sources, please refer to the section Import from online data sources of the official document Import your training data into Azure Machine Learning Studio from various data sources.
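Once the Import Data module feeds the Execute Python Script module, the script only needs to work with the DataFrame it receives. A minimal sketch, assuming the Cosmos DB documents arrive on input port 1 as dataframe1 (the dropna step is only placeholder logic):
import pandas as pd

def azureml_main(dataframe1 = None, dataframe2 = None):
    # dataframe1 is populated by the Import Data module wired to input port 1
    df = dataframe1

    # Placeholder transformation: drop rows with missing values
    df = df.dropna()

    # Azure ML Studio expects a tuple of DataFrames to be returned
    return df,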

Related

Why does this code only work intermittently? Sometimes I get an error that I can't find on Google

I'm using the below code to connect to the Google BigQuery API. It only works intermittently. After this part, I attempt to send a dataframe to a BQ table via the "to_gbq" function. That part also only sometimes works, and other times fails due to the same error message.
Code:
from google.cloud import bigquery
from google.oauth2 import service_account
path_to_json = 'gbq_key.json'
my_credentials = service_account.Credentials.from_service_account_file(path_to_json)
my_project = 'myprojectnameishere'
client = bigquery.Client(credentials=my_credentials, project=my_project)
print('connected')
AttributeError: module 'google.api_core' has no attribute 'client_options'
Detailed error:
File "/home/my_user_name/get_mini_bq_helper.py", line 8, in <module>
client = bigquery.Client(credentials=my_credentials, project=my_project)
File "/home/my_user_name/.local/lib/python3.10/site-packages/google/cloud/bigquery/client.py", line 226, in __init__
super(Client, self).__init__(
File "/home/my_user_name/.local/lib/python3.10/site-packages/google/cloud/client/__init__.py", line 321, in __init__
Client.__init__(
File "/home/my_user_name/.local/lib/python3.10/site-packages/google/cloud/client/__init__.py", line 157, in __init__
client_options = google.api_core.client_options.ClientOptions()
AttributeError: module 'google.api_core' has no attribute 'client_options'
I am hosting this on PythonAnywhere but I randomly get this error locally as well. Sometimes it connects with no issue, but most of the time I get this error. Also, I couldn't find this error anywhere on Google/Stack Overflow. Any ideas what it might be?
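For reference, the "to_gbq" step mentioned above usually looks something like the sketch below, assuming the pandas-gbq package is installed; the dataset and table names are placeholders, not the asker's actual tables:
import pandas as pd
from google.oauth2 import service_account

# Reuse the same service-account key file as the BigQuery client
my_credentials = service_account.Credentials.from_service_account_file('gbq_key.json')

df = pd.DataFrame({'col1': [1, 2, 3]})

# 'my_dataset.my_table' is a placeholder destination table
df.to_gbq('my_dataset.my_table',
          project_id='myprojectnameishere',
          credentials=my_credentials,
          if_exists='append')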

AWS Glue An error occurred while calling o128.resolveChoice

I have an AWS Glue job that is currently running nightly and scanning about 20 TB worth of raw JSON data and converting it to parquet. I just have the generic Python script that is generated when creating the job. I've run into an issue that is causing the job to fail and resulting in the following error.
py4j.protocol.Py4JError: An error occurred while calling o131.resolveChoice
This job has run successfully before without any issues. I made a change to add partition keys, and it seems to be failing after that change. The job takes almost 24 hours to run, so coming up with a solution has been a slow process. I haven't been able to find any errors that match this one, so I'm curious what is happening here. Does anybody have any ideas?
Here is the traceback from CloudWatch:
Traceback (most recent call last):
File "script_2020-09-14-02-00-49.py", line 17, in <module>
resolvechoice = ResolveChoice.apply(frame = applymapping, choice = "make_struct", transformation_ctx = "resolvechoice")
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/PyGlue.zip/awsglue/transforms/transform.py", line 24, in apply
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/PyGlue.zip/awsglue/transforms/resolve_choice.py", line 17, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/PyGlue.zip/awsglue/dynamicframe.py", line 420, in resolveChoice
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 327, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o128.resolveChoice
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:41623)
It seems to be failing on this line.
resolvechoice = ResolveChoice.apply(frame = applymapping, choice = "make_struct", transformation_ctx = "resolvechoice")
This script is currently working in my pre-production AWS env but not in my production env.
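For context, the relevant part of a generated Glue script with partition keys added typically looks something like this sketch; the catalog database, table, mapping, S3 path, and partition column are all placeholders rather than the actual job:
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping, ResolveChoice
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glueContext = GlueContext(SparkContext.getOrCreate())

# Read the raw JSON data from the Glue Data Catalog (placeholder names)
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="my_database", table_name="my_raw_table",
    transformation_ctx="datasource")

applymapping = ApplyMapping.apply(
    frame=datasource,
    mappings=[("id", "string", "id", "string")],  # placeholder mapping
    transformation_ctx="applymapping")

# The step the traceback points at
resolvechoice = ResolveChoice.apply(
    frame=applymapping, choice="make_struct",
    transformation_ctx="resolvechoice")

# Write parquet partitioned by an assumed column; the partition column
# must be present in the mapped schema for the write to succeed
glueContext.write_dynamic_frame.from_options(
    frame=resolvechoice,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/",
                        "partitionKeys": ["partition_col"]},
    format="parquet",
    transformation_ctx="datasink")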

Python PermissionError uploading to an Azure Datalake folder

I am trying to upload a file to Azure Data Lake using a Python script.
I am able to download a file from the Data Lake, but the upload raises a PermissionError, even though I checked all permissions at all levels (Read, Write, Execute, and the option for descendants).
## works fine
multithread.ADLDownloader(adls, lpath='C:\\Users\\User1\\file1.txt', rpath='/Test/',
                          nthreads=64, overwrite=True,
                          buffersize=4194304, blocksize=4194304)
## raises an error
multithread.ADLUploader(adls, rpath='/Test', lpath='C:\\Users\\User1\\HC',
                        nthreads=64, chunksize=268435456, buffersize=4194304,
                        blocksize=4194304, client=None, run=True,
                        overwrite=False, verbose=True)
the error:
File "C:\Users\Python37-32\test_azure.py", line 64, in <module>
overwrite=False, verbose=True)
File "C:\Users\Python37-32\lib\site-packages\azure\datalake\store\multithread.py", line 442, in __init__
self.run()
File "C:\Users\Python37-32\lib\site-packages\azure\datalake\store\multithread.py", line 548, in run
self.client.run(nthreads, monitor)
File "C:\Users\Python37-32\lib\site-packages\azure\datalake\store\transfer.py", line 525, in run
raise DatalakeIncompleteTransferException('One more more exceptions occured during transfer, resulting in an incomplete transfer. \n\n List of exceptions and errors:\n {}'.format('\n'.join(error_list)))
azure.datalake.store.exceptions.DatalakeIncompleteTransferException: One more more exceptions occured during transfer, resulting in an incomplete transfer.
List of exceptions and errors:
C:\Users\User1\HC\AC.TXT -> \Test\AC.TXT, chunk \Test\AC.TXT 0: errored, "PermissionError('/Test/AC.TXT')"
Does somebody have an idea of the problem?
The Azure account I am using has all the privileges on the Data Lake, but the Azure Application didn't.
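If the adls client was created with a service principal (the Azure Application), the upload runs with that principal's permissions rather than the portal user's. A minimal connection sketch with placeholder tenant, client id, secret, and store name:
from azure.datalake.store import core, lib, multithread

# Authenticate as the service principal (Azure AD application) - placeholder values
token = lib.auth(tenant_id='my-tenant-id',
                 client_id='my-client-id',
                 client_secret='my-client-secret')

adls = core.AzureDLFileSystem(token, store_name='mydatalakestore')

# The upload then runs with the service principal's permissions,
# so it needs Write and Execute on /Test (and on descendants)
multithread.ADLUploader(adls, rpath='/Test', lpath='C:\\Users\\User1\\HC',
                        nthreads=64, overwrite=True)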

How to fill table using mysql connector in Python 3

I have data in a text file which I need to upload into a table. My script is in Python 3 and uses mysql.connector (https://launchpad.net/myconnpy) to connect to the DB and execute commands. I have been able to use mysql.connector successfully in the past without any problems, but I am facing a problem with the command that uploads a file to a table. My code is as follows:
def TableUpload(con2):
    cur = con2.cursor()  ## Connect to destination server with table
    res_file = 'extend2'
    cur.execute("TRUNCATE TABLE data.results")  ## Clear table before writing
    cur.execute("LOAD DATA LOCAL INFILE './extend2' INTO TABLE data.results FIELDS TERMINATED BY ','")
The code clears the table and then tries to upload data from the text file to the table. It successfully clears the table but generates the following error while filling it:
Traceback (most recent call last):
File "cl3.py", line 575, in <module>
TableUpload(con2)
File "cl3.py", line 547, in TableUpload
cur.execute("LOAD DATA LOCAL INFILE './extend2' INTO TABLE kakrana_data.mir_page_results FIELDS TERMINATED BY ','")
File "/usr/local/lib/python3.2/site-packages/mysql/connector/cursor.py", line 333, in execute
res = self.db().protocol.cmd_query(stmt)
File "/usr/local/lib/python3.2/site-packages/mysql/connector/protocol.py", line 137, in deco
return func(*args, **kwargs)
File "/usr/local/lib/python3.2/site-packages/mysql/connector/protocol.py", line 495, in cmd_query
return self.handle_cmd_result(self.conn.recv())
File "/usr/local/lib/python3.2/site-packages/mysql/connector/connection.py", line 180, in recv_plain
errors.raise_error(buf)
File "/usr/local/lib/python3.2/site-packages/mysql/connector/errors.py", line 84, in raise_error
raise get_mysql_exception(errno,errmsg)
mysql.connector.errors.NotSupportedError: 1148: The used command is not allowed with this MySQL version
When I run the upload command directly from the terminal it works well; it just does not work from the script. The error says the command is not allowed with this MySQL version, even though it works from the terminal. Please suggest what mistake I am making, or an alternative way to upload data to a table from a local file.
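Error 1148 usually means the client connection was opened without the LOCAL INFILE capability (the server must allow it too). A hedged sketch of enabling it when creating the connection; the host, user, and password are placeholders, and newer mysql-connector-python releases also accept allow_local_infile=True:
import mysql.connector
from mysql.connector.constants import ClientFlag

# Ask the client to advertise LOCAL INFILE support when connecting
con2 = mysql.connector.connect(host='localhost',
                               user='myuser',
                               password='mypassword',
                               database='data',
                               client_flags=[ClientFlag.LOCAL_FILES])

cur = con2.cursor()
cur.execute("LOAD DATA LOCAL INFILE './extend2' INTO TABLE data.results "
            "FIELDS TERMINATED BY ','")
con2.commit()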

NI-VISA 5.1.2 + python 2.7 + OS 10.6.8 TCPIP ERROR

I have a Keithley 2701 DMM and I am trying to communicate with it via TCP/IP using Python 2.7 and PyVISA. I am running Python 2.7 with virtualenv and wxPython. I know the device is active because I can ping the IP address. I am trying to access the machine using the following code:
from pyvisa.vpp43 import visa_library
visa_library.load_library("/Library/Frameworks/Visa.framework/VISA")
import visa
Keithley = visa.instrument("TCPIP::192.168.0.2::INSTR")
When I run the code I get the following error:
Traceback (most recent call last):
File "Keithley.py", line 4, in <module>
Keithley = visa.instrument("TCPIP::192.168.0.2::INSTR")
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyVISA-1.4-py2.7.egg/pyvisa/visa.py", line 294, in instrument
return Instrument(resource_name, **keyw)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyVISA-1.4-py2.7.egg/pyvisa/visa.py", line 358, in __init__
"lock")))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyVISA-1.4-py2.7.egg/pyvisa/visa.py", line 132, in __init__
keyw.get("lock", VI_NO_LOCK))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyVISA-1.4-py2.7.egg/pyvisa/vpp43.py", line 753, in open
byref(vi))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyVISA-1.4-py2.7.egg/pyvisa/vpp43.py", line 398, in check_status
raise visa_exceptions.VisaIOError, status
pyvisa.visa_exceptions.VisaIOError: VI_ERROR_RSRC_NFOUND: Insufficient location information or the requested device or resource is not present in the system.
Any help will be greatly appreciated....
V
I have not played with this particular DMM, but I have connected to several other devices using your same setup.
1) Check your documentation / DMM to ensure that the board number is in fact zero. Otherwise you'll need to change the following line:
Keithley = visa.instrument("TCPIP::192.168.0.2::INSTR")
To something more like
Keithley = visa.instrument("TCPIP1::192.168.0.2::INSTR")
2) Try to use a raw SOCKET connection rather than the typical INSTR method.
NI Socket Examples
Keithley 2701 Examples
If I can dream up anything else I will update my response.
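A rough sketch of the raw SOCKET approach using the same PyVISA 1.4 style API shown above; the board number, TCP port, and termination character are assumptions, so check the instrument's LAN settings for the actual values:
from pyvisa.vpp43 import visa_library
visa_library.load_library("/Library/Frameworks/Visa.framework/VISA")
import visa

# Raw socket resource string: board, IP address, TCP port (port is a placeholder)
keithley = visa.instrument("TCPIP0::192.168.0.2::1394::SOCKET")

# Raw socket sessions usually need explicit termination characters
keithley.term_chars = "\n"

print(keithley.ask("*IDN?"))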
