I have written a Datadog Agent check in Python following the instructions on this page: https://docs.datadoghq.com/developers/agent_checks/.
The agent check is supposed to read all files in a specified network folder and then send certain metrics to Datadog.
The folder to be read is specified like this in the YAML file:
init_config:
  taskResultLocation: "Z:/TaskResults"
This is the code used to read the folder. It is Python 2.7 because that is what Datadog requires:
task_result_location = self.init_config.get('taskResultLocation')
# Loop through all the XML files in the specified folder
for file in os.listdir(task_result_location):
If I just run the Python script in my IDE everything works correctly.
When the check is added to the Datadog Agent Manager on the same machine the IDE is on and the check is run, the Datadog Agent Manager log shows this error:
2018-08-14 14:33:26 EEST | ERROR | (runner.go:277 in work) | Error running check TaskResultErrorReader: [{"message": "[Error 3] The system cannot find the path specified: 'Z:/TaskResults/.'", "traceback": "Traceback (most recent call last):\n File \"C:\Program Files\Datadog\Datadog Agent\embedded\lib\site-packages\datadog_checks\checks\base.py\", line 294, in run\n self.check(copy.deepcopy(self.instances[0]))\n File \"c:\programdata\datadog\checks.d\TaskResultErrorReader.py\", line 42, in check\n for file in os.listdir(task_result_location):\nWindowsError: [Error 3] The system cannot find the path specified: 'Z:/TaskResults/.'\n"}]
I have tried specifying the folder location in multiple ways, with single and double quotes, forward slashes, backslashes, and doubled slashes, but the same error is thrown.
Would anyone know if this is a YAML syntax error or some sort of issue with Datadog or the Python code?
Even though the Datadog Agent runs on the same machine, it runs as a separate service under its own account. Because of that, it sounds like the Agent doesn't have access to your Z: drive.
Try putting the "TaskResults" folder in the Agent's own directory (where the mycheck.yaml file lives when running from Datadog) and change the path accordingly.
If this works and you still want a shared drive for passing files from your computer to the Datadog Agent, you will have to find a way to mount the drive or folder so the Agent can see it. The documentation probably covers how to do that.
The solution is to create a file share on the network drive and use the UNC path to that share instead of the mapped network drive path.
This may be obvious to some, but it didn't occur to me right away since the same Python code worked without any issue outside of Datadog.
So instead of:
init_config:
  taskResultLocation: "Z:/TaskResults"
use
init_config:
  taskResultLocation: '//FileShareName/d/TaskResults'
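For illustration, here is a minimal sketch of how the whole check might look with the share path; the class name, metric name, and XML handling are assumptions rather than the original check:

import os

from datadog_checks.checks import AgentCheck  # import path matching the Agent 6 traceback above


class TaskResultErrorReader(AgentCheck):
    def check(self, instance):
        # UNC share path from init_config, visible to the Agent's service account
        task_result_location = self.init_config.get('taskResultLocation')

        # Loop through all the XML files in the specified folder
        for file_name in os.listdir(task_result_location):
            if not file_name.endswith('.xml'):
                continue
            # ... parse the XML here and report whatever metrics you need, e.g.:
            self.gauge('task_results.files_seen', 1, tags=['file:%s' % file_name])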
I used the following code to read a shapefile from dbfs:
geopandas.read_file("file:/databricks/folderName/fileName.shp")
Unfortunately, I don't have access to do so and I get the following error:
DriverError: dbfs:/databricks/folderName/fileName.shp: Permission denied
Any idea how to get access? The file exists there (I have permission to save a file there using dbutils, and I can also read a file from there using Spark, but I have no idea how to read the file using GeoPandas).
After adding those lines:
dbutils.fs.cp("/databricks/folderName/fileName.shp", "file:/tmp/fileName.shp", recurse = True)
geopandas.read_file("/tmp/fileName.shp")
...from the suggestion below, I get another error:
org.apache.spark.api.python.PythonSecurityException: Path 'file:/tmp/fileName.shp' uses an untrusted filesystem 'org.apache.hadoop.fs.LocalFileSystem', but your administrator has configured Spark to only allow trusted filesystems: (com.databricks.s3a.S3AFileSystem, shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem, shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem, com.databricks.adl.AdlFileSystem, shaded.databricks.V2_1_4.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem, shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem, shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem)
GeoPandas doesn't know anything about DBFS - it works with local files. So you need either:
to use the DBFS FUSE mount to read the file from DBFS (but there are some limitations):
geopandas.read_file("/dbfs/databricks/folderName/fileName.shp")
or to use the dbutils.fs.cp command to copy the file from DBFS to the local filesystem, and read from there:
dbutils.fs.cp("/databricks/folderName/fileName.shp", "file:/tmp/fileName.shp", recurse = True)
geopandas.read_file("/tmp/fileName.shp")
P.S. But if the file is already copied to the driver node, then you just need to remove file: from the name.
Update after the updated question:
There are limitations on what can be done on AAD passthrough clusters, so if you want to copy files from DBFS to the local file system, your administrator needs to change the cluster configuration as described in the troubleshooting documentation.
But the /dbfs way should work for passthrough clusters as well, although the cluster needs to be on at least DBR 7.3 (docs).
Okay, the answer is easier than I thought:
geopandas.read_file("/dbfs/databricks/folderName")
(the folder name, since it is a folder containing all the shape files)
Why should it be like that? Easy: enable browsing of DBFS files in the admin console ('Advanced' tab), click on the file you need, and you will see two possible paths to it. One is for the Spark API and the other is for the File API (which is what I needed).
:)
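To make the difference between those two path forms concrete, here is a small sketch reusing the example path from this thread (the folder name itself is just the placeholder used above):

import geopandas

# Spark API format - understood by Spark readers and dbutils:
#   dbfs:/databricks/folderName/fileName.shp
# File API format - a local-looking path through the /dbfs FUSE mount,
# which is what plain Python libraries such as GeoPandas need:
gdf = geopandas.read_file("/dbfs/databricks/folderName")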
This is the error I get when I try to invoke my Lambda function deployed as a ZIP file:
"The file lambda_function.py could not be found. Make sure your handler upholds the format: file-name.method."
What am I doing wrong?
Usually it is because of the way the files were zipped. Instead of zipping the root folder, you have to select all the files and subfolders and zip those, so that the handler file ends up at the top level of the archive, as in the sketch below.
My example uses Node.js, but you can do the same for Python.
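As a hedged sketch, here is one way to build such a flat archive from Python itself (the folder and file names are just examples):

import os
import zipfile

# Zip the contents of the project folder so that lambda_function.py
# ends up at the root of the archive rather than inside a subfolder.
project_dir = "my_lambda"  # example folder containing lambda_function.py
with zipfile.ZipFile("function.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk(project_dir):
        for name in files:
            full_path = os.path.join(root, name)
            # arcname relative to project_dir keeps the layout flat
            zf.write(full_path, os.path.relpath(full_path, project_dir))

With the handler set to lambda_function.lambda_handler, Lambda can then find lambda_function.py at the archive root.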
Just to clarify: if I want to use Keras, all I have to do is download the Keras directories, put my Lambda code and the Keras directories together in a ZIP, and upload it directly from my desktop, right?
Just wanted to know if this is the right method to bring in Keras.
Whenever you get this kind of message, and all the files and handlers have the right name, format, location, etc., also check whether other parts of the Lambda configuration are set up properly for what the code is trying to do.
For example, you might receive that unrelated error if your code is trying to execute against an RDS database that is in a private subnet and you are missing the correct VPC configuration that allows connectivity to that database.
I'm trying to solve a question I posted about how to read the recorded date of videos I took with a Windows Phone. It seems that the creation dates are overwritten when the files are synced to my computer.
I'm trying to get around this by looking at the files on the phone directly, so I need access to
"Computer\Windows Phone\Phone\Pictures\Camera Roll"
My problem is that I can only get os.chdir() to work on paths that have C:\ as the root.
Any suggestions?
Update
I tried placing and running a file there that prints the current working directory, which gave me the result
C:\Users\<myUser~1.COM>\AppData\Local\Temp\WPDNSE\{<a lot of numbers and dashes>}
I am not familiar with Windows Phone paths in particular, but you should be able to figure out the "real" path by using Windows Explorer to look at the properties of a file or folder: right-click, choose Properties, and look for a Location field.
Note that some "folders", such as the ones under "Libraries", are actually XML files pointing to multiple other locations.
Maybe the phone is connected via MTP. If so,
How to access an MTP USB device with python
could help.
[EDIT] They mention calibre there. The calibre source code may already contain functions for getting file information from mobile devices.
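A minimal sketch of the Shell COM approach that the linked question describes, using pywin32; the device and folder names come from the path in the question, and which detail column holds the recording date varies between Windows versions, so treat this as an assumption-laden starting point:

import win32com.client  # pip install pywin32

shell = win32com.client.Dispatch("Shell.Application")
folder = shell.NameSpace(17)  # 17 = ssfDRIVES, i.e. "This PC"

# Walk down the virtual path shown in Explorer:
# Windows Phone -> Phone -> Pictures -> Camera Roll
for name in ("Windows Phone", "Phone", "Pictures", "Camera Roll"):
    item = folder.ParseName(name)
    if item is None:
        raise RuntimeError("Could not find %r in the Shell namespace" % name)
    folder = item.GetFolder

# Print each file with one of its extended properties; inspect a few
# column indices to find the one holding the recording date on your device.
for item in folder.Items():
    print(item.Name, folder.GetDetailsOf(item, 3))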
Working on a Python scraper/spider, I encountered a URL that exceeds the character limit and raises the IOError in the title. I am using httplib2, and when I attempt to retrieve the URL I receive a "file name too long" error. I prefer to keep all of my projects within my home directory because I am using Dropbox. Is there any way around this issue, or should I just set up my working directory outside of home?
You are probably hitting a limitation of the encrypted file system (eCryptfs), which allows at most 143 characters in a file name.
Here is the bug:
https://bugs.launchpad.net/ecryptfs/+bug/344878
The solution for now is to use a directory outside your encrypted home directory. To double-check whether your home is encrypted, run:
mount | grep ecryptfs
and see if your home dir is listed.
If that's the case, either use some other directory outside your home, or recreate your home directory without encryption.
The fact that the filename that's too long starts with '.cache/www.example.com' explains the problem.
httplib2 optionally caches requests that you make. You've enabled caching, and you've given it .cache as the cache directory.
The easy solution is to put the cache directory somewhere else.
Without seeing your code, it's impossible to tell you exactly how to fix it, but it should be trivial. The documentation for FileCache shows that it takes a dir_name as its first parameter.
Or, alternatively, you can pass a safe function that generates a filename from the URI, overriding the default. That would allow you to generate filenames that fit within the 143-character limit of the Ubuntu encrypted fs; see the sketch below.
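For example, a sketch of that approach which hashes the URI so every cache filename is a short, fixed-length string (the cache directory name is arbitrary):

import hashlib

import httplib2


def short_safe(uri):
    # Hash the cache key so the filename is always 32 hex characters,
    # well under the eCryptfs filename limit.
    return hashlib.md5(uri.encode("utf-8")).hexdigest()

cache = httplib2.FileCache(".cache", safe=short_safe)
h = httplib2.Http(cache=cache)
response, content = h.request("http://www.example.com/some/very/long/url")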
Or, alternatively, you can create your own object with the same interface as FileCache and pass that to the Http object to use as a cache. For example, you could use tempfile to create random filenames, and store a mapping of URLs to filenames in an anydbm or sqlite3 database.
A final alternative is to just turn off caching, of course.
As you apparently have passed '.cache' to the httplib2.Http constructor, you should change this to something more appropriate or disable the cache.
I need to read in a dictionary file to filter content specified in the hdfs_input, and I have uploaded it to the cluster using the put command, but I don't know how to access it in my program.
I tried to access it using its path on the cluster like a normal file, but that gives the error: IOError: [Errno 2] No such file or directory
Besides, is there any way to maintain only one copy of the dictionary for all the machines that run the job?
So what's the correct way of accessing files other than the specified input in Hadoop jobs?
Problem solved by adding the needed file with the -file option on the command line, or the file= option in the conf file.
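For illustration, a sketch of how this might look with Hadoop Streaming; the file names and paths below are just examples. The -file option ships dict.txt into each task's working directory, so the mapper can open it by its bare name (one copy per task, distributed automatically):

#!/usr/bin/env python
# mapper.py - example submit command:
#   hadoop jar hadoop-streaming.jar \
#       -input hdfs_input -output hdfs_output \
#       -mapper mapper.py -file mapper.py -file dict.txt
import sys

# dict.txt sits in the task's current working directory because of -file
with open('dict.txt') as f:
    dictionary = set(line.strip() for line in f)

# Pass through only the input lines whose first field is in the dictionary
for line in sys.stdin:
    key = line.split('\t', 1)[0].strip()
    if key in dictionary:
        sys.stdout.write(line)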