I have created a Docker image for a Flask app. Inside my Flask app, this is how I specify the file paths.
dPath = os.getcwd() + "\\data\\distanceMatrixv2.csv"
This should ideally resolve to a file path similar to app/data/distanceMatrixv2.csv. It works when I run py main.py in CMD. But after making sure I am in the same directory, building the image, and running it via docker run -it -d -p 5000:5000 flaskapp, it throws the error below when I try to do any processing.
FileNotFoundError: /backend\data\distanceMatrixv2.csv not found.
I believe this is due to how Docker resolves file paths, but I am a bit lost on how to "fix" this. I am on Windows, while my Docker image is built with FROM node:lts-alpine.
node:lts-alpine is based on Alpine, which is a Linux distro. So your Python program runs under Linux and should use Linux file paths, i.e. forward slashes, like
dPath = os.getcwd() + "/data/distanceMatrixv2.csv"
It looks like os.getcwd() returns /backend. So the full path becomes /backend/data/distanceMatrixv2.csv.
The os.path library can do this portably across operating systems. If you write this as
import os
import os.path
dPath = os.path.join(os.getcwd(), "data", "distanceMatrixv2.csv")
it will work whether your (Windows) system uses DOS-style backslash-separated paths or your (Linux) system uses Unix-style forward slashes.
Python 3.4 also added the pathlib library, if you prefer a more object-oriented approach. It overloads the Python / division operator to concatenate paths using the native path separator. So you could also write
from pathlib import Path
dPath = Path.cwd() / "data" / "distanceMatrixv2.csv"
Related
I have a Python file that I tested locally using VS Code and the shell. The file contains relative imports and worked as intended on my local machine.
After uploading the associated files to Colab I did the following:
py_file_location = '/content/gdrive/content/etc'
os.chdir(py_file_location)
# to verify whether I got the correct path
!python3
>>> import os
>>> os.getcwd()
output: '/content/gdrive/MyDrive/content/etc'
However, when I run the file I get the following error:
ImportError: attempted relative import with no known parent package
Why is that? Using the same file system and similar shell commands the file worked locally.
A solution that worked for me was to turn my relative paths into absolute ones, so instead of using:
directory = pathlib.Path(__file__).parent
sys.path.append(str(directory.parent))
sys.path.append(str(directory.parent.parent.parent))
__package__ = directory.name
I needed to make the path absolute by using resolve():
directory = pathlib.Path(__file__).resolve().parent
sys.path.append(str(directory.parent))
sys.path.append(str(directory.parent.parent.parent))
__package__ = directory.name
It is however still not fully clear to me why this is required when running on Colab but not when running locally.
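One likely explanation, sketched below under the assumption that Colab starts the script via a relative path: when __file__ is relative, .parent is relative too and silently breaks once the working directory changes, whereas resolve() pins the path to an absolute location first.

import os
import pathlib

# Hypothetical illustration: suppose the interpreter was started as `python3 etc/run.py`,
# so __file__ is the relative string 'etc/run.py'.
relative_file = pathlib.Path('etc/run.py')

print(relative_file.parent)            # 'etc' -- relative, wrong after os.chdir(...)
print(relative_file.resolve().parent)  # absolute path, still valid after os.chdir(...)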
How can I write a Python program that runs all Python scripts in the current folder? The program should run in Linux, Windows and any other OS in which python is installed.
Here is what I tried:
import glob, importlib
for file in glob.iglob("*.py"):
    importlib.import_module(file)
This returns an error: ModuleNotFoundError: No module named 'agents.py'; 'agents' is not a package
(here agents.py is one of the files in the folder; it is indeed not a package and not intended to be a package - it is just a script).
If I change the last line to:
importlib.import_module(file.replace(".py",""))
then I get no error, but also the scripts do not run.
Another attempt:
import glob, os
for file in glob.iglob("*.py"):
    os.system(file)
This does not work on Windows - it tries to open each file in Notepad.
You need to specify that the script should be run through the Python interpreter. To do this, prefix the file name with python3 when invoking it. The following code should work:
import os
import glob
for file in glob.iglob("*.py"):
    os.system("python3 " + file)
If you are using a version other than Python 3, just change the command from python3 to python.
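A slightly more portable variant (my suggestion, not part of the original answer) avoids guessing the interpreter name altogether by using sys.executable, which is the interpreter currently running the script; shown here with subprocess, which the next answer also recommends:

import glob
import subprocess
import sys

# sys.executable is the full path of the running interpreter,
# so this works whether it is installed as "python" or "python3".
for file in glob.iglob("*.py"):
    subprocess.run([sys.executable, file])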
Maybe you can make use of the subprocess module; this question shows a few options.
Your code could look like this:
import os
import subprocess

base_path = os.getcwd()
print('base_path', base_path)

# TODO: this might need to be 'python3' in some cases
python_executable = 'python'
print('python_executable', python_executable)

py_file_list = []
for dir_path, _, file_name_list in os.walk(base_path):
    for file_name in file_name_list:
        if file_name.endswith('.py'):
            # add full path, not just file_name
            py_file_list.append(
                os.path.join(dir_path, file_name))

print('PY files that were found:')
for i, file_path in enumerate(py_file_list):
    print('  {:3d} {}'.format(i, file_path))
    # call script
    subprocess.run([python_executable, file_path])
Does that work for you?
Note that the docs for os.system() even suggest using subprocess instead:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.
If you have control over the content of the scripts, you might consider using a plugin technique; this brings the problem more into the Python domain and thus makes it less platform dependent. Take a look at pyPlugin as an example.
This way you could run each "plugin" from within the original process, or, using Python's multiprocessing library, you could still seamlessly use sub-processes.
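As a rough illustration of the in-process idea (using the standard-library runpy module rather than pyPlugin, so treat it as a sketch), each script can be executed as if it were invoked from the command line:

import glob
import runpy

# Run every script in the current folder inside this process,
# as if it had been started with `python <file>`.
for file in sorted(glob.iglob("*.py")):
    runpy.run_path(file, run_name="__main__")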
I am trying to run this tool within a lambda function: https://github.com/nicolas-f/7DTD-leaflet
The tool depends on Pillow, which depends on imaging libraries not available in the AWS Lambda container. To try to get round this I've run pyinstaller to create a binary that I can hopefully execute. This file is named map_reader and sits at the top level of the Lambda zip package.
Below is the code I am using to try and run the tool:
command = 'chmod 755 map_reader'
args = shlex.split(command)
print subprocess.Popen(args)
command = './map_reader -g "{}" -t "{}"'.format('/tmp/mapFiles', '/tmp/tiles')
args = shlex.split(command)
print subprocess.Popen(args)
And here is the error, which occurs on the second subprocess.Popen call:
<subprocess.Popen object at 0x7f08fa100d10>
[Errno 13] Permission denied: OSError
How can I run this correctly?
You may have been misled into what the issue actually is.
I don't think that the first Popen ran successfully. I think that it just dumped a message in standard error and you're not seeing it. It's probably saying that
chmod: map_reader: No such file or directory
I suggest you try either of these two:
Extract the map_reader from the package into /tmp. Then reference it with /tmp/map_reader.
Do it as recommended by Tim Wagner, General Manager of AWS Lambda, who said the following in the article Running Arbitrary Executables in AWS Lambda:
Including your own executables is easy; just package them in the ZIP file you upload, and then reference them (including the relative path within the ZIP file you created) when you call them from Node.js or from other processes that you’ve previously started. Ensure that you include the following at the start of your function code:
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']
The above code is for Node.js, but for Python it's like the following:
import os
os.environ['PATH'] = os.environ['PATH'] + ':' + os.environ['LAMBDA_TASK_ROOT']
That should make the command './map_reader <arguments>' work.
If that still doesn't work, you may also consider running chmod 755 map_reader before creating the package and uploading it (as suggested in this other question).
I know I'm a bit late to this, but if you want a more generic way of doing it (for instance if you have a lot of binaries and might not use them all), here is how I do it, provided you put all your binaries in a bin folder next to your py file, and all the libraries in a lib folder:
import shutil
import time
import os
import subprocess

LAMBDA_TASK_ROOT = os.environ.get('LAMBDA_TASK_ROOT', os.path.dirname(os.path.abspath(__file__)))
CURR_BIN_DIR = os.path.join(LAMBDA_TASK_ROOT, 'bin')
LIB_DIR = os.path.join(LAMBDA_TASK_ROOT, 'lib')

### In order to get permissions right, we have to copy them to /tmp
BIN_DIR = '/tmp/bin'

# This is necessary as we don't have permissions in /var/task/bin where the lambda function is running
def _init_bin(executable_name):
    start = time.time()  # time.clock() was removed in Python 3.8
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    print("Copying binaries for " + executable_name + " in /tmp/bin")
    currfile = os.path.join(CURR_BIN_DIR, executable_name)
    newfile = os.path.join(BIN_DIR, executable_name)
    shutil.copy2(currfile, newfile)
    print("Giving new binaries permissions for lambda")
    os.chmod(newfile, 0o775)
    elapsed = time.time() - start
    print(executable_name + " ready in " + str(elapsed) + 's.')

# then if you're going to call a binary in a cmd, for instance pdftotext:
_init_bin('pdftotext')
cmdline = [os.path.join(BIN_DIR, 'pdftotext'), '-nopgbrk', '/tmp/test.pdf']
subprocess.check_call(cmdline, shell=False, stderr=subprocess.STDOUT)
There were two issues here. First, as per Jeshan's answer, I had to move the binary to /tmp before I could properly access it.
The other issue was that I'd run pyinstaller on Ubuntu, creating a single file. I saw elsewhere some comments about being sure to compile on the same architecture that the Lambda container runs. I therefore ran pyinstaller on EC2 using the Amazon Linux AMI. The output was multiple .so files, which, when moved to /tmp, worked as expected.
from shutil import copyfile
import os

copyfile('/var/task/yourbinary', '/tmp/yourbinary')
os.chmod('/tmp/yourbinary', 0o555)
Moving the binary to /tmp and making it executable worked for me
There is no need to copy the files to /tmp. You can just use ld-linux to execute any file, including those not marked executable.
So, for running a non-executable on AWS Lambda, you use the following command:
/lib64/ld-linux-x86-64.so.2 /opt/map_reader
P.S. It would make more sense to ship the map_reader binary or any other static files in a Lambda Layer, hence the /opt folder.
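For illustration, a hedged sketch of invoking the binary through the loader from Python (the /opt/map_reader path assumes the binary ships in a Lambda Layer; the -g/-t arguments are taken from the question above):

import subprocess

# Run the non-executable binary through the dynamic loader instead of chmod-ing it.
subprocess.run(
    ["/lib64/ld-linux-x86-64.so.2", "/opt/map_reader",
     "-g", "/tmp/mapFiles", "-t", "/tmp/tiles"],
    check=True,
)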
As the docs mention for Node.js, you need to update $PATH, else you'll get "command not found" when trying to run the executables you added at the root of your Lambda package. In Node.js, that's:
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']
Now, the same thing in Python:
import os
# Make the path stored in $LAMBDA_TASK_ROOT available to $PATH, so that we
# can run the executables we added at the root of our package.
os.environ["PATH"] += os.pathsep + os.environ['LAMBDA_TASK_ROOT']
Tested OK with Python 3.8.
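Once $PATH includes $LAMBDA_TASK_ROOT, the packaged executable can be invoked by name. A minimal sketch (the map_reader arguments are assumed from the earlier question, and the binary must still carry the executable bit):

import subprocess

# The binary at the root of the deployment package is now found via $PATH.
subprocess.run(["map_reader", "-g", "/tmp/mapFiles", "-t", "/tmp/tiles"], check=True)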
(As a bonus, here are some more env variables used by Lambda.)
I'm using Python on Windows and I want a part of my script to copy a file from a certain directory (I know its path) to the Desktop.
I used this:
shutil.copy(txtName, '%HOMEPATH%/desktop')
where txtName is the text file's name (with full path).
I get the error:
IOError: [Errno 2] No such file or directory: '%HOMEPATH%/DESKTOP'
Any help?
I want the script to work on any computer.
On Unix or Linux:
import os
desktop = os.path.join(os.path.expanduser('~'), 'Desktop')
on Windows:
import os
desktop = os.path.join(os.environ['USERPROFILE'], 'Desktop')
and to add in your command:
shutil.copy(txtName, desktop)
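A minimal sketch combining the two, picking the right base directory for the current OS (assuming the standard 'Desktop' folder name; the txtName value here is hypothetical):

import os
import shutil

# txtName stands in for the full path from the question (hypothetical value here)
txtName = r"C:\work\report.txt" if os.name == "nt" else "/home/user/report.txt"

if os.name == "nt":  # Windows
    desktop = os.path.join(os.environ["USERPROFILE"], "Desktop")
else:                # Unix / Linux / macOS
    desktop = os.path.join(os.path.expanduser("~"), "Desktop")

shutil.copy(txtName, desktop)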
You can use os.environ["HOMEPATH"] to get the path. Right now it's literally trying to find %HOMEPATH%/Desktop without substituting the actual path.
Maybe something like:
shutil.copy(txtName, os.path.join(os.environ["HOMEPATH"], "Desktop"))
This works on both Windows and Linux:
import os
desktop = os.path.expanduser("~/Desktop")
# the above is valid on Windows (after 7), but if you want it in OS-normalized form:
desktop = os.path.normpath(os.path.expanduser("~/Desktop"))
For 3.5+ you can use pathlib:
import pathlib
desktop = pathlib.Path.home() / 'Desktop'
I cannot comment yet, but solutions based on joining 'Desktop' onto the user's home path have limited applicability, because the Desktop can be (and often is) remapped to a non-system drive.
To get the real location, the Windows registry should be used... or special functions via ctypes, like https://stackoverflow.com/a/626927/7273599
All those answers are intrinsically wrong: they only work for English sessions.
You should check the XDG directories instead of supposing it's always 'Desktop'.
Here's the correct answer: How to get users desktop path in python independent of language install (linux)
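For example, on Linux the xdg-user-dirs tooling reports the localized Desktop location. A sketch, assuming the xdg-user-dir command is installed:

import subprocess

# Ask the XDG user-dirs database for the real (possibly localized or remapped) Desktop path.
desktop = subprocess.check_output(["xdg-user-dir", "DESKTOP"], text=True).strip()
print(desktop)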
Try this:
import os
file1 = os.environ["HOMEPATH"] + "\\Desktop\\myfile.txt"
Just an addendum to @tpearse's accepted answer:
In an embedded environment (a C++ program calling a Python environment),
os.path.join(os.environ["HOMEPATH"], "Desktop")
was the only one that worked. Seems like
os.path.expanduser("~/Desktop")
does not return a usable path in the embedded environment (at least not in mine; some environment settings in Visual Studio could be missing in my setup).
Simple and elegant. Try this to access a file on the Desktop:
import os
import pathlib
import pandas as pd

desktop = pathlib.Path.home() / 'Desktop' / "Panda's" / 'data'
print(desktop)  # check your path is correct
data = pd.read_csv(os.path.join(desktop, 'survey_results_public.csv'))
Congrats!
I'm running the following MapReduce on AWS Elastic MapReduce:
./elastic-mapreduce --create --stream --name CLI_FLOW_LARGE --mapper
s3://classify.mysite.com/mapper.py --reducer
s3://classify.mysite.com/reducer.py --input
s3n://classify.mysite.com/s3_list.txt --output
s3://classify.mysite.com/dat_output4/ --cache
s3n://classify.mysite.com/classifier.py#classifier.py --cache-archive
s3n://classify.mysite.com/policies.tar.gz#policies --bootstrap-action
s3://classify.mysite.com/bootstrap.sh --enable-debugging
--master-instance-type m1.large --slave-instance-type m1.large --instance-type m1.large
For some reason the cacheFile classifier.py is not being cached, it would seem. I get this error when the reducer.py tries to import it:
File "/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201204290242_0001/attempt_201204290242_0001_r_000000_0/work/./reducer.py", line 12, in <module>
from classifier import text_from_html, train_classifiers
ImportError: No module named classifier
classifier.py is most definitely present at s3n://classify.mysite.com/classifier.py. For what it's worth, the policies archive seems to load in just fine.
I don't know how to fix this problem in EC2, but I've seen it before with Python in traditional Hadoop deployments. Hopefully the lesson translates over.
What we need to do is add the directory reducer.py is in to the Python path, because presumably classifier.py is in there too. For whatever reason, this place is not in the Python path, so it fails to find classifier.
import sys
import os.path
# add the directory where reducer.py is to the python path
sys.path.append(os.path.dirname(__file__))
# __file__ is the path to reducer.py, including the "reducer.py" file name itself
# dirname strips the file name and only gives the directory
# sys.path is the python path where it looks for modules
from classifier import text_from_html, train_classifiers
The reason your code might work locally is the current working directory you are running it from. Hadoop might not be running it from the same place, in terms of the current working directory, as you are.
orangeoctopus deserves credit for this from his comment. I had to append the working directory to the system path:
sys.path.append('./')
Also, I recommend that anyone who has similar issues read this great article on using the Distributed Cache on AWS:
https://forums.aws.amazon.com/message.jspa?messageID=152538