scrapy command line in Azure Function App - python

I cannot get the following to work in a Python Function App (Azure):
subprocess.run(["scrapy"])
Why do I need this to work? I am using advertools (which runs that command, see https://github.com/eliasdabbas/advertools/blob/master/advertools/spider.py)
What are the issues:
First, when deploying, the scrapy command-line executable is not added to the PATH.
When deploying with Oryx, there is an additional issue: Oryx writes the wrong Python interpreter into the shebang of the scrapy executable (#!/tmp/orxy/.../python3).
What I tried in order to fix this:
Add the scrapy executable to my project: lib/advertools/scrapy_path/scrapy (with the correct path to the Python interpreter).
Add that directory to my PATH:
os.environ["PATH"] += os.pathsep + str(scrapy_bin_path)
What is the result:
running subprocess.run(["ls", '-la', str(scrapy_bin_path)], capture_output=True, text=True) returns:
CompletedProcess(args=['ls', '-la', '/home/site/wwwroot/lib/advertools/scrapy_path'], returncode=0, stdout='total 0\n-rwxr-xr-x 1 root root 230 Dec 2 10:10 scrapy\n', stderr='')
so file is present and executable
running subprocess.run(["which", "scrapy"], capture_output=True, text=True) returns:
CompletedProcess(args=['which', 'scrapy'], returncode=0, stdout='/home/site/wwwroot/lib/advertools/scrapy_path/scrapy\n', stderr='')
encouraging...
but now finally running subprocess.run(["scrapy"], capture_output=True, text=True) returns:
[Information] Traceback (most recent call last):
File "/home/site/wwwroot/lib/advertools/test.py", line 74, in exec
result_scrapy = subprocess.run(["scrapy"], capture_output=True, text=True)
File "/usr/local/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/local/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'scrapy'
file not found?!
I don't understand why it cannot find 'scrapy' if which scrapy does find it

Please try the following troubleshooting steps to fix the issue:
Create a virtual environment, install Scrapy in it, and then continue with the remaining steps.
Scrapy is a web-crawling framework whose crawlers are called spiders. Reinstall the scrapy module and check again.
No such file or directory
This error is raised when a file or directory cannot be found or accessed. Provide the absolute path to the module, file, or directory.
The working directory may change when the Python script is run through Scrapy. Set a breakpoint and check the current working directory from within the code to find the root cause of the error.
For more detail on the "No such file or directory" Scrapy Python issue, refer to this SO thread and the AppsLoveWorld article.
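One further cause worth ruling out, offered as a hedged sketch rather than a confirmed diagnosis: on Linux, launching a script whose shebang line names an interpreter that does not exist fails with ENOENT, and Python reports this as a FileNotFoundError naming the command, even though which finds the script. The helper below resolves a command on PATH and checks whether its shebang interpreter exists. The name check_shebang is hypothetical and is not part of advertools or Scrapy.

```python
import os
import shutil


def check_shebang(command, path=None):
    """Return (resolved_path, interpreter, interpreter_exists) for a command.

    `path` is an optional PATH-style string; None means use os.environ["PATH"].
    """
    resolved = shutil.which(command, path=path)
    if resolved is None:
        # The command itself is not on PATH (or is not executable).
        return None, None, False
    with open(resolved, "rb") as fh:
        first_line = fh.readline().strip()
    if not first_line.startswith(b"#!"):
        # Compiled binary or a script without a shebang: nothing to check.
        return resolved, None, True
    parts = first_line[2:].decode(errors="replace").split()
    interpreter = parts[0] if parts else None
    return resolved, interpreter, bool(interpreter) and os.path.exists(interpreter)
```

Calling check_shebang("scrapy") inside the Function App and logging the result would show whether the interpreter recorded in the script is actually present in the running container.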

Related

subprocess FileNotFoundError

I'm running Python code on Ubuntu with python3 test.py. However, I got the following error:
File "/opt/anaconda3/lib/python3.8/site-packages/geowombat/util/web.py", line 1679, in list_gcp
proc = subprocess.run(gsutil_str.split(' '),
File "/opt/anaconda3/lib/python3.8/subprocess.py", line 489, in run
with Popen(*popenargs, **kwargs) as process:
File "/opt/anaconda3/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/opt/anaconda3/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'gsutil'
It seems that subprocess imported in web.py cannot find gsutil to call.
Any idea how to solve this? I'm totally new to these, any hint helps! Thanks in advance!
More details:
test.py can be simplified as follows:
from geowombat.util import GeoDownloads
path = xx
row = xx
gdl = GeoDownloads()
gdl.list_gcp('l5', f'{path:03d}/{row:03d}')
I looked at GeoDownloads.list_gcp() in web.py, which the error message points to, and it calls gsutil as subprocess.run(['gsutil', ...]). However, if I change test.py to the following snippet, it works fine:
import subprocess
subprocess.run(["gsutil"])
Additionally, I tried adding the path of the gsutil file itself (/opt/anaconda3/bin/gsutil) to the PATH environment variable, and the error changed to "NotADirectoryError: Not a directory: 'gsutil'".
Supplement:
/opt/anaconda3/bin is already in PATH. PATH looks like this: /opt/anaconda3/bin:/opt/anaconda3/condabin:...
Alright, thanks a lot to @edemaine and @tdelaney. I tried adding /opt/anaconda3/bin to PATH again, and that seems to get past the problem. I'm not sure why; it seems that only the newly added PATH entry is resolved.
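PATH entries are directories that get searched, not paths to individual executables, which is consistent with the NotADirectoryError seen when the gsutil file itself was added. A minimal sketch of adding a directory from inside the script follows. The helper name ensure_on_path is hypothetical, and the sketch does not explain why re-adding the directory helped in this particular case.

```python
import os


def ensure_on_path(directory, env=None):
    """Prepend `directory` to PATH in `env` (default: os.environ) if missing.

    Returns the resulting PATH string.
    """
    env = os.environ if env is None else env
    entries = env.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        # Put the directory first so it takes precedence; drop empty entries.
        env["PATH"] = os.pathsep.join([directory] + [e for e in entries if e])
    return env["PATH"]
```

For example, ensure_on_path("/opt/anaconda3/bin") would be called before gdl.list_gcp(...), so that the subprocess started by geowombat searches that directory.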

Creating an .app file using Platypus failing with subprocess

import os
import subprocess
pathName = "FilePath"
os.chdir(r'Directory Path')
process = subprocess.Popen(["scrapy", "crawl", "homeDepotSpider", "-t" , "csv" , "-o", pathName])
Here's the error message I get:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I can't find any documentation online. Is Platypus just not compatible with subprocess, or is there something wrong with my code? When I compile it, it runs fine; when I create an .app file, it doesn't work.
Edit: Here is the software that I used to make my program into an executable: https://sveinbjorn.org/platypus
The exception states the reason: "No such file or directory".
The subprocess.Popen line does roughly the same thing as running scrapy at a command prompt. In other words (in Windows terms), no folder in the PATH of the process you are launching from contains 'scrapy.exe'.
I read that you used packaging software to bundle your script. The packaging program has not packaged the scrapy executable.

Unable to run shell script from the Pydev environment in Eclipse

I am using CentOS 7.0 and have installed Eclipse Kepler with the PyDev environment. I want to run a simple C shell script from Python using subprocess, as follows:
import subprocess
subprocess.call(["./test1.csh"])
This C shell script executes in the terminal, and if I run a command like "ls" or "pwd" from Python I get the correct output, e.g.
subprocess.call(["ls"]) # gives me the list of all files
subprocess.call(["pwd"]) # gives me the location of current directory.
But when I run subprocess.call(["./test1.csh"]), I get the following error:
Traceback (most recent call last):
File "/home/nishant/workspace/codec_implement/src/NTTool/raw2waveconvert.py", line 8, in <module>
subprocess.call(["./test1.csh"])
File "/usr/lib64/python2.7/subprocess.py", line 524, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
Where am I going wrong? Please suggest
Make sure that the file test1.csh is executable. As Lukas Graf commented, also check the shebang (#!...) on the first line.
To confirm that, run it in the shell before running it through Python.
$ ls -l test1.csh
...
$ ./test1.csh
The current working directory may also differ from the one you have when you run it in the terminal. Specify the full path of the shell script, or change the working directory in the PyDev run configuration.
UPDATE
Prepend the shell executable:
import subprocess
subprocess.call(["csh", "./test1.csh"])
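The two suggestions above (check the executable bit, otherwise prepend the shell) can be combined in a small helper. This is only a sketch: build_command is a hypothetical name, and "csh" as the default interpreter is an assumption taken from the question.

```python
import os


def build_command(script_path, interpreter="csh"):
    """Return an argument list for subprocess that can launch the script.

    If the script is executable by the current user, run it directly via its
    absolute path; otherwise hand it to the interpreter explicitly.
    """
    script_path = os.path.abspath(script_path)
    if os.access(script_path, os.X_OK):
        return [script_path]
    return [interpreter, script_path]
```

The result can be passed straight to subprocess.call(build_command("./test1.csh")). Using the absolute path also sidesteps the working-directory difference between the terminal and PyDev.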

Is there a way to run Python's subprocess.check_output without the cwd to be the same directory as the exe being called

I am trying to automate some tasks. The process requires that I call some exes and pass them parameters. The directories containing the exes are in the Windows PATH variable. However, I consistently get a
WindowsError: [Error 2] The system cannot find the file specified
My current workaround is to change the current working directory to the directory containing the exe, but that imposes other limits on how we distribute the code. I want to note that in every case, if I open a cmd window and type the same command I am passing to subprocess.check_output, it works no matter which directory I am in.
To be clear, my concern is, for example, automating a WinRAR task when WinRAR.exe is in a different folder on another user's computer.
In response to the comment below, here are the input and the output after I changed the cwd to the root (c:)
The call to subprocess
rarProcess = check_output('''WinRAR a -r -v700m -sfx -agYYYYMMDD-NN -iiconD:\\RarResources\\de96.ico -iimgd:\\RarResources\\accounting2013.bmp d:\\testFTP\\compressed_test_ d:\\files_to_compress''')
and here is the traceback message in all of its glory
Traceback (most recent call last):
File "<pyshell#93>", line 1, in <module>
rarProcess = check_output('''WinRAR a -r -v700m -sfx -agYYYYMMDD-NN -iiconD:\\RarResources\\de96.ico -iimgd:\\RarResources\\accounting2013.bmp d:\\testFTP\\compressed_test_ d:\\files_to_compress''')
File "C:\Program Files (x86)\python\lib\subprocess.py", line 537, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "C:\Program Files (x86)\python\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Program Files (x86)\python\lib\subprocess.py", line 893, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Now I can't prove that this is not a hypothetical question/problem. I get the intended results when I use the same command (adjusting for path separators) through the cmd window and if I change the directory to the directory with the exe before running the command as pasted above.
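You don't need to change the working directory before running the process. Instead, you can pass the child an environment, as a dict, whose PATH includes the folder that contains WinRAR.exe (the folder, not the path of the .exe itself), merged with the current environment so that the other variables are kept:
proc = subprocess.Popen(args, env=dict(os.environ, PATH=r'C:\path\to\winrar_folder' + os.pathsep + os.environ['PATH']))
Note that, according to the Python documentation, on Windows with shell=False the env argument cannot override the PATH used to locate the executable itself, so passing the full path to WinRAR.exe as the first argument is the more reliable option.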

Windows cannot find scrapy file in dir

I have a simple runner script that uses Popen to call the spider.
The runner script is as follows:
from subprocess import Popen
import time
def runSpider():
    p = Popen(["scrapy", "crawl", "spider1"],
              cwd="C:\Users\Kasutaja\Desktop\scrapyTest")
    stdout, stderr = p.communicate()
    time.sleep(15)
runSpider()
The directory is like this:
-----scrapyTest:
--------------------scrapyTest[folder]: spider[folder], items.py, pipelines.py, settings.py
--------------------runner.py
--------------------scrapy.cfg
The spider runs perfectly from the dir: C:\Users\Kasutaja\Desktop\scrapyTest when I run it from the cmd line.
When I run my runner.py script I get:
The system cannot find the file specified
EDIT:
After changing the Popen to this:
p = Popen(["C:\Users\Kasutaja\Desktop\scrapyTest","scrapy", "crawl", "spider1"])
I get the error:
C:\Users\Kasutaja\Desktop\scrapyTest>python runner.py
Traceback (most recent call last):
File "runner.py", line 13, in <module>
runSpider()
File "runner.py", line 8, in runSpider
p = Popen(["C:\Users\Kasutaja\Desktop\scrapyTest","scrapy", "crawl", "spider1"])
File "C:\Python27\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 896, in _execute_child
startupinfo)
WindowsError: [Error 5] Access is denied
If it matters, I have admin rights.
I have also tried now, running the script with cmd specifically opened from start menu and with admin rights, but still get the same error.
From the docs
If cwd is not None, the child’s current directory will be changed to
cwd before it is executed. Note that this directory is not considered
when searching the executable, so you can’t specify the program’s path
relative to cwd. (Emphasis mine)
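To me, that means you have to give the full path to the executable, something like
Popen([r"C:\Users\Kasutaja\Desktop\scrapyTest\scrapy", "crawl", "spider1"])
with the path adjusted to wherever the scrapy executable actually lives. This executes the program scrapy with the arguments crawl and spider1.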
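A minimal sketch of keeping the two concerns separate, assuming the program can be found on PATH: resolve the executable to an absolute path first, then pass the project directory as cwd. The helper name run_in_dir is hypothetical. It uses subprocess.run with capture_output, which requires Python 3.7 or later, whereas the question uses Python 2.7.

```python
import shutil
import subprocess


def run_in_dir(program, args, cwd):
    """Resolve `program` on PATH, then run it with `cwd` as working directory."""
    executable = shutil.which(program)
    if executable is None:
        # Fail early with a clear message instead of a bare OS error.
        raise FileNotFoundError(f"{program!r} was not found on PATH")
    return subprocess.run(
        [executable, *args],
        cwd=cwd,              # only affects where the child runs
        capture_output=True,
        text=True,
    )
```

For the question above, the call would look like run_in_dir("scrapy", ["crawl", "spider1"], r"C:\Users\Kasutaja\Desktop\scrapyTest"), so that scrapy.cfg is found in the working directory while the executable is located independently of it.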
