Scrapyd: Writing CSV file to remote server - python

I'm trying to schedule a crawler on EC2 and have the output exported to a CSV file, cppages-nov.csv, while creating a JOBDIR in case I need to pause the crawl, but no files are created. Am I using the feed exports correctly?
curl http://awsserver:6800/schedule.json -d project=wallspider -d spider=cppages -d JOBDIR=/home/ubuntu/scrapy/sitemapcrawl/crawls/cppages-nov -d FEED_URI=/home/ubuntu/scrapy/sitemapcrawl/cppages-nov.csv -d FEED_FORMAT=csv

curl http://amazonaws.com:6800/schedule.json -d project=wallspider -d spider=cppages -d setting=FEED_URI=/home/ubuntu/scrapy/sitemapcrawl/results/cppages.csv -d setting=FEED_FORMAT=csv -d setting=JOBDIR=/home/ubuntu/scrapy/sitemapcrawl/crawl/cppages-nov

Use these feed settings in your settings file:
FEED_EXPORTERS = {
'csv': 'scrapy.contrib.exporter.CsvItemExporter',
}
FEED_FORMAT = 'csv'
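Note that the first curl invocation passes JOBDIR and FEED_URI as plain spider arguments; Scrapyd only treats them as Scrapy settings when each one is sent as a setting=KEY=value field, as in the second invocation. As a sketch (server address and paths are the placeholders from the question), the same request could also be built from Python:

```python
def build_schedule_payload(project, spider, settings):
    """Build the POST fields for Scrapyd's schedule.json endpoint.

    Each Scrapy setting must be sent as a separate 'setting=KEY=value'
    form field, not as a bare spider argument."""
    payload = [("project", project), ("spider", spider)]
    for key, value in settings.items():
        payload.append(("setting", "%s=%s" % (key, value)))
    return payload

payload = build_schedule_payload(
    "wallspider",
    "cppages",
    {
        "FEED_URI": "/home/ubuntu/scrapy/sitemapcrawl/results/cppages.csv",
        "FEED_FORMAT": "csv",
        "JOBDIR": "/home/ubuntu/scrapy/sitemapcrawl/crawl/cppages-nov",
    },
)

# To actually schedule the job, POST the fields, e.g. with the requests package:
# requests.post("http://awsserver:6800/schedule.json", data=payload)
```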

Related

Python: curl -d json will not work without apostrophe

I have to send the curl command in exactly this form to get a response on the CLI:
curl -d '{"id":0,"params":["aa:aa:00:00:00:00",["mixer","volume","80"]],"method":"slim.request"}' http://localhost:9000/jsonrpc.js
curl -d '{"id":0,"params":["aa:aa:00:00:00:00",["playlist","play","/home/pi/mp3/File.mp3"]],"method":"slim.request"}' http://localhost:9000/jsonrpc.js
When I run the command above in my script, I get a syntax error at the apostrophe:
File "./updateTimers.py", line 126
strVolume = curl -d '{"id":0,"params":["aa:aa:00:00:00:00",["mixer","volume","80"]],"method":"slim.request"}' http://localhost:9000/jsonrpc.js
^
SyntaxError: invalid syntax
If I swap the apostrophes and quotation marks, my Python script does not like the syntax either. I would be happy for any advice. At minimum I need to run both commands one after the other.
strVolume = curl -d '{"id":0,"params":["aa:aa:00:00:00:00",["mixer","volume","80"]],"method":"slim.request"}' http://localhost:9000/jsonrpc.js
strPlayMP3Command = curl -d '{"id":0,"params":["aa:aa:00:00:00:00",["playlist","play","/home/pi/adhan/mp3/Adhan-Makkah.mp3"]],"method":"slim.request"}' http://localhost:9000/jsonrpc.js
If I understand your question correctly, you want the curl -d ... to be a string in Python. To do this, you will also have to wrap curl in quotes: strVolume = "curl -d '{\"id\": 0, ...}' ...". Make sure you escape the quotes inside.
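Alternatively, here is a sketch assuming the command should actually be executed from Python rather than just stored as a string: build the JSON body with json.dumps and hand the arguments to subprocess as a list, which sidesteps the shell-quoting problem entirely (host and player ID are taken from the question).

```python
import json

def build_curl_command(player_id, params, url="http://localhost:9000/jsonrpc.js"):
    """Build a curl argument list for the JSON-RPC call.

    Passing a list to subprocess avoids nesting shell quotes inside
    a Python string."""
    body = json.dumps(
        {"id": 0, "params": [player_id, params], "method": "slim.request"}
    )
    return ["curl", "-d", body, url]

cmd = build_curl_command("aa:aa:00:00:00:00", ["mixer", "volume", "80"])

# To actually run it:
# import subprocess
# subprocess.run(cmd)
```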

Telegram Bot how to set parsing mode in curl?

I'm editing a script for a Telegram bot, and I only want to add HTML parse mode so that it allows me to use bold, italic, etc.
I can't seem to find a way to add parse_mode: "HTML" to the curl line:
if [ -n "${TOKEN}" ];
then
echo "Sending telegram...";
#Telegram notification
curl -s -X POST https://api.telegram.org/bot${TOKEN}/sendMessage -d chat_id=${CHAT_ID} -d text="${1}" >> /dev/null
fi
parse_mode is just another parameter like text or chat_id. You can use -d!
curl -s -X POST https://api.telegram.org/bot${TOKEN}/sendMessage -d chat_id=${CHAT_ID} -d text="${1}" -d parse_mode=HTML >> /dev/null
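A sketch of the same call from Python, using only the standard library to build the request (the token and chat ID values here are placeholders):

```python
import urllib.parse

def build_send_message_body(token, chat_id, text, parse_mode="HTML"):
    """Build the URL and form body for Telegram's sendMessage endpoint.

    parse_mode is an ordinary form field, just like chat_id and text."""
    url = "https://api.telegram.org/bot%s/sendMessage" % token
    body = urllib.parse.urlencode(
        {"chat_id": chat_id, "text": text, "parse_mode": parse_mode}
    )
    return url, body

url, body = build_send_message_body("123:ABC", "42", "<b>backup finished</b>")

# To actually send it (needs the requests package):
# import requests
# requests.post(url, data=dict(urllib.parse.parse_qsl(body)))
```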

Using a python string output in curl

I have this python script
users = ['mark', 'john', 'steve']
text = ''
for user in users:
    text += user + " "
print(text)
I want to output that string "text" into a curl terminal command.
I tried:
curl -d "@python-scirpt.py" --insecure -i -X POST https://10.10.10.6/hooks/84kk9emcdigz8xta1bykiymn5e
and
curl --insecure -i -X POST -H 'Content-Type: application/json' -d '{"text": 'python /home/scripts/python-script.py'}' https://10.10.10.6/hooks/84kk9emcdigz8xta1bykiymn5e
or without the quotations in the text option
Everything returns this error
{"id":"Unable to parse incoming data","message":"Unable to parse incoming data","detailed_error":"","request_id":"fpnmfds8zifziyc85oe5eyf3pa","status_code":400}
How should I approach this? Any help is appreciated, thank you.
Another approach would be to run curl inside Python, but I would need help with that too.
Use command substitution (i.e. $(...)) to make the shell run the Python code first.
So
curl -d "$(python python-scirpt.py)" --insecure -i -X POST https://10.10.10.6/hooks/84kk9emcdigz8xta1bykiymn5e
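For the "curl inside Python" approach the asker mentioned, here is a sketch using only the standard library. It assumes the hook expects a JSON body with a "text" field (the error message suggests the server failed to parse the plain-text payload); the webhook URL is the placeholder from the question.

```python
import json

def build_hook_payload(users):
    """Join the user names into one string and wrap it as a JSON hook payload."""
    text = " ".join(users)
    return json.dumps({"text": text})

payload = build_hook_payload(['mark', 'john', 'steve'])

# To actually POST it (requests package; verification off to match curl's --insecure):
# import requests
# requests.post("https://10.10.10.6/hooks/84kk9emcdigz8xta1bykiymn5e",
#               data=payload, headers={"Content-Type": "application/json"},
#               verify=False)
```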

Convert the streamed json-unicode response of Docker's remote API into something more readable

When I hit an endpoint of the Docker remote API, for example with cUrl in Bash, I get a response streamed to the console which might look like
[...]
{"stream":"\u001b[91m.\u001b[0m"}
{"stream":"\u001b[91m.. .....\u001b[0m"}
{"stream":"\u001b[91m.\u001b[0m"}
{"stream":"\u001b[91m.... ...\u001b[0m"}
{"stream":"\u001b[91m.....\u001b[0m"}
{"stream":"\u001b[91m.. .... 14.2M=0.5s\u001b[0m"}
{"stream":"\u001b[91m\n\n\u001b[0m"}
{"stream":"\u001b[91m2015-08-06 09:41:20 (10.1 MB/s) - ‘workspace.zip’ saved [5063084]\n\n\u001b[0m"}
{"stream":" ---\u003e aa6d979beeec\n"}
{"stream":"Removing intermediate container fa73eeb4531d\n"}
{"stream":"Step 3 : WORKDIR ./workspace\n"}
{"stream":" ---\u003e Running in 1dc8301bfd34\n"}
{"stream":" ---\u003e 4bddbc0282c9\n"}
{"stream":"Removing intermediate container 1dc8301bfd34\n"}
{"stream":"Step 4 : EXPOSE 8080\n"}
{"stream":" ---\u003e Running in 187a95569e84\n"}
{"stream":" ---\u003e b26c7b990996\n"}
{"stream":"Removing intermediate container 187a95569e84\n"}
{"stream":"Step 5 : CMD /bin/bash some_script.sh\n"}
{"stream":" ---\u003e Running in a5027b1082c3\n"}
{"stream":" ---\u003e 276ee1506ea0\n"}
{"stream":"Removing intermediate container a5027b1082c3\n"}
{"stream":"Successfully built 276ee1506ea0\n"}
[...]
This is really annoying to read, with all the escape and Unicode characters. How can I print the cURL response on the console in a more readable form, without all the escaped special characters?
This answer suggests piping the response to Python and using its json module, dumping the text again as UTF-8. However, when I use it as in the following example, which is the remote-API way to build a Docker image from a local Dockerfile:
tar -cvf - Dockerfile | \
curl --silent --show-error -X POST -H "Content-Type:application/tar" --data-binary @- \
"http://myDockerHost:4243/build?t=myRepo/myImage" | \
python -c 'import json, sys; sys.stdout.write(json.load(sys.stdin)[0].encode("utf-8"))'
then I get an error like
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 48 - 764)
Looking it up told me that this occurs because Python's json module can only parse a single JSON document, not the streamed multiline response from cURL.
What else could be done to solve this?
Your input is multiple json documents -- one per line. Feed each line to json.loads() separately:
>>> print(json.loads(r'{"stream":"\u001b[91m.\u001b[0m"}')['stream'])
.
It is displayed as a red dot on my screen (due to ANSI escape sequences):
>>> json.loads(r'{"stream":"\u001b[91m.\u001b[0m"}')['stream']
u'\x1b[91m.\x1b[0m'
You could use jq, to work with json on the command line:
$ echo '{"stream":"\u001b[91m.\u001b[0m"}' | jq -r .stream
.
Unrelated: Don't encode to utf-8, print Unicode directly instead. Don't hardcode the encoding of your environment inside your script. If you want to change the output encoding, set PYTHONIOENCODING envvar instead.
Solution, based on J.F. Sebastian's answer
Both suggested approaches work well. After installing jq, it can be used like this:
tar -C ./ -cvf - Dockerfile | curl --silent --show-error -X POST -H "Content-Type:application/tar" --data-binary @- "$DOCKERHOST/build?t=repo/imageName" | jq -r .stream
Using Python instead, handling each line of the stream separately, it looks like this:
tar -C ./ -cvf - Dockerfile | curl --silent --show-error -X POST -H "Content-Type:application/tar" --data-binary @- "$DOCKERHOST/build?t=repo/imageName" | python -c 'import json, sys; [sys.stdout.write(json.loads(line)["stream"]) for line in sys.stdin]'
Both solutions give the desired "readable" output on stdout, e.g.
Dockerfile
Step 0 : FROM ubuntu
---> 91e54dfb1179
Step 1 : RUN apt-get update -y
---> Using cache
---> 211dc37ab584
Step 2 : RUN apt-get install -y default-jre
---> Using cache
---> 0045a653edb9
Successfully built 0045a653edb9
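For a standalone script rather than a one-liner, the per-line decoding could be sketched like this (Python 3; skipping documents without a "stream" key, such as error objects, is an assumption here):

```python
import json
import sys

def decode_stream_lines(lines):
    """Decode one JSON document per line and yield each 'stream' payload."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        if "stream" in doc:
            yield doc["stream"]

if __name__ == "__main__":
    # Pipe the curl output into this script instead of the python -c one-liner.
    for chunk in decode_stream_lines(sys.stdin):
        sys.stdout.write(chunk)
```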

gunicorn - not taking config file if executed from shell script

This is the code for my shell script :
#!/bin/bash
source /path/to/active
gunicorn_django -c /path/to/conf.py -D
When the above sh file is executed, it starts the gunicorn process, but the process is not using the config file.
But, if i execute the command directly from the command line, like
gunicorn_django -c path/to/conf.py -D
then it uses the config file.
Also, in the sh file, if I pass options directly, like -w 3 --error-logfile etc., then those options are applied.
Use this script; it worked for me:
#!/bin/bash
source /path/to/active
gunicorn_django -c $(pwd)/path/to/conffilefrom/presentworkingdirectory -D
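The likely cause is that a relative -c path is resolved against the current working directory, which differs when the script is launched from elsewhere; the $(pwd) prefix makes the path absolute. A small sketch of that effect (the paths are illustrative):

```python
import os

def resolve_config(path, cwd):
    """Resolve a config path the way a process would: relative paths are
    interpreted against the current working directory."""
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(cwd, path))

# The same relative path points at different files from different directories:
a = resolve_config("path/to/conf.py", "/home/user/project")
b = resolve_config("path/to/conf.py", "/")
```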

Categories

Resources