I have generated a list of SoundCloud track IDs with the following Python code:
import soundcloud
import urllib

client = soundcloud.Client(client_id='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
                           client_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
                           username='XXXXXXXXXXXXXXXXXXXXXXXXXX',
                           password='XXXXXXXXXXXXXXXXXX')

f = open('soundcloud-track-ids', 'w+')
count = 0
while count < 6000:
    tracks = client.get('/me/tracks', limit=200, offset=count)
    for track in tracks:
        print >>f, track.id, "\t", track.title.encode('utf-8')
    count += 200
f.close()
I have then run a bash script to back up the archive's entire contents to a hard drive:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
while read line; do
    if [ ! -f /mnt/drobo_1/Soundcloud/$(echo $line | cut -f 2- | sed 's,/,\ ,g').mp3 ]; then
        wget https://api.soundcloud.com/tracks/"$(echo $line | awk '{print $1}')"/download?oauth_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
            -O /mnt/drobo_1/Soundcloud/"$(echo $line | cut -f 2- | sed 's,/,\ ,g').mp3"
    fi
done < ./soundcloud-track-ids
IFS=$SAVEIFS
Nearly all of the 5317 tracks are private, and most downloaded without a problem; however, about 600 tracks failed to download with the following error:
--2015-01-05 12:46:09-- https://api.soundcloud.com/tracks/152288957/download?oauth_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Resolving api.soundcloud.com (api.soundcloud.com)... 93.184.220.127
Connecting to api.soundcloud.com (api.soundcloud.com)|93.184.220.127|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2015-01-05 12:46:10 ERROR 404: Not Found.
Does anyone know what the error could be?
That 404 error means the file couldn't be found on SoundCloud's end. It could be SoundCloud's rate limiter doing this, to keep you from hammering the API so hard.
See https://developers.soundcloud.com/docs/api/terms-of-use#quotas
If you try those failed downloads later, do they work?
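If it is the rate limiter, a paced retry pass over just the failed track IDs might recover them. A rough sketch (mine, not from the thread; the oauth_token placeholder and per-track download URL are the ones from the question):

import time
import requests

OAUTH_TOKEN = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'  # placeholder, as in the question

def retry_download(track_id, dest_path, delay=30):
    url = 'https://api.soundcloud.com/tracks/%s/download' % track_id
    resp = requests.get(url, params={'oauth_token': OAUTH_TOKEN}, stream=True)
    ok = resp.status_code == 200
    if ok:
        with open(dest_path, 'wb') as f:
            for chunk in resp.iter_content(1024 * 1024):
                f.write(chunk)
    else:
        print(track_id, 'still failing with HTTP', resp.status_code)
    time.sleep(delay)  # pace requests so the quota has room to recover
    return ok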
I have a shell script downloaded from EarthData Search as follows:
download.sh
#!/bin/bash

GREP_OPTIONS=''

cookiejar=$(mktemp cookies.XXXXXXXXXX)
netrc=$(mktemp netrc.XXXXXXXXXX)
chmod 0600 "$cookiejar" "$netrc"
function finish {
    rm -rf "$cookiejar" "$netrc"
}
trap finish EXIT
WGETRC="$wgetrc"

prompt_credentials() {
    echo "Enter your Earthdata Login or other provider supplied credentials"
    read -p "Username (tylersingleton): " username
    username=${username:-tylersingleton}
    read -s -p "Password: " password
    echo "machine urs.earthdata.nasa.gov login $username password $password" >> $netrc
    echo
}

exit_with_error() {
    echo
    echo "Unable to Retrieve Data"
    echo
    echo $1
    echo
    echo "https://n5eil01u.ecs.nsidc.org/DP4/SMAP/SPL3SMP.008/2019.03.09/SMAP_L3_SM_P_20190309_R18290_001.h5"
    echo
    exit 1
}

prompt_credentials

detect_app_approval() {
    approved=`curl -s -b "$cookiejar" -c "$cookiejar" -L --max-redirs 5 --netrc-file "$netrc" https://n5eil01u.ecs.nsidc.org/DP4/SMAP/SPL3SMP.008/2019.03.09/SMAP_L3_SM_P_20190309_R18290_001.h5 -w %{http_code} | tail -1`
    if [ "$approved" -ne "302" ]; then
        # User didn't approve the app. Direct users to approve the app in URS.
        exit_with_error "Please ensure that you have authorized the remote application by visiting the link below "
    fi
}

setup_auth_curl() {
    # First, check whether the URL requires URS authentication
    status=$(curl -s -z "$(date)" -w %{http_code} https://n5eil01u.ecs.nsidc.org/DP4/SMAP/SPL3SMP.008/2019.03.09/SMAP_L3_SM_P_20190309_R18290_001.h5 | tail -1)
    if [[ "$status" -ne "200" && "$status" -ne "304" ]]; then
        # URS authentication is required. Now check whether the application/remote service is approved.
        detect_app_approval
    fi
}

setup_auth_wget() {
    # The safest way to auth via curl is netrc. Note: there's no checking or feedback
    # if login is unsuccessful.
    touch ~/.netrc
    chmod 0600 ~/.netrc
    credentials=$(grep 'machine urs.earthdata.nasa.gov' ~/.netrc)
    if [ -z "$credentials" ]; then
        cat "$netrc" >> ~/.netrc
    fi
}

fetch_urls() {
    if command -v curl >/dev/null 2>&1; then
        setup_auth_curl
        while read -r line; do
            # Get everything after the last '/'
            filename="${line##*/}"
            # Strip everything after '?'
            stripped_query_params="${filename%%\?*}"
            curl -f -b "$cookiejar" -c "$cookiejar" -L --netrc-file "$netrc" -g -o $stripped_query_params -- $line && echo || exit_with_error "Command failed with error. Please retrieve the data manually."
        done;
    elif command -v wget >/dev/null 2>&1; then
        # We can't use wget to probe the provider server for URS-integration info
        # without downloading at least one of the files.
        echo
        echo "WARNING: Can't find curl, using wget instead."
        echo "WARNING: Script may not correctly identify Earthdata Login integrations."
        echo
        setup_auth_wget
        while read -r line; do
            # Get everything after the last '/'
            filename="${line##*/}"
            # Strip everything after '?'
            stripped_query_params="${filename%%\?*}"
            wget --load-cookies "$cookiejar" --save-cookies "$cookiejar" --output-document $stripped_query_params --keep-session-cookies -- $line && echo || exit_with_error "Command failed with error. Please retrieve the data manually."
        done;
    else
        exit_with_error "Error: Could not find a command-line downloader. Please install curl or wget"
    fi
}
fetch_urls <<'EDSCEOF'
https://n5eil01u.ecs.nsidc.org/DP4/SMAP/SPL3SMP.008/2019.03.09/SMAP_L3_SM_P_20190309_R18290_001.h5
...
https://n5eil01u.ecs.nsidc.org/DP4/SMAP/SPL3SMP.008/2019.03.08/SMAP_L3_SM_P_20190308_R18290_001.h5
EDSCEOF
At the bottom, a list of URLs is redirected as a heredoc into the fetch_urls function. I have been attempting to remove this portion and house the URLs in a text file that I can pass as an argument to download.sh from Python as:
import subprocess
subprocess.run(['./download.sh', 'URLs.txt'])
I have tried editing my bash script to have fetch_urls accept a variable as an input.
fetch_urls $1
and
URLs=$(cat URLs.txt)
fetch_urls <<'EDSCEOF'
$URLs
EDSCEOF
and
while read -r url; do fetch_urls <<< echo "$url"; done < URLs.txt
and
fetch_urls <<'EDSCEOF'
while read -r url; do echo "$url"; done < URLs.txt
EDSCEOF
But I know nothing about bash, and cannot figure out how this should be done. Additionally, I would like the downloaded files to be redirected to their own folder, i.e. I am attempting to have a file structure like this:
.
|--- main.py
|--- data_folder
     |--- download.sh
     |--- URLs_1.txt
     |--- URLs_2.txt
     |--- folder_1
          |--- URLs_1_Data
     |--- folder_2
          |--- URLs_2_Data
So any direction as to where in the docs to search would be helpful. With Python's subprocess I can change the CWD, but that would cause my data to be downloaded into the same folder as the bash script. I would rather avoid this and simply be able to pass two variables to the bash script: 1) the location of the URL text file to use; 2) where to save the downloaded data.
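For what you describe, one possible wiring (my sketch; it assumes download.sh is edited so fetch_urls reads URLs from the file named in $1, e.g. fetch_urls < "$1", and prefixes its output paths with $2):

import subprocess

subprocess.run(
    ['bash', 'data_folder/download.sh',  # the script
     'data_folder/URLs_1.txt',           # $1: which URL list to read
     'data_folder/folder_1'],            # $2: where to save the data
    check=True,                          # raise if the script fails
)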
I could not figure out how to accomplish this with bash. The answers provided only partly worked for my use. If the file contained only one URL, it would work; any more, however, would freeze the program. I have found a solution using Python, which my project is based in.
import requests  # get the requests library from https://github.com/requests/requests

# overriding requests.Session.rebuild_auth to maintain headers when redirected
class SessionWithHeaderRedirection(requests.Session):
    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url
        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)
            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return

# create session with the user credentials that will be used to authenticate access to the data
username = "**********"
password = "**********"
session = SessionWithHeaderRedirection(username, password)

# the urls of the files we wish to retrieve,
# with the newline character stripped from the end of each
with open('test.txt') as url_file:
    urls = [url.strip('\n') for url in list(url_file)]

for url in urls:
    # extract the filename from the url to be used when saving the file
    filename = url[url.rfind('/') + 1:]
    try:
        # submit the request using the session
        response = session.get(url, stream=True)
        print(response.status_code)
        # raise an exception in case of http errors
        response.raise_for_status()
        # save the file
        with open(filename, 'wb') as fd:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                fd.write(chunk)
    except requests.exceptions.HTTPError as e:
        # handle any errors here
        print(e)
This code was generated from https://urs.earthdata.nasa.gov/documentation/for_users/data_access/python. I have only slightly modified it to work with my project.
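To get the per-folder layout from the question, the same loop can be wrapped in a function taking the URL file and an output directory (my adaptation, not part of the NASA snippet; session is the SessionWithHeaderRedirection instance from above):

import os

def download_all(url_file, out_dir, session):
    os.makedirs(out_dir, exist_ok=True)
    with open(url_file) as fh:
        urls = [u.strip() for u in fh if u.strip()]
    for url in urls:
        filename = url[url.rfind('/') + 1:]
        response = session.get(url, stream=True)
        response.raise_for_status()
        # save into out_dir instead of the current directory
        with open(os.path.join(out_dir, filename), 'wb') as fd:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                fd.write(chunk)

# e.g. download_all('data_folder/URLs_1.txt', 'data_folder/folder_1', session)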
I am using the Tornado framework and trying to implement static file download functionality. (I know we can do it like below by specifying static_url_prefix, so Tornado will create the handler, and whenever we use curl -0 -o nzcli -k https://127.0.0.1:8443/v2/download/testdata it will download testdata to wherever the script is run.)
self.application = tornado.web.Application(
    self.url_all,
    static_url_prefix=f"/{version}/download/",
    static_path=str(static_path),
)
But the above implementation does not require a username and password, so I have decided to create a separate handler for this, like below.
#class_logger
class Static_file(AuthHandler):
    URL_PATTERN = r"/(v\d+)/download_handler/([A-Za-z0-9_.]+)"

    def do_get(self, ver, file_passed):
        try:
            print("^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^")
            static_path = pathlib.Path(pkg_resources.resource_filename("folder.app.data", "web"))
            files = [f for f in os.listdir(static_path) if os.path.isfile(os.path.join(static_path, f))]
            print("All files in dir2 : ", files)
            if file_passed in files:
                pass  # return the file (it can be of any extension) to the user (show progress in terminal)
            else:
                pass
            status = "File Downloading Completed."
            return HTTPResponse(200, 'OK', status)
        except Exception as exp:
            status_code = 500
            return HTTPResponse(status_code, str(exp), 500)
Now my requirement is that whenever the user sends a GET request with a filename, i.e.
curl -k -X GET https://127.0.0.1:PORT/v2/download_handler/testdata -H "Authorization: `cat /ips/abc/token`" -H "Content-Type: application/json"
I check that filename with if file_passed in files:, but how do I return the file as a download and show progress to the user in the terminal where they are running the script?
Any suggestions/solutions would be highly appreciated. Thanks.
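One way to do this (a minimal sketch: it assumes a plain tornado.web.RequestHandler and a pre-resolved static_path instead of your AuthHandler/do_get dispatch, so the auth pieces would need to be added back) is to set Content-Length up front and stream the file in chunks; a known Content-Length is what lets curl draw its normal progress meter on the client side.

import os
import tornado.web

static_path = "/path/to/folder/app/data/web"  # hypothetical; resolve via pkg_resources as in the question

class DownloadHandler(tornado.web.RequestHandler):
    async def get(self, ver, file_passed):
        path = os.path.join(static_path, file_passed)
        if not os.path.isfile(path):
            raise tornado.web.HTTPError(404)
        self.set_header("Content-Type", "application/octet-stream")
        self.set_header("Content-Disposition", 'attachment; filename="%s"' % file_passed)
        # a known length is what lets curl show real progress
        self.set_header("Content-Length", str(os.path.getsize(path)))
        with open(path, "rb") as f:
            while True:
                chunk = f.read(64 * 1024)  # stream in 64 KiB chunks
                if not chunk:
                    break
                self.write(chunk)
                await self.flush()  # push each chunk to the client immediately

The user then saves it with curl -k -o testdata https://127.0.0.1:PORT/v2/download_handler/testdata -H "Authorization: ..." and curl's own progress meter tracks the transfer.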
Does Stack Overflow really auto-delete "Hey guys" from the beginning of the text? :D Hello, I have a problem I can't seem to wrap my mind around.
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
# You can generate a Token from the "Tokens Tab" in the UI
org = "myorg"
bucket = "mybucket"
token = 'valid_token'
client = InfluxDBClient(url="http://localhost:8086", token=token)
write_api = client.write_api(write_options=SYNCHRONOUS)
d='airSensors,sensor_id=TLM0201 temperature=70.97038159354763,humidity=55.23103248356096,co=0.78445310567793615 1637124357000000000'
write_api.write(bucket, org, d)
This runs and returns no error. I tried making a mistake in e.g. the bucket name and it raises; a bad token raises unauthorized, etc.
BUT there is no data in the database when I check. BUT when I run this exact line through curl:
curl --request POST \
"http://localhost:8086/api/v2/write?org=myorg&bucket=mybucket&precision=ns" \
--header "Authorization: Token valid_token" \
--header "Content-Type: text/plain; charset=utf-8" \
--header "Accept: application/json" \
--data-binary '
airSensors,sensor_id=TLM0201 temperature=73.97038159354763,humidity=35.23103248356096,co=0.48445310567793615 1637024357000000000
airSensors,sensor_id=TLM0202 temperature=75.30007505999716,humidity=35.651929918691714,co=0.5141876544505826 1637024357000000000
'
This also runs with no errors, but this time it actually writes into the DB.
Am I crazy or what? I tried everything: writing through Points, series... you name it, but it refuses to commit or something. Has anyone had a similar problem?
I am running influxdb-client==1.23.0 on Python 3.8.10 against InfluxDB 2.0.7.
Thanks for your time. Q.
I guess you should call write_api.close() at the end of your writes, or use with:
with client.write_api() as write_api:
    write_api.write(bucket, org, d)
https://github.com/influxdata/influxdb-client-python#writes
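For reference, the explicit-close variant would be (a sketch with the same client, bucket, org, and d as the question; write_api() with no options defaults to batching mode, so nothing is sent until the buffer is flushed):

write_api = client.write_api()   # no options: defaults to batching mode
write_api.write(bucket, org, d)
write_api.close()                # flushes buffered points before the script exits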
I am using a Python Flask Swagger server for my API. I wish to create an endpoint that can return some JSON data OR a video file.
My function definition is:
from flask import send_file

def my_function():
    some_case = True  # This could be False
    try:
        some_json = {
            "Data1": "Blah",
            "Data2": "0Blah"
        }
        if some_case:
            return some_json, 200
        else:
            return send_file('/home/video_file.mp4')
    except Exception as e:
        return str(e)
I am using curl to call my API like so:
curl -X GET "http://0.0.0.0:8888/MyFunction"
{
"Data1": "Blah",
"Data2": "0Blah"
}
But when I call it with some_case=False, so that the video file should download, I get:
curl -X GET "http://0.0.0.0:8888/MyFunction"
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
And I then have to specify an output file like so in order to actually download it:
curl -X GET "http://0.0.0.0:8888/MyFunction" --output my_video.mp4
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 6575k 100 6575k 0 0 642M 0 --:--:-- --:--:-- --:--:-- 642M
How can I call this API using curl without knowing what the return is going to be (JSON or video file)?
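One workaround from the calling side (my sketch, not from the post: it inspects the Content-Type response header to decide what to do with the body; the filename my_video.mp4 is assumed):

import requests

resp = requests.get("http://0.0.0.0:8888/MyFunction", stream=True)
if resp.headers.get("Content-Type", "").startswith("application/json"):
    print(resp.json())                      # JSON branch: just print it
else:
    with open("my_video.mp4", "wb") as f:   # file branch: save the body
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)

With plain curl you could similarly always pass --output and check the saved file's type afterwards.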
I have already gone through a few existing Stack Overflow links for this query; they did not help me.
I would like to run a few (4) curl commands, each of which gives output. From that output, I would like to parse a few group IDs for the next command.
curl --basic -u admin:admin -d '{ "name" : "test-dev" }' --header 'Content-Type: application/json' http://localhost:8080/mmc/api/serverGroups
I have tried it as:
#!/usr/bin/python
import subprocess
bash_com = 'curl --basic -u admin:admin -d '{ "name" : "test-dev" }' --header 'Content-Type: application/json' http://localhost:8080/mmc/api/serverGroups'
subprocess.Popen(bash_com)
output = subprocess.check_output(['bash','-c', bash_com]) # subprocess has check_output method
It gives me a syntax error, even though I have changed the single quotes to double quotes in that curl command.
I have been trying PycURL, but I need to look into that more. Is there any way to run curl commands in Python, parse the output values, and pass them to the next curl command?
You can use os.popen with
fh = os.popen(bash_com, 'r')
data = fh.read()
fh.close()
Or you can use subprocess like this
cmds = ['ls', '-l', ]
try:
    output = subprocess.check_output(cmds, stderr=subprocess.STDOUT)
    retcode = 0
except subprocess.CalledProcessError, e:
    retcode = e.returncode
    output = e.output
print output
Here you have to organize your command and its params as a list.
Or you just go the easy way and use requests.get(...).
And do not forget: using popen, you can get shell injection via the parameters of your command!
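For this particular call, the requests route would look something like this (a sketch using the endpoint and credentials from the question; note it is a POST, matching curl's -d, not a GET):

import requests

resp = requests.post(
    'http://localhost:8080/mmc/api/serverGroups',
    json={'name': 'test-dev'},   # sends the body with Content-Type: application/json
    auth=('admin', 'admin'),     # HTTP basic auth, like curl --basic -u
)
resp.raise_for_status()
group_id = resp.json()['id']     # the "id" field to pass to the next call
print(group_id)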
I get better output using os.popen(bash_com, 'r') and then fh.read():
python api.py
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
199 172 0 172 0 27 3948 619 --:--:-- --:--:-- --:--:-- 4027
{"href":"http://localhost:8080/mmc/api/serverGroups/39a28908-3fae-4903-adb5-06a3b7bb06d8","serverCount":0,"name":"test-dev","id":"39a28908-3fae-4903-adb5-06a3b7bb06d8"}
Am I right in understanding that fh.read() executed the curl command? Please correct me.
I am trying to redirect the curl command's output to a text file and then parse the file as JSON. All I am trying to get is the "id" from the above output.
import sys
import json

fh = os.popen(bash_com, 'r')
data = fh.read()
newf = open("/var/tmp/t1.txt", 'w')
sys.stdout = newf
print data
with open("/var/tmp/t1.txt") as json_data:
    j = json.load(json_data)
    print j['id']
I have checked the file's content on JSONLint.com and it reports valid JSON, yet the code throws "ValueError: No JSON object could be decoded" at the json.load line. Is there anything I need to do before parsing the redirected file?
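For what it's worth, a guess at the cause (not confirmed in this thread): sys.stdout is still attached to the unflushed /var/tmp/t1.txt when the file is reopened, so json.load sees an empty file. Parsing the curl output string directly sidesteps the temp file entirely:

import json
import os

fh = os.popen(bash_com, 'r')  # bash_com: the curl command string from earlier
data = fh.read()
fh.close()

j = json.loads(data)          # parse the response text in memory
print(j['id'])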