Unable to GET entire page with Python request

I'm trying to get a long JSON response (~75 MB) from a webpage, but I can only receive the first 25 MB or so.
I've used urllib2 and python-requests, but neither works. I've also tried reading the response in separate parts and streaming the data, but that doesn't work either.
An example of the data can be found here:
http://waterservices.usgs.gov/nwis/iv/?site=14377100&format=json&parameterCd=00060&period=P260W
My code is as follows:
r = requests.get("http://waterservices.usgs.gov/nwis/iv/?site=14377100&format=json&parameterCd=00060&period=P260W")
usgs_data = r.json() # script breaks here
# Save Longitude and Latitude of river
latitude = usgs_data["value"]["timeSeries"][0]["sourceInfo"]["geoLocation"]["geogLocation"]["latitude"]
longitude = usgs_data["value"]["timeSeries"][0]["sourceInfo"]["geoLocation"]["geogLocation"]["longitude"]
# dictionary of all past river flows in cubic feet per second
river_history = usgs_data['value']['timeSeries'][0]['values'][0]['value']
It breaks with:
ValueError: Expecting object: line 1 column 13466329 (char 13466328)
when the script tries to decode the JSON (i.e. usgs_data = r.json()).
This happens because the full data hasn't been received, so it isn't a valid JSON object.

The problem seems to be that the server won't serve more than 13MB of data at a time.
I have tried that URL using a number of HTTP clients, including curl and wget, and all of them bomb out at about 13 MB. I have also tried enabling gzip compression (as you should), but the results were still truncated at 13 MB after decompression.
You are requesting too much data because the period=P260W specifies 260 weeks. If you try setting period=P52W you should find that you are able to retrieve a valid JSON response.
To reduce the amount of data transferred, set the Accept-Encoding header like this:
url = 'http://waterservices.usgs.gov/nwis/iv/'
params = {'site': 11527000, 'format': 'json', 'parameterCd': '00060', 'period': 'P52W'}
r = requests.get(url, params=params, headers={'Accept-Encoding': 'gzip,deflate'})
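If you genuinely need all 260 weeks, one workaround is to fetch the history in smaller date windows and merge the readings yourself. Below is a rough sketch of that idea; it assumes the service accepts startDT/endDT parameters (check the USGS water services documentation) and that each chunk has the same timeSeries structure shown in the question.
from datetime import date, timedelta
import requests

url = 'http://waterservices.usgs.gov/nwis/iv/'
base_params = {'site': '14377100', 'format': 'json', 'parameterCd': '00060'}

river_history = []
end = date.today()
for _ in range(5):  # five 52-week windows ~= 260 weeks
    start = end - timedelta(weeks=52)
    params = dict(base_params, startDT=start.isoformat(), endDT=end.isoformat())
    r = requests.get(url, params=params, headers={'Accept-Encoding': 'gzip,deflate'})
    r.raise_for_status()
    chunk = r.json()
    # Readings at the window boundaries may overlap slightly; dedupe if that matters.
    river_history.extend(chunk['value']['timeSeries'][0]['values'][0]['value'])
    end = start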

Related

Data sent with the requests.delete method is disregarded

While trying to connect to an API from Alpaca (a broker that accepts automated orders), I found that I was unable to send data with the requests.delete method. Here is some code:
def close_position(symbol, percentage, api_key=api_key, secret_key=secret_key, base_url=base_url):
    data = {"percentage": percentage}
    headers = {
        "APCA-API-KEY-ID": api_key,
        "APCA-API-SECRET-KEY": secret_key
    }
    url = f"{base_url}/v2/positions/{symbol}"
    order = requests.delete(url, json=data, headers=headers)
    return order.json()
This code is supposed to close (i.e., liquidate) a specified percentage of a position. However, it seems that the data sent using the delete method is disregarded; this function always closes my entire position instead of the specified percentage. Here is the documentation for the Alpaca API: https://alpaca.markets/docs/api-references/trading-api/positions/#close-a-position
I have also tried the data and params parameters, as well as json.dumps(data), but to no avail.
Any idea how to send data with the requests.delete method would be appreciated.
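One variant that may be worth trying (this is an assumption on my part, not something the documentation above confirms): some servers discard the body of a DELETE request entirely and only read the query string, which requests sends via the params argument rather than json:
import requests

def close_position(symbol, percentage, api_key, secret_key, base_url):
    headers = {
        "APCA-API-KEY-ID": api_key,
        "APCA-API-SECRET-KEY": secret_key
    }
    url = f"{base_url}/v2/positions/{symbol}"
    # Hypothetical variant: send "percentage" as a query-string parameter
    # instead of a JSON body, in case the server ignores DELETE bodies.
    order = requests.delete(url, params={"percentage": percentage}, headers=headers)
    return order.json()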

How can I get candles for an asset using the Alpaca crypto API?

In the documentation, the Alpaca crypto market API says that the base URL for the crypto trading API is:
https://data.alpaca.markets/v1beta1/crypto
I want to get bars for BTCUSD. My code is:
import requests
headers = {
    'APCA-API-KEY-ID': alpaca_key,
    'APCA-API-SECRET-KEY': alpaca_secret
}
url_crypto = 'https://data.alpaca.markets/v1beta1/crypto'
resp = requests.get(url_crypto + '/BTCUSD/bars', headers=headers)
This gets an "endpoint not found" error message. So I tried to get the latest prices with this:
resp = requests.get(url_crypto + '/BTCUSD/latest&exchange=CBSE', headers=headers)
Again "endpoint not found".
When I request my account info, that works.
How can I get candles and latest prices for a crypto asset? And how can I get a list of all available assets?
Many thanks!
Your first code sample doesn't get an "endpoint not found" error. If you add
print(resp.json())
it will print
{'code': 42210000, 'message': 'timeframe missing'}
which is exactly the problem: you don't say what kind of bar you'd like to get. The fix is this:
resp = requests.get(url_crypto + "/BTCUSD/bars?timeframe=1Min", headers=headers)
assuming you want minute bars.
However, this will return the minute bars for all exchanges. If you only want CBSE, use this instead:
resp = requests.get(url_crypto + "/BTCUSD/bars?timeframe=1Min&exchanges=CBSE", headers=headers)
Your second code sample does return "endpoint not found", because of two issues:
There is no "latest" endpoint
The query parameter should start with a ?, not a &
You need to specify what kind of "latest" you want. You can get the latest trades, quotes, or bars. If you need the latest price, the correct query is:
resp = requests.get(url_crypto + "/BTCUSD/trades/latest?exchange=CBSE", headers=headers)
and the price will be the p field of the result JSON.
However, instead of manually constructing these requests, I strongly recommend using our SDK. With the SDK, it's much easier to get what you want:
import alpaca_trade_api
api = alpaca_trade_api.REST(key_id=alpaca_key, secret_key=alpaca_secret)
# Today's bars
bars = api.get_crypto_bars("BTCUSD", "1Min")
# Latest price
latest_price = api.get_latest_crypto_trade("BTCUSD", "CBSE").price
One huge advantage of the SDK is that it takes care of pagination for you. If you request a lot of data, it won't fit in a single response, and you need to send further requests using the returned page_token. The SDK does this automatically.
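For reference, if you do stick with hand-built requests, the pagination loop looks roughly like the sketch below. It assumes the response carries a next_page_token field and that the endpoint accepts a page_token query parameter; treat it as an illustration rather than a definitive recipe.
import requests

def get_all_crypto_bars(url_crypto, headers, symbol="BTCUSD", timeframe="1Min"):
    # Collect bars across pages by following the (assumed) next_page_token field.
    bars = []
    page_token = None
    while True:
        params = {"timeframe": timeframe}
        if page_token:
            params["page_token"] = page_token
        resp = requests.get(f"{url_crypto}/{symbol}/bars", params=params, headers=headers)
        resp.raise_for_status()
        body = resp.json()
        bars.extend(body.get("bars") or [])
        page_token = body.get("next_page_token")
        if not page_token:
            return bars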

API gives only the headers in Python but not the data

I am trying to access an API from this website (https://www.eia.gov/opendata/qb.php?category=717234).
I am able to call the API, but I am only getting headers. I'm not sure if I am doing this correctly or whether something else is needed.
Code:
import urllib
import urllib.request
import requests
import json

locu_api = 'WebAPI'

def locu_search(query):
    api_key = locu_api
    url = 'https://api.eia.gov/category?api_key=' + api_key
    locality = query.replace(' ', '%20')
    response = urllib.request.urlopen(url).read()
    json_obj = str(response, 'utf-8')
    data = json.loads(json_obj)
When I try to print the results to see what's in data:
data
I am getting only the headers in the JSON output. Can anyone help me figure out how to extract the data instead of the headers?
Avi!
Look, the data you posted seems to be an application/json response. I tried to reorganize your snippet a little bit so you could reuse it for other purposes later.
import requests

API_KEY = "insert_it_here"

def get_categories_data(api_key, category_id):
    """
    Makes a request to the gov API and returns its JSON response
    as a python dict.
    """
    host = "https://api.eia.gov"
    endpoint = "category"
    url = f"{host}/{endpoint}"
    qry_string_params = {"api_key": api_key, "category_id": category_id}
    response = requests.post(url, params=qry_string_params)
    return response.json()

print(get_categories_data(api_key=API_KEY, category_id="717234"))
As far as I can tell, the response contains some categories and their names. If that's not what you were expecting, maybe there's another endpoint that you should look for. I'm sure this snippet can help you if that's the case.
Side note: isn't your API key supposed to be private? Not sure if you should share that.
Update:
Thanks to Brad Solomon, I've changed the snippet to pass query string arguments to the requests.post function by using the params parameter which will take care of the URL encoding, if necessary.
You haven't presented all of the data. But what I see here is first a dict that associates category_id (a number) with a variable name. For example category_id 717252 is associated with variable name 'Import quantity'. Next I see a dict that associates category_id with a description, but you haven't presented the whole of that dict so 717252 does not appear. And after that I would expect to see a third dict, here entirely missing, associating a category_id with a value, something like {'category_id': 717252, 'value': 123.456}.
I think you are just unaccustomed to the way some APIs aggressively decompose their data into key/value pairs. Look more closely at the data. Can't help any further without being able to see the data for myself.
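To make that last point concrete, here is a purely hypothetical sketch of joining those id-keyed pieces back together once all three parts are visible; the dict names and the placeholder contents are made up, since the full response isn't shown.
# Hypothetical shapes, modelled on the description above.
names = {717252: 'Import quantity'}
descriptions = {717252: 'placeholder description'}
values = [{'category_id': 717252, 'value': 123.456}]

# Join the pieces on category_id into flat records.
records = [
    {
        'category_id': row['category_id'],
        'name': names.get(row['category_id']),
        'description': descriptions.get(row['category_id']),
        'value': row['value'],
    }
    for row in values
]
print(records)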

What is the format for adding a compliance standard to an existing policy with the Prisma Cloud API?

I'm having trouble adding a Compliance Standard to an existing Policy via the Palo Alto Prisma Cloud API.
Every time I send the request, I get a 500 Server Error (and, unfortunately, the API documentation is super unhelpful about this). I'm not sure if I'm sending the right information to add a compliance standard, as the API documentation doesn't show what info needs to be sent. If I leave out required fields (name, policyType, and severity), I get a 400 error (bad request, which makes sense), but I can't figure out why I keep getting the 500 Server Error.
In essence, my code looks like:
import requests
url = f'https://api2.redlock.io/policy/{policy_id}'
header = {'Content-Type': 'application/json', 'x-redlock-auth': 'token'}
payload = {
    'name': 'policy_name',
    'policyType': 'policy_type',
    'severity': 'policy_severity',
    'complianceMetadata': [
        {
            'standardName': 'standard_name',
            'requirementId': 'requirement_ID',
            'sectionId': 'section_id'
        }
    ]
}
response = requests.request('PUT', url, json=payload, headers=header)
The response should be a 200 with the policy's metadata returned in JSON format with the new compliance standard.
For those using the RedLock API, I managed to figure it out.
Though the error itself is non-descriptive, a 500 here generally means the JSON being sent to the server is malformed. In this case, the payload was incorrect.
The correct JSON for updating a policy's compliance standard is:
req_header = {'Content-Type': 'application/json', 'x-redlock-auth': jwt_token}

# This is a small function to get a policy by ID
policy = get_redlock_policy_by_ID(req_header, 'policy_ID')

new_standard = {
    "standardName": "std-name",
    "requirementId": "1.1",
    "sectionId": "1.1.1",
    "customAssigned": True,
    "complianceId": "comp-id",
    "requirementName": "req-name"
}

policy['complianceMetadata'].append(new_standard)

requests.put('{}/policy/{}'.format(REDLOCK_API_URL, policy['policyId']), json=policy, headers=req_header)
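For completeness, get_redlock_policy_by_ID above is just a thin wrapper around the GET endpoint. A minimal sketch, assuming the same base URL and header as the PUT call, might look like this:
import requests

def get_redlock_policy_by_ID(req_header, policy_id):
    # Fetch a single policy as a dict so its complianceMetadata can be edited.
    resp = requests.get('{}/policy/{}'.format(REDLOCK_API_URL, policy_id), headers=req_header)
    resp.raise_for_status()
    return resp.json()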

Reading website data automatically with POST requests using Python

I am trying to automatically read data from a website where I first need to fill in some fields, submit the form, and then read the data that appears. I am new to this, but I wrote some code which obviously doesn't work, and the result is an HTTP Error 500. What am I missing here, or how do I fix this?
Also, I am happy to do this using BS4 as well because I will need to build upon this code.
Website: http://www.mlindex.ml.com/GISPublic/bin/SnapShot.asp
Inputs required: Index Ticker = H0A0 , Base Curr = LOC , Date = 09/22/2017
I checked the source code and went through the js form that submits the POST request and created the code and payload accordingly:
import requests
post_data = {'hdnDate':'1/1/2016', 'hdnAction':'SS', 'hdnSelCurr':'0,LOC', 'hdnCurrDesc':'USD', 'hdnSelTitle':'Hedged', 'txtSSCUSIP':'H0A0'}
# POST some form-encoded data:
post_response = requests.post(url='http://www.mlindex.ml.com/GISPublic/bin/Snapshot.asp', data=post_data)
print(post_response)
You are missing 'cboSnapCurr': 0, 'cboSSHedge' : 1 from the payload data, as the server that handles the request is expecting those values.
post_data = {'hdnDate':'1/1/2016', 'hdnAction':'SS', 'hdnSelCurr':'0,LOC', 'hdnCurrDesc':'USD', 'hdnSelTitle':'Hedged', 'txtSSCUSIP':'H0A0', 'cboSnapCurr': 0, 'cboSSHedge' : 1}
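Since the question mentions building on this with BS4, here is a minimal follow-on sketch. The table layout of the returned page is an assumption; dump the cell text first, then narrow down to the rows you actually need.
import requests
from bs4 import BeautifulSoup

post_data = {'hdnDate': '1/1/2016', 'hdnAction': 'SS', 'hdnSelCurr': '0,LOC',
             'hdnCurrDesc': 'USD', 'hdnSelTitle': 'Hedged', 'txtSSCUSIP': 'H0A0',
             'cboSnapCurr': 0, 'cboSSHedge': 1}

resp = requests.post('http://www.mlindex.ml.com/GISPublic/bin/Snapshot.asp', data=post_data)
soup = BeautifulSoup(resp.text, 'html.parser')

# Print every table row's cell text so you can see where the snapshot data lives.
for table in soup.find_all('table'):
    for row in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if cells:
            print(cells)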
