I want to use the forex-python module to convert amounts in various currencies to a specific currency ("DKK") as of a specific date (the last day of the previous month relative to a date in the dataframe).
This is the structure of my code:
df = pd.DataFrame(data={'Date': ['2017-4-15', '2017-6-12', '2017-2-25'], 'Amount': [5, 10, 15], 'Currency': ['USD', 'SEK', 'EUR']})
from datetime import datetime, timedelta
from forex_python.converter import CurrencyRates

c = CurrencyRates()

def convert_rates(amount, currency, PstngDate):
    PstngDate = datetime.strptime(PstngDate, '%Y-%m-%d')
    if currency != 'DKK':
        return c.convert(base_cur=currency, dest_cur='DKK', amount=amount,
                         date_obj=PstngDate - timedelta(PstngDate.day))
    else:
        return amount
and then the new column with the converted amount:
df['Amount, DKK'] = np.vectorize(convert_rates)(
    amount=df['Amount'],
    currency=df['Currency'],
    PstngDate=df['Date']
)
I get the RatesNotAvailableError "Currency Rates Source Not Ready"
Any idea what can cause this? It has previously worked with small amounts of data, but I have many rows in my real df...
I inserted a small print statement into convert.py (part of forex-python) to debug this.
print(response.status_code)
Currently I receive:
502
Read these threads about the HTTP 502 error:
In HTTP 502, what is meant by an invalid response?
https://www.lifewire.com/502-bad-gateway-error-explained-2622939
These errors are completely independent of your particular setup, meaning that you could see one in any browser, on any operating system, and on any device.
502 indicates that currently there is a problem with the infrastructure this API uses to provide us with the required data. As I am in need of the data myself I will continue to monitor this issue and keep my post on this site updated.
There is already an open issue on GitHub about this:
https://github.com/MicroPyramid/forex-python/issues/100
From the source: https://github.com/MicroPyramid/forex-python/blob/80290a2b9150515e15139e1a069f74d220c6b67e/forex_python/converter.py#L73
Your error means the library received a non-200 response code to your request. This could mean the site is down, or that it has blocked you momentarily because you're hammering it with requests.
Try replacing the call to c.convert with something like:
from time import sleep

def try_convert(amount, currency, PstngDate):
    success = False
    while not success:
        try:
            res = c.convert(base_cur=currency, dest_cur='DKK', amount=amount,
                            date_obj=PstngDate - timedelta(PstngDate.day))
            success = True
        except Exception:
            # wait a while before retrying
            sleep(10)
    return res
Or, even better, use a library like backoff to do the retrying for you:
https://pypi.python.org/pypi/backoff/1.3.1
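For example, a minimal sketch using backoff's exponential retry decorator (the exception class name is taken from forex-python's converter module; adjust max_tries to taste):

import backoff
from forex_python.converter import RatesNotAvailableError

# Retry with exponential backoff whenever the rates source is unavailable.
@backoff.on_exception(backoff.expo, RatesNotAvailableError, max_tries=8)
def convert_with_retry(amount, currency, date_obj):
    return c.convert(base_cur=currency, dest_cur='DKK',
                     amount=amount, date_obj=date_obj)

This replaces the manual while/sleep loop above with declarative retry logic.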
Related
I am scraping some data and making a lot of requests to Reddit's Pushshift API. Along the way I keep encountering HTTP errors, which halt all progress. Is there any way I can continue where I left off if an error occurs?
import json
import time
from urllib.request import urlopen

X = []
for i in ticklist:
    f = urlopen("https://api.pushshift.io/reddit/search/submission/?q={tick}&subreddit=wallstreetbets&metadata=true&size=0&after=1610928000&before=1613088000".format(tick=i))
    j = json.load(f)
    subs = j['metadata']['total_results']
    X.append(subs)
    print('{tick} has been scraped!'.format(tick=i))
    time.sleep(1)
I've so far mitigated the 429 error by waiting a second between requests, but I am still experiencing connection timeouts. I'm not sure how to proceed efficiently without wasting a lot of time rerunning the code and hoping for the best.
Python SQLite approach. Reference: https://www.tutorialspoint.com/sqlite/sqlite_python.htm
Create an SQLite database.
Create a table with the URLs to be scraped, with a schema like CREATE TABLE COMPANY (url NOT NULL UNIQUE, Status NOT NULL DEFAULT "Not started").
Now read only the rows whose Status is "Not started".
You can change the Status column of a URL to "Success" once scraping is done.
So whenever the script restarts, it will only run for the "Not started" ones; a minimal sketch of this approach follows.
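A rough sketch of that approach, assuming a caller-supplied scrape_one(url) function that does the actual request (table and column names follow the schema suggested above):

import sqlite3
import time

conn = sqlite3.connect('progress.db')
conn.execute("CREATE TABLE IF NOT EXISTS COMPANY "
             "(url TEXT NOT NULL UNIQUE, Status TEXT NOT NULL DEFAULT 'Not started')")

def scrape_pending(scrape_one):
    """Run scrape_one(url) for every URL still marked 'Not started'."""
    pending = conn.execute(
        "SELECT url FROM COMPANY WHERE Status = ?", ("Not started",)).fetchall()
    for (url,) in pending:
        try:
            scrape_one(url)
            conn.execute("UPDATE COMPANY SET Status = 'Success' WHERE url = ?", (url,))
            conn.commit()
        except Exception:
            # Leave Status untouched so this URL is retried on the next run.
            time.sleep(10)

Because progress is persisted after every successful URL, a crash or timeout only costs you the URLs that were never marked "Success".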
My task is to get the number of open issues using the GitHub API. Unfortunately, whatever repository I parse, I get the same number: 30.
import requests

r = requests.get('https://api.github.com/repos/grpc/grpc/issues')
count = 0
for item in r.json():
    if item['state'] == 'open':
        count += 1
print(count)
Is there any way to get a real quantity of issues?
See the documentation about the Link response header; you can also pass the state or other filters.
https://developer.github.com/v3/guides/traversing-with-pagination/
https://developer.github.com/v3/issues/
You'll have to page through.
http://.../issues?page=1&state=open
http://.../issues?page=2&state=open
The /issues endpoint is paginated, which means you will have to iterate through several pages to get all the issues.
https://api.github.com/repos/grpc/grpc/issues?page=1
https://api.github.com/repos/grpc/grpc/issues?page=2
...
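A rough sketch of that loop (per_page is a standard GitHub pagination parameter; stop when a page comes back empty):

import requests

count = 0
page = 1
while True:
    r = requests.get('https://api.github.com/repos/grpc/grpc/issues',
                     params={'state': 'open', 'per_page': 100, 'page': page})
    items = r.json()
    if not items:
        break
    count += len(items)  # state=open already filters, so every item counts
    page += 1
print(count)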
But there is a better way to get what you want. The GET /repos/:owner/:repo endpoint directly gives the number of open issues on a repository.
For instance, on https://api.github.com/repos/grpc/grpc, you can see:
"open_issues_count": 1052,
Have a look at the documentation for this endpoint.
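A minimal sketch of that approach:

import requests

# One request, no pagination: the repository object reports its open issue count.
repo = requests.get('https://api.github.com/repos/grpc/grpc').json()
print(repo['open_issues_count'])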
I am currently trying to make a call using mailchimp3:
client.reports.email_activity.all(campaign_id = '#######', getall=True, fields = '######')
When I call large amounts of email data I get the 401 status error. I am able to call smaller amounts with no error.
I tried increasing the timeout:
request.get(timeout=10000)
Pulling large amounts of data in one call can cause this timeout error. Instead it is best to use the offset method.
client.reports.email_activity.all(campaign_id = '#####', offset = #, count = #, fields = '######')
With this you can loop through and make a bunch of smaller calls instead of one large call that causes the timeout issue.
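A rough sketch of that loop, reusing the client object from the question (the page size and the 'emails' key in the response are assumptions; the '#' placeholders are kept from the question):

# Page through the email activity report in smaller chunks.
page_size = 500
offset = 0
all_activity = []
while True:
    page = client.reports.email_activity.all(
        campaign_id='#######',   # placeholder, as in the question
        offset=offset,
        count=page_size,
        fields='######')         # placeholder, as in the question
    emails = page.get('emails', [])
    if not emails:
        break
    all_activity.extend(emails)
    offset += page_size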
Thanks to the support staff at MailChimp who helped me troubleshoot this issue.
UPDATE: I've put together the following script to use the URL for the XML without the time-code-like suffix, as recommended in the answer below, and to report the downlink powers, which clearly fluctuate on the website. I'm getting three-hour-old, unvarying data.
So it looks like I need to properly construct that (time code? authorization? secret password?) in order to do this successfully. Like I say in the comment below, "I don't want to do anything that's not allowed and welcome - NASA has enough challenges already trying to talk to a forty year old spacecraft 20 billion kilometers away!"
def dictify(r, root=True):
    """from: https://stackoverflow.com/a/30923963/3904031"""
    if root:
        return {r.tag: dictify(r, False)}
    d = copy(r.attrib)
    if r.text:
        d["_text"] = r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag] = []
        d[x.tag].append(dictify(x, False))
    return d
import xml.etree.ElementTree as ET
from copy import copy
import urllib2
url = 'https://eyes.nasa.gov/dsn/data/dsn.xml'
contents = urllib2.urlopen(url).read()
root = ET.fromstring(contents)
DSNdict = dictify(root)
dishes = DSNdict['dsn']['dish']
dp_dict = dict()
for dish in dishes:
    powers = [float(sig['power']) for sig in dish['downSignal'] if sig['power']]
    dp_dict[dish['name']] = powers

print dp_dict['DSS26']
I'd like to keep track of which spacecraft the NASA Deep Space Network (DSN) is communicating with, say once per minute.
I learned how to do something similar with Flight Radar 24 from the answer to my previous question, which also still represents my current skill level in getting data from websites.
With FR24 I had the explanations in this blog as a great place to start. I opened the page with the Developer Tools in the Chrome browser, and I can see that data for items such as dishes and spacecraft, with their associated numerical data, are requested as XML from URLs such as
https://eyes.nasa.gov/dsn/data/dsn.xml?r=293849023
so it looks like I need to construct the integer (time code? authorization? secret password?) after the r= once a minute.
My Question: Using python, how could I best find out what that integer represents, and how to generate it in order to correctly request data once per minute?
(Above: screenshot montage from NASA's DSN Now page, https://eyes.nasa.gov/dsn/dsn.html; see also this question.)
Using a random number (or a timestamp...) in a GET parameter tricks the browser into really making the request (instead of serving it from the browser cache).
This is a kind of "hack" web developers use to make sure the request actually happens.
Since you aren't using a web browser, I'm pretty sure you could ignore this parameter entirely and still get the refreshed data.
--- Edit ---
Actually r seems to be required, and has to be updated.
#!/bin/bash
wget "https://eyes.nasa.gov/dsn/data/dsn.xml?r=$(date +%s)" -O a.xml -nv
while true; do
    sleep 1
    wget "https://eyes.nasa.gov/dsn/data/dsn.xml?r=$(date +%s)" -O b.xml -nv
    diff a.xml b.xml
    cp b.xml a.xml -f
done
You don't need to emulate a browser. Simply set r to anything and increment it. (Or use a timestamp)
Regarding your updated question, why avoid sending the r query string parameter when it is very easy to generate it? Also, with the requests module, it's easy to send the parameter with the request too:
import time
import requests
import xml.etree.ElementTree as ET
url = 'https://eyes.nasa.gov/dsn/data/dsn.xml'
r = int(time.time() / 5)
response = requests.get(url, params={'r': r})
root = ET.fromstring(response.content)
# etc....
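To tie this together with the once-per-minute requirement from the question, a rough sketch of a polling loop (the dish/downSignal/power names are the ones the question's own script extracts; the live feed may expose more attributes):

import time
import requests
import xml.etree.ElementTree as ET

url = 'https://eyes.nasa.gov/dsn/data/dsn.xml'
while True:
    # Use the current timestamp as the cache-busting r parameter.
    root = ET.fromstring(requests.get(url, params={'r': int(time.time())}).content)
    for dish in root.iter('dish'):
        powers = [sig.get('power') for sig in dish.iter('downSignal') if sig.get('power')]
        print(dish.get('name'), powers)
    time.sleep(60)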
I've got a list of ~100,000 links that I'd like to check the HTTP Response Code for. What might be the best method to use for doing this check programmatically?
I'm considering using the below Python code:
import requests

try:
    for x in range(0, 100000):
        r = requests.head(''.join(["http://stackoverflow.com/", str(x)]))
        # They'll actually be read from a file, and aren't sequential
        print r.status_code
except requests.ConnectionError:
    print "failed to connect"
... but I am not aware of the potential side effects of checking such a large number of URLs in one go. Thoughts?
The only side effect I can think of is time, which you can mitigate by making the requests in parallel (use http://gevent.org/ or https://docs.python.org/2/library/thread.html).
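For example, a minimal sketch with a thread pool, swapping in concurrent.futures for the libraries linked above (the urls list stands in for the URLs read from your file):

import requests
from concurrent.futures import ThreadPoolExecutor

def check(url):
    """Return the status code for a HEAD request, or None if the connection fails."""
    try:
        return url, requests.head(url, timeout=10).status_code
    except requests.ConnectionError:
        return url, None

urls = ["http://stackoverflow.com/%d" % x for x in range(100)]  # stand-in list
with ThreadPoolExecutor(max_workers=20) as pool:
    for url, status in pool.map(check, urls):
        print(url, status)

Keeping max_workers modest also limits how hard you hit the target servers, which reduces the chance of being rate-limited.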