Fetching DOI metadata using Python

I want to fetch a BibTeX citation from a DOI in Python. The way I achieved it is with this function:
import requests

def BibtexFromDoi(doi):
    url = "http://dx.doi.org/" + doi
    headers = {"accept": "application/x-bibtex"}
    r = requests.get(url, headers=headers)
    return r.text
The problem is that this code takes very long to run: 10 to 15 minutes to get a response. I was wondering what can be done to make it faster. In addition, I tried using curl on the command line and it turns out to be much faster; it only takes 1 to 2 seconds. I would like to achieve the same speed, but in Python.
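A frequent culprit when requests is dramatically slower than curl is proxy auto-detection from environment variables, which curl handles differently. Below is a minimal sketch assuming that is what is happening here: it reuses a Session, tells it to ignore environment proxy settings, and adds an explicit timeout so a slow response fails fast instead of hanging. It also targets https://doi.org, which dx.doi.org redirects to anyway.

import requests

session = requests.Session()
session.trust_env = False  # skip proxy/.netrc lookups from the environment

def bibtex_from_doi(doi):
    # doi.org does content negotiation; asking for application/x-bibtex
    # returns a BibTeX entry for DOIs registered with Crossref/DataCite.
    url = "https://doi.org/" + doi
    headers = {"Accept": "application/x-bibtex"}
    r = session.get(url, headers=headers, timeout=10)
    r.raise_for_status()
    return r.text

# usage: print(bibtex_from_doi("<your-doi>"))

If environment proxies turn out not to be the issue, comparing curl -v output with the headers requests sends would be the next thing to check.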

Related

Python requests not utilizing network

I'm trying to download a medium-sized APK file (10-300 MB) and save it locally. My connection speed should be about 90 Mbps, yet the process rarely surpasses 1 Mbps, and my network doesn't seem to be anywhere near capacity.
I've verified with cProfile that the part that gets stuck is indeed the SSL download, and I've tried various advice from Stack Overflow, like reducing or increasing the chunk size, to no avail. I'd love a way to either test whether this could be a server-side issue, or advice on what I am doing wrong on the client side.
Relevant code:
session = requests.Session()  # I've heard a Session is better due to the persistent HTTP connection
session.trust_env = False
r = session.get(<url>, headers=REQUEST_HEADERS, stream=True, timeout=TIMEOUT)  # timeout=60
r.raise_for_status()
filename = 'myFileName'
i = 0
with open(filename, 'wb') as result:
    for chunk in r.iter_content(chunk_size=1024*1024):
        if chunk:
            i += 1
            if i % 5 == 0:
                print(f'At chunk {i} with timeout {TIMEOUT}')
            result.write(chunk)
I was trying to download many different files; upon printing the URLs and testing them in Chrome, I saw that some of them were significantly slower to download than others.
It seems like a server issue, which I ended up working around by picking a good timeout for the requests.
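To back the server-side hypothesis with numbers, a small sketch like the one below can time a streamed download of each URL and report its effective throughput, which makes slow hosts obvious. It assumes the same REQUEST_HEADERS and a urls list of candidates as in the question.

import time
import requests

def throughput_mbps(url, headers, timeout=60):
    # Download the URL in streamed chunks and return the effective speed in Mbit/s.
    start = time.monotonic()
    downloaded = 0
    with requests.get(url, headers=headers, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            downloaded += len(chunk)
    elapsed = time.monotonic() - start
    return downloaded * 8 / elapsed / 1_000_000

for url in urls:  # urls and REQUEST_HEADERS as defined in the question
    print(f'{throughput_mbps(url, REQUEST_HEADERS):8.2f} Mbit/s  {url}')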

Using xlwings, how do I print JSON data into multiple columns from multiple URLs?

I am using grequests to pull JSON data from multiple URLs. Now I want to print those results to Excel using xlwings. Here is the code I have now:
import xlwings as xw
import grequests
import json

urls = [
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-1ST&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-AMP&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-ARDR&type=both&depth=50',
]
requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests)

for response in responses:
    BQuantity = [response.json()['result']['buy'][0]['Quantity'],
                 response.json()['result']['buy'][1]['Quantity'],
                 response.json()['result']['buy'][2]['Quantity'],
                 response.json()['result']['buy'][3]['Quantity'],
                 response.json()['result']['buy'][4]['Quantity']]

    wb = xw.Book('Book2')
    sht = wb.sheets['Sheet1']
    sht.range('A1:C5').options(transpose=True).value = BQuantity
This works just fine, but only if I comment out all but one URL; otherwise the results from the first URL are overwritten by the results from the second URL, which is expected but not what I want. In the end I want the results from the first URL to go into column A, the results from the second URL into column B, and so on.
I am able to pull in each individual link with requests (instead of grequests) one by one; however, this operation is going to involve a couple of hundred URLs, so pulling the data in one by one is very time consuming. grequests pulls in 200 URLs and dumps them to a JSON file in about 8 seconds, compared to roughly 2 minutes with plain requests.
Any help would be appreciated.
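One way to get each URL into its own column is to derive the column index from the position of the response in the list. This is only a sketch assuming the same workbook, sheet and JSON layout as the question; xlwings accepts a (row, column) tuple as a cell reference, and writing a list with transpose=True fills cells downwards from that anchor.

import xlwings as xw
import grequests

urls = [
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-1ST&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-AMP&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-ARDR&type=both&depth=50',
]
responses = grequests.map(grequests.get(u) for u in urls)

wb = xw.Book('Book2')
sht = wb.sheets['Sheet1']

for col, response in enumerate(responses, start=1):
    buys = response.json()['result']['buy'][:5]
    quantities = [b['Quantity'] for b in buys]
    # Column 1 is A, column 2 is B, and so on; each list is written top to bottom.
    sht.range((1, col)).options(transpose=True).value = quantities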

How to use Python requests to simultaneously download and upload a file

I'm trying to stream a download from an nginx server and simultaneously upload it. The download uses the requests stream implementation; the upload uses chunking. The intention is to be able to report progress as the download/upload is occurring.
The code I have so far looks like this:
with closing(requests.get(vmdk_url, stream=True, timeout=60 + 1)) as vmdk_request:
    chunk_in_bytes = 50 * 1024 * 1024
    total_length = int(vmdk_request.headers['Content-Length'])

    def vmdk_streamer():
        sent_length = 0
        for data in vmdk_request.iter_content(chunk_in_bytes):
            sent_length += len(data)
            progress_in_percent = (sent_length / (total_length * 1.0)) * 100
            lease.HttpNfcLeaseProgress(int(progress_in_percent))
            yield data

    result = requests.post(
        upload_url, data=vmdk_streamer(), verify=False,
        headers={'Content-Type': 'application/x-vnd.vmware-streamVmdk'})
In certain contexts this works fine. I put it into another one (a Cloudify plugin, if you're interested), and when it reaches around 60 seconds it fails to read data.
So I'm looking for an alternative, or simply better, way of streaming a download/upload, as my 60-second issue might come down to how I'm streaming (I hope). Preferably with requests, but really I'd use anything up to and including raw urllib3.
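If the failure around the 60-second mark is a read timeout on the source connection (a guess), note that requests accepts a (connect, read) tuple for timeout, so the read side can be given far more slack than the connect side. A sketch along those lines, reusing the names from the code above (vmdk_url, upload_url and lease are assumed to exist):

from contextlib import closing
import requests

CHUNK = 50 * 1024 * 1024

# timeout=(connect, read): fail fast when connecting, but allow long pauses
# between chunks while the upload side drains the generator.
with closing(requests.get(vmdk_url, stream=True, timeout=(10, 600))) as dl:
    dl.raise_for_status()
    total_length = int(dl.headers['Content-Length'])

    def vmdk_streamer():
        sent = 0
        for data in dl.iter_content(CHUNK):
            sent += len(data)
            lease.HttpNfcLeaseProgress(int(sent * 100.0 / total_length))
            yield data

    result = requests.post(
        upload_url, data=vmdk_streamer(), verify=False,
        headers={'Content-Type': 'application/x-vnd.vmware-streamVmdk'},
        timeout=(10, 600))
    result.raise_for_status()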

How can I read a value from an XML-formatted web page?

What I am trying to do is the following. There is this web page: http://xml.buienradar.nl .
From that, I want to extract a value every n minutes, preferably with Python. Let's say the wind speed at the Gilze-Rijen station. On that page it is located at:
<buienradarnl>.<weergegevens>.<actueel_weer>.<weerstations>.<weerstation id="6350">.<windsnelheidMS>4.80</windsnelheidMS>
Now, I can find loads of questions with answers that use Python to read a local XML file, but I would rather not have to wget or curl this page every couple of minutes.
Obviously, I'm not very familiar with this.
There must be a very easy way to do this. The answer either escapes me or is drowned in all the answers that solve problems with a local file.
I would use urllib2 and BeautifulSoup.
from urllib2 import Request, urlopen
from bs4 import BeautifulSoup
req = Request("http://xml.buienradar.nl/")
response = urlopen(req)
output = response.read()
soup = BeautifulSoup(output)
print soup.prettify()
Then you can traverse the output like you were suggesting:
soup.buienradarnl.weergegevens (etc)
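For a Python 3 setup, the same thing can be done with requests and the standard library's ElementTree, which understands enough XPath to select the station by its id attribute. This is a sketch: the station id 6350 and the windsnelheidMS element come from the question, and the polling interval is just an example.

import time
import requests
import xml.etree.ElementTree as ET

def gilze_rijen_wind_speed():
    resp = requests.get("http://xml.buienradar.nl/", timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    # Find the weerstation element with id="6350" and read its wind speed in m/s.
    node = root.find(".//weerstation[@id='6350']/windsnelheidMS")
    return float(node.text)

while True:
    print(gilze_rijen_wind_speed())
    time.sleep(5 * 60)  # every n minutes; 5 minutes here is only an example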

How to use Python to pipe a .htm file to a website

I have a file, gather.htm, which is a valid HTML file with a header, body and forms. If I double-click the file on the Desktop, it properly opens in a web browser, auto-submits the form data (via <SCRIPT LANGUAGE="Javascript">document.forms[2].submit();</SCRIPT>), and the page refreshes with the requested data.
I want to be able to have Python make a requests.post(url) call using gather.htm. However, my research and my trial and error have provided no solution.
How is this accomplished?
I've tried things along these lines (based on examples found on the web). I suspect I'm missing something simple here!
myUrl = 'www.somewhere.com'
filename='/Users/John/Desktop/gather.htm'
f = open (filename)
r = requests.post(url=myUrl, data = {'title':'test_file'}, files = {'file':f})
print r.status_code
print r.text
And:
htmfile = 'file:///Users/John/Desktop/gather.htm'
files = {'file':open('gather.htm')}
webbrowser.open(url,new=2)
response = requests.post(url)
print response.text
Note that in the 2nd example above, the webbrowser.open() call works correctly but the requests.post does not.
Everything I tried failed in the same way: the URL is opened and the page returns its default data. It appears the website never receives the gather.htm file.
Since your request is returning 200 OK, there is nothing wrong with getting your POST request to the server. It's hard to give you an exact answer, but the problem lies with how the server is handling the request. Either your POST request is formatted in a way the server doesn't recognise, or the server hasn't been set up to deal with it at all. If you're managing the website yourself, some additional details would help.
Just as a final check, try the following:
r = requests.post(url=myUrl, data={'title':'test_file', 'file':f})
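If the server actually expects the fields of the auto-submitted form rather than the file itself (which is what the browser flow sends), one speculative sketch is to parse gather.htm, pull out the third form's action URL and input values, and post those as form data. The form index and field handling here are assumptions based on the document.forms[2].submit() call in the question.

import requests
from bs4 import BeautifulSoup

with open('/Users/John/Desktop/gather.htm') as fh:
    soup = BeautifulSoup(fh.read(), 'html.parser')

form = soup.find_all('form')[2]          # document.forms[2] in the page's script
action = form.get('action')              # may be relative; prepend the site's base URL if so
payload = {inp.get('name'): inp.get('value', '')
           for inp in form.find_all('input') if inp.get('name')}

r = requests.post(action, data=payload)
print(r.status_code)
print(r.text)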
