I have this small piece of code in Node.js, which makes an API request, and I need to convert it into a Python requests.get() call:
import got from 'got'

got.get(`https://example.com/api`, {
    json: true,
    query: {
        4,
        data: `12345`
    },
})
So my Python code would start like this:
import requests
requests.get('https://example.com/api')
But how can I add the parameters
json: true,
query: {
4,
data: `12345`
},
in the Python request?
I would highly recommend looking at the available docs when trying to solve problems like this; you'll generally get the answer a lot quicker and learn a lot more. I have linked the docs within this answer to make them easier to explore for future use. I have never used the Node.js got library, but I looked at the docs to identify what each of the parameters means; the npm page has good documentation for this:
json - Sets the content-type header to "application/json", sets the accept header to "application/json", and will automatically run JSON.parse(response). I am not aware of your familiarity with HTTP headers, but more information can be looked up on MDN, and a list of headers can be found in the Wikipedia article on header fields.
query - This sets the query string for the request. I assume you are familiar with this, but more information can be found in the Wikipedia article on query strings.
So, from the above it looks like you are trying to send the following request to the server:
URL (with query string): https://example.com/api?4&data=12345
Headers:
Content-Type: application/json
Accept: application/json
I would recommend reading through the python requests library user guide to get a better understanding of how to use the library.
For setting custom headers, the optional "headers" parameter can be used.
For the query string, the optional "params" parameter allows for this. The only problem with params is the lack of support for a valueless key (the 4 in your example); to get around this, encoding the query string into the URL directly may be the best approach until the requests library supports this feature. I am not sure when support will be available, but I did find a closed issue on GitHub mentioning potential support in a later version.
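Putting those pieces together, a minimal sketch of the equivalent requests call (the valueless 4 is encoded straight into the URL as an assumption about what the server expects; adjust if it actually needs a value):

import requests

# "json: true" in got maps to these two headers (and to calling response.json() yourself)
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
}

# The named key/value pair can go in params; requests appends it to the
# existing query string with "&", producing https://example.com/api?4&data=12345
params = {"data": "12345"}

response = requests.get(
    "https://example.com/api?4",  # the valueless key is encoded directly in the URL
    params=params,
    headers=headers,
)
print(response.json())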
Related
I am kind of stuck trying to solve the following issue: I am trying to access a web page in an automated way, using an API, in order to get some data for a supplier (I need to do it for work).
The API is at https://wl-api.mf.gov.pl and provides information stored in JSON for a supplier, which can be looked up via their tax ID.
I use the requests package and manage to get a positive response:
import requests

nip = 7393033097
response = requests.get("https://wl-api.mf.gov.pl")
print(response)  # <Response [200]>
If I click on the link and scroll until I find the specific part about tax information, I find the following line:
GET /api/search/nip/{nip}
So what I did was add this path to the URL in my response variable, since this is how I understood it - and that is the point where I think I am wrong:
response=requests.get("https://wl-api.mf.gov.pl/search/7393033097/{7393033097}")
However, I cannot access it.
Am I doing something wrong - I do believe yes - and can anyone give me a little help? :)
Update: if I check the requirements/documentation, I find the following information, where I need a bit of support to implement it:
GET /api/search/nip/{nip}
(nip?date)
Single entity search by nip
**Path parameters**
nip (required)
*Path Parameter — Nip*
**Query parameters**
date (required)
*Query Parameter — format: date*
**Return type**
EntityResponse
Example data
Content-Type: application/json
I think this line: https://wl-api.mf.gov.pl/search/7393033097/{7393033097} should follow the documented path GET /api/search/nip/{nip} instead, i.e.:
https://wl-api.mf.gov.pl/api/search/nip/7393033097
and, per the documentation you quoted, the required date query parameter still has to be appended.
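A minimal sketch with the requests package, based on the documentation quoted above (the date value here is just an example; the "format: date" note suggests a YYYY-MM-DD string, so adjust it to the day you need):

import requests

nip = "7393033097"
# date is a required query parameter according to the API documentation
params = {"date": "2020-06-01"}  # example value, assumed YYYY-MM-DD format

response = requests.get(
    "https://wl-api.mf.gov.pl/api/search/nip/" + nip,
    params=params,
)
print(response.status_code)
print(response.json())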
There is an HTTP request mentioned in the Microsoft Graph API's documentation:
GET /reports/getMailboxUsageDetail(period='{period_value}')
I cannot understand how to incorporate the data mentioned within the round parentheses
(period='{period_value}')
I tried adding it to the query parameters:
URL = "https://graph.microsoft.com/beta/reports/getMailboxUsageDetail"
queryParams = {"period": "D7"}
requests.get(URL, params=queryParams)
But it didn't work.
It's actually simpler than you would think.
You just put the period parameter shown in the round brackets directly into the URL, as shown in the documentation.
So, if you want to get the same report you're trying as shown in HTTP format:
GET /reports/getMailboxUsageDetail(period='{period_value}')
You will use the URL as:
reportsURI="https://graph.microsoft.com/beta/reports/getMailboxUsageDetail(period='D7')"
requests.get(reportsURI, headers=authHeaders)
This will give you a report in CSV format.
If you want it in JSON format, you can use a query parameter to specify the format:
formatParams = {"format":"application/json"}
requests.get(reportsURI, headers=auth, params=formatParams)
This will give you the report in JSON format.
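For reference, a minimal end-to-end sketch of the CSV case (the access token and auth flow here are placeholders, not part of the original answer; obtain a real token via your own OAuth/MSAL flow):

import requests

access_token = "YOUR_ACCESS_TOKEN"  # placeholder token
auth_headers = {"Authorization": "Bearer " + access_token}

# The period value sits inside the function-style parentheses in the URL itself
reports_uri = "https://graph.microsoft.com/beta/reports/getMailboxUsageDetail(period='D7')"

response = requests.get(reports_uri, headers=auth_headers)
print(response.status_code)
print(response.text[:200])  # beginning of the CSV report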
I would like to know how to get the crawling data (list of URLs manually input through the GUI) from my import.io extractors.
The API documentation is very scarce, and it does not specify whether the GET requests I make actually start a crawler (and consume one of my available crawler runs) or just query the results of manually launched crawls.
Also, I would like to know how to obtain the connector ID; as I understand it, an extractor is nothing more than a specialized connector, but when I use the extractor_id as the connector ID for querying the API, I get an error saying the connector does not exist.
A way I thought I could list the URLs I have in one of my extractors is this:
https://api.import.io/store/connector/_search?_sortDirection=DESC&_default_operator=OR&_mine=true&_apikey=123...
But the only result I get is:
{ "took": 2, "timed_out": false, "hits": {
"total": 0,
"hits": [],
"max_score": 0 } }
Nevertheless, even if I got a more complete response, the example result I see in the documentation does not mention any kind of list or element containing the URLs I'm trying to get from my import.io account.
I am using Python to query this API.
The legacy API will not work for any non-legacy connectors, so you will have to use the new Web Extractor API. Unfortunately, there is no documentation for this.
Luckily, with some snooping you can find the following call to list connectors connected to your apikey:
https://store.import.io/store/extractor/_search?_apikey=YOUR_API_KEY
From here, you check each hit and verify the _type property is set to EXTRACTOR. This will give you access to, among other things, the GUID associated with the extractor and the name you chose for it when you created it.
You can then do the following to download the latest run from the extractor in CSV format:
https://data.import.io/extractor/{{GUID}}/csv/latest?_apikey=YOUR_API_KEY
This was found in the Integrations tab of every Web Extractor. There are other queries there as well.
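A minimal sketch in Python of the two calls above (the exact field names in the search response, such as _id and _source["name"], are assumptions based on an Elasticsearch-style response; verify them against what your account actually returns):

import requests

API_KEY = "YOUR_API_KEY"  # your import.io API key

# List everything in the store and keep only the extractors
search = requests.get(
    "https://store.import.io/store/extractor/_search",
    params={"_apikey": API_KEY},
).json()

extractors = [
    hit for hit in search.get("hits", {}).get("hits", [])
    if hit.get("_type") == "EXTRACTOR"
]

for extractor in extractors:
    guid = extractor["_id"]                          # assumed to hold the extractor GUID
    name = extractor.get("_source", {}).get("name")  # assumed to hold the chosen name
    # Download the latest run of this extractor in CSV format
    csv_resp = requests.get(
        "https://data.import.io/extractor/" + guid + "/csv/latest",
        params={"_apikey": API_KEY},
    )
    print(name, csv_resp.status_code)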
Hope this helps.
I have a library which third-party developers use for obtaining information off a few specific websites. The library is responsible for connecting to the website, grabbing pages, parsing necessary information, and returning it to the developer.
However, I'm having issues coming up with an acceptable way to handle storing potentially malformed HTML. Since I can only account for so many things when testing, parsing may fail in the future and it would be helpful if I could find a way to store the HTML that failed parsing for future bug fixing.
Right now I'm using Python's built-in logging module to handle logging in my library. I'm allowing the third-party developer to supply a configuration dictionary to configure how the logging outputs error data. However, printing HTML to the console, or even to a file, is not ideal to me, as I think it would clutter the terminal or the error log. I considered storing HTML files on the local hard drive, but that seems extremely intrusive.
I've determined how I'm going to pass HTML internally. My plan is to pass it via the parameters of an exception and then catch it with a Filter. However, what to do with it is really troubling me.
Any feedback on a method to accomplish this is appreciated.
Services based on websites that you don't control are likely to be somewhat fragile, so storing the HTML to avoid recrawling in the event of parsing problems makes perfect sense to me. Since uncompressed HTML can consume a lot of space on the disk, you might want to store it in a compressed form in a database.
I've found MongoDB to be convenient for this. The underlying storage format is BSON (i.e. binary JSON). It's also easy to install and use.
Here's a toy example using PyMongo to store this page in MongoDB:
from pymongo import MongoClient
import urllib2
import time
# what will be stored in the document
ts = time.time()
url = 'http://stackoverflow.com/questions/26683772/logging-html-content-in-a-library-environment-with-python'
html = urllib2.urlopen(url).read()
# create a dict and store it in MongoDB
htmlDict = {'url':url, 'ts':ts, 'html':html}
client = MongoClient()
db = client.html_log
collection = db.html
collection.insert(htmlDict)
Check to see that the document is stored in MongoDB:
$ mongo
> use html_log;
> db.html.find()
{ "_id" : ObjectId("54544d96164a1b22d3afd887"), "url" : "http://stackoverflow.com/questions/26683772/logging-html-content-in-a-library-environment-with-python", "html" : "<!DOCTYPE html> [...] </html>", "ts" : 1414810778.001168 }
I am coding a Python2 script to perform some automatic actions in a website. I'm using urllib/urllib2 to accomplish this task. It involves GET and POST requests, custom headers, etc.
I stumbled upon an issue which does not seem to be mentioned in the documentation. Let's pretend we have the following valid URL: https://stackoverflow.com/index.php?abc=def&fgh=jkl and we need to perform a POST request there.
This is how my code looks (please ignore any typos you find):
data = urllib.urlencode({ "data": "somedata", "moredata": "somemoredata" })
urllib2.urlopen(urllib2.Request("https://stackoverflow.com/index.php?abc=def&fgh=jkl", data))
No errors are shown, but according to the web server, the request is being received at "https://stackoverflow.com/index.php" and not at "https://stackoverflow.com/index.php?abc=def&fgh=jkl". What is the problem here?
I know that I could use Requests, but I'd like to use urllib/urllib2 first.
If I'm not wrong, you should pass your request data in the data argument that you pass to urlopen() (or to the Request object):
data = urllib.urlencode({'abc': 'def', 'fgh': 'jkl'})
urllib2.urlopen(urllib2.Request('http://stackoverflow.com/index.php', data))
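Putting that suggestion together with the original form fields, a minimal sketch (this assumes the server accepts everything in the POST body, which is the premise of this answer):

import urllib
import urllib2

# combine the former query-string values and the form fields into one POST body
payload = {
    'abc': 'def',
    'fgh': 'jkl',
    'data': 'somedata',
    'moredata': 'somemoredata',
}
data = urllib.urlencode(payload)

# passing data makes urllib2 send a POST request
response = urllib2.urlopen(urllib2.Request('https://stackoverflow.com/index.php', data))
print response.read()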
Also, just like you said, use Requests unless you absolutely need the low level access urllib provides.
Hope this helps.