Adding the time-cond flag to the python requests.get call?

I am trying to add the --time-cond flag to a GET call using the Python requests package. For example, I want to check that the file is older than Jan 12 2022 before downloading it.
I want to convert the cURL command:
curl -z "Jan 12, 2022" "https://registry.verra.org/mymodule/ProjectDoc/Project_ViewFile.asp?FileID=11032&IDKEY=dkjalskjf098234kj28098sfkjlf098098kl32lasjdflkj909j15213128"
[Note that in curl, -z is shorthand for --time-cond.]
to something like this:
shapefile = requests.get('https://registry.verra.org/mymodule/ProjectDoc/Project_ViewFile.asp?FileID=11032&IDKEY=dkjalskjf098234kj28098sfkjlf098098kl32lasjdflkj909j15213128',
params = {'time-cond': "Jan 12 2022"})
i.e. only fetching files modified since January 12th.
I am uncertain where in the docs this is mentioned.
https://requests.readthedocs.io/en/latest/api/
What I tried...
shapefile = requests.get('https://registry.verra.org/mymodule/ProjectDoc/Project_ViewFile.asp?FileID=11032&IDKEY=dkjalskjf098234kj28098sfkjlf098098kl32lasjdflkj909j15213128',
params = {'time-cond': "Jan 12 2022"})
There is no error, but changing the cutoff date to values far in the future or far in the past makes no difference. Requests just seems to ignore the unrecognised parameter.
I first tried params = {name: value}, as I interpreted it from the docs: https://requests.readthedocs.io/en/latest/api/

"time-cond" is just the name of a curl flag; it doesn't have any meaning in HTTP or in the requests package.
To understand what it does, you need to look at the request curl generates. If you do, you'll find that it adds an If-Modified-Since or If-Unmodified-Since header. So to generate the equivalent request with the requests package, you need to add the appropriate header.
You can do that with the headers parameter (not params, which controls query parameters). Something like:
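import datetime
from email.utils import format_datetime
# Header values must be strings, so format the date as an HTTP-date in GMT.
cutoff = datetime.datetime(2022, 1, 12, tzinfo=datetime.timezone.utc)
requests.get(url, headers = {
# curl -z "date" sends If-Modified-Since; curl -z "-date" sends If-Unmodified-Since.
"If-Unmodified-Since": format_datetime(cutoff, usegmt=True)
})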
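A slightly fuller sketch of the same idea, assuming the server actually honours conditional requests (not every server does). The helper names conditional_headers and fetch_if_changed are made up for illustration, and treating naive datetimes as UTC is an assumption. With If-Modified-Since, a 304 Not Modified reply means nothing has changed since the cutoff, so there is no body to download.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(cutoff, newer=True):
    """Build the header curl -z would send (hypothetical helper).

    newer=True  -> If-Modified-Since   (curl -z "date")
    newer=False -> If-Unmodified-Since (curl -z "-date")
    """
    if cutoff.tzinfo is None:
        # Assumption: naive datetimes are meant as UTC.
        cutoff = cutoff.replace(tzinfo=timezone.utc)
    name = "If-Modified-Since" if newer else "If-Unmodified-Since"
    value = format_datetime(cutoff.astimezone(timezone.utc), usegmt=True)
    return {name: value}


def fetch_if_changed(url, cutoff):
    """Return the body if the server reports a change since cutoff, else None."""
    import requests  # imported here so the header helper works without it

    response = requests.get(url, headers=conditional_headers(cutoff), timeout=30)
    if response.status_code == 304:  # Not Modified: nothing newer than cutoff
        return None
    response.raise_for_status()
    return response.content
```

If the date in the header makes no difference to the reply, the server is probably ignoring it, and comparing the Last-Modified response header (when present) against the cutoff is the remaining option.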

Related

JSONDecodeError while trying to post csv value via python's requests.put method

I am currently working with Python's requests library to access the Salesforce API. I have successfully:
*accessed the access_token from the Salesforce API
*obtained the Session_ID
Now I need to do an upsert operation in Salesforce using requests.put in text/csv format (as requested by the API developer).
Please see the code snippet below; I have not shown the code for the two steps above.
# Data for upsert operation
data = {
"Name":["ABC"],
"Model_Score__c":['Low'],
"Email__c":['Y'],
"Call__c":['N'],
"Processing_Date__c":['2022-02-24']
}
dfData = pd.DataFrame(data)
dfData_csv = dfData.to_csv(index=False, encoding='utf8')
# Headers to be sent with the put request
header = {}
header['Authorization'] = 'Bearer xxxxxx...xxxx'
header['Content-Type']='text/csv'
header['Connection'] = 'keep-alive'
header['Accept-Encoding'] = 'gzip, deflate, br'
# Points to the URL where we need to perform the operation and test_api_url is a sandbox url
put_url = test_api_url+f'/{session_id}/batches'
# Calling the put request
put_call_response = requests.put(put_url, data=dfData_csv, headers=header)
I get the following error
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Please note that the same request works in Postman; the data sent in the body of the Postman PUT request is shown below. I have also sent this raw text from Python, but I still get the same error.
"Name",Model_Score__c,Email__c,Call__c,Processing_Date__c
"ABC",Low,Y,N,2022-02-24
Any help is much appreciated.

It appears that you are trying to use the Bulk API (the REST API does not accept CSV data), but you're calling the wrong endpoint and haven't performed any of the required setup to do so.
put_url = test_api_url+f'/{session_id}/batches'
Never, ever put your session id in a URL! In any case, it's not the right URL for the Bulk API.
If you're trying to work with the more recent Bulk API 2.0, you want to do an ingest operation.
With Bulk API 1.0, you'd do a more complex sequence of creating a job, adding batches, closing the job and monitoring.
I recommend reading the entire Bulk API Developer Guide. These APIs are complex and tricky.

Actually, my code was correct. I was getting the error because I was also executing the statement below, which I should have included in my question.
print(put_call_response.json())
The print statement above threw the error because I was trying to parse a response body that is not JSON. When I checked the status_code, I got 201 (meaning the record was upserted successfully), and the change was reflected in the backend as well.
print(put_call_response.status_code)
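A minimal sketch of a guard against this kind of error, assuming the server sets a Content-Type header on its replies. parse_body is a hypothetical helper, not part of requests; it only calls .json() when the response looks like JSON and otherwise falls back to the raw text.

```python
def parse_body(response):
    """Return decoded JSON when the response carries JSON, else the raw text."""
    content_type = response.headers.get("Content-Type", "")
    if response.text and "json" in content_type.lower():
        return response.json()
    return response.text
```

With the code from the question, this would be used as print(put_call_response.status_code, parse_body(put_call_response)).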

How to complete URL's with only domain name

I am reading URLs such as "domain.xyz" from a .csv file. The purpose is to use the requests module to get GET/HEAD responses. I am using this code as a workaround to prepend a string:
x = "http://www."+str('domain.com')
response = requests.head(x)
The problem is that not all of the "domain.com" entries in my .csv start with the standard http://www. prefix. What's the best way to complete the URL before using the requests module?
P.S. I am looking for something similar to what Chrome's address bar does to complete a URL. For instance, when we enter 'abc.com', it completes it to "http://www.abc.com".
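One possible approach, sketched under the assumption that each entry is either a bare host or already a full http/https URL. complete_url is a hypothetical helper; it only adds a missing scheme and does not guess a "www." prefix, since not every site uses one.

```python
from urllib.parse import urlsplit


def complete_url(raw, default_scheme="http"):
    """Prefix a scheme when the entry has none; leave full URLs untouched."""
    raw = raw.strip()
    if urlsplit(raw).scheme in ("http", "https"):
        return raw
    return f"{default_scheme}://{raw.lstrip('/')}"
```

The result can then be passed to requests.head(complete_url(entry), allow_redirects=True). requests.head does not follow redirects by default, and following them lets the server itself point to its preferred form (https, www, and so on).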

Fetching Doi metadata using python

I want to fetch a BibTeX citation from a DOI in Python. I achieved it with this function:
import requests
def BibtexFromDoi(doi):
    url = "http://dx.doi.org/" + doi
    headers = {"accept": "application/x-bibtex"}
    r = requests.get(url, headers=headers)
    return r.text
The problem is that this code takes very long to run: it takes 10 to 15 minutes to get a response. What can be done to make it run faster? In addition, I tried curl on the command line and it turns out to be much faster, taking only 1 to 2 seconds. I would like to achieve the same speed in Python.

How to get more insight into why documents fail to be ingested in Watson Discovery Service

I'm using the DiscoveryV1 module of the watson_developer_cloud Python library to ingest 700+ documents into a WDS collection. Each time I attempt a bulk ingestion, many of the documents fail to be ingested. The failures are nondeterministic; usually around 100 documents fail.
Each time I call discovery.add_document(env_id, col_id, file_info=file_info), the response contains a WDS document_id. After I've made this call for all documents in my corpus, I use the corresponding document_ids to call discovery.get_document(env_id, col_id, doc_id) and check each document's status. Around 100 of these calls return the status Document failed to be ingested and indexed. There is no pattern among the files that fail: they range in size and include both msword (doc) and pdf file types.
My code to ingest a document was written based on the WDS documentation; it looks something like this:
with open(f_path) as file_data:
    if f_path.endswith('.doc') or f_path.endswith('.docx'):
        re = discovery.add_document(env_id, col_id, file_info=file_data, mime_type='application/msword')
    else:
        re = discovery.add_document(env_id, col_id, file_info=file_data)
Because my corpus is relatively large (~3 GB), I receive Service is busy processing... responses from discovery.add_document(env_id, col_id, file_info=file_info) calls, in which case I call sleep(5) and try again.
I've exhausted the WDS documentation without any luck. How can I get more insight into the reason that these files are failing to be ingested?
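The sleep-and-retry step described above could be sketched generically as follows. call_with_retry is a hypothetical helper, and how a busy reply is recognised (the is_busy predicate) is an assumption, since the question does not show whether the SDK signals it in the response or by raising an exception.

```python
import time


def call_with_retry(call, is_busy, attempts=5, delay=5.0, sleep=time.sleep):
    """Call call() until is_busy(result) is false or attempts run out."""
    result = call()
    for attempt in range(1, attempts):
        if not is_busy(result):
            break
        sleep(delay * attempt)  # wait a little longer after each busy reply
        result = call()
    return result
```

Capping the number of attempts keeps one stuck document from blocking the rest of the corpus, and the last result is returned either way so it can be logged.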

You should be able to use the https://watson-api-explorer.mybluemix.net/apis/discovery-v1#!/Queries/queryNotices API to see errors/warnings that happen during ingestion along with details that might give more information on why the ingestion failed.
Unfortunately, at the time of this posting it does not look like the Python SDK has a method to wrap this API yet, so you can use the Watson Discovery Tooling or use curl to query the API directly (replacing the values in {} with your collection-specific values):
curl -u "{username}:{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/notices?version=2017-01-01"

The python-sdk now supports querying notices.
from watson_developer_cloud import DiscoveryV1
discovery = DiscoveryV1(
version='2017-10-16',
## url is optional, and defaults to the URL below. Use the correct URL for your region.
url='https://gateway.watsonplatform.net/discovery/api',
iam_api_key='your_api_key')
discovery.federated_query_notices('env_id', ['collection_id'])

How can I pass curl options through Python requests?

I am working with the Desk.com api. One of the limitations is that they only allow 500 pages of records to be called. If you have more than 500 pages that need to be downloaded, it is necessary to sort/filter the data with the curl -d option. Typically, I do this by setting the 'since_id' option to a higher ID and downloading 500 more pages. This is essentially telling the desk database to send me up to 500 pages of data since_id=x
Typically I run this in Python using os.popen(), but I want to switch it over to requests.get(), as that should work better on Windows devices.
os.popen("curl https://www.URL.com -u username:password -H 'Accept:application/json' -d 'since_id=someID&sort_field=id&sort_direction=asc' -G")
With requests, I have tried running it many different ways including trying to pass the -d parameters through like so.
payload = '-d 'since_id=someID&sort_field=id&sort_direction=asc' -G'
payload(alternate) = "{"-d":"'since_id=someID&sort_field=id&sort_direction=asc'","-G":""}"
requests.get('https://www.URL.com',auth=('username','password'),data=payload)
Honestly, I wasn't sure what to do with the -G at the end of my second attempt at the payload variable.
I have tried the following.
*including '-G' in the "-d" value of the json as well as putting it in its own dict
*a few different variations including switching 'data' to 'params' on the requests.get line.
*Adding/removing single quotes on the -d value in the get request

Because of the -G flag, curl appends the -d data to the URL as a query string, so it corresponds to the params argument in requests. Something like this should work:
payload = {'since_id': 'someID', 'sort_field': 'id', 'sort_direction': 'asc'}
requests.get('https://www.example.com', params=payload)

I'm assuming the api uses basic authentication.
import requests
from requests.auth import HTTPBasicAuth
payload = {'something':'value'}
requests.get('http://myurl.com', auth=HTTPBasicAuth('user', 'pass'), params=payload)
Hopefully this will work!
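Putting the pieces together, a rough sketch of how each flag in the original curl command might map onto requests.get, assuming the API uses basic authentication and returns JSON. desk_request_kwargs and fetch_page are hypothetical names used only for illustration.

```python
def desk_request_kwargs(since_id, username, password):
    """Keyword arguments for requests.get mirroring the curl flags."""
    return {
        "auth": (username, password),               # curl -u username:password
        "headers": {"Accept": "application/json"},  # curl -H 'Accept:application/json'
        "params": {                                 # curl -G -d 'since_id=...&...'
            "since_id": since_id,
            "sort_field": "id",
            "sort_direction": "asc",
        },
    }


def fetch_page(url, since_id, username, password):
    """Fetch one page of records sorted by id, starting after since_id."""
    import requests  # imported here so the helper above needs no third-party package

    kwargs = desk_request_kwargs(since_id, username, password)
    response = requests.get(url, timeout=30, **kwargs)
    response.raise_for_status()
    return response.json()
```

Passing a (username, password) tuple as auth is shorthand for HTTPBasicAuth, so both forms send the same Authorization header.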
