I'm trying to convert a curl command that parses a PDF file with a GROBID server into Python using requests.
Basically, if I run the grobid server as follows,
./gradlew run
I can use the following curl command to get the parsed XML for an academic paper, example.pdf:
curl -v --form input=@example.pdf localhost:8070/api/processHeaderDocument
However, I don't know how to convert this command into Python. Here is my attempt using requests:
GROBID_URL = 'http://localhost:8070'
url = '%s/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
xml = requests.post(url, files=[pdf]).text
I found the answer. I had left /api out of the URL, and files should be a dictionary keyed by the form field name (input) instead of a list.
import requests
GROBID_URL = 'http://localhost:8070'
url = '%s/api/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
xml = requests.post(url, files={'input': open(pdf, 'rb')}).text
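As a slightly more defensive variant of the snippet above, here is a minimal sketch that wraps the call in a function. The helper names grobid_endpoint and parse_pdf, the timeout value, and the call to raise_for_status are my own choices, not something GROBID requires; the /api prefix and the input field name come from the curl command.

```python
import requests

GROBID_URL = "http://localhost:8070"  # assumes a local GROBID server, as above


def grobid_endpoint(service, base_url=GROBID_URL):
    """Build the URL for a GROBID service, including the /api prefix."""
    return "%s/api/%s" % (base_url.rstrip("/"), service)


def parse_pdf(pdf_path, service="processHeaderDocument", timeout=60):
    """POST a PDF as the multipart field 'input' and return the response text."""
    # The dictionary key must match the form field name that curl sends (input).
    with open(pdf_path, "rb") as pdf_file:
        response = requests.post(
            grobid_endpoint(service), files={"input": pdf_file}, timeout=timeout
        )
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text
```

With a server running, parse_pdf('example.pdf') should return the same XML as the curl call, and the with block closes the file handle once the upload is done.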
Here is an example bash script from http://ceur-ws.bitplan.com/index.php/Grobid. Please note that there is also a ready-to-use Python client available; see https://github.com/kermitt2/grobid_client_python
#!/bin/bash
# WF 2020-08-04
# call grobid service with paper from ceur-ws
v=2644
p=44
vol=Vol-$v
pdf=paper$p.pdf
if [ ! -f $pdf ]
then
  wget http://ceur-ws.org/$vol/$pdf
else
  echo "paper $p from volume $v already downloaded"
fi
curl -v --form input=@./$pdf http://grobid.bitplan.com/api/processFulltextDocument > $p.tei
I'm trying to upload a secure file to my repository in GitLab.
While I am able to upload a secure file with curl, I encounter an error when using requests in Python.
My Python code:
r = requests.post("https://gitlab.com/api/v4/projects/10186699/secure_files",
                  headers={"PRIVATE-TOKEN": "glpat-TH7FM3nThKmHgOp"},
                  files={"file": open("/Users/me/Desktop/dev/web-server/utils/a.txt", "r"),
                         "name": "a.txt"})
print(r.status_code, r.json())
Response:
400 {'error': 'name is invalid'}
The equivalent curl command I use that actually works:
curl --request POST --header "PRIVATE-TOKEN: glpat-TH7FM3nThKmHgOp" https://gitlab.com/api/v4/projects/10186699/secure_files --form "name=a.txt" --form "file=@/Users/me/Desktop/dev/web-server/utils/a.txt"
The equivalent call will be
import requests
resp = requests.post(
    "https://gitlab.com/api/v4/projects/10186699/secure_files",
    headers={"PRIVATE-TOKEN": "glpat-TH7FM3nThKmHgOp"},
    files={"file": open("/Users/me/Desktop/dev/web-server/utils/a.txt", "rb")},
    data={"name": "a.txt"}
)
print(resp.status_code, resp.json())
This is because the files= parameter is intended only for uploading files. name, on the other hand, is ordinary form data, so you need to pass it in the data= parameter.
It's also recommended to open files in binary mode.
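To see the difference without contacting GitLab, here is a small sketch that only prepares the two requests and inspects the multipart body requests would send. The in-memory file and the fake_file helper are stand-ins I made up for illustration; nothing is sent over the network.

```python
import io
import requests

url = "https://gitlab.com/api/v4/projects/10186699/secure_files"


def fake_file():
    """Return a (filename, file object) pair standing in for the real a.txt."""
    return ("a.txt", io.BytesIO(b"hello"))


# Failing variant: "name" goes through files=, so it is encoded as a file part
# (its Content-Disposition header carries a filename attribute).
wrong = requests.Request(
    "POST", url, files={"file": fake_file(), "name": "a.txt"}
).prepare()

# Working variant: "name" goes through data=, so it is a plain form field.
right = requests.Request(
    "POST", url, files={"file": fake_file()}, data={"name": "a.txt"}
).prepare()

# Print only the Content-Disposition lines of each body for comparison.
for label, prepared in (("files only", wrong), ("files + data", right)):
    print(label)
    for line in prepared.body.split(b"\r\n"):
        if line.startswith(b"Content-Disposition"):
            print("   ", line.decode())
```

In the first variant the name part is sent as if it were an uploaded file, which presumably is why the server rejects it with 'name is invalid'.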
Hi, I am attempting to upload a file to a server by making a POST with requests, but every time I get errors. If I do the same with cURL it goes through, but I'm not familiar with cURL or with uploading files (I mostly make GET requests), so I have no clue what it's doing differently. I am on Windows and am running the code below. This is being uploaded to McAfee ePO, so I'm not sure if their API is just super picky or what the difference is, but every Python example I've tried for uploading a file via the requests module has failed for me.
import requests
url = "https://server.url.com:1234/remote/repository.checkInPackage.do?&allowUnsignedPackages=True&option=Normal&branch=Evaluation"
user = "domain\\user"  # backslash escaped; "\u" would otherwise be read as a unicode escape
password = "mypass"
filepath = "C:\\my\\folder\\with\\afile.zip"
with open(filepath, "rb") as f:
    file_dict = {"file": f}
    response = requests.post(url, auth=(user, password), files=file_dict)
I usually get an error as follows:
'Error 0 :\r\njava.lang.reflect.InvocationTargetException\r\n'
If I use cURL it works, though:
curl.exe -k -s -u "domain\username:mypass" "https://server.url.com:1234/remote/repository.checkInPackage.do?&allowUnsignedPackages=True&option=Normal&branch=Evaluation" -F file=@"C:\my\folder\with\afile.zip"
I can't really see the difference, though, and am wondering what cURL is doing differently on the backend or what I could be doing wrong in Python.
I am writing a script in Python that detects the language of a provided text.
I found the following command line that works in a terminal, but I would like to use it in my script.
Command:
curl -X POST "https://api.cognitive.microsofttranslator.com/detect?api-version=3.0" -H "Ocp-Apim-Subscription-Key: <client-secret>" -H "Content-Type: application/json" -d "[{'Text':'What language is this text written in?'}]"
In the script, elements like the client-secret, the "text", and so on should be in variables. And I would like to capture the result of the whole command in a variable and then print it to the user.
How can I do this?
I found the command line here.
The command line is essentially sending an HTTP request, so you can do the same from Python. The code below is just for reference.
import requests
import json
url = 'https://api.cognitive.microsofttranslator.com/detect?api-version=3.0'
body = [{"text": "你好"}]
headers = {'Content-Type': 'application/json', "Ocp-apim-subscription-key": "b12776c*****14f5", "Ocp-apim-subscription-region": "koreacentral"}
r = requests.post(url, data=json.dumps(body), headers=headers)
result = json.loads(r.text)
a = result[0]["language"]
print(r.text)
print("Language = " + a)
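Since the question asks for the key and the text to live in variables, here is a hedged sketch of the same request wrapped in functions. The names build_detect_request and detect_language, the optional region argument, and the timeout are my own; the endpoint, header names, and the result[0]["language"] lookup are taken from the command and the code above.

```python
import requests

DETECT_URL = "https://api.cognitive.microsofttranslator.com/detect"


def build_detect_request(text, subscription_key, region=None):
    """Prepare (but do not send) the POST that the curl command performs."""
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    if region:
        headers["Ocp-Apim-Subscription-Region"] = region
    request = requests.Request(
        "POST",
        DETECT_URL,
        params={"api-version": "3.0"},  # appended to the URL as a query string
        headers=headers,
        json=[{"Text": text}],  # json= serializes the body and sets Content-Type
    )
    return request.prepare()


def detect_language(text, subscription_key, region=None, timeout=10):
    """Send the request and return the detected language code."""
    prepared = build_detect_request(text, subscription_key, region)
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()
    return response.json()[0]["language"]
```

A call such as language = detect_language(user_text, client_secret) then keeps the result in a variable that can be printed to the user.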
I have a url that I can make curl requests against
curl --insecure --header "Expect:" \
--header "Authorization: Bearer <api key>" \
https://some-url --silent --show-error --fail -o data-package.tar -v
Here I am trying to do it with the requests module
r = requests.get('https://stg-app.conduce.com/conduce/api/v1/admin/export/' + id,
                 headers=headers)
r.content  # binary tar file info
How do I write this to a tarfile-like data package?
The content will be the entire file (as bytes) that you can write out.
import requests
r = requests.get('...YOUR URL...')
# Create a file to write to in binary mode and just write out
# the entire contents at once.
# Also check to see if we get a successful response (add whatever codes
# are necessary if this endpoint will return something other than 200 for success)
if r.status_code in (200,):
    with open('tarfile.tar', 'wb') as tarfile:
        tarfile.write(r.content)
If you are downloading any arbitrary tar file and it could be rather large, you can choose to stream it instead.
import requests
tar_url = 'YOUR TAR URL HERE'
rsp = requests.get(tar_url, stream=True)
if rsp.status_code in (200,):
    with open('tarfile.tar', 'wb') as tarfile:
        # chunk size is how many bytes to read at a time,
        # feel free to adjust up or down as you see fit.
        for file_chunk in rsp.iter_content(chunk_size=512):
            tarfile.write(file_chunk)
Note that this pattern (opening a file in wb mode) should generally work for writing any type of binary file. I would suggest reading the Python 3 documentation on reading and writing files.
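The streaming pattern can also be split into a reusable piece, sketched below. The function names write_chunks and download_tar, the 64 KiB chunk size, and the timeout are assumptions of mine; iter_content and stream=True are the same requests features used above.

```python
import requests


def write_chunks(chunks, destination):
    """Write an iterable of byte chunks to destination; return bytes written."""
    written = 0
    with open(destination, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
                written += len(chunk)
    return written


def download_tar(url, destination, headers=None, chunk_size=64 * 1024, timeout=30):
    """Stream the response body to disk without loading it all into memory."""
    with requests.get(url, headers=headers, stream=True, timeout=timeout) as rsp:
        rsp.raise_for_status()  # stop on 4xx/5xx before writing anything
        return write_chunks(rsp.iter_content(chunk_size=chunk_size), destination)
```

Keeping write_chunks separate means it can be exercised with any iterable of bytes, for example write_chunks([b'ab', b'cde'], 'out.bin'), without a server.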
I would like to translate this curl command (which uploads a local file to Rackspace)
curl -X PUT -T screenies/hello.jpg -D - \
-H "X-Auth-Token: fc81aaa6-98a1-9ab0-94ba-aba9a89aa9ae" \
https://storage101.dfw1.clouddrive.com/v1/CF_xer7_343/images/hello.jpg
to python requests. So far I have:
url = 'http://storage.clouddrive.com/v1/CF_xer7_343/images/hello.jpg'
headers = {'X-Auth-Token': 'fc81aaa6-98a1-9ab0-94ba-aba9a89aa9ae'}
request = requests.put(url, headers=headers, data={})
Where do I specify that I want to upload screenies/hello.jpg?
I understand -T in curl represents 'to FTP server', but I have searched the requests GitHub repository and cannot find any mention of FTP.
No, -T just means 'upload this file', which can be used with FTP but is not limited to that.
You can just upload the file data as the data parameter:
with open('screenies/hello.jpg', 'rb') as image:
    request = requests.put(url, headers=headers, data=image)
requests will then read and upload the image data for you.
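To illustrate why data= is the counterpart of curl -T here, the sketch below prepares two PUT requests without sending them: one with data= (raw body) and one with files= (multipart envelope). The in-memory BytesIO objects are stand-ins for the opened image and are my own assumption.

```python
import io
import requests

url = "https://storage101.dfw1.clouddrive.com/v1/CF_xer7_343/images/hello.jpg"
headers = {"X-Auth-Token": "fc81aaa6-98a1-9ab0-94ba-aba9a89aa9ae"}
image = io.BytesIO(b"hello")  # stand-in for open('screenies/hello.jpg', 'rb')

# data=<file object>: the bytes become the request body as-is, like curl -T.
raw = requests.Request("PUT", url, headers=headers, data=image).prepare()
print("data=  Content-Length:", raw.headers.get("Content-Length"))
print("data=  Content-Type:  ", raw.headers.get("Content-Type"))

# files=...: the same bytes are wrapped in a multipart/form-data envelope,
# which is what curl --form sends, not what curl -T sends.
multipart = requests.Request(
    "PUT", url, headers=headers, files={"file": io.BytesIO(b"hello")}
).prepare()
print("files= Content-Type:  ", multipart.headers.get("Content-Type"))
```

With data=, the stored object would be exactly the file's bytes; with files=, it would also contain the multipart boundary and part headers.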