I have a Tika server running on my machine, and calling its API from the terminal works fine: I can extract text from images and PDFs. Now I want to make the same API call from my Python application.
curl -T price.xls http://localhost:9998/tika --header "Accept: text/plain"
The command above is the API call I need to make. It works fine from the terminal, but how do I implement it in a Python application? I have installed and tried requests:
API_URL = 'http://localhost:9998/tika'
APP_ROOT = os.path.dirname(os.path.abspath(__file__))
tika_client = TikaApp(file_jar=join(APP_ROOT,'../tika-app-1.19.jar'))
data = {
"url": join(APP_ROOT,'../static/image/a.pdf')
}
response = requests.put(API_URL, data)
print(response.content)
Any help will be appreciated. Thank you :)
Error output:
INFO tika (application/x-www-form-urlencoded)
WARN tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.server.resource.TikaResource$1#475b0e2
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:402)
at org.apache.tika.server.resource.TikaResource$5.write(TikaResource.java:513)
at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:177)
at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1391)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:246)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.ws.rs.WebApplicationException: HTTP 415 Unsupported Media Type
at org.apache.tika.server.resource.TikaResource$1.parse(TikaResource.java:128)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 37 more
ERROR Problem with writing the data, class org.apache.tika.server.resource.TikaResource$5, ContentType: text/plain
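You need to define the data (payload) and the headers. The headers must be a dict, and the data should be the raw bytes of the file rather than a form field containing its path; a form-encoded body is what triggers the HTTP 415 in the log.
url = 'http://localhost:9998/tika/......'
headers = {"Accept": "text/plain"}
response = requests.put(url, data=data, headers=headers)
Have a glance at this: Making a request to a RESTful API using python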
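As a rough end-to-end sketch of the curl call in Python, assuming the requests library and the server address from the question. The function names build_tika_request and extract_text are hypothetical, not part of Tika or requests. The file is sent as the raw request body, which is what curl -T does.

```python
import requests

# Server address taken from the question; adjust to your setup.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(file_bytes, url=TIKA_URL):
    """Prepare a PUT whose body is the raw file content (hypothetical helper)."""
    request = requests.Request(
        "PUT",
        url,
        data=file_bytes,                  # raw bytes, not a form dict
        headers={"Accept": "text/plain"}, # ask Tika for plain text back
    )
    return request.prepare()


def extract_text(path, url=TIKA_URL):
    """Send the file at `path` to the Tika server and return the extracted text."""
    with open(path, "rb") as handle:
        prepared = build_tika_request(handle.read(), url)
    with requests.Session() as session:
        response = session.send(prepared, timeout=60)
    response.raise_for_status()
    return response.text
```

Calling extract_text('price.xls') would then correspond to the curl command in the question, provided the server is running.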
Related
I'm having trouble adding a Compliance Standard to an existing Policy via the Palo Alto Prisma Cloud API.
Every time I send the request, I get back a 500 Server Error (and, unfortunately, the API documentation is very unhelpful here). I'm not sure I'm sending the right information to add a compliance standard, as the API documentation doesn't show what needs to be sent. If I leave out the required fields (name, policyType, and severity), I get a 400 error (bad request, which makes sense). But I can't figure out why I keep getting the 500 Server Error.
In essence, my code looks like:
import requests
url = 'https://api2.redlock.io/policy/{policy_id}'
header = {'Content-Type': 'application/json', 'x-redlock-auth': 'token'}
payload = {
'name': 'policy_name',
'policyType': 'policy_type',
'severity': 'policy_severity',
'complianceMetadata': [
{
'standardName': 'standard_name',
'requirementId': 'requirement_ID',
'sectionId': 'section_id'
}
]
}
response = requests.request('PUT', url, json=payload, headers=header)
The response should be a 200 with the policy's metadata returned in JSON format with the new compliance standard.
For those using the RedLock API, I managed to figure it out.
Although the 500 response is non-descriptive, here it meant that the JSON sent to the server was not what the API expected. In this case, the payload was incorrect.
The correct JSON for updating a policy's compliance standard is:
req_header = {'Content-Type':'application/json','x-redlock-auth':jwt_token}
# This is a small function to get a policy by ID
policy = get_redlock_policy_by_ID(req_header, 'policy_ID')
new_standard = {
    "standardName":"std-name",
    "requirementId":"1.1",
    "sectionId":"1.1.1",
    "customAssigned":True,
    "complianceId":"comp-id",
    "requirementName":"req-name"
}
policy['complianceMetadata'].append(new_standard)
requests.put('{}/policy/{}'.format(REDLOCK_API_URL, policy['policyId']), json=policy, headers=req_header)
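The fetch, modify, and send-back pattern in this answer can be sketched without touching the network. This is only an illustration: with_added_standard is a hypothetical helper, and the policy dict below is a stand-in for whatever the API actually returns.

```python
import copy


def with_added_standard(policy, standard):
    """Return a copy of `policy` with `standard` appended to complianceMetadata.

    Hypothetical helper: the original policy dict is left untouched so it can
    still be inspected or retried if the PUT fails.
    """
    updated = copy.deepcopy(policy)
    existing = updated.get("complianceMetadata") or []
    updated["complianceMetadata"] = existing + [dict(standard)]
    return updated


new_standard = {
    "standardName": "std-name",
    "requirementId": "1.1",
    "sectionId": "1.1.1",
    "customAssigned": True,  # Python's True, serialized as JSON true
    "complianceId": "comp-id",
    "requirementName": "req-name",
}

# Stand-in for a policy fetched from the API (field values are placeholders).
policy = {
    "policyId": "policy_ID",
    "name": "policy_name",
    "policyType": "policy_type",
    "severity": "policy_severity",
    "complianceMetadata": [],
}

updated = with_added_standard(policy, new_standard)
```

The updated dict is what would be passed as json= in the PUT shown in the answer, so the server receives the full policy rather than a partial one.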
I am trying to follow the official MS docs to get the logs from my resource in Azure Log Monitor, but it never succeeds.
My code is below.
from azure.loganalytics import LogAnalyticsDataClient
from azure.common.client_factory import get_client_from_cli_profile
from azure.loganalytics.models import QueryBody
log_client = get_client_from_cli_profile(LogAnalyticsDataClient)
myWorkSpaceId = '1234567890...'
result = log_client.query(myWorkSpaceId, QueryBody(**{'query': 'Heartbeat| limit 50'}))
And I always get an exception like the one below:
result = log_client.query(myWorkSpaceId, QueryBody(**{'query': 'Heartbeat| limit 50'}))
File ".../lib/python2.7/site-packages/azure/loganalytics/log_analytics_data_client.py", line 121, in query
raise models.ErrorResponseException(self._deserialize, response)
azure.loganalytics.models.error_response.ErrorResponseException: (MissingApiVersionParameter) The api-version query parameter (?api-version=) is required for all requests
I traced the code into the library at /azure/loganalytics/log_analytics_data_client.py and dumped the URL string used for the query, as below.
print(url, query_parameters, header_parameters, body_content)
request = self._client.post(url, query_parameters)
response = self._client.send(request, header_parameters, body_content, stream=False, **operation_config)
The URL and query information are printed below. There appears to be no version information in them, and I suspect this is why I get the exception:
('https://management.azure.com/workspaces/1234567890.../query', {}, {'Content-Type': 'application/json; charset=utf-8'}, {'query': 'Heartbeat| limit 50'})
My azure SDK version is 4.0.0, and my azure-loganalytics library version is v0.1.0, running on Ubuntu.
Has anyone run into the same issue, or does anyone know how to fix it?
Thanks.
I am trying to publish a machine learning model as an Azure web service using Python. I can deploy the code successfully, but when I call it through the URL, it reports that the 'azure' module doesn't exist. The code retrieves a TF-IDF model from a blob container and uses it to predict a new value. The error clearly says the azure package is missing when running on the web service, and I am not sure how to fix it. Here is the code:
For deployment:
from azureml import services
from azure.storage.blob import BlobService
@services.publish('7c94eb2d9e4c01cbe7ce1063','f78QWNcOXHt9J+Qt1GMzgdEt+m3NXby9JL`npT7XX8ZAGdRZIX/NZ4lL2CkRkGQ==')
@services.types(res=unicode)
@services.returns(str)
def TechBot(res):
    from azure.storage.blob import BlobService
    from gensim.similarities import SparseMatrixSimilarity, MatrixSimilarity, Similarity
    blob_service = BlobService(account_name='tfidf', account_key='RU4R/NIVPsPOoR0bgiJMtosHJMbK1+AVHG0sJCHT6jIdKPRz3cIMYTsrQ5BBD5SELKHUXgBHNmvsIlhEdqUCzw==')
    blob_service.get_blob_to_path('techbot',"2014.csv","df")
    df=pd.read_csv("df")
    doct = res
To access the URL, I used the Python code from service.azureml.net:
import urllib2
import json
import requests
data = {
    "Inputs": {
        "input1":
        [
            {
                'res': "wifi wnable",
            }
        ],
    },
    "GlobalParameters": {
    }
}
body = str.encode(json.dumps(data))
#proxies = {"http":"http://%s" % proxy}
url = 'http://ussouthcentral.services.azureml.net/workspaces/7c94eb2de26a45399e4c01cbe7ce1063/services/11943e537e0741beb466cd91f738d073/execute?api-version=2.0&format=swagger'
api_key = '8fH9kp67pEt3C6XK9sXDLbyYl5cBNEwYg9VY92xvkxNd+cd2w46sF1ckC3jqrL/m8joV7o3rsTRUydkzRGDYig==' # Replace this with the API key for the web service
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}
#proxy_support = urllib2.ProxyHandler(proxies)
#opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
#urllib2.install_opener(opener)
req = urllib2.Request(url, body, headers)
try:
    response = urllib2.urlopen(req, timeout=60)
    result = response.read()
    print(result)
except urllib2.HTTPError, error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read()))
The string 'res' is what gets predicted at the end. As I said, the code runs perfectly fine when I run it as-is in Python with the azure module available; the problem only happens when I access the URL.
Any help is appreciated. Please let me know if you need more information (I have only showcased half of my code).
I tried to reproduce the issue via Postman and got the error information below, as you described.
{
"error": {
"code": "ModuleExecutionError",
"message": "Module execution encountered an error.",
"details": [
{
"code": "85",
"target": "Execute Python Script RRS",
"message": "Error 0085: The following error occurred during script evaluation, please view the output log for more information:\r\n---------- Start of error message from Python interpreter ----------\r\nCaught exception while executing function: Traceback (most recent call last):\n File \"\\server\\InvokePy.py\", line 120, in executeScript\n outframe = mod.azureml_main(*inframes)\n File \"\\temp\\1280677032.py\", line 1094, in azureml_main\n File \"<ipython-input-15-bd03d199b8d9>\", line 6, in TechBot_2\nImportError: No module named azure\n\r\n\r\n---------- End of error message from Python interpreter ----------"
}
]
}
}
According to error code 0085 and the message ImportError: No module named azure, I think the issue is caused by importing the Python module azure-storage. A similar SO thread, Access Azure blog storage from within an Azure ML experiment, hit the same issue. You can refer to its answer and try using the HTTP protocol instead of HTTPS in your code, as in client = BlobService(STORAGE_ACCOUNT, STORAGE_KEY, protocol="http").
Hope it helps. Any concern & update, please feel free to let me know.
Update: Using HTTP protocol for BlobService
from azureml import services
from azure.storage.blob import BlobService
@services.publish('7c94eb2d9e4c01cbe7ce1063','f78QWNcOXHt9J+Qt1GMzgdEt+m3NXby9JL`npT7XX8ZAGdRZIX/NZ4lL2CkRkGQ==')
@services.types(res=unicode)
@services.returns(str)
def TechBot(res):
    from azure.storage.blob import BlobService
    from gensim.similarities import SparseMatrixSimilarity, MatrixSimilarity, Similarity
    # Begin: Update code
    # Using `HTTP` protocol for BlobService
    blob_service = BlobService(account_name='tfidf',
                               account_key='RU4R/NIVPsPOoR0bgiJMtosHJMbK1+AVHG0sJCHT6jIdKPRz3cIMYTsrQ5BBD5SELKHUXgBHNmvsIlhEdqUCzw==',
                               protocol='http')
    # End
    blob_service.get_blob_to_path('techbot',"2014.csv","df")
    df=pd.read_csv("df")
    doct = res
I am trying to launch a parameterized Jenkins job from a Python script. Due to environment requirements, I can't install python-jenkins, so I am using the raw requests module.
The job I am trying to launch has three parameters:
string (let's call it payload)
string (let's call it target)
file (a file, optional)
I've searched and searched without success.
I managed to launch the job with the two string parameters using:
import requests
url = "http://myjenkins/job/MyJobName/buildWithParameters"
target = "http://10.44.542.62:20000"
payload = "{payload: content}"
headers = {"Content-Type": "application/x-www-form-urlencoded"}
msg = {
'token': 'token',
'payload': [ payload ],
'target': [ target ],
}
r = requests.post(url, headers=headers, data=msg)
However, I am unable to send a file and those arguments in a single request.
I tried the files argument of requests.post and failed.
It turns out I could not send both the data and the file in a single request with this approach.
In your Python script, add:
import jenkinsapi
from jenkinsHandler import JenkinsHandler
Then pass the parameters to buildJob(), called on your JenkinsHandler object (<your JenkinsHandler object name>.buildJob()).
The JenkinsHandler module has functions such as init(), buildJob(), and isRunning() that help in triggering the build.
Here is an example:
curl -vvv -X POST http://127.0.0.1:8080/jenkins/job/jobname/build \
--form file0='@/tmp/yourfile' \
--form json='{"parameter": [{"name":"file", "file":"file0"}]}' \
--form json='{"parameter": [{"name":"payload", "value":"123"}]}' \
--form json='{"parameter": [{"name":"target", "value":"456"}]}'
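A multipart form request like the curl example can also be built with requests by passing files= and data= together. The sketch below rests on assumptions: it puts all three parameters into a single json form field (rather than three separate ones), build_jenkins_build_request is a hypothetical name, and the file part is called file0 to match the curl example. No Content-Type header is set by hand, because requests has to generate the multipart boundary itself.

```python
import json

import requests


def build_jenkins_build_request(url, file_name, file_bytes, payload, target):
    """Prepare a multipart POST carrying a file plus two string parameters.

    Hypothetical helper; parameter names mirror the question (file, payload,
    target). Authentication and the job token are deliberately left out.
    """
    parameters = {
        "parameter": [
            {"name": "file", "file": "file0"},       # points at the file part
            {"name": "payload", "value": payload},
            {"name": "target", "value": target},
        ]
    }
    request = requests.Request(
        "POST",
        url,
        files={"file0": (file_name, file_bytes)},    # multipart file part
        data={"json": json.dumps(parameters)},       # ordinary form field
    )
    return request.prepare()


def trigger_build(prepared):
    """Send a prepared request and return the HTTP status code."""
    with requests.Session() as session:
        return session.send(prepared, timeout=60).status_code
```

Whether Jenkins accepts the combined json field depends on the server version and job configuration, so treat this as a starting point to test against your own instance.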
I am trying to upload a file to Google Cloud Storage via a Python script but keep getting a 404 error! I am sure I am not referencing an unavailable resource. My code snippet is:
uploadFile = open("testUploadFile.txt", "r")
httpObj = httplib.HTTPSConnection("googleapis.com", timeout = 10)
httpObj.request("PUT", requestString, uploadFile, headerString)
uploadResponse = httpObj.getresponse()
print "Request string is:" + requestString
print "Return status:" + str(uploadResponse.status)
print "Reason:" + str(uploadResponse.reason)
Where
requestString = '/upload/storage/v1beta2/b/bucket_id_12345678/o?uploadType=resumable&name=1%2FtestUploadFile.txt%7Calm_1391258335&upload_id=AbCd-1234'
headerString = {'Content-Length': '47', 'Content-Type': 'text/plain'}
Any idea where I'm going wrong?
If you're doing a resumable upload, you'll need to start with a POST as described here: https://developers.google.com/storage/docs/json_api/v1/how-tos/upload#resumable
However, for a 47-byte object, you can use a simple upload, which will be much ... simpler. Instructions are here:
https://developers.google.com/storage/docs/json_api/v1/how-tos/upload#simple
It should be easy enough for you to replace the appropriate lines in your code with:
httpObj.request("POST", requestString, uploadFile, headerString)
requestString = '/upload/storage/v1beta2/b/bucket_id_12345678/o?uploadType=media&name=1%2FtestUploadFile.txt%7Calm_1391258335'
As an aside, in your code, headerString is actually a dict, not a string.
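The simple-upload variant might be sketched in Python 3 as follows, using only the standard library. The helper names are hypothetical, the API version string is copied from the question, and the host is left as a parameter instead of being assumed. Authorization is omitted entirely, so a real upload would still need credentials in the headers.

```python
import http.client
from urllib.parse import quote


def build_simple_upload_path(bucket, object_name, api_version="v1beta2"):
    """Build the request path for a simple (uploadType=media) upload.

    The object name is percent-encoded with no safe characters, so "/" and
    "|" become %2F and %7C as in the question's request string.
    """
    return "/upload/storage/{}/b/{}/o?uploadType=media&name={}".format(
        api_version, quote(bucket, safe=""), quote(object_name, safe="")
    )


def simple_upload(host, bucket, object_name, body, content_type="text/plain"):
    """POST `body` (bytes) as a simple upload; returns (status, reason).

    `headers` is a dict, which is what http.client expects.
    """
    headers = {"Content-Type": content_type, "Content-Length": str(len(body))}
    connection = http.client.HTTPSConnection(host, timeout=10)
    try:
        connection.request(
            "POST", build_simple_upload_path(bucket, object_name),
            body=body, headers=headers,
        )
        response = connection.getresponse()
        return response.status, response.reason
    finally:
        connection.close()
```

Building the path with quote avoids hand-encoding the object name, which is an easy place for a stray character to cause a 404.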