Export/stream massive results from the Splunk REST API - Python

I need to export a massive number of events from Splunk, so for performance reasons I resorted to using the REST API directly in my Python code rather than going through the Splunk SDK itself.
I found the following curl command to export results:
curl -ku username:password \
  https://splunk_host:port/servicesNS/admin/search/search/jobs/export \
  -d search="search index%3D_internal | head 3" -d output_mode=json
My attempt at reproducing this with Python's HTTP libraries is as follows:
import urllib
import httplib2

# Assume I have already authenticated to Splunk and have a session key.
base_url = "http://splunkhost:port"
search_job_urn = '/services/search/jobs/export'

myhttp = httplib2.Http(disable_ssl_certificate_validation=True)
searchjob = myhttp.request(
    base_url + search_job_urn, 'POST',
    headers={'Authorization': 'Splunk %s' % sessionKey},
    body=urllib.urlencode({'search': 'search index=indexname sourcetype=sourcename'}))[1]
print searchjob
That last print keeps printing all the results until it is done. For large queries I get a MemoryError. I need to be able to read the results in chunks (say 50,000 at a time), write them to a file, and reset the buffer for searchjob. How can I accomplish that?
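One possible approach (a sketch, not a verified solution): httplib2 buffers the whole response body before returning it, which is why a large export exhausts memory. The requests library can stream the same endpoint instead; with stream=True the body is only consumed as you iterate over it, so it can be written to disk in fixed-size chunks. The host, search, and sessionKey below are the placeholders from the question.

import requests

base_url = "https://splunkhost:port"          # placeholder host/port from the question
search_job_urn = '/services/search/jobs/export'

# stream=True keeps the export from being buffered in memory;
# verify=False mirrors disable_ssl_certificate_validation=True above.
response = requests.post(
    base_url + search_job_urn,
    headers={'Authorization': 'Splunk %s' % sessionKey},
    data={'search': 'search index=indexname sourcetype=sourcename',
          'output_mode': 'json'},
    verify=False,
    stream=True)

with open('export.json', 'wb') as out:
    # Write the export to disk in ~50,000-byte chunks instead of holding it all at once.
    for chunk in response.iter_content(chunk_size=50000):
        if chunk:
            out.write(chunk)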

Related

Zip file upload with Python Requests not working

Hi, I am attempting to upload a file to a server with a POST using requests, but every time I get errors. If I do the same with cURL it goes through, but I'm not familiar with cURL or with uploading files in general (I mostly do GET requests), so I have no clue what it's doing differently. I am on Windows and am running the code below. This is being uploaded to McAfee ePO, so I'm not sure if their API is just super picky or what the difference is, but every Python example I've tried for uploading a file via the requests module has failed for me.
url = "https://server.url.com:1234/remote/repository.checkInPackage.do?&allowUnsignedPackages=True&option=Normal&branch=Evaluation"
user = "domain\user"
password = "mypass"
filepath = "C:\\my\\folder\\with\\afile.zip"
with open(filepath, "rb") as f:
file_dict = {"file": f}
response = requests.post(url, auth=(user, password), files=file_dict)
I usually get an error like the following:
'Error 0 :\r\njava.lang.reflect.InvocationTargetException\r\n'
If I use cURL, though, it works:
curl.exe -k -s -u "domain\username:mypass" "https://server.url.com:1234/remote/repository.checkInPackage.do?&allowUnsignedPackages=True&option=Normal&branch=Evaluation" -F file=@"C:\my\folder\with\afile.zip"
I can't really see the difference, though, and am wondering what is being done differently on the backend for cURL, or what I could be doing wrong when using Python.
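One difference worth checking (a guess, not a confirmed fix): curl -F sends a multipart part with just the base filename and a content type, whereas files={"file": f} lets requests derive those from the open file object, which here carries a full Windows path. requests' documented files interface also accepts an explicit (filename, fileobj, content_type) tuple, which makes the part look closer to what cURL sends:

import os
import requests

url = "https://server.url.com:1234/remote/repository.checkInPackage.do?&allowUnsignedPackages=True&option=Normal&branch=Evaluation"
filepath = "C:\\my\\folder\\with\\afile.zip"

with open(filepath, "rb") as f:
    # Give requests the bare filename and a content type explicitly,
    # instead of letting it derive them from the file object.
    files = {"file": (os.path.basename(filepath), f, "application/zip")}
    response = requests.post(url, auth=("domain\\user", "mypass"), files=files)

print(response.status_code)
print(response.text)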

Python script for API call

I am a Splunk developer and need to create a Python script to get data from a website using an API call. I have no idea how to write a Python script.
I have a refresh token, with which we get another token (an access token).
curl -X POST https://xxxx.com/api/auth/refreshToken -d <refresh token>
The above command returns only the access token, in text format.
curl -X GET https://xxxx.com/api/reporting/v0.1.0/training -g --header "Authorization:Bearer <access token>"| json_pp
Running the above command returns the data in JSON format.
I need to create a Python script for this type of API call.
Thanks in advance.
Say you have a file called rest.py; then:
import requests
from requests.auth import HTTPDigestAuth
import json

# Replace with the correct URL
url = "http://api_url"

# It is good practice not to hardcode credentials, so ask the user to enter them at runtime
myResponse = requests.get(url, auth=HTTPDigestAuth(raw_input("username: "), raw_input("Password: ")), verify=True)
# print(myResponse.status_code)

# For a successful API call, the response code will be 200 (OK)
if myResponse.ok:
    # Load the response data into a dict variable.
    # json.loads accepts only string or binary input, so use .content to get the raw body.
    # loads ("load string") takes JSON text and converts it into a Python data structure
    # (dict or list, depending on the JSON).
    jData = json.loads(myResponse.content)

    print("The response contains {0} properties".format(len(jData)))
    print("\n")
    for key in jData:
        print key + " : " + jData[key]
else:
    # If the response code is not OK (200), raise the resulting HTTP error with its description
    myResponse.raise_for_status()
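The generic snippet above shows a single authenticated GET. The question's flow needs two calls: exchange the refresh token for an access token, then a Bearer-authenticated GET. A minimal sketch of that with requests (the URLs are the placeholders from the question, and the refresh token is sent as the raw request body to mirror -d <refresh token>, so adjust the payload if the API expects a different format):

import requests

refresh_token = "<refresh token>"

# Step 1: exchange the refresh token for an access token (returned as plain text).
auth_resp = requests.post("https://xxxx.com/api/auth/refreshToken", data=refresh_token)
access_token = auth_resp.text.strip()

# Step 2: call the reporting endpoint with the Bearer token and read the JSON body.
report_resp = requests.get(
    "https://xxxx.com/api/reporting/v0.1.0/training",
    headers={"Authorization": "Bearer " + access_token})
print(report_resp.json())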

Forge - inconsistent output from Python script and terminal command

I am following this tutorial to convert a .sldprt file to a .obj file. I wanted to accomplish this conversion with a Python script, and I found a script online that gets as far as uploading the file to the server and starting the conversion. In step 3 of the tutorial (Verify the job is Complete), when I type the following command into the command line:
curl -X 'GET' -H 'Authorization: Bearer MYTOKEN' -v 'https://developer.api.autodesk.com/modelderivative/v2/designdata/MYURN/manifest'
I get an appropriate response. However, doing the same thing from the Python script gives me a different output.
My Python script is as below:
### Verify if translation is complete and get the outURN
url = BASE_URL + 'modelderivative/v2/designdata/' + urn + '/manifest'
headers = {
    'Authorization': 'Bearer ' + ACCESS_TOKEN
}
r = requests.get(url, headers=headers)
content = eval(r.content)
print("==========================================")
print(content)
print("==========================================")
I have no idea what the difference is between the two (the terminal command and the request made from the Python script). Can someone point out what the problem is here?
Or better yet, listen to the extraction.finished event, which notifies you when a translation is done.
I believe I had to pause for some time after beginning the conversion, to give the cloud time to convert the .sldprt to .stl. The solution is to constantly poll the 'status' key and proceed only when the status changes from 'pending' to 'success'.
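A rough polling sketch of that idea, assuming BASE_URL, urn and ACCESS_TOKEN are defined as in the question's script and that the manifest's 'status' field moves from 'pending' to 'success' as described above:

import time
import requests

url = BASE_URL + 'modelderivative/v2/designdata/' + urn + '/manifest'
headers = {'Authorization': 'Bearer ' + ACCESS_TOKEN}

while True:
    manifest = requests.get(url, headers=headers).json()
    status = manifest.get('status')
    print('translation status: ' + str(status))
    if status == 'success':
        # Translation is done; safe to fetch the results now.
        break
    # Still pending, so wait before polling again.
    time.sleep(10)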

Issues retrieving information from API

Unfortunately I cannot offer a reproducible dataset. I'm attempting to connect to an API and pull report data out of GoodData. I've been able to connect and pull the report successfully, but occasionally it fails. It fails at a specific point in the script, and I can't figure out why it works sometimes and not others.
I connect to the GoodData API and get a temporary token.
I created the function below to download the report. The function parameters are the project ID within GoodData, the temporary token I received from logging in/authenticating, the file name I want the report saved as, and the URI that I receive from calling the specific project and report ID. The URI is essentially the location of the data.
The URI looks something like this (not a real URI):
'{"uri":"/gdc/projects/omaes11n7jpaisfd87asdfhbakjsdf87adfbkajdf/execute/raw/876dfa8f87ds6f8fd6a8ds7f6a8da8sd7f68as7d6f87af?q=as8d7f6a8sd7fas8d7fa8sd7f6a8sdf7"}'
from urllib2 import Request, urlopen
import re
import json
import pandas as pd
import os
import time

# function
def download_report(proj_id, temp_token, file_name, uri, write_to_file=True):
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'X-GDC-AuthTT': temp_token
    }
    uri2 = re.sub('{"uri":|}|"', '', uri)
    put_request = Request('https://secure.gooddata.com' + uri2, headers=headers)
    response = urlopen(put_request).read()
    with open(file_name + ".csv", "wb") as text_file:
        text_file.write(response)
    with open(file_name + ".csv", 'rb') as f:
        gd_data = pd.read_csv(f)
    if write_to_file:
        gd_data.to_csv(file_name + '.csv', index=False)
    return gd_data
The URI gets appended to the normal GoodData URL and, together with the headers, the request pulls the information down as text, which then gets converted into a CSV/DataFrame.
For some reason the DataFrame is coming back as basically the URI turned into a DataFrame, instead of the data behind the link. One last thing I'm finding strange is that when I launch Spyder and try this, it always fails the first time; if I run it again, it works. I don't know why. Since I'm trying to run this on a schedule, it runs successfully a couple of times a day for a few days and then just starts failing.
The reason you sometimes get a URI to the data result rather than the data result itself is that the result is not ready yet; it can take a while to compute the report. Along with the URI you also get HTTP status 202, which means the request was accepted but the result is not done yet.
Check the HTTP status with the getcode() method. If you get 202, request the URI again until you get 200, and only then read the data result.
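A hedged sketch of that retry logic, in the same urllib2 style as the question; full_url and headers stand for the URL and headers already built inside download_report(), and the sleep interval is an arbitrary choice:

import time
from urllib2 import Request, urlopen

def fetch_when_ready(full_url, headers, poll_seconds=5):
    while True:
        resp = urlopen(Request(full_url, headers=headers))
        if resp.getcode() == 200:
            # 200: the report is computed, so this body is the actual data result.
            return resp.read()
        # 202: accepted but still computing; wait and request the URI again.
        time.sleep(poll_seconds)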
First, check whether you get a response with cURL (make sure the URL is correct):
curl \
-H "Content-Type: application/json" \
-H "X-GDC-AuthTT: temp_token" \
"https://secure.gooddata.com/gdc/projects/omaes11n7jpaisfd87asdfhbakjsdf87adfbkajdf/execute/raw/876dfa8f87ds6f8fd6a8ds7f6a8da8sd7f68as7d6f87af?q=as8d7f6a8sd7fas8d7fa8sd7f6a8sdf7"

ReadTheDocs set version to active via script

I am trying to set the active state of ReadTheDocs.com versions (i.e. the commercial end of Read the Docs) programmatically.
The idea is that when a branch is created, documentation is built for it, and when the branch ends we delete the version's documentation (or at least stop building it).
The latter is, obviously, only cleanup and not that important (a want, not a need). But we would strongly like to avoid having to use the project management interface to set each branch/version to active.
I've been trying to use the v2 REST API provided by RTD. I can extract version data from "GET https://readthedocs.com/api/v2/version/" and find the version I want to mess with, but I am unable to either send data back, or find something that lets me set Version.active=True for a given version id in their API.
I'm not hugely up on how to play with these APIs so any help would be much appreciated.
I am using python and the requests library.
I searched for a solution to this, because I had the same problem when automating the documentation build process in connection with my Git server.
In the end I found two different ways to change the project versions and set them to active with a script. Both scripts emulate the HTTP requests that are sent to the Read the Docs server. I have a locally running instance over HTTP (without HTTPS) and it works, but I don't know if it works for HTTPS too.
Maybe it is necessary to capture the packets via Wireshark and adapt the script.
First script (using Python):
import requests

def set_version_active_on_rtd():
    server_addr = "http://192.168.1.100:8000"
    project_slug = "myProject"
    rtd_user = 'mylogin'
    rtd_password = 'mypassword'

    with requests.session() as s:
        url = server_addr + "/accounts/login/"
        # fetch the login page to obtain the CSRF cookie
        s.get(url)
        if 'csrftoken' in s.cookies:
            # Django 1.6 and up
            csrftoken = s.cookies['csrftoken']
        else:
            # older versions
            csrftoken = s.cookies['csrf']
        login_data = dict(login=rtd_user, password=rtd_password, csrfmiddlewaretoken=csrftoken, next='/')
        r = s.post(url, data=login_data, headers=dict(Referer=url))

        url = server_addr + "/dashboard/" + project_slug + "/versions/"
        if 'csrftoken' in s.cookies:
            # Django 1.6 and up
            csrftoken = s.cookies['csrftoken']
        else:
            # older versions
            csrftoken = s.cookies['csrf']
        '''
        These settings, which are saved in version_data, are normally configured
        through the web interface.
        To set a version active, it must be configured with
            'version-<version_number>': 'on'
        and its privacy must be set like
            'privacy-<version_number>': 'public'
        To disable a version, only its privacy has to be set and no entry with 'on'
        is supplied.
        '''
        version_data = {'default-version': 'latest', 'version-latest': 'on',
                        'privacy-latest': 'public', 'privacy-master': 'public',
                        'csrfmiddlewaretoken': csrftoken}
        r = s.post(url, data=version_data, headers=dict(Referer=url))
Second script (Bash and cURL):
#!/bin/bash
RTD_SERVER='http://192.168.1.100:8000'
RTD_LOGIN='mylogin'
RTD_PASSWORD='mypassword'
RTD_SLUG='myProject'

# Fetch the login page and save the first cookie
curl -c cookie1.txt "$RTD_SERVER"/accounts/login/ > /dev/null

# Extract the CSRF token from the first cookie
TOKEN1=$(tail -n1 cookie1.txt | awk 'NF>1{print $NF}')

# Log in, sending the first cookie, and save the second cookie
curl -b cookie1.txt -c cookie2.txt -X POST \
    -d "csrfmiddlewaretoken=$TOKEN1&login=$RTD_LOGIN&password=$RTD_PASSWORD&next=/dashboard/$RTD_SLUG/versions/" \
    "$RTD_SERVER"/accounts/login/ > /dev/null

# Extract the CSRF token from the second cookie
TOKEN2=$(tail -n3 cookie2.txt | awk 'NF>1{print $NF}' | head -n1)

# Send the version settings to the RTD server using the second cookie
curl -b cookie2.txt -X POST \
    -d "csrfmiddlewaretoken=$TOKEN2&default-version=latest&version-master=on&privacy-master=public&version-latest=on&privacy-latest=public" \
    "$RTD_SERVER"/dashboard/"$RTD_SLUG"/versions/ > /dev/null

# Delete the cookies
rm cookie1.txt cookie2.txt
To set a default version, it can be necessary to run the script twice if the version was not already active: the first run activates the version, and the second run sets it as the default version.
Hope it helps
