How to speed up a Splunk export? - Python

I am using the Python 3 Splunk API to export some massive logs.
My code essentially follows the Splunk API guidelines:
import splunklib.client as client
import splunklib.results as results
import pandas as pd

kwargs_export = {"earliest_time": "2019-08-19T12:00:00.000-00:00",
                 "latest_time": "2019-08-19T14:00:00.000-00:00",
                 "search_mode": "normal"}

exportsearch_results = service.jobs.export(mysearchquery, **kwargs_export)
reader = results.ResultsReader(exportsearch_results)
df = pd.DataFrame(list(reader))
But this is extremely slow...
Ultimately I want to store the output of the search as a CSV on disk. Is there any way to speed up the export?
Thanks!

Try the following instead; it creates the search job explicitly and streams the results to disk in CSV mode:
kwargs_export = {"earliest_time": "-1d",
"latest_time": "now",
"search_mode": "normal"}
service = client.connect(**args)
job = service.jobs.create(query, **kwargs_export)
with open(filename, 'wb') as out_f:
try:
job_results = job.results(output_mode="csv", count=0)
for result in job_results:
out_f.write(result)
except :
print("Session timed out. Reauthenticating")

Related

Redis Timeseries Pipeline with Python

I am looking to use a pipeline to insert data into a Redis TimeSeries but cannot find a way to call ts.add via a pipeline.
I can do a basic example with get/set:
import redis
import json
redis_client = redis.Redis(host='xxx.xxx.xxx.xxx', port='xxxxx', password='xxxx')
pipe = redis_client.pipeline()
pipe.set(1,'apple')
pipe.set(2,'orange')
pipe.execute()
I can't find a way to insert into a time series:
import redis
import json
redis_client = redis.Redis(host='xxx.xxx.xxx.xxx', port='xxxxx', password='xxxx')
pipe = redis_client.pipeline()
pipe.ts.add(TS1,1652683016,55) #<----- this is what I want to do!
pipe.ts.add(TS1,1652683017,59) #<----- this is what I want to do!
pipe.execute()
As of this writing (redis-py 4.3.1) there exists another pipeline object on the timeseries class itself. The following will work:
import redis
r = redis.Redis()
pipe = r.ts().pipeline()
pipe.add("TS1", 1, 123123123123)
pipe.add("TS1", 2, 123123123451)
...
pipe.add("TS1", 15, 123123126957)
pipe.execute()
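As an aside, for bulk inserts redis-py's time series commands also expose madd, which writes many samples in one round trip. A minimal sketch, assuming the key TS1 already exists and the RedisTimeSeries module is loaded on the server:

import redis

r = redis.Redis()
# madd takes (key, timestamp, value) tuples and sends them in a single command
r.ts().madd([("TS1", 1652683016, 55), ("TS1", 1652683017, 59)])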

Python flask server to retrieve certain records

I have the following Python code for a Flask server. I am trying to have this part of the code list all my vehicles that match the horsepower that I put in through my browser. I want it to return all the car names that match the horsepower, but what I have doesn't seem to be working; it returns nothing. I know the issue is somewhere in the for statement, but I don't know how to fix it.
This is my first time doing something like this and I've been trying multiple things for hours. I can't figure it out. Could you please help?
from flask import Flask
from flask import request
import os, json

app = Flask(__name__, static_folder='flask')

@app.route('/HORSEPOWER')
def horsepower():
    horsepower = request.args.get('horsepower')
    message = "<h3>HORSEPOWER " + str(horsepower) + "</h3>"
    path = os.getcwd() + "/data/vehicles.json"
    with open(path) as f:
        data = json.load(f)
    for record in data:
        horsepower = int(record["Horsepower"])
        if horsepower == record:
            car = record["Car"]
    return message
The following example should meet your expectations.
from flask import Flask
from flask import request
import os, json

app = Flask(__name__)

@app.route('/horsepower')
def horsepower():
    # The URL parameter is automatically converted to an integer.
    horsepower = request.args.get('horsepower', type=int)
    # Read the file, which is located in the data folder relative to the
    # application root directory.
    path = os.path.join(app.root_path, 'data', 'vehicles.json')
    with open(path) as f:
        data = json.load(f)
    # Build a list of the names of all cars whose horsepower
    # matches the passed parameter.
    cars = [record['Car'] for record in data if horsepower == int(record["Horsepower"])]
    # The result is output separated by commas.
    return f'''
        <h3>HORSEPOWER {horsepower}</h3>
        <p>{','.join(cars)}</p>
    '''
There are many ways to write the loop. I used a short variant in the example; written out in full, it looks like this:
cars = []
for record in data:
    if horsepower == int(record['Horsepower']):
        cars.append(record['Car'])
As a tip:
Pay attention to when you overwrite the value of a variable by reusing its name, as happens with horsepower inside your loop.
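For a quick check, the route can be exercised with the requests package; a sketch, assuming the app runs locally on Flask's default port 5000 and that 150 is a horsepower value present in vehicles.json:

import requests

# 150 is an illustrative value; adjust it to match your data
resp = requests.get("http://127.0.0.1:5000/horsepower", params={"horsepower": 150})
print(resp.text)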

Changing output of speedtest.py and speedtest-cli to include IP address in output .csv file

I added a line to the Python script speedtest.py that I found at pimylifeup.com. I hoped it would allow me to track the internet provider and IP address along with all the other speed information his code provides. But when I execute it, the code only grabs the next word after the findall call. I would also like it to return the IP address that appears after the provider. I have attached the code below. Can you help me modify it to return what I am looking for?
Here is an example of what is returned by speedtest-cli:
$ speedtest-cli
Retrieving speedtest.net configuration...
Testing from Biglobe (111.111.111.111)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by GLBB Japan (Naha) [51.24 km]: 118.566 ms
Testing download speed................................................................................
Download: 4.00 Mbit/s
Testing upload speed......................................................................................................
Upload: 13.19 Mbit/s
$
And this is an example of what is being returned by speedtest.py to my .csv file:
Date,Time,Ping,Download (Mbit/s),Upload(Mbit/s),myip
05/30/20,12:47,76.391,12.28,19.43,Biglobe
This is what I want it to return.
Date,Time,Ping,Download (Mbit/s),Upload (Mbit/s),myip
05/30/20,12:31,75.158,14.29,19.54,Biglobe 111.111.111.111
Or maybe:
05/30/20,12:31,75.158,14.29,19.54,Biglobe,111.111.111.111
Here is the code that I am using. And thank you for any help you can provide.
import os
import re
import subprocess
import time

response = subprocess.Popen('/usr/local/bin/speedtest-cli', shell=True, stdout=subprocess.PIPE).stdout.read().decode('utf-8')

ping = re.findall('km]:\s(.*?)\s', response, re.MULTILINE)
download = re.findall('Download:\s(.*?)\s', response, re.MULTILINE)
upload = re.findall('Upload:\s(.*?)\s', response, re.MULTILINE)
myip = re.findall('from\s(.*?)\s', response, re.MULTILINE)

ping = ping[0].replace(',', '.')
download = download[0].replace(',', '.')
upload = upload[0].replace(',', '.')
myip = myip[0]

try:
    f = open('/home/pi/speedtest/speedtestz.csv', 'a+')
    if os.stat('/home/pi/speedtest/speedtestz.csv').st_size == 0:
        f.write('Date,Time,Ping,Download (Mbit/s),Upload (Mbit/s),myip\r\n')
except:
    pass

f.write('{},{},{},{},{},{}\r\n'.format(time.strftime('%m/%d/%y'), time.strftime('%H:%M'), ping, download, upload, myip))
Let me know if this works for you; it should do everything you're looking for:
#!/usr/bin/env python
import os
import csv
import time
import subprocess
from decimal import Decimal, ROUND_UP

file_path = '/home/pi/speedtest/speedtestz.csv'

def format_speed(bits_string):
    """ changes a bits/s string to megabits/s, rounded to two decimal places """
    return (Decimal(bits_string) / 1000000).quantize(Decimal('.01'), rounding=ROUND_UP)

def write_csv(row):
    """ writes a header row if one does not exist, then a test result row """
    # straight from the csv man page
    # see: https://docs.python.org/3/library/csv.html
    with open(file_path, 'a+', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',', quotechar='"')
        if os.stat(file_path).st_size == 0:
            writer.writerow(['Date', 'Time', 'Ping', 'Download (Mbit/s)', 'Upload (Mbit/s)', 'myip'])
        writer.writerow(row)

response = subprocess.run(['/usr/local/bin/speedtest-cli', '--csv'], capture_output=True, encoding='utf-8')

# if speedtest-cli exited with no errors / ran successfully
if response.returncode == 0:
    # from the csv man page:
    # "And while the module doesn't directly support parsing strings, it can easily be done"
    # this removes quotes and spaces vs doing a string split on ','
    # csv.reader returns an iterator, so we turn that into a list
    cols = list(csv.reader([response.stdout]))[0]
    # turns a 13.45 ping into 13
    ping = Decimal(cols[5]).quantize(Decimal('1.'))
    # speedtest-cli --csv returns speeds in bits/s; convert to Mbit/s
    download = format_speed(cols[6])
    upload = format_speed(cols[7])
    ip = cols[9]
    date = time.strftime('%m/%d/%y')
    clock = time.strftime('%H:%M')  # avoid shadowing the time module
    write_csv([date, clock, ping, download, upload, ip])
else:
    print('speedtest-cli returned error: %s' % response.stderr)
$ /usr/local/bin/speedtest-cli --csv-header > speedtestz.csv
$ /usr/local/bin/speedtest-cli --csv >> speedtestz.csv
output:
Server ID,Sponsor,Server Name,Timestamp,Distance,Ping,Download,Upload,Share,IP Address
Does that not get you what you're looking for? Run the first command once to create the csv with a header row. Then subsequent runs are done with the append >> operator, and that'll add a test result row each time you run it.
Doing all of those regexes will bite you if speedtest-cli, or a library it depends on, decides to change its output format.
Plenty of ways to do it, though. Hope this helps.
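If you'd rather avoid positional CSV columns altogether, speedtest-cli also supports a --json flag, which makes the fields self-describing. A sketch, assuming speedtest-cli is on the PATH (the key names match its JSON output; download and upload are in bits per second):

import json
import subprocess

response = subprocess.run(['speedtest-cli', '--json'], capture_output=True, encoding='utf-8')
if response.returncode == 0:
    data = json.loads(response.stdout)
    # client.isp is the provider; client.ip is the external IP address
    print(data['client']['isp'], data['client']['ip'], data['download'], data['upload'])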

Read from a text file and send it to an AWS SQS FIFO queue

I have a little issue here. I want to read from a text file using Python, create a queue, and then send the lines from the text file into Amazon Web Services SQS (Simple Queue Service). First of all, I've actually managed to do this using boto, but the problem is that the lines don't arrive in order, just randomly: line 4, line 1, line 5, and so on.
Here is my code:
import boto.sqs

conn = boto.sqs.connect_to_region("us-east-2",
                                  aws_access_key_id='AKIAIJIQZG5TR3NMW3LQ',
                                  aws_secret_access_key='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')
q = conn.create_queue('test')

with open('read.txt', 'r') as read_file:
    from boto.sqs.message import RawMessage
    for line in read_file:
        m = RawMessage()
        m.set_body(line)
        q.write(m)
So, what to do? Well, we need to create a FIFO queue (which I also managed to do using boto3 in Python), but now I have problems reading the text file. Here is the code I used to create a FIFO queue in SQS:
import boto3

AWS_ACCESS_KEY = 'AKIAIJIQZG5TR3NMW3LQ'
AWS_SECRET_ACCESS_KEY = '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'

sqs_client = boto3.resource(
    'sqs',
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name='us-east-2'
)

queue_name = 'demo_queue.fifo'
response = sqs_client.create_queue(
    QueueName=queue_name,
    Attributes={
        'FifoQueue': 'true',
        'ContentBasedDeduplication': 'true'
    }
)

with open('read.txt', 'r') as read_file:
    from boto.sqs.message import RawMessage
    for line in read_file:
        m = RawMessage()
        m.set_body(line)
        queue_name.write(m)
Does somebody know how to solve this? Thanks.
In your second piece of code, queue_name.write(m) is called on a string. You should use the actual queue object instead: the Queue resource returned by create_queue (or later by get_queue_by_name), and with boto3 that means its send_message method rather than boto's write.
Also, when only specifying MessageBody and MessageGroupId in boto3, make sure that content-based deduplication is enabled for the queue, or specify a MessageDeduplicationId string; otherwise the call will fail.
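Putting it together, a minimal boto3-only sketch of the send loop; this is an illustration, with credentials assumed to come from the environment and 'read-txt' as an arbitrary MessageGroupId:

import boto3

sqs = boto3.resource('sqs', region_name='us-east-2')
queue = sqs.create_queue(
    QueueName='demo_queue.fifo',
    Attributes={'FifoQueue': 'true', 'ContentBasedDeduplication': 'true'}
)

with open('read.txt', 'r') as read_file:
    for line in read_file:
        if not line.strip():
            continue  # SQS rejects empty message bodies
        # a single MessageGroupId keeps all lines in strict FIFO order
        queue.send_message(MessageBody=line, MessageGroupId='read-txt')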

PyMongo/Mongoengine equivalent of mongodump

Is there an equivalent function in PyMongo or mongoengine to MongoDB's mongodump? I can't seem to find anything in the docs.
Use case: I need to periodically backup a remote mongo database. The local machine is a production server that does not have mongo installed, and I do not have admin rights, so I can't use subprocess to call mongodump. I could install the mongo client locally on a virtualenv, but I'd prefer an API call.
Thanks a lot :-).
For my relatively small database, I eventually used the following solution. It's not really suitable for big or complex databases, but it suffices for my case. It dumps each collection as JSON to the backup directory. It's clunky, but it does not rely on anything other than pymongo.
from os.path import join
import pymongo
from bson.json_util import dumps

def backup_db(backup_db_dir):
    client = pymongo.MongoClient(host=<host>, port=<port>)
    database = client[<db_name>]
    authenticated = database.authenticate(<uname>, <pwd>)
    assert authenticated, "Could not authenticate to database!"
    collections = database.collection_names()
    for i, collection_name in enumerate(collections):
        col = getattr(database, collections[i])
        collection = col.find()
        jsonpath = collection_name + ".json"
        jsonpath = join(backup_db_dir, jsonpath)
        with open(jsonpath, 'wb') as jsonfile:
            jsonfile.write(dumps(collection))
The accepted answer is not working anymore. Here is the revised code:
from os.path import join
import pymongo
from bson.json_util import dumps

def backup_db(backup_db_dir):
    client = pymongo.MongoClient(host=..., port=..., username=..., password=...)
    database = client[<db_name>]
    collections = database.collection_names()
    for i, collection_name in enumerate(collections):
        col = getattr(database, collections[i])
        collection = col.find()
        jsonpath = collection_name + ".json"
        jsonpath = join(backup_db_dir, jsonpath)
        with open(jsonpath, 'wb') as jsonfile:
            jsonfile.write(dumps(collection).encode())

backup_db('.')
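Note that collection_names() was removed in PyMongo 4.x in favour of list_collection_names(). A sketch of the same function against the current API, with the connection placeholders elided as above:

from os.path import join
import pymongo
from bson.json_util import dumps

def backup_db(backup_db_dir):
    client = pymongo.MongoClient(host=..., port=..., username=..., password=...)
    database = client[<db_name>]
    # list_collection_names() replaces the removed collection_names()
    for collection_name in database.list_collection_names():
        jsonpath = join(backup_db_dir, collection_name + ".json")
        with open(jsonpath, 'wb') as jsonfile:
            jsonfile.write(dumps(database[collection_name].find()).encode())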
