Why is this function taking so long to evaluate? - python

I am attempting to write a function that records a few strings to a logfile on a server. For some reason, this function takes a very long time to run: about 20 seconds before it returns from the exception handler. I suspect the culprit is the open() call inside the try: block.
Any ideas how I can do this correctly?
def writeUserRecord():
    """ Given a path, logs user name, fetch version, and time"""
    global fetchVersion
    global fetchHome
    filename = 'users.log'
    logFile = os.path.normpath(os.path.join(fetchHome, filename))
    timeStamp = str(datetime.datetime.now()).split('.')[0]
    userID = getpass.getuser()
    try:
        file = open(logFile, 'a')
        file.write('{} {} {}'.format(userID, timeStamp, fetchVersion))
        file.close()
    except IOError:
        print('Error Accessing Log File')
        pass
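One way to confirm that suspicion is to time each step individually. A minimal diagnostic sketch, assuming fetchHome may point to a slow network share (the instrumentation below is illustrative, not part of the original function):

import datetime
import getpass
import os
import time

def writeUserRecordTimed():
    # Illustrative instrumentation: time each step to see which call blocks.
    t0 = time.time()
    logFile = os.path.normpath(os.path.join(fetchHome, 'users.log'))
    timeStamp = str(datetime.datetime.now()).split('.')[0]
    userID = getpass.getuser()
    print('setup: %.2fs' % (time.time() - t0))

    t1 = time.time()
    try:
        with open(logFile, 'a') as f:
            print('open: %.2fs' % (time.time() - t1))
            t2 = time.time()
            f.write('{} {} {}\n'.format(userID, timeStamp, fetchVersion))
            print('write: %.2fs' % (time.time() - t2))
    except IOError as e:
        print('Error accessing log file: %s (%.2fs)' % (e, time.time() - t1))

If the open() step dominates, the delay is almost certainly the network path or name resolution, not the Python code itself.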

Related

Opening a file, if not existent, create it

I'm using two JSON files, one for storing and loading device variables and another for MQTT info. A load_config function opens the given file and loads it as JSON. When the file exists, it works without any problem, but when the file does not exist it throws a file-not-found error, as expected. My function contains an except block that is supposed to handle this by creating the file, but it is never reached. Here's my code for the function:
def load_config(config_path):
    with open(config_path) as f:  # Config
        try:
            return json.load(f)
        except OSError:
            print("file not there, creating it")
            open(config_path, "w")
        except json.JSONDecodeError:
            return {}
        f.close()
I call that function like this:
DEVICE_PATH = 'config.json'
MQTT_PATH = 'mqtt.json'
conf = load_config(DEVICE_PATH) #load device config
mqtt_conf = load_config(MQTT_PATH) #load mqtt config
mqtt_broker_ip = mqtt_conf['ip'] #setup mqtt
mqtt_broker_port = mqtt_conf['port']
mqtt_user = mqtt_conf['username']
mqtt_pass = mqtt_conf['password']
client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.username_pw_set(mqtt_user, password=mqtt_pass)
client.connect(mqtt_broker_ip, mqtt_broker_port, keepalive = 60, bind_address="" )
What am I doing wrong? When I open the file directly in load_config with open(config_path, "a") as f:, everything in it gets deleted; with "x" it just throws an exception if the file exists, and with "w" it also gets overwritten.
What you are trying to accomplish is already built into open().
Just skip the whole file-existence check and load the JSON in w+ mode:
with open("file.json", "w+") as f:
try:
data = json.load(f)
except JSONDecodeError:
data = {}
w+ opens the file in read and write mode and creates it if it doesn't exist.
Keep in mind that w+ truncates the file as soon as it is opened, so any existing content is lost and json.load will always see an empty file; that is why the JSONDecodeError fallback above is needed.
As a side note, it may be worth reviewing the basics of file handling in Python to avoid getting stuck on a similar issue again.
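If the existing content should be preserved rather than truncated, a hedged alternative sketch (not part of the original answer) is to open with "a+", which also creates the file if it is missing, and seek back to the start before reading:

import json

with open("file.json", "a+") as f:
    f.seek(0)  # "a+" positions at the end of the file; rewind before reading
    try:
        data = json.load(f)
    except json.JSONDecodeError:
        data = {}  # empty or brand-new file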
I had a logic error: the OSError exception could never be raised, because the open() call that fails sits outside the try block, which only wraps json.load. Now I just check beforehand whether the file exists, and create it if it doesn't.
def load_config(config_path):
    if not os.path.isfile(config_path):
        open(config_path, "w+")
    with open(config_path) as f:  # Config
        try:
            return json.load(f)
        except json.JSONDecodeError:
            return {}

How to improve execution time of importing data in Python

The code below takes 2.5 seconds to import a log file with 1 million lines.
Is there a better way to write it that also decreases the execution time?
""" This code is used to read the log file into the memory and convert into the data frame
Once the log file is loaded ,every item in the IPQuery file checked if exist and result is print onto the console"""
#importing python modules required for this script to perform operations
import pandas as pd
import time
import sys
#code to check the arguments passed """
if len(sys.argv)!= 3:
raise ValueError(""" PLEASE PASS THE BOTH LOG FILE AND IPQUERY FILE AS INPUT TO SCRIPT
ex: python program.py log_file query_file """)
# extracting file names from command line """
log_file_name=sys.argv[1]
query_file_name = sys.argv[2]
start = time.time()#capturing time instance
#Reading the content from the log file into dataframe log_df """
log_df = pd.read_csv(log_file_name," ",header=None ,names = ['DATE','TIME', 'IPADDR','URL','STATUS'],skip_blank_lines = True)
#Reading the content from the IPquery file into the data frame query_df """
query_df = pd.read_csv(query_file_name," ",header=None,skip_blank_lines=True )
#Cheking if the IP address exists in the log file"""
Ipfound = query_df.isin(log_df.IPADDR).astype(int)
#print all the results to the Query results onto the stdout"""
for items in Ipfound[0]:
print items
print "Execution Time of this script is %f" %(time.time() - start)
# importing python modules required for this script to perform operations
import time
import sys

start = time.time()  # capturing time instance

class IpQuery:
    """The methods below read the file paths, import the log and query data,
    and print the result to the console."""
    def __init__(self):
        self.log_file_name = ""
        self.query_file_name = ""
        self.logset = {}
        self.IPlist = []

    def Inputfiles(self):
        """check the arguments passed and throw an error if they are missing"""
        if len(sys.argv) != 3:
            raise ValueError(""" PLEASE PASS THE BOTH LOG FILE AND IPQUERY FILE AS INPUT TO SCRIPT
            ex: python program.py log_file query_file """)
        # extracting file names from command line
        self.log_file_name = sys.argv[1]
        self.query_file_name = sys.argv[2]

    def read_logfile(self):
        # Reading the log data: keep only the IP field (third column) as a set
        with open(self.log_file_name, 'r') as f:
            self.logset = {line.split(' ')[2] for line in f if not line.isspace()}

    def read_Queryfile(self):
        # Reading the query file into a list of IP strings
        with open(self.query_file_name, 'r') as f:
            self.IPlist = [line.rstrip('\n') for line in f if not line.isspace()]

    def CheckIpAdress(self):
        # IP addresses from the query file are checked against the log file
        dummy = self.logset.intersection(set(self.IPlist))
        for element in self.IPlist:
            if element in dummy:
                print "1"
            else:
                print "0"

try:
    # Create an instance of the IpQuery class
    msd = IpQuery()
    # Extracting the input file information
    msd.Inputfiles()
    # Importing the IP information from the log file
    msd.read_logfile()
    # Importing the IP query information from the query file
    msd.read_Queryfile()
    # Searching for the IPs in the log file
    msd.CheckIpAdress()
except IOError:
    print "Error: can't find file or read data"
except ValueError:
    print "PLEASE PASS THE BOTH LOG FILE AND IPQUERY FILE AS INPUT TO SCRIPT "

Python cx_Oracle and csv extracts saved differently with different executions

I'm working on a Python Script that runs queries against an Oracle Database and saves the results to csv. The greater plan is to use regular extracts with a separate application to check differences in the files through hashing.
The issue I've run into is that my script has so far saved some fields in different formats across extracts. For example, a field may be saved as an integer in one extract and as a float in the next, or a date saved as 2000/01/01 in one and 2000-01-01 in another.
These changes are giving my difference check script a fit. What can I do to ensure that every extract is saved the same way, while keeping the script generic enough to run arbitrary queries?
import sys
import traceback
import cx_Oracle
from Creds import creds
import csv
import argparse
import datetime

try:
    conn = cx_Oracle.connect(
        '{username}/{password}@{address}/{dbname}'.format(**creds)
    )
except cx_Oracle.Error as e:
    print('Unable to connect to database.')
    print()
    print(''.join(traceback.format_exception(*sys.exc_info())), file=sys.stderr)
    sys.exit(1)
def run_extract(query, out):
    """
    Run the given query and save results to given out path.
    :param query: Query to be executed.
    :param out: Path to store results in csv.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
    except cx_Oracle.DatabaseError as e:
        print('Unable to run given query.')
        print()
        print(query)
        print()
        print(''.join(traceback.format_exception(*sys.exc_info())), file=sys.stderr)
        sys.exit(1)
    with open(out, 'w', newline='') as out_file:
        wrt = csv.writer(out_file)
        header = []
        for column in cur.description:
            header.append(column[0])
        wrt.writerow(header)
        for row in cur:
            wrt.writerow(row)
    cur.close()


def read_sql(file_path):
    """
    Read the SQL from a given filepath and return as a string.
    :param file_path: File path location of the file to read.
    """
    try:
        with open(file_path, 'r') as file:
            return file.read()
    except FileNotFoundError as e:
        print('File not found at given path.')
        print()
        print(file_path)
        print()
        print(''.join(traceback.format_exception(*sys.exc_info())), file=sys.stderr)
        sys.exit(1)


def generate_timestamp_path(path):
    """
    Add a timestamp to the beginning of the given path.
    :param path: File path for the timestamp to be added to.
    """
    if '/' in args.out_file:
        sep = '/'
    elif '\\' in args.out_file:
        sep = '\\'
    else:
        sep = ''
    path = args.out_file.split(sep)
    stamp = datetime.datetime.now().strftime('%Y%m%dT%H%M%S ')
    path[-1] = stamp + path[-1]
    return sep.join(path)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    in_group = parser.add_mutually_exclusive_group()
    in_group.add_argument('-q', '--query', help='String of the query to run.')
    in_group.add_argument('-f', '--in_file', help='File of the query to run.')
    parser.add_argument('-o', '--out_file', help='Path to file to store.')
    parser.add_argument('-t', '--timestamp',
                        help='Store the file with a preceding timestamp.',
                        action='store_true')
    args = parser.parse_args()
    if not args.out_file:
        print('Please provide a path to put the query results with -o.')
        sys.exit(1)
    if args.timestamp:
        path = generate_timestamp_path(args.out_file)
    else:
        path = args.out_file
    if args.query:
        query = args.query
    elif args.in_file:
        query = read_sql(args.in_file)
    else:
        print('Please provide either a query string with -q',
              'or a SQL file with -f.')
        sys.exit(1)
    run_extract(query, path)
Your code is simply using the default transformations for all data types. Note that an Oracle column of type NUMBER(9) will be returned as an integer, but a plain NUMBER will be returned as a float. You may wish to use an outputtypehandler in order to place this under your control a bit more firmly. :-) Examples are in the cx_Oracle distribution's samples directory (ReturnLongs and ReturnUnicode). All of that said, the same data definition in Oracle will always return the same type in Python, so I suspect you are referencing different data types in Oracle that you would prefer to see processed the same way.
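A minimal sketch of what such a handler might look like (the handler name and the choice to normalise NUMBER and DATE columns are illustrative assumptions, not part of the answer):

import cx_Oracle

def output_type_handler(cursor, name, default_type, size, precision, scale):
    if default_type == cx_Oracle.NUMBER:
        # Fetch every NUMBER as a string so ints and floats format identically.
        return cursor.var(str, 255, arraysize=cursor.arraysize)
    if default_type == cx_Oracle.DATETIME:
        # Normalise dates to a single text format before they reach csv.writer.
        return cursor.var(cx_Oracle.DATETIME, arraysize=cursor.arraysize,
                          outconverter=lambda d: d.strftime('%Y-%m-%d %H:%M:%S') if d else d)

conn.outputtypehandler = output_type_handler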

Write line with timestamp+message to file

I want to create a logfile that appends a new line to a text file log.txt every time an error occurs. I am pretty new to Python, so maybe I'm missing something, but every time an error occurs, log.txt is overwritten and only the current error message is kept, even though the message differs every time (due to the timestamp) and I added a \n.
That's my code so far:
import os
import sys
import time
import datetime

try:
    path = sys.argv[1]
    ts = time.time()
    sttime = datetime.datetime.fromtimestamp(ts).strftime('%Y%m%d_%H:%M:%S - ')
    # some more things but nothing else of interest for here
except:
    error = "ERROR! No file 'bla' found!"
    log = 'log.txt'
    logfile = file(log, "w")
    logfile.write(sttime + error + '\n')
    logfile.close()
    sys.exit(0)
Maybe you can help me out here. Do I need a loop somewhere? I tried to create an empty string (error = "") that adds the error message to log.txt with += each time an error occurs, but that didn't work at all :-/
Thank you!
Open the file in append mode, as 'w' mode will truncate the file each time, i.e.
logfile = open(log, "a")
And you should use with:
with open(log, 'a') as logfile:
    logfile.write(sttime + error + '\n')
No need to close the file, this will happen automatically.
Note that if the exception is raised at path = sys.argv[1], the timestamp might not be set when you try to log. It would be better to get the timestamp in the logging code.
Also, you should not use a bare except clause, but at least catch the exception and report it.
from datetime import datetime

except Exception, exc:
    sttime = datetime.now().strftime('%Y%m%d_%H:%M:%S - ')
    error = "ERROR! {}".format(exc)
    log = 'log.txt'
    with open(log, 'a') as logfile:
        logfile.write(sttime + error + '\n')
    raise
    # sys.exit(0)
When you do file(log, 'w'), the file log becomes empty. If you want to add something, you should use 'a' instead of 'w':
open(log, "a")
class Logger(object):
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.log = open('log.txt', 'a')

    def write(self, message):
        self.count += 1
        if self.count < self.n:
            self.log.write("%s %s" % (time, message))
            self.log.flush()

import sys
sys.stdout = Logger(1000)  # the constructor requires n (maximum lines to write); 1000 is just an example
Here, time is a time string formatted the way you want.
Now the regular print function will write to the file.
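A small usage sketch (the timestamp format is an illustrative assumption; note that each print statement calls write() twice, once for the text and once for the trailing newline):

import datetime

# "time" is the pre-formatted timestamp string that Logger.write() interpolates
time = datetime.datetime.now().strftime('%Y%m%d_%H:%M:%S -')

print "something went wrong"   # with sys.stdout redirected, this lands in log.txt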

How to write Big files into Blobstore using experimental API?

I have a dilemma. I'm uploading files to both the Scribd store and the Blobstore, using tipfy as the framework.
My web form's action is not created by blobstore.create_upload_url (I'm just using url_for('myhandler')). I did this because when I use the Blobstore upload handler, the POST body is already parsed and I can no longer use the normal python-scribd API to upload the file to the Scribd store.
Now I have a working Scribd saver:
class UploadScribdHandler(RequestHandler, BlobstoreUploadMixin):
    def post(self):
        uploaded_file = self.request.files.get('upload_file')
        fname = uploaded_file.filename.strip()
        try:
            self.post_to_scribd(uploaded_file, fname)
        except Exception, e:
            # ... get the exception message and do something with it
            msg = e.message
            # ...
        # reset the stream to zero (beginning) so the file can be read again
        uploaded_file.seek(0)
        # removed try-except to see debug info in browser window
        # Create the file
        file_name = files.blobstore.create(_blobinfo_uploaded_filename=fname)
        # Open the file and write to it
        with files.open(file_name, 'a') as f:
            f.write(uploaded_file.read())
        # Finalize the file. Do this before attempting to read it.
        files.finalize(file_name)
        # Get the file's blob key
        blob_key = files.blobstore.get_blob_key(file_name)
        return Response('done')

    def post_to_scribd(self, uploaded_file, fname):
        errmsg = ''
        uploaded_file = self.request.files.get('upload_file')
        fname = uploaded_file.filename.strip()
        fext = fname[fname.rfind('.')+1:].lower()
        if (fext not in ALLOWED_EXTENSION):
            raise Exception('This file type is not allowed to be uploaded\n')
        if SCRIBD_ENABLED:
            doc_title = self.request.form.get('title')
            doc_description = self.request.form.get('description')
            doc_tags = self.request.form.get('tags')
            try:
                document = scribd.api_user.upload(uploaded_file, fname, access='private')
                #while document.get_conversion_status() != 'DONE':
                #    time.sleep(2)
                if not doc_title:
                    document.title = fname[:fname.rfind('.')]
                else:
                    document.title = doc_title
                if not doc_description:
                    document.description = 'This document was uploaded at ' + str(datetime.datetime.now()) + '\n'
                else:
                    document.description = doc_description
                document.tags = doc_tags
                document.save()
            except scribd.ResponseError, err:
                raise Exception('Scribd failed: error code:%d, error message: %s\n' % (err.errno, err.strerror))
            except scribd.NotReadyError, err:
                raise Exception('Scribd failed: error code:%d, error message: %s\n' % (err.errno, err.strerror))
            except:
                raise Exception('something wrong exception')
As you can see, it also saves the file into the Blobstore. But if I upload a big file (e.g. 5 MB), I receive:
RequestTooLargeError: The request to API call file.Append() was too large.
Request: docs.upload(access='private', doc_type='pdf', file=('PK\x03\x04\n\x00\x00\x00\x00\x00"\x01\x10=\x00\x00(...)', 'test.pdf'))
How can I fix it?
Thanks!
You need to make multiple, smaller calls to the file API, for instance like this:
with files.open(file_name, 'a') as f:
    data = uploaded_file.read(65536)
    while data:
        f.write(data)
        data = uploaded_file.read(65536)
Note that the payload size limit on regular requests to App Engine apps is 10MB; if you want to upload larger files, you'll need to use the regular blobstore upload mechanism.
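For files above that limit, a minimal sketch of the regular Blobstore upload mechanism the answer refers to (handler path and form field name are illustrative assumptions; note this is the very approach the question avoids because it interferes with the Scribd upload):

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

# The form's action must point at a URL generated by create_upload_url.
upload_url = blobstore.create_upload_url('/upload_handler')

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # get_uploads returns the BlobInfo records for the uploaded form field
        blob_info = self.get_uploads('upload_file')[0]
        self.redirect('/serve/%s' % blob_info.key())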
Finally I found a solution.
Nick Johneson's answer raised an AttributeError because uploaded_file is treated as a string, and a string doesn't have a read() method.
Since a string has no read(), I split the file string into chunks and wrote them out just as he suggested.
class UploadRankingHandler(webapp.RequestHandler):
    def post(self):
        fish_image_file = self.request.get('file')
        file_name = files.blobstore.create(mime_type='image/png', _blobinfo_uploaded_filename="testfilename.png")
        file_str_list = splitCount(fish_image_file, 65520)
        with files.open(file_name, 'a') as f:
            for line in file_str_list:
                f.write(line)
You can read about splitCount() here:
http://www.bdhwan.com/entry/gaewritebigfile
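For reference, a minimal sketch of what splitCount() is presumed to do (the real implementation is at the link above): split a string into chunks of at most count characters.

def splitCount(s, count):
    # Split string s into consecutive chunks of at most `count` characters.
    return [s[i:i + count] for i in range(0, len(s), count)]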
