python logparse search specific text

I am using the function below to return the strings I want from a log file. I want to grep for the "exim" process and return only those results. Running the code gives no error, but the output is limited to the three lines shown. How can I get only the output related to the exim process?
#output:
{'date': '13', 'process': 'syslogd', 'time': '06:27:33', 'month': 'May'}
{'date': '13', 'process': 'exim[23168]:', 'time': '06:27:33', 'month': 'May'}
{'May': ['syslogd']}
#function:
def generate_log_report(logfile):
    report_dict = {}
    for line in logfile:
        line_dict = dictify_logline(line)
        print line_dict
        try:
            month = line_dict['month']
            date = line_dict['date']
            time = line_dict['time']
            #process = line_dict['process']
            if "exim" in line_dict['process']:
                process = line_dict['process']
                break
            else:
                process = line_dict['process']
        except ValueError:
            continue
        report_dict.setdefault(month, []).append(process)
    return report_dict

It's because you have a break statement inside the if that checks for "exim". As soon as you find a line with "exim", you will stop processing entirely, which sounds like the opposite of what you want!
I think you want to remove the break and put your printout inside the if. If your question is about the return value of the function, you need to make much more significant changes, probably removing report_dict entirely and simply creating a list of line_dicts that have exim in their process fields.
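A minimal sketch of that second approach, assuming dictify_logline is the same helper used above and returns a dict with a 'process' key (the function name here is just for illustration):
def generate_exim_report(logfile):
    # collect every parsed line whose process field mentions exim
    exim_lines = []
    for line in logfile:
        line_dict = dictify_logline(line)
        if "exim" in line_dict.get('process', ''):
            exim_lines.append(line_dict)
    return exim_lines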

I changed the code to this, but it gives me just one line as output. Anything missing?
#!/usr/bin/env python
import sys

def generate_log_report(logfile):
    for line in logfile:
        line_split = line.split()
        list = [line_split[0], line_split[1], line_split[2], line_split[4]]
        if "exim" in list[3]:
            l = [line_split[0], line_split[1], line_split[2], line_split[4]]
        else:
            li = [line_split[0], line_split[1], line_split[2], line_split[4]]
    return l

if __name__ == "__main__":
    if not len(sys.argv) > 1:
        print __doc__
        sys.exit(1)
    infile_name = sys.argv[1]
    try:
        infile = open(infile_name, "r")
    except IOError:
        print "you must specify a valid file"
        print __doc__
        sys.exit(1)
    log_report = generate_log_report(infile)
    print log_report
    infile.close()
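The loop overwrites l on every matching line, and the function only returns once the loop has finished, so you only ever get the last exim line back. A sketch that collects every match into a list instead (same field layout as your split; the guard against short lines is an extra assumption):
def generate_log_report(logfile):
    exim_lines = []
    for line in logfile:
        line_split = line.split()
        if len(line_split) < 5:
            # skip blank or short lines that would break the indexing
            continue
        if "exim" in line_split[4]:
            exim_lines.append([line_split[0], line_split[1], line_split[2], line_split[4]])
    return exim_lines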

Related

I am trying to write data into file

When I run it I get an error. Why do print man and os.getcwd() give errors? When I comment those lines out there is no error and the code works as expected.
from __future__ import print_function;
import os
man=[]
other = []
print os.getcwd()
try:
    data = open("sketch.txt")
    for each_line in data:
        try:
            (role,line_spoken) = each_line.split(':',1)
            line_spoken = line_spoken.strip()
            if role=='Man':
                man.append(line_spoken)
            elif role =='Other Man':
                other.append(line_spoken)
        except ValueError:
            pass
    data.close()
except IOError:
    print ("The Data File is Missing")
print man
print other
try:
    man_file = open('man_data.txt','w')
    other_file = open('other_data.txt','w')
    print (man,file = man_file)
    print (other,file = other_file)
    other_file.close()
    man_file.close()
except IOError:
    pass
You should call print as a function, because you imported print_function:
from __future__ import print_function
print("Hello World")
As far as I can see, the following:
1) In the first line there is a ';' that could be removed.
2) The second line ('import ...') and the rest down to the bottom have leading tabs that should be removed. These lines should start in the same column as line 1 ('from ...').
3) When you use print (as other people are saying) you should call it with '(' and ')'.
4) For consistency, follow the same approach throughout your code (good practice): if there are no spaces between function names and parentheses (e.g. line 7: data = open("sketch...), keep that style everywhere. The same goes for strings: the code compiles either way, but it is better to stick to ' or " rather than mixing them.
Hope that helps!
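For reference, a minimal corrected sketch with print used as a function throughout and the stray semicolon and leading tabs removed (assuming sketch.txt sits next to the script):
from __future__ import print_function
import os

man = []
other = []
print(os.getcwd())
try:
    data = open("sketch.txt")
    for each_line in data:
        try:
            (role, line_spoken) = each_line.split(':', 1)
            line_spoken = line_spoken.strip()
            if role == 'Man':
                man.append(line_spoken)
            elif role == 'Other Man':
                other.append(line_spoken)
        except ValueError:
            # line didn't contain a ':' separator; ignore it
            pass
    data.close()
except IOError:
    print("The Data File is Missing")

print(man)
print(other)
try:
    man_file = open('man_data.txt', 'w')
    other_file = open('other_data.txt', 'w')
    print(man, file=man_file)
    print(other, file=other_file)
    other_file.close()
    man_file.close()
except IOError:
    pass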

Creating loop for __main__

I am new to Python, and I want your advice on something.
I have a script that runs on one input value at a time, and I want it to be able to run over a whole list of such values without me typing them in one at a time. I have a hunch that a for loop is needed in the main method listed below. The value is gene_name, so effectively I want to feed in a list of gene names that the script can run through nicely.
Hope I phrased the question correctly, thanks! The chunk in question seems to be
def get_probes_from_genes(gene_names)
import json
import urllib2
import os
import pandas as pd

api_url = "http://api.brain-map.org/api/v2/data/query.json"

def get_probes_from_genes(gene_names):
    if not isinstance(gene_names,list):
        gene_names = [gene_names]
    #in case there are white spaces in gene names
    gene_names = ["'%s'"%gene_name for gene_name in gene_names]
    api_query = "?criteria=model::Probe"
    api_query += ",rma::criteria,[probe_type$eq'DNA']"
    api_query += ",products[abbreviation$eq'HumanMA']"
    api_query += ",gene[acronym$eq%s]"%(','.join(gene_names))
    api_query += ",rma::options[only$eq'probes.id','name']"
    data = json.load(urllib2.urlopen(api_url + api_query))
    d = {probe['id']: probe['name'] for probe in data['msg']}
    if not d:
        raise Exception("Could not find any probes for %s gene. Check " \
            "http://help.brain-map.org/download/attachments/2818165/HBA_ISH_GeneList.pdf?version=1&modificationDate=1348783035873 " \
            "for list of available genes." % gene_names)
    return d

def get_expression_values_from_probe_ids(probe_ids):
    if not isinstance(probe_ids,list):
        probe_ids = [probe_ids]
    #in case there are white spaces in gene names
    probe_ids = ["'%s'"%probe_id for probe_id in probe_ids]
    api_query = "?criteria=service::human_microarray_expression[probes$in%s]" % (','.join(probe_ids))
    data = json.load(urllib2.urlopen(api_url + api_query))
    expression_values = [[float(expression_value) for expression_value in data["msg"]["probes"][i]["expression_level"]] for i in range(len(probe_ids))]
    well_ids = [sample["sample"]["well"] for sample in data["msg"]["samples"]]
    donor_names = [sample["donor"]["name"] for sample in data["msg"]["samples"]]
    well_coordinates = [sample["sample"]["mri"] for sample in data["msg"]["samples"]]
    return expression_values, well_ids, well_coordinates, donor_names

def get_mni_coordinates_from_wells(well_ids):
    package_directory = os.path.dirname(os.path.abspath(__file__))
    frame = pd.read_csv(os.path.join(package_directory, "data", "corrected_mni_coordinates.csv"), header=0, index_col=0)
    return list(frame.ix[well_ids].itertuples(index=False))

if __name__ == '__main__':
    probes_dict = get_probes_from_genes("SLC6A2")
    expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
    print get_mni_coordinates_from_wells(well_ids)
Whoa, first things first: Python ain't Java, so do yourself a favor and use a nice """xxx\nyyy""" triple-quoted string for multiline text:
api_query = """?criteria=model::Probe
,rma::criteria,[probe_type$eq'DNA']
...
"""
or something like that. You will get the whitespace exactly as typed, so you may need to adjust.
If, as suggested, you opt to loop over calls to your function by reading gene names from a file, you will need to either try/except your data-not-found exception or handle missing data without raising an exception at all. I would opt for returning an empty result myself and letting the caller worry about what to do with it.
If you do opt for raising an exception, create your own rather than using a generic Exception. That way your code can catch your expected exception first.
class MyNoDataFoundException(Exception):
    pass

#replace your current raise code with...
if not d:
    raise MyNoDataFoundException("your message here")
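If you would rather avoid exceptions entirely (the empty-result option mentioned above), change get_probes_from_genes to return the dict even when it is empty and let the caller skip it. A sketch of the caller side under that assumption:
if __name__ == '__main__':
    with open(r"/tmp/genes.txt", "r") as f:
        for line in f:
            gene_name = line.strip()
            probes_dict = get_probes_from_genes(gene_name)
            if not probes_dict:
                # nothing found for this gene; move on to the next one
                print "no probes found for %s" % gene_name
                continue
            expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
            print get_mni_coordinates_from_wells(well_ids)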
clarification about catching exceptions, using the accepted answer as a starting point:
if __name__ == '__main__':
    with open(r"/tmp/genes.txt","r") as f:
        for line in f.readlines():
            #keep track of your input data
            search_data = line.strip()
            try:
                probes_dict = get_probes_from_genes(search_data)
            except MyNoDataFoundException, e:
                #and do whatever you feel you need to do here...
                print "bummer about search_data:%s:\nexception:%s" % (search_data, e)
                #skip to the next gene so the lines below don't run without a probes_dict
                continue
            expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
            print get_mni_coordinates_from_wells(well_ids)
You may want to create a file with gene names, then read the contents of the file and call your function in a loop. Here is an example:
if __name__ == '__main__':
    with open(r"/tmp/genes.txt","r") as f:
        for line in f.readlines():
            probes_dict = get_probes_from_genes(line.strip())
            expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
            print get_mni_coordinates_from_wells(well_ids)

Python refresh file from disk

I have a python script that calls a system program and reads the output from a file out.txt, acts on that output, and loops. However, it doesn't work, and a close investigation showed that the python script just opens out.txt once and then keeps on reading from that old copy. How can I make the python script reread the file on each iteration? I saw a similar question here on SO but it was about a python script running alongside a program, not calling it, and the solution doesn't work. I tried closing the file before looping back but it didn't do anything.
EDIT:
I already tried closing and opening, it didn't work. Here's the code:
import subprocess, os, sys

filename = sys.argv[1]
file = open(filename,'r')
foo = open('foo','w')
foo.write(file.read().rstrip())
foo = open('foo','a')
crap = open(os.devnull,'wb')
numSolutions = 0
while True:
    subprocess.call(["minisat", "foo", "out"], stdout=crap,stderr=crap)
    out = open('out','r')
    if out.readline().rstrip() == "SAT":
        numSolutions += 1
        clause = out.readline().rstrip()
        clause = clause.split(" ")
        print clause
        clause = map(int,clause)
        clause = map(lambda x: -x,clause)
        output = ' '.join(map(lambda x: str(x),clause))
        print output
        foo.write('\n'+output)
        out.close()
    else:
        break
print "There are ", numSolutions, " solutions."
You need to flush foo so that the external program can see its latest changes. When you write to a file, the data is buffered in the local process and sent to the system in larger blocks. This is done because updating the system file is relatively expensive. In your case, you need to force a flush of the data so that minisat can see it.
foo.write('\n'+output)
foo.flush()
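Alternatively (this is not part of the original suggestion), you could open foo unbuffered so every write goes straight to disk without an explicit flush; in Python 2 the third argument to open controls buffering:
foo = open('foo', 'a', 0)   # buffering=0 means unbuffered writes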
I rewrote it to hopefully be a bit easier to understand:
import os
from shutil import copyfile
import subprocess
import sys

TEMP_CNF = "tmp.in"
TEMP_SOL = "tmp.out"
NULL = open(os.devnull, "wb")

def all_solutions(cnf_fname):
    """
    Given a file containing a set of constraints,
    generate all possible solutions.
    """
    # make a copy of original input file
    copyfile(cnf_fname, TEMP_CNF)
    while True:
        # run minisat to solve the constraint problem
        subprocess.call(["minisat", TEMP_CNF, TEMP_SOL], stdout=NULL, stderr=NULL)
        # look at the result
        with open(TEMP_SOL) as result:
            line = next(result)
            if line.startswith("SAT"):
                # Success - return solution
                line = next(result)
                solution = [int(i) for i in line.split()]
                yield solution
            else:
                # Failure - no more solutions possible
                break
        # disqualify found solution
        with open(TEMP_CNF, "a") as constraints:
            new_constraint = " ".join(str(-i) for i in solution)
            constraints.write("\n")
            constraints.write(new_constraint)

def main(cnf_fname):
    """
    Given a file containing a set of constraints,
    count the possible solutions.
    """
    count = sum(1 for i in all_solutions(cnf_fname))
    print("There are {} solutions.".format(count))

if __name__=="__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("Usage: {} cnf.in".format(sys.argv[0]))
Open your file inside the loop and end each iteration with file_var.close(), so you get a fresh read of the file every time:
for ... :
    ga_file = open('out.txt', 'r')
    ... do stuff
    ga_file.close()
Demo of an implementation below (as simple as possible, this is all of the Jython code needed)...
__author__ = ''
import time

var = 'false'
while var == 'false':
    out = open('out.txt', 'r')
    content = out.read()
    time.sleep(3)
    print content
    out.close()
generates this output:
2015-01-09, 'stuff added'
2015-01-09, 'stuff added' # <-- this is when i just saved my update
2015-01-10, 'stuff added again :)' # <-- my new output from file reads
I strongly recommend reading the error messages. They hold quite a lot of information.
I think the full file name should be written for debug purposes.

Why do I keep getting this error in map-reduce while using mincemeat?

I just want to calculate word count from some 7500 files with some condition on which words to count. The program goes like this.
import glob
import mincemeat

text_files = glob.glob('../fldr/2/*')

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

source = dict((file_name, file_contents(file_name))
              for file_name in text_files)

def mapfn(key, value):
    for line in value.splitlines():
        list2 = [ ]
        for temp in line.split("::::"):
            list2.append(temp)
        if (list2[0] == '5'):
            for review in list2[1].split():
                yield [review.lower(),1]

def reducefn(key, value):
    return key, len(value)

s = mincemeat.Server()
s.datasource = source
s.mapfn = mapfn
s.reducefn = reducefn
results = s.run_server(password="wola")
print results
The error I get while running this program is
error: uncaptured python exception, closing channel <__main__.Client connected at 0x250f990>
(<type 'exceptions.IndexError'>:list index out of range
[C:\Python27\lib\asyncore.py|read|83]
[C:\Python27\lib\asyncore.py|handle_read_event|444]
[C:\Python27\lib\asynchat.py|handle_read|140]
[mincemeat.py|found_terminator|96]
[mincemeat.py|process_command|194]
[mincemeat.py|call_mapfn|170]
[projminc2.py|mapfn|21])
Take a look at what's in list2 e.g. by doing
print(list2)
or with a debugger. If you do this you'll see that list2 only has one element so list2[1] isn't valid.
(You probably don't really want to split on "::::"; if that delimiter never appears in a line, split returns the whole line as a single element, so list2[1] doesn't exist.)
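For example, a defensive mapfn (a sketch; the rest of the mincemeat setup stays as in the question) could skip lines that don't contain the delimiter at all:
def mapfn(key, value):
    for line in value.splitlines():
        parts = line.split("::::")
        if len(parts) < 2:
            # no '::::' delimiter on this line, so there is no rating/review pair
            continue
        if parts[0] == '5':
            for review in parts[1].split():
                yield review.lower(), 1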

Python Dictionary Throwing KeyError for Some Reason

In some code I index a dictionary (TankDict) with a string taken from a list. This throws a KeyError no matter what letter I put in. When I copied and pasted the dictionary out of the context of the program and passed in the same letters from a list, they came back correctly. I have also run type(TankDict) and it comes back as 'dict'.
Here is the dictionary:
TankDict = {'E':0, 'F':1, 'G':2, 'H':3, 'I':4, 'J':5,
'K':6, 'L':7, 'M':8, 'N':9,
'O':10, 'P':11, 'Q':12, 'R':13, 'S':14, 'T':15,
'U':16, 'V':17, 'W':18, 'X':19}
The error:
channelData[1] = tank_address_dict[channelData[1]]
KeyError: 'L'
(tank_address_dict is a function argument into which TankDict is passed)
the contents of channelData: ['447', 'L', '15', 'C']
Can anyone tell me the (probably simple) reason that this happens?
EDIT: Code!
This is the function where the error is:
def getTankID(channel,tank_address_dict,PTM_dict,channel_ref):
    rawChannelData = 'NA'
    for line in channel_ref:
        if str(channel) in line: rawChannelData = line
    if(rawChannelData == 'NA'): return -1;
    channelData = rawChannelData.split(' ')
    channelData.extend(['',''])
    channelData[1] = channelData[1][:-1]
    channelData[3] = channelData[1][-1]
    channelData[1] = channelData[1][:-1]
    channelData[2] = channelData[1][1:]
    channelData[1] = channelData[1][:1]
    print channelData #debug
    print 'L' in tank_address_dict
    print 'E' in tank_address_dict
    print 'O' in tank_address_dict
    print 'U' in tank_address_dict
    print type(tank_address_dict)
    channelData[1] = tank_address_dict[channelData[1]]
    channelData[3] = PTM_dict[channelData[3]]
    return(channelData[1:])
This is the function that calls it:
def runFile(model, datafile, time_scale, max_PEs, tank_address_dict, PMT_dict, channel_ref):
    #add initSerial for ser0-4
    while(True):
        raw_data = datafile.readline() #intake data
        if(raw_data == ''): break #End while loop if the file is done
        data = raw_data.split(' ') #break up the parts of each line
        del data[::2] #delete the human formatting
        data[2] = data[2][:-1] #rm newline (NOTE: file must contain blank line at end!)
        TankID = getTankID(data[0], tank_address_dict, PMT_dict,channel_ref)
        if(TankID == -1):
            print '!---Invalid channel number passed by datafile---!'; break #check for valid TankID
        model[TankID[0]][TankID[1]][TankID[2]] = scale(data[2],(0,max_PEs),(0,4096))
        createPackets(model)
        #updateModel(ser0,ser1,ser2,ser3,ser4,packet)
        data[2] = data[2]*time_scale #scale time
        time.sleep(data[2]) #wait until the next event
        print data #debug
    if(TankID != -1): print '---File',datafile,'finished---' #report errors in file run
    else: print '!---File',datafile,'finished with error---!'
And this is the code that calls that:
import hawc_func
import debug_options
#begin defs
model = hawc_func.createDataStruct() #create the data structure
TankDict = hawc_func.createTankDict() #tank grid coordinate conversion table
PTMDict = hawc_func.createPMTDict() #PMT conversion table
log1 = open('Logs/log1.txt','w') #open a logfile
data = open('Data/event.txt','r') #open data
channel_ref = open('aux_files/channel_map.dat','r')
time_scale = 1 #0-1 number to scale nano seconds? to seconds
#end defs
hawc_func.runFile(model,data,4000,TankDict,PTMDict,time_scale,channel_ref)
#hawc_func.runFile(model,data,TankDict,PTMDict)
#close files
log1.close()
data.close()
#end close files
print '-----Done-----' #confirm tasks finished
tank_address_dict is created through this function, run by the 3rd block of code, then passed on through the other two:
def createTankDict():
    TankDict = {'E':0, 'F':1, 'G':2, 'H':3, 'I':4, 'J':5,
                'K':6, 'L': 7, 'M':8, 'N':9,
                'O':10, 'P':11, 'Q':12, 'R':13, 'S':14, 'T':15,
                'U':16, 'V': 17, 'W':18, 'X':19}
    return TankDict
You are not passing your arguments correctly.
def runFile(model, datafile, time_scale, max_PEs, tank_address_dict, PMT_dict, channel_ref):
hawc_func.runFile(model,data,4000,TankDict,PTMDict,time_scale,channel_ref)
Here, you have max_PEs = TankDict, and the following positions are mismatched too: tank_address_dict ends up bound to PTMDict, which is why the lookup for 'L' raises a KeyError.
That may not be your only problem. Fix that first, and if you are still having problems, update your post with your fixed code and then tell us what your new error is.
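A sketch of the call with the arguments in the order the signature expects (assuming 4000 was meant to be max_PEs and time_scale stays at 1):
hawc_func.runFile(model, data, time_scale, 4000, TankDict, PTMDict, channel_ref)
Passing them as keyword arguments (e.g. max_PEs=4000, tank_address_dict=TankDict) would make this kind of ordering mistake much harder to reintroduce.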

Categories

Resources