Store Python PEP8 module output

I am using the pep8 module of Python inside my code:

import pep8

pep8_checker = pep8.StyleGuide(format='pylint')
r = pep8_checker.check_files(paths=['./test.py'])
This is the output:
./test.py:6: [E265] block comment should start with '# '
./test.py:23: [E265] block comment should start with '# '
./test.py:24: [E302] expected 2 blank lines, found 1
./test.py:30: [W293] blank line contains whitespace
./test.py:35: [E501] line too long (116 > 79 characters)
./test.py:41: [E302] expected 2 blank lines, found 1
./test.py:53: [E501] line too long (111 > 79 characters)
./test.py:54: [E501] line too long (129 > 79 characters)
But this result is printed to the terminal, and the final value assigned to 'r' is 8 (i.e. the total number of errors).
I want to store these errors in a variable instead. How can I do this?
EDIT:
here is the test.py file: http://paste.fedoraproject.org/347406/59337502/raw/

There are at least two ways to do this. The simplest is to redirect sys.stdout to a text file, then read the file at your leisure:
import pep8
import sys

saved_stdout = sys.stdout
sys.stdout = open('pep8.out', 'w')

pep8_checker = pep8.StyleGuide(format='pylint')
r = pep8_checker.check_files(paths=['./test.py'])  # r is the total error count

sys.stdout.close()
sys.stdout = saved_stdout
# Now you can read "pep8.out" into a variable
Alternatively you can write to a variable using StringIO:
import pep8
import sys

# The StringIO module name changed between Python 2 and 3
if sys.version_info.major == 2:
    from StringIO import StringIO
else:
    from io import StringIO

saved_stdout = sys.stdout
sys.stdout = StringIO()

pep8_checker = pep8.StyleGuide(format='pylint')
r = pep8_checker.check_files(paths=['./test.py'])  # r is the total error count

testout = sys.stdout.getvalue()
sys.stdout.close()
sys.stdout = saved_stdout
# testout contains the output; you might wish to call testout.split("\n")
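On Python 3 only, the same capture can be written more compactly with contextlib.redirect_stdout, which restores sys.stdout automatically even if the checker raises. A sketch of the pattern (the print call stands in for the pep8 checker invocation, which needs the pep8 package installed):

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):
    # Anything printed inside this block is captured instead of
    # reaching the terminal, e.g.:
    # r = pep8_checker.check_files(paths=['./test.py'])
    print("./test.py:6: [E265] block comment should start with '# '")

testout = buf.getvalue()  # the captured report, one line per problem
```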


Capture the last timestamp, without reading the complete file using Python

I am fairly new to Python and am trying to capture the last line of a syslog file, but have been unable to do so. It is a huge log file, so I want to avoid loading the complete file into memory; I just want to read the last line and capture its timestamp for further analysis.
I have the code below, which captures all the timestamps into a Python list, but it takes a really long time to reach the last one. Once it completed, my plan was to reverse the list and take the first object at index[0].
The lastFile function uses the glob module and gives me the most recent log file name, which is fed into recentEdit in the main function.
Is there a better way of doing this?
Script1:
#!/usr/bin/python
import glob
import os
import re

def main():
    syslogDir = (r'Location/*')
    listOfFiles = glob.glob(syslogDir)
    recentEdit = lastFile(syslogDir)
    print(recentEdit)

    astack = []
    with open(recentEdit, "r") as f:
        for line in f:
            result = [re.findall(r'\d{4}.\d{2}.\d{2}T\d{2}.\d{2}.\d{2}.\d+.\d{2}.\d{2}', line)]
            print(result)

def lastFile(i):
    listOfFiles = glob.glob(i)
    latestFile = max(listOfFiles, key=os.path.getctime)
    return latestFile

if __name__ == '__main__':
    main()
Script2:
###############################################################################
###############################################################################
# The readline() gives me the first line of the log file, which is also not
# what I am looking for:
#!/usr/bin/python
import glob
import os
import re

def main():
    syslogDir = (r'Location/*')
    listOfFiles = glob.glob(syslogDir)
    recentEdit = lastFile(syslogDir)
    print(recentEdit)

    with open(recentEdit, "r") as f:
        fLastLine = f.readline()
    print(fLastLine)

    # astack = []
    # with open(recentEdit, "r") as f:
    #     for line in f:
    #         result = [re.findall(r'\d{4}.\d{2}.\d{2}T\d{2}.\d{2}.\d{2}.\d+.\d{2}.\d{2}', line)]
    #         print(result)

def lastFile(i):
    listOfFiles = glob.glob(i)
    latestFile = max(listOfFiles, key=os.path.getctime)
    return latestFile

if __name__ == '__main__':
    main()
I really appreciate your help!!
Sincerely.
If you want to go directly to the end of the file, follow these steps:
1. Every time your program runs, persist (store) the index of the last '\n'.
2. If you have a persisted index of the last '\n', you can seek directly to it using file.seek(yourpersistedindex).
3. After this, when you call file.readline() you will get the lines starting from yourpersistedindex.
4. Store this index every time you run your script.
For example, if your file log.txt has content like:
timestamp1 \n
timestamp2 \n
timestamp3 \n
import pickle

lastNewLineIndex = None
# Try to read the persisted index of the last '\n'
try:
    rfile = open('pickledfile', 'rb')
    lastNewLineIndex = pickle.load(rfile)
    rfile.close()
except (IOError, OSError):
    pass

logfile = open('log.txt', 'r')
newLastNewLineIndex = None
if lastNewLineIndex:
    # seek(index) moves the file pointer to that index
    logfile.seek(lastNewLineIndex)
    # readline() then reads the line starting from the index we passed to seek()
    lastLine = logfile.readline()
    print(lastLine)
    # tell() gives you the current index
    newLastNewLineIndex = logfile.tell()
else:
    counter = 0
    text = logfile.read()
    for c in text:
        if c == '\n':
            newLastNewLineIndex = counter
        counter += 1
logfile.close()

# Save the new lastNewLineIndex for the next run
wfile = open('pickledfile', 'wb')
pickle.dump(newLastNewLineIndex, wfile)
wfile.close()
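An alternative that needs no persisted index at all is to seek backwards from the end of the file and read only a small tail until a newline shows up. A sketch (not from the original answer; the helper name and block size are illustrative):

```python
import os

def read_last_line(path, block_size=1024):
    """Return the last non-empty line of a file without reading it all."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        data = b''
        # Pull fixed-size blocks from the end until we have seen enough
        # newlines to delimit the final line (or the whole file is read)
        while size > 0 and data.count(b'\n') < 2:
            step = min(block_size, size)
            size -= step
            f.seek(size)
            data = f.read(step) + data
        lines = [l for l in data.split(b'\n') if l]
        return lines[-1].decode() if lines else ''
```

For a multi-GB syslog this reads at most a couple of kilobytes, e.g. read_last_line('/var/log/syslog').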

Python 3.4: Trying to get full name results from nltk

I am a beginner in Python, and I am trying to collect the names from a txt file and put them inside another txt file using NLTK. The issue is that only the first names are returned, without the surnames. Is there anything I can do? Here's the code:
import nltk

# function start
def extract_entities(text):
    ind = len(text)-7
    sub = text[ind:]
    print(sub)
    output.write('\nPRODID=='+sub+'\n\n')
    for sent in nltk.sent_tokenize(text):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            if hasattr(chunk, 'label'):
                output.write(chunk.label()+':'+ ' '.join(c[0] for c in chunk.leaves())+'\n')
# function end

# main program
# -*- coding: utf-8 -*-
import sys
import codecs

sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
if sys.stdout.encoding != 'cp850':
    sys.stdout = codecs.getwriter('cp850')(sys.stdout.buffer, 'strict')
if sys.stderr.encoding != 'cp850':
    sys.stderr = codecs.getwriter('cp850')(sys.stderr.buffer, 'strict')

file = open('C:\Python34\Description.txt', 'r')
output = open('C:\Python34\out.txt', 'w')
for line in file:
    if not line:
        continue
    extract_entities(line)
file.close()
output.close()
Thanks in advance for your answers!

Open, read, then store lines into a list in Python

So I've seen this code:

with open(fname) as f:
    content = f.readlines()

in another question. I just need some confirmation on how it works.
If I were to have a file named normaltrack.py which contains:

wall 0 50 250 10
wall 0 -60 250 10
finish 200 -50 50 100

I should have a list called wall = [] and have the opening code as:

with open(normaltrack.py) as f:
    wall = f.readlines()

to open the file and store the lines that start with "wall" into the list?
Do I always have to change "fname" every time I want to open a different file? Or is there a way to do it from the interpreter, such as python3 assignment.py < normaltrack.py?
In your example:
with open(fname) as f:
    content = f.readlines()
'fname' is a variable reference to a string. This string is the file path (either relative or absolute).
To read your example file and generate a list of all lines that start with 'wall', you can do this:

fname = '/path/to/normaltrack-example.txt'  # an absolute file path in Linux/Unix/Mac

wall = []
with open(fname) as the_file:
    for line in the_file:
        if line.startswith('wall'):
            wall.append(line)  # or wall.append(line.rstrip()) to remove the line return character
In general, it's best to not call 'readlines()' on a file object unless you control the file (that is, it's not something the user provides). This is because readlines will read the entire file into memory, which sucks when the file is multiple GBs.
Here's a quick and dirty script that does what you want.
import sys

if len(sys.argv) > 1:
    infile = sys.argv[1]
else:
    print("Usage: {} <infile>".format(sys.argv[0]))
    sys.exit(1)

with open(infile, 'r') as f:
    walls = []
    for line in f:
        if line.startswith('wall'):
            walls.append(line.strip())
If you name this script 'read_walls.py', you can run it from the command line like this,
python read_walls.py normaltrack.py
Ordinarily, I'd use argparse to parse command-line arguments, and write a main() function for the code. (That makes it testable in the interactive python interpreter.)
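That argparse/main() structure might look roughly like this (a sketch, not part of the original answer; the helper name read_walls and the 'wall' prefix are carried over from above):

```python
import argparse

def read_walls(path):
    """Collect the lines of a track file that start with 'wall'."""
    walls = []
    with open(path) as f:
        for line in f:
            if line.startswith('wall'):
                walls.append(line.strip())
    return walls

def main(argv=None):
    # Passing argv explicitly makes main() callable from the interpreter
    # or from tests; with None, argparse falls back to sys.argv
    parser = argparse.ArgumentParser(description="List wall lines from a track file")
    parser.add_argument('infile', help="path to the track file")
    args = parser.parse_args(argv)
    for wall in read_walls(args.infile):
        print(wall)
```

A script would then end with the usual if __name__ == '__main__': main() guard, and from the interpreter you could call main(['normaltrack.py']) or read_walls('normaltrack.py') directly.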
This code (Python 2) should work for you:

#!/usr/bin/env python
import sys

def read_file(fname):
    with open(fname) as f:
        call = f.readlines()
    call = filter(lambda l: l.startswith('wall'), call)
    return call

if __name__ == '__main__':
    fname = sys.argv[1]
    call = read_file(fname)
    print call

Python 2.7 CSV writer issue

I have some Python code that lists pull requests in GitHub. If I print the parsed JSON output to the console, I get the expected results, but when I write the parsed JSON to a CSV file I don't get the same results: they are cut off after the sixth result (and that varies).
What I'm trying to do is overwrite the CSV each time with the latest output.
Also, I'm dealing with Unicode output, which I use unicodecsv for. I don't know if this is throwing the CSV output off.
I will list both versions of the relevant piece of code: with the print statement and with the CSV code.
Thanks for any help.
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
import csv
import unicodecsv

for pr in result:
    data = pr.as_dict()
    changes = (gh.repository('my-repo', repo).pull_request(data['number'])).as_dict()
    if changes['commits'] == 1 and changes['changed_files'] == 1:
        # keep print to console for testing purposes
        print "Login: " + changes['user']['login'] + '\n' + "Title: " + changes['title'] + '\n' + "Changed Files: " + str(changes['changed_files']) + '\n' + "Commits: " + str(changes['commits']) + '\n'
With csv:
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
import csv
import unicodecsv

for pr in result:
    data = pr.as_dict()
    changes = (gh.repository('my-repo', repo).pull_request(data['number'])).as_dict()
    if changes['commits'] == 1 and changes['changed_files'] == 1:
        with open('c:\pull.csv', 'r+') as f:
            csv_writer = unicodecsv.writer(f, encoding='utf-8')
            csv_writer.writerow(['Login', 'Title', 'Changed files', 'Commits'])
            for i in changes['user']['login'], changes['title'], str(changes['changed_files']), str(changes['commits']):
                csv_writer.writerow([changes['user']['login'], changes['title'], changes['changed_files'], changes['commits']])
The problem is with the way you write data to the file.
Every time you open the file in 'r+' mode, the file position starts at the beginning, so each pass through the loop overwrites the rows written before (and rewrites the header row as well).
And for dealing with JSON
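A sketch of the corrected pattern is to open the file once before the loop, write the header once, then write one row per matching pull request. It is shown here with Python 3's stdlib csv and illustrative stand-in data, since result, gh, and repo come from the question's surrounding code; with Python 2 and unicodecsv the same structure applies using mode 'wb':

```python
import csv

# Illustrative stand-in for the pull-request dicts built in the loop above
changes_list = [
    {'login': 'alice', 'title': 'Fix typo', 'changed_files': 1, 'commits': 1},
    {'login': 'bob', 'title': 'Update docs', 'changed_files': 1, 'commits': 1},
]

# Open once in write mode: the file is truncated a single time,
# the header is written a single time, then one row per result
with open('pull.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Login', 'Title', 'Changed files', 'Commits'])
    for changes in changes_list:
        writer.writerow([changes['login'], changes['title'],
                         changes['changed_files'], changes['commits']])
```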

How does the Python ConfigParser module translate the string "\r\n" to CRLF?

The testing code:
import ConfigParser

config = ConfigParser.ConfigParser()
config.read('test_config.ini')

orig_str = """line1\r\nline2"""
config_str = config.get('General', 'orig_str')
print orig_str == config_str
and the content of test_config.ini is:
[General]
orig_str = line1\r\nline2
What I want is the config_str can be the same value as the orig_str.
Any help would be appreciated!
If you want to write multiline configuration values, you just need to prepend a space at the beginning of the following lines. For example:

[General]
orig_str = line1
    line2

returns 'line1\nline2' on my OS (Linux), but a different OS might return the '\r' as well. Otherwise, it is easy to replace '\n' with '\r\n' as you need.
I'd say that whatever library you're using to encode the message should take care of the replacement. Anyway, if that's not the case, you can create your own ConfigParser as follows:
from ConfigParser import ConfigParser

class MyConfigParser(ConfigParser):
    def get(self, section, option):
        return ConfigParser.get(self, section, option).replace('\n', '\r\n')

config = MyConfigParser()
config.read('test_config.ini')  # assumes the multiline form of the .ini shown above

orig_str = """line1\r\nline2"""
config_str = config.get('General', 'orig_str')
print orig_str == config_str  # Prints True
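Note that with the original single-line test_config.ini, ConfigParser performs no escape processing, so the value comes back as the literal characters backslash-r-backslash-n rather than a real CRLF. One way to interpret such literal escape sequences (a sketch, not from the original answers; it assumes the value is ASCII-only) is the unicode_escape codec:

```python
import codecs

# What ConfigParser gives back for: orig_str = line1\r\nline2
raw_value = 'line1\\r\\nline2'

# Decode the literal backslash escapes into real control characters
decoded = codecs.decode(raw_value, 'unicode_escape')
```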
