Parse line of JSON using Python

I have this method:
lines = rec.split("\n")
rec = ''
size = len(lines)
i=0
for line in lines:
    try:
        self.on_data(json.load(line))
    except:
        logging.warning ('warning, could not parse line:', line)
        if i == size - 1:
            # if it is the last element, we can keep it, since it might not be complete
            rec+=line
    finally:
        i += 1
I am getting this error:
Message: 'warning, could not parse line:'
Arguments: ('{"readersCount":0,"uuid":"17f5fe87-5140-4f34-ac32-d325beb6b2a1","key":"bar","lockRequestCount":0,"type":"lock","acquired":true}',)
it looks like I need to read this first element of a tuple or something? the JSON looks ok?

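As stated in the comments already, you need self.on_data(json.loads(line)): json.loads parses a string, while json.load expects a file-like object. The "Message: ... Arguments: ..." output comes from logging itself, because the message has no %s placeholder for the extra argument, so add one:
lines = rec.split("\n")
rec = ''
size = len(lines)
i=0
for line in lines:
    try:
        self.on_data(json.loads(line))
    except:
        # logging fills the %s placeholder with the extra argument
        logging.warning('warning, could not parse line: %s', line)
        if i == size - 1:
            # if it is the last element, we can keep it, since it might not be complete
            rec+=line
    finally:
        i += 1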
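
To see the same idea in isolation, here is a minimal sketch. The helper name parse_lines and the sample input are made up for illustration, and unlike the code above it only logs lines that are not the trailing (possibly incomplete) one.

```python
import json
import logging


def parse_lines(rec):
    """Parse complete JSON lines from rec and return (objects, leftover).

    Hypothetical helper: the last line is kept as leftover when it does
    not parse, since it may be an incomplete record.
    """
    objects = []
    leftover = ''
    lines = rec.split("\n")
    for i, line in enumerate(lines):
        if not line:
            continue  # skip empty lines, e.g. after a trailing newline
        try:
            objects.append(json.loads(line))  # loads() takes a str
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            if i == len(lines) - 1:
                leftover = line  # possibly incomplete, keep it for later
            else:
                logging.warning('could not parse line: %s', line)
    return objects, leftover


# Made-up input: two complete records and one truncated record
objects, leftover = parse_lines('{"key": "bar"}\n{"acquired": true}\n{"type": "lo')
print(objects)
print(leftover)
```

The leftover string can be prepended to the next chunk of received data before calling the helper again.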

Related

Extract IP addresses from text file without using REGEX

I am trying to extract IPv4 addresses from a text file and save them as a list to a new file. However, I cannot use regex to parse the file; instead, I have to check the characters individually. I'm not really sure where to start with that, since everything I find seems to have import re as the first line.
So far this is what I have,
#Opens and prints wireShark txt file
fileObject = open("wireShark.txt", "r")
data = fileObject.read()
print(data)
#Save IP adresses to new file
with open('wireShark.txt') as fin, open('IPAdressess.txt', 'wt') as fout:
    list(fout.write(line) for line in fin if line.rstrip())
#Opens and prints IPAdressess txt file
fileObject = open("IPAdressess.txt", "r")
data = fileObject.read()
print(data)
#Close Files
fin.close()
fout.close()
So I open the file, and I have created the file that I will put the extracted IPs in. I just don't know how to pull them out without using regex.
Thanks for the help.
Here is a possible solution.
The function find_first_digit positions the file index at the next digit in the text, if any, and returns True. Otherwise it returns False.
The functions get_num and get_dot read a number or a dot, leave the index just after it, and return it as a str. If either function fails to read what it expects, it raises a MissMatch exception.
The main loop finds a digit, saves the index, and then tries to read an IP address.
On success, it writes the address to the output file.
If any of the called functions raises MissMatch, it sets the current index to the saved index plus one and starts over.
class MissMatch(Exception):
    pass

INPUT_FILE_NAME = 'text'
OUTPUT_FILE_NAME = 'ip_list'

def find_first_digit():
    while True:
        c = input_file.read(1)
        if not c:  # EOF found!
            return False
        elif c.isdigit():
            input_file.seek(input_file.tell() - 1)
            return True

def get_num():
    num = input_file.read(1)  # 1st digit
    if not num.isdigit():
        raise MissMatch
    if num != '0':
        for i in range(2):  # 2nd and 3rd digits
            c = input_file.read(1)
            if c.isdigit():
                num += c
            else:
                input_file.seek(input_file.tell() - 1)
                break
    return num

def get_dot():
    if input_file.read(1) == '.':
        return '.'
    else:
        raise MissMatch

with open(INPUT_FILE_NAME) as input_file, open(OUTPUT_FILE_NAME, 'w') as output_file:
    while True:
        ip = ''
        if not find_first_digit():
            break
        saved_position = input_file.tell()
        try:
            ip = get_num() + get_dot() \
                + get_num() + get_dot() \
                + get_num() + get_dot() \
                + get_num()
        except MissMatch:
            input_file.seek(saved_position + 1)
        else:
            output_file.write(ip + '\n')
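
Note that get_num above only limits each number to three digits, so a value such as 999 would pass. As a variation, here is a rough sketch of the same character-by-character idea that works on a string already read into memory and also checks the 0 to 255 range. The function names and the sample text are made up for illustration, and this is only one possible way to do it.

```python
DIGITS = "0123456789"


def read_octet(text, pos):
    """Return (digits, next_pos) for a valid octet at pos, else None."""
    end = pos
    while end < len(text) and end - pos < 3 and text[end] in DIGITS:
        end += 1
    digits = text[pos:end]
    if not digits or (len(digits) > 1 and digits[0] == "0") or int(digits) > 255:
        return None  # empty, leading zero, or out of range
    return digits, end


def read_address(text, pos):
    """Return (address, next_pos) if four dotted octets start at pos."""
    octets = []
    for index in range(4):
        if index > 0:
            if pos >= len(text) or text[pos] != ".":
                return None
            pos += 1  # step over the dot
        result = read_octet(text, pos)
        if result is None:
            return None
        digits, pos = result
        octets.append(digits)
    if pos < len(text) and text[pos] in DIGITS:
        return None  # a longer digit run follows, e.g. 1.2.3.2555
    return ".".join(octets), pos


def extract_ipv4(text):
    """Scan text one character at a time and collect IPv4 addresses."""
    found = []
    pos = 0
    while pos < len(text):
        starts_number = text[pos] in DIGITS and (pos == 0 or text[pos - 1] not in DIGITS)
        match = read_address(text, pos) if starts_number else None
        if match is None:
            pos += 1
        else:
            address, pos = match
            found.append(address)
    return found


# Made-up sample text
sample = "src 192.168.0.1 dst 10.0.0.256 via 8.8.8.8:53 ver 999.1.1.1"
print(extract_ipv4(sample))
```

Writing the result to a file is then a matter of joining the list with newlines.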

How to use better 'try' and 'except' in my code?

I have the code below and I need to make better use of try and except and improve the code in general, for example by working out which parts I should move.
this is the beginning of my code:
contador = 0
name = input("Put the name of the file:")
while name != "close":
    validation=0
    try:
        file = open(name,"r",1,"utf-8")
        validation = validation + 1
    except FileNotFoundError:
        validation = validation
    if validation >= 1:
        Games=[]
        countrylist = []
        lines = 0
        File = open(name,"r")
        line = File.readline().strip()
        while line != "":
            parts= line.split(";")
            country=parts[0]
            game= parts[1]
            sales= int(parts[2])
            price= float(parts[3])
            format= parts[4]
            Games.append(parts)
            countrylist.append(country)
            line = File.readline().strip()
            lines = lines + 1
        contador = contador + 1
I don't know exactly what the code is meant to do, so I had to work out the file structure from the code itself. Correct me if I'm wrong, but I believe the file holds one entry per line, with each entry's fields separated by ";".
You do nothing with the data yet. In any case, just breaking the file into a list of fields per line and returning that list of lists would be enough for one function; you could do the further separation later.
To check that the code does what I want, I added a print at the end to show the result.
This is the code I ended up with. I tried to explain most of the issues in comments (probably a bad idea, and I shall be berated for this till the end of ages).
# Why is there a global counter?
# contador = 0
name = None  # you need to define name before the loop
# check whether the name is empty instead of comparing against an arbitrary name
while name != "":
    name = input("Put the name of the file:")
    # Ask for the name inside the loop so that the loop runs until the
    # name is "" (nothing). Otherwise, unless you break in the except
    # block, it loops forever, since the name never changes inside the loop.
    try:
        File = open(file=name,encoding="utf-8").read()
        # keyword arguments let you skip the parameters you don't want to set
        Games=[]
        countrylist = []
        # lines = 0
        lst = File.strip().split("\n")  # break the whole text into lines
        for line in lst:  # iterate over the list of lines
            # separate it into a list of fields
            parts= line.strip().split(";")  # make each line into a list that you can index
            # parts[0] -> country
            countrylist.append(parts[0])  # here you can append directly instead of saving extra variables
            # same as the previous example
            Games.append(parts[1])
            sales= int(parts[2])
            price= float(parts[3].replace(",","."))
            style = parts[4]  # format is already a built-in function, so you shouldn't use it as a variable name
            # line = File.readline().strip() -> you don't need to prepare the next line since all lines are
            # already in the list lst
            # lines += 1
            # contador += 1
            # you don't need to count the lines, let the language do that for you
            # and why do you need a counter in the first place?
            # you were using no for loops or doing any logic based around the number of lines
            # the only logic you were doing is based on their
            print(parts)
    except FileNotFoundError as e0:
        print("File not found: " + str(e0))
    except ValueError as e1:
        print("Value Error: " + str(e1))
For a text file with the format:
Portugal;Soccer;1000;12.5;dd/mm/yyyy
England;Cricket;2000;13,5;mm/dd/yyyy
Spain;Ruggby;1500;11;yyyy/dd/mm
I got an output in the form of:
['Portugal', 'Soccer', '1000', '12.5', 'dd/mm/yyyy']
['England', 'Cricket', '2000', '13,5', 'mm/dd/yyyy']
['Spain', 'Ruggby', '1500', '11', 'yyyy/dd/mm']
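
One more thought on the try/except part of the question: the smaller the try block, the easier it is to tell what failed. Below is a minimal sketch that wraps only the per-line conversion, so one bad line does not abort the whole file. The function names are hypothetical, and the third sample line is deliberately broken (made up) to show the error path.

```python
def parse_sales_line(line):
    """Split one 'country;game;sales;price;format' line into typed fields."""
    country, game, sales, price, style = line.strip().split(";")
    # accept both 12.5 and 12,5 as the price
    return country, game, int(sales), float(price.replace(",", ".")), style


def load_sales(lines):
    """Return (records, bad_lines); bad_lines holds (line_number, message)."""
    records, bad_lines = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(parse_sales_line(line))
        except ValueError as error:  # wrong field count or a bad number
            bad_lines.append((number, str(error)))
    return records, bad_lines


# Made-up sample; the last line has a non-numeric sales field
sample = [
    "Portugal;Soccer;1000;12.5;dd/mm/yyyy",
    "England;Cricket;2000;13,5;mm/dd/yyyy",
    "Spain;Ruggby;abc;11;yyyy/dd/mm",
]
records, bad_lines = load_sales(sample)
print(records)
print(bad_lines)
```

Opening the file can then live in its own small try/except FileNotFoundError block, separate from the parsing.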

Line count of decoded binary content from json response

I have some code below that counts the number of lines in decoded GitHub binary content and then looks for percent change based on the changes count of a file. This is contained in a loop within a loop in an if/else statement. What I have now works, but it outputs the results of each individual file in the pull request. I would like to write the if/else just once if any of the results in the set of returned files meet the condition in the if statement (else print no file has changed) and then move on to the next set for evaluation.
found = False
for data in repo.pull_request(prs.number).files():
    if data.filename.endswith((".png",".jpeg",".gif")):
        pass
    else:
        for files_content in [repo.blob(data.sha)]:
            binary_coded_content = io.BytesIO((base64.b64decode(files_content.content)))
            tempfile = 'temp'
            with open(tempfile,'wb') as f:
                f.write(binary_coded_content.read())
            num_lines = sum(1 for line in open(tempfile, encoding='utf8') if line.rstrip())
            if data.changes_count/num_lines > 0.25:
                found = True
                break
if found:
    print("A file has changed by more than 25%", '\n')
else:
    print("No file has changed by more than 25%", '\n')
If I understand correctly, you want to create a found variable and test it outside the loop, like this:
found = False # <----
for data in repo.pull_request(prs.number).files():
    if data.filename.endswith((".png",".jpeg",".gif")):
        pass
    else:
        for files_content in [repo.blob(data.sha)]:
            binary_coded_content = io.BytesIO((base64.b64decode(files_content.content)))
            tempfile = 'temp'
            with open(tempfile,'wb') as f:
                f.write(binary_coded_content.read())
            num_lines = sum(1 for line in open(tempfile, encoding='utf8') if line.rstrip())
            if data.changes_count/num_lines > 0.25:
                found = True # <----
                break #<--- break inner loop
    if found:
        break #<--- break outer loop
if found: # after the loop ended, we check if we found something...
    print("A file has changed by more than 25%")
else:
    print("No file has changed by more than 25%")
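
If the flag and the two breaks feel clumsy, the same check can be written with any(), which stops at the first match. The sketch below is only an illustration under assumptions: it uses made-up (changes_count, base64 content) pairs instead of the real pull request objects, the helper names are hypothetical, and it counts lines from the decoded text directly rather than through a temp file.

```python
import base64


def count_nonblank_lines(b64_content):
    """Count non-blank lines in base64-encoded UTF-8 text."""
    text = base64.b64decode(b64_content).decode("utf8")
    return sum(1 for line in text.splitlines() if line.rstrip())


def changed_ratio(changes_count, b64_content):
    """Return changes per non-blank line, or 0.0 for an empty file."""
    num_lines = count_nonblank_lines(b64_content)
    return changes_count / num_lines if num_lines else 0.0


# Made-up stand-ins for (data.changes_count, files_content.content)
files = [
    (1, base64.b64encode(b"a\nb\nc\nd\ne\n")),
    (3, base64.b64encode(b"a\n\nb\nc\nd\n")),
]

# any() short-circuits, so later files are not decoded after a hit
found = any(changed_ratio(changes, content) > 0.25 for changes, content in files)
if found:
    print("A file has changed by more than 25%")
else:
    print("No file has changed by more than 25%")
```

Guarding against zero lines also avoids a ZeroDivisionError for empty files.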

Python XML Parsing With Minidom Using Exception Handling

I am in the process of stripping a couple million XMLs of sensitive data. How can I add a try and except to get around this error, which seems to have occurred because of a couple of malformed XMLs out of the bunch?
xml.parsers.expat.ExpatError: mismatched tag: line 1, column 28691
#!/usr/bin/python
import sys
from xml.dom import minidom

def getCleanString(word):
    str = ""
    dummy = 0
    for character in word:
        try:
            character = character.encode('utf-8')
            str = str + character
        except:
            dummy += 1
    return str

def parsedelete(content):
    dom = minidom.parseString(content)
    for element in dom.getElementsByTagName('RI_RI51_ChPtIncAcctNumber'):
        parentNode = element.parentNode
        parentNode.removeChild(element)
    return dom.toxml()

for line in sys.stdin:
    if line > 1:
        line = line.strip()
        line = line.split(',', 2)
        if len(line) > 2:
            partition = line[0]
            id = line[1]
            xml = line[2]
            xml = getCleanString(xml)
            xml = parsedelete(xml)
            strng = '%s\t%s\t%s' %(partition, id, xml)
            sys.stdout.write(strng + '\n')
Catching exceptions is straightforward. Import ExpatError from xml.parsers.expat and wrap the problem code in a try/except handler. The "except ... as e" form works in Python 2.6+ and Python 3.
from xml.parsers.expat import ExpatError

def parsedelete(content):
    try:
        dom = minidom.parseString(content)
    except ExpatError as e:
        # not sure how you want to handle the error... so just passing back as string
        return str(e)
    for element in dom.getElementsByTagName('RI_RI51_ChPtIncAcctNumber'):
        parentNode = element.parentNode
        parentNode.removeChild(element)
    return dom.toxml()
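
As a self-contained illustration of the same pattern, here is a small Python 3 sketch. The function name remove_elements, the tag name acct, and both sample documents are made up; it returns None for malformed input so that the caller can decide whether to skip or log the record.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def remove_elements(content, tag_name):
    """Return content without tag_name elements, or None if malformed."""
    try:
        dom = minidom.parseString(content)
    except ExpatError:
        return None  # let the caller decide how to handle a bad record
    for element in dom.getElementsByTagName(tag_name):
        element.parentNode.removeChild(element)
    # serialize the root element only, without the XML declaration
    return dom.documentElement.toxml()


# Made-up documents: one well-formed, one with a mismatched tag
good = "<rec><acct>123</acct><name>x</name></rec>"
bad = "<rec><acct>123</rec>"
print(remove_elements(good, "acct"))
print(remove_elements(bad, "acct"))
```

Returning None instead of the error text keeps error messages from being written out in place of the cleaned XML.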

Optimizing python file search?

I'm having some trouble optimizing this part of my code.
It works, but seems unnecessarily slow.
The function searches for searchString in a file, starting at line line_nr, and returns the line number of the first hit.
import linecache

def searchStr(fileName, searchString, line_nr, linesInFile):
    # The above string is the input to this function
    # line_nr is needed to search after certain lines.
    # linesInFile is total number of lines in the file.
    while line_nr < linesInFile + 1:
        line = linecache.getline(fileName, line_nr)
        has_match = line.find(searchString)
        if has_match >= 0:
            return line_nr
            break
        line_nr += 1
I've tried something along these lines, but never managed to implement the "start on a certain line number"-input.
Edit: The usecase. I'm post processing analysis files containing text and numbers that are split into different sections with headers. The headers on line_nr are used to break out chunks of the data for further processing.
Example of call:
startOnLine = searchStr(fileName, 'Header 1', 1, 10000000)
endOnLine = searchStr(fileName, 'Header 2', startOnLine, 10000000)
Why don't you start with the simplest possible implementation?
def search_file(filename, target, start_at = 0):
    with open(filename) as infile:
        for line_no, line in enumerate(infile):
            if line_no < start_at:
                continue
            if line.find(target) >= 0:
                return line_no
    return None
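
Note that the function above counts lines from 0, while the question uses 1-based line numbers. A possible variation, sketched here with itertools.islice to skip the leading lines, keeps the 1-based numbering. The name search_from and the temporary demo file are made up for illustration.

```python
import os
import tempfile
from itertools import islice


def search_from(filename, target, start_line=1):
    """Return the 1-based number of the first line at or after
    start_line that contains target, or None if there is none."""
    with open(filename) as infile:
        remaining = islice(infile, start_line - 1, None)  # skip earlier lines
        for line_no, line in enumerate(remaining, start=start_line):
            if target in line:
                return line_no
    return None


# Demo on a small made-up file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Header 1\n1 2 3\nHeader 2\n4 5 6\n")
start = search_from(tmp.name, "Header 1")
end = search_from(tmp.name, "Header 2", start)
missing = search_from(tmp.name, "Header 3")
os.remove(tmp.name)
print(start, end, missing)
```

Each call still reads the file from the top, so for many searches over a very large file it may be worth making a single pass and recording all header positions at once.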
I guess your file is like:
Header1 data11 data12 data13..
name1 value1 value2 value3...
...
...
Header2 data21 data22 data23..
nameN valueN1 valueN2 valueN3..
...
Does the 'Header' string have any constant format (e.g. do all headers start with '#' or something similar)? If so, you can read each line directly, check whether it matches this format (e.g. if line[0] == '#'), and write different code for the different kinds of lines (definition lines and data lines in your example).
Record class:
class Record:
    def __init__(self):
        self.data={}
        self.header={}
    def set_header(self, line):
        ...
    def add_data(self, line):
        ...
iterate part:
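def parse(p_file):
    record = None
    for line in p_file:
        if line[0] == "#":
            # a header line closes the previous record and starts a new one
            if record:
                yield record
            record = Record()
            record.set_header(line)
        else:
            record.add_data(line)
    if record:
        yield record  # the last record has no header after it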
main func:
data_file = open(...)
for rec in parse(data_file):
    ...
