File pointer - sending and reception in a function - python

Python 2.7
Download a test file from: www.py4inf.com/code/mbox.txt
Briefly, I need to list all lines that start with 'From' and take only the mail address, going through the file line by line.
If the condition is true, write only the mail address (the result) to another file. I wrote the code and it works, but it would be better if I used functions. I got stuck when I tried to pass the parameters: I have a problem with a function that receives one parameter and returns one or two values.
The result is that ALL lines of the input file are copied line by line into the output file, almost like a recursion. Nothing is filtered, and the file is very big.
Finally, do you know any page to read about functions, parameters, parameter passing, pass by reference, and so on? Asking is easy, but I prefer to read and try to understand, and if I have a problem, light a candle in the middle of the night!
#li is the input parameter: a line from fileRead (the file object).
#if the condition is true select all lines that start with From:
def funFormat(li):
    if li.startswith('From:'):
        li = li.rstrip('')
        li = li.replace('From: ',' \n')
    return li
fileRead=open('mbox_small.txt','r')
for eachline in fileRead:
    result=funFormat(eachline)
    fileWrite =open('Resul73.txt', 'a')
    fileWrite.write( result )
fileRead.close()
fileWrite.close()

You're opening the output file every time you need to write, and closing it only once at the end. Your function also returns the line whether or not it starts with 'From:', so every line ends up in the output. Try this and let me know if it works:
#li is the input parameter: a line from fileRead (the file object).
#if the condition is true select all lines that start with From:
def funFormat(li):
    if li.startswith('From:'):
        li = li.rstrip('')
        li = li.replace('From: ',' \n')
        return li
    else:
        return None
fileRead = open('mbox_small.txt', 'r')
fileWrite = open('Resul73.txt', 'a')
for eachline in fileRead:
    result=funFormat (eachline)
    if result:
        fileWrite.write (result)
fileRead.close()
fileWrite.close()
Also, I suggest you read up on with blocks, which close a file for you automatically even if an error occurs partway through. As for functions, there are plenty of resources online.
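For illustration, here is a minimal sketch of the same task written with a with block. The names extract_address and copy_addresses are hypothetical, and the code assumes the address is everything after the 'From: ' prefix on the line.

```python
def extract_address(line):
    """Return the address from a 'From: ' header line, or None otherwise."""
    prefix = "From: "
    if line.startswith(prefix):
        # Drop the prefix and the trailing newline
        return line[len(prefix):].strip()
    return None


def copy_addresses(src_path, dst_path):
    """Write one address per line to dst_path; both files close automatically."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            address = extract_address(line)
            if address:
                dst.write(address + "\n")
```

Calling copy_addresses('mbox_small.txt', 'Resul73.txt') would then replace the explicit open and close calls. Note that this sketch opens the output in "w" mode, so it overwrites the file instead of appending to it.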

Related

Python try finally statement to run another file

I am having an issue getting my try and finally statement to execute properly. I am trying to get another Python file to execute once a user has interacted with the first program. For example, once the first program is run, the user is asked to scan their tag, which creates a unique user id for that user; after the tag is scanned, a second Python file should be executed. My problem is that the second file runs as soon as the first file is executed, regardless of whether the tag has been scanned or not. I have added my code below with comments to help explain better. Any thoughts?
import os
import string
from random import choice, randint
import RPi.GPIO as GPIO
from mfrc522 import SimpleMFRC522
# Second file being run
import medform
reader = SimpleMFRC522()
try:
    # user id being created
    c = string.ascii_letters + string.digits
    op = "".join(choice(c) for x in range(randint(8,16)))
    # Printed before tag is scanned
    print("Please Scan tag " )
    reader.write(op + op)
    # if tag is scanned / id created open second file
    if reader.write(op + op):
        os.system('python medform.py')
    else:
        print("Scan Tag First" )
    # Print after tag is scanned
    print("Scan Complete")
finally:
    GPIO.cleanup()
Importing a file runs it. There are two ways to do what you want:
Import the file only at the point where you want it to run.
Define a main function in the other file that you can call from the first one, instead of having all the code at the top level.
The second option is the best one in most cases, as you normally would not want a file to actually do stuff on import.
So in the second file you would have:
def main():
    # the code you want to run (make sure to indent it)
    pass
Then in the first you can have:
import medform
# ...
medform.main()
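A common companion to the main function approach is the `if __name__ == "__main__":` guard, which lets the second file still be run directly while doing nothing on import. The sketch below is only an illustration: the body of main is a hypothetical placeholder, not the real contents of medform.py.

```python
def main():
    """Hypothetical stand-in for whatever medform.py should do when started."""
    return "form opened"


if __name__ == "__main__":
    # True only when this file is executed directly (python medform.py),
    # not when another script does `import medform`.
    print(main())
```

With this layout, `import medform` in the first script has no visible effect until `medform.main()` is called after the tag has been scanned.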

How to check for and discard invalid multi-line JSON log requests in log files?

I'm writing a script to parse some of our requests, and I need to be able to handle malformed or incomplete requests. For example, a typical request comes in with the following format:
log-prefix: {JSON request data}\n
all on a single line, etc...
Then I found out that they have a character buffer limit of 1024 in their writer, so the requests could be spread across many lines, like so:
log-prefix: {First line of data
log-prefix: Second line of requests data
log-prefix: Final line of log data}\n
I'm able to handle this by just calling next on the iterator I'm using, and then removing the prefix, concatenating the requests, and then passing it to json.loads to return my dictionary that I need for writing to a file.
I'm doing that in the following way:
lines = (line.strip('\n') for line in inf.readlines())
for line in lines:
    if not line.endswith('}'):
        bad_lines = [line]
        while not line.endswith('}'):
            line = next(lines)
            bad_lines.append(line)
        form_line = malformed_data_handler(bad_lines)
    else:
        form_line = parse_out_json(line)
And my functions used in the above code are:
def malformed_data_handler(lines: Sequence) -> dict:
    """
    Takes n malformed lines of bridge log data (where the JSON response has
    been split across n lines, all containing prefixes) and correctly
    delegates the parsing to parse_out_json before returning the concatenated
    result as a dictionary.
    :param lines: An iterable with malformed lines as the elements
    :return: A dictionary ready for writing.
    """
    logger.debug('Handling malformed data.')
    parsed = ''
    logger.debug(lines)
    print(lines)
    for line in lines:
        logger.info('{}'.format(line))
        parsed += parse_out_malformed(line)
    logger.debug(parsed)
    return json.loads(parsed, encoding='utf8')
def parse_out_json(line: str) -> dict:
    """
    Parses out the JSON response returned from the Apache Bridge logs. Takes a
    line and removes the prefix, returning a dictionary.
    :param line:
    :return:
    """
    data = slice(line.find('{'), None)
    return json.loads(line[data], encoding='utf8')
def parse_out_malformed(line: str) -> str:
    prefix = 'bridge-rails: '
    data = slice(line.find(prefix), None)
    parsed = line[data].replace(prefix, '')
    return parsed
So now to my problem, I've now found instances where the log data can look like this:
log-prefix: {First line of data
....
log-prefix: Last line of data (No closing brace)
log-prefix: {New request}
My first thought to handle this was to add some sort of check for '{' in the line. But since I'm using a generator for scalability to process the lines, I don't know that I have hit one of these requests until I have already called next and pulled the line out of the generator. At that point I can't push it back, and I'm not sure how to efficiently tell my process to start from that line and continue normally.
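One possible way around the push-back problem is to avoid calling next at all and keep a buffer of the current request inside a single loop. The sketch below is an illustration under stated assumptions, not a tested solution for the real logs: the prefix is taken to be the 'log-prefix: ' from the examples, a new request is assumed to be the only kind of line whose payload starts with '{', and iter_requests is a hypothetical name.

```python
import json

PREFIX = "log-prefix: "  # assumed prefix, copied from the examples above


def iter_requests(lines):
    """Yield one dict per complete request; silently drop incomplete ones."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Keep only what follows the prefix, if the prefix is present
        _, found, rest = line.partition(PREFIX)
        payload = rest if found else line
        if payload.startswith("{"):
            # A new request begins: whatever is buffered never closed, drop it
            buffer = []
        buffer.append(payload)
        if payload.endswith("}"):
            try:
                parsed = json.loads("".join(buffer))
            except ValueError:
                # Not valid JSON yet (e.g. a nested brace), keep buffering
                continue
            yield parsed
            buffer = []
```

Because the line that starts the next request is examined in the same loop iteration that discards the broken one, nothing has to be put back into the generator.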

Loading lines with Selenium

In Python 2.7, using Selenium, I want the script to open the 'masters.txt' file. Then, go to Twitter (while logged in beforehand), and post the line like so:
driver.get("https://twitter.com")
elem = driver.find_element_by_name('tweet')
elem.send_keys(line1 + " #worked")
elem = driver.find_element_by_xpath('//*[@id="timeline"]/div[2]/div/form/div[2]/div[2]/button')
elem.send_keys(Keys.RETURN)
then, the same, except:
elem.send_keys(line2 + " #worked")
then, the same, except:
elem.send_keys(line3 + " #worked")
etc...
So, load every single line & tweet it + "with another text I've added". How is that possible?
Thanks in advance! :)
EDIT: Example of 'master.txt' (text file) contents:
test
blablabla
awesome
This is an awesome Tweet
something
another line
etc...
What I've tried:
f = open("masters.txt")
lines = f.readlines()
for line in lines:
    elem.send_keys(line + " #worked")
However that doesn't exactly work and messes it up etc... If someone could complete my code and write me what I'm looking for, that would be great! :)
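The failure is not described in detail, so the following is a guess: each line returned by readlines() still ends with '\n', and sending that newline before " #worked" may be what breaks the tweet. A minimal sketch that prepares the texts first, with the hypothetical helper name build_tweets:

```python
def build_tweets(lines, suffix=" #worked"):
    """Return one tweet text per non-empty line, without trailing newlines."""
    tweets = []
    for line in lines:
        text = line.strip()  # removes the '\n' left by readlines()
        if text:
            tweets.append(text + suffix)
    return tweets
```

Each element of build_tweets(open("masters.txt")) could then be passed to elem.send_keys inside the loop, after locating the tweet box again for every post.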

How to remove one of the two duplicate blocks in a file?

I have a difficult problem. I know there are many 're' masters in Python out there, so please help me. I have a huge log file. The format is something like this:
[text hello world yadda
lines lines lines
exceptions]
[something i'm not interested in]
[text hello world yadda
lines lines lines
exceptions]
And so on...
So blocks 1 and 3 are the same, and there are multiple cases like this. My question is: how can I read this file and write only the unique blocks to an output file? If there's a duplicate, it should be written only once. Sometimes there are multiple blocks in between two duplicate blocks. I'm actually pattern matching, and this is the code as of now. It only matches the pattern but doesn't do anything about duplicates.
import re
import sys
from itertools import islice
try:
    if len(sys.argv) != 3:
        sys.exit("You should enter 3 parameters.")
    elif sys.argv[1] == sys.argv[2]:
        sys.exit("The two file names cannot be the same.")
    else:
        file = open(sys.argv[1], "r")
        file1 = open(sys.argv[2],"w")
        java_regex = re.compile(r'[java|javax|org|com]+?[\.|:]+?', re.I) # java
        at_regex = re.compile(r'at\s', re.I) # at
        copy = False # flag that control to copy or to not copy to output
        for line in file:
            if re.search(java_regex, line) and not (re.search(r'at\s', line, re.I) or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', line, re.I)):
                # start copying if "java" is in the input
                copy = True
            else:
                if copy and not re.search(at_regex, line):
                    # stop copying if "at" is not in the input
                    copy = False
            if copy:
                file1.write(line)
        file.close()
        file1.close()
except IOError:
    sys.exit("IO error or wrong file name.")
except IndexError:
    sys.exit('\nYou must enter 3 parameters.') #prevents less than 3 inputs which is mandatory
except SystemExit as e: #Exception handles sys.exit()
    sys.exit(e)
The duplicate removal doesn't have to be part of this code. It can be in a separate .py file as well; it doesn't matter.
This is the original snippet of the log file:
javax.xml.ws.soap.SOAPFaultException: Uncaught BPEL fault http://schemas.xmlsoap.org/soap/envelope/:Server
at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.createSystemException(MethodMarshallerUtils.java:1326) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.demarshalFaultResponse(MethodMarshallerUtils.java:1052) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.marshaller.impl.alt.DocLitBareMethodMarshaller.demarshalFaultResponse(DocLitBareMethodMarshaller.java:415) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.getFaultResponse(JAXWSProxyHandler.java:597) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.createResponse(JAXWSProxyHandler.java:537) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invokeSEIMethod(JAXWSProxyHandler.java:403) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invoke(JAXWSProxyHandler.java:188) ~[org.apache.axis2.jar:na]
com.hcentive.utils.exception.HCRuntimeException: Unable to Find User Profile:null
at com.hcentive.agent.service.AgentServiceImpl.getAgentByUserProfile(AgentServiceImpl.java:275) ~[agent-service-core-4.0.0.jar:na]
at com.hcentive.agent.service.AgentServiceImpl$$FastClassByCGLIB$$e3caddab.invoke(<generated>) ~[cglib-2.2.jar:na]
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:191) ~[cglib-2.2.jar:na]
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110) ~[spring-tx-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:64) ~[spring-security-core-3.1.2.RELEASE.jar:3.1.2.RELEASE]
javax.xml.ws.soap.SOAPFaultException: Uncaught BPEL fault http://schemas.xmlsoap.org/soap/envelope/:Server
at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.createSystemException(MethodMarshallerUtils.java:1326) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.demarshalFaultResponse(MethodMarshallerUtils.java:1052) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.marshaller.impl.alt.DocLitBareMethodMarshaller.demarshalFaultResponse(DocLitBareMethodMarshaller.java:415) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.getFaultResponse(JAXWSProxyHandler.java:597) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.createResponse(JAXWSProxyHandler.java:537) ~[org.apache.axis2.jar:na]
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invokeSEIMethod(JAXWSProxyHandler.java:403) ~[org.apache.axis2.jar:na]
And so on and on....
You can remove duplicate blocks with this:
import re
yourstr = r'''
[text hello world yadda
lines lines lines
exceptions]
[something i'm not interested in]
[text hello world yadda
lines lines lines
exceptions]
'''
pat = re.compile(r'\[([^]]+])(?=.*\[\1)', re.DOTALL)
result = pat.sub('', yourstr)
Note that only the last occurrence of each block is preserved. If you want to keep the first one, you must reverse the string and use this pattern:
(][^[]+)\[(?=.*\1\[)
and then reverse the string again.
You could use a hashing algorithm, like those in hashlib, and a dictionary that looks like this: {123456789: True}
The value is not important, but a dict makes lookups significantly faster than a list if it's a big file.
Hash each block as you come across it and store the hash in the dictionary if it is not already there. If it is already in the dictionary, ignore the block. That assumes duplicate blocks are absolutely identical.
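The hashing idea can be sketched roughly as follows. This is an illustration with assumptions: blocks are taken to be the bracketed [ ... ] groups from the simplified example (the real log has no brackets, so the splitting step would differ), a set stands in for the dict since only the keys matter, and split_blocks and unique_blocks are hypothetical names.

```python
import hashlib
import re


def split_blocks(text):
    """Return the bracketed blocks of text, in order of appearance."""
    # [^\]]* also matches newlines, so multi-line blocks are captured whole
    return re.findall(r"\[[^\]]*\]", text)


def unique_blocks(blocks):
    """Yield each distinct block once, keeping the first occurrence."""
    seen = set()
    for block in blocks:
        digest = hashlib.sha1(block.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield block
```

Unlike the regex approach, this keeps the first copy of each block and only holds one short digest per distinct block in memory, as long as the blocks themselves can be produced one at a time.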

try: and exception: error

So I am working on the code below. It works all right when my Reff.txt has more than one line, but it doesn't work when Reff.txt has only one line. Why is that? I am also wondering why my code never runs the "try" portion and always runs only the "except" part.
I have a reference file which has a list of ids (one id per line).
I use the reference file (Reff.txt) to search through the database from the website and the database from the server within my network.
The result I should get is an output file and a file with the information for each reference id.
However, this code doesn't do anything in my "try:" portion at all.
import sys
import urllib2
from lxml import etree
import os
getReference = open('Reff.txt','r') #open the file that contains list of reference ids
global tID
for tID in getReference:
    tID = tID.strip()
    try:
        with open(''+tID.strip()+'.txt') as f: pass
        fileInput = open(''+tID+'.txt','r')
        readAA = fileInput.read()
        store_value = (readAA.partition('\n'))
        aaSequence = store_value[2].replace('\n', '') #concatenate lines
        makeList = list(aaSequence)#print makeList
        inRange = ''
        fileAddress = '/database/int/data/'+tID+'.txt'
        filename = open(fileAddress,'r')#name of the working file
        print fileAddress
        with open(fileAddress,'rb') as f:
            root = etree.parse(f)
            for lcn in root.xpath("/protein/match[@dbname='PFAM']/lcn"):#find dbname =PFAM
                start = int(lcn.get("start"))#if it is PFAM then look for start value
                end = int(lcn.get("end"))#if it is PFAM then also look for end value
                while start <= end:
                    inRange = makeList[start]
                    start += 1
                    print outputFile.write(inRange)
                    outputFile.close()
                    break
                break
            break
    except IOError as e:
        newURL ='http://www.uniprot.org/uniprot/'+tID+'.fasta'
        print newURL
        response = urllib2.urlopen(''+newURL) #go to the website and grab the information
        creatNew = open(''+uniprotID+'.txt','w')
        html = response.read() #read file
        creatNew.write(html)
        creatNew.close()
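So, when you do try/except, the except block runs only if the try block raises a matching exception. Your except block is always running because the try block is always failing with an IOError, most likely from one of the open() calls on a file that does not exist.
Another problem is that you have this - "print outputFile.write(inRange)", but you have not previously defined outputFile. That raises a NameError, which "except IOError" does not catch, so you need to open the output file before writing to it.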
ETA: Also, it looks like you are only interested in testing to the first pass of the for loop? You break at that point. Your other breaks are extraneous in that case, because they will never be reached while that one is there.
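To make the try/except behaviour concrete, here is a small sketch showing that an `except IOError` clause handles a missing file but lets a NameError propagate. All function names and the file name are hypothetical, and the example is written for Python 3, where IOError is an alias of OSError.

```python
def run_guarded(action):
    """Return the action's result, or 'fallback' if it raises IOError."""
    try:
        return action()
    except IOError:
        return "fallback"


def missing_file():
    # Assumed not to exist, so open() raises an IOError subclass
    with open("no_such_file_for_this_demo.txt") as handle:
        return handle.read()


def undefined_name():
    # outputFile is never defined anywhere, so this raises NameError
    return outputFile
```

Here run_guarded(missing_file) returns "fallback", while run_guarded(undefined_name) raises NameError to the caller because no except clause matches it.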
