Searching for text in Python without the re module

I'm writing a function in Python that uses urllib2 to download a status page and loops until a condition is met.
The status page looks like the one below:
reportId:327686
reportName:report2
status:Running
percent_done:0
I need to:
Parse reportId and create a variable with this value.
Loop until status is different from "Running".
Can I accomplish this without using the re module? In the end I will need to convert this to an exe using PyInstaller, so I want to avoid loading a lot of modules to keep the program small.

This should do it:
import urllib2

def parse_data(raw_data):  # Name this better
    parsed_data = dict(line.split(':') for line in raw_data.splitlines())
    parsed_data['reportId'] = int(parsed_data['reportId'])
    parsed_data['percent_done'] = int(parsed_data['percent_done'])
    return parsed_data

def get_parsed_data_from_url(url):  # Name this better
    raw_data = urllib2.urlopen(url).read()
    parsed_data = parse_data(raw_data)
    return parsed_data

parsed_data = get_parsed_data_from_url('http://example.com')

# And to loop until status != 'Running', you could do this:
while get_parsed_data_from_url('http://example.com')['status'] == 'Running':
    do_some_stuff()
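As a quick check of what the parsing helper produces, here is the same parse_data run on the sample status page from the question (a sketch with no network access involved):

```python
def parse_data(raw_data):
    # Split each "key:value" line into a dict entry
    parsed_data = dict(line.split(':') for line in raw_data.splitlines())
    parsed_data['reportId'] = int(parsed_data['reportId'])
    parsed_data['percent_done'] = int(parsed_data['percent_done'])
    return parsed_data

sample = "reportId:327686\nreportName:report2\nstatus:Running\npercent_done:0"
data = parse_data(sample)
print(data)  # dict with reportId as int, status as str, etc.
```

Note that `line.split(':')` would break if a value ever contained a colon; `line.split(':', 1)` is the safer variant.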

If that is your whole result and you don't have more HTML, as That1Guy commented, you can probably use startswith and endswith. Something along the lines of (I've skipped a lot of checking and default values here!):
if line.startswith("reportId:"):
    report_id = line.split(":")[1]
if line.startswith("status:"):
    if not line.endswith("Running"):
        # abort processing
        pass

Since you have stable patterns to match, it's pretty simple:
reportIds = []
# not written here: load report into variable s
for line in s.splitlines():
    if 'reportId' in line:
        reportIds.append(line.split(':')[1])
    if 'status' in line:
        if line.split(':')[1] != 'Running':
            break
# not written here: pause for some period of time
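Putting the pieces together, the full poll loop the question asks for could be sketched like this. fetch_status and the canned pages below are hypothetical stand-ins for the urllib2 download, so the loop terminates deterministically:

```python
import time

def parse_status(raw_data):
    # Turn "key:value" lines into a dict
    return dict(line.split(':', 1) for line in raw_data.splitlines())

def poll_until_done(fetch_status, delay=0):
    # fetch_status() returns the raw status page as a string
    data = parse_status(fetch_status())
    report_id = int(data['reportId'])   # keep reportId as a variable
    while data['status'] == 'Running':
        time.sleep(delay)               # pause between polls
        data = parse_status(fetch_status())
    return report_id, data['status']

# Canned responses standing in for successive downloads
pages = iter([
    "reportId:327686\nstatus:Running\npercent_done:0",
    "reportId:327686\nstatus:Running\npercent_done:50",
    "reportId:327686\nstatus:Done\npercent_done:100",
])
print(poll_until_done(lambda: next(pages)))  # (327686, 'Done')
```

In the real program, fetch_status would be a small wrapper around urllib2.urlopen(url).read().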

Related

Why is "ObjectId('5efbe85b4aeb5d21e56fa81f')" not considered a valid ObjectId?

I am using PyMongo and I am trying to loop through an entire collection and display the ObjectId on my Flask web page. However, when I write my method I keep getting the error "ObjectId('5efbe85b4aeb5d21e56fa81f')" is not a valid ObjectId.
The following is the code I am running
def get_class_names(self):
    temp = list()
    print("1")
    for document_ in db.classes.find():
        tempstr = document_.get("_id")
        tempobjectid = ObjectId(tempstr)
        temp.append(repr(tempobjectid))
    print("2")
    classes = list()
    for class_ in temp:
        classes.append(class_, Classes.get_by_id(class_).name)
    return classes
How do I fix this?
Note: get_by_id just takes in an ObjectId and finds it in the database.
The line
tempstr = document_.get("_id")
retrieves an ObjectId already. You then wrap it again in another ObjectId before calling repr on that. If you print(type(tempstr)), you'll see that it's an ObjectId.
Just do temp.append(tempstr).
BTW, you should rename the variable tempstr to tempId or something more appropriate.

How can I remove a value from a specific key in YAML with Python?

I have successfully worked out how to add a value to a key in a YAML file with Python, and have started on the reverse, using the addition code as a reference. Here is how the code should work. Starting from:
connected_guilds:
- 1
- 2
after the code is run, the YAML file should be changed to:
connected_guilds:
- 1
Here is my code. However, it didn't work: it wiped out the file, leaving only the - 1 entry from the first YAML example I enclosed.
with open('guilds.yaml', 'r+') as guild_remove:
    loader = yaml.safe_load(guild_remove)
    content = loader['connected_guilds']
    for server in content:
        if server != guild_id:
            continue
        else:
            content.remove(guild_id)
    guild_remove.seek(0)
    yaml.dump(content, guild_remove)
    guild_remove.truncate()
I'd be grateful if anyone could help me out :D
Don't try to reimplement searching for the item to remove when Python already provides this to you:
with open('guilds.yaml', 'r+') as guild_remove:
    content = yaml.safe_load(guild_remove)
    content["connected_guilds"].remove(guild_id)
    guild_remove.seek(0)
    yaml.dump(content, guild_remove)
    guild_remove.truncate()
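The read, then seek(0), dump, truncate rewrite pattern can be demonstrated without PyYAML installed; this sketch uses the stdlib json module as a stand-in for yaml (the file-handling steps are identical, only the load/dump calls differ):

```python
import json
import os
import tempfile

def remove_guild(path, guild_id):
    # Read, modify, then rewrite the same file in place
    with open(path, 'r+') as f:
        content = json.load(f)      # yaml.safe_load(f) with PyYAML
        content['connected_guilds'].remove(guild_id)
        f.seek(0)                   # rewind before rewriting
        json.dump(content, f)       # yaml.dump(content, f) with PyYAML
        f.truncate()                # drop any leftover bytes from the old contents

path = os.path.join(tempfile.mkdtemp(), 'guilds.json')
with open(path, 'w') as f:
    json.dump({'connected_guilds': [1, 2]}, f)
remove_guild(path, 2)
with open(path) as f:
    print(json.load(f))  # {'connected_guilds': [1]}
```

Without truncate(), a shorter rewrite would leave trailing bytes from the original file, which is another way such files get corrupted.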
Here is the solution (with reference to the addition code):
with open('guilds.yaml', 'r+') as guild_remove:
    loader = yaml.safe_load(guild_remove)
    content = loader['connected_guilds']
    for server in content:
        if server != guild_id:
            continue
        else:
            content.remove(guild_id)
    guild_remove.seek(0)
    yaml.dump({'connected_guilds': content}, guild_remove)
    guild_remove.truncate()

Repeat a command in a while loop until a result is found that does not already exist

I need to repeat a command that always generates a new random string, and then make sure that this specific string has not been generated before. I have never used while loops before and I'm not able to figure out how to repeat the command until a result is found that is not already part of the database. I can't be too specific, as this source is closed.
All of this is packed into a Celery task.
tasks.py
@app.task
def allocate_new_string(user_pk):
    user = User.objects.get(pk=user_pk)
    new_string = subprocess.Popen("$get_new_string_cmd", shell=True,
        stdout=subprocess.PIPE).communicate()[0].decode('utf-8').strip()
    try:
        while True:
            if Used_String.objects.filter(string=new_string, atom=0).exists():
                new_string  # << how to repeat the command from above here until a new random string has been found?
            else:
                used_string = Used_String.objects.create(user=user, string=new_string, atom=0)
                used_string.save()
                logger.info(str("New String has been set"))
    except:
        logger.info(str("Something went wrong while processing the task"))
The behavior I want is: keep searching until a string is found that has never been generated before, or at least is not part of the database.
The command I'm using isn't OpenSSL or anything like that, and it's quite likely that I hit the same randomly generated string twice.
Thanks in advance.
Slight change.
@app.task
def allocate_new_string(user_pk):
    user = User.objects.get(pk=user_pk)
    try:
        Found = True
        while Found:
            new_string = subprocess.Popen("$get_new_string_cmd", shell=True,
                stdout=subprocess.PIPE).communicate()[0].decode('utf-8').strip()
            if Used_String.objects.filter(string=new_string, atom=0).exists():
                Found = True
            else:
                used_string = Used_String.objects.create(user=user, string=new_string, atom=0)
                used_string.save()
                logger.info(str("New String has been set"))
                Found = False
    except:
        logger.info(str("Something went wrong while processing the task"))
I have not tested it, but it should work. Please try it out and let me know.
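The Found flag can also be dropped in favor of a plain while True loop that returns on the first unused string. This is a sketch with stand-ins: generate replaces the subprocess call and string_exists replaces the Used_String.objects.filter(...).exists() query:

```python
def allocate_unique_string(generate, string_exists):
    # generate() stands in for the subprocess call that produces a random string;
    # string_exists(s) stands in for the database existence check
    while True:
        new_string = generate()
        if not string_exists(new_string):
            return new_string  # first string not already in the database

# Demo with a stand-in "database" and a generator that eventually
# produces something new
db = {'aaa', 'bbb'}
candidates = iter(['aaa', 'bbb', 'ccc'])
print(allocate_unique_string(lambda: next(candidates), lambda s: s in db))  # ccc
```

In the real task, the returned string would then be saved with Used_String.objects.create(...) outside the loop.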

How to skip one part of a single loop iteration in Python

I am creating about 200 variables within a single iteration of a Python loop (extracting fields from Excel documents and pushing them to a SQL database) and I am trying to figure something out.
Let's say that a single iteration is a single Excel workbook that I am looping through in a directory. I am extracting around 200 fields from each workbook.
If one of the fields I extract (let's say field #56 out of 200) isn't in the proper format (say a date was filled out wrong, e.g. 9/31/2015, which isn't a real date), the operation I am performing errors out.
I want the loop to skip that variable and proceed to creating variable #57. I don't want the loop to jump to the next iteration or workbook entirely; I just want it to ignore the error on that variable and continue with the rest of the variables for that single loop iteration.
How would I go about doing something like this?
In this sample code I would like to continue extracting "PolicyState" even if ExpirationDate has an error.
Some sample code:
import datetime as dt
import os
import xlrd as rd

files = os.listdir(path)
for file in files:  # Loop through all files in path directory
    filename = os.fsdecode(file)
    if filename.startswith('~'):
        continue
    elif filename.endswith(('.xlsx', '.xlsm')):
        try:
            book = rd.open_workbook(os.path.join(path, file))
        except KeyError:
            print("Error opening file for " + file)
            continue
        SoldModelInfo = book.sheet_by_name("SoldModelInfo")
        AccountName = str(SoldModelInfo.cell(1, 5).value)
        ExpirationDate = dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1, 7).value), '%Y-%m-%d')
        PolicyState = str(SoldModelInfo.cell(1, 6).value)
        print("Insert data of " + file + " was successful")
    else:
        continue
Use multiple try blocks. Wrap each decode operation that might go wrong in its own try block to catch the exception, do something, and carry on with the next one.
try:
    book = rd.open_workbook(os.path.join(path, file))
except KeyError:
    print("Error opening file for " + file)
    continue

errors = []
SoldModelInfo = book.sheet_by_name("SoldModelInfo")
AccountName = str(SoldModelInfo.cell(1, 5).value)
try:
    ExpirationDate = dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1, 7).value), '%Y-%m-%d')
except WhateverError as e:
    # do something, maybe set a default date?
    ExpirationDate = default_date
    # and/or record that it went wrong?
    errors.append(["ExpirationDate", e])
PolicyState = str(SoldModelInfo.cell(1, 6).value)
...
# at the end
if not errors:
    print("Insert data of " + file + " was successful")
else:
    # things went wrong somewhere above.
    # the contents of errors will let you work out what
    pass
As suggested, you could use multiple try blocks on each of your extracted variables, or you could streamline it with your own custom function that handles the try for you:
from functools import reduce

def try_funcs(cell, default, funcs):
    try:
        return reduce(lambda val, func: func(val), funcs, cell)
    except Exception as e:
        # do something with your Exception if necessary, like logging.
        return default

# Usage:
AccountName = try_funcs(SoldModelInfo.cell(1, 5).value, "some default str value", [str])
ExpirationDate = try_funcs(SoldModelInfo.cell(1, 7).value, "some default date",
                           [xldate_to_datetime, lambda d: d.strftime('%Y-%m-%d')])
PolicyState = try_funcs(SoldModelInfo.cell(1, 6).value, "some default str value", [str])
Here we use reduce to apply a chain of functions in turn; note that funcs is always a list, even for a single conversion.
This can help your code look tidy without cluttering it up with lots of try blocks. But the better, more explicit way is to just handle the fields you anticipate might error out individually.
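A minimal runnable demonstration of the helper, using int in place of the spreadsheet conversions and plain values in place of cells (all names here are illustrative only):

```python
from functools import reduce

def try_funcs(cell, default, funcs):
    # Apply each function in funcs to the value in turn;
    # fall back to default if any step raises
    try:
        return reduce(lambda val, func: func(val), funcs, cell)
    except Exception:
        return default

print(try_funcs('42', -1, [int]))                  # 42
print(try_funcs('oops', -1, [int]))                # -1 (int('oops') raises ValueError)
print(try_funcs('3', -1, [int, lambda x: x * 2]))  # 6
```

The second call shows the whole point: a bad value silently becomes the default instead of aborting the loop iteration.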
So, basically you need to wrap your xldate_to_datetime() call in try ... except:
import datetime as dt

v = SoldModelInfo.cell(1, 7).value
try:
    d = dt.datetime.strftime(xldate_to_datetime(v), '%Y-%m-%d')
except TypeError as e:
    print('Could not parse "{}": {}'.format(v, e))

try: and except: error

So I am working on the code below. It compiled alright when my Reff.txt has more than one line, but it doesn't work when my Reff.txt file has only one line. Why is that? I am also wondering why my code never runs the "try" portion but always runs only the "except" part.
I have a reference file which has a list of ids (one id per line).
I use the reference file (Reff.txt) to search through the database from the website and the database from the server within my network.
The result I should get is an output file with the information for each reference id.
However, this code doesn't do anything in my "try:" portion at all.
import sys
import urllib2
from lxml import etree
import os

getReference = open('Reff.txt', 'r')  # open the file that contains the list of reference ids
global tID
for tID in getReference:
    tID = tID.strip()
    try:
        with open(tID + '.txt') as f:
            pass
        fileInput = open(tID + '.txt', 'r')
        readAA = fileInput.read()
        store_value = readAA.partition('\n')
        aaSequence = store_value[2].replace('\n', '')  # concatenate lines
        makeList = list(aaSequence)
        inRange = ''
        fileAddress = '/database/int/data/' + tID + '.txt'
        filename = open(fileAddress, 'r')  # name of the working file
        print fileAddress
        with open(fileAddress, 'rb') as f:
            root = etree.parse(f)
            for lcn in root.xpath("/protein/match[@dbname='PFAM']/lcn"):  # find dbname = PFAM
                start = int(lcn.get("start"))  # if it is PFAM then look for the start value
                end = int(lcn.get("end"))  # if it is PFAM then also look for the end value
                while start <= end:
                    inRange = makeList[start]
                    start += 1
                print outputFile.write(inRange)
                outputFile.close()
                break
            break
        break
    except IOError as e:
        newURL = 'http://www.uniprot.org/uniprot/' + tID + '.fasta'
        print newURL
        response = urllib2.urlopen(newURL)  # go to the website and grab the information
        creatNew = open(uniprotID + '.txt', 'w')
        html = response.read()  # read file
        creatNew.write(html)
        creatNew.close()
So, when you use try/except: if the try block fails, the except block runs. Your except is always running because your try is always failing.
The most likely reason for this is that you have "print outputFile.write(inRange)", but you have not previously defined outputFile.
ETA: Also, it looks like you are only interested in the first pass of the for loop, since you break at that point. Your other breaks are extraneous in that case, because they will never be reached while that one is there.
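To see why the try body can appear to never run: if an early statement raises, control jumps straight to except and nothing after the raising line executes. A minimal sketch with a deliberately missing file:

```python
def process(path):
    try:
        with open(path) as f:   # raises IOError/OSError if path is missing
            pass
        return "try branch finished"
    except IOError:
        return "except branch ran"

print(process("does_not_exist_12345.txt"))  # except branch ran
```

In the question's code, the very first statement in try opens tID + '.txt'; if that file doesn't exist, the IOError sends every iteration straight to the except branch, so the rest of the try body is never reached.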
