selectively copying from an input file - python

My assignment calls for three modules: fileutility, choices, and selectiveFileCopy, the last of which imports the first two.
The purpose is to selectively copy lines of text from an input file and write them to an output file, with the selection determined by a "predicate" from the choices module: copy everything (choices.always), copy lines that contain a specific string (choices.contains(x)), or copy lines below a given length (choices.shorterThan(x)).
So far, only always() works. It must take one parameter, but my professor specifically stated that the parameter could be anything, even nothing(?). Is this possible? If so, how do I write my definition so that it works?
The second part of this very long question is why my other two predicates don't work. When I tested them with doctests (another part of the assignment), they all passed.
Here's some code:
fileutility (I've been told this function is meaningless, but it's part of the assignment, so...):
def safeOpen(prompt:str, openMode:str, errorMessage:str ):
    while True:
        try:
            return open(input(prompt),openMode)
        except IOError:
            return(errorMessage)
choices:
def always(x):
    """
    always(x) always returns True
    >>> always(2)
    True
    >>> always("hello")
    True
    >>> always(False)
    True
    >>> always(2.1)
    True
    """
    return True
def shorterThan(x:int):
    """
    shorterThan(x) returns True if the specified string
    is shorter than the specified integer, False if it is not
    >>> shorterThan(3)("sadasda")
    False
    >>> shorterThan(5)("abc")
    True
    """
    def string (y:str):
        return (len(y)<x)
    return string
def contains(pattern:str):
    """
    contains(pattern) returns True if the pattern specified is in the
    string specified, and False if it is not.
    >>> contains("really")("Do you really think so?")
    True
    >>> contains("5")("Five dogs lived in the park")
    False
    """
    def checker(line:str):
        return(pattern in line)
    return checker
selectiveFileCopy:
import fileutility
import choices
def selectivelyCopy(inputFile,outputFile,predicate):
    linesCopied = 0
    for line in inputFile:
        if predicate == True:
            outputFile.write(line)
            linesCopied+=1
    inputFile.close()
    return linesCopied
inputFile = fileutility.safeOpen("Input file name: ", "r", " Can't find that file")
outputFile = fileutility.safeOpen("Output file name: ", "w", " Can't create that file")
predicate = eval(input("Function to use as a predicate: "))
print("Lines copied =",selectivelyCopy(inputFile,outputFile,predicate))

So far, I have only the always() working, but it must take in one
parameter, but my professor specifically stated the parameter could be
anything, even nothing(?). Is this possible? If so, how do I write my
definition so that it works?
You can use a default argument:
def always(x=None): # x is None when no argument is given
    return True
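If "anything, even nothing" is read more broadly than a single optional parameter, a variant that accepts any number of arguments is another option. This is only a sketch of an alternative, not something the assignment text confirms; the *args/**kwargs signature is an assumption about what the professor would accept.

```python
def always(*args, **kwargs):
    """Return True no matter which arguments are passed, including none."""
    return True


print(always())            # called with nothing
print(always("hello"))     # called with one value, as a line predicate would be
```

Either form can be used as a predicate, because the copy loop only ever calls it with one argument (the current line).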
The second part of this very long question is why my other two
predicates don't work. When I tested them with docstests(another part
of the assignment), they all passed.
Your predicates do work, but they are functions that need to be called:
def selectivelyCopy(inputFile,outputFile,predicate):
    linesCopied = 0
    for line in inputFile:
        if predicate(line): # test each line with the predicate function
            outputFile.write(line)
            linesCopied+=1
    inputFile.close()
    return linesCopied
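To see the difference between comparing the predicate to True and calling it, here is a small self-contained sketch. It uses io.StringIO objects in place of real files, and the sample text is made up for illustration. The predicate definitions mirror the ones in the question, except that always takes a default argument and the input is not closed inside the function.

```python
import io


def always(x=None):
    return True


def shorterThan(x):
    def check(line):
        return len(line) < x
    return check


def contains(pattern):
    def check(line):
        return pattern in line
    return check


def selectivelyCopy(inputFile, outputFile, predicate):
    linesCopied = 0
    for line in inputFile:
        if predicate(line):  # call the predicate on each line
            outputFile.write(line)
            linesCopied += 1
    return linesCopied


# Hypothetical sample input standing in for the contents of a file.
text = "short\nDo you really think so?\nFive dogs lived in the park\n"

for predicate in (always, contains("really"), shorterThan(10)):
    output = io.StringIO()
    copied = selectivelyCopy(io.StringIO(text), output, predicate)
    print(copied, repr(output.getvalue()))
```

Note that each line still carries its trailing newline, so shorterThan counts that character too.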

Related

Finding pattern in binary file?

I have these two functions:
def make_regex_from_hex_sign(hex_sign):
    regex_hex_sign = re.compile(hex_sign.decode('hex'))
    return regex_hex_sign
def find_regex_pattern_and_return_its_offset(regex_pattern, bytes_array):
    if found_regex_pattern in regex_pattern.finditer(bytes_array):
        return found_regex_pattern.start()
    else:
        return 0
and I'm using them like this:
pattern = make_regex_from_hex_sign("634351535F")
file = open('somefile.bin', 'rb')
allbytes = file.read()
offset = find_regex_pattern_and_return_its_offset(pattern, allbytes)
Python throws: NameError: global name 'found_regex_pattern' is not defined
If I replace if with for in the line if found_regex_pattern in regex_pattern.finditer(bytes_array), it works, but then I need a break at the end to stop it from searching past the first match. Is there a more elegant way to solve this without using for and break?
You did not define found_regex_pattern. In an if statement, x in y is a membership test, so Python has to look up found_regex_pattern before it can evaluate the condition, and that lookup raises the NameError.
When you change if to for, it works because for ... in is a loop that assigns each match yielded by regex_pattern.finditer(bytes_array) to found_regex_pattern in turn.
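One way to avoid both for and break is to ask for the first match only, using the compiled pattern's search method. The sketch below is written for Python 3, so it uses bytes.fromhex where the question uses hex_sign.decode('hex'). The function name find_first_offset and the sample bytes are made up for illustration, and returning None for "not found" is a design choice rather than the question's behaviour, made because 0 is also a valid offset.

```python
import re


def make_regex_from_hex_sign(hex_sign):
    # re.escape stops signature bytes such as 0x2E ('.') from being
    # interpreted as regex metacharacters.
    return re.compile(re.escape(bytes.fromhex(hex_sign)))


def find_first_offset(regex_pattern, bytes_array):
    match = regex_pattern.search(bytes_array)  # stops at the first match
    return match.start() if match else None


# Hypothetical data standing in for the contents of somefile.bin.
data = b"\x00\x01cCQS_\x02cCQS_"
pattern = make_regex_from_hex_sign("634351535F")
print(find_first_offset(pattern, data))        # 2
print(find_first_offset(pattern, b"nothing"))  # None
```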

How to extract substrings from a masked Python string?

I'm writing an HTTP Request Handler with intuitive routing. My goal is to be able to apply a decorator to a function which states the HTTP method being used as well as the path to be listened on for executing the decorated function. Here's a sample of this implementation:
@route_handler("GET", "/personnel")
def retrievePersonnel():
    return personnelDB.retrieveAll()
However, I also want to be able to add variables to the path. For example, /personnel/3 would fetch a personnel with an ID of 3. The way I want to go about doing this is providing a sort of 'variable mask' to the path passed into the route_handler. A new example would be:
@route_handler("GET", "/personnel/{ID}")
def retrievePersonnelByID(ID):
    return personnelDB.retrieveByID(ID)
The decorator's purpose would be to compare the path literal (/personnel/3 for example) with the path 'mask' (/personnel/{ID}) and pass the 3 into the decorated function. I'm assuming the solution would be to compare the two strings, keep the differences, and place the difference in the literal into a variable named after the difference in the mask (minus the curly braces). But then I'd also have to check to see if the literal matches the mask minus the {} variable catchers...
tl;dr - is there a way to do
stringMask("/personnel/{ID}", "/personnel/5") -> True, {"ID": 5}
stringMask("/personnel/{ID}", "/flowers/5") -> False, {}
stringMask("/personnel/{ID}", "/personnel") -> False, {}
Since I'm guessing there isn't really an easy solution to this, I'm gonna post the solution I did. I was hoping there would be something I could do in a few lines, but oh well ¯_(ツ)_/¯
def checkPath(self, mask):
    mask_parts = mask[1:].split("/")
    path_parts = self.path[1:].rstrip("/").split("/")
    if len(mask_parts) != len(path_parts):
        self.url_vars = {}
        return False
    url_vars = {}
    for mask_part, path_part in zip(mask_parts, path_parts):
        if mask_part.startswith("{"):
            url_vars[mask_part[1:-1]] = path_part # strip the curly braces from the name
        elif mask_part != path_part:
            self.url_vars = {}
            return False
    self.url_vars = url_vars # save extracted variables
    return True
A mask is just a string like one of the ones below:
/resource
/resource/{ID}
/group/{name}/resource/{ID}
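For comparison, the same matching can be sketched in a few lines by translating the mask into a regular expression with named groups. This is an alternative to the segment-by-segment comparison, not the poster's method. The name string_mask is hypothetical, placeholder names are assumed to be valid Python identifiers, and captured values come back as strings (so "5", not 5).

```python
import re


def string_mask(mask, path):
    # A capturing group in re.split keeps the {name} placeholders in the result.
    parts = re.split(r"(\{\w+\})", mask)
    pattern = "".join(
        "(?P<%s>[^/]+)" % part[1:-1] if part.startswith("{") else re.escape(part)
        for part in parts
    )
    match = re.fullmatch(pattern, path)
    if match is None:
        return False, {}
    return True, match.groupdict()


print(string_mask("/personnel/{ID}", "/personnel/5"))  # (True, {'ID': '5'})
print(string_mask("/personnel/{ID}", "/flowers/5"))    # (False, {})
print(string_mask("/personnel/{ID}", "/personnel"))    # (False, {})
```

Each placeholder matches one path segment because [^/]+ cannot cross a slash.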

Looping through list of functions in a function in Python dynamically

I'd like to see if it's possible to run through a list of functions in a function. The closest thing I could find is looping through an entire module. I only want to use a pre-selected list of functions.
Here's my original problem:
Given a string, check each letter against each of the 5 tests.
If at least 1 letter passes a test, return True for that test.
If all letters in the string fail the test, return False.
For each letter in the string, we will check these functions: isalnum(), isalpha(), isdigit(), islower(), isupper()
The result of each test should print to a different line.
Sample Input
qA2
Sample Output (must print to separate lines: True if at least one letter passes the test, or False if all letters fail it):
True
True
True
True
True
I wrote this for one test. Of course I could just write 5 different sets of code but that seems ugly. Then I started wondering if I could just loop through all the tests they're asking for.
Code for just one test:
raw = 'asdfaa3fa'
counter = 0
for i in xrange(len(raw)):
    if raw[i].isdigit() == True: ## This line is where I'd loop in diff func's
        counter = 1
        print True
        break
if counter == 0:
    print False
My failed attempt to run a loop with all the tests:
raw = 'asdfaa3fa'
lst = [raw[i].isalnum(),raw[i].isalpha(),raw[i].isdigit(),raw[i].islower(),raw[i].isupper()]
counter = 0
for f in range(0,5):
    for i in xrange(len(raw)):
        if lst[f] == True: ## loop through f, which then loops through i
            print lst[f]
            counter = 1
            print True
            break
    if counter == 0:
        print False
So how do I fix this code to fulfill all the rules up there?
Using info from all the comments - this code fulfills the rules stated above, looping through each method dynamically as well.
raw = 'ABC'
functions = [str.isalnum, str.isalpha, str.isdigit, str.islower, str.isupper]
for func in functions:
    print any(func(letter) for letter in raw)
getattr approach (I think this is called introspection?)
raw = 'ABC'
meths = ['isalnum', 'isalpha', 'isdigit', 'islower', 'isupper']
for m in meths:
    print any(getattr(c,m)() for c in raw)
List comprehension approach:
from __future__ import print_function ## Use the Python 3 print function so print can be called inside a list comprehension
raw = 'ABC'
functions = [str.isalnum, str.isalpha, str.isdigit, str.islower, str.isupper]
solution = [print(any(func(letter) for letter in raw)) for func in functions]
The way you are looping through a list of functions is slightly off. Here is a valid way to do it. The functions you need to store in the list are the unbound string methods, written as str.funcname. Once you have that list of functions, you can loop through it with a for loop and call each entry like a normal function!
raw = 'asdfaa3fa'
functions = [str.isalnum, str.isalpha, str.isdigit, str.islower, str.isupper] # list of functions
for fn in functions: # iterate over the list of functions; fn is the current function
    for ch in raw: # for each character in the string raw
        if fn(ch):
            print(True)
            break
Sample outputs:
Input Output
===================================
"qA2" -----> True True True True True
"asdfaa3fa" -----> True True True True
Also, I notice you use indexing for iteration, which makes me think you might be coming from a language like C/C++. The for ... in loop construct is really powerful in Python, so I would read up on it.
The version above is the more pythonic way to do this, but as a learning tool I also wrote a working version that stays as close as possible to your attempt, to show you specifically where you went wrong. Here it is with comments:
raw = 'asdfaa3fa'
lst = [str.isalnum, str.isalpha, str.isdigit, str.islower, str.isupper] # notice you're treating the functions just like variables and aren't actually calling them. That is, you're writing str.isalpha instead of str.isalpha()
for f in range(0,5):
    counter = 0
    for i in xrange(len(raw)):
        if lst[f](raw[i]) == True: # In your attempt, you were checking if lst[f]==True; lst[f] is a function so you are checking if a function == True. Instead, you need to pass an argument to lst[f](), in this case the ith character of raw, and check whether what that function evaluates to is true
            print lst[f]
            counter = 1
            print True
            break
    if counter == 0:
        print False
Okay, so the first question is easy enough. The simple way to do it is just do
def foo(raw):
    for c in raw:
        if c.isalpha(): return True
        if c.isdigit(): return True
        # the other cases
    return False
Never neglect the simplest thing that could work.
Now, if you want to do it dynamically (which is the magic keyword you probably needed), you want to apply something like this (cribbed from another question):
meths = ['isalnum', 'isalpha', 'isdigit', 'islower', 'isupper'] # method names as strings, as getattr requires
for c in raw:
    for m in meths:
        getattr(c, m)()
Warning: this is untested code meant to give you the idea. The key notion here is that the methods of an object are attributes just like anything else, so, for example, getattr("a", "isalpha")() does the following:
Uses getattr to look up an attribute named isalpha on "a"
Returns that method itself -- <function isalpha>
Then invokes that method using (), which is the function call operator in Python.
See this example:
In [11]: getattr('a', 'isalpha')()
Out[11]: True
All the other answers are correct, but since you're a beginner, I want to point out the problem in your code:
lst = [raw[i].isalnum(),raw[i].isalpha(),raw[i].isdigit(),raw[i].islower(),raw[i].isupper()]
First: I'm not sure which value i currently has in your code snippet, but it seems to point somewhere in the string, which results in a single character being evaluated, not the whole string raw.
Second: When you build your list, you are already calling the methods you want to insert, so the list holds their return values rather than the functions themselves (that's why you're seeing all those True values in your print statement).
Try changing your code as follows:
lst = [raw.isalnum, raw.isalpha, raw.isdigit, raw.islower, raw.isupper]
I'm going to guess that you're validating password complexity. Software that takes an input and says "False" with no indication of why is user-hostile, so the most important thing is not "how to loop over nested char function code wizardry (*)" but "give good feedback". I'd suggest something more like:
import re
raw = 'asdfaa3fa'
def validate_password(password):
    """ This function takes a password string, and validates it
    against the complexity requirements from {wherever}
    and returns True if it's complex enough, otherwise False """
    if not re.search(r'\d', password):
        print("Error: password needs to include at least one number")
        return False
    elif not re.search('[a-z]', password):
        print("Error: password must include at least one lowercase letter")
        return False
    elif not re.search('[A-Z]', password):
        print("Error: password must include at least one uppercase letter")
        return False
    print("Password is OK")
    return True
validate_password(raw)
The regex search checks a whole range of characters or digits in one call, which is neater than a loop over characters.
(PS. your functions overlap; a string which has characters matching 'isupper', 'islower' and 'isdigit' already has 'isalpha' and 'isalnum' covered. More interesting would be to handle characters like ! which are not upper, lower, digits or alnum).
(*) function wizardry like the other answers is normally exactly what I would answer, but there's so much of that already answered that I may as well answer the other way instead :P
To answer the original question:
raw = 'asdfa3fa'
functions = [str.isalnum, str.isalpha, str.isdigit, str.islower, str.isupper]
isanything = [func(raw) for func in functions]
print repr(isanything)
Since you are looping through a list of simple items and trying to find out whether every one of the functions has at least one matching character, you can simply define the list of functions you want to call on the input and return the combined result. Here is a rather pythonic example of what you are trying to achieve:
def checker(checks, value):
    return all(any(check(r) for r in value) for check in checks)
Test it out:
>>> def checker(checks, value):
...     return all(any(check(r) for r in value) for check in checks)
...
>>> checks = [str.isalnum, str.isalpha, str.isdigit, str.islower, str.isupper]
>>> checker(checks, 'abcdef123ABC')
True
>>> checker(checks, 'abcdef123')
False
>>>
You can use introspection to loop through all of an object's attributes, whether they be functions or some other type.
However you probably don't want to do that here, because str has lots of function attributes, and you're only interested in five of them. It's probably better to do as you did and just make a list of the five you want.
Also, you don't need to loop over each character of the string if you don't want to; those functions already look at the whole string.
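The whole-string call and the per-character check answer different questions, though, so which one you want depends on the rules. A short sketch using the sample input from the question shows the difference; the variable names are made up for illustration.

```python
raw = "qA2"
checks = [str.isalnum, str.isalpha, str.isdigit, str.islower, str.isupper]

# Calling the method on the whole string asks "do ALL characters qualify?"
whole_string = [check(raw) for check in checks]

# Wrapping a per-character call in any() asks "does AT LEAST ONE qualify?"
any_character = [any(check(ch) for ch in raw) for check in checks]

print(whole_string)   # [True, False, False, False, False]
print(any_character)  # [True, True, True, True, True]
```

Only the second list matches the sample output the problem statement asks for.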
Check out this one-line solution for your problem. That problem is from HackerRank. I loop through a list of functions using the built-in getattr function.
s='qA2'
[print(bool(list(filter(lambda x : getattr(x, func)(),s)))) for func in ['isalnum','isalpha','isdigit','islower','isupper']]

0 being passed as an optional parameter to a class' method, param not found

I have a class file LinkedListADT.py and within that I have the class LinkedList. One method I have defined for this class is as follows:
def test(self, index=None):
    if index:
        print(str(index))
    else:
        print("Index not found")
Now, from my other file (not the class) I have defined the following function:
def test():
    tester = LinkedListADT.LinkedList()
    tester.test(0)
The result being printed is "Index not found" -- My question is, can 0 be passed as an optional input to a method or regular function?
I am quite new to Python and I'm assuming 0 is being interpreted as false. Whether or not this is the case, is there a workaround I can use?
As you guessed, 0 is considered "falsey". The documentation lists other "falsey" values.
print bool(0)
# False
But you can check like this:
def test(self, index=None):
    if index is not None:
        print(str(index))
    else:
        print("Index not found")
This version only checks whether index is None. Any other value, including 0, is treated as supplied by the caller, irrespective of its truthiness.
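A quick way to convince yourself is to run the check against a few falsey inputs. The standalone function below is a hypothetical stand-in for the method (it returns the text instead of printing it, so the results are easy to compare).

```python
def describe_index(index=None):
    # Only a missing argument (or an explicit None) counts as "not found".
    if index is not None:
        return str(index)
    return "Index not found"


print(describe_index(0))      # 0
print(describe_index(False))  # False
print(describe_index())       # Index not found
```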

FastQ programming error

So I'm trying to parse a FastQ sequence, but I'm a beginner to Python, and I'm a little confused as to why my code isn't working. This is what the program is supposed to carry out:
if I enter the FASTQ seqname line...
#EAS139:136:FC706VJ:2:2104:15343:197393
...then the program should output:
Instrument = EAS139
Run ID = 136
Flow Cell ID = FC706VJ
Flow Cell Lane = 2
Tile Number = 2104
X-coord = 15343
Y-coord = 197393
Here's my unfinished code thus far:
class fastq:
    def __init__(self,str):
        self.str = inStr.replace ('#',' ').split (':')
    def lists (self,parameters):
        self.parameters = ("Instrument","Run ID","Flow Cell ID","Flow Cell Lane","Tile Number","X-coordinates","y-coordinates")
    def zip (self,myZip,zippedTuple):
        self.Zip = zip(self.parameters,self.transform)
        self.zippedTuple = tuple(myZip)
        print (tuple(myZip))
def main():
    seq = input('Enter FastQ sequence:')
    new_fastq = fastq(str)
    new_fastq.lists()
    new_fastq.zip()
main()
The reason that your code isn't working is that it's more-or-less entirely wrong. To address your errors in the order we reach them when trying to run the program:
main:
new_fastq = fastq(str) does not pass the seq we just input, it passes the built-in string type;
__init__:
Calling the argument to fastq.__init__ str is a bad idea as it masks the very built-in we just tried to pass to it;
But whatever you call it, be consistent between the function definition and what is inside it - where do you think inStr is coming from?
lists:
Why is this separate to and not even called by __init__?
Why don't you pass any arguments?
What is the argument parameters even for?
zip:
Rather than define a method to print the object, it is more Pythonic to define fastq.__str__ that returns a string representation. Then you can print(str(new_fastq)). That being said;
Again, you mask a built-in. On this occasion, it's more of a problem because you actually try to use the built-in inside the method that masks it. Call it something else;
Again, you put unnecessary arguments in the definition, then don't bother to pass them anyway;
What is self.transform supposed to be? It is never mentioned anywhere else. Do you mean self.str (which, again, should be called something else, for reasons of masking a built-in and not actually being a string)?
myZip is one of the arguments you never passed, and I think you actually want self.Zip; but
Why would you create x = tuple(y) then on the next line print(tuple(y))? print(x)!
Addressing those points, plus some bonus PEP 8 tidying:
class FastQ:
    def __init__(self, seq):
        self.elements = seq.replace('#', ' ').split(':')
        self.parameters = ("Instrument", "Run ID", "Flow Cell ID",
                           "Flow Cell Lane", "Tile Number",
                           "X-coordinates", "y-coordinates")
    def __str__(self):
        """A rough idea to get you started."""
        return "\n".join(map(str, zip(self.parameters, self.elements)))
def main():
    seq = input('Enter FastQ sequence: ')
    new_fastq = FastQ(seq)
    print(str(new_fastq))
main()
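To get from that rough idea to the "name = value" lines the question asks for, __str__ can format each pair itself. The following is one possible sketch, not the answerer's code. The labels are taken from the expected output in the question, and stripping a leading '@' as well as '#' is an assumption, added because real FASTQ header lines normally start with '@'.

```python
class FastQ:
    PARAMETERS = ("Instrument", "Run ID", "Flow Cell ID", "Flow Cell Lane",
                  "Tile Number", "X-coord", "Y-coord")

    def __init__(self, seq):
        # Remove surrounding whitespace and the leading marker, then split the fields.
        self.elements = seq.strip().lstrip("#@").split(":")

    def __str__(self):
        return "\n".join("%s = %s" % pair
                         for pair in zip(self.PARAMETERS, self.elements))


print(FastQ("#EAS139:136:FC706VJ:2:2104:15343:197393"))
```

Because zip stops at the shorter sequence, a line with too few fields simply prints fewer rows; validating the field count is left out of this sketch.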
