Finding pattern in binary file?


I have these two functions:
import re
def make_regex_from_hex_sign(hex_sign):
    regex_hex_sign = re.compile(hex_sign.decode('hex'))
    return regex_hex_sign
def find_regex_pattern_and_return_its_offset(regex_pattern, bytes_array):
    if found_regex_pattern in regex_pattern.finditer(bytes_array):
        return found_regex_pattern.start()
    else:
        return 0
and I'm using them like this:
pattern = make_regex_from_hex_sign("634351535F")
file = open('somefile.bin', 'rb')
allbytes = file.read()
offset = find_regex_pattern_and_return_its_offset(pattern, allbytes)
Python throws: NameError: global name 'found_regex_pattern' is not defined
If I replace if with for in the line if found_regex_pattern in regex_pattern.finditer(bytes_array), it works, but then I need a break at the end to stop it from searching past the first match. Is there a more elegant way to solve this without using for and break?

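You did not define found_regex_pattern. In the if version, found_regex_pattern in regex_pattern.finditer(bytes_array) is a membership test, so Python has to look up the name found_regex_pattern first, and it was never assigned; hence the NameError.
When you change if to for, it works because the for statement assigns each item of the regex_pattern.finditer(bytes_array) iterable to found_regex_pattern in turn.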
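As for a more elegant way without for and break: a compiled pattern also has a search() method, which returns the first match or None. Below is a minimal sketch, assuming Python 3 (where bytes.fromhex takes the place of hex_sign.decode('hex')). The name find_first_offset and the sample bytes are made up for illustration, and it returns None rather than 0 when nothing is found, so that a match at offset 0 stays distinguishable.

```python
import re

def make_regex_from_hex_sign(hex_sign):
    # Assumption: Python 3, so the hex string is converted with bytes.fromhex.
    # re.escape keeps signature bytes such as 0x2E ('.') from acting as
    # regex metacharacters.
    return re.compile(re.escape(bytes.fromhex(hex_sign)))

def find_first_offset(regex_pattern, data):
    # search() stops at the first match, so no loop or break is needed.
    match = regex_pattern.search(data)
    return match.start() if match else None

pattern = make_regex_from_hex_sign("634351535F")
data = b"\x00\x01cCQS_\xff"  # stand-in for the contents of somefile.bin
offset = find_first_offset(pattern, data)
print(offset)
```

If you would rather stay with finditer, next(regex_pattern.finditer(data), None) also gives the first match or None.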

Related

Automatically set results name in pyparsing?

Is there any way to get pyparsing to automatically set the resultsName of a grammar element to whatever it's named in my source code? That is, I would write code like
my_number = Word(nums)
and it would automatically execute
my_number.setResultsName('my_number')

You should be able to simply do:
my_number = Word(nums)('my_number')
using the shortcut for .setResultsName. Python, in general, makes it hard to get at the name of the variable in question.
As an alternative, if you had a list of them as a dictionary you could do something like:
for key,val in grammar_dict.items():
    grammar_dict[key] = val.setResultsName(key)

Using the inspect module you can do what you want. Define a function srn:
import inspect
def srn(expr):
    """Sets the results name to the variable *name* of `expr`"""
    cf = inspect.currentframe()
    of = inspect.getouterframes(cf)[1]
    fs = inspect.getframeinfo(of[0]).code_context[0].strip()
    # name of FIRST parameter
    try:
        args = fs[fs.find('(') + 1:-1].split(',')
        n = args[0]
        if n.find('=') != -1:
            name = n.split('=')[1].strip()
        else:
            name = n
        expr.resultsName = name
    except IndexError:
        pass
    return expr
Then after my_number = Word(nums) call:
srn(my_number)
and my_number contains "my_number" as resultsName.
Disclaimer 1: This approach works well, but is quite hacky and dives deep into the Python internals, so do it at your own risk.
Disclaimer 2: The idea for this is not mine. I got it from somewhere on StackOverflow, but I don't know exactly where.

Cast value if it is not None in python

When parsing XML in which some entries may be missing, you often end up with patterns like this:
planet = system.findall('.//planet')
row['discoveryyear'] = int(planet.findtext("./discoveryyear")) if planet.findtext("./discoveryyear") else None
Is there a nicer way to do that? I would like to avoid the second planet.findtext call, but I also don't want to write another line just to store the value in a variable first.

Instead of the try/except solution, I propose a helper function:
def find_int(xml, text):
    found = xml.findtext(text)
    return int(found) if found else None
row['discoveryyear'] = find_int(planet, "./discoveryyear")
(Note that found is also falsy if it's '', which is a good case to return None for as well.)
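Here is the helper run against a small document, as a sketch using the standard library's xml.etree.ElementTree. The XML snippet is invented for illustration and covers the three cases: a value is present, the element is missing, and the element is present but empty.

```python
import xml.etree.ElementTree as ET

def find_int(element, path):
    # findtext returns None for a missing element and '' for an empty one;
    # both are falsy, so both map to None.
    found = element.findtext(path)
    return int(found) if found else None

# Made-up sample data, not taken from a real catalogue
system = ET.fromstring(
    "<system>"
    "<planet><name>b</name><discoveryyear>1995</discoveryyear></planet>"
    "<planet><name>c</name></planet>"
    "<planet><name>d</name><discoveryyear></discoveryyear></planet>"
    "</system>"
)

years = [find_int(planet, "./discoveryyear")
         for planet in system.findall(".//planet")]
print(years)
```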

This will do (except if it's discovered in year 0 haha):
row['discoveryyear'] = int(planet.findtext("./discoveryyear") or 0) or None

To avoid the extra function call, you could wrap it in a try/except:
try:
    row['discoveryyear'] = int(planet.findtext("./discoveryyear"))
except TypeError: # raised if planet.findtext("./discoveryyear") is None
    row['discoveryyear'] = None
This also doesn't store the return value in a separate variable. Note that an element that is present but empty yields '', and int('') raises ValueError, which this except clause does not catch.

FastQ programming error

So I'm trying to parse a FastQ sequence, but I'm a beginner to Python, and I'm a little confused as to why my code isn't working. This is what the program is supposed to carry out:
if I enter the FASTQ seqname line...
#EAS139:136:FC706VJ:2:2104:15343:197393
...then the program should output:
Instrument = EAS139
Run ID = 136
Flow Cell ID = FC706VJ
Flow Cell Lane = 2
Tile Number = 2104
X-coord = 15343
Y-coord = 197393
Here's my unfinished code thus far:
class fastq:
    def __init__(self,str):
        self.str = inStr.replace ('#',' ').split (':')
    def lists (self,parameters):
        self.parameters = ("Instrument","Run ID","Flow Cell ID","Flow Cell Lane","Tile Number","X-coordinates","y-coordinates")
    def zip (self,myZip,zippedTuple):
        self.Zip = zip(self.parameters,self.transform)
        self.zippedTuple = tuple(myZip)
        print (tuple(myZip))
def main():
    seq = input('Enter FastQ sequence:')
    new_fastq = fastq(str)
    new_fastq.lists()
    new_fastq.zip()
main()

The reason that your code isn't working is that it's more-or-less entirely wrong. To address your errors in the order we reach them when trying to run the program:
main:
new_fastq = fastq(str) does not pass the seq we just input, it passes the built-in string type;
__init__:
Calling the argument to fastq.__init__ str is a bad idea as it masks the very built-in we just tried to pass to it;
But whatever you call it, be consistent between the function definition and what is inside it - where do you think inStr is coming from?
lists:
Why is this separate to and not even called by __init__?
Why don't you pass any arguments?
What is the argument parameters even for?
zip:
Rather than define a method to print the object, it is more Pythonic to define fastq.__str__ that returns a string representation. Then you can print(str(new_fastq)). That being said;
Again, you mask a built-in. On this occasion, it's more of a problem because you actually try to use the built-in inside the method that masks it. Call it something else;
Again, you put unnecessary arguments in the definition, then don't bother to pass them anyway;
What is self.transform supposed to be? It is never mentioned anywhere else. Do you mean self.str (which, again, should be called something else, for reasons of masking a built-in and not actually being a string)?
myZip is one of the arguments you never passed, and I think you actually want self.Zip; but
Why would you create x = tuple(y) then on the next line print(tuple(y))? print(x)!
Addressing those points, plus some bonus PEP 8 tidying:
class FastQ:
    def __init__(self, seq):
        # drop the leading '#' marker, then split the fields on ':'
        self.elements = seq.replace('#', '').split(':')
        self.parameters = ("Instrument", "Run ID", "Flow Cell ID",
                           "Flow Cell Lane", "Tile Number",
                           "X-coordinates", "Y-coordinates")
    def __str__(self):
        """A rough idea to get you started."""
        return "\n".join(map(str, zip(self.parameters, self.elements)))
def main():
    seq = input('Enter FastQ sequence: ')
    new_fastq = FastQ(seq)
    print(str(new_fastq))
main()
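To get from printed tuples to the exact output asked for in the question, __str__ can format each pair as name = value. A rough sketch, assuming the seqname always has seven colon-separated fields; FIELD_NAMES is a name introduced here, and its labels are copied from the desired output in the question.

```python
# Labels copied from the desired output in the question
FIELD_NAMES = ("Instrument", "Run ID", "Flow Cell ID", "Flow Cell Lane",
               "Tile Number", "X-coord", "Y-coord")

class FastQ:
    def __init__(self, seqname):
        # Strip surrounding whitespace and the leading '#', then split on ':'
        self.elements = seqname.strip().lstrip('#').split(':')

    def __str__(self):
        # Pair each label with its field and render one "label = value" per line
        return "\n".join(f"{name} = {value}"
                         for name, value in zip(FIELD_NAMES, self.elements))

record = FastQ("#EAS139:136:FC706VJ:2:2104:15343:197393")
report = str(record)
print(report)
```

Because zip stops at the shorter input, a seqname with fewer fields just produces fewer lines instead of raising an error; whether that is desirable depends on how strict you want the parser to be.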

Using local variables outside their functions

I understand that functions are useful for code which will be used multiple times so I tried creating a function to save myself time and make my code look neater. The function I had looks like this:
def drawCard():
    drawnCard = random.choice(cardDeck)
    adPos = cardDeck.index(drawnCard)
    drawnCardValue = cardValues[adPos]
However, I am not sure how to return these variables as they are local(?). Therefore, I can not use these variables outside the function. I am just wondering if someone could help edit this function in a way where I could use the drawnCard and drawnCardValue variables outside the function?

Use return:
def drawCard():
    drawnCard = random.choice(cardDeck)
    adPos = cardDeck.index(drawnCard)
    drawnCardValue = cardValues[adPos]
    return drawnCard, drawnCardValue
drawnCard, drawnCardValue = drawCard()
Note, you could also write drawCard this way:
def drawCard():
    adPos = random.randrange(len(cardDeck))
    drawnCard = cardDeck[adPos]
    drawnCardValue = cardValues[adPos]
    return drawnCard, drawnCardValue
These two functions behave differently if cardDeck contains duplicates, however. cardDeck.index always returns the first matching index, so drawnCardValue would always correspond to the first of the duplicate items. It would never return the value of the second one (which in theory could be different).
If you use adPos = random.randrange(len(cardDeck)), then every item in cardValues has an equal chance of being selected, assuming len(cardValues) == len(cardDeck).
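The difference with duplicates can be seen in a small experiment. This is only a sketch: the three-card deck, its values, and the function names draw_by_name and draw_by_position are invented for illustration, with the second "Ace" deliberately given a different value.

```python
import random

# Hypothetical deck: the two "Ace" entries carry different values
cardDeck = ["Ace", "King", "Ace"]
cardValues = [1, 10, 11]

def draw_by_name():
    drawnCard = random.choice(cardDeck)
    adPos = cardDeck.index(drawnCard)  # index() finds the FIRST "Ace" only
    return drawnCard, cardValues[adPos]

def draw_by_position():
    adPos = random.randrange(len(cardDeck))  # every position is equally likely
    return cardDeck[adPos], cardValues[adPos]

random.seed(0)  # fixed seed so repeated runs give the same draws
values_by_name = {draw_by_name()[1] for _ in range(200)}
values_by_position = {draw_by_position()[1] for _ in range(200)}

print(sorted(values_by_name))
print(sorted(values_by_position))
```

The value 11 can never show up in values_by_name, because index() never reaches the second "Ace"; draw_by_position can return it.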

selectively copying from an input file

My assignment calls for 3 modules: fileutility, choices, and selectiveFileCopy, the last of which imports the first two.
The purpose is to selectively copy pieces of text from an input file and write them to an output file, as determined by the "predicate" in the choices module: either copy everything (choices.always), copy only if a specific string is present (choices.contains(x)), or copy by length (choices.shorterThan(x)).
So far, I have only always() working. It must take one parameter, but my professor specifically stated the parameter could be anything, even nothing(?). Is this possible? If so, how do I write my definition so that it works?
The second part of this very long question is why my other two predicates don't work. When I tested them with doctests (another part of the assignment), they all passed.
Here's some code:
fileutility (I've been told this function is meaningless, but it's part of the assignment so...)-
def safeOpen(prompt:str, openMode:str, errorMessage:str ):
    while True:
        try:
            return open(input(prompt),openMode)
        except IOError:
            return(errorMessage)
choices-
def always(x):
    """
    always(x) always returns True
    >>> always(2)
    True
    >>> always("hello")
    True
    >>> always(False)
    True
    >>> always(2.1)
    True
    """
    return True
def shorterThan(x:int):
    """
    shorterThan(x) returns True if the specified string
    is shorter than the specified integer, False if it is not
    >>> shorterThan(3)("sadasda")
    False
    >>> shorterThan(5)("abc")
    True
    """
    def string (y:str):
        return (len(y)<x)
    return string
def contains(pattern:str):
    """
    contains(pattern) returns True if the pattern specified is in the
    string specified, and False if it is not.
    >>> contains("really")("Do you really think so?")
    True
    >>> contains("5")("Five dogs lived in the park")
    False
    """
    def checker(line:str):
        return(pattern in line)
    return checker
selectiveFileCopy-
import fileutility
import choices
def selectivelyCopy(inputFile,outputFile,predicate):
    linesCopied = 0
    for line in inputFile:
        if predicate == True:
            outputFile.write(line)
            linesCopied+=1
    inputFile.close()
    return linesCopied
inputFile = fileutility.safeOpen("Input file name: ", "r", " Can't find that file")
outputFile = fileutility.safeOpen("Output file name: ", "w", " Can't create that file")
predicate = eval(input("Function to use as a predicate: "))
print("Lines copied =",selectivelyCopy(inputFile,outputFile,predicate))

"So far, I have only the always() working, but it must take in one parameter, but my professor specifically stated the parameter could be anything, even nothing(?). Is this possible? If so, how do I write my definition so that it works?"
You can use a default argument:
def always(x=None): # x is None when you don't pass an argument
    return True
"The second part of this very long question is why my other two predicates don't work. When I tested them with doctests (another part of the assignment), they all passed."
Your predicates do work, but they are functions that need to be called. predicate == True compares the function object itself with True, which is never the case, so nothing gets copied. Call the predicate on each line instead:
def selectivelyCopy(inputFile,outputFile,predicate):
    linesCopied = 0
    for line in inputFile:
        if predicate(line): # test each line with the predicate function
            outputFile.write(line)
            linesCopied+=1
    inputFile.close()
    return linesCopied
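To check the whole thing without touching real files, the function can be run against in-memory streams. This is a sketch under a few assumptions: io.StringIO stands in for the opened files, the sample text is invented, and the inner function is named check here for brevity. The predicates follow the definitions from the question, with the default argument added to always.

```python
import io

def always(x=None):
    return True

def shorterThan(limit):
    def check(line):
        return len(line) < limit
    return check

def contains(pattern):
    def check(line):
        return pattern in line
    return check

def selectivelyCopy(inputFile, outputFile, predicate):
    linesCopied = 0
    for line in inputFile:
        if predicate(line):  # call the predicate on every line
            outputFile.write(line)
            linesCopied += 1
    return linesCopied

text = "short\na much longer line\nreally?\n"  # made-up input

out = io.StringIO()
copied = selectivelyCopy(io.StringIO(text), out, contains("really"))
print(copied, repr(out.getvalue()))

copied_all = selectivelyCopy(io.StringIO(text), io.StringIO(), always)
copied_short = selectivelyCopy(io.StringIO(text), io.StringIO(), shorterThan(8))
print(copied_all, copied_short)
```

Keep in mind that each line still carries its trailing newline when the predicate sees it, so shorterThan counts that character as well.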
