Replacing `eval` in Python for dynamic input

I am trying to replace eval in my Python code. I read a configuration file and build a string of Python code, which is later executed using eval.
There are two functions:
The first function reads the configuration file and creates a string that can be executed using eval. Example: 'raw_bytes[26:31].hex()+","+codecs.decode(raw_bytes[41:42],"cp500")+","+raw_bytes[48:49].hex()+","+raw_bytes[102:106].hex()'
def extractor_command(config_file):
    START = 0
    CMD = ""
    with open(config_file, 'r') as f:
        next(f)  # skipping the comments in the first line
        for line in f:
            col = line.split()
            UPTO = START + int(col[2])
            if col[1] == "1":  # field is active
                if col[3] == "0":  # plain field: dump as hex
                    CMD = CMD + 'raw_bytes[{}:{}].hex()'.format(START, UPTO)
                    CMD = CMD + '+","+'
                if col[3] == "1":  # string field: decode as cp500 (EBCDIC)
                    CMD = CMD + 'codecs.decode(raw_bytes[{}:{}],"cp500")'.format(START, UPTO)
                    CMD = CMD + '+","+'
            elif col[1] == "0":  # inactive field: skip it
                pass
            START = UPTO
    CMD = CMD.rstrip('+",')  # drop the trailing separator
    return CMD
The configuration file looks like this:
Nr Active Length(bytes) String
Field1 1 8 1
Field2 0 2 0
Field3 1 4 1
...
Field250 1 1 0
Field251 0 1 1
Field252 0 2 1
The second function reads a binary file and uses the command created by the first function to extract fields from it. The extracted lines are written to a text file.
def extract(in_file, out_file, cmd):
    READBLOCKS = 2052
    compiled = compile(cmd, '<string>', 'eval')  # compile once, reuse for every record
    with open(out_file, 'w') as extracted_file:
        f = open(in_file, 'rb')
        while True:
            raw_bytes = f.read(READBLOCKS)
            row = eval(compiled)
            extracted_file.write(row + '\n')
            b = f.read(1)  # probe one byte to detect end of file
            if not b:
                break
        f.close()
Although this works fine, I am looking for another solution that makes the code more readable and avoids eval for security reasons. I also don't want to rebuild the extraction command every time a portion of the binary file is read, because that hurts performance (the binary file is huge).
The code doesn't look pretty, but it's just for demonstration.
Any suggestions?
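One eval-free direction (a minimal sketch of the idea, not code from the thread; the helper names are made up): parse the configuration once into a list of (start, end, is_string) slices, then apply ordinary Python calls to each record.
import codecs

def parse_config(config_file):
    # Turn the config file into a list of (start, end, is_string) tuples.
    fields = []
    start = 0
    with open(config_file) as f:
        next(f)  # skip the header/comment line
        for line in f:
            col = line.split()
            upto = start + int(col[2])
            if col[1] == "1":  # active field
                fields.append((start, upto, col[3] == "1"))
            start = upto
    return fields

def extract_row(raw_bytes, fields):
    # Apply the parsed field list to one record; no eval needed.
    parts = []
    for start, upto, is_string in fields:
        chunk = raw_bytes[start:upto]
        parts.append(codecs.decode(chunk, "cp500") if is_string else chunk.hex())
    return ",".join(parts)
The configuration is parsed once, so the per-record cost is only slicing and decoding, roughly the same work as the compiled eval expression, and extract() can simply call extract_row(raw_bytes, fields) inside its read loop.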

Related

How to read ONLY 1 word in python?

I've created an empty text file, and saved some stuff to it. This is what I saved:
Saish ddd TestUser ForTestUse
There is a space before these words. Anyway, I wanted to know how to read only one WORD from the text file using Python. This is the code I used:
#Uncommenting the line below does literally nothing.
import time
#import mmap, re
print("Loading Data...")
time.sleep(2)
with open("User_Data.txt") as f:
    lines = f.read()  ##Assume the sample file has 3 lines
first = lines.split(None, 1)[0]
print(first)
print("Type user number 1 - 4 for using different user.")
ans = input('Is the name above correct?(y/1 - 4) ')
if ans == 'y':
    print("Ok! You will be called", first)
elif ans == '1':
    print("You are already registered to", first)
elif ans == '2':
    print('Switching to accounts...')
    time.sleep(0.5)
    with open("User_Data.txt") as f:
        lines = f.read()  ##Assume the sample file has 3 lines
    second = lines.split(None, 2)[2]
    print(second)
#Fix the password issue! Very important as this is SECURITY!!!
When I run the code, my output is:
Loading Data...
Saish
Type user number 1 - 4 for using different user.
Is the name above correct?(y/1 - 4) 2
Switching to accounts...
TestUser ForTestUse
As you can see, it displays both "TestUser" and "ForTestUse", while I only want it to display "TestUser".
When you give a limit to split(), everything beyond that limit stays combined in the last item. So if you do
lines = 'Saish ddd TestUser ForTestUse'
split = lines.split(None, 2)
the result is
['Saish', 'ddd', 'TestUser ForTestUse']
If you just want the third word, don't give a limit to split().
second = lines.split()[2]
You can also call split() directly without passing None:
lines.split()[2]
I understand you're passing (None, 2) because you want to handle the case where there is no value at index 2.
A simple way to check whether the index is available in the list:
Python 2
2 in zip(*enumerate(lines.split()))[0]
Python 3
2 in list(zip(*enumerate(lines.split())))[0]
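A simpler guard against a missing third word (my addition, not part of the original answers) is just to check the word count before indexing:
words = lines.split()
# make sure a third word exists before indexing into the list
second = words[2] if len(words) > 2 else None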

I'm reading into a 256-byte string. I want to skip it if it's all binary zeros (\x00). Is there a single test?

Totally new to Python. I'm trying to parse a file, but not all records contain data. I want to skip the records that are all hex 00.
I tried if record == ('\x00' * 256): (adapted from a sample like print("-"*80)), but it gave a syntax error. Hey, I said I was new. :)
Thanks for the reply. I'm using 2.7 and reading like this:
with open(testfile, "rb") as f:
    counter = 0
    while True:
        record = f.read(256)
        counter += 1
Your example looks very close. I'm not sure about Python 2, but in Python 3 you have to mark the comparison value as bytes, since the file is opened in binary mode.
I would do something like:
empty = b'\x00' * 256
if record == empty:
    print('skipped this line')
Remember that Python 2 uses print statements, so you should do print 'skipped this line' instead.
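Putting the pieces together, a minimal Python 3 sketch (my own, not from the thread; the file name is a placeholder) that skips all-zero records could look like this:
EMPTY = b'\x00' * 256

with open("records.bin", "rb") as f:  # placeholder file name
    counter = 0
    while True:
        record = f.read(256)
        if not record:        # end of file
            break
        counter += 1
        if record == EMPTY:   # all binary zeros: skip this record
            continue
        # ... process the record here ...
An equivalent single test that also copes with a short final record is if not record.strip(b'\x00'), since stripping the zero bytes from an all-zero record leaves an empty (falsy) bytes object.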

Python print .psl format without quotes and commas

I am working on a Linux system using Python 3 with a file in .psl format, which is common in genetics. This is a tab-separated file that contains some cells with comma-separated values. A small example file with some of the features of a .psl file is below.
input.psl
1 2 3 x read1 8,9, 2001,2002,
1 2 3 mt read2 8,9,10 3001,3002,3003
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
I need to filter this file to extract only regions of interest. Here, I extract only rows with a value of 9 in the fourth column.
import csv
def read_psl_transcripts():
    psl_transcripts = []
    with open("input.psl") as input_psl:
        csv_reader = csv.reader(input_psl, delimiter='\t')
        for line in csv_reader:
            #Extract only rows matching chromosome of interest
            if '9' == line[3]:
                psl_transcripts.append(line)
    return psl_transcripts
I then need to be able to print or write these selected lines in a tab-delimited format matching the input file, with no additional quotes or commas added. I can't seem to get this part right, and additional brackets, quotes and commas are always added. Below is an attempt using print().
outF = open("output.psl", "w")
for line in read_psl_transcripts():
    print(str(line).strip('"\''), sep='\t')
Any help is much appreciated. Below is the desired output.
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
You might be able to solve your problem with a simple awk one-liner:
awk '$4 == 9' input.psl > output.psl
But with Python you could solve it like this:
write_psl = open("output.psl", "w")
with open("input.psl") as file:
    for line in file:
        splitted_line = line.split()
        if splitted_line[3] == '9':
            out_line = '\t'.join(splitted_line)
            write_psl.write(out_line + "\n")
write_psl.close()
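Since the question already uses the csv module, another option (my sketch, not from the original answer) is to let csv.writer emit the rows; with delimiter='\t' and quoting=csv.QUOTE_NONE it adds no quotes around fields that contain commas:
import csv

# assumes read_psl_transcripts() from the question returns each row as a list of fields
with open("output.psl", "w", newline="") as out_psl:
    writer = csv.writer(out_psl, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in read_psl_transcripts():
        writer.writerow(row)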

matching and displaying specific lines through python

I have 15 lines in a log file and I want to read, for example, the 4th and the 10th line through Python and display them on output, saying the string is found:
abc
def
aaa
aaa
aasd
dsfsfs
dssfsd
sdfsds
sfdsf
ssddfs
sdsf
f
dsf
s
d
Please suggest through code how to achieve this in Python.
Just to elaborate on this example: the first string (or line) is unique and can be found easily in the log file. The next string, B, comes within 40 lines of the first one, but it occurs at lots of places in the log file, so I need to look for it only within the first 40 lines after reading string A and print that these strings were found.
Also, I can't use the with statement, as it gives me warnings like "'with' will become a reserved keyword in Python 2.6". I am using Python 2.5.
You can use this:
fp = open("file")
for i, line in enumerate(fp):
if i == 3:
print line
elif i == 9:
print line
break
fp.close()
def bar(start, end, search_term):
    with open("foo.txt") as fil:
        # strip the trailing newlines so whole-line matches work
        if search_term in [line.strip() for line in fil.readlines()[start:end]]:
            print search_term + " has been found"
>>> bar(4, 10, "dsfsfs")
dsfsfs has been found
#list of random characters
from random import randint
a = list(chr(randint(0, 100)) for x in xrange(100))
#look for this
lookfor = 'b'
for element in xrange(100):
    if lookfor == a[element]:
        print a[element], 'on', element
#b on 33
#b on 34
is one easy to read and simple way to do it. Can you give part of your log file as an example? There are other ways that may work better :).
After edits by the author:
The easiest thing you can do then is:
looking_for = 'findthis'
i = 1
for line in open('filename.txt', 'r'):
    if looking_for == line.strip():  # strip the newline so exact lines match
        print i, line
    i += 1
it's efficient and easy :)
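None of these answers cover the follow-up requirement (find the unique string A, then look for string B only within the next 40 lines). Here is a minimal sketch of that logic; it avoids the with statement so it should also run on Python 2.5, and the file name and search strings are placeholders:
string_a = "unique first string"   # placeholder
string_b = "B"                     # placeholder: occurs often, only wanted near A

f = open("logfile.txt")            # no 'with', so this works on Python 2.5
window = 0                         # lines left in which to search for B
for line in f:
    if string_a in line:
        print("found A: %s" % line.strip())
        window = 40                # start the 40-line search window
    elif window > 0:
        if string_b in line:
            print("found B within 40 lines of A: %s" % line.strip())
            break
        window -= 1
f.close()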

Testing each line in a file

I am trying to write a Python program that reads each line from an infile. This infile is a list of dates. I want to test each line with a function isValid(), which returns true if the date is valid, and false if it is not. If the date is valid, it is written into an output file. If it is not, invalid is written into the output file. I have the function, and all I want to know is the best way to test each line with the function. I know this should be done with a loop, I'm just uncertain how to set up the loop to test each line in the file one-by-one.
Edit: I now have a program that basically works, but I'm getting strange results in the output file. Hopefully those with Python 3 experience can explain why.
def main():
    datefile = input("Enter filename: ")
    t = open(datefile, "r")
    c = t.readlines()
    ofile = input("Enter filename: ")
    o = open(ofile, "w")
    for line in c:
        b = line.split("/")
        e = b[0]
        f = b[1]
        g = b[2]
        text = str(e) + " " + str(f) + ", " + str(g)
        text2 = "The date " + text + " is invalid"
        if isValid(e,f,g) == True:
            o.write(text)
        else:
            o.write(text2)

def isValid(m, d, y):
    if m == 1 or m == 3 or m == 5 or m == 7 or m == 8 or m == 10 or m == 12:
        if d is range(1, 31):
            return True
    elif m == 2:
        if d is range(1,28):
            return True
    elif m == 4 or m == 6 or m == 9 or m == 11:
        if d is range(1,30):
            return True
    else:
        return False
This is the output I'm getting.
The date 5 19, 1998
is invalidThe date 7 21, 1984
is invalidThe date 12 7, 1862
is invalidThe date 13 4, 2000
is invalidThe date 11 40, 1460
is invalidThe date 5 7, 1970
is invalidThe date 8 31, 2001
is invalidThe date 6 26, 1800
is invalidThe date 3 32, 400
is invalidThe date 1 1, 1111
is invalid
In the most recent versions of Python you can use the context management features that are implicit for files:
results = list()
with open(some_file) as f:
    for line in f:
        if isValid(line, date):
            results.append(line)
... or even more tersely with a list comprehension:
with open(some_file) as f:
    results = [line for line in f if isValid(line, date)]
For progressively older versions of Python you might need to open and close the file explicitly (with simple implicit iteration over it: for line in file:), or iterate more explicitly with f.readline() or f.readlines() (plural), depending on whether you want to "slurp" in the entire file (with the memory overhead that implies) or iterate line by line.
Also note that you may wish to strip the trailing newlines off these file contents (perhaps by calling line.rstrip('\n') --- or possibly just line.strip() if you want to eliminate all leading and trailing whitespace from each line).
(Edit based on additional comment to previous answer):
The function signature isValid(m, d, y) suggests that you're passing a date to this function (month, day, year), but that doesn't make sense given that you must also, somehow, pass in the data to be validated (a line of text, a string, etc.).
To help you further you'll have to provide more information (preferably the source, or a relevant portion of the source, of this "isValid()" function).
In my initial answer I was assuming that your "isValid()" function was merely scanning for any valid date in its single argument. I've modified my code examples to show how one might pass a specific date, as a single argument, to a function which uses this calling signature: "isValid(somedata, some_date)".
with open(fname) as f:
    for line in f.readlines():
        test(line)
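For what it's worth, the posted isValid() explains the strange output: d is range(1, 31) is an identity test that is always False, so every date falls through to "invalid"; m and d are still strings after split("/"), so comparisons like m == 1 never match either; and write() adds no newline, which is why the lines run together. A sketch of how the loop and check might be rewritten (my own, with assumed month lengths and placeholder file names, not code from the thread):
def is_valid(m, d, y):
    # days per month; assumes February always has 28 days, as in the question
    days_in_month = {1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
                     7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}
    return m in days_in_month and 1 <= d <= days_in_month[m]

with open("dates.txt") as infile, open("checked.txt", "w") as outfile:  # placeholder names
    for line in infile:
        m, d, y = (int(part) for part in line.strip().split("/"))
        text = "{} {}, {}".format(m, d, y)
        if is_valid(m, d, y):
            outfile.write(text + "\n")
        else:
            outfile.write("The date " + text + " is invalid\n")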
