OpenBR Python wrapper from the command line - python

I'm trying to execute this file, test.py, from the command line:
from brpy import init_brpy
import requests # or whatever http request lib you prefer
import MagicalImageURLGenerator # made up

# br_loc is /usr/local/lib by default,
# you may change this by passing a different path to the shared objects
br = init_brpy(br_loc='/path/to/libopenbr')
br.br_initialize_default()
br.br_set_property('algorithm','CatFaceRecognitionModel') # also made up
br.br_set_property('enrollAll','true')

mycatsimg = open('mycats.jpg', 'rb').read() # cat picture not provided =^..^=
mycatstmpl = br.br_load_img(mycatsimg, len(mycatsimg))
query = br.br_enroll_template(mycatstmpl)
nqueries = br.br_num_templates(query)

scores = []
for imurl in MagicalImageURLGenerator():
    # load and enroll image from URL
    img = requests.get(imurl).content
    tmpl = br.br_load_img(img, len(img))
    targets = br.br_enroll_template(tmpl)
    ntargets = br.br_num_templates(targets)

    # compare and collect scores
    scoresmat = br.br_compare_template_lists(targets, query)
    for r in range(ntargets):
        for c in range(nqueries):
            scores.append((imurl, br.br_get_matrix_output_at(scoresmat, r, c)))

    # clean up - no memory leaks
    br.br_free_template(tmpl)
    br.br_free_template_list(targets)

# print top 10 match URLs
scores.sort(key=lambda s: s[1])
for s in scores[:10]:
    print(s[0])

# clean up - no memory leaks
br.br_free_template(mycatstmpl)
br.br_free_template_list(query)
br.br_finalize()
This script file is in /myfolder/, while the brpy library is in /myfolder/scripts/brpy.
The brpy folder contains three files: "face_cluster_viz.py", "html_viz.py" and "__init__.py".
When I try to execute this file from the command line it shows an error:
NameError: name 'init_brpy' is not defined
Why? Where am I going wrong? Is it possible to execute this script from the command line?
Thanks

The problem is the following line:
br = init_brpy(br_loc='/path/to/libopenbr')
You have to set br_loc to the actual path of the OpenBR library on your system.
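A minimal sketch of how those paths might be wired up, assuming the layout described in the question (the /usr/local/lib default comes from the comment in the script; adjust it to wherever OpenBR is actually installed):
import os
import sys

# brpy lives in /myfolder/scripts/brpy, so its parent directory must be
# importable; this assumes test.py sits in /myfolder/
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), 'scripts'))

from brpy import init_brpy

# point br_loc at the directory that actually contains the OpenBR shared
# objects; /usr/local/lib is the default install location
br = init_brpy(br_loc='/usr/local/lib')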

Related

Updating python module variables from another python module

I need to update a Python module's variable values from another Python module.
The values need to be updated permanently in the module, not just for the current run,
so the module file itself has to be rewritten. How should I do that?
VersionInfo.py has variables with some default values.
This file should be updated while executing another Python file, ReleaseVersion.py.
VersionInfo.py
__app__ = 'MyApp'
__appName__ = 'My Classic Application'
__version__ = 0.1
__updater__ = 'Kumaresan Lakshmanan'
__updatedOn__ = '2017-08-29'
#Other Lookups
versionStr = "v%s" % __version__
versionInfo = '%s (%s)' % (versionStr, __updatedOn__)
loggerName = __app__
stdLogFile = __app__ + '_log.txt'
errorLogFile = __app__ + '_error.txt'
ReleaseVersion.py
import VersionInfo

newVersion = VersionInfo.__version__
newVersion += .1
updatedOn = currentDate()
updater = 'Lakshmanan'

VersionInfo.__version__ = newVersion
VersionInfo.__updater__ = updater
VersionInfo.__updatedOn__ = getCurrentDate()

def updateVersionInfo():
    # I'm planning to go by the following steps, but need some better logic...
    # 1. read the VersionInfo.py file into a str
    # 2. in the str, search for the old values and replace them with the new values
    # 3. write the str back to the VersionInfo.py file
    # Any other better logic I can do?
    pass

updateVersionInfo()
I realize this is a very old question, but maybe someone else has a similar one. You could build a sort of fake database with text files. Since Python can easily read and write text files, this is very simple. You could create something such as
VersionInfo.txt:
__app__ = 'MyApp'
__appName__ = 'My Classic Application'
__version__ = 0.1
__updater__ = 'Kumaresan Lakshmanan'
__updatedOn__ = '2017-08-29'
Then read the file and split each line on the equals sign. This way you have a readable text file with all the information needed.
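A minimal sketch of that idea, assuming the VersionInfo.txt layout shown above (the helper function names are made up for illustration):
def read_version_info(path='VersionInfo.txt'):
    # parse "key = 'value'" lines into a dict
    info = {}
    with open(path) as f:
        for line in f:
            if '=' in line:
                key, value = line.split('=', 1)
                info[key.strip()] = value.strip().strip("'")
    return info

def write_version_info(info, path='VersionInfo.txt'):
    # write the dict back in the same "key = 'value'" format
    with open(path, 'w') as f:
        for key, value in info.items():
            f.write("%s = '%s'\n" % (key, value))

info = read_version_info()
info['__version__'] = str(float(info['__version__']) + 0.1)
write_version_info(info)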

How to use python diff_match_patch to create a patch and apply it

I'm looking for a Pythonic way to compare two files, file1 and file2, obtain the differences in the form of a patch file, and merge those differences into file2. The code should do something like this:
diff file1 file2 > diff.patch
apply the patch diff.patch to file2 // this should do something like git apply
I have seen the post Implementing Google's DiffMatchPatch API for Python 2/3 about Google's diff_match_patch API for finding differences, but I'm looking for a solution to create and apply a patch.
First you need to install diff_match_patch.
Here is my code:
import sys
import time
import diff_match_patch as dmp_module

def readFileToText(filePath):
    file = open(filePath, "r")
    s = ''
    for line in file:
        s = s + line
    return s

dmp = dmp_module.diff_match_patch()

origin = sys.argv[1]
lastest = sys.argv[2]
originText = readFileToText(origin)
lastestText = readFileToText(lastest)

patch = dmp.patch_make(originText, lastestText)
patchText = dmp.patch_toText(patch)

# floder = sys.argv[1]
floder = '/Users/test/Documents/patch'
print(floder)

patchFilePath = floder
patchFile = open(patchFilePath, "w")
patchFile.write(patchText)
print(patchText)
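The snippet above only creates and saves the patch; applying it is the other half of the question. A minimal sketch of the apply step with the same library (the file paths are placeholders):
import diff_match_patch as dmp_module

dmp = dmp_module.diff_match_patch()

# read the serialized patch produced earlier and the file to update
with open('/Users/test/Documents/patch') as f:
    patchText = f.read()
with open('file2') as f:
    file2Text = f.read()

# patch_fromText parses the patch text; patch_apply applies the hunks
patches = dmp.patch_fromText(patchText)
newText, results = dmp.patch_apply(patches, file2Text)

# results is a list of booleans, one per hunk, telling whether it applied
with open('file2', 'w') as f:
    f.write(newText)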

Python refresh file from disk

I have a python script that calls a system program and reads the output from a file out.txt, acts on that output, and loops. However, it doesn't work, and a close investigation showed that the python script just opens out.txt once and then keeps on reading from that old copy. How can I make the python script reread the file on each iteration? I saw a similar question here on SO but it was about a python script running alongside a program, not calling it, and the solution doesn't work. I tried closing the file before looping back but it didn't do anything.
EDIT:
I already tried closing and opening, it didn't work. Here's the code:
import subprocess, os, sys

filename = sys.argv[1]
file = open(filename, 'r')
foo = open('foo', 'w')
foo.write(file.read().rstrip())
foo = open('foo', 'a')
crap = open(os.devnull, 'wb')

numSolutions = 0
while True:
    subprocess.call(["minisat", "foo", "out"], stdout=crap, stderr=crap)
    out = open('out', 'r')
    if out.readline().rstrip() == "SAT":
        numSolutions += 1
        clause = out.readline().rstrip()
        clause = clause.split(" ")
        print clause
        clause = map(int, clause)
        clause = map(lambda x: -x, clause)
        output = ' '.join(map(lambda x: str(x), clause))
        print output
        foo.write('\n' + output)
        out.close()
    else:
        break

print "There are ", numSolutions, " solutions."
You need to flush foo so that the external program can see its latest changes. When you write to a file, the data is buffered in the local process and sent to the system in larger blocks. This is done because updating the system file is relatively expensive. In your case, you need to force a flush of the data so that minisat can see it.
foo.write('\n'+output)
foo.flush()
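To see the effect in isolation, here is a tiny self-contained sketch (the file name demo.txt is made up for illustration); a separate reader only sees the data once the writer's buffer has been flushed:
writer = open('demo.txt', 'w')
writer.write('new clause')

reader = open('demo.txt', 'r')
print(reader.read())   # likely prints nothing: the data is still in writer's buffer

writer.flush()         # push the buffered bytes out to the OS

reader = open('demo.txt', 'r')
print(reader.read())   # now prints 'new clause'

reader.close()
writer.close()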
I rewrote it to hopefully be a bit easier to understand:
import os
from shutil import copyfile
import subprocess
import sys

TEMP_CNF = "tmp.in"
TEMP_SOL = "tmp.out"
NULL = open(os.devnull, "wb")

def all_solutions(cnf_fname):
    """
    Given a file containing a set of constraints,
    generate all possible solutions.
    """
    # make a copy of original input file
    copyfile(cnf_fname, TEMP_CNF)

    while True:
        # run minisat to solve the constraint problem
        subprocess.call(["minisat", TEMP_CNF, TEMP_SOL], stdout=NULL, stderr=NULL)

        # look at the result
        with open(TEMP_SOL) as result:
            line = next(result)
            if line.startswith("SAT"):
                # Success - return solution
                line = next(result)
                solution = [int(i) for i in line.split()]
                yield solution
            else:
                # Failure - no more solutions possible
                break

        # disqualify found solution
        with open(TEMP_CNF, "a") as constraints:
            new_constraint = " ".join(str(-i) for i in solution)
            constraints.write("\n")
            constraints.write(new_constraint)

def main(cnf_fname):
    """
    Given a file containing a set of constraints,
    count the possible solutions.
    """
    count = sum(1 for i in all_solutions(cnf_fname))
    print("There are {} solutions.".format(count))

if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("Usage: {} cnf.in".format(sys.argv[0]))
Open the file at the start of each loop iteration and close it at the end with file_var.close():
for ... :
    ga_file = open('out.txt', 'r')
    # ... do stuff ...
    ga_file.close()
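A concrete version of that pattern, using the out file from the question and a with-block so the handle is always closed before the next iteration (a sketch, not a drop-in replacement for the full loop):
while True:
    # open 'out' anew on every pass so each read reflects the file's current contents
    with open('out', 'r') as out_file:
        status = out_file.readline().rstrip()
        if status != "SAT":
            break
        solution = out_file.readline().rstrip()
    # ... negate the solution, append it to 'foo', re-run minisat as in the question ...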
Demo of an implementation below (as simple as possible, this is all of the Jython code needed)...
__author__ = ''

import time

var = 'false'
while var == 'false':
    out = open('out.txt', 'r')
    content = out.read()
    time.sleep(3)
    print content
    out.close()
generates this output:
2015-01-09, 'stuff added'
2015-01-09, 'stuff added' # <-- this is when i just saved my update
2015-01-10, 'stuff added again :)' # <-- my new output from file reads
I strongly recommend reading the error messages; they hold quite a lot of information.
I also think the full file name should be printed for debugging purposes.

Python and boto package for AWS: distributing files on the clusters

I am currently testing a Python script on an EMR instance with the boto package.
The script reads each row of a file called fileC, compares its content with the content of the rows in fileAC, and writes the filtered rows to a separate file. As I am comparing two big files, I created another intermediate filter with a third file called fileA to save time.
The problem is the following: during my tests on a local machine the script works fine and filters out more than 200 rows from fileC. But once I try it on AWS with the boto package, the script doesn't filter fileC at all (p=0, no "found" printed, and the same number of rows as fileC). It seems that the two files fileA and fileAC are not read. With boto, I used the cache_files= argument of the StreamingStep() function in order to distribute the two files (fileA and fileAC) to each node of the cluster. It used to work for other scripts, but here it doesn't. Any thoughts?
Here is the script:
import fileinput
import os
import sys

sys.path.append(os.path.dirname(__file__))

def main(argv):
    filenameAC = 'activities.log'
    filenameA = 'activitiesCookieCountry.log'

    fileC = fileinput.FileInput(sys.argv[1:])
    fileA = open(filenameA, 'r')
    fileAC = open(filenameAC, 'r')

    fileA = [line.rstrip('\n') for line in fileA]
    Alines = set(fileA)

    for lineC in fileC:
        fieldC = lineC.split('#')
        fieldComp = fieldC[0] + '#' + fieldC[2]
        p = 0
        if fieldComp in Alines:
            fileAC.seek(0)
            for lineAC in fileAC:
                fieldAC = lineAC.split('#')
                if (fieldAC[0] == fieldC[0]) and (fieldAC[2] == fieldC[2]) and (fieldAC[1] < fieldC[1]):
                    p = 1
                    print('found')
        if p == 0:
            sys.stdout.write(lineC)

if __name__ == "__main__":
    main(sys.argv)
And here is the script that runs the script within EMR:
utils = ['s3n://blablabla/activities.log#activities.log',
         's3n://blablabla/Activities/activitiesCookieCountry.log#activitiesCookieCountry.log']

sargs = ['-jobconf', 'mapred.output.compress=true',
         '-jobconf', 'mapred.output.compression.type=block',
         '-jobconf', 'mapred.compress.map.output=true',
         '-jobconf', 'stream.map.output.field.separator="#"',
         '-jobconf', 'mapred.reduce.tasks="1"']

cACstep = StreamingStep(
    name='ClickActivityCheck',
    mapper='s3n://blablabla/click-formatting-ACheck-S3.py',
    reducer=None,
    input='s3n://blablabla/ClickCleanedFeb14/*.gz',
    output='s3n://blablabla/ClickCleaned2Feb14',
    cache_files=utils,
    step_args=sargs
)

jobid = conn.run_jobflow(
    name='AWS_Flow_Test',
    log_uri='s3n://blablabla/Logging/jobflow_logs',
    ec2_keyname='xxxx',
    availability_zone=None,
    master_instance_type='m1.small',
    slave_instance_type='m1.small',
    num_instances=4,
    action_on_failure=None,
    keep_alive=False,
    enable_debugging=True,
    hadoop_version='1.0.3',
    steps=[cACstep],
    bootstrap_actions=[],
    instance_groups=None,
    additional_info=None,
    ami_version='2.4.1',
    api_params=None,
    visible_to_all_users=True)
My suspicion is that the two files shouldn't be referenced like this:
filenameAC = 'activities.log'
filenameA = 'activitiesCookieCountry.log'
But I don't really know how else to do it …
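One way to check whether the cached files actually arrive in the mapper's working directory (a hedged debugging sketch; the filenames come from the question, and with cache_files entries of the form s3n://...#activities.log Hadoop's distributed cache is expected to expose them under the name after the #):
import os
import sys

# log to stderr so the messages end up in the EMR task attempt logs
# rather than in the mapper's stdout stream
for cached in ('activities.log', 'activitiesCookieCountry.log'):
    sys.stderr.write('%s present: %s\n' % (cached, os.path.exists(cached)))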

"Not implemented" Exception when using pywin32 to control Adobe Acrobat

I have written a script in Python using pywin32 to save PDF files as text, which up until recently was working fine. I use similar methods in Excel. The code is below:
def __pdf2Txt(self, pdf, fileformat="com.adobe.acrobat.accesstext"):
    outputLoc = os.path.dirname(pdf)
    outputLoc = os.path.join(outputLoc, os.path.splitext(os.path.basename(pdf))[0] + '.txt')
    try:
        win32com.client.gencache.EnsureModule('{E64169B3-3592-47d2-816E-602C5C13F328}', 0, 1, 1)
        adobe = win32com.client.DispatchEx('AcroExch.App')
        pdDoc = win32com.client.DispatchEx('AcroExch.PDDoc')
        pdDoc.Open(pdf)
        jObject = pdDoc.GetJSObject()
        jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
    except:
        traceback.print_exc()
        return False
    finally:
        del jObject
        pdDoc.Close()
        del pdDoc
        adobe.Exit()
        del adobe
However this code has suddenly stopped working and I get the following output:
Traceback (most recent call last):
File "C:\Documents and Settings\ablishen\workspace\HooverKeyCreator\src\HooverKeyCreator.py", line 38, in __pdf2Txt
jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 505, in __getattr__
ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147467263, 'Not implemented', None, None)
False
I have similar code written in VB that works correctly so I'm guessing that it has something to do with the COM interfaces not binding to the appropriate functions correctly? (my COM knowledge is patchy).
Blish, this thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html
I admit that the post above is not the easiest to find (probably because Google scores it low based on the age of the content?).
Specifically, applying this piece of advice will get things running for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html
For reference, the complete piece of code that does not require you to manually patch dynamic.py (snippet should run pretty much out of the box):
# gets all files under ROOT_INPUT_PATH with FILE_EXTENSION and tries to extract
# text from them into ROOT_OUTPUT_PATH with the same filename as the input file,
# but with INPUT_FILE_EXTENSION replaced by OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT

import winerror

# try importing scandir and if found, use it as it's a few magnitudes of an order faster than stock os.walk
try:
    from scandir import walk
except ImportError:
    from os import walk

import fnmatch
import sys
import os

ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".txt"

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc")  # Connect to Adobe Acrobat

    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert(ret)  # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practise?

    pdDoc = avDoc.GetPDDoc()
    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))

    # Adobe documentation says "For that reason, you must rely on the documentation to know
    # what functionality is available through the JSObject interface. For details, see the
    # JavaScript for Acrobat API Reference"
    jsObject = pdDoc.GetJSObject()

    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")

    pdDoc.Close()
    # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any
    # further files after a certain threshold of open files is reached (for example 50 PDFs)
    avDoc.Close(True)
    del pdDoc

if __name__ == "__main__":
    assert(5 == len(sys.argv)), sys.argv  # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>
    # $ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'
    ROOT_INPUT_PATH = sys.argv[1]
    INPUT_FILE_EXTENSION = sys.argv[2]
    ROOT_OUTPUT_PATH = sys.argv[3]
    OUTPUT_FILE_EXTENSION = sys.argv[4]

    # tuples are of schema (path_to_file, filename)
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0])
                      for _root, _dirs, _files in walk(ROOT_INPUT_PATH)
                      for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))

    # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    global ERRORS_BAD_CONTEXT
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

    for filename_with_path, filename_without_extension in matching_files:
        print "Processing '{}'".format(filename_without_extension)
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)
I have tested this on WinPython x64 2.7.6.3, Acrobat X Pro
makepy.py is a script that comes with the win32com python package.
Running it for your installation "wires" Python into the COM/OLE object in Windows. The following is an excerpt of some code I used to talk to Excel and do some stuff in it. This example gets the name of sheet 1 in the current workbook, and it automatically runs makepy if it hits an exception:
import win32com;
import win32com.client;
from win32com.client import selecttlb;

# modules used by this excerpt but not imported in the original snippet
import os;
import re;
import string;
import sys;

def attachExcelCOM():
    makepyExe = r'python C:\Python25\Lib\site-packages\win32com\client\makepy.py';
    typeList = selecttlb.EnumTlbs();
    for tl in typeList:
        if (re.match('^Microsoft.*Excel.*', tl.desc, re.IGNORECASE)):
            makepyCmd = "%s -d \"%s\"" % (makepyExe, tl.desc);
            os.system(makepyCmd);
        # end if
    # end for
# end def

def getSheetName(sheetNum):
    try:
        xl = win32com.client.Dispatch("Excel.Application");
        wb = xl.Workbooks.Item(sheetNum);
    except Exception, detail:
        print 'There was a problem attaching to Excel, refreshing connect config...';
        print Exception, str(detail);
        attachExcelCOM();
        try:
            xl = win32com.client.Dispatch("Excel.Application");
            wb = xl.Workbooks.Item(sheetNum);
        except:
            print 'Could not attach to Excel...';
            sys.exit(-1);
        # end try/except
    # end try/except

    wsName = wb.Name;
    if (wsName == 'PERSONAL.XLS'):
        return( None );
    # end if

    print 'The target worksheet is:';
    print ' ', wsName;
    print 'Is this correct? [Y/N]',;
    answer = string.strip( sys.stdin.readline() );
    answer = answer.upper();
    if (answer != 'Y'):
        print 'Sheet not identified correctly.';
        return(None);
    # end if

    return( (wb, wsName) );
# end def

# -- Main --
sheetNum = 1;  # workbook index to inspect (not defined in the original excerpt)
sheetInfo = getSheetName(sheetNum);
if (sheetInfo == None):
    print 'Sheet not found';
    sys.exit(-1);
else:
    (wb, wsName) = sheetInfo;
# end if
