Compare RPM Packages using Python

Compare RPM Packages using Python - python

I'm trying to compare a csv file containing required Linux packages with the current installed packages. The comparison should output any packages not installed or newer than the current installed packages.
The problem is that I'm unable to loop through the list of installed packages and show all hits, for instance packages with the same name and version, but different architecture should be shown twice(for instance compat-libstdc++-33), but I only getting the first hit with the script below.
#!/usr/bin/python
import rpm
import csv
import sys
import os
'''
Script to check installed rpms against a csv file containing the package name and version similar to the list below:
atk,1.12.2
libart_lgpl,2.3
info,4.9
libsepol,1.15.2
libusb,0.1.12
libfontenc,1.4.2
'''
if len(sys.argv) !=2:
print ''
print 'Usage: ', sys.argv[0], '/path/to/csv_input_file'
print ''
sys.exit(1)
if not os.path.isfile(sys.argv[1]):
print ''
print sys.argv[1], 'not found!'
print ''
sys.exit(1)
else:
input_csv = sys.argv[1]
pkgRequired = csv.reader(open(input_csv),delimiter=',')
pkgInstalledName = []
pkgInstalledVersion = []
pkgInstalledArch = []
ts = rpm.TransactionSet()
mi = ts.dbMatch()
for h in mi:
pkgInstalledName.append((h['name']))
pkgInstalledVersion.append((h['version']))
pkgInstalledArch.append((h['arch']))
for row in pkgRequired:
pkgRequiredName = row[0]
pkgRequiredVersion = row[1]
#pkgRequiredArch = row[2]
if pkgRequiredName in pkgInstalledName:
if pkgInstalledVersion[pkgInstalledName.index(pkgRequiredName)] >= pkgRequiredVersion:
pass
else:
print '\nInstalled: ',pkgInstalledName[pkgInstalledName.index(pkgRequiredName)], pkgInstalledVersion[pkgInstalledName.index(pkgRequiredName)], pkgInstalledArch[pkgInstalledName.index(pkgRequiredName)], ' \nRequired: ', ' ', pkgRequiredName,pkgRequiredVersion

Assuming that there's no problem with the way that you're reading the list of installed packages (I'm not familiar with the rpm module), then your only problem is with using the index() function. This function return the first occurrence of an item with the specified value - and it isn't what you want.
A correct implementation (which is also much more efficient) would be:
installedPackages = {} #create a hash table, mapping package names to LISTS of installed package versions and architectures
for h in mi:
l = installedPackages.get(h['name'], list()) #return either the existing list, or a new one if this is the first time that the name appears.
l.append( (h['version'], h['arch']) )
...
if requiredPackageName in installedPackages:
for ver, arch in installedPackages[requiredPackageName]: print ...

This is what I ended up doing to get this working. The script currently is not checking for the architecture of required packages, but at least it shows the arch installed. The script works (as far as I know) but can be improved as its my first at python :)
#!/usr/bin/python
import rpm
import csv
import sys
import os
'''
Script to check installed rpms against a csv file containing the package name and version similar to the list below:
atk,1.12.2
libart_lgpl,2.3
info,4.9
libsepol,1.15.2
libusb,0.1.12
libfontenc,1.4.2
'''
#silverbullet - 20120301
if len(sys.argv) !=2:
print ''
print 'Usage: ', sys.argv[0], '/path/to/csv_input_file'
print ''
sys.exit(1)
if not os.path.isfile(sys.argv[1]):
print ''
print sys.argv[1], 'not found!'
print ''
sys.exit(1)
else:
input_csv = sys.argv[1]
pkgRequired = csv.reader(open(input_csv),delimiter=',')
pkgInstalledName = []
pkgInstalledVersion = []
pkgInstalledArch = []
ts = rpm.TransactionSet()
mi = ts.dbMatch()
for h in mi:
pkgInstalledName.append((h['name']))
pkgInstalledVersion.append((h['version']))
pkgInstalledArch.append((h['arch']))
for row in pkgRequired:
try:
pkgRequiredName = row[0]
pkgRequiredVersion = row[1]
#pkgRequiredArch = row[2] - This is not implemented yet, ie, script will ignore architecture in csv input file
except:
print "Unexpected Error. Check if input is csv format with no blank lines. "#, sys.exc_info()[1]
break
else:
for pos, pkg in enumerate(pkgInstalledName):
if pkg == pkgRequiredName:
if pkgInstalledVersion[pos] >= pkgRequiredVersion:
pass
else:
print '\nInstalled:', pkgInstalledName[pos], pkgInstalledVersion[pos], pkgInstalledArch[pos], '\nRequired: ', pkg, pkgRequiredVersion

Related

Python string comparison using pymarc marc8_to_unicode no longer working

My code imports a MARC file using MARCReader and compares a string against a list of acceptable answers. If the string from MARC has no match in my list, it gets added to an error list. This has worked for years in Python 2.7.4 installations on Windows 7 with no issue. I recently got a Windows 10 machine and installed Python 2.7.10, and now strings with non-standard characters fail that match. the issue is not Python 2.7.10 alone; I've installed every version from 2.7.4 through 2.7.10 on this new machine, and get the same problem. A new install of Python 2.7.10 on a Windows 7 machine also gets the problem.
I've trimmed out functions that aren't relevant, and I've dramatically trimmed the master list. In this example, "Académie des Sciences" is an existing repository, but "Acadm̌ie des Sciences" now appears in our list of new repositories.
# -*- coding: utf-8 -*-
from aipmarc import get_catdb, get_bibno, parse_date
from phfawstemplate import browsepage #, nutchpage, eadpage, titlespage
from pymarc import MARCReader, marc8_to_unicode
from time import strftime
from umlautsort import alafiling
import urllib2
import sys
import os
import string
def make_newrepos_list(list, fn): # Create list of unexpected repositories found in the MArcout database dump
output = "These new repositories are not yet included in the master list in phfaws.py. Please add the repository code (in place of ""NEWCODE*""), and the URL (in place of ""TEST""), and then add these lines to phfaws.py. Please keep the list alphabetical. \nYou can find repository codes at http://www.loc.gov/marc/organizations/ \n \n"
for row in list:
output = '%s reposmasterlist.append([u"%s", "%s", "%s"])\n' % (output, row[0], row[1], row[2])
fh = open(fn,'w')
fh.write(output.encode("utf-8"))
fh.close()
def main(marcfile):
reader = MARCReader(file(marcfile))
'''
Creating list of preset repository codes.
'''
reposmasterlist =[[u"American Institute of Physics", "MdCpAIP", "http://www.aip.org/history/nbl/index.html"]]
reposmasterlist.append([u"Académie des Sciences", "FrACADEMIE", "http://www.academie-sciences.fr/fr/Transmettre-les-connaissances/inventaires-des-fonds-d-archives-personnelles.html"])
reposmasterlist.append([u"American Association for the Advancement of Science", "daaas", "http://archives.aaas.org/"])
newreposcounter = 0
newrepos = ""
newreposlist = []
findingaidcounter = 0
reposcounter = 0
for record in reader:
if record['903']: # Get only records where 903a="PHFAWS"
phfawsfull = record.get_fields('903')
for field in phfawsfull:
phfawsnote = field['a']
if 'PHFAWS' in phfawsnote:
if record['852'] is not None: # Get only records where 852/repository is not blank
repository = record.get_fields('852')
for field in repository:
reposname = field['a']
reposname = marc8_to_unicode(reposname) # Convert repository name from MARC file to Unicode
reposname = reposname.rstrip('.,')
reposcode = None
reposurl = None
for row in reposmasterlist: # Match field 852 repository against the master list.
if row[0] == reposname: # If it's in the master list, use the master list to populate our repository-related fields
reposcode = row[1]
reposurl = row[2]
if record['856'] is not None: # Get only records where 856 is not blank and includes "online finding aid"
links = record.get_fields('856')
for field in links:
linksthree = field['3']
if linksthree is not None and "online finding aid" in linksthree:
if reposcode == None: # If this record's repository wasn't in the master list, add to list of new repositories
newreposcounter += 1
newrepos = '%s %s \n' % (newrepos, reposname)
reposcode = "NEWCODE" + str(newreposcounter)
reposurl = "TEST"
reposmasterlist.append([reposname, reposcode, reposurl])
newreposlist.append([reposname, reposcode, reposurl])
human_url = field['u']
else:
pass
else:
pass
else:
pass
else:
pass
else:
pass
# Output list of new repositories
newreposlist.sort(key = lambda rep: rep[0])
if newreposcounter != 0:
status = '%d new repositories found. you must add information on these repositories, then run phfaws.py again. Please see the newly updated rewrepos.txt for details.' % (newreposcounter)
sys.stderr.write(status)
make_newrepos_list(newreposlist, 'newrepos.txt')
if __name__ == '__main__':
try:
mf = sys.argv[1]
sys.exit(main(mf))
except IndexError:
sys.exit('Usage: %s <marcfile>' % sys.argv[0])
Edit: I've found that simply commenting out the "reposname = marc8_to_unicode(reposname)" line gets me the results I want. I still don't understand why this is, since it was a necessary step before.

Edit: I've found that simply commenting out the "reposname = marc8_to_unicode(reposname)" line gets me the results I want. I still don't understand why this is, since it was a necessary step before.
This suggests to me that the encoding of strings in your database changed from MARC8 to Unicode. Have you upgraded your cataloging system recently?

Print new line not working in Python

The \n just doesn't seem to work for me when I use it with print. I am using Python 2.7.8. I don't get whats wrong, I think \n with a print should print a new line very straight forwardly.
import sys
import os
import subprocess
from collections import OrderedDict
import xmlrpclib
import hawkey
op_name = sys.argv[1]
pkg_name = sys.argv[2]
# Hawkey Configurations
sack = hawkey.Sack()
path = "/home/thejdeep/test_repo/repodata/%s"
repo = hawkey.Repo("test")
repo.repomd_fn = path % "repomd.xml"
repo.primary_fn = path % "b6f6911f7d9fb63f001388f1ecd0766cec060c1d04c703c6a74969eadc24ec97-primary.xml.gz"
repo.filelists_fn = path % "df5897ed6d3f87f2be4432543edf2f58996e5c9e6a7acee054f9dbfe513df4da-filelists.xml.gz"
sack.load_repo(repo,load_filelists=True)
# Main Function
if __name__ == "__main__":
print "Querying the repository\n"
print "-----------------------\n"
print "Found packages :\n"
print "--------------\n"
q = hawkey.Query(sack)
q = q.filter(name=pkg_name,latest_per_arch=True)[0]
if q:
for pkg in q:
print str(pkg)
else:
print "No packages with name "+pkg_name+" found. Exiting"
sys.exit()
print "--------------------"
print "Performing Dependency Check"
Output is something like this. Basically its printing in the same line :
Querying the repository ----------------------- Found packages : --------------

Using print method automatically add \n at the end of the line so its not necessary put \n at the end of each line.

Python refresh file from disk

I have a python script that calls a system program and reads the output from a file out.txt, acts on that output, and loops. However, it doesn't work, and a close investigation showed that the python script just opens out.txt once and then keeps on reading from that old copy. How can I make the python script reread the file on each iteration? I saw a similar question here on SO but it was about a python script running alongside a program, not calling it, and the solution doesn't work. I tried closing the file before looping back but it didn't do anything.
EDIT:
I already tried closing and opening, it didn't work. Here's the code:
import subprocess, os, sys
filename = sys.argv[1]
file = open(filename,'r')
foo = open('foo','w')
foo.write(file.read().rstrip())
foo = open('foo','a')
crap = open(os.devnull,'wb')
numSolutions = 0
while True:
subprocess.call(["minisat", "foo", "out"], stdout=crap,stderr=crap)
out = open('out','r')
if out.readline().rstrip() == "SAT":
numSolutions += 1
clause = out.readline().rstrip()
clause = clause.split(" ")
print clause
clause = map(int,clause)
clause = map(lambda x: -x,clause)
output = ' '.join(map(lambda x: str(x),clause))
print output
foo.write('\n'+output)
out.close()
else:
break
print "There are ", numSolutions, " solutions."

You need to flush foo so that the external program can see its latest changes. When you write to a file, the data is buffered in the local process and sent to the system in larger blocks. This is done because updating the system file is relatively expensive. In your case, you need to force a flush of the data so that minisat can see it.
foo.write('\n'+output)
foo.flush()

I rewrote it to hopefully be a bit easier to understand:
import os
from shutil import copyfile
import subprocess
import sys
TEMP_CNF = "tmp.in"
TEMP_SOL = "tmp.out"
NULL = open(os.devnull, "wb")
def all_solutions(cnf_fname):
"""
Given a file containing a set of constraints,
generate all possible solutions.
"""
# make a copy of original input file
copyfile(cnf_fname, TEMP_CNF)
while True:
# run minisat to solve the constraint problem
subprocess.call(["minisat", TEMP_CNF, TEMP_SOL], stdout=NULL,stderr=NULL)
# look at the result
with open(TEMP_SOL) as result:
line = next(result)
if line.startswith("SAT"):
# Success - return solution
line = next(result)
solution = [int(i) for i in line.split()]
yield solution
else:
# Failure - no more solutions possible
break
# disqualify found solution
with open(TEMP_CNF, "a") as constraints:
new_constraint = " ".join(str(-i) for i in sol)
constraints.write("\n")
constraints.write(new_constraint)
def main(cnf_fname):
"""
Given a file containing a set of constraints,
count the possible solutions.
"""
count = sum(1 for i in all_solutions(cnf_fname))
print("There are {} solutions.".format(count))
if __name__=="__main__":
if len(sys.argv) == 2:
main(sys.argv[1])
else:
print("Usage: {} cnf.in".format(sys.argv[0]))

You take your file_var and end the loop with file_var.close().
for ... :
ga_file = open(out.txt, 'r')
... do stuff
ga_file.close()
Demo of an implementation below (as simple as possible, this is all of the Jython code needed)...
__author__ = ''
import time
var = 'false'
while var == 'false':
out = open('out.txt', 'r')
content = out.read()
time.sleep(3)
print content
out.close()
generates this output:
2015-01-09, 'stuff added'
2015-01-09, 'stuff added' # <-- this is when i just saved my update
2015-01-10, 'stuff added again :)' # <-- my new output from file reads
I strongly recommend reading the error messages. They hold quite a lot of information.
I think the full file name should be written for debug purposes.

"Not implemented" Exception when using pywin32 to control Adobe Acrobat

I have written a script in python using pywin32 to save pdf files to text that up until recently was working fine. I use similar methods in Excel. The code is below:
def __pdf2Txt(self, pdf, fileformat="com.adobe.acrobat.accesstext"):
outputLoc = os.path.dirname(pdf)
outputLoc = os.path.join(outputLoc, os.path.splitext(os.path.basename(pdf))[0] + '.txt')
try:
win32com.client.gencache.EnsureModule('{E64169B3-3592-47d2-816E-602C5C13F328}', 0, 1, 1)
adobe = win32com.client.DispatchEx('AcroExch.App')
pdDoc = win32com.client.DispatchEx('AcroExch.PDDoc')
pdDoc.Open(pdf)
jObject = pdDoc.GetJSObject()
jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
except:
traceback.print_exc()
return False
finally:
del jObject
pdDoc.Close()
del pdDoc
adobe.Exit()
del adobe
However this code has suddenly stopped working and I get the following output:
Traceback (most recent call last):
File "C:\Documents and Settings\ablishen\workspace\HooverKeyCreator\src\HooverKeyCreator.py", line 38, in __pdf2Txt
jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 505, in __getattr__
ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147467263, 'Not implemented', None, None)
False
I have similar code written in VB that works correctly so I'm guessing that it has something to do with the COM interfaces not binding to the appropriate functions correctly? (my COM knowledge is patchy).

Blish, this thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html
I admit that the post above is not the easiest to find (probably because Google scores it low based on the age of the content?).
Specifically, applying this piece of advice will get things running for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html
For reference, the complete piece of code that does not require you to manually patch dynamic.py (snippet should run pretty much out of the box):
# gets all files under ROOT_INPUT_PATH with FILE_EXTENSION and tries to extract text from them into ROOT_OUTPUT_PATH with same filename as the input file but with INPUT_FILE_EXTENSION replaced by OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT
import winerror
# try importing scandir and if found, use it as it's a few magnitudes of an order faster than stock os.walk
try:
from scandir import walk
except ImportError:
from os import walk
import fnmatch
import sys
import os
ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".txt"
def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat
# Open the input file (as a pdf)
ret = avDoc.Open(f_path, f_path)
assert(ret) # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practise?
pdDoc = avDoc.GetPDDoc()
dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))
# Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
jsObject = pdDoc.GetJSObject()
# Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")
pdDoc.Close()
avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs)
del pdDoc
if __name__ == "__main__":
assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>
#$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'
ROOT_INPUT_PATH = sys.argv[1]
INPUT_FILE_EXTENSION = sys.argv[2]
ROOT_OUTPUT_PATH = sys.argv[3]
OUTPUT_FILE_EXTENSION = sys.argv[4]
# tuples are of schema (path_to_file, filename)
matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))
# patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
global ERRORS_BAD_CONTEXT
ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)
for filename_with_path, filename_without_extension in matching_files:
print "Processing '{}'".format(filename_without_extension)
acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)
I have tested this on WinPython x64 2.7.6.3, Acrobat X Pro

makepy.py is a script that comes with the win32com python package.
Running it for your installation "wires" python into the COM/OLE object in Windows. The following is an excerpt of some code I used to talk to Excel and do some stuff in it. This example gets the name of sheet 1 in the current workbook. It automatically runs makepy if it has an exception:
import win32com;
import win32com.client;
from win32com.client import selecttlb;
def attachExcelCOM():
makepyExe = r'python C:\Python25\Lib\site-packages\win32com\client\makepy.py';
typeList = selecttlb.EnumTlbs();
for tl in typeList:
if (re.match('^Microsoft.*Excel.*', tl.desc, re.IGNORECASE)):
makepyCmd = "%s -d \"%s\"" % (makepyExe, tl.desc);
os.system(makepyCmd);
# end if
# end for
# end def
def getSheetName(sheetNum):
try:
xl = win32com.client.Dispatch("Excel.Application");
wb = xl.Workbooks.Item(sheetNum);
except Exception, detail:
print 'There was a problem attaching to Excel, refreshing connect config...';
print Exception, str(detail);
attachExcelCOM();
try:
xl = win32com.client.Dispatch("Excel.Application");
wb = xl.Workbooks.Item(sheetNum);
except:
print 'Could not attach to Excel...';
sys.exit(-1);
# end try/except
# end try/except
wsName = wb.Name;
if (wsName == 'PERSONAL.XLS'):
return( None );
# end if
print 'The target worksheet is:';
print ' ', wsName;
print 'Is this correct? [Y/N]',;
answer = string.strip( sys.stdin.readline() );
answer = answer.upper();
if (answer != 'Y'):
print 'Sheet not identified correctly.';
return(None);
# end if
return( (wb, wsName) );
# end def
# -- Main --
sheetInfo = getSheetName(sheetNum);
if (sheetInfo == None):
print 'Sheet not found';
sys.exit(-1);
else:
(wb, wsName) = sheetInfo;
# end if

IOError opening an existing file with Python

Running the following code:
import os
import datetime
import ftplib
currdate = datetime.datetime.now()
formatdate = currdate.strftime("%m-%d-%Y %H%M")
def log():
fqn = os.uname()[1]
ext_ip = urllib2.urlopen('http://whatismyip.org').read()
log = open ('/Users/admin/Documents/locatelog.txt','w')
log.write(str("Asset: %s " % fqn))
log.write(str("Checking in from IP#: %s" % ext_ip))
smush = str(fqn +' # ' + formatdate)
os.rename('/Users/admin/Documents/locatelog.txt','/Users/admin/Documents/%s.txt' % smush )
s = ftplib.FTP('10.7.1.71','username','password')
f = open('/Users/admin/Documents/%s.txt' % smush,'r')
s.storbinary("STOR /Users/admin/Documents/%s.txt" % smush,f)
Generates the following error:
ftplib.error_perm: 550 /Users/admin/Documents/678538.local # 02-24-2010 1301.txt: No such file or directory
I have a feeling something is amiss in this line :
s.storbinary("STOR /Users/admin/Documents/%s.txt" % smush,f)
678538 is the host I am testing on...using Mac OS X 10.5 and Python 2.5.1

Shouldn't it bef = open('/Users/admin/Documents/%s.txt' % smush,'r') ? notice the / in front of Users
If you dont put the first /, the script will think the path to the file is relative to the current directory (where the script is run from)
Edit:
I m not too familiar with Python (I wish) but shouldnt it be:
s.storbinary('STOR /Users/admin/Documents/%s.txt' % smush,f) ?
In your example, Python will treat your string as literal and you want to interpolate the value of smush with %s
Edit 2:
Does the directory /Users/admin/Documents/ exist on your server? If not, I think you will have to create them before copying anything. (Since the error message is about some files/folders missing).
You can create them yourself first. Run your script. If the file is copied successfully, then you can add the creation of the directories from within your script.

remove all spaces from file name .eg in smush = str(fqn +' # ' + formatdate), you are putting a space in front of and after "#". you path looks like
/Users/admin/Documents/something # something
and when you pass it to ftplib, it may have problem. another way is to try putting quotes, eg
s.storbinary("STOR '/Users/admin/Documents/%s.txt'" % smush,f)

Edit:
This version works: Problem was that I was writing to buffer, and not to file.
import os
import urllib2
import datetime
import ftplib
currdate = datetime.datetime.now()
formatdate = currdate.strftime("%m-%d-%Y-%H%M")
def log():
fqn = os.uname()[1]
ext_ip = urllib2.urlopen('http://whatismyip.org').read()
smush = str(fqn + formatdate)
s = ftplib.FTP('10.7.1.71','username','password')
f = open('/Users/admin/Documents/%s.txt' % smush,'w')
f.write(str("Asset: %s " % fqn))
f.write('\n')
f.write(str("Checking in from IP#: %s" % ext_ip))
f.write('\n')
f.write(str("On: %s" % formatdate))
f.close
f = open('/Users/admin/Documents/%s.txt' % smush,'rb')
s.storbinary('STOR %s.txt' % smush , f)
s.close
f.close

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Compare RPM Packages using Python - python

Related

Python string comparison using pymarc marc8_to_unicode no longer working

Print new line not working in Python

Python refresh file from disk

"Not implemented" Exception when using pywin32 to control Adobe Acrobat

IOError opening an existing file with Python

Categories

Resources