How to filter command line output? - python

I need to filter the output of a command executed in network equipment in order to bring only the lines that match a text like '10.13.32.34'.
I created a python code that brings all the output of the command but I need only part of this.
I am using Python 3.7.3 running on a Windows 10 Pro.
The code I used is below and I need the filtering part because I am a network engineer without the basic notion of python programming. (till now...)
from steelscript.steelhead.core import steelhead
from steelscript.common.service import UserAuth
auth = UserAuth(username='admin', password='password')
sh = steelhead.SteelHead(host='em01r001', auth=auth)
from steelscript.cmdline.cli import CLIMode
sh.cli.exec_command("show connections optimized", mode=CLIMode.CONFIG)
output = (sh.cli.exec_command("show connections optimized"))

I have no idea what your output looks like, so I used the text in your question as example data. For a simple pattern such as the one shown in your question, you could do it like this:
output = '''\
I need to filter the output of a command executed
in network equipment in order to bring only the
lines that match a text like '10.13.32.34'. I
created a python code that brings all the output
of the command but I need only part of this.
I am using Python 3.7.3 running on a Windows 10
Pro.
The code I used is below and I need the filtering
part because I am a network engineer without the
basic notion of python programming. (till now...)
'''
# Keep only the lines of output that contain the address.
# Joining with '\n' keeps each matching line on its own line.
filtered = '\n'.join(line for line in output.splitlines()
                     if '10.13.32.34' in line)
print(filtered)  # -> lines that match a text like '10.13.32.34'. I
You could do something similar for more complex patterns by using the re.search() function in Python's built-in regular expression re module. Using regular expressions is more complicated, but extremely powerful. There are many tutorials on using them, including one in Python's own documentation titled the Regular Expression HOWTO.
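As a rough sketch of the regular-expression variant, the function below keeps only the lines that contain a given IPv4 address as a whole address, so that '10.13.32.34' does not also match '10.13.32.345'. The name filter_lines and the sample text are made up for illustration; the real output of "show connections optimized" will look different.

```python
import re


def filter_lines(text, address):
    """Return the lines of text that contain address as a complete IPv4 address."""
    # re.escape() makes the dots literal. The lookarounds reject matches that
    # are part of a longer number or address, e.g. '10.13.32.345' or '110.13.32.34'.
    pattern = re.compile(
        r'(?<!\d)(?<!\d\.)' + re.escape(address) + r'(?!\.?\d)'
    )
    return [line for line in text.splitlines() if pattern.search(line)]


# Hypothetical sample text, not real SteelHead output.
sample = """\
T  10.13.32.34:51234   192.168.1.10:443   HTTPS
T  10.13.32.345:51235  192.168.1.11:443   HTTPS
T  10.99.0.7:40000     10.13.32.34:22     SSH
"""

for line in filter_lines(sample, '10.13.32.34'):
    print(line)
```

With the real device you would pass the string returned by exec_command instead of sample.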

Related

How to Extract Versions from Software Packages

I'm trying to extract the version number from software packages hosted on SourceForge based on this Stack Overflow post. Specifically, I'm using the Release API and the "best_release.json" call. I have the following examples:
7-zip: https://sourceforge.net/projects/sevenzip/best_release.json
KeePass: https://sourceforge.net/projects/keepass/best_release.json
OpenOffice.org:
https://sourceforge.net/projects/openofficeorg.mirror/best_release.json
Using the following code snippet:
import requests
"""
Un/comment the following lines to change the project name and test
different responses.
"""
proj = "keepass"
# proj = "sevenzip"
# proj = "openofficeorg.mirror"
r = requests.get(f'https://sourceforge.net/projects/{proj}/best_release.json')
json_resp = r.json()
print(json_resp['release']['filename'])
I receive the respective results for each package:
7-Zip: /7-Zip/22.00/7z2200-linux-x86.tar.xz
KeePass: /KeePass 2.x/2.51.1/KeePass-2.51.1.zip
Openoffice.org: /extended/iso/en/OOo_3.3.0_Win_x86_install_en-US_20110219.iso
I'm wondering how I can extract the file versions from these disparate packages. Looking at the results, one can see that there are different naming conventions. For example, 7-Zip puts the file version as "22.00" in the second directory level. KeePass, however, puts it in the second directory level as well as the filename itself. OpenOffice.org puts it inside the filename.
Is there a way to do some sort of fuzzy match that can attempt to extract a "best guess" file version given a filename?
I thought of using regular expressions, re. For example, I can use the (\d+) capture group to capture one or more digits, as demonstrated here. However, this would also capture text such as "x86," which I don't want. I just desire some text that looks closest to a version number, but I'm unsure how to do this.
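One possible heuristic, sketched under the assumption that a version number is written as digit groups separated by dots: collect every dotted-number candidate and keep the one with the most parts. The names VERSION_RE and guess_version are hypothetical, and this is a best guess rather than a general solution; it ignores undotted forms such as "2200" and could be fooled by a dotted date.

```python
import re

# Two or more digit groups joined by dots, e.g. "22.00" or "2.51.1".
VERSION_RE = re.compile(r'\d+(?:\.\d+)+')


def guess_version(path):
    """Return the most version-like substring of path, or None if there is none."""
    candidates = VERSION_RE.findall(path)
    if not candidates:
        return None
    # Heuristic (an assumption): prefer the candidate with the most dotted
    # parts; max() returns the first such candidate when several tie.
    return max(candidates, key=lambda c: c.count('.'))


for name in [
    '/7-Zip/22.00/7z2200-linux-x86.tar.xz',
    '/KeePass 2.x/2.51.1/KeePass-2.51.1.zip',
    '/extended/iso/en/OOo_3.3.0_Win_x86_install_en-US_20110219.iso',
]:
    print(guess_version(name))
```

Because a bare "86" or "20110219" has no dot followed by more digits, text such as "x86" is never picked up as a candidate.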

bibtex to html with pybtex, python 3

I want to take a file of one or more bibtex entries and output it as an html-formatted string. The specific style is not so important, but let's just say APA. Basically, I want the functionality of bibtex2html but with a Python API since I'm working in Django. A few people have asked similar questions here and here. I also found someone who provided a possible solution here.
The first issue I'm having is pretty basic, which is that I can't even get the above solutions to run. I keep getting errors similar to ModuleNotFoundError: No module named 'pybtex.database'; 'pybtex' is not a package. I definitely have pybtex installed and can make basic API calls in the shell no problem, but whenever I try to import pybtex.database.whatever or pybtex.plugin I keep getting ModuleNotFound errors. Is it maybe a python 2 vs python 3 thing? I'm using the latter.
The second issue is that I'm having trouble understanding the pybtex python API documentation. Specifically, from what I can tell it looks like the format_from_string and format_from_file calls are designed specifically for what I want to do, but I can't seem to get the syntax correct. Specifically, when I do
pybtex.format_from_file('foo.bib',style='html')
I get pybtex.plugin.PluginNotFound: plugin pybtex.style.formatting.html not found. I think I'm just not understanding how the call is supposed to work, and I can't find any examples of how to do it properly.
Here's a function I wrote for a similar use case--incorporating bibliographies into a website generated by Pelican.
from pybtex.plugin import find_plugin
from pybtex.database import parse_string
APA = find_plugin('pybtex.style.formatting', 'apa')()
HTML = find_plugin('pybtex.backends', 'html')()
def bib2html(bibliography, exclude_fields=None):
    exclude_fields = exclude_fields or []
    if exclude_fields:
        # Work on a copy so the caller's bibliography is left untouched.
        bibliography = parse_string(bibliography.to_string('bibtex'), 'bibtex')
        for entry in bibliography.entries.values():
            for ef in exclude_fields:
                if ef in entry.fields.__dict__['_dict']:
                    del entry.fields.__dict__['_dict'][ef]
    formattedBib = APA.format_bibliography(bibliography)
    return "<br>".join(entry.text.render(HTML) for entry in formattedBib)
Make sure you've installed the following:
pybtex==0.22.2
pybtex-apa-style==1.3

Get output from SyntaxNet as python object, not text

After running one of the example SyntaxNet scripts (such as parse.sh), I receive output in text CoNLL format. My goal is to take some of the features and pass them on to another network. One possible choice is to parse the text output into a Python object with something like nltk.corpus.reader.ConllCorpusReader. But what I am really interested in is:
Is it possible, with some code modification, to get a Python object representing the parse results from SyntaxNet instead of text?
I've found that in parser_eval.py, on lines 133-138, SyntaxNet already fetches the text version of the results.
while True:
    tf_eval_epochs, tf_eval_metrics, tf_documents = sess.run([
        parser.evaluation['epochs'],
        parser.evaluation['eval_metrics'],
        parser.evaluation['documents'],
    ])
But I cannot find which object this text is generated from, or how.
There are many ways to do it, and as far as I know they all involve parsing the output of SyntaxNet and loading it into NLTK objects. I wrote a short post on my blog that walks through an example:
http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/
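If pulling in NLTK is not desirable, the text output can also be parsed by hand. The sketch below assumes the output follows the ten-column, tab-separated CoNLL-X layout with blank lines between sentences and integer ID and HEAD columns; Token and parse_conll are hypothetical names, and the sample sentence is invented.

```python
from collections import namedtuple

# The ten CoNLL-X columns, in order, with lower-cased names.
Token = namedtuple(
    'Token',
    'id form lemma cpostag postag feats head deprel phead pdeprel',
)


def parse_conll(text):
    """Parse CoNLL-X text into a list of sentences, each a list of Token."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split('\t')
        if len(fields) != 10:
            continue  # skip log lines or anything that is not a token row
        token = Token(*fields)
        # Assumption: ID and HEAD are plain integers, so convert them.
        current.append(token._replace(id=int(token.id), head=int(token.head)))
    if current:
        sentences.append(current)
    return sentences


# Invented two-token sentence in CoNLL-X form.
sample = (
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tsleeps\t_\tVERB\tVBZ\t_\t0\tROOT\t_\t_\n"
    "\n"
)

for token in parse_conll(sample)[0]:
    print(token.form, token.postag, token.head, token.deprel)
```

The resulting tokens are ordinary Python objects, so selected fields can be turned into features for the next network without going through text again.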

Regular Expression Python Variable

I have data like this:
>Px016979
MSPWMKKVFLQCMPKLLMMRRTKYSLPDYDDTFVSNGYTNELEMSRDSLT
DAFGNSKEDSGDYRKSPAPEDDMVGAGAYQRPSVTESENMLPRHLSPEVA
AALQSVRFIAQHIKDADKDNEVVEDWKFMSMVLDRFFLWLFTIACFVGTF
GIIFQSPSLYDTRVPVDQQISSIPMRKNNFFYPKDIETIGIIS
>Px016980
MQFIKKVLLIALTLSGAMGISREKRGLIFPPTSLYGTFLAIAVPIDIPDK
NVFVSYNFESNYSTLNNITEIDEVLFPNLPVVTARHSRSITRELAYTVLE
TKFKEHGLGGRECLLRNICEAAETPLHHNGLLGHIMHIVFTPSSSAEEGL
DDEYYEAEASGRAGSCARYEELCPVGLFDLITRIVEFKHT
>Px002185
MLSPSVAIKVQVLYIGKVRISQRKVPDTLIDDALVKFVHHEAEKVKANML
RRHSLLSSTGTSIYSSESAENLNEDKTKTDTSEHNIFLMMLLRAHCEAKQ
LRHVHDTAENRTEFLNQYLGGSTIFMKAKRSLSSGFDQLLKRKSSRDEGS
GLVLPVKKVT
>Px006321
MFPGRTIGIMITASHNLEPDNGVKLVDPDGEMLDGSWEEIATRMANVRYL
PMSLITKFLVNSYY
What I want to do is this: given an identifier such as >Px016979, get the data below it, like this:
>Px016979
MSPWMKKVFLQCMPKLLMMRRTKYSLPDYDDTFVSNGYTNELEMSRDSLT
DAFGNSKEDSGDYRKSPAPEDDMVGAGAYQRPSVTESENMLPRHLSPEVA
AALQSVRFIAQHIKDADKDNEVVEDWKFMSMVLDRFFLWLFTIACFVGTF
GIIFQSPSLYDTRVPVDQQISSIPMRKNNFFYPKDIETIGIIS
I am new with Python.
#coding:utf-8
import os,re
a = """
>Px016979
MSPWMKKVFLQCMPKLLMMRRTKYSLPDYDDTFVSNGYTNELEMSRDSLT
DAFGNSKEDSGDYRKSPAPEDDMVGAGAYQRPSVTESENMLPRHLSPEVA
AALQSVRFIAQHIKDADKDNEVVEDWKFMSMVLDRFFLWLFTIACFVGTF
GIIFQSPSLYDTRVPVDQQISSIPMRKNNFFYPKDIETIGIIS
>Px016980
MQFIKKVLLIALTLSGAMGISREKRGLIFPPTSLYGTFLAIAVPIDIPDK
NVFVSYNFESNYSTLNNITEIDEVLFPNLPVVTARHSRSITRELAYTVLE
TKFKEHGLGGRECLLRNICEAAETPLHHNGLLGHIMHIVFTPSSSAEEGL
DDEYYEAEASGRAGSCARYEELCPVGLFDLITRIVEFKHT"
>Px002185
MLSPSVAIKVQVLYIGKVRISQRKVPDTLIDDALVKFVHHEAEKVKANML
RRHSLLSSTGTSIYSSESAENLNEDKTKTDTSEHNIFLMMLLRAHCEAKQ
LRHVHDTAENRTEFLNQYLGGSTIFMKAKRSLSSGFDQLLKRKSSRDEGS
GLVLPVKKVT
>Px006321
MFPGRTIGIMITASHNLEPDNGVKLVDPDGEMLDGSWEEIATRMANVRYL
PMSLITKFLVNSYY
"""
b = '>Px016979'
matchbj = re.match( r'$b(.*?)>',a,re.M|re.I)
print matchbj.group()
My code does not work. I have two questions:
I think my data contains carriage returns, so my code can't work.
I don't know how to use variables in a Python regular expression. If I write re.match( r'>Px016979(.*?)>',a,re.M|re.I) it can work, but I need to use variables.
Thanks.
It looks like your data is a FASTA file with protein sequences. So instead of using regular expressions, you should consider installing BioPython. That is a library specifically for bioinformatics use and research.
The goal of Biopython is to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and classes. Biopython features include parsers for various Bioinformatics file formats (BLAST, Clustalw, FASTA, Genbank,...), access to online services (NCBI, Expasy,...), interfaces to common and not-so-common programs (Clustalw, DSSP, MSMS...), a standard sequence class, various clustering modules, a KD tree data structure etc. and even documentation.
Using BioPython, you would extract a sequence from a FASTA file for a given identifier in the following way:
from Bio import SeqIO
input_file = r'C:\path\to\proteins.fasta'
record_id = 'Px016979'
record_dict = SeqIO.to_dict(SeqIO.parse(input_file, 'fasta'))
record = record_dict[record_id]
sequence = str(record.seq)
print(sequence)
The following should work for each of the entries you have:
import re
a = """
>Px016979
MSPWMKKVFLQCMPKLLMMRRTKYSLPDYDDTFVSNGYTNELEMSRDSLT
DAFGNSKEDSGDYRKSPAPEDDMVGAGAYQRPSVTESENMLPRHLSPEVA
AALQSVRFIAQHIKDADKDNEVVEDWKFMSMVLDRFFLWLFTIACFVGTF
GIIFQSPSLYDTRVPVDQQISSIPMRKNNFFYPKDIETIGIIS
>Px016980
MQFIKKVLLIALTLSGAMGISREKRGLIFPPTSLYGTFLAIAVPIDIPDK
NVFVSYNFESNYSTLNNITEIDEVLFPNLPVVTARHSRSITRELAYTVLE
TKFKEHGLGGRECLLRNICEAAETPLHHNGLLGHIMHIVFTPSSSAEEGL
DDEYYEAEASGRAGSCARYEELCPVGLFDLITRIVEFKHT"
>Px002185
MLSPSVAIKVQVLYIGKVRISQRKVPDTLIDDALVKFVHHEAEKVKANML
RRHSLLSSTGTSIYSSESAENLNEDKTKTDTSEHNIFLMMLLRAHCEAKQ
LRHVHDTAENRTEFLNQYLGGSTIFMKAKRSLSSGFDQLLKRKSSRDEGS
GLVLPVKKVT
>Px006321
MFPGRTIGIMITASHNLEPDNGVKLVDPDGEMLDGSWEEIATRMANVRYL
PMSLITKFLVNSYY
"""
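for b in ['>Px016979', '>Px016980', '>Px002185', '>Px006321']:
    # re.escape() lets the variable be used safely inside the pattern;
    # re.S makes '.' match newlines so the match can span several lines.
    re_search = re.search(re.escape(b) + r'(.*?)(?:>|\Z)', a, re.M|re.I|re.S)
    print(re_search.group())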
This will display the following:
>Px016979
MSPWMKKVFLQCMPKLLMMRRTKYSLPDYDDTFVSNGYTNELEMSRDSLT
DAFGNSKEDSGDYRKSPAPEDDMVGAGAYQRPSVTESENMLPRHLSPEVA
AALQSVRFIAQHIKDADKDNEVVEDWKFMSMVLDRFFLWLFTIACFVGTF
GIIFQSPSLYDTRVPVDQQISSIPMRKNNFFYPKDIETIGIIS
>
>Px016980
MQFIKKVLLIALTLSGAMGISREKRGLIFPPTSLYGTFLAIAVPIDIPDK
NVFVSYNFESNYSTLNNITEIDEVLFPNLPVVTARHSRSITRELAYTVLE
TKFKEHGLGGRECLLRNICEAAETPLHHNGLLGHIMHIVFTPSSSAEEGL
DDEYYEAEASGRAGSCARYEELCPVGLFDLITRIVEFKHT"
>
>Px002185
MLSPSVAIKVQVLYIGKVRISQRKVPDTLIDDALVKFVHHEAEKVKANML
RRHSLLSSTGTSIYSSESAENLNEDKTKTDTSEHNIFLMMLLRAHCEAKQ
LRHVHDTAENRTEFLNQYLGGSTIFMKAKRSLSSGFDQLLKRKSSRDEGS
GLVLPVKKVT
>
>Px006321
MFPGRTIGIMITASHNLEPDNGVKLVDPDGEMLDGSWEEIATRMANVRYL
PMSLITKFLVNSYY
I would also consider installing Biopython and checking out the book Python for Biologists, which is free online (http://pythonforbiologists.com/). I have worked with FASTA files a lot, and for a quick and dirty solution you can just use this (leave the rest of the code as is):
matchbj = re.findall(r'>[^>]+', a)
for item in matchbj:
    print(item)
Each match starts at a '>' and runs up to, but not including, the next '>'. It spans several lines because the negated character class [^>] also matches newline characters.
Be advised, this will give the records to you in list form, not as a match object. In my experience, re.match is the first thing people learn, but they are often looking for the effect of re.findall.
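For comparison, here is a sketch that avoids both regular expressions and Biopython by reading the FASTA text into a dictionary keyed by identifier. The function name read_fasta is hypothetical, and the sample reuses two records from the question.

```python
def read_fasta(text):
    """Map each record identifier to its sequence with line breaks removed."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Header line: take the first word after '>' as the identifier.
            words = line[1:].split()
            current_id = words[0] if words else ''
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {name: ''.join(parts) for name, parts in records.items()}


sample = """\
>Px006321
MFPGRTIGIMITASHNLEPDNGVKLVDPDGEMLDGSWEEIATRMANVRYL
PMSLITKFLVNSYY
>Px016979
MSPWMKKVFLQCMPKLLMMRRTKYSLPDYDDTFVSNGYTNELEMSRDSLT
"""

record_id = 'Px006321'  # the identifier is held in an ordinary variable
print(read_fasta(sample)[record_id])
```

Looking a record up by a variable is then a plain dictionary access, which sidesteps the question of how to put a variable into a pattern.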

Extracting data from MS Word

I am looking for a way to extract / scrape data from Word files into a database. Our corporate procedures have Minutes of Meetings with clients documented in MS Word files, mostly due to history and inertia.
I want to be able to pull the action items from these meeting minutes into a database so that we can access them from a web-interface, turn them into tasks and update them as they are completed.
Which is the best way to do this:
VBA macro from inside Word to create CSV and then upload to the DB?
VBA macro in Word with connection to DB (how does one connect to MySQL from VBA?)
Python script via win32com then upload to DB?
The last one is attractive to me as the web-interface is being built with Django, but I've never used win32com or tried scripting Word from python.
EDIT: I've started extracting the text with VBA because it makes it a little easier to deal with the Word Object Model. I am having a problem though - all the text is in Tables, and when I pull the strings out of the CELLS I want, I get a strange little box character at the end of each string. My code looks like:
sFile = "D:\temp\output.txt"
fnum = FreeFile
Open sFile For Output As #fnum
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
    Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
    Assign = Application.ActiveDocument.Tables(2).Cell(n, 3).Range.Text
    Target = Application.ActiveDocument.Tables(2).Cell(n, 4).Range.Text
    If Target = "" Then
        ExportText = ""
    Else
        ExportText = Descr & Chr(44) & Assign & Chr(44) & _
            Target & Chr(13) & Chr(10)
        Print #fnum, ExportText
    End If
Next n
Close #fnum
What's up with the little control character box? Is some kind of character code coming across from Word?
Word has a little marker thingy that it puts at the end of every cell of text in a table.
It is used just like an end-of-paragraph marker in paragraphs: to store the formatting for the entire paragraph.
Just use the Left() function to strip it out, i.e.
Left(Target, Len(Target) - 1)
By the way, instead of
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
    Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
Try this:
For Each row in Application.ActiveDocument.Tables(2).Rows
    Descr = row.Cells(2).Range.Text
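If the cell strings end up in Python (for example via win32com), the same clean-up can be sketched as below. It assumes the end-of-cell marker arrives as a carriage return followed by the BEL character, Chr(13) and Chr(7); the function names and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed end-of-cell marker characters: Chr(13) and Chr(7).
CELL_END = '\r\x07'


def clean_cell_text(text):
    """Strip trailing end-of-cell marker characters from a cell's text."""
    # Note: rstrip() removes any trailing run of these two characters.
    return text.rstrip(CELL_END)


def rows_to_csv(rows):
    """Turn rows of raw cell strings into CSV text, skipping rows with no target."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for descr, assign, target in rows:
        descr, assign, target = map(clean_cell_text, (descr, assign, target))
        if target:  # same intent as the If Target = "" check in the VBA
            writer.writerow([descr, assign, target])
    return buffer.getvalue()


# Hypothetical cell values as they might come back from Range.Text.
raw_rows = [
    ('Send draft, then revise\r\x07', 'Alice\r\x07', '2009-01-15\r\x07'),
    ('No deadline yet\r\x07', 'Bob\r\x07', '\r\x07'),
]
print(rows_to_csv(raw_rows), end='')
```

Using the csv module rather than joining with commas means a description that itself contains a comma is quoted instead of splitting into extra columns.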
Well, I've never scripted Word, but it's pretty easy to do simple stuff with win32com. Something like:
from win32com.client import Dispatch
word = Dispatch('Word.Application')
doc = word.Documents.Open('d:\\stuff\\myfile.doc')
doc.SaveAs(FileName='d:\\stuff\\text\\myfile.txt', FileFormat=2)  # 2 is wdFormatText (plain text)
doc.Close()
word.Quit()
This is untested, but I think something like that will just open the file and save it as plain text; you could then read the text into Python and manipulate it from there. There is probably a way to grab the contents of the file directly, too, but I don't know it offhand. Documentation can be hard to find, but if you've got VBA docs or experience, you should be able to carry them across.
Have a look at this post from a while ago: http://mail.python.org/pipermail/python-list/2002-October/168785.html Scroll down to COMTools.py; there's some good examples there.
You can also run makepy.py (part of the pythonwin distribution) to generate python "signatures" for the COM functions available, and then look through it as a kind of documentation.
You could use OpenOffice. It can open word files, and also can run python macros.
I'd say look at the related questions on the right -->
The top one seems to have some good ideas for going the python route.
How about saving the file as XML, then using Python or something else to pull the data out of it and into the database?
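A sketch of that idea, assuming the minutes can be saved as .docx (the question does not say which format is in use): a .docx file is a zip archive whose body is stored as XML in word/document.xml, so the table text can be read with the standard library alone. The function names are hypothetical, and the demo builds a tiny in-memory archive rather than a real Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def table_rows(document_xml):
    """Return every table in the XML as a list of rows of cell strings."""
    root = ET.fromstring(document_xml)
    tables = []
    for tbl in root.iter(W + 'tbl'):
        rows = []
        for tr in tbl.findall(W + 'tr'):
            # Join all text runs (w:t) found inside each cell (w:tc).
            rows.append([''.join(t.text or '' for t in tc.iter(W + 't'))
                         for tc in tr.findall(W + 'tc')])
        tables.append(rows)
    return tables


def docx_tables(path_or_file):
    """Read word/document.xml out of a .docx and extract its tables."""
    with zipfile.ZipFile(path_or_file) as archive:
        return table_rows(archive.read('word/document.xml'))


# Minimal stand-in for a document with one table of one row and two cells.
demo_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body><w:tbl><w:tr>'
    '<w:tc><w:p><w:r><w:t>Call client</w:t></w:r></w:p></w:tc>'
    '<w:tc><w:p><w:r><w:t>Alice</w:t></w:r></w:p></w:tc>'
    '</w:tr></w:tbl></w:body></w:document>'
)
demo_file = io.BytesIO()
with zipfile.ZipFile(demo_file, 'w') as archive:
    archive.writestr('word/document.xml', demo_xml)
demo_file.seek(0)

print(docx_tables(demo_file))
```

Text read this way carries no end-of-cell marker, and each row is already a list that can be handed to the Django side for insertion into the database.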
It is possible to programmatically save a Word document as HTML and to import the table(s) contained into Access. This requires very little effort.
