Finding IP Addresses in Files using Python

I want to extract the IP addresses present in various files in a folder. I am using Python to invoke OS (Linux) commands such as strings and grep to achieve that.
Here is the code I have-
p1 = subprocess.Popen(['strings -a -f /root/Downloads/Apps/androidproject/*'],shell=True, stdout=subprocess.PIPE)
#p2 = subprocess.Popen(["grep", "-E","10.1.1"],stdin=p1.stdout,stdout=subprocess.PIPE)
p2 = subprocess.Popen(["grep", "-E","'((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'"],stdin=p1.stdout,stdout=subprocess.PIPE)
#p1.stdout.close()
out = p2.communicate()[0]
print "The IPs are:\n",out
The problem I am facing is that Python is not able to evaluate the regular expression I have specified to extract IP addresses from the text. I have verified the following things:
The regular expression is correct and works standalone.
The script is able to find/grep text that is provided explicitly, like the commented-out line in which I look for IPs starting with 10.1.1.
The problem lies in running the regular expression on the strings output (in the code above), and I am unable to figure out what's wrong. Could use some help.
P.S. I am open to learning an easier, better way to achieve this as well if anyone can suggest.
Thanks everyone!
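For comparison, the same extraction can be done entirely in Python without shelling out to strings and grep. This is a minimal sketch, assuming the files under /root/Downloads/Apps/androidproject can be read directly and reusing the same IPv4 pattern from the grep command above:

import os
import re

# Same IPv4 pattern as the grep command above
ip_pattern = re.compile(r'((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)')

folder = '/root/Downloads/Apps/androidproject'
ips = set()
for name in os.listdir(folder):
    path = os.path.join(folder, name)
    if not os.path.isfile(path):
        continue
    with open(path, 'rb') as f:
        data = f.read()
    # group(0) is the full matched address
    for match in ip_pattern.finditer(data):
        ips.add(match.group(0))

print "The IPs are:"
for ip in sorted(ips):
    print ip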

Related

Import multiple files output from bash script into Python lists

I have a bash script that connects to multiple compute nodes and pulls data from each one, depending on some arguments entered after the bash script is called. For simplicity's sake, I'm essentially doing this:
for h in node{0..7}; do ssh $h 'fold -w 80 /program.headers | grep "RA" | head -600 | tr -d "RA =" > '$h'filename'; done
I'm trying to take the 8 files that come out of this (each has 600 pieces of information) and save each one as a list in Python. I then need to manipulate them in Python (split and convert to float) to be able to plot the data with Matplotlib.
For a bash script that only outputs one file, I can easily set a variable to the result of check_output and manipulate it from there:
test = subprocess.check_output("./bashscript")
test_list = test.split()
test = [float(a) for a in test_list]
I am also able to read a saved file from my bash script by using:
test = subprocess.check_output(['cat', '/path/filename'])
test_list = test.split()
test = [float(a) for a in test_list]
The problem is, I'm working with over 80 files after I get all that I need. Is there some way in Python to say, "for every file made, store its contents as a list"?
Instead of capturing data with subprocess, you can use os.popen() to execute scripts. The benefit is that you can read the output of a command/script the same way you read a file, so you can use read() (the whole output as one string), readline() (one line at a time) or readlines() (a list of lines), whichever suits you. Using that, you can execute the script and capture the output like this:
import os
output = os.popen("./bashscript").readlines()  # output now holds the bash script's output, one line per list item
See the documentation of os.popen() for more info on how to use it, and on the difference between read(), readlines(), readline() and xreadlines().
Define a simple interface between your bash script and your python script
It looks like the simple interface used to be a printout of the file, but this solution did not scale to multiple files. Now I recommend that the interface be a printout of the names of the files created. It would look something like this:
filenames = subprocess.check_output("./bashscript").split()
for filename in filenames:
    with open(filename) as file_obj:
        file_data = [float(a) for a in file_obj.readlines()]
It looks like you are unfamiliar with Python but familiar with bash. As a result, you are programming hobbled on bash crutches; instead, embrace Python and use it throughout your application. You probably do not need the bash script at all.
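As a rough sketch of what dropping the bash script could look like, the remote pipeline can be run from Python directly. This assumes passwordless ssh to node0 through node7 and simply reuses the remote command from the question; the node_data name is only illustrative:

import subprocess

# The pipeline that the bash script ran on each node (taken from the question)
remote_cmd = 'fold -w 80 /program.headers | grep "RA" | head -600 | tr -d "RA ="'

node_data = {}
for h in ['node%d' % i for i in range(8)]:
    # Capture the remote command's stdout directly instead of writing
    # a file per node and reading it back afterwards
    out = subprocess.check_output(['ssh', h, remote_cmd])
    node_data[h] = [float(a) for a in out.split()]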

Python Parallel SSH get only command output

I'm new to Python and I'm looking to run multiple parallel SSH connections and commands to devices. I'm using pssh for it.
The issue is that the device returns a big header, some 20-30 lines, after the connection.
When I use the code below, what is printed is the result of the command, but at the top there is also the big header that is printed after login.
hosts = ['XX.XXX.XX.XXX']
client = ParallelSSHClient(hosts, user='XXXX', password='XXXXX')
output = client.run_command('command')
for host in output:
    for line in output[host]['stdout']:
        print line
Any way I can get JUST the command output?
Not sure I understand what you mean.
I'm using pssh as well, and it seems like I'm using the same method as you to print my command's output, see below:
client = pssh.ParallelSSHClient(nodes, pool_size=args.batch, timeout=10, num_retries=1)
output = client.run_command(command, sudo=True)
for node in output:
    for line in output[node]['stdout']:
        print '[{0}] {1}'.format(node, line)
Could you elaborate a bit more? Maybe provide an example of a command you run and the output you get?
Check out pssh.
This tool uses multiple threads and performs quickly.
You can read more about it in the pssh documentation.

Python script not passing variable correctly

On the latest version of Debian 32-bit with Python 2.7.3, I've compiled Plink (part of the PuTTY suite of tools) from source. For those unfamiliar, Plink is a great tool for issuing commands on SSH servers so you can script your commands (I've found it wonderful for Cisco switches, which is what I'm doing here).
I have a file called switch.list containing names of switches on each line, such as:
Net-Switch-1
Net-Switch-2
Backbone-1
Now my Python script looks like this:
import subprocess
Switches = []
SwitchFile = open("switch.list")
for line in SwitchFile:
    Switches.append(line)
SwitchFile.close()
for sw in Switches:
    p = subprocess.Popen(["./plink","-ssh","-l","admin","-pw","REDACTED","-noagent","-batch",sw,"show","clock"], stdout=subprocess.PIPE)
    print p.communicate()
My output is:
Unable to open connection:
Name or service not known
('', None)
Over and over, for as many times as my switch count. That tells me it's reading the file and populating the array just fine, but the Plink for-loop is messed up.
Troubleshooting: If I replace sw with a hard-coded switch name, like Net-Switch-1, then it runs fine. This is why I know the variable, sw, isn't being passed along correctly.
More Troubleshooting: If I run the Plink command from CLI, omitting the switch name, I get the same error output, but without the third line of "('', None)"
Troubleshooting where I start to get tricky: This doesn't work either:
p = subprocess.Popen(["./plink","-ssh","-l","admin","-pw","REDACTED","-noagent","-batch",(" "+sw),"show","clock"], stdout=subprocess.PIPE)
When reading from a file, lines keep the terminating newline character. Try this instead:
for line in SwitchFile:
    Switches.append(line.strip())
Note also that the common practice is to give variables lower-case names.
Are you reading the file correctly? If the info in the file is separated by newlines, try something like this:
with open(file_path, 'r') as f:
    data = f.readlines()  # extracts to a list
print data

Need to get output as lists rather than strings in Popen or any other system commands

I am trying to run a command from a Python script (using Popen()) and get the output as a list instead of a string.
For example, when I use Popen(), it gives the output as a string. For commands like vgs, vgdisplay, pvs and pvdisplay, I need to get the output as lists and be able to parse it by row and column, so that I can take the necessary action (like deleting the already existing VGs, etc.). I was just wondering if it is possible to get lists, or at least convert the output into lists.
I started learning Python a week ago, so I might have missed some simple tricks, please pardon me.
Just to elaborate on the existing comments
from subprocess import PIPE
import subprocess
pro = subprocess.Popen("ifconfig", stdout=PIPE, stderr=PIPE)
data = pro.communicate()[0].splitlines()
for line in data:
    print "THIS IS A LINE"
    print line
    print "**************"

running BLAST (bl2seq) without creating sequence files

I have a script that performs BLAST queries (bl2seq)
The script works like this:
Get sequence a, sequence b
write sequence a to filea
write sequence b to fileb
run command 'bl2seq -i filea -j fileb -n blastn'
get output from STDOUT, parse
repeat 20 million times
The program bl2seq does not support piping.
Is there any way to do this and avoid writing/reading to the harddrive?
I'm using Python BTW.
Depending on what OS you're running on, you may be able to use something like bash's process substitution. I'm not sure how you'd set that up in Python, but you'd basically be using a named pipe (or named file descriptor). That won't work if bl2seq tries to seek within the files, but it should work if it just reads them sequentially.
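For what it's worth, a named-pipe variant can be sketched in Python with os.mkfifo: background threads feed the sequences into FIFOs while bl2seq reads them as ordinary files. This is only a sketch, assuming bl2seq reads its -i/-j inputs sequentially; the command line is copied from the question and the bl2seq_fifo helper name is made up:

import os
import threading
import subprocess

def _feed(path, data):
    # Opening a FIFO for writing blocks until the reader (bl2seq) opens it,
    # so the write happens in a background thread.
    with open(path, 'w') as fifo:
        fifo.write(data)

def bl2seq_fifo(seq_a, seq_b, workdir='.'):
    fifo_a = os.path.join(workdir, 'filea.fifo')
    fifo_b = os.path.join(workdir, 'fileb.fifo')
    for p in (fifo_a, fifo_b):
        if not os.path.exists(p):
            os.mkfifo(p)

    writers = [threading.Thread(target=_feed, args=(fifo_a, seq_a)),
               threading.Thread(target=_feed, args=(fifo_b, seq_b))]
    for w in writers:
        w.start()

    # Same command line as in the question, pointed at the FIFOs
    proc = subprocess.Popen(['bl2seq', '-i', fifo_a, '-j', fifo_b, '-n', 'blastn'],
                            stdout=subprocess.PIPE)
    out = proc.communicate()[0]

    for w in writers:
        w.join()
    return out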
How do you know bl2seq does not support piping? By the way, pipes are an OS feature, not the program's. If your bl2seq program outputs something, whether to STDOUT or to a file, you should be able to parse the output. Check the help for bl2seq for options to output to a file as well, e.g. a -o option. Then you can parse the file.
Also, since you are using Python, an alternative you can use is BioPython module.
Is this the bl2seq program from BioPerl? If so, it doesn't look like you can do piping to it. You can, however, code your own hack using Bio::Tools::Run::AnalysisFactory::Pise, which is the recommended way of going about it. You'd have to do it in Perl, though.
If this is a different bl2seq, then disregard the message. In any case, you should probably provide some more detail.
Wow. I have it figured out.
The answer is to use Python's subprocess module and pipes!
EDIT: I forgot to mention that I'm using blast2, which does support piping.
(this is part of a class)
def _query(self):
    from subprocess import Popen, PIPE, STDOUT
    pipe = Popen([BLAST,
                  '-p', 'blastn',
                  '-d', self.database,
                  '-m', '8'],
                 stdin=PIPE,
                 stdout=PIPE)
    pipe.stdin.write('%s\n' % self.sequence)
    print pipe.communicate()[0]
where self.database is a string containing the database filename (e.g. 'nt.fa') and
self.sequence is a string containing the query sequence.
This prints the output to the screen but you can easily just parse it. No slow disk I/O. No slow XML parsing. I'm going to write a module for this and put it on github.
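Since -m 8 requests BLAST's tab-separated tabular output, parsing the captured text amounts to splitting each line on tabs. A small sketch of what that parsing could look like (the column order below is the standard tabular layout and is an assumption about this blast2 build):

def parse_tabular(raw):
    # Standard -m 8 columns: query, subject, % identity, alignment length,
    # mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score
    hits = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        fields = line.split('\t')
        hits.append({'query': fields[0],
                     'subject': fields[1],
                     'identity': float(fields[2]),
                     'evalue': float(fields[10]),
                     'bitscore': float(fields[11])})
    return hits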
Also, I haven't gotten this far yet but I think you can do multiple queries so that the blast database does not need to be read and loaded into RAM for each query.
I call blast2 using an R script:
....
system("mkfifo seq1")
system("mkfifo seq2")
system("echo sequence1 > seq1"), wait = FALSE)
system("echo sequence2 > seq2"), wait = FALSE)
system("blast2 -p blastp -i seq1 -j seq2 -m 8", intern = TRUE)
....
This is two times slower(!) than writing to and reading from the hard drive!
