Making a block of text into a list - Python

I am writing a Python script that enumerates all processes running on the computer. My current code does this but prints the result as one large block of text that is hard to read. How can I change my script so the output is a vertical list, one process per line?
import subprocess
print(subprocess.check_output('set',shell=True))
Edit: Here is the output text from the above script

set is an internal cmd.exe command that, in your case, displays environment variables.
To get environment variables in Python, use os.environ instead.
If you want to get the output of the set command as a list of strings (not tested):
#!/usr/bin/env python3
import os
from subprocess import check_output
lines = check_output('cmd.exe /U /c set').decode('utf-16').split(os.linesep)
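For the actual goal here (a readable vertical listing), os.environ already holds one entry per variable, so a minimal sketch like this avoids subprocess entirely:

```python
import os

# os.environ is a mapping, so each variable is already a separate item;
# printing one item per line gives the "vertical list" directly.
for name, value in sorted(os.environ.items()):
    print('{}={}'.format(name, value))
```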

set should already print with newlines, so if they're not showing up, something is more wrong than you're telling us. You could always double up the newlines if you want to split the settings apart, e.g.:
import subprocess
print(subprocess.check_output('set',shell=True).replace('\n', '\n\n'))
If the problem is that you're running on Python 3 and the bytes object is a big blob, you can make subprocess decode it to a friendly printable string for you:
print(subprocess.check_output('set',shell=True, universal_newlines=True))
# Yes, the name of the keyword is dumb; it sounds like it handles different
# line ending conventions, but on Python 3, it also decodes from `bytes`
# to `str` for you.
For the general case of line wrapping nicely (though it does nothing for paragraphs of text that are just "too big"), you might want to look at the textwrap module; it splits a block of text up into a list of lines wrapped nicely at word boundaries so you don't have words split across lines.
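A minimal illustration of what textwrap does (the width of 24 is arbitrary):

```python
import textwrap

text = ("The textwrap module splits a block of text into a list of "
        "lines wrapped at word boundaries.")
# wrap() returns a list of lines; none exceeds the requested width,
# and no word is split across two lines.
lines = textwrap.wrap(text, width=24)
for line in lines:
    print(line)
```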

Disclaimer: I have not done what you are doing before but this might work.
import subprocess
processes = subprocess.check_output('set',shell=True)
processes = processes.decode('UTF-8').split('\n')  # decode the bytes to str and split on newlines
for process in processes:
    print(process)

Efficient way to find a string based on a list

I'm new to scripting and have been reading up on Python for about six weeks. The script below is meant to read a log file and send an alert if one of the keywords defined in srchstring is found. It works as expected and doesn't alert on strings previously found. However, the file it's processing is actively being written to by an application, and the script is too slow on files around 500 MB; under 200 MB it works fine, i.e. within 20 seconds.
Could someone suggest a more efficient way to search for a string within a file based on a pre-defined list?
import os
srchstring = ["Shutdown", "Disconnecting", "Stopping Event Thread"]
if os.path.isfile(r"\\server\share\logfile.txt"):
    with open(r"\\server\share\logfile.txt", "r") as F:
        for line in F:
            for st in srchstring:
                if st in line:
                    print line,
                    # do some slicing of string to get dd/mm/yy hh:mm:ss:ms
                    # then create a marker file called file_dd/mm/yy hh:mm:ss:ms
                    if os.path.isfile("file_dd/mm/yy hh:mm:ss:ms"):  # marker file already exists
                        print "string previously found - ignoring, continuing search"
                    else:
                        open("file_dd/mm/yy hh:mm:ss:ms", 'a')  # no marker file: create it, then send email
                        print "error string found - creating marker file, sending email alert"
else:
    print "file does not exist"
Refactoring the search expression to a precompiled regular expression avoids the (explicit) innermost loop.
import os, re
regex = re.compile(r'Shutdown|Disconnecting|Stopping Event Thread')
if os.path.isfile(r"\\server\share\logfile.txt"):
    # Indentation fixed as per comment
    with open(r"\\server\share\logfile.txt", "r") as F:
        for line in F:
            if regex.search(line):
                # ...
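The alternation can also be built from the original srchstring list instead of being hardcoded; wrapping each keyword in re.escape is a defensive assumption, in case a keyword ever contains a regex metacharacter:

```python
import re

srchstring = ["Shutdown", "Disconnecting", "Stopping Event Thread"]
# Build the alternation pattern from the list, escaping each keyword
# so any regex metacharacters are matched literally.
regex = re.compile('|'.join(re.escape(s) for s in srchstring))

line = "2014-05-01 12:00:00 Stopping Event Thread cleanly"
match = regex.search(line)
```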
I assume here that you use Linux. If you don't, install MinGW on Windows and the solution below will become suitable too.
Just leave the hard part to the most efficient tools available. Filter your data before you go to the python script. Use grep command to get the lines containing "Shutdown", "Disconnecting" or "Stopping Event Thread"
grep 'Shutdown\|Disconnecting\|Stopping Event Thread' /server/share/logfile.txt
and redirect the lines to your script
grep 'Shutdown\|Disconnecting\|Stopping Event Thread' /server/share/logfile.txt | python log.py
Edit: Windows solution. You can create a .bat file to make it executable.
findstr /c:"Shutdown" /c:"Disconnecting" /c:"Stopping Event Thread" \\server\share\logfile.txt | python log.py
In log.py, read from stdin. It's a file-like object, so no difficulties here:
import sys
for line in sys.stdin:
    print line,
    # do some slicing of string to get dd/mm/yy hh:mm:ss:ms
    # then create a marker file called file_dd/mm/yy hh:mm:ss:ms
    # and so on
This solution will reduce the amount of work your script has to do. As Python isn't a fast language, it may speed up the task. I suspect the whole thing could be rewritten purely in bash and would be even faster (20+ years of optimization of a C program are not something you compete with easily), but I don't know bash well enough.

Python: calling Fortran with subprocess and giving commands via communicate

I want to call a Fortran program from python. I use the Popen statement from subprocess like this:
p = Popen(['./finput'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
I then want to send some file names to the fortran program. The fortran program reads them from stdin and then opens the files.
If I use something like:
p_stdout = p.communicate(input='file1.dat\nfile2.dat\n')[0]
everything is fine and the fortran program works as expected.
However I want to give the file names as a variable from within the python program.
So if I use
p_stdout = p.communicate(input=file1+'\n'+file2+'\n')[0]
my Fortran program cannot open the files. The problem is that the string Fortran reads looks like this
f i l e 1 . d a t
with a blank character as the first character and a strange character in between every correct character. Unfortunately this only shows up if you print every character of the string individually. If you just print the file name with
print*,file1
you get
file1.dat
So my question is: why is Python putting these strange characters into the communication with the child process and, more importantly, how do I get rid of them?
many thanks
Sounds like your Fortran program might be receiving Unicode (wide-character) text. Are you using Python 3? If so, construct the string to be passed and then encode it with str.encode().
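A sketch of that fix, with cat standing in for the asker's ./finput binary so the example is self-contained; the point is encoding the str to bytes before communicate():

```python
from subprocess import Popen, PIPE

file1, file2 = 'file1.dat', 'file2.dat'
# On Python 3 the pipe carries bytes, so encode the str explicitly
# (plain ASCII here) instead of letting a wide encoding sneak in.
p = Popen(['cat'], stdout=PIPE, stdin=PIPE)   # 'cat' stands in for ./finput
p_stdout = p.communicate(input=(file1 + '\n' + file2 + '\n').encode('ascii'))[0]
```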

Open a new window in Vim-embedded python script

I've just started wrapping my head around vim+python scripts (having no experience with native vim scripts).
How can I open a new window to contain the stdout from a background process?
Currently, after reading some :help python, the only option I see is something like:
cmd = ":bel new"
vim.command(cmd)
Since vim.command can execute most (if not all?) ex commands, you can simply call :new +read!ls from within it.
:new splits the current window and puts a new (empty, no name) buffer into the upper window. It takes an argument +[cmd] which we use to execute read!cmd which reads the stdout of cmd after the bang into the buffer. Be aware that you need to escape spaces in your command with \
All in all you get vim.command("new +read!cmd")
:python vim.command("new +read!ls")
to read the contents of the current directory into a new buffer in a new, horizontally split window.
If you want to handle escaping of special characters, consider using python's re.escape():
:py import re;vim.command("new +read!"+re.escape("ls Dire*"))
which should be sufficient for most cases. If in doubt, check its documentation and compare it to that of your shell.

running BLAST (bl2seq) without creating sequence files

I have a script that performs BLAST queries (bl2seq)
The script works like this:
Get sequence a, sequence b
write sequence a to filea
write sequence b to fileb
run command 'bl2seq -i filea -j fileb -n blastn'
get output from STDOUT, parse
repeat 20 million times
The program bl2seq does not support piping.
Is there any way to do this and avoid writing/reading to the harddrive?
I'm using Python BTW.
Depending on what OS you're running on, you may be able to use something like bash's process substitution. I'm not sure how you'd set that up in Python, but you're basically using a named pipe (or named file descriptor). That won't work if bl2seq tries to seek within the files, but it should work if it just reads them sequentially.
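A rough sketch of that named-pipe idea on Linux, with cat standing in for bl2seq (which would instead receive the two FIFO paths via -i and -j); the writers run in threads because opening a FIFO blocks until the other end opens it:

```python
import os
import tempfile
import threading
from subprocess import check_output

def feed(path, data):
    # Opening a FIFO for writing blocks until a reader opens it.
    with open(path, 'w') as f:
        f.write(data)

tmpdir = tempfile.mkdtemp()
filea = os.path.join(tmpdir, 'filea')
fileb = os.path.join(tmpdir, 'fileb')
os.mkfifo(filea)
os.mkfifo(fileb)

writers = [threading.Thread(target=feed, args=(filea, '>seq_a\nACGT\n')),
           threading.Thread(target=feed, args=(fileb, '>seq_b\nTGCA\n'))]
for w in writers:
    w.start()
# The reader sees two ordinary-looking filenames, but no data hits the disk.
out = check_output(['cat', filea, fileb])
for w in writers:
    w.join()
```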
How do you know bl2seq does not support piping? By the way, pipes are an OS feature, not something the program controls. If your bl2seq program outputs anything, whether to STDOUT or to a file, you should be able to parse the output. Check the help for bl2seq for options to write to a file as well, e.g. the -o option. Then you can parse the file.
Also, since you are using Python, an alternative you can use is BioPython module.
Is this the bl2seq program from BioPerl? If so, it doesn't look like you can do piping to it. You can, however, code your own hack using Bio::Tools::Run::AnalysisFactory::Pise, which is the recommended way of going about it. You'd have to do it in Perl, though.
If this is a different bl2seq, then disregard the message. In any case, you should probably provide some more detail.
Wow. I have it figured out.
The answer is to use python's subprocess module and pipes!
EDIT: forgot to mention that I'm using blast2 which does support piping.
(this is part of a class)
def _query(self):
    from subprocess import Popen, PIPE, STDOUT
    pipe = Popen([BLAST,
                  '-p', 'blastn',
                  '-d', self.database,
                  '-m', '8'],
                 stdin=PIPE,
                 stdout=PIPE)
    pipe.stdin.write('%s\n' % self.sequence)
    print pipe.communicate()[0]
where self.database is a string containing the database filename, ie 'nt.fa'
self.sequence is a string containing the query sequence
This prints the output to the screen but you can easily just parse it. No slow disk I/O. No slow XML parsing. I'm going to write a module for this and put it on github.
Also, I haven't gotten this far yet but I think you can do multiple queries so that the blast database does not need to be read and loaded into RAM for each query.
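For the parsing step, -m 8 output is BLAST's standard 12-column tab-separated tabular format; a rough sketch of parsing it (the sample line is made up):

```python
def parse_m8(text):
    # Standard tabular columns: query id, subject id, % identity,
    # alignment length, mismatches, gap opens, q.start, q.end,
    # s.start, s.end, e-value, bit score.
    hits = []
    for line in text.strip().splitlines():
        f = line.split('\t')
        hits.append({'query': f[0], 'subject': f[1],
                     'identity': float(f[2]),
                     'evalue': float(f[10]), 'bitscore': float(f[11])})
    return hits

sample = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-50\t190.0"
hits = parse_m8(sample)
```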
I call blast2 using R script:
....
system("mkfifo seq1")
system("mkfifo seq2")
system("echo sequence1 > seq1", wait = FALSE)
system("echo sequence2 > seq2", wait = FALSE)
system("blast2 -p blastp -i seq1 -j seq2 -m 8", intern = TRUE)
....
This is two times slower(!) than writing to and reading from the hard drive!

Passing a multi-line string as an argument to a script in Windows

I have a simple python script like so:
import sys
lines = sys.argv[1]
for line in lines.splitlines():
    print line
I want to call it from the command line (or a .bat file) but the first argument may (and probably will) be a string with multiple lines in it. How does one do this?
Of course, this works:
import sys
lines = """This is a string
It has multiple lines
there are three total"""
for line in lines.splitlines():
    print line
But I need to be able to process an argument line-by-line.
EDIT: This is probably more of a Windows command-line problem than a Python problem.
EDIT 2: Thanks for all of the good suggestions. It doesn't look like it's possible. I can't use another shell because I'm actually trying to invoke the script from another program which seems to use the Windows command-line behind the scenes.
I know this thread is pretty old, but I came across it while trying to solve a similar problem, and others might as well, so let me show you how I solved it.
This works at least on Windows XP Pro, with Zack's code in a file called
"C:\Scratch\test.py":
C:\Scratch>test.py "This is a string"^
More?
More? "It has multiple lines"^
More?
More? "There are three total"
This is a string
It has multiple lines
There are three total
C:\Scratch>
This is a little more readable than Romulo's solution above.
Just enclose the argument in quotes:
$ python args.py "This is a string
> It has multiple lines
> there are three total"
This is a string
It has multiple lines
there are three total
The following might work:
C:\> python something.py "This is a string^
More?
More? It has multiple lines^
More?
More? There are three total"
This is the only thing which worked for me:
C:\> python a.py This" "is" "a" "string^
More?
More? It" "has" "multiple" "lines^
More?
More? There" "are" "three" "total
For me, Johannes' solution invokes the Python interpreter at the end of the first line, so I don't get the chance to pass additional lines.
But you said you are calling the Python script from another process, not from the command line. Then why don't you use dbr's solution? This worked for me as a Ruby script:
puts `python a.py "This is a string\nIt has multiple lines\nThere are three total"`
And in what language are you writing the program which calls the Python script? The issue you have is with argument passing, not with the Windows shell, nor with Python...
Finally, as mattkemp said, I also suggest you use the standard input to read your multi-line argument, avoiding command line magic.
Not sure about the Windows command-line, but would the following work?
> python myscript.py "This is a string\nIt has multiple lines\nthere are three total"
..or..
> python myscript.py "This is a string\
It has [...]\
there are [...]"
If not, I would suggest installing Cygwin and using a sane shell!
Have you tried setting your multiline text as a variable and then passing the expansion of that into your script? For example:
set Text="This is a string
It has multiple lines
there are three total"
python args.py %Text%
Alternatively, instead of reading an argument you could read from standard in.
import sys
for line in iter(sys.stdin.readline, ''):
    print line
On Linux you would pipe the multiline text to the standard input of args.py.
$ <command-that-produces-text> | python args.py
