Unix cat function (cat * > merged.txt) in Python? [duplicate]

This question already has answers here:
Reproduce the Unix cat command in Python
(6 answers)
Closed 9 years ago.
Is there a way to use the Unix cat command, or something similar, from Python once a directory has been established? I want to merge files_1-3 together into merged.txt.
I would usually just find the directory in Unix and then run
cat * > merged.txt
file_1.txt
file_2.txt
file_3.txt
merged.txt

Use the fileinput module:
import fileinput
import glob
with open('/path/to/merged.txt', 'w') as f:
    for line in fileinput.input(glob.glob('/path/to/files/*')):
        f.write(line)
fileinput.close()
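A variant of the same idea, sketched under a few assumptions: the inputs should be concatenated in sorted name order (roughly what the shell's * expansion does), the output file may live in the same directory and must be skipped, and the files may be large or binary. The helper name cat_files and its parameters are made up for illustration.

```python
import glob
import os
import shutil


def cat_files(pattern, out_path):
    """Concatenate every file matching `pattern` into `out_path`.

    `cat_files` is a hypothetical helper, not part of any library.
    """
    out_abs = os.path.abspath(out_path)
    # sorted() roughly mimics the shell, which expands * in sorted order;
    # glob.glob() itself returns names in arbitrary order.
    sources = [
        name for name in sorted(glob.glob(pattern))
        if os.path.isfile(name) and os.path.abspath(name) != out_abs
    ]
    with open(out_path, "wb") as out:
        for name in sources:
            with open(name, "rb") as src:
                # Copies in chunks, so a large file is never fully in memory.
                shutil.copyfileobj(src, out)
    return sources
```

Calling cat_files('/path/to/files/*', '/path/to/files/merged.txt') would then be safe to repeat, because the output file is excluded from its own input list.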

If you simply want to use the Unix cat command (rather than a Pythonic or performance-conscious approach), you can use
import os
os.system("cd mydir;cat * > merged.txt")
or, as pointed out by 1_CR (thanks), the approach explained here:
Python: How to Redirect Output with Subprocess?

Use fileinput. Say you have a Python file merge.py with the following code; you could call it like so: merge.py dir/*.txt. The file merged.txt gets written to the current directory. By default, fileinput iterates over the list of files passed on the command line, so you can let the shell handle globbing.
#!/usr/bin/env python
import fileinput
with open('merged.txt', 'w') as f:
    for line in fileinput.input():
        f.write(line)

Related

How to redirect input to python script

Why does the command python test.py <(cat file1.txt) not work as expected? I could have sworn I had this working previously. Basically, I would like to use the output of that cat command as the input to the Python script.
This command
cat file1.txt | python test.py
works okay, which outputs:
reading: file11
reading: file12
Which are based on the following scripts/files below.
The reason I want this to work is because I really want to feed in 2 input files like
python test.py <(cat file1.txt) <(cat file2.txt)
And would like some python output like:
reading: file11 file21
reading: file12 file22
I know this is a very simple example, and I could just open() both files inside the Python script and iterate accordingly. This is a simplified version of my current scenario: the cat command is actually another executable doing other things, so it's not as easy as just opening the file and reading it.
Sample script/files:
test.py:
import sys
for line in sys.stdin:
    print("reading: ", line.strip())
sys.stdin.close()
file1.txt:
file11
file12
file2.txt:
file21
file22

Changing test.py to:
import sys
input1 = open(sys.argv[1], "r")
input2 = open(sys.argv[2], "r")
for line1, line2 in zip(input1, input2):
    print("reading: ", line1.strip(), line2.strip())
input1.close()
input2.close()
will enable python test.py <(cat file1.txt) <(cat file2.txt) to work, because bash replaces each <(...) with a file name (such as /dev/fd/63) that the script can open like any other path.
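One caveat with zip: it stops at the shorter input, so trailing lines of the longer stream are silently dropped. If that matters, itertools.zip_longest can pad the shorter side. A minimal sketch; the helper name pair_lines and the fill parameter are illustrative, and the StringIO objects stand in for the two opened inputs:

```python
import io
from itertools import zip_longest


def pair_lines(stream1, stream2, fill=""):
    """Yield (line1, line2) pairs, padding the shorter stream with `fill`."""
    for line1, line2 in zip_longest(stream1, stream2, fillvalue=None):
        left = fill if line1 is None else line1.strip()
        right = fill if line2 is None else line2.strip()
        yield left, right


if __name__ == "__main__":
    # In-memory stand-ins for the two inputs opened from sys.argv.
    first = io.StringIO("file11\nfile12\nfile13\n")
    second = io.StringIO("file21\nfile22\n")
    for left, right in pair_lines(first, second):
        print("reading:", left, right)
```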

Actually, it depends on the shell you are using.
I assume you use bash, where unfortunately only the last redirection of a given file descriptor takes effect, so this will not work as written. You could create a temporary file, redirect the output of your scripts to it, and then feed your main script from that file.
Or, if you don't mind, you can switch to a shell such as zsh, which has this feature enabled by default.
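If depending on a particular shell is undesirable, the producer commands can also be started from Python itself. The following is a sketch, not a drop-in replacement: paired_output is a made-up helper, and the two commands are stand-ins (small python -c one-liners) for whatever executables really generate the data.

```python
import subprocess
import sys


def paired_output(cmd1, cmd2):
    """Run two commands concurrently and pair up their output lines."""
    pairs = []
    with subprocess.Popen(cmd1, stdout=subprocess.PIPE, text=True) as p1, \
         subprocess.Popen(cmd2, stdout=subprocess.PIPE, text=True) as p2:
        # Read both pipes in lockstep; zip stops at the shorter output.
        for line1, line2 in zip(p1.stdout, p2.stdout):
            pairs.append((line1.strip(), line2.strip()))
    return pairs


if __name__ == "__main__":
    # Stand-in producers; replace with the real executables.
    first = [sys.executable, "-c", "print('file11'); print('file12')"]
    second = [sys.executable, "-c", "print('file21'); print('file22')"]
    for left, right in paired_output(first, second):
        print("reading:", left, right)
```

Because each command is given as an argument list, no shell is involved and the same script behaves the same way under bash, zsh, or anything else.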

Trying to avoid shell=True in a Python subprocess

I need to concatenate multiple files that begin with the same name inside a Python program. My idea, in a bash shell, would be to do something like
cat myfiles* > my_final_file
but there are two shell operators to use: * and >. This could be easily solved using
subprocess.Popen("cat myfiles* > my_final_file", shell=True)
but everybody says that using shell=True is something you should avoid for security and portability reasons. How can I execute that piece of code, then?

You have to expand the pattern in Python:
import glob
import subprocess
with open("my_final_file", "wb") as output:
    subprocess.check_call(['cat'] + glob.glob("myfiles*"), stdout=output)
or, better, do everything in Python:
with open("my_final_file", "wb") as output:
    for filename in glob.glob("myfiles*"):
        with open(filename, "rb") as inp:
            output.write(inp.read())

How to write help() documentation to a file Python? [duplicate]

This question already has answers here:
How do I export the output of Python's built-in help() function
(11 answers)
Closed 4 years ago.
I was looking at the help() documentation of a module but soon realized it was very tedious to read in the small output box. So I tried writing the help() documentation to another file to make it easier to read.
myfile = open("file.txt","w")
myfile.write(str(help(random)))
myfile.close()
Instead of the documentation, the file contained only None.
Any ideas how to do this?

The answer is pydoc! Run it from the console:
$ pydoc [modulename] > file.txt
and it will write the output of the help() command to file.txt.

I'm not suggesting you should read the Python documentation this way, but here is what you could do: redirect stdout and call help:
from contextlib import redirect_stdout
import random
with open('random_help.txt', 'w') as file:
    with redirect_stdout(file):
        help(random)
or, even simpler (as suggested by Jon Clements):
from pydoc import doc
import random
with open('random_help.txt', 'w') as file:
    doc(random, output=file)
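When the text is wanted as a string rather than written straight to a file, pydoc.render_doc may help. This is only a sketch: render_doc and the plaintext renderer exist in the standard pydoc module, but they are not prominently documented, so their availability on your Python version is an assumption worth checking. help_text is a hypothetical wrapper name.

```python
import pydoc
import random


def help_text(obj):
    """Return the text help() would show for `obj`, as a plain string."""
    # The plaintext renderer avoids the backspace-based bold markup
    # that the default text renderer emits for terminals.
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)


if __name__ == "__main__":
    with open("random_help.txt", "w", encoding="utf-8") as fh:
        fh.write(help_text(random))
```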

How to execute a command for each lines in a txt file [duplicate]

This question already has answers here:
How do I execute a program or call a system command?
(65 answers)
Closed 5 years ago.
I'm trying to automate research on a list of domains I have (the list is a .txt file of about 350 to 400 lines).
I need to run the same command (which uses a Python script) for each line in the txt file. Something like this:
import os
with open('/home/dogher/Desktop/copia.txt') as f:
    for line in f:
        process(line)
        os.system("/home/dogher/Desktop/theHarvester-master/theHarvester.py -d "(line)" -l 300 -b google -f "(line)".html")
I know the syntax of the os.system call is wrong, but I don't know how to insert the text of each line into the command.
Thanks so much, and sorry for my bad English.

import os
with open('data.txt') as f:
    for line in f:
        os.system('python other.py ' + line)
If the contents of other.py are as follows:
import sys
print(sys.argv[1])
then the output of the first code snippet would be the contents of your data.txt.
I hope this is what you wanted. Instead of simply printing the line, you can process it too.

Given the Linux tag, I suggest a way to do what you want using bash.
process_file.sh:
#!/bin/bash
# your input file
input=my_file.txt
# your python script
py_script=script.py
# use each line of the file `input` as argument of your script
while IFS= read -r line
do
    python "$py_script" "$line"
done < "$input"
You can access the passed lines in Python as follows:
script.py:
import sys
print(sys.argv[1])

I hope the solution below is helpful:
import subprocess
with open('name.txt') as fp:
    for line in fp:
        subprocess.check_output('python name.py {}'.format(line), shell=True)
Sample files I used:
name.py
import sys
name = sys.argv[1]
print(name)
name.txt:
harry
kat
patrick

Your approach subjects each line of your file to evaluation by the shell, which will break when (not if) it comes across a line with any of the characters with special meaning to the shell: spaces, quotes, parentheses, ampersands, semicolons, etc. Even if today's input file doesn't contain any such character, your next project will. So learn to do this correctly today:
import subprocess
for line in openfile:
    domain = line.strip()  # drop the trailing newline
    subprocess.call(["/home/dogher/Desktop/theHarvester-master/theHarvester.py",
                     "-d", domain, "-l", "300", "-b", "google", "-f", domain + ".html"])
Because the arguments are passed as a list and do not need to be parsed, subprocess will execute your command without involving a shell.
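Put together, the whole loop might look like the sketch below. The flags (-d, -l 300, -b google, -f) are taken from the question; the helper names build_argv, clean_lines and run_for_each are invented here, and skipping blank lines is an added assumption about the input file.

```python
import subprocess


def build_argv(script, domain):
    """Build the argument list for one domain; no shell parsing involved."""
    return [script, "-d", domain, "-l", "300", "-b", "google",
            "-f", domain + ".html"]


def clean_lines(lines):
    """Yield stripped, non-empty lines (this drops the trailing newline)."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def run_for_each(script, list_path):
    """Run `script` once per domain listed in `list_path`."""
    with open(list_path) as fh:
        for domain in clean_lines(fh):
            # check=True raises CalledProcessError on a non-zero exit.
            subprocess.run(build_argv(script, domain), check=True)
```

Keeping build_argv separate from the code that launches the process makes it easy to print or test the exact arguments before running anything.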

Python - glob.glob doesn't find *.txt in specified filepath within Unix OS

I am converting some Python scripts I wrote in a Windows environment to run in Unix (Red Hat 5.4), and I'm having trouble converting the lines that deal with filepaths. In Windows, I usually read in all .txt files within a directory using something like:
pathtotxt = "C:\\Text Data\\EJC\\Philosophical Transactions 1665-1678\\*\\*.txt"
for file in glob.glob(pathtotxt):
It seems one can use the glob.glob() method in Unix as well, so I'm trying to implement this method to find all text files within a directory entitled "source" using the following code:
#!/usr/bin/env python
import commands
import sys
import glob
import os
testout = open('testoutput.txt', 'w')
numbers = [1,2,3]
for number in numbers:
    testout.write(str(number + 1) + "\r\n")
testout.close
sourceout = open('sourceoutput.txt', 'w')
pathtosource = "/afs/crc.nd.edu/user/d/dduhaime/data/hill/source/*.txt"
for file in glob.glob(pathtosource):
    with open(file, 'r') as openfile:
        readfile = openfile.read()
        souceout.write (str(readfile))
sourceout.close
When I run this code, the testoutput.txt file comes out as expected, but the sourceoutput.txt file is empty. I thought the problem might be solved if I changed the line
pathtosource = "/afs/crc.nd.edu/user/d/dduhaime/data/hill/source/*.txt"
to
pathtosource = "/source/*.txt"
and then run the code from the /hill directory, but that didn't resolve my problem. Do others know how I might be able to read in the text files in the source directory? I would be grateful for any insights others can offer.
EDIT: In case it is relevant, the /afs/ tree of directories referenced above is located on a remote server that I'm ssh-ing into via Putty. I'm also using a test.job file to qsub the Python script above. (This is all to prepare myself to submit jobs on the SGE cluster system.) The test.job script looks like:
#!/bin/csh
#$ -M dduhaime#nd.edu
#$ -m abe
#$ -r y
#$ -o tmp.out
#$ -e tmp.err
module load python/2.7.3
echo "Start - `date`"
python tmp.py
echo "Finish - `date`"

Got it! I had misspelled the output command. I wrote
souceout.write (str(readfile))
instead of
sourceout.write (str(readfile))
What a dunce. I also added a newline bit to the line:
sourceout.write (str(readfile) + "\r\n")
and it works fine. I think it's time for a new IDE!

You haven't really closed the file. The method testout.close() isn't called, because you have forgotten the parentheses. The same goes for sourceout.close():
testout.close
...
sourceout.close
Has to be:
testout.close()
...
sourceout.close()
When the program finishes, all files are closed automatically, so this only matters if you reopen the file.
Even better (the Pythonic version) would be to use the with statement. Instead of this:
testout = open('testoutput.txt', 'w')
numbers = [1,2,3]
for number in numbers:
    testout.write(str(number + 1) + "\r\n")
testout.close()
you would write this:
with open('testoutput.txt', 'w') as testout:
    numbers = [1,2,3]
    for number in numbers:
        testout.write(str(number + 1) + "\r\n")
In this case the file will be closed automatically, even when an error occurs.
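The difference can be observed through the file object's closed attribute. A small demonstration, assuming a throwaway temporary directory so that no real file is touched:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "testoutput.txt")

f = open(path, "w")
f.write("2\n")
f.close                        # only looks up the method; nothing is called
closed_without_call = f.closed
f.close()                      # actually closes the file
closed_after_call = f.closed

with open(path, "w") as g:
    g.write("2\n")
closed_after_with = g.closed   # the with block closed the file on exit

print(closed_without_call, closed_after_call, closed_after_with)
# prints: False True True
```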
