With Ruby I can do:
File.open('yyy.mp4', 'w') { |f| f.write(File.read('xxx.mp4')) }
Can I do this using Python?
Sure you can:
with open('yyy.mp4', 'wb') as f:
    f.write(open('xxx.mp4', 'rb').read())
Note the binary mode flag there (b): since you are copying over mp4 contents, you don't want Python to reinterpret newlines for you.
That'll take a lot of memory if xxx.mp4 is large. Take a look at the shutil.copyfile function for a more memory-efficient option:
import shutil
shutil.copyfile('xxx.mp4', 'yyy.mp4')
Python is not about writing ugly one-liner code.
Check the documentation of the shutil module - in particular the copyfile() function.
http://docs.python.org/library/shutil.html
You want to copy a file, so don't manually read and then write the bytes; use the file-copy functions, which are generally much better and more efficient, for a number of reasons, even in this simple case.
If you want a true one-liner, you can replace the line breaks with semicolons:
import shutil; shutil.copyfile("xxx.mp4","yyy.mp4")
Avoid this! I did it once to speed up an extremely specific case that was completely unrelated to Python itself, but was caused by the presence of line breaks in my python -c "Put 🐍️ code here" command line and the way Meson handles it.
I'm planning on using a Python script to change the headers of different BAM (Binary Alignment Map) files. Right now I am just testing the output on one BAM file, but every time I want to check my output, the stdout is not human-readable. How can I see the output of my script? Should I call samtools view on the BAM file from my script? This is my code.
#!/usr/bin/env python
import os
import subprocess

if __name__ == '__main__':
    for file in os.listdir(os.getcwd()):
        if file == "SRR4209928.bam":
            with open("SRR4209928.bam", "r") as input:
                content = input.readlines()
                for line in content:
                    print(line)
Since BAM is a binary version of SAM, you will need something that knows how to deal with the compressed data before you can extract anything meaningful from it. Unfortunately, you can't just open() and readlines() on that type of file.
If you are going to write a module yourself, you will need to read the Sequence Alignment/Map Format Specification.
Fortunately, someone has already done that and created a Python module: go ahead and check out pysam. It will surely make your life easier.
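A minimal sketch of what that might look like with pysam (the file name is taken from your script; the attribute names assume a reasonably recent pysam version):
import pysam

# Open the BAM file; pysam handles the BGZF decompression for you.
bam = pysam.AlignmentFile("SRR4209928.bam", "rb")

# The header is available as a structured object (printable as SAM text).
print(bam.header)

# Iterate over the first few alignments in a human-readable form.
for i, read in enumerate(bam):
    print("%s\t%s" % (read.query_name, read.reference_start))
    if i >= 9:
        break

bam.close()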
I hope it helps.
There are actually three ways I can think of to determine a file's size:
open and read it, and get the size of the resulting string with len()
use os.stat and read it from st_size, which should be the "right" way because it's handled by the underlying OS
use os.path.getsize, which should be the same as above
So which is the right way to determine the file size? And which is the worst?
Or does it not even matter, because in the end they are all the same?
(I can imagine the first method having a problem with really large files, while the other two do not.)
The first method would be a waste if you don't need the contents of the file anyway. Either of your other two options is fine; os.path.getsize() uses os.stat().
From genericpath.py
def getsize(filename):
    """Return the size of a file, reported by os.stat()."""
    return os.stat(filename).st_size
Edit:
In case it isn't obvious, os.path.getsize() comes from genericpath.py.
>>> os.path.getsize.__code__
<code object getsize at 0x1d457b0, file "/usr/lib/python2.7/genericpath.py", line 47>
Method 1 is the slowest way possible. Don't use it unless you will need the entire contents of the file as a string later.
Methods 2 and 3 are the fastest, since they don't even have to open the file.
Using f.seek(0, os.SEEK_END) and f.tell() requires opening the file, and may be a bit slower than 2 and 3 unless you're going to open the file anyway.
All methods will give the same result when no other program is writing to the file. If the file is in the middle of being modified when your code runs, seek+tell can sometimes give you a more up-to-date answer than 2&3.
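For illustration, a small sketch comparing the approaches (the file name is just a placeholder):
import os

path = "xxx.mp4"  # placeholder: any existing file

# Method 2: ask the OS directly.
size_stat = os.stat(path).st_size

# Method 3: a thin wrapper around os.stat().
size_getsize = os.path.getsize(path)

# The seek/tell variant: requires actually opening the file.
with open(path, "rb") as f:
    f.seek(0, os.SEEK_END)
    size_seek = f.tell()

# All three agree as long as nothing is writing to the file concurrently.
print("%d %d %d" % (size_stat, size_getsize, size_seek))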
No. 1 is definitely the worst. If anything, it's better to seek() and tell(), but that's still not as good as the other two.
No. 2 and no. 3 are equally OK, IMO. I think no. 3 is a bit clearer to read, but the difference is negligible.
I am looking for a few scripts which would allow me to manipulate generic CSV files...
typically something like:
add-row FILENAME INSERT_ROW
get-row FILENAME GREP_ROW
replace-row FILENAME GREP_ROW INSERT_ROW
delete-row FILENAME GREP_ROW
where
FILENAME is the name of a CSV file whose first row contains headers, with "" used to delimit strings which might contain ','
GREP_ROW is a string of pairs field1=value1[,fieldN=valueN,...] used to identify a row by its field values in the CSV file
INSERT_ROW is a string of pairs field1=value1[,fieldN=valueN,...] used to replace (or add) the fields of a row
preferably in Python, using the csv package...
ideally leveraging Python to expose each field as a variable and allow more advanced GREP rules like fieldN > XYZ...
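For what it's worth, a naive sketch of how such field=value specs might be parsed (it ignores the quoting rules above, so values containing ',' or '=' would need real CSV-style parsing):
def parse_pairs(spec):
    """Parse 'field1=value1,fieldN=valueN' into a dict of field -> value."""
    pairs = {}
    for item in spec.split(','):
        field, _, value = item.partition('=')
        pairs[field.strip()] = value.strip()
    return pairs

print(parse_pairs("field1=value1,fieldN=valueN"))
# {'field1': 'value1', 'fieldN': 'valueN'}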
Perl has a tradition of in-place editing derived from the unix philosophy.
We could, for example, write a simple add-row-by-num.pl command as follows:
#!/usr/bin/perl -pi
BEGIN { $ln=shift; $line=shift; }
print "$line\n" if $ln==$.;
close ARGV if eof;
Replace the third line by $_="$line\n" if $ln==$.; to replace lines. Eliminate the $line=shift; and replace the third line by $_ = "" if $ln==$.; to delete lines.
We could write a simple add-row-by-regex.pl command as follows:
#!/usr/bin/perl -pi
BEGIN { $regex=shift; $line=shift; }
print "$line\n" if /$regex/;
Or simply use the one-liner perl -pi -e 'print "LINE\n" if /REGEX/' FILES. Again, we may replace the print $line with $_="$line\n" or $_ = "" to replace or delete lines, respectively.
We do not need the close ARGV if eof; line anymore because we do not need to reset the $. counter after each file is processed.
Is there some reason the ordinary unix grep utility does not suffice? Recall that the regular expression (PATTERN){n} matches PATTERN exactly n times, e.g. (\s*\S+\s*,){6}(\s*777\s*,) demands a 777 in the 7th column.
There is even a perl regular expression to transform your fieldN=value pairs into this regular expression, although I'd use split, map, and join myself.
Btw, File::Inplace provides in-place editing for file handles.
Perl has the DBD::CSV driver, which lets you access a CSV file as if it were an SQL database. I've played with it before, but haven't used it extensively, so I can't give a thorough review of it. If your needs are simple enough, this may work well for you.
App::CCSV does some of that.
The usual way in Python is to use csv.reader to load the data into a list of tuples, do your add/replace/get/delete operations on that native Python object, and then use csv.writer to write the file back out.
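A rough sketch of that round-trip using csv.DictReader/DictWriter (Python 3 style; the function name and the matching rules are just an illustration of the GREP_ROW/INSERT_ROW idea):
import csv

def replace_row(filename, match, new_values):
    # Load the whole file into memory as a list of dicts keyed by the header row.
    with open(filename, newline='') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Update every row whose fields match all of the given criteria.
    for row in rows:
        if all(row.get(k) == v for k, v in match.items()):
            row.update(new_values)

    # Write everything back out, header first.
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# e.g. replace_row('data.csv', {'field1': 'value1'}, {'field2': 'new'})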
In-place operations on CSV files wouldn't make much sense anyway. Since the records are not typically of fixed length, there is no easy way to insert, delete, or modify a record without moving all the other records at the same time.
That being said, Python's fileinput module has a mode for in-place file updates.
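For completeness, a tiny sketch of that mode (the file name and transformation here are arbitrary); it still rewrites the whole file behind the scenes, and whatever you print replaces the original contents:
import fileinput

# Rewrite data.csv in place, upper-casing every line.
for line in fileinput.input('data.csv', inplace=True):
    print(line.rstrip('\n').upper())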
We're using a Python-based application which reads a configuration file containing a couple of arrays:
Example layout of config file:
array1 = [
    'bob',
    'sue',
    'jayne'
]
Currently changes to the configuration are done by hand, but I've written a little interface to streamline the process (mainly to avoid errors).
It currently reads in the existing configuration using a simple import. However, what I'm not sure how to do is get my script to write its output as valid Python, so that the main application can read it again.
How can I dump the array back into the file as valid Python?
Cheers!
I'd suggest JSON or YAML (less verbose than JSON) for configuration files. That way the configuration file becomes more readable for the less Python-savvy ;) It's also easier to raise adequate errors, e.g. if the configuration is incomplete.
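A small sketch of what that could look like with the standard json module (the file and key names are just examples):
import json

# Writing the configuration out from your editing interface.
config = {"array1": ["bob", "sue", "jayne"]}
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)

# Reading it back in the main application.
with open("config.json") as f:
    config = json.load(f)
array1 = config["array1"]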
To save Python objects you can always use pickle.
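For example (the file name is arbitrary; note that pickle produces a binary serialization, not readable Python source):
import pickle

array1 = ['bob', 'sue', 'jayne']

# Serialize the list to disk.
with open('config.pickle', 'wb') as f:
    pickle.dump(array1, f)

# Load it back later.
with open('config.pickle', 'rb') as f:
    array1 = pickle.load(f)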
Generally, using repr() will create a string that can be re-evaluated, but pprint produces slightly nicer output:
from pprint import pprint
outf.write("array1 = "); pprint(array1, outf)
Writing repr(array1) into the file would be a very simple solution, but it should work here.
I need to generate a tar file, but as a string in memory rather than as an actual file. What I have as input is a single filename and a string containing the associated contents. I'm looking for a Python lib I can use so I can avoid having to roll my own.
A little more work turned up these functions, but using a memory stream object seems a little... inelegant. And making it accept input from strings looks even more... inelegant. OTOH it works, I assume, as most of this is new to me. Does anyone see any bugs in it?
Use tarfile in conjunction with cStringIO:
import tarfile
import cStringIO

c = cStringIO.StringIO()
t = tarfile.open(mode='w', fileobj=c)
# here: do your work on t (add members), then close it to flush the archive:
t.close()
s = c.getvalue()  # extract the bytestring you need
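Putting it together for your case (one file name plus its contents as a string), here is a sketch in the same Python 2 style as the snippet above; TarInfo and addfile are the standard tarfile APIs for adding a member from an arbitrary file-like object:
import tarfile
import cStringIO

def tar_string(filename, contents):
    """Return the bytes of a tar archive containing a single member."""
    buf = cStringIO.StringIO()
    t = tarfile.open(mode='w', fileobj=buf)

    # Describe the member, then add it from an in-memory file-like object.
    info = tarfile.TarInfo(name=filename)
    info.size = len(contents)
    t.addfile(info, cStringIO.StringIO(contents))

    t.close()  # flush the archive, including the end-of-archive blocks
    return buf.getvalue()

print(len(tar_string('hello.txt', 'Hello, world!\n')))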