Python's os.copy_file_range not working with O_APPEND

I want to append the contents of a file 'from_path' to the end of another file 'to_path'. I wrote the following code:
fd_from = os.open(from_path, os.O_RDONLY)
fd_to = os.open(to_path, os.O_WRONLY | os.O_APPEND)
os.copy_file_range(fd_from, fd_to, os.path.getsize(from_path))
os.close(fd_from)
os.close(fd_to)
However, I get the following error
OSError: [Errno 9] Bad file descriptor
on the third line.
This (or something similar) was working fine, but now I can't avoid said error, even though (I believe) I haven't changed anything.
I looked around online and found that this error usually happens because a file was not properly opened or closed. However, that should not be the case here.
If we do, for example
fd_to = os.open(to_path, os.O_WRONLY | os.O_APPEND)
os.write(fd_to, b'something')
os.close(fd_to)
Everything works smoothly.
Also, if I write the exact same code as the problematic one, but without O_APPEND, everything works as well.
I am using Python 3.8.13, glibc 2.35 and linux kernel 5.15.0.
Note that efficiency is important in my case, so many of the alternatives I've come across are undesirable.
Some of the alternatives that were found to be slower than this particular method are:
Using subprocess to launch the unix utility cat;
Iterating over the lines of the first file and appending them to the second.
While I had the implementation with copy_file_range working, I managed to find that this was around 2.6 times faster than cat and 14 times faster than iterating over the lines.
I've also read about shutil and other methods, but those don't seem to allow appending of the copied contents.
Can anyone explain the problem? Does this function not work with append mode? Or maybe there is a workaround?
Thank you in advance for your help!
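
One likely explanation, going by the Linux copy_file_range(2) manual page: the call fails with EBADF when the output descriptor was opened with O_APPEND. A possible workaround is to open the destination without O_APPEND, seek to its end, and let copy_file_range write from that position. The sketch below assumes this; the helper name append_file, the chunk_size parameter, and the read/write fallback are illustrative additions, not part of the original code.

```python
import os


def append_file(from_path, to_path, chunk_size=1 << 20):
    """Append the contents of from_path to to_path (hypothetical helper)."""
    fd_from = os.open(from_path, os.O_RDONLY)
    # Deliberately no O_APPEND: copy_file_range(2) reports EBADF for it.
    fd_to = os.open(to_path, os.O_WRONLY)
    try:
        # Position the destination at its current end instead of using O_APPEND.
        os.lseek(fd_to, 0, os.SEEK_END)
        remaining = os.fstat(fd_from).st_size
        use_fast_path = hasattr(os, "copy_file_range")
        while remaining > 0:
            if use_fast_path:
                try:
                    # May copy fewer bytes than requested, hence the loop.
                    copied = os.copy_file_range(fd_from, fd_to, remaining)
                except OSError:
                    # For example an unsupported filesystem: use read/write.
                    use_fast_path = False
                    continue
            else:
                data = os.read(fd_from, min(chunk_size, remaining))
                copied = len(data)
                view = memoryview(data)
                while view:
                    written = os.write(fd_to, view)
                    view = view[written:]
            if copied == 0:
                break  # the source ended earlier than its reported size
            remaining -= copied
    finally:
        os.close(fd_from)
        os.close(fd_to)
```

Calling append_file(from_path, to_path) should then add the source contents after the existing data. Note that without O_APPEND the seek and the write are two separate steps, so this is only safe when no other process is appending to the same file at the same time.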

Related

Using os.system to operate a .py file on many files

I hope that I can ask this in a clear way. I'm very much a beginner to Python and forums in general, so I apologise if I've got anything wrong from the start!
My issue is that I am currently trying to use os.system() to run a program on every file within a directory (this is a directory of ASCII tables which I am crossing with a series of other tables to find matches).
import os
for filename in os.listdir('.'):
    os.system('stilts tmatch2 ifmt1=ascii ifmt2=ascii in1=intern in2= %s matcher=2d values1='col1 col2' values2='col1 col2' params=5 out= %s-table.fits'%(filename,filename))
So what I'm hoping this would do is run the program known as stilts for every 'filename'. I'm guessing this gets interrupted or doesn't work because of the apostrophes ' inside the line of code itself, which must disrupt the syntax? (Please correct me if I am wrong.)
I then replaced the ' in os.system() with "" instead. This, however, stops me using the %s notation to refer to filenames throughout the code (at least I am pretty sure anyway).
import os
for filename in os.listdir('.'):
    os.system("stilts tmatch2 ifmt1=ascii ifmt2=ascii in1=intern in2= %s matcher=2d values1='col1 col2' values2='col1 col2' params=5 out= %s-table.fits"%(filename,filename))
This now runs but obviously doesn't work, as it interferes with the %s input.
Any ideas how I can go about fixing this? Are there any alternative ways to refer to all of the other files given by 'filename' without using %s?
Thanks in advance and again, sorry for my inexperience with both coding and using this forum!

I am not familiar with os.system(), but it may behave differently if you build the command string before passing it in.
In Python you can concatenate strings with +, so you can store the pieces of your command in variables and add the filenames in between, as in:
os.system(commands+filename+othercommands+filename)
Another thing worth checking is what the loop variable holds when you use:
for file in os.listdir():
os.listdir() returns a list of name strings rather than file objects, so each item should already be a string; printing type(file) will confirm this.
Sorry I can't test my answers for you, but the computer I am using is too slow for me to try downloading Python.
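
One way to sidestep the quoting problem altogether is to pass the command as a list of arguments to subprocess.run, so that no shell parses the quotes. The following is only a sketch: the stilts parameters are copied from the question, the helper names stilts_command and run_on_directory are hypothetical, and it assumes the space after in2= and out= in the original command was unintended.

```python
import os
import subprocess


def stilts_command(filename):
    """Build the stilts argument list for one input table (hypothetical helper)."""
    return [
        "stilts", "tmatch2",
        "ifmt1=ascii", "ifmt2=ascii",
        "in1=intern",
        "in2=" + filename,
        "matcher=2d",
        # Each list item is passed as a single argument, so the embedded
        # space needs no extra quoting.
        "values1=col1 col2",
        "values2=col1 col2",
        "params=5",
        "out=" + filename + "-table.fits",
    ]


def run_on_directory(directory="."):
    """Run stilts once for every regular file in directory."""
    for filename in sorted(os.listdir(directory)):
        if not os.path.isfile(os.path.join(directory, filename)):
            continue  # skip subdirectories
        # check=True raises CalledProcessError if stilts exits with an error.
        subprocess.run(stilts_command(filename), cwd=directory, check=True)
```

Because os.listdir() returns plain strings, each filename can be concatenated directly into the arguments.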

Writing text in the Windows console using Python after a specific program has taken control

I have a program, FUNCOR2.exe, for calculating the autocorrelation function of a data file, e.g. test.txt.
First, I execute FUNCOR2.exe from the Windows command line; the program then takes control and asks me for the input data file.
I want to automate this in Python. So far, this works:
os.system("FUNCOR2")
But then I am not able to type INTO the program the input file name.
So far I've tried:
PressKey(0x54) # T
PressKey(0x45) # E
PressKey(0x53) # S
PressKey(0x54) # T
PressKey(110) # .
PressKey(0x54) # T
PressKey(0x58) # X
PressKey(0x54) # T
which I took from Generate keyboard events, but it does not work, and also:
win32api.keybd_event(0x54, 0)
win32api.keybd_event(0x45, 0)
win32api.keybd_event(0x53, 0)
win32api.keybd_event(0x54, 0)
win32api.keybd_event(110, 0)
win32api.keybd_event(0x54, 0)
win32api.keybd_event(0x58, 0)
win32api.keybd_event(0x54, 0)
which does not work either.
This program does not accept arguments, so I cannot use:
FUNCOR2.exe test.txt
I've found something similar here: Writing in the cmd after executing an app.exe from java, but it is not quite the same problem.
Any ideas?

The only thing that I've found (other than your question) which refers to FUNCOR2 is the paper "A library of computer programs for assisting teaching and research in cyclostratigraphic analysis". Since this is from the Journal "Computers and Geosciences" and since you are in the GIS field, I assume that this is correct.
There are a couple of possibilities (beyond sending keystrokes, which is dicey at best):
1) The paper clearly gives the formula being used (and a quick glance at the actual Fortran code downloadable from the Journal's website confirms that nothing else is done), which suggests that this is easily implemented in Python, either in plain Python or using pandas (which has functions for computing autocorrelations).
2) Modify the Fortran source and recompile with an open source Fortran compiler (I'm not a Fortran expert, but this seems to be Fortran 77). Towards the beginning of the code you see:
READ (5,100,ERR=1) CFIL1
OPEN (1,FILE=CFIL1)
The first line is how the code gets the file name from the user. Replace that line by a line which reads a command line argument and voila -- you have a version of FUNCOR2 which gets its input file from the command line, hence easily invoked from Python. It should be easy enough to find examples for getting a file name from a command line argument in Fortran. My guess is just 1 or 2 lines of code replacing that line would be enough. I am not interested enough to try, and doubt that it is worthwhile. This is because another line in the source is:
DIMENSION X(1024),V(1024),COR(200),NPA(200)
Oddly enough, the program would fail if your file has more than 1024 observations. Maybe something like that made sense in the late 90s when the paper was written, but there is almost certainly equivalent pandas code which will be able to handle millions of observations. Sometimes old code should be allowed to die.
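
A third possibility, offered only as a sketch: the READ (5,...) statement reads from Fortran unit 5, which is conventionally standard input, so the file name can probably be piped in rather than typed. This assumes FUNCOR2 really reads its prompt from standard input and not from the console directly; the helper name run_with_answers is hypothetical.

```python
import subprocess


def run_with_answers(command, answers):
    """Run command, feeding each answer as one line on its standard input."""
    stdin_text = "".join(answer + "\n" for answer in answers)
    # text=True sends and receives str instead of bytes; check=True raises
    # CalledProcessError if the program exits with a non-zero status.
    return subprocess.run(
        command,
        input=stdin_text,
        text=True,
        capture_output=True,
        check=True,
    )
```

With this, something like run_with_answers(["FUNCOR2"], ["test.txt"]) would answer the program's prompt, and the returned object's stdout attribute would hold whatever it printed.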

Python OSError: Too many open files

I'm using Python 2.7 on Windows XP.
My script relies on tempfile.mkstemp and tempfile.mkdtemp to create a lot of files and directories with the following pattern:
_,_tmp = mkstemp(prefix=section,dir=indir,text=True)
<do something with file>
os.close(_)
Running the script always incurs the following error (although the exact line number changes, etc.). The actual file that the script is attempting to open varies.
OSError: [Errno 24] Too many open files: 'path\\to\\most\\recent\\attempt\\to\\open\\file'
Any thoughts on how I might debug this? Also, let me know if you would like additional information. Thanks!
EDIT:
Here's an example of use:
out = os.fdopen(_,'w')
out.write("Something")
out.close()
with open(_) as p:
    p.read()

You probably don't have the same value stored in _ at the time you call os.close(_) as at the time you created the temp file. Try assigning to a named variable instead of _.
It would help you and us if you could provide a very small code snippet that demonstrates the error.

Why not use tempfile.NamedTemporaryFile with delete=False? This lets you work with Python file objects, which is one bonus. It can also be used as a context manager (which should take care of all the details of making sure the file is properly closed):
with tempfile.NamedTemporaryFile('w',prefix=section,dir=indir,delete=False) as f:
    pass  # Do something with the file here.
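
If mkstemp has to stay, a minimal sketch of the same idea is to keep the descriptor in a named variable and hand it to os.fdopen inside a with block, so that it is closed exactly once. The helper name write_temp_file and its default arguments are hypothetical.

```python
import os
import tempfile


def write_temp_file(text, prefix="section", directory=None):
    """Create a temporary file, write text to it, and return its path."""
    # Keep the descriptor in a named variable so it cannot be overwritten.
    fd, path = tempfile.mkstemp(prefix=prefix, dir=directory, text=True)
    # os.fdopen takes ownership of fd; leaving the with block closes it.
    with os.fdopen(fd, "w") as out:
        out.write(text)
    return path
```

To read the file back later, reopen it by its path (for example with open(path)) rather than by the old descriptor number, which is no longer valid once the file object has been closed.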

Python code not writing to file unless run in interpreter

I have written a few lines of code in Python to see if I can make it read a text file, make a list out of it where the lines are lists themselves, and then turn everything back into a string and write it as output on a different file. This may sound silly, but the idea is to shuffle the items once they are listed, and I need to make sure I can do the reading and writing correctly first. This is the code:
import csv,StringIO
datalist = open('tmp/lista.txt', 'r')
leyendo = datalist.read()
separando = csv.reader(StringIO.StringIO(leyendo), delimiter = '\t')
macrolist = list(separando)
almosthere = ('\t'.join(i) for i in macrolist)
justonemore = list(almosthere)
arewedoneyet = '\n'.join(justonemore)
with open('tmp/randolista.txt', 'w') as newdoc:
    newdoc.write(arewedoneyet)
newdoc.close()
datalist.close()
This seems to work just fine when I run it line by line on the interpreter, but when I save it as a separate Python script and run it (myscript.py) nothing happens. The output file is not even created. After having a look at similar issues raised here, I have introduced the 'with' parameter (before I opened the output file through output = open()), I have tried flushing as well as closing the file... Nothing seems to work. The standalone script does not seem to do much, but the code can't be too wrong if it works on the interpreter, right?
Thanks in advance!
P.S.: I'm new to Python and fairly new to programming, so I apologise if this is due to a shallow understanding of a basic issue.

Where is the input file, and where do you want to save the output file? For this kind of script I think it's better to use absolute paths.
Use:
open('/tmp/lista.txt', 'r')
instead of:
open('tmp/lista.txt', 'r')
I think that the error may be related to this.

It may have something to do with where you start your interpreter.
Try using an absolute path /tmp/randolista.txt instead of the relative path tmp/randolista.txt to isolate the problem.
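
To make the paths independent of where the interpreter is started, one option is to resolve them against the script's own location. The sketch below is written for Python 3 (the question uses Python 2's StringIO), and the helper names copy_tsv and path_next_to are hypothetical.

```python
import csv
import os


def copy_tsv(src_path, dst_path):
    """Read a tab-separated file into a list of rows and write it back out."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    # rows is a list of lists; this is where the shuffling could happen.
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return rows


def path_next_to(base_file, relative_path):
    """Resolve relative_path against the directory containing base_file."""
    base_dir = os.path.dirname(os.path.abspath(base_file))
    return os.path.join(base_dir, relative_path)
```

Inside the script, copy_tsv(path_next_to(__file__, "tmp/lista.txt"), path_next_to(__file__, "tmp/randolista.txt")) would then read and write next to the script regardless of the current working directory.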

What's the best way to get the filesize?

There are actually three ways I have in mind to determine a file's size:
1) open and read it, and get the size of the string with len()
2) using os.stat and getting it via st_size, which should be the "right" way because it's handled by the underlying OS
3) os.path.getsize, which should be the same as above
So what is the actual right way to determine the file size? What is the worst way?
Or doesn't it even matter because at the end it is all the same?
(I can imagine the first method having a problem with really large files, while the two others have not)

The first method would be a waste if you don't need the contents of the file anyway. Either of your other two options is fine. os.path.getsize() uses os.stat().
From genericpath.py
def getsize(filename):
    """Return the size of a file, reported by os.stat()."""
    return os.stat(filename).st_size
Edit:
In case it isn't obvious, os.path.getsize() comes from genericpath.py.
>>> os.path.getsize.__code__
<code object getsize at 0x1d457b0, file "/usr/lib/python2.7/genericpath.py", line 47>

Method 1 is the slowest way possible. Don't use it unless you will need the entire contents of the file as a string later.
Methods 2 and 3 are the fastest, since they don't even have to open the file.
Using f.seek(0, os.SEEK_END) and f.tell() requires opening the file, and might be a bit slower than 2&3 unless you're going to open the file anyway.
All methods will give the same result when no other program is writing to the file. If the file is in the middle of being modified when your code runs, seek+tell can sometimes give you a more up-to-date answer than 2&3.

no. 1 is definitely the worst. If at all, it's better to seek() and tell(), but that's not as good as the other two.
no. 2 and no. 3 are equally ok IMO. I think no. 3 is a bit clearer to read, but that's negligible.
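
As a small illustration of the methods discussed above, the sketch below measures the same file with os.stat, os.path.getsize, and seeking to the end. It is written for Python 3, and the helper name file_sizes is hypothetical.

```python
import os


def file_sizes(path):
    """Return the size of path as measured by three different methods."""
    via_stat = os.stat(path).st_size
    via_getsize = os.path.getsize(path)
    with open(path, "rb") as f:
        # seek(0, SEEK_END) moves to the end and returns the new position.
        via_seek = f.seek(0, os.SEEK_END)
    return via_stat, via_getsize, via_seek
```

For a file that nothing else is writing to, all three values should agree.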
