Is Python's seek() on OS X broken? - python

I'm trying to implement a simple method to read new lines from a log file each time the method is called.
I've looked at the various suggestions both on stackoverflow (e.g. here) and elsewhere for simulating "tail" functionality; most involve using readline() to read in new lines as they're appended to the file. It should be simple enough, but I can't get it to work properly on OS X 10.6.4 with the included Python 2.6.1.
To get to the heart of the problem, I tried the following:
Open two terminal windows.
In one, create a text file "test.log" with three lines:
one
two
three
In the other, start python and execute the following code:
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.stat('test.log')
posix.stat_result(st_mode=33188, st_ino=23465217, st_dev=234881025L, st_nlink=1, st_uid=666, st_gid=20, st_size=14, st_atime=1281782739, st_mtime=1281782738, st_ctime=1281782738)
>>> log = open('test.log')
>>> log.tell()
0
>>> log.seek(0,2)
>>> log.tell()
14
>>>
So we see with the tell() that seek(0,2) brought us to the end of the file as reported by os.stat(), byte 14.
In the first shell, add another two lines to "test.log" so that it looks like this:
one
two
three
four
five
Go back to the second shell, and execute the following code:
>>> os.stat('test.log')
posix.stat_result(st_mode=33188, st_ino=23465260, st_dev=234881025L, st_nlink=1, st_uid=666, st_gid=20, st_size=24, st_atime=1281783089, st_mtime=1281783088, st_ctime=1281783088)
>>> log.seek(0,2)
>>> log.tell()
14
>>>
Here we see from os.stat() that the file's size is now 24 bytes, but seeking to the end of the file somehow still points to byte 14?? I've tried the same on Ubuntu with Python 2.5 and it works as I expect. I tried with 2.5 on my Mac, but got the same results as with 2.6.
I must be missing something fundamental here. Any ideas?

How are you adding two more lines to the file?
Most text editors will go through operations a lot like this:
fd = open(filename, read)
file_data = read(fd)
close(fd)
/* you edit your file, and save it */
unlink(filename)
fd = open(filename, write, create)
write(fd, file_data)
The file is different. (Check it with ls -li; the inode number will change for almost every text editor.)
If you append to the log file using your shell's >> redirection, it'll work exactly as it should:
$ echo one >> test.log
$ echo two >> test.log
$ echo three >> test.log
$ ls -li test.log
671147 -rw-r--r-- 1 sarnold sarnold 14 2010-08-14 04:15 test.log
$ echo four >> test.log
$ ls -li test.log
671147 -rw-r--r-- 1 sarnold sarnold 19 2010-08-14 04:15 test.log
>>> log=open('test.log')
>>> log.tell()
0
>>> log.seek(0,2)
>>> log.tell()
19
$ echo five >> test.log
$ echo six >> test.log
>>> log.seek(0,2)
>>> log.tell()
28
Note that the tail(1) command has an -F command line option to handle the case where the file is changed, but a file by the same name exists. (Great for watching log files that might be periodically rotated.)

Short answer: no, your assumptions are.
Your text editor is creating a new file with the same name, not modifying the old file in place. You can see in your stat result that the st_ino is different. If you were to do os.fstat(log.fileno()), you'd get the old size and old st_ino.
If you want to check for this in your implementation of tail, periodically compare the st_ino of the stat and fstat results. If they differ, there's a new file with the same name.
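A minimal sketch of that check, assuming Python 3 and a log that is either appended to or replaced wholesale. The names file_was_replaced and LogFollower are made up for this illustration, and neither partial lines nor a file truncated in place are handled:

```python
import os


def file_was_replaced(open_file, path):
    """Return True if `path` now names a different file than `open_file`."""
    on_disk = os.stat(path)                  # whatever the name points at now
    held = os.fstat(open_file.fileno())      # the file we actually have open
    return (on_disk.st_dev, on_disk.st_ino) != (held.st_dev, held.st_ino)


class LogFollower:
    """Hypothetical helper: each call returns the lines added since the last one."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path)
        self.handle.seek(0, os.SEEK_END)     # start at the current end of file

    def read_new_lines(self):
        if file_was_replaced(self.handle, self.path):
            # Same name, new inode: reopen and read the new file from the top.
            self.handle.close()
            self.handle = open(self.path)
        return self.handle.readlines()
```

If an editor rewrites the whole file, the follower reopens it and returns every line of the new file, including ones already seen in the old copy. Whether that is what you want depends on how the log is produced.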

Related

Using ! to assign output of shell command to variable in .ipynb

Is there a way to get this on one line:
> ipython
Python 3.11.1 (main, Jan 24 2023, 17:02:06) [Clang 14.0.0 (clang-1400.0.29.202)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: l = ! ls
In [2]: l[:3]
Out[2]: ['___x.ipynb', 'data', 'e.txt']
In [3]: l = (! ls)[:3]
Cell In[3], line 1
l = (! ls)[:3]
^
SyntaxError: invalid syntax
?
Ref: https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html
I came to the same conclusion as Azeem: there is no direct way to do this, since IPython only seems to support direct assignment of shell output with =. l = !ls | head -n 3 is probably the best and most readable way to do this in one line in IPython, but here's a Python alternative that will work anywhere:
import subprocess
l = subprocess.run(['ls'], stdout=subprocess.PIPE).stdout.decode('utf-8').splitlines()[:3]
There doesn't seem to exist a direct way to achieve that in one line under IPython.
According to that blog post:
anything appearing after ! on a line will be executed not by the Python kernel, but by the system command-line
and, IPython's System shell access:
Any input line beginning with a ! character is passed verbatim (minus the !, of course) to the underlying operating system.
Except for the variables in curly braces that are expanded before the command is passed to the underlying OS shell.
One possible solution could be to leverage the shell head command for this:
ls | head -n 3
and, in IPython:
files = !ls | head -n 3
Another alternative could be to resort to Python solutions such as os.listdir():
import os
files = [e for e in os.listdir() if not e.startswith('.') and os.path.isfile(e) ][:3]
or, glob.glob():
import glob
files = glob.glob('*')[:3]
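One caveat with the pure-Python routes: os.listdir() and glob.glob() return names in arbitrary order, while ls sorts its output, so the "first three" can differ. A small sketch that sorts before slicing, assuming Python 3; first_entries is a made-up helper name, and plain sorted() orders by code point, which may not match your locale's ls ordering exactly:

```python
import os


def first_entries(directory=".", count=3):
    """Return the first `count` non-hidden names in `directory`, sorted."""
    names = [name for name in os.listdir(directory) if not name.startswith(".")]
    return sorted(names)[:count]


files = first_entries(".", 3)
print(files)
```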

What does `if file.find('freq-') != -1` mean?

I'm a chemistry student and want to write a script to extract some data (like coupling constants and interproton distance) from gaussian output files.
I found a script which extracts chemical shifts from gaussian output files. However, I don't understand what if file.find('freq-') != -1 means in the script.
Here's part of the script (the script does other things as well, so I've just shown the bit relevant to my question):
def read_gaussian_freq_outfiles(list_of_files):
    list_of_freq_outfiles = []
    for file in list_of_files:
        if file.find('freq-') !=-1:
            list_of_freq_outfiles.append([file,int(get_conf_number(file)),open(file,"r").readlines()])
    return list_of_freq_outfiles
def read_gaussian_outputfiles():
    list_of_files = []
    for file in glob.glob('*.out'):
        list_of_files.append(file)
    return list_of_files
I think that in read_gaussian_outputfiles() we create a list of files and simply add every file with the extension '.out' to it.
read_gaussian_freq_outfiles(list_of_files) probably lists the files which have "freq-" in the file name. But what does file.find('freq-') != -1 mean?
Does it mean that whatever we find in the file name is not equal to -1, or something else?
Some other additional information: the format of the gaussian output filename is: xxxx-opt_freq-conf-yyyy.out where xxxx is the name of your molecule and yyyy is a number.
When s.find(foo) fails to find foo in s, it returns -1. Therefore, when s.find(foo) does not return -1, we know it didn't fail.
read_gaussian_freq_outfiles looks for the term "freq-" in each of the names of files in list_of_files. If it succeeds in finding this phrase in the name of a file, it appends a list containing this file, a "conf number" (not sure what this is), and the contents of the file, to a list called list_of_freq_outfiles.
I created three files, goodbye.txt, hello.txt, and helloworld.txt to demonstrate usage.
In this example, I'll print all files that end with .txt, create a list of files, then print all files that have the phrase "goodbye" in the filename. This should only print goodbye.txt.
09:53 $ ls
goodbye.txt hello.txt helloworld.txt
(venv) ✔ ~/Desktop/ex
09:53 $ python
Python 2.7.11 (default, Dec 5 2015, 14:44:47)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob
>>> for file in glob.glob('*.txt'):
...     print(file)
...
goodbye.txt
hello.txt
helloworld.txt
>>> list_of_files = [ file for file in glob.glob('*.txt') ]
>>> print(list_of_files)
['goodbye.txt', 'hello.txt', 'helloworld.txt']
>>> for file in list_of_files:
...     if file.find('goodbye') != -1:
...         print(file)
...
goodbye.txt
Indeed, goodbye.txt is the only file printed.
As the other answers also show: if .find() returns -1, it could not find what you're looking for. This is because .find() returns the first index at which it finds your query. So in the following sentence
The cat is on the mat
sentence.find('cat') returns 4, since 'cat' starts at index 4 (indexing starts at 0!).
However, sentence.find('dog') will return the only thing it can return if it cannot find it: -1. If it returned 0 as the 'cannot find', you might think your query starts at index 0. With -1, you know it could not find it.
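Here is the same thing as a runnable sketch (Python 3 assumed). The two file names are invented, following the xxxx-opt_freq-conf-yyyy.out pattern from the question. When you only need a yes/no answer, the in operator does the same job as comparing find() against -1:

```python
sentence = "The cat is on the mat"

print(sentence.find("cat"))   # index where 'cat' starts
print(sentence.find("dog"))   # -1, because 'dog' is not in the sentence
print(sentence.find("The"))   # 0: a real match, so never test `if s.find(x):`

# Made-up file names in the question's naming scheme.
filenames = ["water-opt_freq-conf-1.out", "water-opt-conf-1.out"]

with_find = [name for name in filenames if name.find("freq-") != -1]
with_in = [name for name in filenames if "freq-" in name]

print(with_find == with_in)
print(with_in)
```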
Python's string find method looks for the first occurrence of a substring in a given string (ref http://www.tutorialspoint.com/python/string_find.htm)
Here it is looking for all the filenames with 'freq-' sub-string in them.

ASK SPARQL queries in rdflib

I'm trying to learn SPARQL and I'm using Python's rdflib to practice.
I've made a few attempts, but any ASK query always seems to give me back a True result.
For instance, I tried the following:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import rdflib
mygraph=rdflib.Graph();
mygraph.parse('try.ttl',format='n3');
results=mygraph.query("""
ASK {?p1 a <http://false.com>}
""")
print bool(results)
The result is true, even though there is no subject of type false.com in 'try.ttl'.
Can anyone explain why?
Thank you in advance for your help!
UPDATE: Reading the rdflib manual, I found out that results is of type list and (in my case) should contain a single boolean with the return value from the ask query.
I tried the following:
for x in results:
    print x
And I got "None".
I'm guessing I don't use the query method in the right way.
The documentation doesn't actually say that it's of type list, but rather that you can iterate over it or convert it to a boolean:
If the type is "ASK", iterating will yield a single bool (or
bool(result) will return the same bool)
This means that print bool(results), as you've done, should work. In fact, your code does work for me:
$ touch try.ttl
$ cat try.ttl # it's empty
$ cat test.py # same code
#!/usr/bin/python
# -*- coding: utf-8 -*-
import rdflib
mygraph=rdflib.Graph();
mygraph.parse('try.ttl',format='n3');
results=mygraph.query("""
ASK {?p1 a <http://false.com>}
""")
print bool(results)
$ ./test.py # the data is empty, so there's no match
False
If we add some data to the file that would make the query return true, we get true:
$ cat > try.ttl
<http://example.org> a <http://false.com> .
$ cat try.ttl
<http://example.org> a <http://false.com> .
$ ./test.py
True
Maybe you're using an older version of the library? Or a newer version and a bug was introduced? I'm using 4.0.1:
$ python
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkg_resources
>>> pkg_resources.get_distribution("rdflib").version
'4.0.1'

Learn Python the Hard Way, Exercise 15

I'm trying to solve exercise 15's extra credit questions of Zed Shaw's Learn Python the Hard Way but I've run into a problem. The code is as follows:
from sys import argv
script, filename = argv
txt = open(filename)
print "Here's your file %r:" % filename
print txt.read()
print "I'll also ask you to type it again:"
file_again = raw_input("> ")
txt_again = open(file_again)
print txt_again.read()
print txt_again.read()
I understand all the code that has been used, but extra credit question 7 asks:
Startup python again and use open from the prompt. Notice how you can open files and run read on them right there?
I've tried inputting everything I could think of in terminal (on a mac) after first starting up python with the 'python' command, but I can't get the code to run. What should I be doing to get this piece of code to run from the prompt?
Zed doesn't say to run this particular piece of code from within Python. Obviously, that code is getting the filename value from the parameters you used to invoke the script, and if you're just starting up the Python shell, you haven't used any parameters.
If you did:
filename = 'myfilename.txt'
txt = open(filename)
then it would work.
I just started with open(xyz.txt)
Well, yes, of course that isn't going to work, because you don't have a variable xyz, and even if you did, it wouldn't have an attribute txt. Since it's a file name, you want a string "xyz.txt", which you create by putting it in quotes: 'xyz.txt'. Notice that Python treats single and double quotes more or less the same; unlike in languages like C++ and Java, there is not a separate data type for individual characters - they're just length-1 strings.
Basically, just like in this transcript (I've added blank lines to aid readability):
pax:~$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> xyz = open ("minimal_main.c")
>>> print xyz.read()
int main (void) {
return 0;
}
>>> xyz.close()
>>> <CTRL-D>
pax:~$ _
All it's showing you is that you don't need a script in order to run Python commands, the command line interface can be used in much the same way.
print open('ex15_sample.txt').read()
After starting Python in the terminal, use open('filename.txt') to open the file; with the dot operator you can call read() directly on the result.
After running Python in terminal,
abc = open ("ex15_sample.txt")
print abc.read()
That should do.
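For anyone following along on Python 3 (an assumption; the book and the answers above use Python 2's print statement), a rough equivalent is sketched below. It writes its own sample file to a temporary directory so it can run anywhere, and the sample text is made up. It also shows why the second read() in the exercise prints nothing: the first call already consumed the whole file.

```python
import os
import tempfile

# Hypothetical sample file, created here so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "ex15_sample.txt")
with open(sample_path, "w") as f:
    f.write("This is stuff I typed into a file.\n")

# The file name has to be a string; `with` closes the file for us afterwards.
with open(sample_path) as txt:
    contents = txt.read()      # reads everything up to the end of the file
    second_read = txt.read()   # already at the end, so nothing is left

print(contents)
print(repr(second_read))
```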

Gzip and subprocess' stdout in python

I'm using python 2.6.4 and discovered that I can't use gzip with subprocess the way I might hope. This illustrates the problem:
May 17 18:05:36> python
Python 2.6.4 (r264:75706, Mar 10 2010, 14:41:19)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> import subprocess
>>> fh = gzip.open("tmp","wb")
>>> subprocess.Popen("echo HI", shell=True, stdout=fh).wait()
0
>>> fh.close()
>>>
[2]+ Stopped python
May 17 18:17:49> file tmp
tmp: data
May 17 18:17:53> less tmp
"tmp" may be a binary file. See it anyway?
May 17 18:17:58> zcat tmp
zcat: tmp: not in gzip format
Here's what it looks like inside less
HI
^_<8B>^H^Hh<C0><F1>K^B<FF>tmp^#^C^#^#^#^#^#^#^#^#^#
which looks like it wrote the stdout as plain text and then added an empty gzip file. Indeed, if I remove the "HI\n", then I get this:
May 17 18:22:34> file tmp
tmp: gzip compressed data, was "tmp", last modified: Mon May 17 18:17:12 2010, max compression
What is going on here?
UPDATE:
This earlier question is asking the same thing: Can I use an opened gzip file with Popen in Python?
You can't use file-likes with subprocess, only real files. The fileno() method of GzipFile returns the FD of the underlying file, so that's what the echo redirects to. The GzipFile then closes, writing an empty gzip file.
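You can reproduce the effect without a child process. This is only a sketch, assuming Python 3 and a throwaway file in a temporary directory: os.write() on the descriptor stands in for what the redirected echo does.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tmp")

fh = gzip.open(path, "wb")
# fileno() is the descriptor of the plain file underneath the GzipFile,
# so writing to it bypasses the compressor, just like a child's stdout would.
os.write(fh.fileno(), b"HI\n")
fh.close()  # the GzipFile itself was given no data: it writes an empty gzip stream

with open(path, "rb") as raw:
    data = raw.read()

print(b"HI\n" in data)       # is the text sitting in the file uncompressed?
print(b"\x1f\x8b" in data)   # are the gzip magic bytes in there as well?
```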
just pipe that sucker
from subprocess import Popen,PIPE
GZ = Popen("gzip > outfile.gz",stdin=PIPE,shell=True)
P = Popen("echo HI",stdout=GZ.stdin,shell=True)
# these next three must be in order
P.wait()
GZ.stdin.close()
GZ.wait()
I'm not totally sure why this isn't working (perhaps the output redirection is not calling python's write, which is what gzip works with?) but this works:
>>> fh.write(subprocess.Popen("echo Hi", shell=True, stdout=subprocess.PIPE).stdout.read())
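A Python 3 sketch of the same idea, for comparison. It assumes a POSIX system with an echo executable, and the output path is a temporary file chosen just for this example. It captures the child's output in memory, so for large outputs the gzip pipe shown earlier is the better fit.

```python
import gzip
import os
import subprocess
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), "tmp.gz")

# Capture the child's stdout as bytes instead of handing it a descriptor.
result = subprocess.run(["echo", "HI"], stdout=subprocess.PIPE, check=True)

# Going through GzipFile.write() is what actually compresses the data.
with gzip.open(out_path, "wb") as fh:
    fh.write(result.stdout)

# Read it back through gzip to confirm the file is a valid archive.
with gzip.open(out_path, "rb") as fh:
    round_trip = fh.read()

print(round_trip)
```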
You don't need to use subprocess to write to the gzip.GzipFile. Instead, write to it like any other file-like object. The result is automagically gzipped!
import gzip
with gzip.open("tmp.gz", "wb") as fh:
    fh.write('HI\n')  # write the data itself; 'echo HI' would store the command text
