Dynamic file creation (naming) - Python

Is it possible to do the following in Python?
i = 1
while True:
    w = open("POSCAR_i", "w")
    i = i + 1
    if i < 10:
        break
So basically, it should create POSCAR_1 through POSCAR_10.
Thank you.

It is much more pythonic (and all around better) to use a for loop:
for idx in range(1, 11):
    f = open("POSCAR_%d" % idx, "w")
    f.close()
You can also use the format() method, which is now officially preferred, although the % operator is still much more common in the wild.
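As a minimal sketch, the two spellings produce identical names, so either works for this task:

```python
# Both formatting styles build the same file name.
for idx in range(1, 11):
    assert "POSCAR_%d" % idx == "POSCAR_{}".format(idx)

print("POSCAR_{}".format(10))  # POSCAR_10
```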

The problem is that your break statement was executed on the first iteration.
The i variable is indeed less than 10 - so the loop terminates.
What you would need to do is something like this:
i = 1
while True:
    w = open("POSCAR_%d" % i, "w")
    w.close()
    i += 1
    if i > 10:
        break
Don't forget to close the file object once you have finished with it (which in this case is immediately).
You could also just put the termination condition in the loop's definition:
i = 1
while i <= 10:
    w = open("POSCAR_%d" % i, "w")
    w.close()
    i += 1

You would use str.format to pass in i as a variable:
w = open("POSCAR_{}".format(i),"w")
If you wanted 1 - 10, a for loop would do the same.
for i in range(1, 11):
    w = open("POSCAR_{}".format(i), "w")
But w will be reassigned each time.
You need to flip the test to if i > 10, or your loop will end straight away, as i is < 10 initially (i == 10 would stop one file short, at POSCAR_9).
You can use i <=10 as the condition and remove the if statement:
i = 1
while i <= 10:
    with open("POSCAR_{}".format(i), "w") as w:  # with will close your files automatically
        i += 1

Almost correct:
i = 1
while True:
    w = open("POSCAR_%d" % i, "w")
    i = i + 1
    if i > 10:
        break
will work.

Basically the answer is no, not as written. Here is why it does not work, along with some polishing.
There are a couple of problems with your code:
The line w = open("POSCAR_i","w") creates a file literally named POSCAR_i: the i is treated as part of the string. You should build the name with w = open("POSCAR_%d" % i, "w") instead of w = open("POSCAR_i","w").
The if executes on the first iteration because its condition is met: at that point i = 1, so i < 10 is true and break runs. Change the condition to i > 10.
It would also be good practice to use a for loop instead of a while loop; then you would not need an if block at all.
It is also good practice to close a file handle before using it for other purposes, or, better still, to use separate file handles.
So I would write the code in this way:
# Previous codes
.
.
.
Files = []
for i in range(1, 11):
    w = open("POSCAR_%d" % i, "w")
    Files.append(w)
# Rest of the code
.
.
.
for w in Files:
    w.close()
# End of code
Enjoy!


Can't correctly extract information from a file

I am trying to extract some data from a file. For that purpose I have written a script which reads the file and, when a keyword is detected, starts copying until it finds a blank line. I think it is not too bad, but it is not working.
The Python script I wrote is:
def out_to_mop(namefilein, namefileout):
    print namefilein
    filein = open(namefilein, "r")
    fileout = open(namefileout, "w")
    lines = filein.readlines()
    filein.close()
    # look for keyword "CURRENT.." to start copying
    try:
        indexmaxcycle = lines.index(" CURRENT BEST VALUE OF HEAT OF FORMATION")
        indexmaxcycle += 5
    except:
        indexmaxcycle = 0
    if indexmaxcycle != 0:
        while lines[indexmaxcycle] != " \n":
            linediv = lines[indexmaxcycle].split()
            symbol = linediv[0]
            x = float(linediv[1])
            indexmaxcycle += 1
            fileout.write("%s \t %3.8f 1 \n" % (symbol, x))
    else:
        print "structure not found"
        exit()
    fileout.close()
This function is supposed to extract info from this file called file1.out:
CURRENT BEST VALUE OF HEAT OF FORMATION = -1161.249249
cycles=200 pm6 opt singlet eps=80 charge=-1
C -3.87724655 +1 1.30585983 +1 4.53273224 +1
H -7.60628859 +1 0.53968618 +1 3.72680573 +1
O -4.76978297 +1 4.45409715 +1 1.42608903 +1
H -4.66890488 +1 4.47267425 +1 2.41952335 +1
H -5.59468165 +1 3.93399792 +1 1.27757138 +1
**********************
* *
* JOB ENDED NORMALLY *
* *
**********************
but it prints "structure not found"
Would you help me a bit?
You try to find the beginning of the structure with the code line
indexmaxcycle = lines.index(" CURRENT BEST VALUE OF HEAT OF FORMATION")
The documentation for the index method says, "Return zero-based index in the list of the first item whose value is x. Raises a ValueError if there is no such item." However, that line you are searching for is not one of the file lines. The actual file line is
CURRENT BEST VALUE OF HEAT OF FORMATION = -1161.249249
Note the number at the end, which is not in your search string. Therefore, the index method raises an exception and you get an indexmaxcycle value of zero.
Since you apparently do not know the full contents of the file line in advance, you should loop through the input lines yourself and use the in operator to find a line that contains your search string. You could also use the startswith string method in this way:
for j, line in enumerate(lines):
    if line.startswith(" CURRENT BEST VALUE OF HEAT OF FORMATION"):
        indexmaxcycle = j + 5
        break
else:
    indexmaxcycle = 0
I dropped the try..except structure here, since I see no way an exception could be raised for this code. I could be wrong, of course.
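As a minimal runnable sketch of that for/else search (the sample lines are made up to mimic the file):

```python
lines = [
    "some header\n",
    " CURRENT BEST VALUE OF HEAT OF FORMATION = -1161.249249\n",
    "cycles=200 pm6 opt singlet eps=80 charge=-1\n",
]

for j, line in enumerate(lines):
    if line.startswith(" CURRENT BEST VALUE OF HEAT OF FORMATION"):
        indexmaxcycle = j + 5  # skip ahead to where the coordinates start
        break
else:
    indexmaxcycle = 0  # the else branch runs only if the loop never breaks

print(indexmaxcycle)  # 6: match at index 1, plus the 5-line offset
```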
You are looking for an exact match, but the line in the text file is longer than the pattern you are looking for. Try searching for the beginning of the line instead:
pattern = " CURRENT BEST VALUE OF HEAT OF FORMATION"
try:
    indexmaxcycle = [i for (i, s) in enumerate(lines) if s.startswith(pattern)][0]
    indexmaxcycle += 5
    etc.
[i for (i,s) in enumerate(lines) if s.startswith(pattern)] gives you all indices of elements that start with your pattern. If you add the [0] you get the first one.
I just noticed you can speed this up if you use generator expressions instead of list comprehensions:
pattern = " CURRENT BEST VALUE OF HEAT OF FORMATION"
try:
    indexmaxcycle = next(i for (i, s) in enumerate(lines) if s.startswith(pattern)) + 5
except:
    etc.
This will only search the list until it finds the first match.
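A variant that avoids the try..except entirely is next() with a default value; this is a sketch, not the answerer's exact code:

```python
pattern = " CURRENT BEST VALUE OF HEAT OF FORMATION"
lines = ["header\n", " CURRENT BEST VALUE OF HEAT OF FORMATION = -1.0\n"]

# next() returns the default (None) instead of raising StopIteration
idx = next((i for i, s in enumerate(lines) if s.startswith(pattern)), None)
indexmaxcycle = idx + 5 if idx is not None else 0
print(indexmaxcycle)  # 6
```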

How to optimize comparison between a list and strings from stdin - performance is important

Right now I have a script that receives strings from stdin, and I also have a list which is about 70 MB on disk (in many partial files) and which I load into memory as one list.
I then search each string as it comes in from stdin to see whether it exists in the list. I understand this is slow because of the huge list and the potentially large number of incoming strings.
It goes like this:
def buildindex():
    # j = 0
    # while j < len(parts_list):
    #     f = urllib2.urlopen("https://s3.amazonaws.com/source123/output/" + parts_list[j])
    j = 0
    while j <= 200:
        if j < 10:
            f = urllib2.urlopen("https://s3.amazonaws.com/source123/output/part-0000" + str(j))
        if j < 100 and j >= 10:
            f = urllib2.urlopen("https://s3.amazonaws.com/source123/output/part-000" + str(j))
        if j >= 100:
            f = urllib2.urlopen("https://s3.amazonaws.com/source123/output/part-00" + str(j))
        for line in f.readlines():
            line = line.rstrip()
            yield line
            print line
        j += 1
    f.close()

linelist = list(buildindex())

for suspicious_line in sys.stdin:
    if "," in suspicious_line:
        suspicious_key, suspicious_source, suspicious_start, suspicious_end = suspicious_line.strip().split(",")
        x = re.compile(suspicious_key)
        sub_list = filter(x.match, linelist)
        # do something
I tried to run this locally and it has been over 20 minutes and it is still going. I will also use these scripts on Amazon EMR (Hadoop), where it also fails for some reason. If I try a subset of the list, it works.
What performance-wise changes can I make to keep things neat and relatively fast?
The problem may not be in the for suspicious_line in sys.stdin block, but in buildindex. Reading files from S3 can be slow. Have you timed buildindex? Run the script without the for suspicious_line in sys.stdin block and see how much time it takes.
If buildindex is the problem, download the files to disk.
If buildindex is not the problem, you can try the "simpler" in operator instead of a regex (creating a regex is expensive):
sub_list = [line for line in linelist if suspicious_line in line]
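To follow the timing suggestion above, one rough sketch, using a stand-in generator since the real buildindex reads from S3:

```python
import time

def buildindex():
    # stand-in for the real S3-reading generator
    for j in range(3):
        yield "part-%d" % j

start = time.time()
linelist = list(buildindex())
elapsed = time.time() - start
print("%d lines loaded in %.3f s" % (len(linelist), elapsed))
```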

Turning Equation Into Function?

I need to turn the following equation into a Python function.
k = people[i]
i = people[j]
costs[i][j]
costs[j][k]
change = -costs[i][k] - costs[j][l] + costs[i][l] + cost[j][k]
I think what you're looking for is:
def change(i, j, costs, people):
    k = people[i]
    l = people[j]  # was originally i = people[j], but unsure where l comes from otherwise
    result = -costs[i][k] - costs[j][l] + costs[i][l] + costs[j][k]  # 'cost' in the question is assumed to be a typo for 'costs'
    return result
(and then call with:
mychange = change(i,j,costs,people)
)
Note that if my assumption on where l comes from is wrong, change that 3rd line back to i = people[j] and pass in l as well.
Also note that using l as a variable name is a bad idea, as it is rather hard to tell apart from 1 (and using i, j, k, l at all is bad when more descriptive names would make the meaning clearer).
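A quick sanity check of the rewritten function with made-up people and costs data (the values below are purely hypothetical):

```python
def change(i, j, costs, people):
    k = people[i]
    l = people[j]
    return -costs[i][k] - costs[j][l] + costs[i][l] + costs[j][k]

people = [1, 0]              # hypothetical index mapping
costs = [[0, 5],
         [7, 0]]             # hypothetical cost matrix

# -costs[0][1] - costs[1][0] + costs[0][0] + costs[1][1] = -5 - 7 + 0 + 0
print(change(0, 1, costs, people))  # -12
```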

Code isn't working to sort through a list of 1 million integers, printing the top 10

This is for homework, so I must use as few Python functions as possible, while still letting a computer process a list of 1 million numbers efficiently.
#!/usr/bin/python3
# Find the 10 largest integers
# Don't store the whole list
import sys
import heapq

def fOpen(fname):
    try:
        fd = open(fname, "r")
    except:
        print("Couldn't open file.")
        sys.exit(0)
    all = fd.read().splitlines()
    fd.close()
    return all

words = fOpen(sys.argv[1])
numbs = map(int, words)
print(heapq.nlargest(10, numbs))

li = []
count = 1
# Make the list
for x in words:
    li.append(int(x))
    count += 1
    if len(li) == 10:
        break

# Selection sort, largest-to-smallest
for each in range(0, len(li) - 1):
    pos = each
    for x in range(each + 1, 10):
        if li[x] > li[pos]:
            pos = x
    if pos != each:
        li[each], li[pos] = li[pos], li[each]

for each in words:
    print(li)
    each = int(each)
    if each > li[9]:
        for x in range(0, 9):
            pos = x
            if each > li[x]:
                li[x] = each
                for i in range(x + 1, 10):
                    li[pos], li[i] = li[i], li[pos]
                break

# Selection sort, largest-to-smallest
for each in range(0, len(li) - 1):
    pos = each
    for x in range(each + 1, 10):
        if li[x] > li[pos]:
            pos = x
    if pos != each:
        li[each], li[pos] = li[pos], li[each]

print(li)
The code is working ALMOST the way I want it to. I create a list from the first 10 numbers and sort it into descending order, then have Python only update that list when an incoming number is larger than the smallest entry (instead of re-reading the whole list for every number).
This is the output I should be getting:
>>>[9932, 9885, 9779, 9689, 9682, 9600, 9590, 9449, 9366, 9081]
This is the output I am getting:
>>>[9932, 9689, 9885, 9779, 9682, 9025, 9600, 8949, 8612, 8575]
If you only need the 10 top numbers and don't care to sort the whole list,
and if "must use as few Python functions as possible" means that you (or your teacher) prefer to avoid heapq,
another way is to keep track of the 10 top numbers while you parse the whole file only once:
top = []
with open('numbers.txt') as f:
    # the first ten numbers go directly in
    for line in f:
        top.append(int(line.strip()))
        if len(top) == 10:
            break
    for line in f:
        num = int(line.strip())
        min_top = min(top)
        if num > min_top:  # check if the new number is a top one
            top.remove(min_top)
            top.append(num)
print(sorted(top))
Update: if you don't really need an in-place sort, and since you're going to sort only 10 numbers, I'd avoid the pain of reordering.
I'd just build a new list, for example:
sorted_top = []
while top:
    max_top = max(top)
    sorted_top.append(max_top)
    top.remove(max_top)
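The same top-10 tracking can be demonstrated in memory, without a file; the sample numbers below are made up:

```python
nums = [5, 1, 9, 3, 7, 8, 2, 6, 4, 0, 12, 11]

top = nums[:10]            # seed with the first ten
for num in nums[10:]:
    min_top = min(top)
    if num > min_top:      # replace the smallest kept number
        top.remove(min_top)
        top.append(num)

print(sorted(top, reverse=True))  # [12, 11, 9, 8, 7, 6, 5, 4, 3, 2]
```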
Well, by both reading in the entire file and splitting it, then using map(), you are keeping a lot of data in memory.
As Adrien pointed out, files are iterators in py3k, so you can just use a generator expression to provide the iterable for nlargest:
nums = (int(x) for x in open(sys.argv[1]))
then
heapq.nlargest(10, nums)
should get you what you need, without ever storing the entire list.
The program is even shorter than the original, as well!
#!/usr/bin/env python3
from heapq import nlargest
import sys
nums = (int(x) for x in open(sys.argv[1]))
print(nlargest(10, nums))

Python Unknown pattern finding

Okay, basically what I want is to compress a file by reusing code and then, at runtime, replace the missing code. What I've come up with is really ugly and slow, but at least it works. The problem is that the file has no specific structure, for example 'aGVsbG8=\n' (as you can see, it's base64-encoded). My function is really slow because the file is 1700+ characters long and it checks for patterns one character at a time. Please help me with better code, or at least help me optimize what I have. Anything that helps is welcome! BTW, I have already tried compression libraries, but they didn't compress as well as my ugly function.
def c_long(inp, cap=False, b=5):
    import re, string
    if cap is False: cap = len(inp)
    es = re.escape; le = len; ref = re.findall; ran = range; fi = string.find
    c = b; inpc = inp; pattern = inpc[:b]; l = []
    rep = string.replace; ins = list.insert
    while True:
        if c == le(inpc) and le(inpc) > b + 1: c = b; inpc = inpc[1:]; pattern = inpc[:b]
        elif le(inpc) <= b: break
        if c == cap: c = b; inpc = inpc[1:]; pattern = inpc[:b]
        p = ref(es(pattern), inp)
        pattern += inpc[c]
        if le(p) > 1 and le(pattern) >= b + 1:
            if l == []: l = [[pattern, le(p) + le(pattern)]]
            elif le(ref(es(inpc[:c+2]), inp)) + le(inpc[:c+2]) < le(p) + le(pattern):
                x = [pattern, le(p) + le(inpc[:c+1])]
                for i in ran(le(l)):
                    if x[1] >= l[i][1] and x[0][:-1] not in l[i][0]: ins(l, i, x); break
                    elif x[1] >= l[i][1] and x[0][:-1] in l[i][0]: l[i] = x; break
                inpc = inpc[:fi(inpc, x[0])] + inpc[le(x[0]):]
                pattern = inpc[:b]
                c = b - 1
        c += 1
    d = {}; c = 0
    s = ran(le(l))
    for x in l: inp = rep(inp, x[0], '{%d}' % s[c]); d[str(s[c])] = x[0]; c += 1
    return [inp, d]

def decompress(inp, l): return apply(inp.format, [l[str(x)] for x in sorted([int(x) for x in l.keys()])])
The easiest way to compress base64-encoded data is to first convert it to binary data -- this will already save 25 percent of the storage space:
>>> s = "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo=\n"
>>> t = s.decode("base64")
>>> len(s)
37
>>> len(t)
26
In most cases, you can compress the string even further using some compression algorithm, like t.encode("bz2") or t.encode("zlib").
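In Python 3 the same idea looks like this (str.decode("base64") and str.encode("zlib") are Python 2 only; the modern spelling uses the base64 and zlib modules):

```python
import base64
import zlib

s = b"YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo=\n"
raw = base64.b64decode(s)       # binary form is ~25% smaller than base64 text
packed = zlib.compress(raw)     # may or may not shrink very short inputs

assert zlib.decompress(packed) == raw
print(len(s), len(raw))  # 37 26
```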
A few remarks on your code: There are lots of factors that make the code hard to read: inconsistent spacing, overly long lines, meaningless variable names, unidiomatic code, etc. An example: Your decompress() function could be equivalently written as
def decompress(compressed_string, substitutions):
    subst_list = [substitutions[k] for k in sorted(substitutions, key=int)]
    return compressed_string.format(*subst_list)
Now it's already much more obvious what it does. You could go one step further: Why is substitutions a dictionary with the string keys "0", "1" etc.? Not only is it strange to use strings instead of integers -- you don't need the keys at all! A simple list will do, and decompress() will simplify to
def decompress(compressed_string, substitutions):
    return compressed_string.format(*substitutions)
You might think all this is secondary, but if you make the rest of your code equally readable, you will find the bugs in your code yourself. (There are bugs -- it crashes for "abcdefgabcdefg" and many other strings.)
Typically one would pump the program through a compression algorithm optimized for text, then run that through exec, e.g.
code="""..."""
exec(somelib.decompress(code), globals=???, locals=???)
It may be the case that .pyc/.pyo files are compressed already, and one could check by creating one with x="""aaaaaaaa""", then increasing the length to x="""aaaaaaaaaaaaaaaaaaaaaaa...aaaa""" and seeing if the size changes appreciably.
