I need to create a script that reads four lines and, if a condition is met, reads the next four lines in the file, and so on. If the condition isn't met, the script must restart the test from the second line of the previously read block, so the first line of what would have been the next block becomes the new fourth line. For instance, I want to retrieve all the blocks that sum to 4 from the following file.
printf "1\n1\n1\n1\n2\n1\n1\n1\n1" > file1.txt #In BASH
Lines 1 to 4 sum to 4, so they produce a positive result. Lines 5 to 8 sum to 5, so they produce a negative result, and the sum must be redone starting at the 6th line and ending at the 9th; those lines sum to 4 and therefore give a positive result. I'm aware that I could do something like this:
with open("file1.txt") as infile:
    while not EOF:
        lines = []
        for i in range(next N lines):
            lines.append(infile.readline())
        make_the_sum(lines)
but this moves the reader forward four lines and makes it impossible to go back if the sum is larger than 4. How can I achieve this effect? Note that my files are large and I can't load them whole into memory.
I am simplifying by ignoring the end-of-file issue. You could use tell and seek to recover an earlier position (you could save as many positions as you need in a list), say:
>>> with open('testmedium.txt') as infile:
...     times = 0
...     EOF = 0
...     while not EOF:
...         pos = infile.tell()
...         print(f"\nPosition is {pos}")
...         lines = []
...         for i in range(4):
...             lines.append(infile.readline())
...         [print(l[:20]) for l in lines]
...         if times==0 and '902' in lines[0]:
...             times = 1
...             infile.seek(pos)
...         elif '902' in lines[0]:
...             break
Position is 0
271,848,690,44,511,5
132,427,793,452,85,6
62,617,183,843,456,3
668,694,659,691,242,
Position is 125
902,550,177,290,828,
326,603,623,79,803,5
803,949,551,947,71,8
661,881,124,382,126,
Position is 125
902,550,177,290,828,
326,603,623,79,803,5
803,949,551,947,71,8
661,881,124,382,126,
>>>
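To apply the tell/seek idea to the block-sum question itself, here is a minimal sketch rather than a definitive implementation. It assumes one integer per line and no blank lines; the function name blocks_with_seek and the temporary file are my own, for illustration only. It remembers where the second line of the current block starts and seeks back there when the sum does not match.

```python
import os
import tempfile

def blocks_with_seek(path, size=4, target=4):
    """Return every block of `size` lines whose integer sum equals `target`."""
    found = []
    with open(path) as infile:
        while True:
            first = infile.readline()
            second_pos = infile.tell()          # start of the block's second line
            block = [first] + [infile.readline() for _ in range(size - 1)]
            if "" in block:                     # readline() returns "" at end of file
                break
            values = [int(line) for line in block]
            if sum(values) == target:
                found.append(values)            # match: carry on from the current position
            else:
                infile.seek(second_pos)         # no match: retry from the second line
    return found

# Recreate the question's sample file in a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n1\n1\n1\n2\n1\n1\n1\n1")
found = blocks_with_seek(tmp.name)
os.remove(tmp.name)
print(found)
```

Only one block is held in memory at a time. The cost is that up to three lines are read again after every block that fails the test.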
The following code will read lines into a "cache" (just a list) and do some work on the cached lines when the cache has four lines. If the test passes, the cache gets cleared. If the test fails, the cache is updated to contain only the last three lines of the cache. You can do additional work in the if-else blocks as necessary.
def passes_test(lines, target_value=4):
    return sum([int(line) for line in lines]) == target_value

with open('file1.txt') as f:
    cached = []
    for line in f:
        cached.append(line)
        if len(cached) == 4:
            if passes_test(cached):
                cached = []
            else:
                cached = cached[1:]
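The same cache idea can be sketched with a bounded collections.deque, which drops the oldest line automatically once it is full. This is an illustrative variant, not the code above: the generator name matching_blocks and the in-memory sample are assumptions of mine, and it expects one integer per line.

```python
from collections import deque
import io

def matching_blocks(lines, size=4, target=4):
    """Yield (start_line, values) for each block of `size` lines summing to `target`.

    `lines` can be any iterable of text lines, such as an open file, so only
    `size` values are held in memory at a time. Line numbers are 1-based.
    """
    window = deque(maxlen=size)
    for lineno, line in enumerate(lines, start=1):
        window.append(int(line))    # when full, this drops the oldest value
        if len(window) < size:
            continue
        if sum(window) == target:
            yield lineno - size + 1, list(window)
            window.clear()          # match: the next block starts from scratch

# In-memory stand-in for file1.txt from the question.
sample = io.StringIO("1\n1\n1\n1\n2\n1\n1\n1\n1")
found = list(matching_blocks(sample))
print(found)
```

With the sample data this reports one block starting at line 1 and one starting at line 6, which matches the walk-through in the question.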
As Martijn has suggested,
with open("file1.txt") as f:
    rd = lambda: int(next(f))
    try:
        a, b, c, d = rd(), rd(), rd(), rd()
        while True:
            if a + b + c + d == 4:
                # found a block; read the next four lines
                a, b, c, d = rd(), rd(), rd(), rd()
            else:
                # nope; slide the window forward by one line
                a, b, c, d = b, c, d, rd()
    except StopIteration:
        # found end of file
        pass
I have a file that lists letters together with the numbers associated with them, like this:
a 1
b 2
c 3
d 4
etc
I also have another file with the letters and the number to multiply each of them by, like this:
a 3 b 5
c 6 d 2
Basically, I want to get the value of a from the original file and multiply it by 3, then get the value of b from the original file and multiply it by 5, and so on.
I have made a dictionary from the original file, but I don't know how to retrieve a number from it in order to multiply it. Python essentially needs to go through the file of multipliers, see the a, get the value that corresponds to it from the other file, and then multiply that value by 3.
d = {}
with open("numbers.txt") as numbers:
    for line in numbers:
        (key, val) = line.split()
        d[key] = int(val)
print(d)
d = {}
with open("numbers.txt") as numbers:
    for line in numbers:
        pairs = line.split()
        for i in range(0,len(pairs),2):
            d[pairs[i]] = int(pairs[i+1])
print(d)
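To connect the two files, one option is to look each letter up in the dictionary while walking through the multiplier lines. The following is only a sketch under assumptions: the helper names load_values and multiply are made up for illustration, the two files are replaced by in-memory text, and every letter in the second file is assumed to exist in the first.

```python
import io

def load_values(lines):
    """Build {letter: number} from lines such as 'a 1'."""
    values = {}
    for line in lines:
        key, val = line.split()
        values[key] = int(val)
    return values

def multiply(values, factor_lines):
    """Return {letter: value * factor} for lines such as 'a 3 b 5'."""
    products = {}
    for line in factor_lines:
        tokens = line.split()
        # tokens alternate: letter, factor, letter, factor, ...
        for letter, factor in zip(tokens[0::2], tokens[1::2]):
            products[letter] = values[letter] * int(factor)
    return products

# In-memory stand-ins for the two files described in the question.
numbers = io.StringIO("a 1\nb 2\nc 3\nd 4\n")
factors = io.StringIO("a 3 b 5\nc 6 d 2\n")
result = multiply(load_values(numbers), factors)
print(result)
```

A letter that is missing from the first file would raise KeyError here; values.get(letter) with a default is one way to tolerate that, if it suits your data.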
I am trying to read through a data file and only select/print rows where the value of a particular column is greater than, e.g., 20. I have included my 'test' code. (I can get it to work if I specify the values directly, as shown in the commented-out 'a=...', but not when reading from the file.)
import numpy as np

a = open('data_file.dat', 'r')
#a=[432,2,34,542]
header0 = a.readline()
x=[]
y=[]
z=[]
for line in a:
    line = line.strip()
    columns = line.split()
    x=columns[0]
    y=columns[1]
    z=columns[2]
    if (x > 20 for x in a):
        print x
You're close, I think. I'm not sure what that last if statement is meant to do; it looks like a list comprehension mixed with an if, and it also reassigns x (if the statement worked at all). Try something like this:
for line in a:
    columns = line.strip().split()
    # the fields are strings, so convert before comparing with a number
    if int(columns[0]) > 20:
        print line
It looks like the problem is in the last two lines.
for line in a:
    ...
    if (x > 20 for x in a):  # builds a generator over a, which is always truthy; not what you want
        print x
The second for __ in a does not test the current row: it creates a generator expression, and a generator object is always truthy, so x is printed for every line regardless of its value. Just check for
    if int(x) > 20:
instead (x is read from the file as a string, so convert it before comparing).
First of all, the line if (x > 20 for x in a) doesn't do what you intend. The condition is always true, because the parenthesized expression creates a generator object, and a generator object is truthy.
What you want is simply: if int(x) > 20:. Note that you first have to convert x to an integer, since it is a string.
So if you have the file data.dat, which looks like:
col1 col2 col3
10 5 2
59 24 8
18 199 -0
Then you can read it with
f = open('data.dat', 'r')
header = f.readline()
for line in f:
    line = line.strip()
    columns = line.split()
    x=columns[0]
    y=columns[1]
    z=columns[2]
    if int(x) > 20:
        print columns
Additional note: if all your values are integers, you can convert a whole row in one step with map(int, ...), like this:
for line in f:
    columns = map(int, line.strip().split())
    if columns[0] > 20:
        print columns
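The snippets in this thread use Python 2 print statements. For Python 3, the same filter could be sketched as below. The function name rows_above and the in-memory sample are assumptions for illustration, and all values are assumed to be whitespace-separated integers.

```python
import io

def rows_above(lines, column=0, threshold=20):
    """Yield each data row (as a list of ints) whose `column` value exceeds `threshold`."""
    lines = iter(lines)
    next(lines, None)                # skip the header line
    for line in lines:
        fields = line.split()
        if not fields:               # ignore blank lines
            continue
        row = [int(field) for field in fields]
        if row[column] > threshold:
            yield row

# In-memory stand-in for the data.dat example above.
data = io.StringIO("col1 col2 col3\n10 5 2\n59 24 8\n18 199 -0\n")
selected = list(rows_above(data))
print(selected)
```

Passing an open file instead of the StringIO object works the same way, since both are iterated line by line.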
I am new to Python and am trying to write my dictionary values to a file using Python 2.7. Each value in my dictionary D is a list with at least 2 items.
The dictionary has TERM_ID as its key, and each
value has the format [[DOC42, POS10, POS22], [DOC32, POS45]].
This means the TERM_ID (key) occurs in DOC42 at positions POS10 and POS22, and also in DOC32 at POS45.
So I have to write to a new file in this format, with a new line for each TERM_ID:
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22 (tab) DOC32:POS45
The following code should help you understand exactly what I am trying to do.
for key,valuelist in D.items():
    #first value in each list is an ID
    docID = valuelist[0][0]
    for lst in valuelist:
        file.write('\t' + lst[0] + ':' + lst[1])
        lst.pop(0)
        lst.pop(0)
        for n in range(len(lst)):
            file.write('\t0:' + lst[0])
            lst.pop(0)
The output I get is :
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22
DOC32:POS45
I tried using the newline character as well as commas in a number of places to keep writing on the same line, but it did not work. I don't understand how file writing really works.
Any kind of input will be helpful. Thanks!
@Falko I could not find a way to attach the text file, so here is my sample data:
879\t3\t1
162\t3\t1
405\t4\t1455
409\t5\t1
13\t6\t15
417\t6\t13
422\t57\t1
436\t4\t1
141\t8\t1
142\t4\t145
170\t8\t1
11\t4\t1
184\t4\t1
186\t8\t14
My sample running code is -
with open('sampledata.txt','r') as sample,open('result.txt','w') as file:
    d = {}
    #term= ''
    #docIndexLines = docIndex.readlines()
    #form a d with format [[doc a, pos 1, pos 2], [doc b, pos 3, pos 8]]
    for l in sample:
        tID = -1
        someLst = l.split('\\t')
        #if len(someLst) >= 2:
        tID = someLst[1]
        someLst.pop(1)
        #if term not in d:
        if not d.has_key(tID):
            d[tID] = [someLst]
        else:
            d[tID].append(someLst)
    #read the dictionary to generate result file
    docID = 0
    for key,valuelist in d.items():
        file.write(str(key))
        for lst in valuelist:
            file.write('\t' + lst[0] + ':' + lst[1])
            lst.pop(0)
            lst.pop(0)
            for n in range(len(lst)):
                file.write('\t0:' + lst[0])
                lst.pop(0)
My Output:
57 422:1
3 879:1
162:1
5 409:1
4 405:1455
436:1
142:145
11:1
184:1
6 13:15
417:13
8 141:1
170:1
186:14
Expected output:
57 422:1
3 879:1 162:1
5 409:1
4 405:1455 436:1 142:145 11:1 184:1
6 13:15 417:13
8 141:1 170:1 186:14
You probably don't get the result you're expecting because you didn't strip the newline characters \n while reading the input data. Try replacing
someLst = l.split('\\t')
with
someLst = l.strip().split('\\t')
To enforce the mentioned line breaks in your output file, add a
file.write('\n')
at the very end of your second outer for loop:
for key,valuelist in d.items():
    # ...
    file.write('\n')
Bottom line: write never adds a line break. If you do see one in your output file, it's in your data.
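Putting both points together, here is a hedged Python 3 sketch of the grouping and writing steps. It assumes the columns are separated by real tab characters and that each row has exactly three fields (document, term, position), so it does not cover the extra 0:POS entries. The names group_rows and write_groups are hypothetical.

```python
import io

def group_rows(lines, sep="\t"):
    """Group 'doc<sep>term<sep>pos' rows by term (second column), keeping input order."""
    groups = {}
    for line in lines:
        fields = line.strip().split(sep)    # strip() drops the trailing newline
        if len(fields) < 3:
            continue                        # skip blank or malformed rows
        doc, term, pos = fields[0], fields[1], fields[2]
        groups.setdefault(term, []).append(doc + ":" + pos)
    return groups

def write_groups(groups, out):
    """Write one tab-separated line per term, ending each with an explicit newline."""
    for term, entries in groups.items():
        out.write("\t".join([term] + entries) + "\n")

# A few rows from the sample data, with real tabs, held in memory.
sample = io.StringIO("879\t3\t1\n162\t3\t1\n405\t4\t1455\n409\t5\t1\n436\t4\t1\n")
result = io.StringIO()
write_groups(group_rows(sample), result)
print(result.getvalue())
```

Building each line with join and appending the newline once keeps all entries of a term on the same line, because nothing else in the data can introduce a line break after strip().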
I have a large list of lists like:
X = [['a','b','c','d','e','f'],['c','f','r'],['r','h','l','m'],['v'],['g','j']]
Each inner list is a sentence, and the members of each inner list are the words of that sentence. I want to write this list to a file such that each sentence (inner list) is on a separate line, and each line starts with a number giving the position of that inner list (sentence) in the large list. For the case above, I want the output to look like this:
1. a b c d e f
2. c f r
3. r h l m
4. v
5. g j
I need them to be written in this format in a "text" file. Can anyone suggest code for this in Python?
Thanks
with open('somefile.txt', 'w') as fp:
    for i, s in enumerate(X):
        print >>fp, '%d. %s' % (i + 1, ' '.join(s))
with open('file.txt', 'w') as f:
    i=1
    for row in X:
        # write() does not add a line break, so append '\n' explicitly
        f.write('%d. %s\n'%(i, ' '.join(row)))
        i+=1
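The print >>fp form only works in Python 2. For Python 3, the numbering can be sketched with enumerate(..., start=1). The function name write_numbered is an assumption of mine, and an in-memory buffer stands in for the output file.

```python
import io

def write_numbered(sentences, out):
    """Write each inner list as 'N. word word ...' on its own line."""
    for number, words in enumerate(sentences, start=1):
        out.write("%d. %s\n" % (number, " ".join(words)))

X = [['a','b','c','d','e','f'],['c','f','r'],['r','h','l','m'],['v'],['g','j']]
buffer = io.StringIO()
write_numbered(X, buffer)
print(buffer.getvalue())
```

To write to disk, pass a file opened with open('somefile.txt', 'w') in place of the buffer.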