Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a text file that contains:
Week OrangeTotal ODifference AppleTotal ADifference
1 2 - 3 -
2 5 ? 4 ?
3 10 ? 10 ?
4 50 ? 100 ?
I would like to leave the first data row as it is, since it's the start of the new year, but fill in each difference column of the following rows with that row's total minus the total in the row above.
It should be:
Week OrangeTotal ODifference AppleTotal ADifference
1 2 - 3 -
2 5 3 4 1
3 10 5 10 6
4 50 40 100 90
import os
def main():
    name = 'log.txt'
    tmpName = 'tmp.txt'
    f = open(name, 'r')
    tmp = open(tmpName, 'w')
    titleLine = f.readline().strip()
    tmp.write(titleLine+'\n')
    prevLine = f.readline().strip()
    tmp.write(prevLine+'\n')
    prevLine = prevLine.split('\t')
    for line in f:
        line = line.split('\t')
        line[2] = str(int(line[1]) - int(prevLine[1]))
        line[4] = str(int(line[3]) - int(prevLine[3]))
        prevLine = line
        displayLine=''
        for i in range(len(line)-1):
            displayLine += line[i]+'\t'
        displayLine += line[len(line)-1]
        tmp.write(displayLine+'\n')
    f.close()
    tmp.close()
    os.remove(name)
    os.rename(tmpName, name)
main()
I think it may be easier to work with each line in a for loop, such as for lines in ds[1:]:. It is also important to note that readlines() returns a list of the lines of the file.
So ds[0] = 'Week OrangeTotal ODifference AppleTotal ADifference'
so you need to loop over the remaining lines:
old = None  # the previous data row as a list of fields; None until the first row has been seen
done = []   # the finished output lines
for i in range(1, len(ds)):  # start at 1 to skip the header in ds[0]
    l = ds[i].split()  # split() gets rid of whitespace and turns the line into a list
    if old is not None:  # if this is not the first data row
        for v in range(1, len(l), 2):
            # we skip the week; the totals are in columns 1 and 3, each followed by its answer column
            l[v + 1] = str(int(l[v]) - int(old[v]))  # here is where we do the actual subtraction and store the value
    old = l  # when we are done we set the row we finished as old
    done.append(" ".join(l))
    print(done[-1])
What you do with the result from here is your decision.
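As a minimal sketch of the same idea, the loop can be wrapped in a function that takes the lines of the file and returns the completed rows. The name fill_differences is made up for this example, and it assumes there is a header plus at least one data row and that every Total column is immediately followed by its Difference column, as in the sample data.

```python
def fill_differences(lines):
    """Return tab-separated rows with each Difference column filled in.

    Assumes lines[0] is the header, lines[1] is the first data row (kept
    unchanged), and columns 1, 3, ... are totals followed by their difference.
    """
    header, first, *rest = [line.split() for line in lines]
    out = [header, first]
    prev = first
    for row in rest:
        for col in range(1, len(row), 2):
            # this row's total minus the total in the row above
            row[col + 1] = str(int(row[col]) - int(prev[col]))
        out.append(row)
        prev = row
    return ["\t".join(fields) for fields in out]
```

Joining the returned rows with "\n" and writing them back would reproduce the expected file; whether tabs or spaces are the right separator depends on the real log.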
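ds = open("Path.txt", 'r').readlines()
# a holds every data row after the first, b holds the row just above each of them
a = [words.split() for words in ds[2:]]
b = [words.split() for words in ds[1:]]
for cur, prev in zip(a, b):
    # differences of the orange (column 1) and apple (column 3) totals
    print(int(cur[1]) - int(prev[1]), int(cur[3]) - int(prev[3]))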
I have this input test.txt file, with the expected output interleaved in it as #Expected lines (each one comes after the last line containing 1 1 1 1 within a *Title region),
and this code in Python 3.6:
index = 0
insert = False
currentTitle = ""
testfile = open("test.txt","r")
content = testfile.readlines()
finalContent = content
testfile.close()
# Should change the below line of code I guess to adapt
#titles = ["TitleX","TitleY","TitleZ"]
for line in content:
    index = index + 1
    for title in titles:
        if line in title+"\n":
            currentTitle = line
            print (line)
    if line == "1 1 1 1\n":
        insert = True
    if (insert == True) and (line != "1 1 1 1\n"):
        finalContent.insert(index-1, currentTitle[:6] + "2" + currentTitle[6:])
        insert = False
f = open("test.txt", "w")
finalContent = "".join(finalContent)
f.write(finalContent)
f.close()
Update:
Actual output with the answer provided
*Title Test
12125
124125
asdas 1 1 1 1
rthtr 1 1 1 1
asdasf 1 1 1 1
asfasf 1 1 1 1
blabla 1 1 1 1
#Expected "*Title Test2" here <-- it didn't add it
124124124
*Title Dunno
12125
124125
12763125 1 1 1 1
whatever 1 1 1 1
*Title Dunno2
#Expected "*Title Dunno2" here <-- This worked great
214142122
#and so on for thousands of them..
Also is there a way to overwrite this in the test.txt file?
Because you are already reading the entire file into memory anyway, it's easy to scan through the lines twice; once to find the last transition out of a region after each title, and once to write the modified data back to the same filename, overwriting the previous contents.
I'm introducing a dictionary variable transitions where the keys are the indices of the lines which have a transition, and the value for each is the text to add at that point.
transitions = dict()
in_region = False
reg_end = -1
current_title = None
with open("test.txt","r") as testfile:
    content = testfile.readlines()
for idx, line in enumerate(content):
    if line.startswith('*Title '):
        # Commit last transition before this to dict, if any
        if current_title:
            transitions[reg_end] = current_title
        # add suffix for printing
        current_title = line.rstrip('\n') + '2\n'
    elif line.strip().endswith(' 1 1 1 1'):
        in_region = True
        # This will be overwritten while we remain in the region
        reg_end = idx
    elif in_region:
        in_region = False
if current_title:
    transitions[reg_end] = current_title
with open("test.txt", "w") as output:
    for idx, line in enumerate(content):
        output.write(line)
        if idx in transitions:
            output.write(transitions[idx])
This kind of "remember the last time we saw something" loop is very common, but takes some time getting used to. Inside the loop, keep in mind that we are looping over all the lines, and remembering some things we saw during a previous iteration of this loop. (Forgetting the last thing you were supposed to remember when you are finally out of the loop is also a very common bug!)
The strip() before we look for 1 1 1 1 normalizes the input by removing any surrounding whitespace. You could do other kinds of normalizations, too; normalizing your data is another very common technique for simplifying your logic.
Demo: https://ideone.com/GzNUA5
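To make the "remember the last thing we saw" pattern easier to try without touching a file, here is a hedged sketch of the same two passes as a function over a list of lines. The name insert_title_markers and the helper flush are invented for this example, and it normalizes with split() rather than strip(), so a line consisting only of 1 1 1 1 also counts as a match.

```python
def insert_title_markers(lines):
    """Return a new list with '<title>2' added after the last '1 1 1 1' line of each title region."""
    insert_after = {}      # line index -> text to add after that line
    current_title = None   # most recent '*Title ...' line, without its newline
    last_match = None      # index of the last '1 1 1 1' line seen under current_title

    def flush():
        # Commit the pending insertion for the region we are leaving, if there is one.
        if current_title is not None and last_match is not None:
            insert_after[last_match] = current_title + "2\n"

    # First pass: remember where each region's last matching line is.
    for idx, line in enumerate(lines):
        if line.startswith("*Title "):
            flush()
            current_title = line.rstrip("\n")
            last_match = None
        elif line.split()[-4:] == ["1", "1", "1", "1"]:
            # split() normalizes the whitespace before comparing
            last_match = idx
    flush()  # the final region is only committed here, after the loop

    # Second pass: copy the lines, adding the remembered text where needed.
    result = []
    for idx, line in enumerate(lines):
        result.append(line)
        if idx in insert_after:
            result.append(insert_after[idx])
    return result
```

Leaving out the final flush() call is the "forgot the last thing" bug mentioned above: every region except the last one would still get its marker.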
Try this, using itertools.zip_longest:
from itertools import zip_longest
with open("test.txt","r") as f:
    content = f.readlines()
results, title = [], ""
# fillvalue="" keeps j a string on the last iteration, where there is no next line
for i, j in zip_longest(content, content[1:], fillvalue=""):
    # extract title.
    if i.startswith("*"):
        title = i
    results.append(i)
    # compare value in i'th index with i+1'th (if mismatch add title)
    if "1 1 1 1" in i and "1 1 1 1" not in j:
        results.append(f'{title.strip()}2\n')
print("".join(results))
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
me=1
while (me < 244):
    f=open('%s' % me, 'rb')
    tdata = f.read()
    f.close()
    ss = '\xff\xd8'
    se = '\xff\xd9'
    count = 0
    start = 0
    while True:
        x1 = tdata.find(ss,start)
        if x1 < 0:
            break
        x2 = tdata.find(se,x1)
        jpg = tdata[x1:x2+1]
        count += 1
        fname = 'extracted%d03.jpg' % (count)
        fw = open(fname,'wb')
        fw.write(jpg)
        fw.close()
        start = x2+2
        me=me+1
I am trying to run this for multiple files,
but it only does the operation for file 1 and the rest of the files are ignored.
I am very new to Python; can anyone tweak this a bit?
In the last line of your code you're incrementing me inside the nested while loop, not in the outer loop that you want to run once for each of your files. To fix it, just un-indent the me=me+1 line, like so:
#!/usr/bin/python
me=1
while (me < 244):
    f=open('%s' % me, 'rb')
    tdata = f.read()
    f.close()
    ss = '\xff\xd8'
    se = '\xff\xd9'
    count = 0
    start = 0
    while True:
        x1 = tdata.find(ss,start)
        if x1 < 0:
            break
        x2 = tdata.find(se,x1)
        jpg = tdata[x1:x2+1]
        count += 1
        fname = 'extracted%d03.jpg' % (count)
        fw = open(fname,'wb')
        fw.write(jpg)
        fw.close()
        start = x2+2
    me=me+1 # this needs to be outside of your nested while loop
That being said, you probably want to improve the names of the variables in this code (make them more descriptive!), and it would probably also be a good idea to extract the code in the while loop into a function. It's also worth mentioning that the outer while loop can be (and should be) replaced with a for loop.
Something like this:
def do_something_with_file(me):
    f=open('%s' % me, 'rb')
    tdata = f.read()
    f.close()
    ss = '\xff\xd8'
    se = '\xff\xd9'
    count = 0
    start = 0
    while True:
        x1 = tdata.find(ss,start)
        if x1 < 0:
            break
        x2 = tdata.find(se,x1)
        jpg = tdata[x1:x2+1]
        count += 1
        fname = 'extracted%d03.jpg' % (count)
        fw = open(fname,'wb')
        fw.write(jpg)
        fw.close()
        start = x2+2
for i in range(1, 244):
    do_something_with_file(i)
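For reference, here is a hedged Python 3 sketch of the same search, working on bytes. It departs from the code above in ways that are my own assumptions about the intent: the markers are bytes literals, the slice keeps both bytes of the end marker, the search stops if an end marker is missing, and the output names include the source file number so that later files do not overwrite earlier ones. The names extract_jpegs and save_all are invented for this example, and a plain marker scan like this can cut an image short if it contains an embedded thumbnail.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(data):
    """Return every SOI...EOI span found in data, in order."""
    images = []
    start = 0
    while True:
        begin = data.find(SOI, start)
        if begin < 0:
            break
        end = data.find(EOI, begin + len(SOI))
        if end < 0:
            break  # unterminated image: stop rather than slice garbage
        images.append(data[begin:end + len(EOI)])
        start = end + len(EOI)
    return images


def save_all(first=1, last=243):
    """Extract from files named 1..243 and write extracted_<file>_<count>.jpg."""
    for number in range(first, last + 1):
        with open(str(number), "rb") as source:
            images = extract_jpegs(source.read())
        for count, image in enumerate(images, start=1):
            with open("extracted_%d_%03d.jpg" % (number, count), "wb") as target:
                target.write(image)
```

Keeping the search in a function that takes bytes makes it possible to check it on a small in-memory sample before running it over all the files.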
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
So I have an array and I want to append certain values to it in a while loop, but I want them in a certain order. It's something like this:
num = [1,2,3,4,5]
count = -1
while count < 2:
    for x in num:
        c.execute('SELECT column FROM table WHERE restriction = %s;', (x + count,))
        rows = c.fetchall()
        array = []
        for row in rows:
            array.append(row[0])
    count += 1
But I want the values where count = 0 to go first in the array.
I know I can just split it up, but is there any way to do it in this format and keep it short?
EDIT: Don't worry about the code itself, as it's just an example and not the code I am using. I just want to know if I can loop over count = 0 first, then count = -1, and then count = 1, so that in the final array the results from count = 0 will be first in the list.
I think you are trying to achieve something like this. In the main loop, only store the items in array if count == 0. Otherwise put them into a temporary array. Then after the main loop finishes, move the temporary stuff to the end of the main array. But your code is still a poor solution* and without knowing what you are trying to achieve, this is the best I can provide.
num = [1,2,3,4,5]
count = -1
array = []
tempArray = []
while count < 2:
    for x in num:
        c.execute('SELECT column FROM table WHERE restriction = %s;', (x + count,))
        rows = c.fetchall()
        for row in rows:
            if count == 0:
                array.append(row)
            else:
                tempArray.append(row)
    count += 1
for item in tempArray:
    array.append(item)
*I say poor solution because your code is so hard to understand. I am guessing you are trying to do this:
array = []
for x in range(5):
    c.execute('SELECT column FROM table WHERE restriction = %s;', (x + 1,))
    rows = c.fetchall()
    for row in rows:
        array.append(row)
# now get your remaining data
I think what you really need to do is write a method:
def getRows(x):
    c.execute('SELECT column FROM table WHERE restriction = %s;', (x,))
    rows = c.fetchall()
    return rows
# range(5) matches the five values in num; extend() adds the rows themselves rather than a nested list
for count in range(5):
    array.extend(getRows(count + 1))
for count in range(5):
    array.extend(getRows(count))
for count in range(5):
    array.extend(getRows(count + 2))
which you can then rewrite as:
def getRows(x):
    c.execute('SELECT column FROM table WHERE restriction = %s;', (x,))
    rows = c.fetchall()
    return rows
def addToArray(modifier):
    # range(5) matches the five values in num
    for count in range(5):
        array.extend(getRows(count + modifier))
and then call it 3 times:
addToArray(1)
addToArray(0)
addToArray(2)
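If the goal is only to visit count = 0 first, then -1, then 1, one option is to iterate over the offsets in exactly that order instead of incrementing a counter. The sketch below uses the standard-library sqlite3 module so that it can run on its own; the table items, its columns, and the function name fetch_in_order are placeholders for this example, and sqlite3 uses ? as the parameter marker where the code above uses %s.

```python
import sqlite3


def fetch_in_order(cursor, nums, offsets=(0, -1, 1)):
    """Collect values for each offset in the order given, offset 0 first by default."""
    results = []
    for offset in offsets:
        for x in nums:
            cursor.execute(
                "SELECT value FROM items WHERE restriction = ?;", (x + offset,)
            )
            # extend() keeps the list flat: one entry per matching row
            results.extend(row[0] for row in cursor.fetchall())
    return results
```

The order of the final list is then controlled by a single tuple, so changing it does not require touching the loop body.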
I have to split a .txt file into several files. Here's the code:
a = 0
b = open("sorgente.txt", "r")
c = 5
d = 16 // c
e = 1
f = open("out"+str(e)+".txt", "w")
for line in b:
    a += 1
    f.writelines(line)
    if a == d:
        e += 1
        a = 0
        f.close()
f.close()
So, if I run it, it gives me this error:
todoController\WordlistSplitter.py", line 9, in <module>
f.writelines(line)
ValueError: I/O operation on closed file
I understood that if you do a for loop the file gets closed, so I tried to put the f in the for loop, but it doesn't work because instead of getting:
out1.txt
1
2
3
4
out2.txt
5
6
7
8
I get only the last line of the file. What should I do? Is there any way I can call the open function I defined earlier again?
You f.close() inside the for loop, then do not open a new file as f, hence the error on the next iteration. You should also use with to handle files, which saves you needing to explicitly close them.
As you want to write four lines at a time to each out file, you can do this as follows:
file_num = 0
with open("sorgente.txt") as in_file:
    for line_num, line in enumerate(in_file):
        if not line_num % 4:
            file_num += 1
        with open("out{0}.txt".format(file_num), "a") as out_file:
            out_file.writelines(line)
Note that I have used descriptive variable names to make it a bit clearer what is happening.
You close the file, but you don't break out of the for loop.
If a == d, you close f, and then later (in the next iteration) you try to write to it, which causes the error.
Also, why are you closing f twice?
You should probably remove the first f.close():
a = 0
b = open("sorgente.txt", "r")
c = 5
d = 16 // c
e = 1
f = open("out"+str(e)+".txt", "w")
for line in b:
    a += 1
    f.writelines(line)
    if a == d:
        e += 1
        a = 0
        # f.close()
f.close()
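Another way to look at it: after closing f, the next output file has to be opened before the next write. A minimal sketch of that, assuming a fixed number of lines per output file; the function name split_lines and its parameters are hypothetical, not part of the original code.

```python
def split_lines(source_path, lines_per_file, prefix="out"):
    """Write consecutive chunks of source_path to prefix1.txt, prefix2.txt, ...

    Returns the list of file names that were written.
    """
    written = []
    out_file = None
    with open(source_path) as source:
        for line_num, line in enumerate(source):
            if line_num % lines_per_file == 0:
                # a chunk is complete: close it and open the next file in its place
                if out_file is not None:
                    out_file.close()
                name = "{0}{1}.txt".format(prefix, len(written) + 1)
                out_file = open(name, "w")
                written.append(name)
            out_file.write(line)
    if out_file is not None:
        out_file.close()  # close the last chunk once, after the loop
    return written
```

Called as split_lines("sorgente.txt", 4), this would produce out1.txt with lines 1 to 4, out2.txt with lines 5 to 8, and so on.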
I am trying to create genetic signatures. I have a text file full of DNA sequences. I want to read in each line from the text file, then add 4-mers (runs of 4 bases) to a dictionary.
For example: Sample sequence
ATGATATATCTATCAT
What I want to add is ATGA, TGAT, GATA, etc. to a dictionary with IDs that just increment by 1 while adding the 4-mers.
So the dictionary will hold...
Genetic signatures, ID
ATGA,1
TGAT, 2
GATA,3
Here is what I have so far...
import sys
def main ():
    readingFile = open("signatures.txt", "r")
    my_DNA=""
    DNAseq = {} #creates dictionary
    for char in readingFile:
        my_DNA = my_DNA+char
    for char in my_DNA:
        index = 0
        DnaID=1
        seq = my_DNA[index:index+4]
        if (DNAseq.has_key(seq)): #checks if the key is in the dictionary
            index= index +1
        else :
            DNAseq[seq] = DnaID
            index = index+1
            DnaID= DnaID+1
    readingFile.close()
if __name__ == '__main__':
    main()
Here is my output:
ACTC
ACTC
ACTC
ACTC
ACTC
ACTC
This output suggests that it is not iterating through each character in the string... please help!
You need to move your index and DnaID declarations before the loop, otherwise they will be reset every loop iteration:
index = 0
DnaID=1
for char in my_DNA:
    #... rest of loop here
Once you make that change you will have this output:
ATGA 1
TGAT 2
GATA 3
ATAT 4
TATA 5
ATAT 6
TATC 6
ATCT 7
TCTA 8
CTAT 9
TATC 10
ATCA 10
TCAT 11
CAT 12
AT 13
T 14
In order to avoid the last 3 items which are not the correct length you can modify your loop:
for i in range(len(my_DNA)-3):
    #... rest of loop here
This doesn't loop through the last 3 characters, making the output:
ATGA 1
TGAT 2
GATA 3
ATAT 4
TATA 5
ATAT 6
TATC 6
ATCT 7
TCTA 8
CTAT 9
TATC 10
ATCA 10
TCAT 11
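Putting the two changes together (state kept outside the loop, and a range that stops three characters early), a minimal sketch could look like the following. The function name index_kmers is made up for this example, and it assumes a repeated 4-mer should keep the ID it was given the first time it was seen.

```python
def index_kmers(dna, k=4):
    """Map each distinct k-mer in dna to an ID, numbered from 1 in order of first appearance."""
    ids = {}
    # stop k - 1 characters early so that every slice is exactly k long
    for i in range(len(dna) - k + 1):
        kmer = dna[i:i + k]
        if kmer not in ids:
            ids[kmer] = len(ids) + 1  # next unused ID
    return ids
```

Using the current size of the dictionary as the counter avoids a separate DnaID variable that could be reset by accident.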
This should give you the desired effect.
from collections import defaultdict
# read the whole file and drop the line breaks so they do not end up inside a 4-mer
with open("signatures.txt", "r") as f:
    readingFile = f.read().replace("\n", "")
DNAseq = defaultdict(int)
window = 4
for i in range(len(readingFile)):
    current_4mer = readingFile[i:i+window]
    if len(current_4mer) == window:
        DNAseq[current_4mer] += 1
print(DNAseq)
index is being reset to 0 each time through the loop that starts with for char in my_DNA:.
Also, I think the loop condition should be something like while index <= len(my_DNA)-4: to be consistent with the loop body, so that the last slice is still four characters long.
Your index counters reset themselves since they are in the for loop.
May I make some further suggestions? My solution would look like this:
readingFile = open("signatures.txt", "r")
my_DNA=""
DNAseq = {} #creates dictionary
for line in readingFile:
    line = line.strip()
    my_DNA = my_DNA + line
readingFile.close()
ID = 1
index = 0
# slicing never raises IndexError, so stop explicitly at the last full 4-mer
while index <= len(my_DNA) - 4:
    seq = my_DNA[index:index+4]
    if seq not in DNAseq.values(): # skips 4-mers that are already stored
        DNAseq[ID] = seq
        ID += 1
    index += 1 # move one base at a time so the 4-mers overlap
But what do you want to do with duplicates? E.g., if a sequence like ATGC appears twice? Should both be added under a different ID, for example {...1:'ATGC', ... 200:'ATGC',...} or shall those be omitted?
If I'm understanding correctly, you are counting how often each sequential string of 4 bases occurs? Try this:
def split_to_4mers(filename):
    dna_dict = {}
    with open(filename, 'r') as f:
        # assuming the first line of the file, only, contains the dna string
        dna_string = f.readline().strip()  # strip() drops the trailing newline
    for idx in range(len(dna_string)-3):
        seq = dna_string[idx:idx+4]
        count = dna_dict.get(seq, 0)
        dna_dict[seq] = count+1
    return dna_dict
output on a file that contains only "ATGATATATCTATCAT":
{'TGAT': 1, 'ATCT': 1, 'ATGA': 1, 'TCAT': 1, 'TATA': 1, 'TATC': 2, 'CTAT': 1, 'ATCA': 1, 'ATAT': 2, 'GATA': 1, 'TCTA': 1}