I have one file, File1, which has 3 tab-separated columns.
File1:
2 4 Apple
6 7 Samsung
Say I run a loop of 10 iterations. If the iteration value falls between column 1 and column 2 of a line in File1 (inclusive), print the corresponding 3rd column from File1; otherwise print "0".
The lines may or may not be sorted, but the 2nd column is always greater than the 1st. The ranges defined by the two columns do not overlap between lines.
The output should look like this:
Result:
0
Apple
Apple
Apple
0
Samsung
Samsung
0
0
0
My program in Python is here:
chr5_1 = [[]]
for line in file:
    line = line.rstrip()
    line = line.split("\t")
    chr5_1.append([line[0],line[1],line[2]])
# Here I store all position information in chr5_1, a list of lists
chr5_1.pop(0)
for i in range (1,10):
    for listo in chr5_1:
        L1 = " ".join(str(x) for x in listo[:1])
        L2 = " ".join(str(x) for x in listo[1:2])
        L3 = " ".join(str(x) for x in listo[2:3])
        if int(L1) <= i and int(L2) >= i:
            print(L3)
            break
        else:
            print ("0")
            break
I am confused about the loop iteration and where it should break.
Try this:
chr5_1 = dict()
for line in file:
    line = line.rstrip()
    _from, _to, value = line.split("\t")
    # map every position in the inclusive range to its value
    for i in range(int(_from), int(_to) + 1):
        chr5_1[i] = value
for i in range(1, 11):
    print(chr5_1.get(i, "0"))
I think this is a job for the else clause of a for loop:
position_information = []
with open('file1', 'r') as f:
    for line in f:
        position_information.append(line.strip().split('\t'))
for i in range(1, 11):
    for start, through, value in position_information:
        if i >= int(start) and i <= int(through):
            print(value)
            # No need to continue searching for something to print on this line
            break
    else:
        # We never found anything to print on this line, so print 0 instead
        print(0)
This gives the result you're looking for:
0
Apple
Apple
Apple
0
Samsung
Samsung
0
0
0
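For reference, here is the same for/else idea as a self-contained Python 3 sketch. The function name `label_positions` and the in-memory `rows` list are assumptions made for illustration; in practice the rows would come from splitting each line of File1 on tabs.

```python
def label_positions(rows, limit=10):
    """Return one label per position 1..limit, or "0" when no range matches."""
    labels = []
    for i in range(1, limit + 1):
        for start, through, value in rows:
            if int(start) <= i <= int(through):
                labels.append(value)
                break  # found the range covering i
        else:
            # the inner loop finished without break: no range covers i
            labels.append("0")
    return labels


# Hypothetical rows, as if parsed from the two sample lines of File1
rows = [("2", "4", "Apple"), ("6", "7", "Samsung")]
print("\n".join(label_positions(rows)))
```

Returning a list instead of printing inside the loop keeps the lookup easy to check separately from the file handling.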
Setup:
import io
s = '''2 4 Apple
6 7 Samsung'''
# Python 2.x
f = io.BytesIO(s)
# Python 3.x
#f = io.StringIO(s)
If the lines of the file are not sorted by the first column, sort them first:
import csv
reader = csv.reader(f, delimiter = ' ', skipinitialspace = True)
f = list(reader)
# sort numerically; sorting the raw strings would put '10' before '2'
f.sort(key = lambda row: int(row[0]))
Read each line, work out what to print and how many times, print it, and move on to the next line:
def print_stuff(thing, n):
    while n > 0:
        print(thing)
        n -= 1
limit = 10
prev_end = 0  # last position printed so far
for line in f:
    # if iterating over a file, separate the columns
    begin, end, text = line.strip().split()
    # if iterating over the sorted list of lines
    #begin, end, text = line
    begin, end = map(int, (begin, end))
    # don't exceed the limit
    if begin > limit:
        break
    # how many zeros?
    gap = begin - prev_end - 1
    print_stuff('0', gap)
    # don't exceed the limit
    end = end if end < limit else limit
    # how many words?
    span = (end - begin) + 1
    print_stuff(text, span)
    prev_end = end
    if end == limit:
        break
# any more zeros?
gap = limit - prev_end
print_stuff('0', gap)
def list():
    list_name = []
    list_name_second = []
    with open('CoinCount.txt', 'r', encoding='utf-8') as csvfile:
        num_lines = 0
        for line in csvfile:
            num_lines = num_lines + 1
        i = 0
        while i < num_lines:
            for x in volunteers[i].name:
                if x not in list_name: # l
                    f = 0
                    while f < num_lines:
                        addition = []
                        if volunteers[f].true_count == "Y":
                            addition.append(1)
                        else:
                            addition.append(0)
                            f = f + 1
                        if f == num_lines:
                            decimal = sum(addition) / len(addition)
                            d = decimal * 100
                            percentage = float("{0:.2f}".format(d))
                            list_name_second.append({'Name': x , 'percentage': str(percentage)})
                            list_name.append(x)
            i = i + 1
            if i == num_lines:
                def sort_percentages(list_name_second):
                    return list_name_second.get('percentage')
                print(list_name_second, end='\n\n')
Above is a segment of my code. It essentially means:
If the name on the nth line hasn't been listed already, find the percentage of accurately counted coins for it, add the result to a list, and then print that list.
The issue is that when I run this, the program gets stuck in the while loop, continuously executing addition.append(1), and I'm not sure why. Using the code displayed, could you let me know how to update it so that it runs as intended? In case it helps, the first two lines of the txt file read:
Abena,5p,325.00,Y
Malcolm,1p,3356.00,N
This may not matter much, but just in case: I suspect it is stuck looping on addition.append(1) because the first line has "Y" as its true_count.
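A while loop only ends if its counter advances on every pass. If f is only incremented in one branch (the indentation is not clear from the post), a "Y" row would repeat forever. The sketch below avoids manual counters altogether by iterating over the lines directly. It assumes every row has the form name,coin,weight,Y/N as in the two sample lines; the function name `accuracy_by_name` and the third sample row are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_name(lines):
    """Map each name to the percentage of its rows whose last field is "Y"."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, _coin, _weight, true_count = fields
        totals[name] += 1
        if true_count == "Y":
            correct[name] += 1
    return {name: round(100.0 * correct[name] / totals[name], 2) for name in totals}


# First two rows are from the question; the third is a hypothetical extra row
sample = ["Abena,5p,325.00,Y", "Malcolm,1p,3356.00,N", "Abena,1p,356.00,N"]
print(accuracy_by_name(sample))
```

The same function can be called with an open file object in place of `sample`, since it only needs an iterable of lines.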
Introduction and Explanation
I want to write a function that takes two arguments (a filename and a maximum length). It opens the file, reads all lines, and returns a list of strings, where each string is a line filled with as many words as fit without exceeding the maximum length (in this case, lineMax = 50 characters).
So the aim is a result like this:
"['Alice was beginning to get very tired of sitting',
'by her sister on the bank, and of having nothing',
'to do: once or twice she had peeped into the book',
'her sister was reading, but it had no pictures or',
'conversations in it, "and what is the use of a',
'book," thought Alice, "without pictures or',
'conversations?"']"
In other words, any words can go on a line as long as the line has at most 50 characters. The rules are that words from different paragraphs cannot be grouped together, and that no word in the txt file is longer than the maximum length.
What I have tried
In thinking about this, I've formulated this pseudocode to see if this would be viable:
def consistentLineLength(*file_name):
    # Opening, reading and writing lines from file.
    file_name = open('words.txt', 'w')
    lines = file_name.readlines()
    file_name.writelines(file_name)
    lineMax = 50
    file_name = open('words.txt', 'r')
    text1 = []
    text2 = []
    text3 = []
    text4 = []
    text5 = []
    text6 = []
    text7 = [] # Empty lists/containers for values.
    for line in fileread:
        splitLine = line.split(",")
        text1.append(splitLine[0]) #
        text2.append(splitLine[1].strip()) # Result: ['SAHFS DGDGBD etc'], all compressed up to a value of 50.
        text3.append(splitLine[2].strip())
        text4.append(splitLine[3].strip())
        text5.append(splitLine[4].strip())
        text6.append(splitLine[5].strip())
        text7.append(splitLine[6].strip()) # .strip() removes backslash \n from ends.
        print(line)
    lengthlist1 = 0
    for length1 in text1:
        if length1 >= 0 and length1 < lineMax: # Needs to be a positive integer number, like this. Should be the max number of characters in a string to fill.
            lengthlist1 += 1
    print (length1)
    lengthlist2 = 0
    for length2 in text2:
        if length2 >= 0 and length2 < lineMax: # Greater than 0, but less than 50.
            lengthlist2 += 1
    print (length2)
    lengthlist3 = 0
    for length3 in text3:
        if length3 >= 0 and length3 < lineMax:
            lengthlist3 += 1
    print (length3)
    lengthlist4 = 0
    for length4 in text4:
        if length4 >= 0 and length4 < lineMax:
            lengthlist4 += 1
    print (length4)
    lengthlist5 = 0
    for length5 in text5:
        if length5 >= 0 and length5 < lineMax:
            lengthlist5 += 1
    print (length5)
    lengthlist6 = 0
    for length6 in text6:
        if length6 >= 0 and length6 < lineMax:
            lengthlist6 += 1
    print (length6)
    lengthlist7 = 0
    for length7 in text7:
        if length7 >= 0 and length7 < lineMax:
            lengthlist7 += 1
    print (length7)
    file_name.close() # Close file.
As can be seen, this is a for-loop solution with each sentence length handled separately. Is there an algorithm that would make this process more efficient and workable?
You can use wrap from the textwrap module:
from textwrap import wrap
text = "1234567890123"
texts = wrap(text, 5)
print(texts)
This prints:
['12345', '67890', '123']
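Applied to the question, wrap can be called once per paragraph so that words from different paragraphs never end up on the same line. The sketch below assumes paragraphs are separated by a blank line, which the question does not state; `wrap_paragraphs` and the shortened sample text are illustrative only.

```python
from textwrap import wrap


def wrap_paragraphs(text, line_max=50):
    """Wrap each paragraph separately so words from different paragraphs never share a line."""
    lines = []
    for paragraph in text.split("\n\n"):
        lines.extend(wrap(paragraph, width=line_max))
    return lines


# Shortened sample text, two paragraphs separated by a blank line
sample = (
    "Alice was beginning to get very tired of sitting by her sister on the "
    "bank, and of having nothing to do.\n\n"
    "So she was considering in her own mind."
)
for line in wrap_paragraphs(sample):
    print(line)
```

To use it on a file, read the whole file with `f.read()` and pass the resulting string as `text`.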
I want to find a sequence of values in a specific column of a file.
For example, below are the contents of the file "temp.txt":
line_0 1
line_1 2
line_2 3
line_3 4
line_4 1
line_5 1
line_6 2
line_7 1
line_8 2
line_9 3
line_10 4
Now, I need to find where the list of values [1,2,3] occurs in sequence in column 2 of the file above, and print the line on which each occurrence ends.
The output should look like this:
line_2 3
line_9 3
I have tried the logic below, but it is somehow not working:
inf = open("temp.txt", "rt")
count = 0
pos = 0
ListSeq = ["1","2","3"]
for line_no, line in enumerate(inf):
    arr = line.split()
    if len(arr) > 1:
        if count == 1 :
            pos = line_no
        if ListSeq[count] == arr[1] :
            count += 1
        elif count > 0 :
            inf.seek(pos)
            line_no = pos
            count = 0
        else :
            count = 0
        if count >= 3 :
            print(line)
            count = 0
Can somebody help me find the issue with the above code? A different logic that gives the correct output is also fine.
Your code is flawed. The most prominent bug: seeking in a text file by line number is never going to work, because seek expects a byte offset. Even with the right offset it would still be wrong, because you are iterating over the lines, and you shouldn't move the file pointer while doing that.
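To illustrate the difference between line numbers and byte offsets, here is a small sketch that records where each line starts and then seeks to one of those positions. The helper `line_offsets` and the throwaway demo file are assumptions for illustration; the file is opened in binary mode and read with readline so that tell() reports plain byte positions.

```python
import os
import tempfile


def line_offsets(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets


# Build a small throwaway file to demonstrate on (contents are made up)
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"line_0 1\nline_1 2\nline_2 3\n")
    demo_path = tmp.name

offsets = line_offsets(demo_path)
with open(demo_path, "rb") as f:
    f.seek(offsets[2])  # jump to the start of the third line
    third_line = f.readline().decode().strip()
os.remove(demo_path)
print(offsets, third_line)
```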
My approach:
The idea is to "transpose" your file to work with vertical vectors, find the sequence in the 2nd vertical vector, and use the index found to extract data from the first vertical vector.
Split the lines to get text and number, then zip the results to get 2 vectors: one of text and one of numbers.
At this point, one list contains ["line_0","line_1",...] and the other one contains ["1","2","3","4",...]
Find the indexes of the sequence in the number list, and print the text/number pair when found.
Code:
with open("temp.txt") as f:
    sequence = ('1','2','3')
    txt,nums = list(zip(*(l.split()[:2] for l in f))) # [:2] in case there are more columns
last = len(sequence)-1 # offset of the last element of a match
for i in range(len(nums)-len(sequence)+1):
    if nums[i:i+len(sequence)]==sequence:
        print("{} {}".format(txt[i+last],nums[i+last]))
Result:
line_2 3
line_9 3
The last for loop can be replaced by a list comprehension to generate the tuples:
result = [(txt[i+last],nums[i+last]) for i in range(len(nums)-len(sequence)+1) if nums[i:i+len(sequence)]==sequence ]
Result:
[('line_2', '3'), ('line_9', '3')]
Generalizing to any sequence and any column:
sequence = ['1','2','3']
col = 1
filename = 'temp.txt'
with open(filename, 'r') as infile:
    idx = 0
    for line in infile:
        value = line.strip().split()[col]
        if value == sequence[idx]:
            if idx == len(sequence)-1:
                print(line.rstrip())
                idx = 0
            else:
                idx += 1
        else:
            # a mismatching line may itself start a new match
            idx = 1 if value == sequence[0] else 0
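A single running index can miss matches when the sequence repeats its own beginning (for example looking for 1,1,2 in 1,1,1,2). One way around that, sketched here as an alternative rather than as the approach of either answer, is to keep a sliding window of the last few column values. The function name `find_sequence` is made up, and lines that lack the requested column are assumed to break any run.

```python
from collections import deque


def find_sequence(lines, sequence, col=1):
    """Return the lines on which a full run of `sequence` in column `col` ends."""
    window = deque(maxlen=len(sequence))
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) <= col:
            window.clear()  # a line without that column breaks any run
            continue
        window.append(fields[col])
        if list(window) == list(sequence):
            hits.append(line.rstrip("\n"))
    return hits


# The contents of temp.txt from the question, as a list of lines
sample = ["line_0 1", "line_1 2", "line_2 3", "line_3 4", "line_4 1", "line_5 1",
          "line_6 2", "line_7 1", "line_8 2", "line_9 3", "line_10 4"]
print(find_sequence(sample, ["1", "2", "3"]))
```

Because the function only iterates over `lines`, an open file object can be passed in directly.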
I wrote a program to answer the question below, but I am told that it produces no output.
Question:
Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.
From sample.user#example.com.za Sat Jan 5 09:14:16 2008
Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.
Desired Output:
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
My code:
name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
counts = dict()
for line in handle:
    if not line.startswith('From'):
        continue
        words = line.split()
        time = words[5]
        timesplit = time.split(':')
        hour = timesplit[0]
        for x in hour:
            counts[x] = counts.get(x, 0) + 1
lists = list()
for key, val in counts.items():
    lists.append( (key, val) )
lists.sort(reverse=True)
for val, key in lists:
    print key, val
I guess the mistake is that you put the following code inside the if block, after continue, so it never runs:
words = line.split()
time = words[5]
timesplit = time.split(':')
hour = timesplit[0]
for x in hour:
    counts[x] = counts.get(x, 0) + 1
You have an indentation problem. Nothing after continue inside the if block will ever be executed. I would recommend changing the if statement to if line.startswith('From '): (note the trailing space, which skips the 'From:' header lines) and removing the continue altogether.
Why are you doing for x in hour:? hour is a two-character string, so when you iterate over '08', x will be '0' and then '8'. Just count the hour itself.
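A tiny sketch of that difference, using a made-up hour value:

```python
counts = {}
hour = "08"

# Iterating over the string visits its characters one by one
print([x for x in hour])

# Counting the whole string keeps the hour intact
counts[hour] = counts.get(hour, 0) + 1
print(counts)
```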
Also, counts.items() already gives you the (key, value) tuples, so you don't need to loop over it to build a new list of tuples. Sort it directly (in ascending order, to match the desired output):
lists = sorted(counts.items())
Additionally, you should make a habit of closing the file when you are done with it.
Edit:
For completeness' sake, this is how I would approach the same problem:
from collections import Counter
def extract_hour(line):
    return line.split()[5].split(':')[0]
with open("mbox-short.txt") as handle:
    counts = Counter(extract_hour(line) for line in handle if line.startswith('From '))
lists = sorted(counts.items())
Through trial and error, and with a bit of help from the suggestions in the previous answers, I came up with a solution and my code works!
name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
counts = dict()
for line in handle:
    line = line.rstrip()
    if line == '': continue
    if line.startswith('From '):
        words = line.split()
        time = words[5]
        tsplit = time.split(":")
        counts[tsplit[0]] = counts.get(tsplit[0], 0) + 1
lists = list()
for key, val in counts.items():
    lists.append( (key, val) )
lists.sort()
for key, val in lists:
    print key, val
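For readers on Python 3, the same counting logic can be sketched as a function that works on any iterable of lines. The name `hour_counts` and the sample lines are assumptions for illustration (the second and third sample lines are made up); it relies only on the 'From ' line layout shown in the question.

```python
from collections import Counter


def hour_counts(lines):
    """Count messages per hour from mbox-style 'From ' lines, sorted by hour."""
    counts = Counter()
    for line in lines:
        if not line.startswith("From "):
            continue  # skips 'From:' headers and everything else
        words = line.split()
        if len(words) < 6:
            continue  # not enough fields to contain a time
        counts[words[5].split(":")[0]] += 1
    return sorted(counts.items())


sample = [
    "From sample.user#example.com.za Sat Jan 5 09:14:16 2008",
    "From: sample.user#example.com.za",
    "From another.user#example.com Fri Jan 4 18:10:48 2008",
]
for hour, count in hour_counts(sample):
    print(hour, count)
```

With a real file, `hour_counts(open_file)` can be called inside a `with open(...)` block so the file is closed automatically.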
I am trying to write this code so that I can get the sequences of different samples from a file, taken by position after line breaks. The output is always blank for some reason. Can you help me?
import readline
count = 0
brk = 0
with open("file.txt") as f:
    while (count < 35):
        l = f.readline()[brk + 2]
        sp = raw_input ("Starting Position:")
        sp = int(sp)
        rl = sp + 6
        print(l[sp:rl])
        print(l[-30:0])
        count = count + 1
        brk = brk + 2
print ("Done")
In the line l = f.readline()[brk + 2], the program stores a single character in the variable l. So when you print substrings of l (in the lines print(l[sp:rl]) and print(l[-30:0])), the program prints empty lines. This is the expected result.
To find this out, you could add print l right after l is assigned.
It seems that you are trying to read every second line of the file (the 2nd, 4th, 6th, etc.). To do that, you can do something like this:
count = 0
with open("file.txt") as f:
    f.readline()
    f.readline() # skip the first two lines
    while (count < 35):
        l = f.readline()
        f.readline() # skip the next line
        sp = raw_input ("Starting Position:")
        sp = int(sp)
        rl = sp + 6
        print(l[sp:rl])
        print(l[-30:]) # last 30 characters of the line
        count = count + 1
Also, print(l[-30:0]) in your code will always print an empty line. It seems that you need print(l[-30:]) (the last 30 characters of the string l), as used above.
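The two slicing points can be checked without a file or user input. The sketch below is only an illustration under assumptions: the helper names and the sample lines are made up, and `itertools.islice` is used as one possible way to pick every other line starting from the third.

```python
from itertools import islice


def every_other_line(lines, start=2):
    """Yield lines[start], lines[start + 2], ... without the trailing newline."""
    for line in islice(lines, start, None, 2):
        yield line.rstrip("\n")


def window(line, start, length=6):
    """Return `length` characters of `line` beginning at index `start`."""
    return line[start:start + length]


# Hypothetical file contents: two header lines, then data on every other line
sample = ["header\n", "comment\n", "ACGTACGTACGT\n", "skip\n", "TTGACCATGGCA\n"]
picked = list(every_other_line(sample))
print(picked)
print(window(picked[0], 2))
# A slice that stops at index 0 is always empty; omit the stop to reach the end
print(repr(picked[0][-30:0]), repr(picked[0][-5:]))
```

`islice` also accepts an open file object, so the same helper works on `f` inside a `with open(...)` block.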