Counting rows from a text file - python

I am trying to write a Python script that counts the number of rows in a text file. However, it does not recognize the for statement: it gives me errors about the indentation below it.
#*********************end of sec 1*****************************
item=open("back.txt","r")
i=0
countlist=[line.strip() for line in item] # separate lines
i=1+i
print i
item.close()

print len(list(open("some_file.txt")))
No need for with ... or closing files: this won't keep a reference to the file handle, so it should be garbage collected and destroyed just fine.

Try this (inside a function):
def count_rows():
    with open("back.txt", "r") as backfile:
        lines = backfile.readlines()
    return len(lines)

with open("back.txt") as f:
print len(f.readlines())

Try this:
for line in item:
    i = i + 1

with open('back.txt', 'r') as item:
    nbr = sum(1 for i in item)
A generator expression, so it should not keep too much unneeded data in memory.

The for is encapsulated in the brackets of the list comprehension, so the i=1+i is not within the scope of the for loop. Therefore, the i=1+i should not be indented.
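For illustration, a minimal corrected version of the snippet from the question (keeping its Python 2 print statement): the comprehension already does the looping, so len() of the resulting list is the row count and no manual counter is needed.
item = open("back.txt", "r")
countlist = [line.strip() for line in item]  # the comprehension already loops over the file
print len(countlist)  # number of rows
item.close()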

You don't actually have a for loop. You have a list comprehension. A list comprehension doesn't allow or require an indented block of statements to execute on each iteration; it just builds a list. If you only want the number of rows, you want this:
with open(filename) as f:
    rowcount = sum(1 for line in f)
This closes the file when you're done, even on Python implementations other than CPython, and it doesn't store the whole file in memory, so it works on huge files. If you need an actual list of the rows, you want this:
with open(filename) as f:
    rows = [line.strip() for line in f]
rowcount = len(rows)

Related

Printing every line of the CSV file using readlines()

I am trying to print each line of a csv file with a count of the line being printed.
with open('Polly re-records.csv', 'r', encoding='ISO-8859-1') as file:  # file1 path
    ct = 0
    while True:
        ct += 1
        if file.readline():
            print(file.readline(), ct)
        else:
            break  # break when reaching empty line
For the above code I am getting the following output:
lg1_1,"Now lets play a game. In this game, you need to find the odd one out.",,,,,,,,,,,,,,,,,,,,,,,,
479
sc_2_1,Youve also learned the strong wordsigns and know how to use them as wordsigns. ,,,,,,,,,,,,,,,,,,,,,,,,
480
So instead of ct starting from 1, the first value in my output is directly 479, which can't be possible unless the if statement was executed 478 times.
What changes should I make, or what is the logical flaw preventing the print statement from executing?
import csv
with open("data.csv", 'r') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)  # skip the header row
    rows = list(csvreader)    # csv.reader has no len() and is not indexable, so materialise the rows first
    for x in range(len(rows)):
        print(rows[x], x)
Alternatively, you can also use other approaches such as enumerate.
It would probably be easier to leverage some of Python's built-in functions, like enumerate().
with open("Polly re-records.csv", "r") as file_in:
for row_number, row in enumerate(file_in, start=1):
print(row_number, row.strip("\n"))
With respect to what you might change while keeping your code, the issue you are running into is that you call readline() twice per iteration, so half the lines are consumed by the if test and never printed.
with open('Polly re-records.csv', 'r', encoding='ISO-8859-1') as file:  # file1 path
    ct = 0
    while True:
        ct += 1
        row = file.readline().strip()
        if row:
            print(row, ct)
        else:
            break  # break when reaching an empty line
Python offers useful built-in functions for your use-case. For example enumerate, which yields a consecutive count for each item in an iterable.
with open('Polly re-records.csv', 'r', encoding='ISO-8859-1') as file:
    for line_number, line in enumerate(file):
        print(line, line_number)
As drdalle noted, it might also be a better idea to use the built-in csv module, as it will also handle CSV quoting and escaping (e.g. if you have multi-line cells containing \n, quoted fields, or other escaped values).
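For example, a rough sketch of that csv-based approach might look like this (the filename and encoding come from the question; newline='' follows the csv module's documentation so that newlines inside quoted cells are handled by the reader rather than by line splitting):
import csv

with open('Polly re-records.csv', 'r', encoding='ISO-8859-1', newline='') as file_in:
    reader = csv.reader(file_in)
    for row_number, row in enumerate(reader, start=1):
        print(row_number, row)  # row is a list of cells, even when a cell spans several lines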

For loop only returns first line of textfile

I am trying to make a program where I have to check if certain numbers are in use in a text file. The problem is that my for loop only loops through the first line, instead of every line. How can I solve this? I've already used readlines(), but that has not worked for me. This is the code, and I've got a text file with 1;, 2; and 3;, each on a separate line. Hope someone can help!
if int(keuze) == 2:
    def new_safe():
        with open('fa_kluizen.txt', 'r') as f:
            for number in f:
                return number
    print(new_safe())
My text file:
# TextFile
1;
2;
3;
You are returning too early (at first iteration).
You can read all lines in a list while cleaning the data and then return that list.
def new_safe():
    with open('fa_kluizen.txt', 'r') as f:
        data = [line.strip() for line in f]
    return data
Also, most of the time it is bad practice to define a function inside an if statement.
Maybe you can add a little bit more information about what you want to achieve.
You are returning the first line you encounter, and by doing so the program exits the current function and, of course, the loop.
One way to do it is:
def new_safe():
    with open('fa_kluizen.txt', 'r') as f:
        return f.read().splitlines()
This returns the lines as a list of strings.
Output:
['1;', '2;', '3;']
That's because of "return number". Try:
if int(keuze) == 2:
    def new_safe():
        my_list = []
        with open('fa_kluizen.txt', 'r') as f:
            for number in f:
                my_list.append(number)
        return my_list

Improving the speed of a python script

I have an input file containing a list of strings.
I am iterating through every fourth line starting on line two.
From each of these lines I make a new string from the first and last 6 characters and put this in an output file only if that new string is unique.
The code I wrote to do this works, but I am working with very large deep sequencing files, and it has been running for a day without making much progress. So I'm looking for any suggestions to make this much faster, if possible. Thanks.
def method():
    target = open(output_file, 'w')
    with open(input_file, 'r') as f:
        lineCharsList = []
        for line in f:
            # Make string from first and last 6 characters of a line
            lineChars = line[0:6]+line[145:151]
            if not (lineChars in lineCharsList):
                lineCharsList.append(lineChars)
                target.write(lineChars + '\n')  # If string is unique, write to output file
            for skip in range(3):  # Used to step through four lines at a time
                try:
                    check = line  # Check for additional lines in file
                    next(f)
                except StopIteration:
                    break
    target.close()
Try defining lineCharsList as a set instead of a list:
lineCharsList = set()
...
lineCharsList.add(lineChars)
That'll improve the performance of the in operator. Also, if memory isn't a problem at all, you might want to accumulate all the output in a list and write it all at the end, instead of performing multiple write() operations.
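Putting both suggestions together, a sketch of how the original function might look (input_file and output_file are assumed to be defined as in the question; the slicing and line-skipping logic is kept unchanged):
def method():
    seen = set()
    unique_chars = []
    with open(input_file, 'r') as f:
        for line in f:
            line_chars = line[0:6] + line[145:151]
            if line_chars not in seen:   # O(1) membership test on a set
                seen.add(line_chars)
                unique_chars.append(line_chars)
            for _ in range(3):           # skip the next three lines
                try:
                    next(f)
                except StopIteration:
                    break
    with open(output_file, 'w') as target:
        target.writelines(s + '\n' for s in unique_chars)  # single bulk write at the end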
You can use itertools.islice (https://docs.python.org/2/library/itertools.html#itertools.islice):
import itertools

def method():
    with open(input_file, 'r') as inf, open(output_file, 'w') as ouf:
        seen = set()
        for line in itertools.islice(inf, None, None, 4):
            s = line[:6]+line[-6:]
            if s not in seen:
                seen.add(s)
                ouf.write("{}\n".format(s))
Besides using a set as Oscar suggested, you can also use islice to skip lines rather than using a for loop.
As stated in this post, islice preprocesses the iterator in C, so it should be much faster than a plain vanilla Python for loop.
Try replacing
lineChars = line[0:6]+line[145:151]
with
lineChars = ''.join([line[0:6], line[145:151]])
as it can be more efficient, depending on the circumstances.
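If you want to check whether that matters for your data, a quick, rough timeit comparison along these lines can tell you (the sample line here is made up; for two short slices the difference is usually small either way):
import timeit

line = "A" * 152  # made-up fixed-width line, roughly the shape described in the question

concat = timeit.timeit(lambda: line[0:6] + line[145:151], number=1000000)
joined = timeit.timeit(lambda: ''.join([line[0:6], line[145:151]]), number=1000000)
print("+ concatenation:", concat)
print("''.join(...):   ", joined)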

python reading file infinite loop

pronunciation_file = open('dictionary.txt')
pronunciation = {}
line = pronunciation_file.readline()
while line != '':
    n_line = line.strip().split(' ', 1)
    pronunciation[n_line[0]] = n_line[1].strip()
    line = pronunciation_file.readline()
print(pronunciation)
The code is meant to turn a file of words and their pronunciations into a dictionary (keys are words, values are pronunciations), for example 'A AH0\n...' into {'A': 'AH0', ...}.
The problem is: if I put the print inside the loop, it prints normally (but it prints all the unfinished dictionaries). However, if I put the print outside the loop, as above, the shell returns nothing, and when I close it, it says the program is still running (so there is probably an infinite loop).
Help please.
I also tried cutting out the first few hundred words and running the program; it works for very short files, but it starts returning nothing beyond a certain length.
That is not how to read from a file:
pronunciation = {}
# with will also close your file
with open(your_file) as f:
    # iterate over the file object
    for line in f:
        # unpack key/value for your dict and use rstrip
        k, v = line.rstrip().split(' ', 1)
        pronunciation[k] = v
You simply open the file and iterate over the file object. Use .rstrip() if you only want to strip from the end of the string; there is also no need to call strip twice on the same line.
You can also simplify your code to just a dict() call with a generator expression:
with open("dictionary.txt") as f:
    pronunciation = dict(line.rstrip().split(" ", 1) for line in f)
Not tested, but if you want to use a while loop, the idiom is more like this:
pronunciation = {}
with open(fn) as f:
    while True:
        line = f.readline()
        if not line:
            break
        l, r = line.split(' ', 1)
        pronunciation[l] = r.strip()
But the more modern Python idiom for reading a file line-by-line is to use a for loop as Padraic Cunningham's answer uses. A while loop is more commonly used to read a binary file fixed chunk by fixed chunk in Python.
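For comparison, a minimal sketch of that chunked-read idiom (the filename, the 4096-byte chunk size, and the handle_chunk function are all placeholders, not part of the question):
def handle_chunk(chunk):
    pass  # placeholder: do something with the bytes

with open('some_large_file.bin', 'rb') as f:  # hypothetical binary file
    while True:
        chunk = f.read(4096)   # read a fixed-size block of bytes
        if not chunk:          # an empty bytes object means end of file
            break
        handle_chunk(chunk)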

Read a multielement list, look for an element and print it out in python

I am writing a Python script in order to write a .tex file, but I have to use some information from another file. That file has menu names on each line, which I need to use. I use split to get a list for each line of my "menu".
For example, I have to write a section with the second element of each list, but after running I get nothing. What could I do?
This is roughly what I am doing:
texfile = open(outputtex.tex', 'w')
infile = open(txtfile.txt, 'r')
for line in infile.readlines():
linesplit = line.split('^')
for i in range(1,len(infile.readlines())):
texfile.write('\section{}\n'.format(linesplit[1]))
texfile.write('\\begin{figure*}[h!]\n')
texfile.write('\centering\n')
texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' %i)
texfile.write('\end{figure*}\n')
texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
By the way, in the includegraphics line, I have to increase the number after pg_ from "0001" to "25050". Any clues?
I really appreciate your help.
I don't quite follow your question. But I see several errors in your code. Most importantly:
for line in infile.readlines():
    ...
    ...
    for i in range(1, len(infile.readlines())):
Once you read a file, it's gone. (You can get it back, but in this case there's no point.) That means that the second call to readlines is yielding nothing, so len(infile.readlines()) == 0. Assuming what you've written here really is what you want to do (i.e. write file_len * (file_len - 1) + 1 lines?) then perhaps you should save the file to a list. Also, you didn't put quotes around your filenames, and your indentation is strange. Try this:
with open('txtfile.txt', 'r') as infile:  # (with automatically closes infile)
    in_lines = infile.readlines()
    in_len = len(in_lines)
texfile = open('outputtex.tex', 'w')
for line in in_lines:
    linesplit = line.split('^')
    for i in range(1, in_len):
        texfile.write('\section{{{0}}}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\centering\n')
        texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % i)
        texfile.write('\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
Perhaps you don't actually want nested loops?
infile = open('txtfile.txt', 'r')
texfile = open('outputtex.tex', 'w')
for line_number, line in enumerate(infile):
    linesplit = line.split('^')
    texfile.write('\section{{{0}}}\n'.format(linesplit[1]))
    texfile.write('\\begin{figure*}[h!]\n')
    texfile.write('\centering\n')
    texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % line_number)
    texfile.write('\end{figure*}\n')
    texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
infile.close()
