Hi, I made this little exercise for myself: I want to pull out the last number in each line of this text file, which has 5 lines and 6 numbers per line, separated by spaces. I made a loop to get all the remaining characters of the selected line, starting from the 5th space. It works for every line (print(findtext(0)) to print(findtext(3))), except the last line if the last number has fewer than 3 characters... What is wrong? I can't figure it out.
text = open("text","r")
lines = text.readlines()
def findtext(c):
    count = 0
    count2 = 0
    while count < len(lines[c]) and count2<5:
        if lines[c][count] == " ":
            count2=count2+1
        count=count+1
    return float(lines[c][count:len(lines[c])-1])
print(findtext(0))
Your proposed solution doesn't seem very Pythonic to me.
with open('your_file') as lines:
    for line in lines:
        # Exhaust the iterator
        pass
    # Split by whitespace and get the last element
    *_, last = line.split()
    print(last)
Several things:
Access files within context managers, as this guarantees the file is closed and its resources are released correctly
Don't keep track of indexes if you don't need to; it makes the code harder to read
Use split instead of counting the literal space character
with open('file') as f:
    numbers = f.readlines()
last_nums = [line.split()[-1] for line in numbers]
line.split() splits the string into a list of elements, using any run of whitespace as the separator (if you pass no arguments),
and [-1] gets the last element of that list for you.
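As for why the original loop fails on the last line, the most likely cause is this: readlines() keeps the trailing "\n" on every line, and the slice ending at len(lines[c])-1 is there to cut it off. If the file does not end with a newline, the last line has no "\n", so the slice cuts off the last digit instead, and a one-digit number becomes an empty string that float() rejects. A minimal sketch that avoids the problem, assuming the numbers are whitespace-separated (the name last_numbers and the sample data are made up for illustration):

```python
def last_numbers(lines):
    """Return the last whitespace-separated number of each non-blank line."""
    result = []
    for line in lines:
        fields = line.split()  # split() also discards the trailing newline
        if fields:             # skip blank lines
            result.append(float(fields[-1]))
    return result

# The last line deliberately has no trailing newline.
sample = ["1 2 3 4 5 6\n", "7 8 9 10 11 12.5\n", "1 2 3 4 5 7"]
print(last_numbers(sample))  # [6.0, 12.5, 7.0]
```

Because split() ignores the newline whether or not it is present, no manual slicing is needed.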
I'm pretty new to this and I was trying to write a program which counts the words in txt files. There is probably a better way of doing this, but this was the idea I came up with, so I wanted to go through with it. I just don't understand why i, or any variable, doesn't work as an index for the string of the page that I'm counting on...
Do you guys have a solution, or should I just take a different approach?
page = open("venv\harrry_potter.txt", "r")
alphabet = "qwertzuiopüasdfghjklöäyxcvbnmßQWERTZUIOPÜASDFGHJKLÖÄYXCVBNM"
# Counting the characters
list_of_lines = page.readlines()
characternum = 0
textstr = ""  # to convert the .txt file to string
for line in list_of_lines:
    for character in line:
        characternum += 1
        textstr += character
# Counting the words
i = 0
wordnum = 1
while i <= characternum:
    if textstr[i] not in alphabet and textstr[i+1] in alphabet:
        wordnum += 1
    i += 1
print(wordnum)
page.close()
Counting the characters and converting the .txt file to a string is done in a slightly weird way, because I thought the other way could be the source of the problem...
Can you help me, please?
Typically you want to use split for simplistic word counting. The way you are doing it, you will count right-minded as two words, and don't as two words. If you can rely on whitespace alone, you can just use split like this:
book = "Hello, my name is Inigo Montoya, you killed my father, prepare to die."
words = book.split()
print(f'word count = {len(words)}')
You can also pass arguments to split (for example a separator) if the default behavior doesn't suit you.
https://pythonexamples.org/python-count-number-of-words-in-text-file/
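On the indexing question itself: i works fine as an index, but the loop condition lets it run too far. Valid indexes go from 0 to characternum - 1, and the body also reads textstr[i+1], so the last safe value of i is characternum - 2. A sketch of the same idea with the bound fixed, assuming str.isalpha() is an acceptable stand-in for the hand-written alphabet string (count_words is a made-up name):

```python
def count_words(text):
    """Count words as transitions from a non-letter to a letter."""
    if not text:
        return 0
    # A word starts at index 0 if the text begins with a letter.
    words = 1 if text[0].isalpha() else 0
    # Stop one short of the end so text[i + 1] is always valid.
    for i in range(len(text) - 1):
        if not text[i].isalpha() and text[i + 1].isalpha():
            words += 1
    return words

print(count_words("Hello, my name is Inigo Montoya."))  # 6
```

This still counts don't as two words, as noted above; it only removes the out-of-range access.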
You want to get the word count of a text file
The shortest code is this (that I could come up with):
with open('lorem.txt', 'r') as file:
    print(len(file.read().split()))
For smaller files this is fine, but it loads all of the data into memory, so it is not great for large files. First of all, use a context manager (with); it helps with error handling and other things. What happens here: file.read() reads the whole file and returns one string, .split() splits that string on whitespace and returns a list of the words, and len() gives the length of that list, which is printed.
A better approach would be this:
word_count = 0
with open('lorem.txt', 'r') as file:
    for line in file:
        word_count += len(line.split())
print(word_count)
This is better because the whole file is never held in memory: you read each line separately, and each new line replaces the previous one. Again, for each line you split it on whitespace and measure the length of the returned list, then add that to the total word count. At the end, simply print the total word count.
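The same line-by-line count can also be written with sum() and a generator expression. A sketch, using io.StringIO as a stand-in for the open file so it runs without lorem.txt (count_words_in is a made-up name):

```python
import io

def count_words_in(file):
    """Sum the per-line word counts without loading the whole file."""
    return sum(len(line.split()) for line in file)

# StringIO iterates line by line, just like a file opened with open().
fake_file = io.StringIO("lorem ipsum dolor\nsit amet\n\nconsectetur\n")
print(count_words_in(fake_file))  # 6
```

With a real file you would pass the object from with open('lorem.txt') as file: instead.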
Useful sources:
about with
Context Managers - Efficiently Managing Resources (to learn how they work a bit in detail) by Corey Schafer
.split() "docs"
I am stuck on a bit of code and I can't get it to work.
from random import randint
def random_song():
    global song
    linenum = randint(1,43)
    open('data.txt')
    band_song = readlines."data.txt"(1)
    global band
    band = band_song.readlines(linenum)
    song = band_song.split(" ,")
What I'm trying to do is generate a random number between the 1st and last line of a text file and then read that specific line, then split the line into 2 strings. E.g. line 26, "Iron Maiden,Phantom of the Opera", split into "Iron Maiden" and "Phantom of the Opera".
Also, how do I reduce the second string to the first letter of each word, so that it works for any number of words and any word length?
Thank you,
MiniBitComputers
There's a space in your split string that you don't need; just split on ',' and use .strip() to get rid of whitespace on the outside of the results.
There's some odd code around the reading of the file as well. And you're splitting the list of read lines, not just the line you want to read.
There's also no need for using globals, it's a bad practice and best avoided in almost all cases.
All that fixed:
from random import randint
def random_song():
    with open('data.txt') as f:
        lines = f.readlines()
    # randint includes both ends and list indexes start at 0
    artist, song = lines[randint(0, len(lines) - 1)].split(',')
    return artist.strip(), song.strip()
print(random_song())
Note that using with ensures the file is closed once the with block ends.
As for getting the first letter of each word:
s = 'This is a bunch of words of varying length.'
first_letters = [word[0] for word in s.split()]
print(first_letters)
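Putting the two parts together for a line such as "Iron Maiden,Phantom of the Opera". This is only a sketch: parse_line and initials are made-up names, and it assumes the artist name itself contains no comma:

```python
def parse_line(line):
    """Split 'artist,song' at the first comma and trim whitespace."""
    artist, song = line.split(',', 1)
    return artist.strip(), song.strip()

def initials(title):
    """Return the first letter of every word in the title."""
    return [word[0] for word in title.split()]

artist, song = parse_line("Iron Maiden,Phantom of the Opera\n")
print(artist, initials(song))  # Iron Maiden ['P', 'o', 't', 'O']
```

Splitting with a limit of 1 keeps any further commas inside the song title.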
What I'm trying to do is match a phrase in a text file, then print that line (this works fine). I then need to move the cursor up 4 lines so I can do another match in that line, but I can't get the seek() method to move up 4 lines from the line that has been matched so that I can do another regex search. All I can seem to do with seek() is search from the very end of the file or the beginning. It doesn't seem to let me just do seek(105,1) from the line that is matched.
### This is the example test.txt
This is 1st line
This is 2nd line # Needs to seek() to this line from the 6th line. This needs to be dynamic as it wont always be 4 lines.
This is 3rd line
This is 4th line
This is 5th line
This is 6th line # Matches this line, now need to move it up 4 lines to the "2nd line"
This is 7 line
This is 8 line
This is 9 line
This is 10 line
#
def Findmatch():
    file = open("test.txt", "r")
    print file.tell() # shows 0 which is the beginning of the file
    string = file.readlines()
    for line in string:
        if "This is 6th line" in line:
            print line
            print file.tell() # shows 171 which is the end of the file. I need for it to be on the line that matches my search which should be around 108. seek() only lets me search from end or beginning of file, but not from the line that was matched.
Findmatch()
Since you've read the whole file into memory at once with file.readlines(), the tell() method does indeed correctly point to the end, and you already have all your lines in a list. If you still wanted to use seek(), you'd have to read the file line by line and record the position within the file of each line start, so that you could go back four lines.
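A rough sketch of that line-by-line variant, written for Python 3. It opens the file in binary mode so that tell() returns plain byte offsets, and it uses readline() rather than a for loop because tell() cannot be called while iterating over a text file. The function name seek_back_from_match and the back parameter are assumptions for illustration:

```python
import os
import tempfile

def seek_back_from_match(path, needle, back=4):
    """Find the first line containing needle, seek `back` lines up, return that line."""
    offsets = []  # byte offset of the start of every line read so far
    with open(path, "rb") as in_file:
        while True:
            pos = in_file.tell()
            line = in_file.readline()
            if not line:
                return None  # reached end of file without a match
            offsets.append(pos)
            if needle.encode() in line:
                # Clamp at the first line if fewer than `back` lines precede the match.
                target = offsets[max(len(offsets) - 1 - back, 0)]
                in_file.seek(target)
                return in_file.readline().decode().rstrip("\r\n")

# Demo on a throwaway file with ten numbered lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("This is line %d\n" % n for n in range(1, 11)))
result = seek_back_from_match(tmp.name, "line 6")
os.remove(tmp.name)
print(result)  # This is line 2
```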
For the problem you describe, you can first find the index of the first matching line and then do the second operation starting from the list slice four items before that.
Here is a very rough example of that (return None isn't really needed; it's there for the sake of verbosity, clearly stating the intent/expected behavior. Raising an exception might be just as desirable, depending on the overall plan):
def relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            break # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0 # Just return all lines up to now? Or was that broken input and fail?
        return lines[idx:]
    else:
        return None
with open("test.txt") as in_file:
    lines = in_file.readlines()
print(''.join(relevant("This is 6th line", lines)))
Please also note: it's a bit confusing to name a list of lines string (one would probably expect a str there); go with lines or something similar. It's also not advisable (especially since you indicate you are using 2.7) to give your variables names already used for built-ins, like file. Use in_file, for instance.
EDIT: As requested in a comment, here is a printing-only example. I'm adding it alongside the former, as that one seems potentially more useful for further extension. :) ...
def print_relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            print(line.rstrip('\n'))
            break # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0 # Just return all lines up to now? Or was that broken input and fail?
        print(lines[idx].rstrip('\n'))
with open("test.txt") as in_file:
    lines = in_file.readlines()
print_relevant("This is 6th line", lines)
Note, since lines are read in with trailing newlines and print would add one of its own I've rstrip'ed the line before printing. Just be aware of it.
I have the following text file:
abstract 233:1 253:1 329:2 1087:2 1272:1
game 64:1 99:1 206:1 595:1
direct 50:1 69:1 1100:1 1765:1 2147:1 3160:1
Each key pair records how many times each string appears in a document: [docID]:[stringFq]
How could you calculate the number of key pairs in this text file?
Your regex approach works fine. Here is an iterative approach. If you uncomment the print statements, you will see some intermediate results.
Given
%%file foo.txt
abstract 233:1 253:1 329:2 1087:2 1272:1
game 64:1 99:1 206:1 595:1
direct 50:1 69:1 1100:1 1765:1 2147:1 3160:1
Code
import itertools as it
with open("foo.txt") as f:
    lines = f.readlines()
    #print(lines)
pred = lambda x: x.isalpha()
count = 0
for line in lines:
    line = line.strip("\n")
    line = "".join(it.dropwhile(pred, line))
    pairs = line.strip().split(" ")
    #print(pairs)
    count += len(pairs)
count
# 15
Details
First we use a with statement, which is an idiom for safely opening and closing files. We then split the file into lines via readlines(). We define a conditional function (or predicate) that we will use later. The lambda expression is used for convenience and is equivalent to the following function:
def pred(x):
    return x.isalpha()
We initialize a count variable and start iterating each line. Every line may have a trailing newline character \n, so we first strip() them away before feeding the line to dropwhile.
dropwhile is a special itertools iterator. As it iterates a line, it will discard any leading characters that satisfy the predicate until it reaches the first character that fails the predicate. In other words, all letters at the start will be dropped until the first non-letter is found (which happens to be a space). We clean the new line again, stripping the leading space, and the remaining string is split() into a list of pairs.
Finally the length of each line of pairs is incrementally added to count. The final count is the sum of all lengths of pairs.
Summary
The code above shows how to tackle basic file handling with simple, iterative steps:
open the file
split the file into lines
while iterating each line, clean and process data
output a result
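For comparison, the same count can be reached with split() alone, since every pair contains a colon and the leading word does not. A sketch under that assumption (count_pairs is a made-up name), fed with the sample lines directly so it runs without foo.txt:

```python
def count_pairs(lines):
    """Count the docID:stringFq tokens across all lines."""
    return sum(":" in token for line in lines for token in line.split())

sample = [
    "abstract 233:1 253:1 329:2 1087:2 1272:1\n",
    "game 64:1 99:1 206:1 595:1\n",
    "direct 50:1 69:1 1100:1 1765:1 2147:1 3160:1\n",
]
print(count_pairs(sample))  # 15
```

To run it on the file, pass the open file object in place of sample.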
import re
with open('input.txt', 'r') as file:
    text = file.read()
numbers = re.findall(r"[-+]?\d*\.\d+|\d+", text)
# finds all numbers in the text file
numLen = len(numbers) // 2
# each pair holds two numbers (docID and stringFq), so the pair count is half the number count
print(numLen)
I'm trying to copy one file to another with each line's characters sorted in ASCII order, but it has some bugs; for example, on the first line it adds a \n for no reason. I'm trying to understand it but I don't get it. Also, if you think this approach is not a good one, please advise me on how to do it better. Thanks.
demo.txt (An ascii file)
!=orIh^
-_hIdH2 !=orIh^
-_hIdH2
code .py
count = 0
try:
    fcopy = open("demo.txt", 'r')
    fdestination = open("demo2.txt", 'w')
    for line in fcopy.readlines():
        count = len(line) -1
        list1 = ''.join(sorted(line))
        str1 = ''.join(str(e) for e in list1)
        fdestination.write(str(count)+str1)
    fcopy.close()
    fdestination.close()
except Exception, e:
    print(str(e))
Note count is the count of letters that are on a line
Output
7
!=I^hor15
!-2=HII^_dhhor6-2HI_dh
The problem is that each output line should be the number of letters followed by the letters sorted in ASCII order.
Each line read from your file has a newline character at the end. When you sort all characters, the newline character is sorted, too, and moved to the appropriate position (which is in general not at the end of the string anymore). This causes line breaks to happen at almost random places.
What you need is to remove the line break before sorting and add it back after sorting. Also, the second join in your loop is not doing anything, and list1 is not a list but a string.
str1 = ''.join(sorted(line.strip('\n')))
fdestination.write(str(count)+str1+'\n')
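Put together as a complete loop, the fix could look like the sketch below. It is written for Python 3 and takes the lines as an argument rather than opening demo.txt, so the function name sorted_lines is an assumption. Note that it counts the characters after removing the newline, so the count stays correct for a last line that has none:

```python
def sorted_lines(lines):
    """Prefix each line's ASCII-sorted characters with their count."""
    result = []
    for line in lines:
        text = line.rstrip('\n')  # drop the newline before sorting
        result.append(str(len(text)) + ''.join(sorted(text)) + '\n')
    return result

# The second line deliberately has no trailing newline.
for out in sorted_lines(["!=orIh^\n", "-_hIdH2"]):
    print(out, end='')
```

To write the result to demo2.txt, pass the list to fdestination.writelines().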