How can I read a line from a file and split it - python

I am stuck on a bit of code and I can't get it to work.
from random import randint
def random_song():
    global song
    linenum = randint(1,43)
    open('data.txt')
    band_song = readlines."data.txt"(1)
    global band
    band = band_song.readlines(linenum)
    song = band_song.split(" ,")
What I'm trying to do is generate a random number between the first and last line of a text file and then read that specific line. Then split the line into two strings. E.g. line 26, "Iron Maiden,Phantom of the Opera", split into "Iron Maiden" and "Phantom of the Opera".
Also, how do I reduce the second string to the first letter of each word, and get that to work for any number of words and any number of letters per word?
Thank you,
MiniBitComputers

There's a space in your split string that you don't need; just split on ',' and use .strip() to get rid of whitespace on the outside of the result.
There's some odd code around the reading of the file as well. And you're splitting the list of read lines, not just the line you want to read.
There's also no need for globals; they are bad practice and best avoided in almost all cases.
All that fixed:
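from random import choice

def random_song():
    with open('data.txt') as f:
        lines = f.readlines()
    # choice() can pick any line, including the first; indexing with
    # randint(1, 43) would skip line 1 and fail on a 43-line file
    artist, song = choice(lines).split(',', 1)
    return artist.strip(), song.strip()

print(random_song())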
Note that using with ensures the file is closed once the with block ends.
As for getting the first letter of each word:
s = 'This is a bunch of words of varying length.'
first_letters = [word[0] for word in s.split()]  # split() with no argument also copes with repeated spaces
print(first_letters)
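Applied to the song title from the question, this could look like the sketch below. The helper name initials is made up for illustration; it is not part of the original answer.

```python
def initials(title):
    """Return the first letter of every whitespace-separated word."""
    # split() with no argument ignores repeated spaces, so word is never empty
    return [word[0] for word in title.split()]

print(initials("Phantom of the Opera"))
print("".join(initials("Run  to the   Hills")))
```

This works for any number of words and any word length, because only the first character of each word is ever touched.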

Related

Python word counting program for .txt files keeps on showing string index out of range as an error code

I'm pretty new to this and I was trying to write a program which counts the words in txt files. There is probably a better way of doing this, but this was the idea I came up with, so I wanted to go through with it. I just don't understand why i, or any variable, doesn't work as an index for the string of the page that I'm counting on...
Do you guys have a solution or should I just take a different approach?
page = open("venv\harrry_potter.txt", "r")
alphabet = "qwertzuiopüasdfghjklöäyxcvbnmßQWERTZUIOPÜASDFGHJKLÖÄYXCVBNM"
# Counting the characters
list_of_lines = page.readlines()
characternum = 0
textstr = "" # to convert the .txt file to string
for line in list_of_lines:
    for character in line:
        characternum += 1
        textstr += character
# Counting the words
i = 0
wordnum = 1
while i <= characternum:
    if textstr[i] not in alphabet and textstr[i+1] in alphabet:
        wordnum += 1
    i += 1
print(wordnum)
page.close()
Counting the characters and converting the .txt file to a string is done a bit weirdly, because I thought the other way could be the source of the problem...
Can you help me please?
Typically you want to use split for simplistic word counting. The way you are doing it, you will count right-minded as two words, and don't as two words. If you can rely on whitespace alone, you can just use split like this:
book = "Hello, my name is Inigo Montoya, you killed my father, prepare to die."
words = book.split()
print(f'word count = {len(words)}')
You can also pass parameters to split (a separator and a maximum number of splits) if the default doesn't suit you.
https://pythonexamples.org/python-count-number-of-words-in-text-file/
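As for the error itself: a string of length n only has indices 0 to n-1, so a loop that runs while i <= characternum and also looks at textstr[i+1] reads past the end. If you want to keep the original approach, a minimal sketch with a corrected bound could look like this. The function name and the use of string.ascii_letters are assumptions for the example; the question uses its own alphabet string.

```python
import string

def count_word_starts(text, alphabet=string.ascii_letters):
    """Count words as transitions from a non-letter to a letter."""
    if not text:
        return 0
    # The first character starts a word only if it is a letter
    count = 1 if text[0] in alphabet else 0
    i = 0
    # Stop one short of the end so that text[i + 1] is always a valid index
    while i < len(text) - 1:
        if text[i] not in alphabet and text[i + 1] in alphabet:
            count += 1
        i += 1
    return count

print(count_word_starts("Hello, my name is Inigo Montoya"))
```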
You want to get the word count of a text file
The shortest code is this (that I could come up with):
with open('lorem.txt', 'r') as file:
    print(len(file.read().split()))
For smaller files this is fine, but it loads all of the data into memory, so it is not great for large files. First off, use a context manager (with); it makes sure the file is closed even if an error occurs. Here file.read() reads the whole file and returns a string, .split() splits that string on whitespace and returns a list of the words, and len() gives the length of that list.
A better approach would be this:
word_count = 0
with open('lorem.txt', 'r') as file:
    for line in file:
        word_count += len(line.split())
print(word_count)
Here the whole file is not held in memory: you read each line separately, and each new line replaces the previous one. For each line you split on whitespace, measure the length of the returned list and add it to the total word count. At the end, simply print out the total.
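The same line-by-line idea can be written more compactly with sum() and a generator expression. This is only a sketch; the function name count_words_in_lines is made up, and io.StringIO stands in for a real file so the example runs on its own.

```python
import io

def count_words_in_lines(lines):
    """Sum the word counts of each line; lines can be an open file object."""
    # Only one line is held in memory at a time
    return sum(len(line.split()) for line in lines)

fake_file = io.StringIO("lorem ipsum dolor\nsit amet\n\nconsectetur\n")
print(count_words_in_lines(fake_file))
```

With a real file you would pass the object from with open('lorem.txt') as file: straight into the function.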
Useful sources:
about with
Context Managers - Efficiently Managing Resources (to learn how they work a bit in detail) by Corey Schafer
.split() "docs"

making a list from a specific parts of text file

Hi, I made this little exercise for myself: I want to pull out the last number in each line of this text file, which has 5 lines and 6 numbers per line separated by spaces. I wrote a loop to get all the remaining characters of the selected line, starting from the 5th space. It works for every line (print(findtext(0)) through print(findtext(3))), except the last line if the last number has fewer than 3 characters. What is wrong? I can't figure it out.
text = open("text","r")
lines = text.readlines()
def findtext(c):
    count = 0
    count2 = 0
    while count < len(lines[c]) and count2<5:
        if lines[c][count] == " ":
            count2=count2+1
        count=count+1
    return float(lines[c][count:len(lines[c])-1])
print(findtext(0))
Your proposed solution doesn't seem very Pythonic to me.
with open('you_file') as lines:
    for line in lines:
        # Exhaust the iterator
        pass
    # Split by whitespace and get the last element
    *_, last = line.split()
    print(last)
Several things:
Access files within context managers, as this guarantees resources are destroyed correctly
Don't keep track of indexes if you don't need to, it makes the code harder to read
Use split instead of counting the literal whitespace character
with open('file') as f:
    numbers = f.readlines()
last_nums = [line.split()[-1] for line in numbers]
line.split() splits the string into a list of elements, using whitespace as the separator when called with no arguments,
[-1] gets the last element of that list for you
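A likely cause of the original problem, assuming the last line of the file has no trailing newline: the slice up to len(lines[c])-1 is meant to drop the "\n", but on a line without one it cuts off a real digit instead. Splitting avoids this because split() discards the newline along with other whitespace. A small sketch (the function name last_numbers is made up):

```python
def last_numbers(lines):
    """Return the last number of every non-empty line as a float."""
    return [float(line.split()[-1]) for line in lines if line.strip()]

# The final line has no "\n", as is common for the last line of a file
sample = ["1 2 3 4 5 60\n", "7 8 9 10 11 12"]
print(last_numbers(sample))
# Slicing off the last character loses a digit on the final line
print(sample[1][:-1])
```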

Finding number of words in a file in python

I'm new to Python and attempting to do an exercise where I open a txt file and then read the contents of it (probably straightforward for most, but I will admit I am struggling a bit).
I opened my file and used .read() to read the file. I then proceeded to strip the text of any punctuation.
Next I created a for loop. In this loop I began by using .split() and adding to an expression:
words = words + len(characters)
words being previously defined as 0 outside the loop and characters being what was split at the beginning of the loop.
Very long story short, the problem that I'm having now is that instead of adding the entire word to my counter, each individual character is being added. Anything I can do to fix that in my for loop?
my_document = open("book.txt")
readTheDocument = my_document.read
comma = readTheDocument.replace(",", "")
period = comma.replace(".", "")
stripDocument = period.strip()
numberOfWords = 0
for line in my_document:
    splitDocument = line.split()
    numberOfWords = numberOfWords + len(splitDocument)
print(numberOfWords)
A more Pythonic way is to use with:
with open("book.txt") as infile:
    count = len(infile.read().split())
You've got to understand that by using .split() you are not really getting real grammatical words. You are getting word-like fragments. If you want proper words, use the nltk module:
import nltk
with open("book.txt") as infile:
    count = len(nltk.word_tokenize(infile.read()))
Just open the file and split to get the count of words.
# Read-only mode is enough here, and with closes the file afterwards
with open("path/to/file/name.txt", "r") as file:
    count = 0
    for word in file.read().split():
        count = count + 1
print(count)
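Two details in the question's code are worth noting: my_document.read without parentheses is the method itself rather than the file's text, and the punctuation-stripped string is never the thing that gets split. A minimal sketch of stripping punctuation and then counting, using str.translate and string.punctuation (the function name is an assumption for the example):

```python
import string

def count_words_without_punctuation(text):
    """Remove ASCII punctuation, then count whitespace-separated words."""
    # maketrans with a third argument builds a table that deletes those characters
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return len(cleaned.split())

print(count_words_without_punctuation("Hello, world. It's me."))
```

With a file, you would pass in the result of infile.read() (with the parentheses).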

How to input a line word by word in Python?

I have multiple files, each with a line with, say ~10M numbers each. I want to check each file and print a 0 for each file that has numbers repeated and 1 for each that doesn't.
I am using a list for counting frequency. Because of the large amount of numbers per line I want to update the frequency after accepting each number and break as soon as I find a repeated number. While this is simple in C, I have no idea how to do this in Python.
How do I input a line in a word-by-word manner without storing (or taking as input) the whole line?
EDIT: I also need a way for doing this from live input rather than a file.
Read the line, split the line, copy the array result into a set. If the size of the set is less than the size of the array, the file contains repeated elements.
with open('filename', 'r') as f:
    for line in f:
        numbers = line.split()
        # Fewer unique values than values means something is repeated
        print(0 if len(set(numbers)) < len(numbers) else 1)
To read the file word by word, try this
from functools import partial

def readWords(file_object):
    word = ""
    # iter() with a sentinel keeps calling read(1) until it returns "" at EOF
    for ch in iter(partial(file_object.read, 1), ""):
        if ch.isspace():
            if word: # In case of multiple spaces
                yield word
                word = ""
            continue
        word += ch
    if word:
        yield word # Handles last word before EOF
Then you can do:
with open('filename', 'r') as f:
    seen = set()
    for num in map(int, readWords(f)):
        # Store the numbers in a set, and use the set to check if the number already exists
        if num in seen:
            break
        seen.add(num)
This method should also work for streams because it only reads one character at a time and yields one whitespace-delimited word at a time from the input stream.
After giving this answer, I've updated this method quite a bit. Have a look
https://gist.github.com/smac89/bddb27d975c59a5f053256c893630cdc
I don't think Python has a built-in way to read input word by word. Something like this can be done instead, though it reads the whole file first:
with open('words.txt') as f:
    for word in f.read().split():
        print(word)
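For the question's requirement of stopping early without holding the whole line, one option is to read fixed-size chunks and carry over a word that is cut at a chunk boundary. This is a sketch under assumptions: the names iter_words and all_unique and the chunk size are made up here, and io.StringIO stands in for a file or for sys.stdin.

```python
import io

def iter_words(stream, chunk_size=4096):
    """Yield whitespace-separated words without loading the whole stream."""
    leftover = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = (leftover + chunk).split()
        # If the chunk ends mid-word, hold the last piece back for the next round
        if not chunk[-1].isspace() and parts:
            leftover = parts.pop()
        else:
            leftover = ""
        yield from parts
    if leftover:
        yield leftover

def all_unique(words):
    """Return False as soon as a repeated word is seen."""
    seen = set()
    for w in words:
        if w in seen:
            return False
        seen.add(w)
    return True

# Prints 1 when there are no repeats and 0 otherwise, as in the question
print(1 if all_unique(iter_words(io.StringIO("4 8 15 16 23 42"))) else 0)
print(1 if all_unique(iter_words(io.StringIO("4 8 15 8 23 42"))) else 0)
```

Because iter_words is a generator, all_unique stops pulling from the stream at the first repeat, so the rest of the line is never read.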

python file manipulation

I have a file with entries such as:
26 1
33 2
.
.
.
and another file with sentences in english
I have to write a script to print the 1st word in sentence number 26
and the 2nd word in sentence 33.
How do I do it?
The following code should do the task, assuming the files are not too large. You may have to modify it to deal with edge cases (like double spaces, etc.).
# Get (sentence number, word number) pairs from the first file
with open('1.txt') as file:
    pairs = [tuple(map(int, line.split())) for line in file if line.strip()]
# Get text from the second file
with open('2.txt') as file:
    text = file.read()
# Parse text into a list of sentences, each a list of words
data = []
for sentence in text.split('.'): # Assumes '.' ends every sentence
    words = sentence.split() # Split it into words list
    if len(words) > 0:
        data.append(words)
# Get desired result (the numbers in the file are 1-based)
for sentence_num, word_num in pairs:
    print(data[sentence_num - 1][word_num - 1])
Here's a general sketch:
Read the first file into a list (a numeric entry in each element)
Read the second file into a list (a sentence in each element)
Iterate over the entry list, for each number find the sentence and print its relevant word
Now, if you show some effort of how you tried to implement this in Python, you will probably get more help.
The big issue is that you have to decide what separates "sentences". For example, is a '.' the end of a sentence? Or maybe part of an abbreviation, e.g. the one I've just used?-) Secondarily, and less difficult, what separates "words", e.g., is "TCP/IP" one word, or two?
Once you have sharply defined these rules, you can easily read the file of text into a list of "sentences", each of which is a list of "words". Then, you read the other file as a sequence of pairs of numbers, and use them as indices into the overall list and inside the sublist thus identified. But the problem of sentence and word separation is really the hard part.
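To make the list-of-lists idea concrete, here is a small sketch under the simplest possible rules: a '.' ends a sentence and whitespace separates words. The function names are made up for illustration, and the second print shows how those rules break on an abbreviation.

```python
def parse_sentences(text, sentence_sep="."):
    """Split text into sentences, and each sentence into a list of words."""
    sentences = []
    for raw in text.split(sentence_sep):
        words = raw.split()  # Rule assumed here: words are separated by whitespace
        if words:
            sentences.append(words)
    return sentences

def lookup(sentences, pairs):
    """pairs holds 1-based (sentence number, word number) tuples."""
    return [sentences[s - 1][w - 1] for s, w in pairs]

text = "The cat sat. A dog barked loudly. Birds sing."
print(lookup(parse_sentences(text), [(1, 2), (2, 4)]))
# The naive rule breaks on abbreviations: "e.g." is split into extra "sentences"
print(parse_sentences("Use a rule, e.g. a period. Then split."))
```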
In the following code, I am assuming that sentences end with '. '. You can modify it easily to accommodate other sentence delimiters as well. Note that abbreviations will therefore be a source of bugs.
Also, I am going to assume that words are delimited by spaces.
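sentences = []
queries = []
# Assumed file names: the English text in 'file2.txt', the number pairs in 'file1.txt'
with open('file2.txt') as file2:
    english = file2.read()
while english:
    period = english.find('. ')
    if period == -1: # No delimiter left: the rest is the last sentence
        period = len(english)
    sentences.append(english[: period+1].split())
    english = english[period+1 :]
with open('file1.txt') as file1:
    q = file1.read().split()
for i in range(0, len(q)-1, 2):
    sentence = int(q[i])
    word = int(q[i+1])
    queries.append((sentence, word))
for s, w in queries:
    print(sentences[s-1][w-1])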
I haven't tested this, so please let me know (preferably with the case that broke it) if it doesn't work and I will look into bugs
Hope this helps
