I am in need of some help. I'm currently doing an online Python course and I can't seem to get the desired result to complete the assignment.
Basically, there is a text document whose name I need to read using "raw_input". I then use the "open()" function, and I have an empty "list()".
Now I run a "for" loop over each line in my .txt doc. I need to "rstrip()" the trailing whitespace, which leaves me with a 4-line .txt document (the .txt file is at the bottom of the question). Then I have to ".split()" those lines into words. From there I need to loop through those words and ".append()" each word that isn't already in the list, then ".sort()", then print... hopefully by that stage it looks like the desired output.
Just to make me feel a little better, this is the first time I'm doing any sort of coding, so if you could explain where and why I'm going wrong, that would be great.
CODE SO FAR - currently produces an error
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    a = line.rstrip()
    b = a.split()
    for words in b:
        if words not in lst:
            print lst
.TXT DOCUMENT
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
P.S. There's no point in changing the .txt to one line, because then the code won't work in the grader. I've tried that (got the desired output, but the wrong code).
Please, your help would be greatly appreciated.
If there is any more info you need, I'll try to provide it.
This will read the file, add the words to the list, sort the list, and then print it.
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    a = line.rstrip()
    b = a.split()
    for words in b:
        if words not in lst:
            lst.append(words)
lst.sort()
print lst
fh.close()
lst.append(element) will add the element into the list lst.
lst.sort() will sort the list lst alphabetically.
Check out the document => Lists
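A quick demonstration of those two calls on a throwaway list:

```python
lst = ["pear", "apple", "banana"]
lst.append("cherry")   # adds "cherry" to the end of lst
lst.sort()             # sorts lst alphabetically, in place
print(lst)             # ['apple', 'banana', 'cherry', 'pear']
```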
l = list()
with open('inp.txt') as inp:
    for each_line in inp:
        a = each_line.strip()
        l += a.split()
print set(l)
Use the with keyword; it's better practice because it closes the file for you after the operation. And for the unique portion, use set(), which only keeps unique elements.
You could also make use of set which is like a list but without duplicates. This means you don't have to check for duplicates yourself, since set will do it for you auto-magically.
eg:
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = set()
for line in fh:
    a = line.rstrip()
    b = a.split()
    for words in b:
        lst.add(words)
lst = list(lst)
lst.sort()
print lst
Try a list comprehension to generate the list, and use set to remove duplicate entries:
lst = [words for line in open(fname) for words in line.rstrip().split()]
lst = list(set(lst))
lst.sort()
print lst
I have a program where I take in a .txt file, break the text up into individual words, and put them into a list. The next part is to sort the list alphabetically and print it.
So far I have a text file that just says:
"the quick brown fox jumps over the lazy dog"
and my program so far looks like:
file = input("Enter File Name: ")
myList = []
readFile = open(file, 'r')
for line in readFile:
    myList.append(line.split(" "))
myList.sort()
print(myList)
The problem is that when I run the program, the list is created and filled with each word, but when it is printed out it is not sorted in alphabetical order. I also tried print(myList.sort()), and the only thing that prints is "None".
The problem is that line.split(" ") creates a list of words in line, but myList.append adds this list to myList as a new single item (so you end up with a list of lists rather than list of words). What you probably wanted is:
myList.extend(line.split(" "))
You should probably read the whole file rather than one line at a time:
with open(filename) as f:
    words = f.read().split()
words.sort()
This uses the default for split which splits on a space, line break or any other whitespace.
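The difference is easy to see on a string with mixed whitespace:

```python
s = "But soft\twhat light\nthrough yonder"
# Default split breaks on any run of whitespace (spaces, tabs, newlines):
print(s.split())      # ['But', 'soft', 'what', 'light', 'through', 'yonder']
# split(" ") breaks on single spaces only, so tabs and newlines stay inside the "words":
print(s.split(" "))   # ['But', 'soft\twhat', 'light\nthrough', 'yonder']
```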
From what I observed, the problem is that your list of words is nested inside another list:
myList = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]]
So instead of
for line in readFile:
    myList.append(line.split(" "))
you should be writing
for line in readFile:
    myList = line.split(" ")
The extend method works:
filename = open(input("Enter the input file name: "))
myList = []
for line in filename:
    myList.extend(line.split(" "))
myList.sort()
print(myList)
Python has two ways of sorting: list.sort(), which sorts the list in place and returns None (which is why print(myList.sort()) printed "None"), and the built-in sorted(), which leaves the original list intact and returns a new sorted list. Now you have the info you need to look it up. This is probably also a previously asked question on Stack Overflow; I encourage you to look for and link to the other ones.
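The difference in a few lines:

```python
myList = ["the", "quick", "brown"]

print(sorted(myList))   # ['brown', 'quick', 'the'] -- new sorted list
print(myList)           # ['the', 'quick', 'brown'] -- original untouched

print(myList.sort())    # None -- sorts in place, returns nothing
print(myList)           # ['brown', 'quick', 'the'] -- now sorted
```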
So I have this file that contains 2 words each line. It looks like this.
[/lang:F </lang:foreign>
[lipmack] [lipsmack]
[Fang:foreign] <lang:foreign>
The first word is incorrectly formatted and the second one is correctly formatted. I am trying to put them in a dictionary. Below is my code.
textFile = open("hinditxt1.txt", "r+")
textFile = textFile.readlines()
flist = []
for word in textFile:
    flist.append(word.split())
fdict = dict()
for num in range(len(flist)):
    fdict[flist[num][0]] = flist[num][1]
First I split it, then I try to put the pairs in a dictionary. But for some reason I get "IndexError: list index out of range" when trying to put them in a dictionary. What can I do to fix it? Thanks!
In Python it is better to iterate over the items of a list rather than over a range of indices. My guess is that the IndexError is coming from a line in the input file that is blank or does not contain any spaces.
with open("input.txt", 'r') as f:
    flist = [line.split() for line in f]
fdict = {}
for k, v in flist:
    fdict[k] = v
print(fdict)
The code above avoids needing to access elements of the list using an index by simply iterating over the items of the list itself. We can further simplify this by using a dict comprehension:
with open("input.txt", 'r') as f:
    flist = [line.split() for line in f]
fdict = {k: v for k, v in flist}
print(fdict)
With dictionaries it is typical to use the .update() method to add new key-value pairs. It would look more like:
for num in range(len(flist)):
    fdict.update({flist[num][0]: flist[num][1]})
A full example without file reading would look like:
in_words = ["[/lang:F </lang:foreign>",
            "[lipmack] [lipsmack]",
            "[Fang:foreign] <lang:foreign>"]

flist = []
for word in in_words:
    flist.append(word.split())

fdict = dict()
for num in range(len(flist)):
    fdict.update({flist[num][0]: flist[num][1]})

print(fdict)
Yielding:
{'[lipmack]': '[lipsmack]', '[Fang:foreign]': '<lang:foreign>', '[/lang:F': '</lang:foreign>'}
Although your output may vary, since dictionaries do not maintain insertion order (prior to Python 3.7).
As #Alex points out, the IndexError is likely from your data having improperly formatted data (i.e. a line with only 1 or 0 items on it). I suspect the most likely cause of this would be a \n at the end of your file that is causing the last line(s) to be blank.
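You can see how a blank line causes this: split() on it produces an empty list, so indexing [0] or [1] raises IndexError:

```python
print("[lipmack] [lipsmack]".split())  # ['[lipmack]', '[lipsmack]']
print("\n".split())                    # [] -- a blank line yields no items

try:
    "\n".split()[1]
except IndexError as e:
    print("IndexError:", e)
```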
I am just learning to code and am trying to take an input .txt file and break the text into a list (by row), where each row's characters are elements of that list. For example, if the file is:
abcde
fghij
klmno
I would like to create
[['a','b','c','d','e'], ['f','g','h','i','j'],['k','l','m','n','o']]
I have tried this, but the results aren't what I am looking for.
file = open('alpha.txt', 'r')
lst = []
for line in file:
    lst.append(line.rstrip().split(','))
print(lst)
[['abcde', 'fghij', 'klmno']]
I also tried this, which is closer, but I don't know how to combine the two codes:
file = open('alpha.txt', 'r')
lst = []
for line in file:
    for c in line:
        lst.append(c)
print(lst)
['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
I tried to add the rstrip into the lst.append but it didn't work (or I didn't do it properly). Sorry - complete newbie here!
I should mention that I don't want newline characters included. Any help is much appreciated!
This is very simple. You can use the list() constructor to break a string into its individual characters.
with open('alpha.txt', 'r') as file:
    print([list(line)[:-1] for line in file.readlines()])
(The with open construct is just an idiom that does the file handling for you, like closing the file, which you forgot to do.)
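What list() does to a line, and why the [:-1] is there (it drops the trailing newline character):

```python
line = "abcde\n"
print(list(line))        # ['a', 'b', 'c', 'd', 'e', '\n']
print(list(line)[:-1])   # ['a', 'b', 'c', 'd', 'e']
```

Note that if the last line of the file has no trailing newline, [:-1] would chop off a real character instead, which is why the strip-based versions below are safer.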
If you want to split a string into its characters you can just use list(s) (where s = 'asdf'):
file = open('alpha.txt', 'r')
lst = []
for line in file:
    lst.append(list(line.strip()))
print(lst)
You are appending each entry to your original list. You want to create a new list for each line in your input, append to that list, and then append that list to your master list. For example,
file = open('alpha.txt', 'r')
lst = []
for line in file:
    newLst = []
    for c in line:
        newLst.append(c)
    lst.append(newLst)
print(lst)
Use a nested list comprehension. The outer loop iterates over the lines in the file and the inner loop over the characters in each line's string.
with open('alpha.txt') as f:
    out = [[char for char in line.strip()] for line in f]
req = [['a','b','c','d','e'], ['f','g','h','i','j'], ['k','l','m','n','o']]
print(out == req)
prints
True
I am trying to read a file, make a list of words and then make a new list of words removing the duplicates.
I am not able to append the words to the new list; it says "NoneType object has no attribute 'append'".
Here is the bit of code:
fh = open("gdgf.txt")
lst = list()
file = fh.read()
for line in fh:
    line = line.rstrip()
file = file.split()
for word in file:
    if word in lst:
        continue
    lst = lst.append(word)
print lst
Python's append returns None, so a set will help here to remove duplicates.
In [102]: mylist = ["aa","bb","cc","aa"]
In [103]: list(set(mylist))
Out[103]: ['aa', 'cc', 'bb']
Hope this helps
In your case:
file = fh.read()
After this, fh has already been consumed, so iterating over it yields nothing; you cannot read from it again. You have to do your operations on the variable file.
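The same behavior can be seen with an in-memory file object (io.StringIO used here as a stand-in for an open file, so the example is self-contained):

```python
import io

fh = io.StringIO("line one\nline two\n")
content = fh.read()      # consumes the whole stream
print(len(content))      # 18
print(list(fh))          # [] -- nothing left to iterate over
```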
You are replacing your list with the return value of the append function, which is not a list. Simply do this instead:
lst.append(word)
append modifies the list it was called on, and returns None. I.e., you should replace the line:
lst=lst.append(word)
with simply
lst.append(word)
fh = open("gdgf.txt")
file = fh.read()
for line in fh:
    line = line.rstrip()
lst = []
file = file.split()
for word in file:
    lst.append(word)
print(set(lst))
append appends an item in-place which means it does not return any value. You should get rid of lst= when appending word:
if word in lst:
    continue
lst.append(word)
list.append() is an in-place append; it returns None (as it does not return anything), so you do not need to assign the return value of list.append() back to the list. Just change the line lst=lst.append(word) to -
lst.append(word)
Another issue: you first call .read() on the file and then iterate over its lines; you do not need to do both. Just remove the iteration part.
Also, an easy way to remove duplicates, if you are not interested in the order of the elements is to use set.
Example -
>>> lst = [1,2,3,4,1,1,2,3]
>>> set(lst)
{1, 2, 3, 4}
So, in your case you can initialize lst as lst = set(), and then use lst.add(element); you would not even need to check whether the element already exists. At the end, if you really want the result as a list, do list(lst) to convert it to a list. (Though when doing this, you may want to rename the variable to something that makes it clear it's a set, not a list.)
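A minimal sketch of that set-based version (with the file contents replaced by an inline string so the example is self-contained):

```python
text = "the quick brown fox the lazy dog the"

words = set()
for word in text.split():
    words.add(word)          # duplicates are ignored automatically

print(sorted(words))         # ['brown', 'dog', 'fox', 'lazy', 'quick', 'the']
```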
append() does not return anything, so don't assign its result; lst.append() on its own is enough.
Modified Code:
fh = open("gdgf.txt")
lst = []
file = fh.read()
for line in fh:
    line = line.rstrip()
file = file.split()
for word in file:
    if word in lst:
        continue
    lst.append(word)
print lst
I suggest you use set(), because it is made for unordered collections of unique elements.
fh = open("gdgf.txt")
file = fh.read()
file = file.split()
lst = list(set(file))
print lst
You can simplify your code by reading and adding the words directly to a set. Sets do not allow duplicates, so you'll be left with just the unique words:
words = set()
with open('gdgf.txt') as f:
    for line in f:
        for word in line.split():
            words.add(word)
print(words)
The problem with the logic above, is that words that end in punctuation will be counted as separate words:
>>> s = "Hello? Hello should only be twice in the list"
>>> set(s.split())
set(['be', 'twice', 'list', 'should', 'Hello?', 'only', 'in', 'the', 'Hello'])
You can see you have Hello? and Hello.
You can enhance the code above by using a regular expression to extract words, which will take care of the punctuation:
>>> set(re.findall(r"(\w[\w']*\w|\w)", s))
set(['be', 'list', 'should', 'twice', 'only', 'in', 'the', 'Hello'])
Now your code is:
import re

with open('gdgf.txt') as f:
    words = set(re.findall(r"(\w[\w']*\w|\w)", f.read(), re.M))
print(words)
Even with the above, you'll have duplicates as Word and word will be counted twice. You can enhance it further if you want to store a single version of each word.
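One way to store a single version of each word is to lowercase it before adding it to the set (a sketch on an inline string):

```python
import re

s = "Word word WORD other"
# Lowercase each extracted word so "Word" and "word" collapse to one entry:
words = set(w.lower() for w in re.findall(r"(\w[\w']*\w|\w)", s))
print(sorted(words))   # ['other', 'word']
```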
I think the solution to this problem can be more succinct:
import string

with open("gdgf.txt") as fh:
    word_set = set()
    for line in fh:
        line = line.split()
        for word in line:
            # For each character in string.punctuation, iterate and remove
            # from the word by replacing with '', an empty string
            for char in string.punctuation:
                word = word.replace(char, '')
            # Add the word to the set
            word_set.add(word)

word_list = list(word_set)
# Sort the set to be fastidious.
word_list.sort()
print(word_list)
One thing about splitting words on whitespace with "split" is that it makes "words" out of things like "Hello!" and "Really?" The words will include punctuation, which is probably not what you want.
Your variable names could be a bit more descriptive, and your indentation seems a bit off, but that may just be from cutting and pasting into the posting. I have tried to name the variables I used after the logical structure each one interacts with (file, line, word, char, and so on).
To see the contents of string.punctuation, you can launch IPython, import string, then simply enter string.punctuation.
It is also unclear whether you need a list, or just a data structure containing a unique collection of words. Either a set, or a list built to avoid duplicates, will do the trick. Following the question, I used a set to store elements uniquely, trivially converted that set to a list, and then sorted the list alphabetically.
Hope this helps!
I have the following text file:
This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,456
FRUIT
DRINK
FOOD,BURGER
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR
NUM,012
FRUIT
DRINK
FOOD,MEATBALL
CAR
And I have the following list called 'wanted':
['123', '789']
What I'm trying to do: if the number after NUM is not in the list called 'wanted', then that line, along with the 4 lines below it, gets deleted. So the output file would look like:
This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR
My code so far is:
infile = open("inputfile.txt", 'r')
data = infile.readlines()
for beginning_line, ube_line in enumerate(data):
    UNIT = data[beginning_line].split(',')[1]
    if UNIT not in wanted:
        del data_list[beginning_line:beginning_line+4]
You shouldn't modify a list while you are looping over it.
What you could try is to just advance the iterator on the file object when needed:
wanted = set(['123', '789'])

with open("inputfile.txt", 'r') as infile, open("outfile.txt", 'w') as outfile:
    for line in infile:
        if line.startswith('NUM,'):
            UNIT = line.strip().split(',')[1]
            if UNIT not in wanted:
                for _ in xrange(4):
                    infile.next()
                continue
        outfile.write(line)
And use a set: it is much faster for repeated membership checks.
This approach doesn't make you read the entire file into a list to process it. It goes line by line, reading from the file, advancing, and writing to the new file. If you want, you can replace outfile with a list that you append to.
There are some issues with the code; for instance, data_list isn't even defined (presumably you meant data), and deleting slices from the list while you enumerate over it shifts the indices out from under you. You also use both enumerate and direct index access on data, and readlines is not needed.
I'd suggest to avoid keeping all lines in memory, it's not really needed here. Maybe try with something like (untested):
with open('infile.txt') as fin, open('outfile.txt', 'w') as fout:
    for line in fin:
        if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
            for _ in range(4):
                fin.next()
        else:
            fout.write(line)
import re

# find the lines that match NUM,XYZ
nums = re.compile('NUM,(?:' + '|'.join(['456', '012']) + ')')
# match the NUM line plus the three lines after it
line_matches = breaks = re.compile('.*\n.*\n.*\n.*\n')

keeper = ''
for line in nums.finditer(data):
    keeper += breaks.findall(data[line.start():])[0]
result on the given string is
NUM,456
FRUIT
DRINK
FOOD,BURGER
NUM,012
FRUIT
DRINK
FOOD,MEATBALL
edit: deleting items while iterating is probably not a good idea, see: Remove items from a list while iterating
infile = open("inputfile.txt", 'r')
data = infile.readlines()

SKIP_LINES = 4
skip_until = False
result_data = []

for current_line, line in enumerate(data):
    if skip_until and current_line <= skip_until:
        continue
    if line.startswith('NUM,'):
        _, num = line.strip().split(',')
        if num not in wanted:
            skip_until = current_line + SKIP_LINES
            continue
    result_data.append(line)
... and result_data is what you want.
If you don't mind building a list, and if your "NUM" lines come every 5 lines, you may want to try:
keep = []
for (i, v) in enumerate(lines[::5]):
    (num, current) = v.strip().split(",")
    if current in wanted:
        keep.extend(lines[i*5:i*5+5])
Don't try to think of this in terms of building up a list and removing stuff from it while you loop over it. That way leads madness.
It is much easier to write the output file directly. Loop over lines of the input file, each time deciding whether to write it to the output or not.
Also, to avoid difficulties with the fact that not every line has a comma, try using .partition to split up the lines. It always returns 3 items: when there is a comma, you get (before the first comma, the comma, after the comma); otherwise, you get (the whole thing, empty string, empty string). Combined with a startswith('NUM,') check and stripping the trailing newline, you can compare the last item against wanted directly.
skip_counter = 0
for line in infile:
    if line.startswith('NUM,') and line.strip().partition(',')[2] not in wanted:
        skip_counter = 5
    if skip_counter:
        skip_counter -= 1
    else:
        outfile.write(line)
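The .partition() behavior described above, checked on sample lines:

```python
# A comma present: the three parts are (head, separator, tail).
print("NUM,123".partition(','))   # ('NUM', ',', '123')
# No comma: the whole string comes back first, with two empty strings.
print("FRUIT".partition(','))     # ('FRUIT', '', '')
```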