How to read a text file into separate lists in Python

Say I have a text file formatted like this:
100 20 the birds are flying
and I wanted to read the int(s) into their own lists and the string into its own list. How would I go about this in Python? I tried
data.append(map(int, line.split()))
that didn't work...any help?

Essentially, I'm reading the file line by line and splitting each line. I first check whether each item can be turned into an integer, and if that fails, I treat it as a string.
def separate(filename):
    all_integers = []
    all_strings = []
    with open(filename) as myfile:
        for line in myfile:
            # split() with no argument also drops the trailing newline
            for item in line.split():
                try:
                    # Try converting the item to an integer
                    value = int(item, 10)
                    all_integers.append(value)
                except ValueError:
                    # If it fails, it's a string.
                    all_strings.append(item)
    return all_integers, all_strings
Then, given the file ('myfile.txt'):
100 20 the birds are flying
200 3 banana
hello 4
...doing the following in the interactive interpreter returns:
>>> myints, mystrings = separate(r'myfile.txt')
>>> print(myints)
[100, 20, 200, 3, 4]
>>> print(mystrings)
['the', 'birds', 'are', 'flying', 'banana', 'hello']

If I understand your question correctly:
import re

def splitList(items):
    ints = []
    words = []
    for item in items:
        # raw string, so "\d" reaches the regex engine unchanged
        if re.match(r'^\d+$', item):
            ints.append(int(item))
        else:
            words.append(item)
    return ints, words

line = "100 20 the birds are flying"
intList, wordList = splitList(line.split())
Will give you two lists: [100, 20] and ['the', 'birds', 'are', 'flying']
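The regex above only accepts non-negative integers. A minimal sketch of a variant, assuming negative numbers such as -20 can also appear in the file (split_ints_and_words and INT_PATTERN are hypothetical names, not part of the answer above):

```python
import re

# Hypothetical pattern: an optional leading "-" followed by one or more digits.
INT_PATTERN = re.compile(r"^-?\d+$")


def split_ints_and_words(items):
    """Split items into a list of ints and a list of the remaining strings."""
    ints = []
    words = []
    for item in items:
        if INT_PATTERN.match(item):
            ints.append(int(item))
        else:
            words.append(item)
    return ints, words


ints, words = split_ints_and_words("100 -20 the birds are flying".split())
print(ints)   # [100, -20]
print(words)  # ['the', 'birds', 'are', 'flying']
```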

Here's a simple solution. Note that it might not be as efficient as the others for very large files, because it iterates over the words twice for each line.
words = line.split()
intList = [int(x) for x in words if x.isdigit()]
strList = [x for x in words if not x.isdigit()]
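If the double pass matters, here is a minimal single-pass sketch of the same idea. The sample line stands in for one line of the file, and like the version above it relies on str.isdigit(), so a negative number such as -5 ends up in the string list:

```python
line = "100 20 the birds are flying"  # stands in for one line of the file

int_list = []
str_list = []
# One pass over the words: each word goes into exactly one of the two lists.
for x in line.split():
    if x.isdigit():
        int_list.append(int(x))
    else:
        str_list.append(x)

print(int_list)  # [100, 20]
print(str_list)  # ['the', 'birds', 'are', 'flying']
```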

pop removes the element from the list and returns it:
words = line.split()
first = int(words.pop(0))
second = int(words.pop(0))
This is of course assuming your format is always int int word word word ....
And then join the rest of the string:
words = ' '.join(words)
And in Python 3 you can even do this:
first, second, *words = line.split()
Which is pretty neat, although you would still have to convert first and second to ints.
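If every line really has the form int int word word ..., the unpacking trick can fill one list per column. A minimal sketch under that assumption: read_columns is a hypothetical helper that takes any iterable of lines, so an open file object works as well as the list used here, and a line that does not start with two integers raises ValueError.

```python
def read_columns(lines):
    """Split lines of the form 'int int word word ...' into three lists."""
    firsts, seconds, texts = [], [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Python 3 extended unpacking: the remaining words go into a list
        first, second, *words = line.split()
        firsts.append(int(first))
        seconds.append(int(second))
        texts.append(" ".join(words))
    return firsts, seconds, texts


firsts, seconds, texts = read_columns(["100 20 the birds are flying\n", "200 3 banana\n"])
print(firsts)   # [100, 200]
print(seconds)  # [20, 3]
print(texts)    # ['the birds are flying', 'banana']
```

With a real file, the call would be along the lines of `with open('mytext.txt') as f: firsts, seconds, texts = read_columns(f)`.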

Related

How to compare reverse strings in list of strings with the original list of strings in python?

Given an input string, check whether any word in it matches the reverse of a word in the same string; if so, print that word, otherwise print $.
I split the string, put the words in a list, and then reversed the words in that list. After that, I wasn't able to compare the two lists.
str = input()
x = str.split()
for i in x:  # printing i shows the words in the list
    str1 = i[::-1]  # printing str1 shows the reverse of each word
    # now how to check if any word of the new list matches any word of the old list
    if i == str:
        print(i)
        break
else:
    print('$')
Input: suman is a si boy.
Output: is (since the reverse of 'is' is present in the same string)
You almost have it; you just need to add another loop to compare each word against each reversed word. Try the following:
str = input()
x = str.split()
for i in x:
    str1 = i[::-1]
    for j in x:  # <-- this is the new nested loop you are missing
        if j == str1:  # compare each reversed word against each regular word
            if len(str1) > 1:  # optional condition if you don't want to include single-letter words
                print(i)
Update
To only print the first occurrence of a match, you could, in the second loop, only check the elements that come after. We can do this by keeping track of the index:
str = input()
x = str.split()
for index, i in enumerate(x):
    str1 = i[::-1]
    for j in x[index+1:]:  # <-- only consider words that come after
        if j == str1:
            if len(str1) > 1:
                print(i)
Note that I used index+1 so that a word is never compared with itself; that way a palindrome that occurs only once (such as 'a') does not count as a match.
a = 'suman is a si boy'
# Construct the list of words
words = a.split(' ')
# Construct the list of reversed words
reversed_words = [word[::-1] for word in words]
# Get an intersection of these lists converted to sets
print(set(words) & set(reversed_words))
will print (the order of elements in a set may vary):
{'si', 'is', 'a'}
Another way to do this is just in a list comprehension:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split()]
print(output)
The split on string creates a list split on spaces. Then a word is included only if its reverse is also in that list of words.
Output is:
['is', 'a', 'si']
One note: you have a variable named str. Best not to do that, as str is a built-in type in Python and shadowing it could cause other issues in your code later on.
If you only want words more than one letter long, you can do:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split() and len(x) > 1]
print(output)
this gives:
['is', 'si']
Final Answer...
And for the final thought, in order to get just the 'is':
string = 'suman is a si boy'
seen = []
output = [x for x in string.split() if x[::-1] not in seen and not seen.append(x) and x[::-1] in string.split() and len(x) > 1]
print(output)
output is:
['is']
BUT, this is probably not a good way to do it: you are storing information in seen during the list comprehension AND reading from that same list. :)
This answer won't show you 'a', and it outputs only 'is' rather than both 'is' and 'si'.
str = input()  # get input string
x = str.split()  # returns list of words
y = []  # list of matching words
while len(x) > 0:
    a = x.pop(0)  # removes the first item from the list, returns it, and assigns it to a
    if a[::-1] in x:  # checks if the reversed word is in the remaining words
        # 'a' has already been removed from x, so a word that occurs only once (like 'a')
        # cannot match itself, and the pair 'is'/'si' is reported only once
        y.append(a)
print(y)  # ['is']
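The pop() approach rescans the remaining list with in for every word, which gets slow (quadratic) for long inputs. A minimal one-pass sketch using a set of the words seen so far (words_with_reverse_later is a hypothetical name; it reports the earlier word of each pair, and a word that occurs only once, such as 'a', never matches itself):

```python
def words_with_reverse_later(words):
    """Return each word whose reverse appears later in `words`."""
    seen = set()      # words encountered so far
    reported = set()  # words already added to the result
    matches = []
    for word in words:
        candidate = word[::-1]
        # The reverse of the current word was seen earlier, so that earlier
        # word has a reversed partner later in the list (the current word).
        if candidate in seen and candidate not in reported:
            matches.append(candidate)
            reported.add(candidate)
        seen.add(word)  # added after the check, so one occurrence never matches itself
    return matches


print(words_with_reverse_later('suman is a si boy'.split()))  # ['is']
```

Results are ordered by where each pair is completed, which can differ from the pop() version when the input contains several pairs.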

How can I create a list that is created using the indexes and items of two other lists?

I have two lists. One is made up of positions from a sentence and the other is made up of words that make up the sentence. I want to recreate the variable sentence using poslist and wordlist.
recreate = []
sentence = "This and that, and this and this."
poslist = [1, 2, 3, 2, 4, 2, 5]
wordlist = ['This', 'and', 'that', 'this', 'this.']
I wanted to use a for loop to go through poslist and if the item in poslist was equal to the position of a word in wordlist it would append it to a new list, recreating the original list. My first try was:
for index in poslist:
    recreate.append(wordlist[index])
print (recreate)
I had to convert the lists to strings to write them into a text file. When I split them again and used the code shown above, it did not work: it said that list indices must be integers or slices, not list. I would like a solution to this problem. Thank you.
The list of words is gotten using:
sentence = input("Enter a sentence >>") #asking the user for an input
sentence_lower = sentence.lower() #making the sentence lower case
wordlist = [] #creating an empty list
sentencelist = sentence.split() #making the sentence into a list
for word in sentencelist: #for loop iterating over the sentence as a list
    if word not in wordlist:
        wordlist.append(word)
txtfile = open ("part1.txt", "wt")
for word in wordlist:
    txtfile.write(word +"\n")
txtfile.close()
txtfile = open ("part1.txt", "rt")
for item in txtfile:
    print (item)
txtfile.close()
print (wordlist)
And the positions are gotten using:
poslist = []
textfile = open ("part2.txt", "wt")
for word in sentencelist:
    poslist.append([position + 1 for position, i in enumerate(wordlist) if i == word])
print (poslist)
str1 = " ".join(str(x) for x in poslist)
textfile = open ("part2.txt", "wt")
textfile.write (str1)
textfile.close()
Lists are 0-indexed (the first item has the index 0, the second the index 1, ...), so you have to subtract 1 from the indexes if you want to use "human" indexes in poslist:
for index in poslist:
    recreate.append(wordlist[index-1])
print (recreate)
Afterwards, you can glue them together again (with a space between the words) and write them to a file:
with open("thefile.txt", "w") as f:
    f.write(" ".join(recreate))
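The error in the question typically appears when positions are read back from a file: split() returns strings such as '1', which cannot be used as list indices. A minimal round-trip sketch, assuming poslist is a flat list of 1-based ints as in the question's example (not the list of lists built by the question's position code), and using a temporary file in place of part2.txt:

```python
import os
import tempfile

poslist = [1, 2, 3, 2, 4, 2, 5]  # 1-based positions, as in the question
wordlist = ['This', 'and', 'that', 'this', 'this.']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "part2.txt")
    # Write the positions as space-separated text
    with open(path, "w") as f:
        f.write(" ".join(str(pos) for pos in poslist))
    # Read them back: split() yields strings like '1', so convert each one
    with open(path) as f:
        loaded = [int(token) for token in f.read().split()]

# Subtract 1 to turn the 1-based positions into list indices
sentence = " ".join(wordlist[pos - 1] for pos in loaded)
print(sentence)  # This and that and this and this.
```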
First, your code can be simplified to:
sentence = input("Enter a sentence >>") #asking the user for an input
sentence_lower = sentence.lower() #making the sentence lower case
wordlist = [] #creating an empty list
sentencelist = sentence.split() #making the sentence into a list
with open ("part1.txt", "wt") as txtfile:
    for word in sentencelist: #for loop iterating over the sentence as a list
        if word not in wordlist:
            wordlist.append(word)
            txtfile.write(word +"\n")
poslist = [wordlist.index(word) for word in sentencelist]
print (poslist)
str1 = " ".join(str(x) for x in poslist)
with open ("part2.txt", "wt") as textfile:
    textfile.write (str1)
In your original code, poslist was a list of lists instead of a list of integers.
Then, if you want to reconstruct your sentence from poslist (which is now a list of int and not a list of lists as in the code you provided) and wordlist, you can do the following:
sentence = ' '.join(wordlist[pos] for pos in poslist)
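You can also do it using a generator expression and the string join method (here poslist holds the 1-based positions from the question, and positions outside the word list are skipped):
sentence = ' '.join(wordlist[pos-1] for pos in poslist if 0 < pos <= len(wordlist))
# 'This and that and this and this.'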
You can use operator.itemgetter() for this.
from operator import itemgetter
poslist = [0, 1, 2, 1, 3, 1, 4]
wordlist = ['This', 'and', 'that', 'this', 'this.']
print(' '.join(itemgetter(*poslist)(wordlist)))
Note that I had to subtract one from all of the items in poslist, as Python is a zero-indexed language. If you need to change poslist programmatically, you could do poslist = [n - 1 for n in poslist] right after you declare it.
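One caveat when using itemgetter this way: it returns a tuple only when it is given two or more indices. With a single index it returns the bare item, so ' '.join would split that word into letters. A minimal sketch of a guard (words_at is a hypothetical helper):

```python
from operator import itemgetter


def words_at(wordlist, positions):
    """Return the words at the given 0-based positions as a list."""
    if not positions:
        return []  # itemgetter() needs at least one index
    picked = itemgetter(*positions)(wordlist)
    # With exactly one index, itemgetter returns the item itself, not a tuple.
    return [picked] if len(positions) == 1 else list(picked)


wordlist = ['This', 'and', 'that', 'this', 'this.']
print(' '.join(words_at(wordlist, [0, 1, 2, 1, 3, 1, 4])))  # This and that and this and this.
print(' '.join(words_at(wordlist, [0])))  # This (rather than 'T h i s')
```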

How to check for words in a list and then return a count?

I am relatively new to Python and have a question:
I am trying to write a script that will read a .txt file and check if words are in a list that I've provided and then return a count as to how many words were in that list.
So far,
import string
#this is just an example of a list
list = ['hi', 'how', 'are', 'you']
filename="hi.txt"
infile=open(filename, "r")
lines = infile.readlines()
for line in lines:
    words = line.split()
    for word in words:
        word = word.strip(string.punctuation)
I've tried to split the file into lines and then the lines into words without punctuation.
I am not sure where to go after this. Ultimately, I would like the output to be something like this:
"your file has x words that are in the list".
Thank you!
You can split your file into words using the following command:
words = reduce(lambda x, y: x + y, [line.split() for line in f])
Then loop over your word list and count each word with the list's count() method:
from functools import reduce  # reduce is no longer a built-in in Python 3
w_list = ['hi', 'how', 'are', 'you']
with open("hi.txt", "r") as f:
    words = reduce(lambda x, y: x + y, [line.split() for line in f])
for w in w_list:
    print("your file has {} {}".format(words.count(w), w))
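Note that reduce(lambda x, y: x + y, ...) builds a new list for every line, words.count(w) rescans the whole list for every w, and neither step strips punctuation, so 'hi,' is not counted as 'hi'. A minimal alternative sketch with collections.Counter, which counts everything in one pass and strips surrounding punctuation like the question's own loop (count_listed_words is a hypothetical name; the two sample lines stand in for hi.txt):

```python
import string
from collections import Counter


def count_listed_words(lines, wanted):
    """Return per-word counts for `wanted` and their total."""
    counts = Counter()
    for line in lines:
        # Strip surrounding punctuation, as the question's loop does
        counts.update(word.strip(string.punctuation) for word in line.split())
    per_word = {w: counts[w] for w in wanted}
    return per_word, sum(per_word.values())


lines = ["hi, how are you?\n", "hi, how are you?\n"]  # stands in for hi.txt
per_word, total = count_listed_words(lines, ['hi', 'how', 'are', 'you'])
print(per_word)  # {'hi': 2, 'how': 2, 'are': 2, 'you': 2}
print("your file has {} words that are in the list".format(total))
```

With a real file you can pass the open file object instead, e.g. `with open("hi.txt") as f: per_word, total = count_listed_words(f, w_list)`.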
import string
# words to search for;
# (stored as a set so `word in search_for` is O(1) on average)
search_for = set(["hi", "how", "are", "you"])
# get search text
# (no need to split into lines)
with open("hi.txt") as inf:
    text = inf.read().lower()
# create translation table
# - converts digits and punctuation to spaces (this maintains appropriate word-breaks)
# - keeps apostrophe (for words like "don't" or "couldn't")
to_space = string.digits + string.punctuation.replace("'", "")
trans = str.maketrans(to_space, " " * len(to_space))
# apply translation table and split into words
words = text.translate(trans).split()
# count desired words
word_count = sum(word in search_for for word in words)
# show result
print("your file has {} words that are in the list".format(word_count))
Read the file content using a with statement and open().
Remove punctuation from the file content using the string module.
Split the file content with split() and iterate over every word with a for loop.
Check whether each word is present in the input list and increment count accordingly.
input file: hi.txt
hi, how are you?
hi, how are you?
code:
import string
input_list = ['hi', 'how', 'are', 'you']
filename="hi.txt"
count = 0
with open(filename, "r") as fp:
    data = fp.read()
# str.maketrans with a third argument builds a table that deletes those characters
data = data.translate(str.maketrans("", "", string.punctuation))
for word in data.split():
    if word in input_list:
        count += 1
print("Total number of words in the file that are in the list: %d" % count)
Output:
vivek@vivek:~/Desktop/stackoverflow$ python3 18.py
Total number of words in the file that are in the list: 8
Do not use variable names that are already defined by the Python interpreter, e.g. list in your code:
>>> list
<class 'list'>
>>> a = list([1,2,3])
>>> a
[1, 2, 3]
>>> list = ["hi", "how"]
>>> b = list([1,2,3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
Use len(). For example, for a list:
myList = ["c","b","a"]
len(myList)
It will return 3, meaning there are three items in your list.
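On its own, len() counts every item in a list. To answer the question, one option is to first keep only the words that are in your list and then take len() of the result. A minimal sketch, where the text string stands in for the contents of hi.txt:

```python
import string

search_for = {'hi', 'how', 'are', 'you'}
text = "hi, how are you?\nhi, how are you?\n"  # stands in for the contents of hi.txt

# Strip surrounding punctuation so 'hi,' counts as 'hi'
words = [w.strip(string.punctuation) for w in text.split()]
matching = [w for w in words if w in search_for]
print(len(matching))  # 8
```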

Why am i getting an empty dictionary?

I am learning python from an introductory Python textbook and I am stuck on the following problem:
You will implement function index() that takes as input the name of a text file and a list of words. For every word in the list, your function will find the lines in the text file where the word occurs and print the corresponding line numbers.
Ex:
>>> index('raven.txt', ['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])
ghost 9
dying 9
demon 122
evil 99, 106
ghastly 82
mortal 30
raven 44, 53, 55, 64, 78, 97, 104, 111, 118, 120
Here is my attempt at the problem:
def index(filename, lst):
    infile = open(filename, 'r')
    lines = infile.readlines()
    lst = []
    dic = {}
    for line in lines:
        words = line.split()
        lst.append(words)
    for i in range(len(lst)):
        for j in range(len(lst[i])):
            if lst[i][j] in lst:
                dic[lst[i][j]] = i
    return dic
When I run the function, I get back an empty dictionary. I do not understand why I am getting an empty dictionary. So what is wrong with my function? Thanks.
You are overwriting the value of lst. You use it both as a parameter of the function (in which case it is a list of strings) and as the list of words in the file (in which case it's a list of lists of strings). When you do:
if lst[i][j] in lst
The comparison always returns False because lst[i][j] is a str, but lst contains only lists of strings, not strings themselves. This means that the assignment to the dic is never executed and you get an empty dict as result.
To avoid this you should use a different name for the list in which you store the words, for example:
In [4]: !echo 'a b c\nd e f' > test.txt
In [5]: def index(filename, lst):
   ...:     infile = open(filename, 'r')
   ...:     lines = infile.readlines()
   ...:     words = []
   ...:     dic = {}
   ...:     for line in lines:
   ...:         line_words = line.split()
   ...:         words.append(line_words)
   ...:     for i in range(len(words)):
   ...:         for j in range(len(words[i])):
   ...:             if words[i][j] in lst:
   ...:                 dic[words[i][j]] = i
   ...:     return dic
   ...:
In [6]: index('test.txt', ['a', 'b', 'c'])
Out[6]: {'a': 0, 'c': 0, 'b': 0}
There are also a lot of things you can change.
When you want to iterate over a list, you don't have to use indexes explicitly. If you need the index, you can use enumerate:
for i, line_words in enumerate(words):
    for word in line_words:
        if word in lst: dic[word] = i
You can also iterate directly on a file (refer to Reading and Writing Files section of the python tutorial for a bit more information):
# use the with statement to make sure that the file gets closed
with open('test.txt') as infile:
    for i, line in enumerate(infile):
        print('Line {}: {}'.format(i, line))
In fact, I don't see why you would first build that words list of lists. Just iterate over the file directly while building the dictionary:
def index(filename, lst):
    with open(filename, 'r') as infile:
        dic = {}
        for i, line in enumerate(infile):
            for word in line.split():
                if word in lst:
                    dic[word] = i
    return dic
Your dic values should be lists, since more than one line can contain the same word. As it stands your dic would only store the last line where a word is found:
from collections import defaultdict

def index(filename, words):
    # make the membership check below faster
    words = frozenset(words)
    with open(filename) as infile:
        dic = defaultdict(list)
        for i, line in enumerate(infile):
            for word in line.split():
                if word in words:
                    dic[word].append(i)
    return dic
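The exercise also asks for output in the form word n1, n2, ... with line numbers starting at 1. A minimal sketch built on the defaultdict idea (index_lines is a hypothetical name; it returns the formatted strings so you can print them, and a throwaway sample file stands in for raven.txt):

```python
import os
import tempfile
from collections import defaultdict


def index_lines(filename, words):
    """Return 'word n1, n2, ...' strings using 1-based line numbers."""
    wanted = frozenset(words)
    found = defaultdict(list)
    with open(filename) as infile:
        # start=1 numbers the lines the way the sample output does
        for lineno, line in enumerate(infile, start=1):
            # intersection: each wanted word is recorded at most once per line
            for word in wanted.intersection(line.split()):
                found[word].append(lineno)
    # Keep the caller's word order and skip words that never occur
    return ["{} {}".format(w, ", ".join(str(n) for n in found[w]))
            for w in words if w in found]


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("the raven said\nno ghost here\nraven again\n")
    result = index_lines(sample, ["raven", "ghost", "demon"])

for entry in result:
    print(entry)
# raven 1, 3
# ghost 2
```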
If you don't want to use collections.defaultdict, you can replace dic = defaultdict(list) with dic = {} and then change:
dic[word].append(i)
to:
if word not in dic:
    dic[word] = [i]
else:
    dic[word].append(i)
Or, alternatively, you can use dict.setdefault:
dic.setdefault(word, []).append(i)
although this last way is a bit slower than the defaultdict version: the [] argument is built on every call, even when the key already exists.
Note that all these solutions have the property that if a word isn't found in the file, it will not appear in the result at all. However, you may want it in the result with an empty list as its value. In that case, it's simpler to initialize the dict with empty lists before starting the loop, as in:
dic = {word : [] for word in words}
for i, line in enumerate(infile):
    for word in line.split():
        if word in words:
            dic[word].append(i)
Refer to the documentation about List Comprehensions and Dictionaries to understand the first line.
You can also iterate over words instead of the line, like this:
dic = {word : [] for word in words}
for i, line in enumerate(infile):
    for word in words:
        if word in line.split():
            dic[word].append(i)
Note however that this is going to be slower because:
line.split() returns a list, so word in line.split() will have to scan all the list.
You are repeating the computation of line.split().
You can try to solve these two problems doing:
dic = {word : [] for word in words}
for i, line in enumerate(infile):
    line_words = frozenset(line.split())
    for word in words:
        if word in line_words:
            dic[word].append(i)
Note that here we iterate once over line.split() to build the set and also over words. Depending on the sizes of the two collections, this may be slower or faster than the original version (iterating over line.split()).
However, at this point it's probably faster to intersect the sets (this requires words to be a set or frozenset, as in the defaultdict version):
dic = {word : [] for word in words}
for i, line in enumerate(infile):
    line_words = frozenset(line.split())
    for word in words & line_words: # & stands for set intersection
        dic[word].append(i)
Try this:
def index(filename, lst):
    dic = {w:[] for w in lst}
    for n, line in enumerate(open(filename, 'r')):
        for word in lst:
            if word in line.split():
                dic[word].append(n+1)
    return dic
There are some features of the language introduced here that you should be aware of, because they will make life a lot easier in the long run.
The first is a dictionary comprehension. It initializes a dictionary using the words in lst as keys and an empty list [] as the value for each key.
Next, the enumerate function. It lets us iterate over the items in a sequence while also getting the index of each item. In this case, because we passed a file object to enumerate, it loops over the lines. For each iteration, n is the 0-based index of the line (hence n+1 when we record it) and line is the line itself. Next we iterate over the words in lst.
Notice that we don't need any indices here. Python encourages looping over the objects in a sequence directly rather than looping over indices and then accessing the objects by index (for example, it discourages for i in range(len(lst)): do something with lst[i]).
Finally, the in operator is a very straightforward way to test membership for many types of objects, and the syntax is very intuitive. In this case, we are asking whether the current word from lst is in the current line.
Note that we use line.split() to get a list of the words in the line. If we didn't, 'the' in 'there was a ghost' would return True, as 'the' is a substring of 'there'. (Calling split() with no argument also drops the trailing newline of each line read from the file.)
On the other hand, 'the' in ['there', 'was', 'a', 'ghost'] returns False. If the conditional returns True, we append the line number to the list associated with that key in our dictionary.
That might be a lot to chew on, but these concepts make problems like this more straightforward.
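A quick check of the membership rules described above, using a line as it would be read from a file (with its trailing newline):

```python
line = 'there was a ghost\n'  # a line read from a file keeps its newline

print('the' in line)               # True: 'the' is a substring of 'there'
print('the' in line.split())       # False: no word is exactly 'the'
print('ghost' in line.split(' '))  # False: the last item is 'ghost\n'
print('ghost' in line.split())     # True: split() also drops the newline
```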
First, your function parameter holding the words is named lst, and the list where you put all the words in the file is also named lst, so the words passed to your function are lost: on line 4 you rebind lst to a new empty list.
Second, you iterate over each line in the file (the first for) and append that line's words to lst, so lst ends up holding one list of words per line. Your for i and for j loops then visit every word, but the if asks "Is this word one of the items in lst?", and those items are lists, not words, so the dict is never filled.
If you instead collect all the words in a single flat list (called words below) and keep the parameter lst intact, one loop is enough:
for i in range(len(words)):
    if words[i] in lst:
        dic[words[i]] = dic[words[i]] + i # To count repetitions
You need to rethink the problem; even my snippet will fail, because dic[words[i]] raises a KeyError the first time a word is looked up, but you get the point. Good luck!

Change the display of a list took from text file

I have this code written in Python:
with open ('textfile.txt') as f:
    list=[]
    for line in f:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            list.append(line)
    print(list)
This reads integers from a text file and puts them in a list. But the result is:
[[10,20,34]]
However, I would like it to display like:
10 20 34
How to do this? Thanks for your help!
You probably just want to add the items to the list, rather than appending them:
with open('textfile.txt') as f:
    list = []
    for line in f:
        line = line.split()
        if line:
            list += [int(i) for i in line]
print(" ".join([str(i) for i in list]))
If you append a list to a list, you create a sublist:
a = [1]
a.append([2,3])
print(a) # [1, [2, 3]]
If you add it, you get:
a = [1]
a += [2,3]
print(a) # [1, 2, 3]
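list.extend() is the method form of +=: both add the items one by one, while append() adds its argument as a single element. A small comparison:

```python
a = [1]
a.extend([2, 3])  # same effect as a += [2, 3]
print(a)  # [1, 2, 3]

b = [1]
b.append([2, 3])  # the whole list becomes one element
print(b)  # [1, [2, 3]]
```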
with open('textfile.txt') as f:
    lines = [x.strip() for x in f.readlines()]
print(' '.join(lines))
With an input file 'textfile.txt' that contains:
10
20
30
prints:
10 20 30
It sounds like you are trying to print a list of lists. The easiest way to do that is to iterate over it and print each list.
for line in list:
    print(" ".join(str(i) for i in line))
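If you want to keep the list of lists but still print every number on one line, one option is itertools.chain.from_iterable, which flattens one level of nesting. A minimal sketch with made-up sample rows:

```python
from itertools import chain

rows = [[10, 20, 34], [40, 50]]  # made-up list of lists, one inner list per line

# chain.from_iterable yields the numbers of every row in turn
output = " ".join(str(i) for i in chain.from_iterable(rows))
print(output)  # 10 20 34 40 50
```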
Also, list is the name of a built-in type in Python (not a keyword), so try to avoid naming your variables that.
If you know that the file is not extremely long and you want the list of integers, you can build it at once (two lines, one of which is the with open(...)). If you want to print it your way, convert the elements to strings and join the result via ' '.join(...), like this:
#!python3
# Load the content of the text file as one list of integers.
with open('textfile.txt') as f:
    lst = [int(element) for element in f.read().split()]
# Print the formatted result.
print(' '.join(str(element) for element in lst))
Do not use the list identifier for your variables as it masks the name of the list type.
