I'm trying to print out a medium-sized list in Python. To test that the right data is going into the list in the right order, I print the entire list on one line. I read in 2 files and put all the data into 2 dictionaries. Then I split the dictionaries into parts and put all the similar data into lists. I'm super new to Python; this is from a tutorial I found on dictionaries, and I'm a little stuck. This line prints the list on one line:
print '[%s]' % ', '.join(map(str, player_list))
But this line prints each value of the list on a separate line which I don't want:
print '[%s]' % ', '.join(map(str, army_list))
Here's my code if needed that adds to the list:
import collections
import operator
terridict = {}
gsdict = {}
terr_list = []
player_list = []
army_list = []
list_length = []
total_territories = 0
with open('territories.txt', 'r') as territory:
    for line in territory:
        terridict["territory"], terridict["numeric_id"], terridict["continent"] = line.split(',')
with open('gameState.txt', 'r') as gameState:
    for line in gameState:
        gsdict["numeric_id"], gsdict["player"], gsdict["num_armies"] = line.split(',')
        terr_num = gsdict["numeric_id"]
        player_num = gsdict["player"]
        army_size = gsdict["num_armies"]
        if terr_num >= 1 and player_num >= 1 and army_size >= 1:
            terr_list.append(terr_num)
            player_list.append(player_num)
            army_list.append(army_size)
            player_list.sort()
            counter = collections.Counter(player_list)
            print (counter)
            total_territories = total_territories + 1
            x = counter
            sorted_x = sorted(x.items(), key=operator.itemgetter(0))
            counter = sorted_x
            print terr_num, player_num, army_size
print counter
print "Number of territories: %d" % total_territories
print '[%s]' % ', '.join(map(str, terr_list))
print '[%s]' % ', '.join(map(str, player_list))
print '[%s]' % ', '.join(map(str, army_list))
line, when you read it in, ends with a newline. For example (I'm guessing here):
"1 nelson2013 23\n"
When you split it by space, you get this:
["1", "nelson2013", "23\n"]
Notice that the player name does not end with a newline, but army size does. When you join army sizes together, they end up like this:
"23\n, 18\n, 121\n"
i.e. separated by newlines, which makes them print one per line.
To combat this, you want to invoke rstrip() on line immediately at the top of the loop, before you process it any further.
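A minimal sketch of that advice. The sample lines are made up and stand in for the real contents of gameState.txt, which the question does not show; the comma separator matches the line.split(',') call in the question's code.

```python
# Hypothetical sample lines standing in for the contents of gameState.txt.
sample_lines = ["1,2,23\n", "2,1,18\n", "3,2,121\n"]

army_list = []
for line in sample_lines:
    line = line.rstrip()  # drop the trailing newline before any further processing
    numeric_id, player, num_armies = line.split(',')
    army_list.append(num_armies)

joined = '[%s]' % ', '.join(map(str, army_list))
print(joined)
```

Because the newline is removed before the split, the last field no longer carries it into the joined string.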
You probably want to reassign line, because calling line.rstrip() by itself doesn't change anything: strings are immutable, so the method returns a new string instead of modifying line. Building off what Amadan said:
line = line.rstrip()
This way, the newline character is removed and line holds the stripped text for the rest of the loop. I tried it out and this worked.
This might sound banal, but it has been a pain.
So I wrote code that parses lines. The .txt file has one line that matches my re.match pattern and one line that doesn't.
cat file.txt
00.00.00 : Blabla
x
In this case I handle the second line by checking whether its first letter is "x".
def parser():
    path = "file.txt"
    with open (path, 'r+') as file:
        msg = {}
        list = []
        start = 0
        lines = file.readlines()
        for i in range (0,len(lines)):
            line = lines[i]
            if re.match('MY RULES', line) is not None:
                field['date'] = line[:8]
                msg['msg'] = line[start + 2:]
                print msg
            if line.startswith('x'):
                msg['msg'] += line
            list.append(msg)
        print chat
OUTPUT for 2 lines
{'date': '0.0.00', 'msg': 'BlaBla'}
{'msg': 'x'}
The problem is that I can't append the second line to msg['msg'] of the last message when the line starts with "x".
The expected output is:
{'date': '0.0.00', 'msg': 'BlaBlax'}
I tried a variant that changes the last appended chat entry:
else:
    list[len(list) - 1]['msg'] += line
but then I get the error:
IndexError: list index out of range
I also tried using next(infile) to predict the next line, but then it output every other line.
How would you trick a nested loop to append a dict entry?
Cheers
First of all, do not use list as a variable name: it is the name of a built-in type (not a keyword), and you are shadowing it.
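To illustrate why shadowing matters, here is a small sketch. The function names shadowed and not_shadowed are made up for this example; the point is only that once a variable is called list, the built-in type is no longer reachable under that name in the same scope.

```python
def shadowed():
    list = []  # this local name hides the built-in list inside the function
    try:
        return list("abc")  # tries to call the empty list object
    except TypeError as exc:
        return str(exc)


def not_shadowed():
    chat = []  # a different name leaves the built-in untouched
    chat.append("abc")
    return list(chat)  # the built-in list type is still callable here


print(shadowed())
print(not_shadowed())
```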
Secondly, if I understand correctly, you would like to append to the last result.
Here:
if re.match('MY RULES', line) is not None:
    field['date'] = line[:8]
    msg['msg'] = line[start + 2:]
    print msg
if line.startswith('x'):
    msg['msg'] += line
Both checks run on the same line, and in the next iteration msg['msg'] = line[start + 2:] overwrites the value under the key 'msg' in the dictionary msg, clearing the previous value. So this code
field['date'] = line[:8]
msg['msg'] = line[start + 2:]
print msg
is always executed, even for a plain x line in your input file, and clears the previous value under the key 'msg'.
If you would like it to work you need an if/else, although I would recommend storing intermediate values in a different way than in a locally scoped variable.
Full example with code fix:
def parser():
    path = "file.txt"
    with open(path, 'r+') as file:
        msg = {}
        chat = []
        start = 0
        lines = file.readlines()
        for i in range(0, len(lines)):
            line = lines[i]
            if True:
                if line.startswith('x'):
                    msg['msg'] += line
                else:
                    msg['date'] = line[:8]
                    msg['msg'] = line[12:]
        chat.append(msg)
        print(chat)
parser()
Result:
[{'date': '00.00.00', 'msg': 'Blabla\nx'}]
Assuming that the condition if re.match('MY RULES', line) is not None:
is True for all the lines in the file, that is:
00.00.00 : Blabla
x
How about this:
path = "file.txt"
with open (path, 'r') as f:
    msg = dict()
    for line in f.readlines():
        if line[0].isdigit():
            tmp = line.split(':')
            date = tmp[0].strip()
            msg[date] = ' '.join(*[x.split() for x in tmp[1:]])
        else:
            msg[date] += ' ' + ' '.join(*[line.split()])
We go line by line. If the first character of the line is a digit, we assume it is a date and add it to our dict; otherwise we append the string to the last dict entry we made. str.split() makes sure you get rid of all the different whitespace characters.
You can certainly replace the if statement in the for loop with your regex... The issue I see with your implementation in general is that as soon as the input varies slightly (e.g. more whitespace characters than intended) your solution produces faulty results. Basic Python string manipulations are really powerful ;)
Update
This should produce the right output:
*file.txt*
00.00.00 : Blabla
x
00.00.00 : Blabla2
x2
path = "file.txt"
with open (path, 'r') as f:
    lst = list()
    for line in f.readlines():
        if line[0].isdigit():
            tmp = line.split(':')
            date = tmp[0].strip()
            msg = {date: ' '.join(*[x.split() for x in tmp[1:]])}
            lst.append(msg)
        else:
            msg[date] += ' ' + ' '.join(*[line.split()])
print(lst)
>>> [{'00.00.00': 'Blabla x'}, {'00.00.00': 'Blabla2 x2'}]
I missed the part that you want to store each pair separately in a dict and append it to a list.
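For comparison, here is a sketch of the same idea wrapped in a function that keeps the 'date' and 'msg' keys from the question. The name group_messages is made up, and the sketch assumes the date itself never contains a colon. The elif chat guard covers the case that produced the IndexError in the question, namely a continuation line arriving before any message exists.

```python
def group_messages(lines):
    """Start a new dict for each dated line; append other lines to the last one."""
    chat = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line[0].isdigit() and ':' in line:
            # Split once on the first colon: date on the left, text on the right.
            date, _, text = line.partition(':')
            chat.append({'date': date.strip(), 'msg': text.strip()})
        elif chat:
            # Continuation line: only valid when a previous message exists.
            chat[-1]['msg'] += ' ' + line
    return chat


sample = ["00.00.00 : Blabla\n", "x\n", "00.00.00 : Blabla2\n", "x2\n"]
result = group_messages(sample)
print(result)
```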
I open a dictionary file and pull out specific lines. The lines are specified using a list, and at the end I need to print a complete sentence on one line.
I want to open a dictionary file that has one word on each line,
then print a sentence on one line with a space between the words:
N = ['19','85','45','14']
file = open("DICTIONARY", "r")
my_sentence = #?????????
print my_sentence
If your DICTIONARY is not too big (i.e. it fits in memory):
N = [19,85,45,14]
with open("DICTIONARY", "r") as f:
    words = f.readlines()
my_sentence = " ".join([words[i].strip() for i in N])
EDIT: A small clarification, the original post didn't use space to join the words, I've changed the code to include it. You can also use ",".join(...) if you need to separate the words by a comma, or any other separator you might need. Also, keep in mind that this code uses zero-based line index so the first line of your DICTIONARY would be 0, the second would be 1, etc.
UPDATE:: If your dictionary is too big for your memory, or you just want to consume as little memory as possible (if that's the case, why would you go for Python in the first place? ;)) you can only 'extract' the words you're interested in:
N = [19, 85, 45, 14]
words = {}
word_indexes = set(N)
counter = 0
with open("DICTIONARY", "r") as f:
    for line in f:
        if counter in word_indexes:
            words[counter] = line.strip()
        counter += 1
my_sentence = " ".join([words[i] for i in N])
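You can use linecache.getline to get the specific line numbers you want. Note that linecache counts lines from 1, and the line number must be an integer:
import linecache
sentence = []
for line_number in N:
    # int() because N in the question holds strings such as '19'
    word = linecache.getline('DICTIONARY', int(line_number))
    sentence.append(word.strip('\n'))
sentence = " ".join(sentence)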
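A self-contained sketch of the linecache approach, assuming a throwaway word file with made-up contents in place of DICTIONARY. It shows that linecache.getline counts lines from 1, unlike the zero-based list indexing used in the readlines() answer.

```python
import linecache
import os
import tempfile

# Build a small throwaway word file (hypothetical contents, one word per line).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("alpha\nbravo\ncharlie\ndelta\n")
    path = tmp.name

N = [3, 1, 4]  # 1-based line numbers, as linecache.getline expects
sentence = " ".join(linecache.getline(path, n).strip() for n in N)
print(sentence)

linecache.clearcache()  # drop the cached copy of the temporary file
os.remove(path)
```

The words come out in the order given in N, not in file order.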
Here's a simple one with more basic approach:
n = ['2','4','7','11']
file = open("DICTIONARY")
counter = 1 # 1 if you're gonna count lines in DICTIONARY
            # from 1, else 0 is used
output = ""
for line in file:
    line = line.rstrip() # rstrip() method to delete \n character,
                         # if not used, print ends with every
                         # word from a new line
    if str(counter) in n:
        output += line + " "
    counter += 1
print output[:-1] # slicing is used for a white space deletion
                  # after last word in string (optional)
I am trying to set up a system for running various statistics on a text file. In this endeavor I need to open a file in Python (v2.7.10) and read it both as lines, and as a string, for the statistical functions to work.
So far I have this:
import csv, json, re
from textstat.textstat import textstat
file = "Data/Test.txt"
data = open(file, "r")
string = data.read().replace('\n', '')
lines = 0
blanklines = 0
word_list = []
cf_dict = {}
word_dict = {}
punctuations = [",", ".", "!", "?", ";", ":"]
sentences = 0
This sets up the file and the preliminary variables. At this point, print textstat.syllable_count(string) returns a number. Further, I have:
for line in data:
    lines += 1
    if line.startswith('\n'):
        blanklines += 1
    word_list.extend(line.split())
    for char in line.lower():
        cf_dict[char] = cf_dict.get(char, 0) + 1
for word in word_list:
    lastchar = word[-1]
    if lastchar in punctuations:
        word = word.rstrip(lastchar)
    word = word.lower()
    word_dict[word] = word_dict.get(word, 0) + 1
for key in cf_dict.keys():
    if key in '.!?':
        sentences += cf_dict[key]
number_words = len(word_list)
num = float(number_words)
avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num
mcw = sorted([(v, k) for k, v in word_dict.items()], reverse=True)
print( "Total lines: %d" % lines )
print( "Blank lines: %d" % blanklines )
print( "Sentences: %d" % sentences )
print( "Words: %d" % number_words )
print('-' * 30)
print( "Average word length: %0.2f" % avg_wordsize )
print( "30 most common words: %s" % mcw[:30] )
But this fails: the line avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num raises ZeroDivisionError: float division by zero. However, if I comment out the string = data.read().replace('\n', '') from the first piece of code, I can run the second piece without problem and get the expected output.
Basically, how do I set this up so that I can run the second piece of code on data, as well as textstat on string?
The call to data.read() leaves the file pointer at the end of the file, so there is nothing more to read at that point. You either have to close and reopen the file or, more simply, reset the pointer to the beginning using data.seek(0).
First see the line:
string = data.read().replace('\n', '')
You are reading from data once. Now the cursor is at the end of data.
Then see the line:
for line in data:
You are trying to read it again, but there is nothing left in data because you are already at the end of it, so len(word_list) returns 0.
You are dividing by it and getting the error:
ZeroDivisionError: float division by zero.
But when you comment that line out, you are reading only once, which is valid, so the second portion of your code works.
Clear now?
So, what to do now?
Use data.seek(0) after data.read()
Demo:
>>> a = open('file.txt')
>>> a.read()
#output
>>> a.read()
#nothing
>>> a.seek(0)
>>> a.read()
#output again
Here is a simple fix. Replace the line for line in data: with:
data.seek(0)
for line in data.readlines():
    ...
It basically moves back to the beginning of the file and reads it again line by line.
While this should work, you may want to simplify the code and read the file only once. Something like:
with open(file, "r") as fin:
    lines = fin.readlines()
string = ''.join(lines).replace('\n', '')
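The behaviour described in these answers can be checked with a short, self-contained sketch. The temporary file and its contents are made up for the example; the variable names lines, string and blanklines follow the question's code.

```python
import os
import tempfile

# Throwaway input file with hypothetical contents.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("First line.\n\nThird line!\n")
    path = tmp.name

with open(path, "r") as fin:
    first_read = fin.read()    # consumes the whole file
    second_read = fin.read()   # the pointer is at the end, so this is ''
    fin.seek(0)                # rewind to the beginning
    lines = fin.readlines()    # now the file can be read again

# Both views of the data come from the same single list of lines.
string = ''.join(lines).replace('\n', '')
blanklines = sum(1 for line in lines if line.startswith('\n'))
os.remove(path)

print(repr(second_read), len(lines), blanklines)
```

Keeping the lines in a list means the per-line statistics and the single string for textstat can both be derived without touching the file again.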
Write a program that reads the contents of a random text file. The program should create a dictionary in which the keys are individual words found in the file and the values are the number of times each word appears.
How would I go about doing this?
def main():
    c = 0
    dic = {}
    words = set()
    inFile = open('text2', 'r')
    for line in inFile:
        line = line.strip()
        line = line.replace('.', '')
        line = line.replace(',', '')
        line = line.replace("'", '') #strips the punctuation
        line = line.replace('"', '')
        line = line.replace(';', '')
        line = line.replace('?', '')
        line = line.replace(':', '')
        words = line.split()
        for x in words:
            for y in words:
                if x == y:
                    c += 1
                    dic[x] = c
    print(dic)
    print(words)
    inFile.close()
main()
Sorry for the vague question. Never asked any questions here before. This is what I have so far. Also, this is the first ever programming I've done so I expect it to be pretty terrible.
with open('path/to/file') as infile:
    # code goes here
That's how you open a file
for line in infile:
    # code goes here
That's how you read a file line-by-line
line.strip().split()
That's how you split a line into (white-space separated) words.
some_dictionary['abcd']
That's how you access the key 'abcd' in some_dictionary.
Questions for you:
What does it mean if you can't access the key in a dictionary?
What error does that give you? Can you catch it with a try/except block?
How do you increment a value?
Is there some function that GETS a default value from a dict if the key doesn't exist?
For what it's worth, there's also a function that does almost exactly this, but since this is pretty obviously homework it won't fulfill your assignment requirements anyway. It's in the collections module. If you're interested, try and figure out what it is :)
There are at least three different approaches to adding a new word to the dictionary and counting the number of occurrences in the file.
def add_element_check1(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 1
        else:
            my_dict[e] += 1

def add_element_check2(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 0
        my_dict[e] += 1

def add_element_except(my_dict, elements):
    for e in elements:
        try:
            my_dict[e] += 1
        except KeyError:
            my_dict[e] = 1

my_words = {}
with open('pathtomyfile.txt', 'r') as in_file:
    for line in in_file:
        words = [word.strip().lower() for word in line.strip().split()]
        add_element_check1(my_words, words)
        #or add_element_check2(my_words, words)
        #or add_element_except(my_words, words)
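If you are wondering which is the fastest, the answer is: it depends. It depends on how often a given word occurs in the file. try-except is cheap when the key already exists and expensive when a KeyError is raised, so it tends to be the best choice when new words are rare, that is, when most words repeat many times. If most words occur only once or a few times, the explicit membership check is usually the safer choice.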
I have done some simple benchmarks here
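Two further variants, sketched here for comparison, avoid both the membership check and the exception. The function names add_element_get and add_element_defaultdict are made up to match the naming above, and the sample sentence is invented.

```python
from collections import defaultdict


def add_element_get(my_dict, elements):
    # dict.get returns the default (0) when the key is missing.
    for e in elements:
        my_dict[e] = my_dict.get(e, 0) + 1


def add_element_defaultdict(elements):
    # defaultdict(int) creates missing keys with the value 0 on first access.
    counts = defaultdict(int)
    for e in elements:
        counts[e] += 1
    return dict(counts)


words = "the cat and the hat and the bat".split()
counts_get = {}
add_element_get(counts_get, words)
counts_dd = add_element_defaultdict(words)
print(counts_get == counts_dd)
```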
This is a perfect job for Python's built-in collections module. From it, you can import Counter, which is a dictionary subclass made for just this.
How you want to process your data is up to you. One way to do this would be something like this
from collections import Counter
# Open your file and split by white spaces
with open("yourfile.txt","r") as infile:
    textData = infile.read()
    # Replace characters you don't want with empty strings
    textData = textData.replace(".","")
    textData = textData.replace(",","")
    textList = textData.split(" ")
    # Put your data into the counter container datatype
    dic = Counter(textList)
    # Print out the results
    for key,value in dic.items():
        print "Word: %s\n Count: %d\n" % (key,value)
Hope this helps!
Matt
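A variation on the Counter approach, sketched under the assumption that words should be compared case-insensitively and that any run of letters, digits or apostrophes counts as a word. The function name count_words and the sample text are made up. Using a regular expression also handles newlines and punctuation that split(" ") would leave attached to the words.

```python
import re
from collections import Counter


def count_words(text):
    # Lowercase the text and keep runs of letters, digits and apostrophes as words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


sample = "The cat sat. The cat ran, the dog sat!"
counts = count_words(sample)

# most_common(n) returns the n highest counts as (word, count) pairs.
for word, count in counts.most_common(2):
    print("Word: %s Count: %d" % (word, count))
```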
I have a string:
"apples = green"
How do I print:
print everything before '=' (apples)
print everything after '=' (green)
specify a number of the string in a text file. I have .txt file which contains:
apples = green
lemons = yellow
... = ...
... = ...
split the string using .split():
print astring.split(' = ', 1)[0]
still split the string using .split():
print astring.split(' = ', 1)[1]
Alternatively, you could use the .partition() method:
>>> astring = "apples = green"
>>> print astring.split(' = ', 1)
['apples', 'green']
>>> print astring.partition(' = ')
('apples', ' = ', 'green')
Partition always splits only once, but returns the separator you split on as well.
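One practical difference shows up when the separator is missing. The sketch below uses a made-up helper name, split_pair, and contrasts the two methods on an input without ' = '.

```python
def split_pair(astring, sep=' = '):
    """Return (before, after); after is '' when sep does not occur."""
    before, found, after = astring.partition(sep)
    return before, after


print(split_pair("apples = green"))
print(split_pair("no separator here"))

# The split-based version raises on the second input, because the
# resulting list has only one element.
try:
    "no separator here".split(' = ', 1)[1]
    split_failed = False
except IndexError:
    split_failed = True
print(split_failed)
```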
If you need to read a specific line in a file, skip lines first by iterating over the file object. The itertools.islice() function is the most compact way to return that line; don't worry too much if you don't understand how that all works. If the file doesn't have that many lines, an empty string is returned instead:
from itertools import islice
def read_specific_line(filename, lineno):
    with open(filename) as f:
        return next(islice(f, lineno, lineno + 1), '')
To read the 3rd line from a file (lineno counts from 0, so the 3rd line is index 2):
line = read_specific_line('/path/to/some/file.txt', 2)
If instead you need to know the line number of a given piece of text, you'd need to use enumerate() to keep track of the line count so far:
def what_line(filename, text):
    with open(filename) as f:
        for lineno, line in enumerate(f):
            if line.strip() == text:
                return lineno
    return -1
which would return the line number (starting to count from 0), or -1 if the line was not found in the file.
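Both helpers can be tried against a throwaway file. In this sketch the file contents are made up, and the two functions are repeated only so that the block runs on its own. Note that both count lines from 0.

```python
import os
import tempfile
from itertools import islice


def read_specific_line(filename, lineno):
    # lineno counts from 0, matching islice and enumerate.
    with open(filename) as f:
        return next(islice(f, lineno, lineno + 1), '')


def what_line(filename, text):
    with open(filename) as f:
        for lineno, line in enumerate(f):
            if line.strip() == text:
                return lineno
    return -1


# Hypothetical sample file in the "name = colour" format from the question.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("apples = green\nlemons = yellow\nplums = purple\n")
    path = tmp.name

third = read_specific_line(path, 2)          # index 2 is the third line
found = what_line(path, "lemons = yellow")   # zero-based index of that line
missing = read_specific_line(path, 10)       # past the end of the file
os.remove(path)

print(repr(third), found, repr(missing))
```

The returned line keeps its trailing newline, so call .strip() on it before splitting on ' = '.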
Every string in Python has a method called split. If you call string.split("substring"), it returns a list of the pieces between each occurrence of the substring, which is exactly what you are looking for.
>>> string = "apples = green"
>>> string.split("=")
['apples ', ' green']
>>> string = "apples = green = leaves = chloroplasts"
>>> string.split("=")
['apples ', ' green ', ' leaves ', ' chloroplasts']
So, if you use string.split(), you can call the index in the resulting list to get the substring you want:
>>> string.split(" = ")[0]
'apples'
>>> string.split(" = ")[1]
'green'
>>> string.split(" = ")[2]
'leaves'
etc... Just make sure you have a string which actually contains the substring, or this will throw an IndexError for any index greater than 0.
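Putting the pieces together for a whole file of such lines, here is a sketch that guards against the IndexError case. The function name parse_pairs and the sample lines are made up; a list of strings stands in for the lines of the .txt file.

```python
def parse_pairs(lines):
    """Turn 'key = value' lines into a dict, skipping lines without '='."""
    pairs = {}
    for line in lines:
        if '=' not in line:
            continue  # nothing to split on, so skip instead of raising
        key, value = line.split('=', 1)  # maxsplit=1 keeps later '=' in the value
        pairs[key.strip()] = value.strip()
    return pairs


sample = ["apples = green\n", "lemons = yellow\n", "just text\n", "a = b = c\n"]
colours = parse_pairs(sample)
print(colours)
```

Splitting on a bare '=' and then stripping each side tolerates uneven spacing around the sign, which splitting on ' = ' would not.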