I have a text file that looks something like this:
John Graham 2
Marcus Bishop 0
Bob Hamilton 1
... and like 20 other names.
Each name appears several times, each time with a different number (score) after it.
I need to make a list that shows each name only once, followed by the sum of that name's scores. I need to use a dictionary.
This is what I have done, but it only produces the same list the text file had to begin with:
dict = {}
with open('scores.txt', 'r+') as f:
    data = f.readlines()
    for line in data:
        nameScore = line.split()
        print (nameScore)
I don't know how to do the next part.
Here is one option using defaultdict(int):
from collections import defaultdict
result = defaultdict(int)
with open('scores.txt', 'r') as f:
    for line in f:
        key, value = line.rsplit(' ', 1)
        result[key] += int(value.strip())
print result
If the contents of scores.txt is:
John Graham 2
Marcus Bishop 0
Bob Hamilton 1
John Graham 3
Marcus Bishop 10
it prints:
defaultdict(<type 'int'>,
{'Bob Hamilton': 1, 'John Graham': 5, 'Marcus Bishop': 10})
UPD (formatting output):
for key, value in result.iteritems():
    print key, value
My first pass would look like:
scores = {} # Not `dict`. Don't reuse builtin names.
with open('scores.txt', 'r') as f: # Not "r+" unless you want to write later
    for line in f:
        name, score = line.strip().rsplit(' ', 1)
        score = int(score)
        if name in scores:
            scores[name] = scores[name] + score
        else:
            scores[name] = score
print scores.items()
This isn't exactly how I'd write it, but I wanted to be explicit enough that you could follow along.
Use the dictionary's get method:
dict = {}
with open('file.txt', 'r+') as f:
    data = f.readlines()
    for line in data:
        nameScore = line.split()
        l=len(nameScore)
        n=" ".join(nameScore[:l-1])
        dict[n] = dict.get(n,0) + int(nameScore[-1])
print dict
Output:
{'Bob Hamilton': 1, 'John Graham': 2, 'Marcus Bishop': 0}
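For readers on Python 3, the same accumulation can be sketched roughly as below. The helper name `sum_scores` and the in-memory sample lines are assumptions made for illustration; the list stands in for the contents of scores.txt and reuses the five-line sample from the earlier answer.

```python
def sum_scores(lines):
    """Return a dict mapping each name to the sum of its scores."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last space only, so names may contain spaces.
        name, score = line.rsplit(' ', 1)
        totals[name] = totals.get(name, 0) + int(score)
    return totals


# Hypothetical sample data standing in for scores.txt.
sample = [
    "John Graham 2\n",
    "Marcus Bishop 0\n",
    "Bob Hamilton 1\n",
    "John Graham 3\n",
    "Marcus Bishop 10\n",
]
totals = sum_scores(sample)
for name, total in sorted(totals.items()):
    print(name, total)
```

With a real file, the open file object can be passed in directly, for example `with open('scores.txt') as f: totals = sum_scores(f)`.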
I was in a similar situation and modified Wesley's code to work for my case. I had a mapping file, "sort.txt", that listed different .pdf files alongside numbers indicating the order I wanted them in, based on the output of DOM manipulation of a website. I wanted to combine all these separate pdf files into a single pdf file while keeping the same order they have on the website, so I wanted to add a number to each filename according to its location in the navigation menu tree.
1054 spellchecking.pdf
1055 using-macros-in-the-editor.pdf
1056 binding-macros-with-keyboard-shortcuts.pdf
1057 editing-macros.pdf
1058 etc........
Here is the Code I came up with:
import os
# A dict with keys being the old filenames and values being the numeric prefixes
mapping = {}
# Read through the mapping file line-by-line and populate 'mapping'
with open('sort.txt') as mapping_file:
    for line in mapping_file:
        # Split the line along whitespace
        # Note: this fails if your filenames have whitespace
        new_name, old_name = line.split()
        mapping[old_name] = new_name
# List the files in the current directory
for filename in os.listdir('.'):
    root, extension = os.path.splitext(filename)
    # Rename: put the number first to allow sorting by name,
    # then append the original root + extension
    # (filename already includes the extension, so don't add it twice)
    if filename in mapping:
        print "yay" #to make coding fun
        os.rename(filename, mapping[filename] + root + extension)
I didn't have a suffix like _full, so I didn't need that code. Other than that it's the same code. I've never really touched Python, so this was a good learning experience for me.
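A dry run can make it easier to check the new names before anything is renamed. The sketch below is in Python 3; the function name `plan_renames` and the sample data are hypothetical, and it only computes (old, new) pairs without touching the disk. It assumes each mapping line is a number followed by a filename that contains no whitespace.

```python
import os


def plan_renames(mapping_lines, filenames):
    """Return (old_name, new_name) pairs without renaming anything."""
    mapping = {}
    for line in mapping_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        number, old_name = parts
        mapping[old_name] = number

    plan = []
    for filename in filenames:
        if filename in mapping:
            root, extension = os.path.splitext(filename)
            # Prefix the number so that sorting by name follows the menu order.
            plan.append((filename, mapping[filename] + root + extension))
    return plan


# Hypothetical inputs: two mapping lines and a directory listing.
mapping_lines = ["1054 spellchecking.pdf\n", "1055 using-macros-in-the-editor.pdf\n"]
filenames = ["spellchecking.pdf", "notes.txt", "using-macros-in-the-editor.pdf"]
plan = plan_renames(mapping_lines, filenames)
for old, new in plan:
    print(old, "->", new)
```

Once the plan looks right, each pair could be applied with `os.rename(old, new)`, with `filenames` taken from `os.listdir('.')`.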
Related
How to compare multiple lines and join the lines based on the first word if they are same using python
I am a beginner to python and trying to compare multiple lines in a text file and print by joining them.
Text file
Rolt12 is a musician
Rolt1 is dancer
Rolt1 is an actor
Rolt14 is a singer
I am trying to print
Rolt12 is a musician
Rolt1 is a dancer; is an actor
Rolt14 is a singer
So far I know how to open the files for reading and writing:
with open('input.txt', 'r') as ifh, open('out.txt', 'w') as ofh:
    ifh.readlines()
After this, I think I should compare the lines in the text file and check whether the first word is the same or not, then join the lines if it is. But I am not sure how to compare and join them. Any help would be appreciated. Thank you.
A reasonable approach to this problem would be to use a dictionary to store each name's list of occupations. For example, if you have the following set up:
data = [("Rolt12", "musician"), ("Rolt1", "dancer"), ("Rolt1", "actor"), ("Rolt14", "singer")]
You could use the following code to make a list of occupations for each name:
occupations = {}
for name, occupation in data:
    if name not in occupations:
        occupations[name] = []
    occupations[name].append(occupation)
Or, more idiomatically:
import collections
occupations = collections.defaultdict(list)
for name, occupation in data:
    occupations[name].append(occupation)
Then, you can iterate over the dictionary to print the data you want:
for name, all_occupations in occupations.items():
    occupations_string = "; ".join(all_occupations)
    print(f"{name} is a {occupations_string}")
You can solve it with a dictionary: split each line into two parts, the name and the rest of the line, and use the name as the key in the dictionary.
from collections import defaultdict
with open('data.txt') as fp:
    d = defaultdict(list)
    for line in fp:
        x = line.strip().split(' ', 1)
        d[x[0]].append(x[1])
#writing output to new file
with open('output.txt', 'w') as fw:
    for k, v in d.items():
        fw.write( k + ' ' + '; '.join(v) + '\n')
Output:
Rolt12 is a musician
Rolt1 is dancer; is an actor
Rolt14 is a singer
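The same grouping can also be tried without any files. This is a minimal Python 3 sketch, assuming Python 3.7 or later so that dictionaries keep insertion order; the function name `join_by_first_word` and the in-memory sample are made up for illustration.

```python
from collections import defaultdict


def join_by_first_word(lines):
    """Group the rest of each line under its first word, in first-seen order."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition splits at the first space only.
        first, _, rest = line.partition(' ')
        groups[first].append(rest)
    return [first + ' ' + '; '.join(rests) for first, rests in groups.items()]


# Sample lines taken from the question.
lines = [
    "Rolt12 is a musician\n",
    "Rolt1 is dancer\n",
    "Rolt1 is an actor\n",
    "Rolt14 is a singer\n",
]
joined = join_by_first_word(lines)
print("\n".join(joined))
```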
I have different text files and I want to extract the values from there into a csv file.
Each file has the following format
main cost: 30
additional cost: 5
I managed to do that, but I want the values from each file to go into a separate column. I also want the number of text files to be a user-supplied argument.
This is what I'm doing now
numFiles = sys.argv[1]
d = [[] for x in xrange(numFiles+1)]
for i in range(numFiles):
    filename = 'mytext' + str(i) + '.text'
    with open(filename, 'r') as in_file:
        for line in in_file:
            items = line.split(' : ')
            num = items[1].split('\n')
            if i ==0:
                d[i].append(items[0])
            d[i+1].append(num[0])
        grouped = itertools.izip(*d[i] * 1)
        if i == 0:
            grouped1 = itertools.izip(*d[i+1] * 1)
with open(outFilename, 'w') as out_file:
    writer = csv.writer(out_file)
    for j in range(numFiles):
        for val in itertools.izip(d[j]):
            writer.writerow(val)
This is what I'm getting now, everything in one column
main cost
additional cost
30
5
40
10
And I want it to be
main cost | 30 | 40
additional cost | 5 | 10
You could use a dictionary to do this where the key will be the "header" you want to use and the value be a list.
So it would look like someDict = {'main cost': [30,40], 'additional cost': [5,10]}
edit2: Went ahead and cleaned up this answer so it makes a little more sense.
You can build the dictionary and iterate over it like this:
from collections import OrderedDict
in_file = ['main cost : 30', 'additional cost : 5', 'main cost : 40', 'additional cost : 10']
someDict = OrderedDict()
for line in in_file:
    key,val = line.split(' : ')
    num = int(val)
    if key not in someDict:
        someDict[key] = []
    someDict[key].append(num)
for key in someDict:
    print(key)
    for value in someDict[key]:
        print(value)
The code outputs:
main cost
30
40
additional cost
5
10
Should be pretty straightforward to modify the example to fit your desired output.
I used the example at "append multiple values for one key in Python dictionary", and thanks to @wwii for some suggestions.
I used an OrderedDict since a plain dictionary won't keep keys in order.
You can run my example at https://ideone.com/myN2ge
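One way to get from the dictionary to the row layout asked for in the question is to emit one row per label. The following is a Python 3 sketch under stated assumptions: the helper name `rows_by_label` is made up, the input is the same in-memory list as in the example above, and the rows are written to an in-memory buffer rather than a real output file.

```python
import csv
import io
from collections import OrderedDict


def rows_by_label(lines):
    """Collect the values for each label and return one row per label."""
    columns = OrderedDict()
    for line in lines:
        if ':' not in line:
            continue  # skip lines that are not 'label : value'
        label, value = line.split(':', 1)
        columns.setdefault(label.strip(), []).append(value.strip())
    return [[label] + values for label, values in columns.items()]


in_lines = ['main cost : 30', 'additional cost : 5',
            'main cost : 40', 'additional cost : 10']
rows = rows_by_label(in_lines)

# Write to an in-memory buffer; a real script would open the output file instead.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
print(buffer.getvalue())
```

Each row starts with the label and is followed by one value per input file, which matches the "main cost | 30 | 40" layout once the file is opened in a spreadsheet.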
This is how I might do it. Assumes the fields are the same in all the files. Make a list of names, and a dictionary using those field names as keys, and the list of values as the entries. Instead of running on file1.text, file2.text, etc. run the script with file*.text as a command line argument.
#! /usr/bin/env python
import sys
if len(sys.argv)<2:
    print "Give file names to process, with wildcards"
else:
    FileList= sys.argv[1:]
    FileNum = 0
    outFilename = "myoutput.dat"
    NameList = []
    ValueDict = {}
    for InfileName in FileList:
        Infile = open(InfileName, 'rU')
        for Line in Infile:
            Line=Line.strip('\n')
            Name,Value = Line.split(":")
            Name = Name.strip() # same stripped name for the list and the dict key
            if FileNum==0:
                NameList.append(Name)
            ValueDict[Name] = ValueDict.get(Name,[]) + [Value.strip()]
        FileNum += 1 # the last statement in the file loop
        Infile.close()
    # print NameList
    # print ValueDict
    with open(outFilename, 'w') as out_file:
        for N in NameList:
            OutString = "{},{}\n".format(N,",".join(ValueDict.get(N)))
            out_file.write(OutString)
Output for my four fake files was:
main cost,10,10,40,10
additional cost,25.6,25.6,55.6,25.6
Okay, so I have a file that contains an ID number followed by a name, just like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields the id, and the name. I need to store the entries in a dictionary where ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far however can't go any further.
fin = open("ids","r") #Read the file
for line in fin: #Split lines
    string = line.split()
    if len(string) > 1: #Separate IDs and names
        id = int(string[0])
        name = string[1:]
        print(id, name) #Print results
We need sys.argv to get the command line argument (careful, the name of the script is always the 0th element of the returned list).
Now we open the file (no error handling, you should add that) and read in the lines individually. Now we have 'number firstname secondname'-strings for each line in the list "lines".
Then create an empty dictionary out and loop over the individual strings in lines, splitting each one on whitespace and storing the pieces in the temporary variable tmp (which is now a list of strings: ['number', 'firstname', 'secondname']).
Following that we just fill the dictionary, using the number as key and the space-joined rest of the names as value.
To print the dictionary sorted, just loop over the keys returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) followed by the corresponding value, looked up in the dictionary with the string form of the id.
import sys
try:
    infile = sys.argv[1]
except IndexError:
    # raw_input, because input() would evaluate the typed text in Python 2
    infile = raw_input('Enter file name: ')
with open(infile, 'r') as file:
    lines = file.readlines()
out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])
for id in sorted(out, key=int):
    print id, out[str(id)]
This works for python 2.7 with ASCII-strings. I'm pretty sure that it should be able to handle other encodings as well (German Umlaute work at least), but I can't test that any further. You may also want to add a lot of error handling in case the input file is somehow formatted differently.
Just a suggestion, this code is probably simpler than the other code posted:
import sys
with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()
data = dict([i.strip().split(' ', 1) for i in lines])
for idx in sorted(data, key=int):
    print idx, data[idx]
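For comparison, here is a rough Python 3 sketch that stores the IDs as integers, so that a plain sorted() already orders them numerically. The function name `load_ids`, the column width of 4, and the in-memory sample lines are illustrative assumptions, not part of the original task.

```python
def load_ids(lines):
    """Map each integer ID to the name that follows it on the line."""
    entries = {}
    for line in lines:
        parts = line.split(None, 1)  # split once on any run of whitespace
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        entries[int(parts[0])] = parts[1].strip()
    return entries


# Sample lines taken from the question.
lines = ["10 alex de souza\n", "11 robin van persie\n", "9 serhat akin\n"]
entries = load_ids(lines)

# Two columns: the ID left-aligned in a field of width 4, then the name.
report = ["{:<4}{}".format(id_number, entries[id_number])
          for id_number in sorted(entries)]
print("\n".join(report))
```

To take the filename from the command line as the question requires, the sample list could be replaced with `with open(sys.argv[1]) as handle: entries = load_ids(handle)`.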
My task is to parse a txtfile and return a dictionary with the counts of last names in the file. The txtfile looks like this:
city: Aberdeen
state: Washington
Johnson, Danny
Williams, Steve
Miller, Austin
Jones, Davis
Miller, Thomas
Johnson, Michael
I know how to read the file in, and assign the file to a list or a string, however I have no clue how to go about finding the counts of each and putting them into a dictionary. Could one of you point me in the right direction?
import re
with open('test.txt') as f:
    text = f.read()
reobj = re.compile("(.+),", re.MULTILINE)
dic = {}
for match in reobj.finditer(text):
    surname = match.group()
    if surname in dic:
        dic[surname] += 1
    else:
        dic[surname] = 1
The result is:
{'Williams,': 1, 'Jones,': 1, 'Miller,': 2, 'Johnson,': 2}
In order to find the counts of each surname:
you need to create a dictionary, empty will do
loop through the lines in the file
for each line in the file determine what you need to do with the data, there appear to be headers. Perhaps testing for the presence of a particular character in the string will suffice.
for each line that you decide is a name, you need to split or perhaps partition the string to extract the surname.
then using the surname as a key to the dictionary, check for and set or increment an integer as the key's value.
after you've looped through the file data, you should have a dictionary keyed by surname and values being the number of appearances.
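The steps above can be sketched as follows in Python 3. This is only one possible reading of them: the function name `count_surnames` is made up, the sample lines are copied from the question, and the header test (a colon in the line) is an assumption about the file format.

```python
def count_surnames(lines):
    """Count how often each surname appears in 'Surname, First' lines."""
    counts = {}  # start with an empty dictionary
    for line in lines:
        line = line.strip()
        # Header lines such as 'city: Aberdeen' contain a colon; names do not.
        if not line or ':' in line or ',' not in line:
            continue
        # Everything before the first comma is the surname.
        surname = line.partition(',')[0].strip()
        # Set the count to 1 on first sight, otherwise increment it.
        counts[surname] = counts.get(surname, 0) + 1
    return counts


lines = [
    "city: Aberdeen", "state: Washington",
    "Johnson, Danny", "Williams, Steve", "Miller, Austin",
    "Jones, Davis", "Miller, Thomas", "Johnson, Michael",
]
counts = count_surnames(lines)
print(counts)
```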
import re
file = open('data.txt','r')
lastnames={}
for line in file:
    if re.search(':',line) ==None:
        line.strip()
        last = line.split(',')[0].strip()
        first = line.split(',')[1].strip()
        if lastnames.has_key(last):
            lastnames[last]+= 1
        else:
            lastnames[last]= 1
print lastnames
Gives me the following
>>> {'Jones': 1, 'Miller': 2, 'Williams': 1, 'Johnson': 2}
This would be my approach. No need to use regex. Also filtering blank lines for extra robustness.
from __future__ import with_statement
from collections import defaultdict
def nonblank_lines(f):
    for l in f:
        line = l.rstrip()
        if line:
            yield line
with open('text.txt') as text:
    lines = nonblank_lines(text)
    name_lines = (l for l in lines if not ':' in l)
    surnames = (line.split(',')[0].strip() for line in name_lines)
    counter = defaultdict(int)
    for surname in surnames:
        counter[surname] += 1
    print counter
If you're using Python 2.7 or later, you could use collections.Counter from the standard library instead of a defaultdict.
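A minimal sketch of the Counter variant, written for Python 3. The function name `surname_counts` and the in-memory sample lines are assumptions for illustration; the filtering mirrors the approach above (skip blank lines and any line containing a colon).

```python
from collections import Counter


def surname_counts(lines):
    """Count surnames with collections.Counter, skipping headers and blanks."""
    surnames = (
        line.split(',')[0].strip()
        for line in lines
        if line.strip() and ':' not in line
    )
    return Counter(surnames)


lines = [
    "city: Aberdeen\n", "state: Washington\n", "\n",
    "Johnson, Danny\n", "Williams, Steve\n", "Miller, Austin\n",
    "Jones, Davis\n", "Miller, Thomas\n", "Johnson, Michael\n",
]
counter = surname_counts(lines)
print(counter.most_common(2))  # the two most frequent surnames
```

A Counter behaves like a dictionary, so `counter['Miller']` gives the count for one surname, and a surname that never appeared simply reports 0.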
So I have this txt file called "Students.txt", and I want to define a function called load(student).
So far I have this code:
def load(student):
    body
I'm not quite sure what to write for the body so that it reads the file and returns its contents as a dictionary. I know it would involve something like readlines().
Anyway, the file students.txt looks like this:
P883, Michael Smith, 1991
L672, Jane Collins, 1992
(added)L322, Randy Green, 1992
H732, Justin Wood, 1995(added)
^key ^name ^year of birth
the function has to return as a dictionary that looks like this:
{'P883': ('Michael Smith',1991),
'(key)':('name','year')}
I managed to return the values by trial and error; however, I can't handle the line breaks and keep getting \n in the returned values.
===============
This question has been answered, and I used the following code, which works perfectly. However, when there is a space in the values from the txt file (see the added parts), it doesn't work anymore and gives an error saying that the list index is out of range.
Looks like a CSV file. You can use the csv module then:
import csv
studentReader = csv.reader(open('Students.txt', 'rb'), delimiter=',', skipinitialspace=True)
d = dict()
for row in studentReader:
    d[row[0]] = tuple(row[1:])
This won't give you the year as integer, you have to transform it yourself:
for row in studentReader:
    d[row[0]] = (row[1], int(row[2]))
Something like this should do it, I think:
students = {}
infile = open("students.txt")
for line in infile:
    line = line.strip()
    parts = [p.strip() for p in line.split(",")]
    students[parts[0]] = (parts[1], parts[2])
This might not be 100%, but should give you a starting-point. Error handling was omitted for brevity.
def load(students_file):
    result = {}
    for line in students_file:
        key, name, year_of_birth = [x.strip() for x in line.split(",")]
        result[key] = (name, year_of_birth)
    return result
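A variant of the load function above that also converts the year to an integer and tolerates stray lines might look like the Python 3 sketch below. It assumes, without being able to confirm it from the question, that the "list index out of range" error comes from blank or incomplete lines in the file; the sample lines are an in-memory stand-in for Students.txt.

```python
def load(students_file):
    """Return {key: (name, year_of_birth)} from 'key, name, year' lines."""
    result = {}
    for line in students_file:
        parts = [part.strip() for part in line.split(',')]
        if len(parts) != 3:
            continue  # skip blank or malformed lines instead of raising
        key, name, year = parts
        result[key] = (name, int(year))
    return result


# In-memory stand-in for Students.txt, including a blank line.
sample = [
    "P883, Michael Smith, 1991\n",
    "L672, Jane Collins, 1992\n",
    "\n",
    "L322, Randy Green, 1992\n",
    "H732, Justin Wood, 1995\n",
]
students = load(sample)
print(students['P883'])
```

With the real file, the call would be along the lines of `with open('Students.txt') as f: students = load(f)`.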
I would use the pickle module in python to save your data as a dict so you could load it easily by unpickling it. Or, you could just do:
d = {}
with open('Students.txt', 'r') as f:
    for line in f:
        # strip each field so the name and year carry no leading spaces
        tmp = [field.strip() for field in line.strip().split(',')]
        d[tmp[0]] = (tmp[1], tmp[2])