How to parse txtfile and export into dictionary? - python

My task is to parse a txtfile and return a dictionary with the counts of last names in the file. The txtfile looks like this:
city: Aberdeen
state: Washington
Johnson, Danny
Williams, Steve
Miller, Austin
Jones, Davis
Miller, Thomas
Johnson, Michael
I know how to read the file in, and assign the file to a list or a string, however I have no clue how to go about finding the counts of each and putting them into a dictionary. Could one of you point me in the right direction?

import re
with open('test.txt') as f:
    text = f.read()
reobj = re.compile("(.+),", re.MULTILINE)
dic = {}
for match in reobj.finditer(text):
    # group(1) is only the captured surname; group() would include the comma
    surname = match.group(1)
    if surname in dic:
        dic[surname] += 1
    else:
        dic[surname] = 1
The result is:
{'Williams': 1, 'Jones': 1, 'Miller': 2, 'Johnson': 2}

In order to find the counts of each surname:
Create a dictionary; an empty one will do.
Loop through the lines in the file.
For each line, decide what to do with the data. There appear to be header lines; testing for the presence of a particular character (such as ':') in the string will probably suffice.
For each line that you decide is a name, split or partition the string to extract the surname.
Using the surname as a key into the dictionary, set or increment an integer as that key's value.
After you've looped through the file data, you will have a dictionary keyed by surname, with the number of appearances as each value.
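The steps above can be sketched roughly as follows. This is only one possible reading of them, written for Python 3; the function name count_surnames and the choice of ':' as the header marker are assumptions made for illustration, and the sample lines are the ones from the question.

```python
def count_surnames(lines):
    """Count surnames in 'Surname, Given' lines, skipping 'key: value' headers."""
    counts = {}                          # start with an empty dictionary
    for line in lines:                   # loop through the lines
        line = line.strip()
        if not line or ":" in line:      # skip blank lines and header lines
            continue
        # partition() splits at the first comma; the surname is the left part
        surname = line.partition(",")[0].strip()
        # set the count to 1 the first time, increment it afterwards
        counts[surname] = counts.get(surname, 0) + 1
    return counts


sample = [
    "city: Aberdeen\n",
    "state: Washington\n",
    "Johnson, Danny\n",
    "Williams, Steve\n",
    "Miller, Austin\n",
    "Jones, Davis\n",
    "Miller, Thomas\n",
    "Johnson, Michael\n",
]
print(count_surnames(sample))
# {'Johnson': 2, 'Williams': 1, 'Miller': 2, 'Jones': 1}
```

To use it on a real file, pass the open file object instead of the list, since iterating over a file yields its lines.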

import re
lastnames={}
with open('data.txt','r') as file:
    for line in file:
        # header lines contain a colon; name lines do not
        if re.search(':',line) is None:
            last = line.split(',')[0].strip()
            first = line.split(',')[1].strip()
            if last in lastnames:
                lastnames[last]+= 1
            else:
                lastnames[last]= 1
print lastnames
Gives me the following
>>> {'Jones': 1, 'Miller': 2, 'Williams': 1, 'Johnson': 2}

This would be my approach. No need to use regex. Also filtering blank lines for extra robustness.
from __future__ import with_statement
from collections import defaultdict
def nonblank_lines(f):
    for l in f:
        line = l.rstrip()
        if line:
            yield line
with open('text.txt') as text:
    lines = nonblank_lines(text)
    name_lines = (l for l in lines if not ':' in l)
    surnames = (line.split(',')[0].strip() for line in name_lines)
    counter = defaultdict(int)
    for surname in surnames:
        counter[surname] += 1
print counter
If you're using Python 2.7 or later, you could use the built-in collections.Counter instead of a defaultdict.
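A minimal sketch of the collections.Counter variant, written for Python 3. The helper name surname_counter and the in-memory sample are assumptions for illustration; the filtering rule (skip blank lines and lines containing ':') follows the approach described above.

```python
from collections import Counter


def surname_counter(lines):
    """Return a Counter of surnames, ignoring blank lines and 'key: value' headers."""
    stripped = (line.strip() for line in lines)
    return Counter(
        line.split(",")[0].strip()
        for line in stripped
        if line and ":" not in line
    )


sample = [
    "city: Aberdeen",
    "state: Washington",
    "",
    "Johnson, Danny",
    "Williams, Steve",
    "Miller, Austin",
    "Jones, Davis",
    "Miller, Thomas",
    "Johnson, Michael",
]
counts = surname_counter(sample)
print(counts["Miller"])      # 2
print(counts["Nobody"])      # 0, missing keys count as zero
```

A Counter is a dict subclass, so it can be used wherever the plain dictionary was expected.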

Related

How to save data from text file to python dictionary and select designation data

Example of data in txt file:
apple
orange
banana
lemon
pears
Code for filtering 5-letter words without a dictionary:
def numberofletters(n):
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    for line in lines:
        if len(line) == 6:  # 5 letters plus the trailing newline
            print(line)
    return
print("===================================================================")
print("This program can use for identify and print out all words in 5 letters from words.txt")
n = input("Please Press enter to start filtering words")
print("===================================================================")
numberofletters(n)
My question is: how do I create a dictionary whose keys are integers and whose values are the English words with that many letters, and then use the dictionary to identify and print all the 5-letter words?
Imagine a huge list of words.
Sounds like a job for a defaultdict.
>>> from collections import defaultdict
>>> length2words = defaultdict(set)
>>>
>>> with open('file.txt') as f:
...     for word in f: # one word per line
...         word = word.strip()
...         length2words[len(word)].add(word)
...
>>> length2words[5]
set(['lemon', 'apple', 'pears'])
If you care about duplicates and insertion order, use a defaultdict(list) and append instead of add.
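A small sketch of the defaultdict(list) variant, written for Python 3. The function name group_by_length is made up for illustration, and the word list repeats "apple" on purpose to show that duplicates and input order are kept.

```python
from collections import defaultdict


def group_by_length(words):
    """Map each word length to the list of words of that length, in input order."""
    groups = defaultdict(list)
    for word in words:
        word = word.strip()          # drop the trailing newline
        if word:                     # ignore blank lines
            groups[len(word)].append(word)
    return groups


words = ["apple\n", "orange\n", "banana\n", "lemon\n", "pears\n", "apple\n"]
groups = group_by_length(words)
print(groups[5])   # ['apple', 'lemon', 'pears', 'apple']
print(groups[6])   # ['orange', 'banana']
```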
You can write your for loop like this:
dicword = {}
for line in lines:
    word = line.strip()  # drop the trailing newline so it is not counted
    word_len = len(word)
    if word_len not in dicword:
        dicword[word_len] = [word]
    else:
        dicword[word_len].append(word)
Then you can get the 5-letter words by just doing dicword[5].
If I understood correctly, you need to filter your document and write the result to a file. For that you can write a CSV file with DictWriter (https://docs.python.org/2/library/csv.html).
DictWriter: Create an object which operates like a regular writer but maps dictionaries onto output rows.
That way you will also be able to store and structure your data.
import csv
def numberofletters(n):
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    dicword = {}
    # 'filename' must be an open, writable file object and 'fieldnames' a list
    # of column names; both have to be defined before this point
    writer = csv.DictWriter(filename, fieldnames=fieldnames)
    writer.writeheader()
    for line in lines:
        if len(line) == 6:
            writer.writerow({'param_label': line, [...]})
    return
I hope that helps.
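For reference, here is a self-contained sketch of csv.DictWriter in Python 3. The column names "length" and "word" and the function name write_words_of_length are assumptions for illustration, not the fields elided in the answer above, and an in-memory io.StringIO buffer stands in for a real output file.

```python
import csv
import io


def write_words_of_length(words, length, out):
    """Write every word of the given length to `out` as CSV rows."""
    writer = csv.DictWriter(out, fieldnames=["length", "word"])
    writer.writeheader()
    for word in words:
        word = word.strip()              # remove the trailing newline first
        if len(word) == length:
            writer.writerow({"length": length, "word": word})


buffer = io.StringIO()
write_words_of_length(["apple\n", "orange\n", "lemon\n"], 5, buffer)
print(buffer.getvalue())
```

When writing to a real file in Python 3, the csv documentation recommends opening it with newline="" so that row terminators are not translated twice.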

Python sort text file in dictionary

I have a text file that looks something like this:
John Graham 2
Marcus Bishop 0
Bob Hamilton 1
... and like 20 other names.
Each name appears several times and with a different number(score) after it.
I need to make a list that shows each name only once, followed by the sum of that name's scores. I need to use a dictionary.
This is what I have done, but it only produces a list that looks like the original text file:
dict = {}
with open('scores.txt', 'r+') as f:
    data = f.readlines()
    for line in data:
        nameScore = line.split()
        print (nameScore)
I don't know how to do the next part.
Here is one option using defaultdict(int):
from collections import defaultdict
result = defaultdict(int)
with open('scores.txt', 'r') as f:
    for line in f:
        key, value = line.rsplit(' ', 1)
        result[key] += int(value.strip())
print result
If the contents of scores.txt is:
John Graham 2
Marcus Bishop 0
Bob Hamilton 1
John Graham 3
Marcus Bishop 10
it prints:
defaultdict(<type 'int'>,
{'Bob Hamilton': 1, 'John Graham': 5, 'Marcus Bishop': 10})
UPD (formatting output):
for key, value in result.iteritems():
    print key, value
My first pass would look like:
scores = {} # Not `dict`. Don't reuse builtin names.
with open('scores.txt', 'r') as f: # Not "r+" unless you want to write later
    for line in f:
        name, score = line.strip().rsplit(' ', 1)
        score = int(score)
        if name in scores:
            scores[name] = scores[name] + score
        else:
            scores[name] = score
print scores.items()
This isn't exactly how I'd write it, but I wanted to be explicit enough that you could follow along.
Use the dictionary's get method:
scores = {}  # avoid shadowing the builtin name `dict`
with open('file.txt', 'r') as f:
    data = f.readlines()
    for line in data:
        nameScore = line.split()
        n = " ".join(nameScore[:-1])  # everything except the last token is the name
        scores[n] = scores.get(n, 0) + int(nameScore[-1])
print scores
Output:
{'Bob Hamilton': 1, 'John Graham': 2, 'Marcus Bishop': 0}
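To see the summing behaviour with repeated names, here is a short Python 3 sketch of the get-based accumulation. The function name total_scores is hypothetical, and the sample reuses the five-line scores.txt contents shown earlier in this thread.

```python
def total_scores(lines):
    """Sum the trailing integer score for each name."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:                       # skip blank lines
            continue
        # split once from the right, so multi-word names stay intact
        name, score = line.rsplit(None, 1)
        totals[name] = totals.get(name, 0) + int(score)
    return totals


sample = [
    "John Graham 2\n",
    "Marcus Bishop 0\n",
    "Bob Hamilton 1\n",
    "John Graham 3\n",
    "Marcus Bishop 10\n",
]
totals = total_scores(sample)
for name, score in sorted(totals.items()):   # sorted alphabetically by name
    print(name, score)
# Bob Hamilton 1
# John Graham 5
# Marcus Bishop 10
```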
I was in a similar situation and modified Wesley's code to fit it. I had a mapping file, "sort.txt", listing different .pdf files together with numbers indicating the order I wanted them in, based on output from DOM manipulation of a website. I wanted to combine all these separate PDF files into a single PDF while retaining the order they have on the website, so I wanted to prefix each filename with a number according to its location in the navigation menu tree.
1054 spellchecking.pdf
1055 using-macros-in-the-editor.pdf
1056 binding-macros-with-keyboard-shortcuts.pdf
1057 editing-macros.pdf
1058 etc........
Here is the code I came up with:
import os, sys
# A dict with keys being the old filenames and values being the new filenames
mapping = {}
# Read through the mapping file line-by-line and populate 'mapping'
with open('sort.txt') as mapping_file:
    for line in mapping_file:
        # Split the line along whitespace
        # Note: this fails if your filenames have whitespace
        new_name, old_name = line.split()
        mapping[old_name] = new_name
# List the files in the current directory
for filename in os.listdir('.'):
    root, extension = os.path.splitext(filename)
    # Rename: put the number first to allow sorting by name, then append
    # the original name and extension (root + extension, so the extension
    # is not added twice)
    if filename in mapping:
        print "yay" #to make coding fun
        os.rename(filename, mapping[filename] + root + extension)
I didn't have a suffix like _full, so I didn't need that code. Other than that it's the same code. I've never really touched Python, so this was a good learning experience for me.

Create a list from file in python

I have a .txt file like this:
John 26
Mary 48
Nick 34
I want to import them and put them in lists so that I can find specific elements. For example, age[1] would have the value 48, name[1] the value Mary, etc.
I tried doing
import sys,random
f = open('example.txt', 'r')
for line in f:
    tokens=line.split()
    a=tokens[0]
    print a[1]
but the result of print a[1] is the second letter of each string.
Instead of a[1], you want tokens[1].
This is the value of a, which is the first element of tokens:
Nick
But the second element of tokens is the age:
"34"
As @user mentioned, you probably want it as an integer, not a string. You can convert it to an integer:
a = int(tokens[1])
@thefourtheye proposed a nice solution. I'll propose storing it in a dictionary instead:
with open('example.txt') as f:
    ages = {}
    for line in f:
        d = line.split()
        ages[d[0]] = int(d[1])
And here is ages:
{'John':26, 'Mary':48, 'Nick':34}
To retrieve the age of John:
print(ages['John'])
Hope this helps!
While reading from a file, always use with, so that you don't have to worry about closing the file.
Then you can read the lines, split them, and finally unzip them like this:
with open('Input.txt', 'r') as inFile:
    names, ages = zip(*(line.rstrip().split() for line in inFile))
print names, ages
Output
('John', 'Mary', 'Nick') ('26', '48', '34')
You can access the individual names and ages like this
names[0], ages[0]
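If the ages should be integers and the results should be lists that can be indexed as in the question, a sketch along these lines may help. It is written for Python 3; the function name read_names_and_ages is made up for illustration, and it assumes every non-blank line has exactly a name and an age.

```python
def read_names_and_ages(lines):
    """Return two parallel lists: names as strings and ages as integers."""
    rows = [line.split() for line in lines if line.strip()]
    names = [row[0] for row in rows]
    ages = [int(row[1]) for row in rows]   # convert "48" to 48
    return names, ages


sample = ["John 26\n", "Mary 48\n", "Nick 34\n"]
names, ages = read_names_and_ages(sample)
print(names[1], ages[1])   # Mary 48
```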

a loop to read lines that start with specific letters

I'm using a for loop to read a file, but I only want to read specific lines, say lines that start with "af" and "apn". Is there any built-in feature to achieve this?
How do I split such a line after reading it?
How do I store the elements from the split in a dictionary?
Let's say the first element of the line after the split is the employee ID, which I store in the dictionary, and the second element is the employee's full name, which I want to store in the dictionary too.
So when I use employee_dict[employee_ID], will I get the full name?
Thank you.
You can do so very easily:
employee_dict = {}
with open('file.txt', 'r') as f:
    for line in f:
        if line.startswith("af") or line.startswith("apn"):
            emprecords = line.split() #assuming the default separator is a space character
            #assuming that all your records follow a common format, you can then create an employee dict
            employee = {}
            #the first element after the split is the employee id; it begins with "af" or "apn", so keep it as a string (int() would raise ValueError)
            employee_id = emprecords[0]
            #enter name-value pairs within the employee object - for e.g. let's say the second element after the split is the emp name, the third the age
            employee['name'] = emprecords[1]
            employee['age'] = emprecords[2]
            #store this in the global employee_dict
            employee_dict[employee_id] = employee
To retrieve the name for an employee ID after having done the above (the ID 'af1' here is only an illustration), use something like:
print employee_dict['af1']['name']
Hope this gives you an idea of how to go about it.
If your file looks like
af, 1, John
ggg, 2, Dave
you could create a dict like this:
d = {z[1].strip() : z[2].strip() for z in [y for y in [x.split(',') for x in open(r"C:\Temp\test1.txt")] if y[0] in ('af', 'apn')]}
A more readable version:
d = {}
for l in open(r"C:\Temp\test1.txt"):
    x = l.split(',')
    if x[0] not in ('af', 'apn'): continue
    d[x[1].strip()] = x[2].strip()
Both solutions give you d = {'1': 'John'} for this example. To get a name from the dict, you can do name = d['1'].
prefixes = ("af", "apn")
with open('file.txt', 'r') as f:
    employee_dict = dict((line.split()[:2]) for line in f if any(line.startswith(p) for p in prefixes))
dictOfNames={}
file = open("filename","r")
for line in file:
    if line.startswith('af') or line.startswith('apn'):
        line=line.split(',') #split using delimiter of ','
        dictOfNames[line[1].strip()] = line[2].strip() # take 2nd element of line as id and 3rd as name
file.close()
The program above reads the file and, for each line that starts with 'af' or 'apn', stores the second element as the id and the third as the name, assuming a comma is the delimiter.
Now you can use dictOfNames[id] to get the name.
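On the "built-in feature" part of the question: str.startswith also accepts a tuple of prefixes, which avoids chaining several calls with or. The sketch below is written for Python 3 and assumes the comma-separated layout used in the examples above; the function name parse_employees and the third sample row are made up for illustration.

```python
PREFIXES = ("af", "apn")


def parse_employees(lines, prefixes=PREFIXES):
    """Map id -> name for lines that begin with one of the given prefixes."""
    employees = {}
    for line in lines:
        if not line.startswith(prefixes):    # a tuple checks all prefixes at once
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:                  # skip malformed lines
            continue
        employees[fields[1]] = fields[2]     # 2nd field is the id, 3rd the name
    return employees


sample = ["af, 1, John\n", "ggg, 2, Dave\n", "apn, 3, Maria\n"]
employees = parse_employees(sample)
print(employees)        # {'1': 'John', '3': 'Maria'}
print(employees["1"])   # John
```

Note that startswith matches any line beginning with those characters, so a line starting with "afx" would also pass; compare the first field exactly if that matters.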

I am having a problem with my Python program that I'm running

From an input file I'm supposed to extract only the first name of each student and then save the result in a new file called "student-firstname.txt". The output file should contain a list of first names (not including middle names). I was able to get rid of the last name, but I'm having a problem deleting the middle name. Any help or suggestions?
The student names in the file look something like this (last name, first name, and middle initial):
Martin, John
Smith, James W.
Brown, Ashley S.
My Python code is:
f=open("studentname.txt", 'r')
f2=open ("student-firstname.txt",'w')
str = ''
for line in f.readlines():
    str = str + line
    line=line.strip()
    token=line.split(",")
    f2.write(token[1]+"\n")
f.close()
f2.close()
f=open("studentname.txt", 'r')
f2=open ("student-firstname.txt",'w')
for line in f.readlines():
    token=line.split()
    f2.write(token[1]+"\n")
f.close()
f2.close()
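Split token[1] on whitespace and take the first piece. Calling split() with no argument also discards the leading space left after the comma:
fname = token[1].split()[0]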
with open("studentname.txt") as f, open("student-firstname.txt", 'w') as fout:
    for line in f:
        firstname = line.split()[1]
        print >> fout, firstname
Note:
you could use a with statement to make sure that the files are always closed even in case of an exception. You might need contextlib.nested() on old Python versions
'r' is the default mode for files. You don't need to specify it explicitly
.readlines() reads all lines at once. You could iterate over the file line by line directly
To avoid hardcoding the filenames you could use fileinput. Save it to firstname.py:
#!/usr/bin/env python
import fileinput
for line in fileinput.input():
    firstname = line.split()[1]
    print firstname
Example: $ python firstname.py studentname.txt >student-firstname.txt
Check out regular expressions. Something like this will probably work:
>>> import re
>>> nameline = "Smith, James W."
>>> names = re.match(r"(\w+),\s+(\w+).*", nameline)
>>> if names:
...     print names.groups()
...
('Smith', 'James')
Line 3 basically says: find a sequence of word characters as group 1, followed by a comma, some space characters, and another sequence of word characters as group 2, followed by anything else in nameline.
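A sketch of the same idea wrapped in a function, written for Python 3. The pattern here is a slightly looser variant of the one above (an assumption, so that surnames containing spaces or hyphens still match), and the name first_name is hypothetical. It also shows how match groups are numbered: group(0) is the whole match, and the parenthesised groups start at 1.

```python
import re

# group 1: everything before the comma (last name)
# group 2: the first run of non-space characters after it (first name)
NAME_RE = re.compile(r"^\s*([^,]+),\s*(\S+)")


def first_name(line):
    """Return the first name from 'Last, First M.' or None if the line does not match."""
    match = NAME_RE.match(line)
    return match.group(2) if match else None


for line in ["Martin, John\n", "Smith, James W.\n", "Brown, Ashley S.\n"]:
    print(first_name(line))
# John
# James
# Ashley

match = NAME_RE.match("Smith, James W.")
print(match.group(0))   # Smith, James  (the whole matched text)
print(match.group(1))   # Smith
```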
f = open("file")
o = open("out","w")
for line in f:
    # split() returns a list, so take its first element (the first name)
    o.write(line.rstrip().split(",")[1].strip().split()[0]+"\n")
f.close()
o.close()
