CSV selecting multiple columns - python

I have a CSV file that contains a lot of information. I have written a program that counts the distinct values in the 'Feedback' column and how often each one occurs.
My problem now is that, once I have the values from the 'Feedback' column, I want to pull out the entries from another column that correspond to a given 'Feedback' value.
An example of the CSV file is as follows:
Feedback     Description   Status
Others       Fire Proct    Complete
Complaints   Grass         Complete
Compliment   Wall          Complete
...          ...           ...
Given the frequencies from the 'Feedback' column, suppose I select 'Complaints'. I then want every Description whose Feedback is 'Complaints' to be shown.
Something like this:
Complaints Grass
Complaints Table
Complaints Door
... ...
Following is the code I have so far:
import csv, sys, os, shutil
from collections import Counter
reader = csv.DictReader(open('data.csv'))
result = {}
for row in reader:
    for column, value in row.iteritems():
        result.setdefault(column,[]).append(value)
list = []
for items in result['Feedback']:
    if items == '':
        items = items
    else:
        newitem = items.upper()
        list.append(newitem)
unique = Counter(list)
for k, v in sorted(unique.items()):
    print k.ljust(30),' : ', v
This is only the part that counts the values in the 'Feedback' column and their frequencies.

You could also store a defaultdict() holding a list of entries for each category as follows:
import csv
from collections import Counter, defaultdict
with open('data.csv', 'rb') as f_csv:
    csv_reader = csv.DictReader(f_csv)
    result = {}
    feedback = defaultdict(list)
    for row in csv_reader:
        for column, value in row.iteritems():
            result.setdefault(column, []).append(value)
        feedback[row['Feedback'].upper()].append(row['Description'])
data = []
for items in result['Feedback']:
    if items == '':
        items = items
    else:
        newitem = items.upper()
        data.append(newitem)
unique = Counter(data)
for k, v in sorted(unique.items()):
    print "{:20} : {:5} {}".format(k, v, ', '.join(feedback[k]))
This would display your output as:
COMPLAINTS : 2 Grass, Door
COMPLIMENT : 2 Wall, Table
OTHERS1 : 1 Fire Proct
Or on multiple lines if instead you used:
print "{:20} : {:5}".format(k, v)
print ' ' + '\n '.join(feedback[k])
When using the csv library in Python 2.x, you should open your file in rb mode. Also avoid using list as a variable name, as this shadows Python's built-in list type.
Note: It is easier to use format() when printing aligned data.
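For readers on Python 3, the same grouping idea can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the code from the answer: the sample rows are inlined through io.StringIO so the snippet runs without data.csv, the file is assumed to be comma-separated, and the helper name group_descriptions is hypothetical. With a real file in Python 3 you would open it with open('data.csv', newline='') rather than in 'rb' mode.

```python
import csv
import io
from collections import defaultdict

# Sample data inlined for illustration; the columns follow the question's layout
# and the comma delimiter is an assumption.
SAMPLE = """Feedback,Description,Status
Others,Fire Proct,Complete
Complaints,Grass,Complete
Compliment,Wall,Complete
Complaints,Table,Complete
Complaints,Door,Complete
"""


def group_descriptions(csv_file):
    """Map each upper-cased Feedback value to the list of its Descriptions."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row['Feedback']:  # skip rows with an empty Feedback cell
            grouped[row['Feedback'].upper()].append(row['Description'])
    return grouped


feedback = group_descriptions(io.StringIO(SAMPLE))

# Frequency of each category, then every description for one chosen category.
for category in sorted(feedback):
    print("{:20} : {:5}".format(category, len(feedback[category])))
for description in feedback['COMPLAINTS']:
    print('Complaints', description)
```

Because the descriptions are stored per category, the frequency is simply the length of each list, so a separate Counter is not strictly needed here.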

You can do it with the code at the very end of this snippet, which is derived from the code in your question. I modified how the file is read by using a with statement, which ensures that it is closed when it's no longer needed. I also renamed the variable you had called list, because that name hides the built-in type and is considered by most to be poor programming practice. See PEP 8 - Style Guide for Python Code for more on this and related topics.
For testing purposes, I also added a couple more rows of 'Complaints' type of 'Feedback' items.
import csv
from collections import Counter
with open('information.csv') as csvfile:
    result = {}
    for row in csv.DictReader(csvfile):
        for column, value in row.iteritems():
            result.setdefault(column, []).append(value)
items = [item.upper() for item in result['Feedback']]
unique = Counter(items)
for k, v in sorted(unique.items()):
    print k.ljust(30), ' : ', v
print
for i, feedback in enumerate(result['Feedback']):
    if feedback == 'Complaints':
        print feedback, ' ', result['Description'][i]
Output:
COMPLAINTS : 3
COMPLIMENT : 1
OTHERS : 1
Complaints Grass
Complaints Table
Complaints Door

Related

Sorting and enumerating imported data from txt file (Python)

Hi guys!
I'm trying to do a movie list, with data imported from a txt file that looks like this:
"Star Wars", "Y"
"Indiana Jones", "N"
"Pulp Fiction", "N"
"Fight Club", "Y"
(with Y = watched, and N = haven't seen yet)
I'm trying to sort the list by name, so that it'll look something like:
1. Fight Club (Watched)
2. Indiana Jones (Have Not Watched Yet)
3. Pulp Fiction (Have Not Watched Yet)
4. Star Wars (Watched)
And this is what I have so far:
def sortAlphabetically():
    movie_list = {}
    with open('movies.txt') as f:
        for line in f:
            movie, watched = line.strip().split(',')
            movie_list[movie.strip()] = watched.strip()
            if watched.strip() == '"N"':
                print(movie.strip() + " (Have Not Watched Yet)")
            if watched.strip() == '"Y"':
                print(movie.strip() + " (Watched)")
I found a tutorial and tried adding this code within the function to sort them:
sortedByKeyDict = sorted(movie_list.items(), key=lambda t: t[0])
return sortedByKeyDict
I also tried using from ast import literal_eval to try and remove the quotation marks and then inserting this in the function:
for k, v in movie_list.items():
    movie_list[literal_eval(k)] = v
But neither worked.
What should I try next?
Is it possible to remove the quotation marks?
And how do I go about enumerating?
Thank you so much in advance!
Here you go, this should do it:
filename = './movies.txt'
watched_mapping = {
    'Y': 'Watched',
    'N': 'Have Not Watched Yet'
}
with open(filename) as f:
    content = f.readlines()
movies = []
for line in content:
    name, watched = line.strip().lstrip('"').rstrip('"').split('", "')
    movies.append({
        'name': name,
        'watched': watched
    })
sorted_movies = sorted(movies, key=lambda k: k['name'])
for i, movie in enumerate(sorted_movies, 1):
    print('{}. {} ({})'.format(
        i,
        movie['name'],
        watched_mapping[movie['watched']],
    ))
First we define watched_mapping which simply maps values in your file to values you want printed.
After that we open the file and read all of its lines into a content list.
Next thing to do is parse that list and extract values from it (from each line we must extract the movie name and whether it has been watched or not). We will save those values into another list of dictionaries, each containing the movie name and whether it has been watched.
That's what name, watched = line.strip().lstrip('"').rstrip('"').split('", "') is for: it strips the surrounding whitespace and quotation marks from each end of the line and then splits the line on the '", "' sequence in the middle, returning a clean name and watched.
Next thing to do is sort the list by name value in each dictionary:
sorted_movies = sorted(movies, key=lambda k: k['name'])
After that we simply enumerate the sorted list (starting at 1) and parse it to print out the desired output (using the watched_mapping to print out sentences instead of simple Y and N).
Output:
1. Fight Club (Watched)
2. Indiana Jones (Have Not Watched Yet)
3. Pulp Fiction (Have Not Watched Yet)
4. Star Wars (Watched)
Case insensitive sorting changes:
sorted_movies = sorted(movies, key=lambda k: k['name'].lower())
Simply change the value movies get sorted by into lowercase name. Now when sorting all the names are treated as lowercase.
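As an alternative to stripping the quotation marks by hand, the standard csv module can parse the quoted fields. The following is a minimal Python 3 sketch rather than the answer's own method: the sample lines are inlined through io.StringIO so it runs without movies.txt, and the names format_movies and WATCHED_LABELS are hypothetical.

```python
import csv
import io

# Sample lines inlined for illustration; they mirror the movies.txt shown in the question.
SAMPLE = '''"Star Wars", "Y"
"Indiana Jones", "N"
"Pulp Fiction", "N"
"Fight Club", "Y"
'''

WATCHED_LABELS = {'Y': 'Watched', 'N': 'Have Not Watched Yet'}


def format_movies(lines):
    """Return numbered, alphabetically sorted movie lines."""
    # skipinitialspace=True makes the reader ignore the space after each comma,
    # so the quote that follows is treated as the start of a quoted field.
    rows = [row for row in csv.reader(lines, skipinitialspace=True) if row]
    rows.sort(key=lambda row: row[0].lower())  # case-insensitive sort by name
    return ['{}. {} ({})'.format(i, name, WATCHED_LABELS[watched])
            for i, (name, watched) in enumerate(rows, 1)]


listing = format_movies(io.StringIO(SAMPLE))
print('\n'.join(listing))
```

Letting csv.reader handle the quoting also copes with movie titles that contain a comma, which a plain split(',') would break apart.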
Here is your function with a quick fix:
def sortAlphabetically():
    movie_list = []
    with open('movies.txt') as f:
        for line in f:
            movie, watched = line.strip().split(',')
            # [1:-1] drops the surrounding quotation marks from each field
            movie_list.append({
                'name': movie.strip()[1:-1],
                'watched': watched.strip()[1:-1]
            })
    return sorted(movie_list, key = lambda x : x['name'])
Well, I just modified your code. Calling sorted() on a dictionary's items gives you a list of tuples. All I have done is build another dictionary from that sorted list of tuples.
def sortAlphabetically():
    movie_list = dict()
    with open('movies.txt') as f:
        for line in f:
            movie, watched = line.strip().split(',')
            # Strip whitespace and the surrounding double quotes from both fields,
            # otherwise the comparison with 'Y' below never succeeds
            movie_list[movie.strip().strip('"')] = watched.strip().strip('"')
    movie_sorted = sorted(movie_list.items(), key = lambda kv: kv[0])
    # Rebuilding the dict keeps the sorted order only on Python 3.7+
    movie_list = dict()
    for key, value in movie_sorted:
        movie_list[key] = value
    i = 1
    for key, value in movie_list.items():
        if value == 'Y':
            print("{}. {} {}".format(i,key,"(Watched)"))
        else:
            print("{}. {} {}".format(i,key,"(Have Not Watched Yet)"))
        i += 1
I intentionally kept the code simple for better understanding. Hope this helps :)
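To make the point about sorted() concrete, here is a small Python 3 sketch using made-up in-memory data instead of movies.txt. It shows that sorting a dictionary's items yields a list of tuples, and that rebuilding a dict from that list keeps the order only because dicts preserve insertion order from Python 3.7 onward.

```python
# Hypothetical data standing in for the parsed file contents.
movies = {'Star Wars': 'Y', 'Indiana Jones': 'N', 'Fight Club': 'Y'}

# sorted() over .items() returns a list of (key, value) tuples, not a dict.
pairs = sorted(movies.items(), key=lambda kv: kv[0])
print(pairs)

# On Python 3.7+ a dict remembers insertion order, so this keeps the sorted order.
ordered = dict(pairs)
for number, (title, watched) in enumerate(ordered.items(), 1):
    print(number, title, watched)
```

Using enumerate(..., 1) removes the need for a hand-maintained counter variable.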

How to extract from dictionaries to only print certain variables python

I have a TSV file like this:
Name School Course
Nicole UVA Biology
Jenna GWU CS
from there,
I only want to print the Name and the Course from that dictionary. How would I go about this?
The code below is how I put the original TSV file into the dictionary above.
import csv
data = csv.reader(open('data.tsv'),delimiter='\t')
fields = data.next()
for row in data:
    item = dict(zip(fields, row))
    print item
So now I got a dictionary like such:
{'Name':'Nicole.', 'School':'UVA.','Course':'Biology'}
{'Name':'Jenna.', 'School':'GWU','Course':'CS'}
{'Name':'Shan', 'School':'Columbia','Course':'Astronomy'}
{'Name':'BILL', 'School':'UMD.','Course':'Algebra'}
I only want to print the Name and the Course from that dictionary. How would I go about this?
I want to add code so that I'm only printing
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}
Please guide. Thank You
Just delete the key in the loop
for row in data:
    item = dict(zip(fields, row))
    del item['School']
    print item
The easiest way is to just remove the item['School'] entry before printing
for row in data:
    item = dict(zip(fields, row))
    del item['School']
    print item
But this only works if you know exactly what the dictionary looks like, it has no other entries that you don't want, and it already has a School entry. I would instead recommend that you build a new dictionary out of the old one, only keeping the Name and Course entries
for row in data:
    item = dict(zip(fields, row))
    # A dict comprehension needs "k: v", not "k, v"
    item = {k: v for k, v in item.items() if k in ('Name', 'Course')}
    print item
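The filtering step above can be wrapped in a small reusable helper. This is a minimal Python 3 sketch; the function name keep_keys and the sample rows are hypothetical, and it deliberately tolerates records that lack one of the wanted keys.

```python
def keep_keys(record, wanted):
    """Return a new dict holding only the wanted keys that exist in record."""
    return {key: record[key] for key in wanted if key in record}


# Hypothetical rows shaped like the dictionaries built from the TSV file.
rows = [
    {'Name': 'Nicole', 'School': 'UVA', 'Course': 'Biology'},
    {'Name': 'Jenna', 'School': 'GWU', 'Course': 'CS'},
]
trimmed = [keep_keys(row, ('Name', 'Course')) for row in rows]
for row in trimmed:
    print(row)
```

Unlike del item['School'], this does not raise KeyError when a key is missing, and it leaves the original dictionary untouched.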
Maybe use a DictReader in the first place, and rebuild the row-dict only if key matches a pre-defined list:
import csv
keep = {"Name","Course"}
data = csv.DictReader(open('data.tsv'),delimiter='\t')
for row in data:
    row = {k:v for k,v in row.items() if k in keep}
    print(row)
result:
{'Course': 'Biology', 'Name': 'Nicole'}
{'Course': 'CS', 'Name': 'Jenna'}
Based on the answer here:
filter items in a python dictionary where keys contain a specific string
print {k:v for k,v in item.iteritems() if "Name" in k or "Course" in k}
You're better off using a library designed for these kinds of tasks (Pandas). A dictionary is great for storing key-value pairs, but it looks like you have spreadsheet-like tabular data, so you should choose a storage type that better reflects the data at hand. You could simply do the following:
import pandas as pd
df = pd.read_csv('myFile.csv', sep = '\t')
print df[['Name','Course']]
You'll find that as you start doing more complicated tasks, it's better to use well-written libraries than to kludge something together.
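Replace your print(item) line with the line below. Note that filter() has to run over item.items(), since iterating over the dictionary itself yields only its keys.
print(dict(filter(lambda e: e[0]!='School', item.items())))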
OUTPUT:
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}

Looping through a dictionary in python

I am creating a main function which loops through dictionary that has one key for all the values associated with it. I am having trouble because I can not get the dictionary to be all lowercase. I have tried using .lower but to no avail. Also, the program should look at the words of the sentence, determine whether it has seen more of those words in sentences that the user has previously called "happy", "sad", or "neutral", (based on the three dictionaries) and make a guess as to which label to apply to the sentence.
an example output would be like
Sentence: i started screaming incoherently about 15 mins ago, this is B's attempt to calm me down.
0 appear in happy
0 appear in neutral
0 appear in sad
I think this is sad.
You think this is: sad
Okay! Updating.
CODE:
import csv
def read_csv(filename, col_list):
    """This function expects the name of a CSV file and a list of strings
    representing a subset of the headers of the columns in the file, and
    returns a dictionary of the data in those columns, as described below."""
    with open(filename, 'r') as f:
        # Better convert reader to a list (items represent every row)
        reader = list(csv.DictReader(f))
        dict1 = {}
        for col in col_list:
            dict1[col] = []
            # Going in every row of the file
            for row in reader:
                # Append to the list the row item of this key
                dict1[col].append(row[col])
    return dict1
def main():
    dictx = read_csv('words.csv', ['happy'])
    dicty = read_csv('words.csv', ['sad'])
    dictz = read_csv('words.csv', ['neutral'])
    dictxcounter = 0
    dictycounter = 0
    dictzcounter = 0
    a=str(raw_input("Sentence: ")).split(' ')
    for word in a :
        for keys in dictx['happy']:
            if word == keys:
                dictxcounter = dictxcounter + 1
        for values in dicty['sad']:
            if word == values:
                dictycounter = dictycounter + 1
        for words in dictz['neutral']:
            if word == words:
                dictzcounter = dictzcounter + 1
    print dictxcounter
    print dictycounter
    print dictzcounter
Remove this line from your code:
dict1 = dict((k, v.lower()) for k,v in col_list)
It overwrites the dictionary that you built in the loop.
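Since lower() is a string method, it has to be applied to each word rather than to the dictionary or its lists. The following is a minimal Python 3 sketch of that idea together with the counting step; the function names and the sample word lists are hypothetical and merely stand in for the columns of words.csv.

```python
def lowercase_lists(word_lists):
    """Return a copy of a dict of word lists with every word lower-cased."""
    return {label: [word.lower() for word in words]
            for label, words in word_lists.items()}


def count_matches(sentence, word_lists):
    """Count, per label, how many words of the sentence appear in that label's list."""
    words = sentence.lower().split()
    counts = {}
    for label, known in word_lists.items():
        known_set = set(known)  # set membership is faster than scanning a list
        counts[label] = sum(1 for word in words if word in known_set)
    return counts


# Hypothetical word lists standing in for the columns of words.csv.
known_words = lowercase_lists({
    'happy': ['Great', 'LOVE'],
    'sad': ['Screaming', 'Cry'],
    'neutral': ['Table'],
})
counts = count_matches('I started SCREAMING and wanted to cry', known_words)
guess = max(counts, key=counts.get)  # label with the most matching words
print(counts, guess)
```

A single dictionary keyed by label replaces the three separate dictionaries and counters, so adding another label needs no new loop.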

Parsing file structure based on a specific pattern

I have a text file with multiple lines that are in the order of name, location, website, then 'END' to indicate the end of one person's profile, then again name, location, website, and so on.
I need to add the name as a key to a dictionary and the rest (location, website) as its values.
So if I have a file:
name1
location1
website1
END
name2
location2
website2
END
name3
location3
website3
END
the outcome would be:
dict = {'name1': ['location1','website1'],
'name2': ['location2', 'website2'],
'name3': ['location3', 'website3']}
edit: the value would be a list, sorry about that
I have no idea how to approach this, can someone point me in the right direction?
First, there appears to be a misconception about the structure of a dictionary, or, more generally, of associative containers, underlying this question.
The structure of a dict is, in python-like syntax
{
key : whatever_value1,
another_key: whatever_value2,
# ...
}
Second, if you trim the trailing digit from
name1
location1
website1
you naturally arrive at a struct-like ADT for the END-separated individual entries of that file, namely
class Whatever(object):
    def __init__(self, name, location, website):
        self.name = name
        self.location = location
        self.website = website
(your mileage will vary regarding the name of the class)
Thus what you could use is a Python dict that maps a key (likely the name attribute of your records) to a reference to an instance of that type.
To process the input file, you simply read the file line by line until you encounter END, and then commit an instance of class Whatever to the dictionary using (e.g.) its name as the key.
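The approach described above could be sketched roughly like this in Python 3. It is only an illustration under stated assumptions: the class is called Profile here (a hypothetical name), the sample text is inlined through io.StringIO instead of being read from a file, and every entry is assumed to have exactly three lines before END.

```python
import io

# Sample input inlined for illustration; it follows the layout from the question.
SAMPLE = """name1
location1
website1
END
name2
location2
website2
END
"""


class Profile(object):
    """Struct-like record for one END-delimited entry (hypothetical name)."""

    def __init__(self, name, location, website):
        self.name = name
        self.location = location
        self.website = website


def read_profiles(lines):
    """Collect lines until END, then store a Profile under its name."""
    profiles, pending = {}, []
    for line in lines:
        line = line.strip()
        if line == 'END':
            record = Profile(*pending)  # assumes exactly three lines per entry
            profiles[record.name] = record
            pending = []
        elif line:
            pending.append(line)
    return profiles


profiles = read_profiles(io.StringIO(SAMPLE))
print(profiles['name2'].location, profiles['name2'].website)
```

With a real file, the same function can be called as read_profiles(f) inside a with open(...) block, since it only needs an iterable of lines.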
Use the fact that "END" delimits each section: itertools.groupby will split the file at each END, and we just need to create our key/value pairs as we iterate over the groupby object.
from itertools import groupby
from collections import OrderedDict
with open("test.txt") as f:
    d = OrderedDict((next(v), list(v))
                    for k, v in groupby(map(str.rstrip, f), key=lambda x: x[:3] != "END") if k)
Output:
OrderedDict([('name1', ['location1', 'website1']),
('name2', ['location2', 'website2']),
('name3', ['location3', 'website3'])])
Or using a regular for loop, just change the key each time we hit END storing the lines for each section in a tmp list:
from collections import OrderedDict
with open("test.txt") as f:
    # itertools.imap for python2
    data = map(str.rstrip, f)
    d, tmp, k = OrderedDict(), [], next(data)
    for line in data:
        if line == "END":
            d[k] = tmp
            k, tmp = next(data, ""), []
        else:
            tmp.append(line)
Output will be the same:
OrderedDict([('name1', ['location1', 'website1']),
('name2', ['location2', 'website2']),
('name3', ['location3', 'website3'])])
Both code examples will work for any length sections not just three lines.
It has been answered, but you can shorten things by applying Python's very own dict and list comprehension:
with open(file, 'r') as f:
    triplets = [data.strip().split('\n') for data in f.read().strip().split('END') if data]
d = {name: [line, site] for name, line, site in triplets}
You can take a slice of four lines at a time from the file without having to load it all into memory. One way to do this is with islice from itertools.
from itertools import islice
data = dict()
with open('file.path') as input:
    while True:
        batch = tuple(x.strip() for x in islice(input, 4))
        if not batch:
            break
        name, location, website, end = batch
        data[name] = (location, website)
Verification:
> from pprint import pprint
> pprint(data)
{'name1': ('location1', 'website1'),
'name2': ('location2', 'website2'),
'name3': ('location3', 'website3')}
If you are guaranteed that you will always get this data in this format, then you could do the following:
profiles = {}
name = None
location = None
website = None
count = 0
with open(file, 'r') as f:  # where file is the file name
    for each in f:
        each = each.strip()  # drop the trailing newline so the comparison with 'END' works
        count += 1
        if count == 1:
            name = each
        elif count == 2:
            location = each
        elif count == 3:
            website = each
        elif count == 4 and each == 'END':
            count = 0  # reset the counter at the end of each record
            profiles[name] = (location, website)  # stored as a tuple since a key maps to a single value
        else:
            print("Well, something went amiss %i %s" % (count, each))

Smaller program will print out key from values in dictionary, but stops when incorporated into larger function?

So I have a problem.
I am wanting to do something similar to this, where I call out a value, and it prints out the keys associated with that value. And I can even get it working:
def test(pet):
    dic = {'Dog': ['der Hund', 'der Katze'] , 'Cat' : ['der Katze'] , 'Bird': ['der Vogel']}
    items = dic.items()
    key = dic.keys()
    values = dic.values()
    for x, y in items:
        for item in y:
            if item == pet:
                print x
However, when I incorporate this same code format into a larger program it stops working:
def movie(movie):
    file = open('/Users/Danrex/Desktop/Text.txt' , 'rt')
    read = file.read()
    list = read.split('\n')
    actorList=[]
    for item in list:
        actorList = actorList + [item.split(',')]
    actorDict = dict()
    for item in actorList:
        if item[0] in actorDict:
            actorDict[item[0]].append(item[1])
        else:
            actorDict[item[0]] = [item[1]]
    items = actorDict.items()
    for x, y in items:
        for item in y:
            if item == movie:
                print x
I have print(ed) out actorDict, items, x, y, and item and they all seem to follow the same format as the previous code so I can't figure out why this isn't working! So confused. And, please, when you explain it to me do it as if I am a complete idiot, which I probably am.
Cleaning up the code with some more idiomatic Python will sometimes clarify things. This is how I would write it in Python 2.7:
from collections import defaultdict
def movie(movie):
    actorDict = defaultdict(list)
    movie_info_filename = '/Users/Danrex/Desktop/Text.txt'
    with open(movie_info_filename, 'rt') as fin:
        for line_item in fin:
            # Drop the trailing newline so the movie name can match exactly
            split_items = line_item.rstrip('\n').split(',')
            actorDict[split_items[0]].append(split_items[1])
    for actor, actor_info in actorDict.items():
        for info_item in actor_info:
            if info_item == movie:
                print actor
In this case, what mostly boiled out were temporary objects created for making the actorDict. defaultdict creates a dictionary-like object that allows one to specify a function to generate the default value for a key that isn't currently present. See the collections documentation for more info.
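As a small illustration of what defaultdict(list) does, here is a minimal Python 3 sketch. The actor and movie pairs are hypothetical sample data, not the contents of the asker's file.

```python
from collections import defaultdict

# Hypothetical "actor, movie" pairs standing in for lines of the text file.
pairs = [('Harrison Ford', 'Star Wars'),
         ('Harrison Ford', 'Indiana Jones'),
         ('Carrie Fisher', 'Star Wars')]

actor_movies = defaultdict(list)
for actor, title in pairs:
    # A missing key is created on first access, with list() supplying its value,
    # so no "if actor in actor_movies" check is needed.
    actor_movies[actor].append(title)

print(dict(actor_movies))
```

This replaces the if/else branch in the original function that created a new list the first time an actor was seen.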
What it looks like you're trying to do is print out some actor value for each time they are listed with a particular movie in your text file.
If you're going to check more than one movie, make the actorDict once and reference your movies against that existing actorDict. This will save you trips to disk.
from collections import defaultdict
def make_actor_dict():
    actorDict = defaultdict(list)
    movie_info_filename = '/Users/Danrex/Desktop/Text.txt'
    with open(movie_info_filename, 'rt') as fin:
        for line_item in fin:
            # Drop the trailing newline so the movie name can match exactly
            split_items = line_item.rstrip('\n').split(',')
            actorDict[split_items[0]].append(split_items[1])
    # Return the dictionary so that main() can reuse it
    return actorDict
def movie(movie, actorDict):
    for actor, actor_info in actorDict.items():
        for info_item in actor_info:
            if info_item == movie:
                print actor
def main():
    actorDict = make_actor_dict()
    movie('Star Wars', actorDict)
    movie('Indiana Jones', actorDict)
If you only care that the actor was in that movie, you don't have to iterate through the movie list manually, you can just check that movie is in actor_info:
def movie(movie, actorDict):
    for actor in actorDict:
        if movie in actorDict[actor]:
            print actor
Of course, you already figured out that the problem was the movie name not being an exact match for the text you read from the file. If you want to allow less-than-exact matches, you should consider normalizing your movie string and your data strings from the file. The string methods strip() and lower() can be really helpful there.
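A minimal Python 3 sketch of that normalization, assuming that trimming whitespace and ignoring case is enough for the comparison. The helper names normalize and actors_in and the sample dictionary are hypothetical.

```python
def normalize(text):
    """Strip surrounding whitespace and ignore case for comparison."""
    return text.strip().lower()


def actors_in(movie, actor_dict):
    """Return the actors whose movie list contains a normalized match."""
    target = normalize(movie)
    return [actor for actor, titles in actor_dict.items()
            if target in (normalize(title) for title in titles)]


# Hypothetical data, including the stray newline and spacing a file read can leave behind.
actor_dict = {
    'Harrison Ford': [' Star Wars\n', 'Indiana Jones\n'],
    'Carrie Fisher': ['star wars'],
    'Uma Thurman': ['Pulp Fiction\n'],
}
matches = sorted(actors_in('Star Wars', actor_dict))
print(matches)
```

Normalizing both sides of the comparison means a trailing newline or a difference in capitalization no longer prevents a match.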
