How to extract from dictionaries to only print certain variables python - python

I have a TSV file like this:
Name School Course
Nicole UVA Biology
Jenna GWU CS
The code below is how I read the original TSV file into one dictionary per row.
import csv
data = csv.reader(open('data.tsv'), delimiter='\t')
fields = data.next()
for row in data:
    item = dict(zip(fields, row))
    print item
So now I get one dictionary per row, like this:
{'Name':'Nicole.', 'School':'UVA.','Course':'Biology'}
{'Name':'Jenna.', 'School':'GWU','Course':'CS'}
{'Name':'Shan', 'School':'Columbia','Course':'Astronomy'}
{'Name':'BILL', 'School':'UMD.','Course':'Algebra'}
I only want to print the Name and the Course from that dictionary. How would I go about this?
I want to add code so that I'm only printing
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}
Please guide. Thank You

Just delete the key in the loop
for row in data:
    item = dict(zip(fields, row))
    del item['School']
    print item

The easiest way is to just remove the item['School'] entry before printing
for row in data:
    item = dict(zip(fields, row))
    del item['School']
    print item
But this only works if you know exactly what the dictionary looks like, it has no other entries that you don't want, and it already has a School entry. I would instead recommend that you build a new dictionary out of the old one, only keeping the Name and Course entries
for row in data:
    item = dict(zip(fields, row))
    item = {k: v for k, v in item.items() if k in ('Name', 'Course')}
    print item
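The same idea as a self-contained Python 3 sketch (the code above targets Python 2). The in-memory sample data and the helper name `pick` are assumptions made for illustration; they are not part of the original question.

```python
import csv
import io

# Sample data standing in for data.tsv (an assumption for this sketch).
SAMPLE_TSV = "Name\tSchool\tCourse\nNicole\tUVA\tBiology\nJenna\tGWU\tCS\n"


def pick(row, wanted=("Name", "Course")):
    """Return a new dict holding only the wanted keys that exist in row."""
    return {key: row[key] for key in wanted if key in row}


reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
selected = [pick(row) for row in reader]

for item in selected:
    print(item)
```

Because `pick` checks `key in row`, a row that lacks one of the wanted columns is still handled instead of raising `KeyError`.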

Maybe use a DictReader in the first place, and rebuild the row-dict only if key matches a pre-defined list:
import csv
keep = {"Name","Course"}
data = csv.DictReader(open('data.tsv'),delimiter='\t')
for row in data:
    row = {k:v for k,v in row.items() if k in keep}
    print(row)
result:
{'Course': 'Biology', 'Name': 'Nicole'}
{'Course': 'CS', 'Name': 'Jenna'}

Based on the answer here:
filter items in a python dictionary where keys contain a specific string
print {k:v for k,v in item.iteritems() if "Name" in k or "Course" in k}

You're better off using a library designed for these kinds of tasks (Pandas). A dictionary is great for storing key-value pairs, but it looks like you have spreadsheet-like tabular data, so you should choose a storage type that better reflects the data at hand. You could simply do the following:
import pandas as pd
df = pd.read_csv('myFile.csv', sep = '\t')
print df[['Name','Course']]
You'll find that as you start doing more complicated tasks, it's better to use well-written libraries than to kludge something together.

Replace your print(item) line with the line below.
print(dict(filter(lambda e: e[0] != 'School', item.items())))
OUTPUT:
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}

Related

Print certain amount of tuples in a list without name

My exercise was to print the 10 most common words in a text file.
Assume I have opened the file and created a dictionary that maps each separated word to its count.
Normally, I do this:
li=list()
for key,value in d.items():
    tpl=(value,key)
    li.append(tpl)
li=sorted(li,reverse=True)
for key,value in li[:10]:
    print('Ten most common words: ',value,key)
But prof gave me a single line of code that can replace almost all those lines:
print(sorted([(value,key) for key,value in d.items()],reverse=True))
However I can't find a way to print only 10 tuples since the list has no name, I can't use the for loop to print. Can you help me out?
Separate the list creation from the print():
li = sorted([(value, key) for key, value in d.items()], reverse=True)
Now you can iterate through li.
for item in li[:10]:
    print(item)
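A name is not strictly required, since the sorted list can be sliced in place. The sketch below shows that and two standard-library alternatives. The word counts are made up for illustration, and it takes the top 3 rather than the top 10 only to keep the sample small.

```python
from collections import Counter
import heapq

# Hypothetical word counts standing in for the dictionary `d` in the question.
d = {"the": 12, "cat": 3, "sat": 5, "on": 7, "mat": 2}

# Option 1: slice the sorted list directly; no intermediate name is needed.
top_sorted = sorted(((count, word) for word, count in d.items()), reverse=True)[:3]

# Option 2: heapq.nlargest picks the largest items without sorting everything.
top_heap = heapq.nlargest(3, ((count, word) for word, count in d.items()))

# Option 3: Counter.most_common returns (word, count) pairs, highest count first.
top_counter = Counter(d).most_common(3)

print(top_sorted)
print(top_heap)
print(top_counter)
```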

RAKE split sentences function on a Python dictionary

How would I be able to apply this function to just the values within a python dictionary:
def split_sentences(text):
    """
    Utility function to return a list of sentences.
    @param text The text that must be split in to sentences.
    """
    sentence_delimiters = re.compile(u'[\\[\\]\n.!?,;:\t\\-\\"\\(\\)\\\'\u2019\u2013]')
    sentences = (sentence_delimiters.split(text))
    return sentences
The code I have used to create the dictionary from a CSV file input:
with open('second_table.csv', mode='r') as infile:
    # Read in the csv file
    reader = csv.reader(infile)
    # Skip the headers
    next(reader, None)
    # Iterates through each row to get the key value pairs
    mydict = {rows[0]:rows[1] for rows in reader}
The python dictionary looks like so:
{'INC000007581947': '$BREM - CATIAV5 - Catia does not start',
'INC000007581991': '$SPAI - REACT - react',
'INC000007582037': 'access request',
'INC000007582095': '$HAMB - DVOBROWSER - ACCESS RIGHTS',
'INC000007582136': 'SIGLUM issue by opening a REACT request'}
mydict.values() gives you all the values in the dictionary. You can then iterate over them and use your function.
for value in mydict.values():
    split_sentences(value)
There are different solutions, depending on if you want to create a new dictionary or simply update the one you already have.
To update the dictionary values:
mydict.update({k : split_sentences(v) for k, v in mydict.items()})
To create a new dictionary:
new_dict = {k : split_sentences(v) for k, v in mydict.items()}
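A runnable sketch of the "new dictionary" variant. It assumes a simplified delimiter pattern (shorter than the one in the question, and without the hyphen) and, unlike the original function, drops empty pieces. The second sample value is hypothetical.

```python
import re

# A simplified delimiter set (an assumption; the pattern in the question is longer).
SENTENCE_DELIMITERS = re.compile(r"[.!?,;:\t\n]")


def split_sentences(text):
    """Split text on the delimiters and drop empty or whitespace-only pieces."""
    return [part.strip() for part in SENTENCE_DELIMITERS.split(text) if part.strip()]


mydict = {
    "INC000007581947": "$BREM - CATIAV5 - Catia does not start",
    "INC000007582037": "access request; urgent",  # hypothetical value
}

# Build a new dictionary; the keys are kept and mydict itself is left untouched.
new_dict = {key: split_sentences(value) for key, value in mydict.items()}
print(new_dict)
```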

CSV selecting multiple columns

I have a CSV file that contains a lot of information. I have written a program that counts the values in the 'Feedback' column and how often each one occurs.
My problem now is that, after producing the items in the 'Feedback' column, I want to pull out another column whose rows correspond to a chosen 'Feedback' value.
An example of the CSV file is as follows:
Feedback Description Status
Others Fire Proct Complete
Complaints Grass Complete
Compliment Wall Complete
... ... ...
Given the frequencies from the 'Feedback' column, suppose I select 'Complaints'. Then I want every Description that goes with 'Complaints' to show up.
Something like this:
Complaints Grass
Complaints Table
Complaints Door
... ...
Following is the code I have so far:
import csv, sys, os, shutil
from collections import Counter
reader = csv.DictReader(open('data.csv'))
result = {}
for row in reader:
    for column, value in row.iteritems():
        result.setdefault(column,[]).append(value)
list = []
for items in result['Feedback']:
    if items == '':
        items = items
    else:
        newitem = items.upper()
        list.append(newitem)
unique = Counter(list)
for k, v in sorted(unique.items()):
    print k.ljust(30),' : ', v
This is only the part that counts the values in the 'Feedback' column and their frequencies.
You could also store a defaultdict() holding a list of entries for each category as follows:
import csv
from collections import Counter, defaultdict
with open('data.csv', 'rb') as f_csv:
    csv_reader = csv.DictReader(f_csv)
    result = {}
    feedback = defaultdict(list)
    for row in csv_reader:
        for column, value in row.iteritems():
            result.setdefault(column, []).append(value)
        feedback[row['Feedback'].upper()].append(row['Description'])
data = []
for items in result['Feedback']:
    if items == '':
        items = items
    else:
        newitem = items.upper()
        data.append(newitem)
unique = Counter(data)
for k, v in sorted(unique.items()):
    print "{:20} : {:5} {}".format(k, v, ', '.join(feedback[k]))
This would display your output as:
COMPLAINTS : 2 Grass, Door
COMPLIMENT : 2 Wall, Table
OTHERS1 : 1 Fire Proct
Or on multiple lines if instead you used:
print "{:20} : {:5}".format(k, v)
print ' ' + '\n '.join(feedback[k])
When using the csv library, you should open your file with rb in Python 2.x. Also avoid using list as a variable name as this overwrites the Python list() function.
Note: It is easier to use format() when printing aligned data.
You can do it with the code at the very end of this snippet, which is derived from the code in your question. I modified how the file is read by using a with statement, which ensures that it is closed when it's no longer needed. I also changed the name of the variable you had called list, because it hides the name of the built-in type and is considered by most to be poor programming practice. See PEP 8 - Style Guide for Python Code for more on this and related topics.
For testing purposes, I also added a couple more rows of 'Complaints' type of 'Feedback' items.
import csv
from collections import Counter
with open('information.csv') as csvfile:
    result = {}
    for row in csv.DictReader(csvfile):
        for column, value in row.iteritems():
            result.setdefault(column, []).append(value)
items = [item.upper() for item in result['Feedback']]
unique = Counter(items)
for k, v in sorted(unique.items()):
    print k.ljust(30), ' : ', v
print
for i, feedback in enumerate(result['Feedback']):
    if feedback == 'Complaints':
        print feedback, ' ', result['Description'][i]
Output:
COMPLAINTS : 3
COMPLIMENT : 1
OTHERS : 1
Complaints Grass
Complaints Table
Complaints Door
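For Python 3, where `iteritems()` and the `print` statement are gone, the grouping idea can be sketched as follows. The in-memory sample rows are illustrative and only loosely follow the question's data; a real script would open the CSV file instead.

```python
import csv
import io
from collections import defaultdict

# In-memory sample standing in for the CSV file (rows are illustrative).
SAMPLE_CSV = (
    "Feedback,Description,Status\n"
    "Others,Fire Proct,Complete\n"
    "Complaints,Grass,Complete\n"
    "Compliment,Wall,Complete\n"
    "Complaints,Table,Complete\n"
    "Complaints,Door,Complete\n"
)

# Map each upper-cased feedback category to its descriptions, in file order.
by_feedback = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    if row["Feedback"]:  # skip rows with an empty Feedback cell
        by_feedback[row["Feedback"].upper()].append(row["Description"])

# The frequency of a category is simply the length of its list.
for category in sorted(by_feedback):
    print(f"{category:<20} : {len(by_feedback[category])}")

selected = "COMPLAINTS"  # the category the user picked
for description in by_feedback[selected]:
    print(selected.title(), description)
```

Keeping one dictionary of lists gives both the counts and the matching descriptions from a single pass over the file.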

python 2.7:iterate dictionary and map values to a file

I have a list of dictionaries which I build from .xml file:
list_1=[{'lat': '00.6849879', 'phone': '+3002201600', 'amenity': 'restaurant', 'lon': '00.2855850', 'name': 'Telegraf'},{'lat': '00.6850230', 'addr:housenumber': '6', 'lon': '00.2844493', 'addr:city': 'XXX', 'addr:street': 'YYY.'},{'lat': '00.6860304', 'crossing': 'traffic_signals', 'lon': '00.2861978', 'highway': 'crossing'}]
My aim is to build a text file with values (not keys) in such order:
lat,lon,'addr:street','addr:housenumber','addr:city','amenity','crossing' etc...
00.6849879,00.2855850, , , ,restaurant, ,'\n'00.6850230,00.2844493,YYY,6,XXX, , ,'\n'00.6860304,00.2861978, , , , ,traffic_signals,'\n'
If a value does not exist, there should be an empty space.
I tried to loop with for loop:
for i in list_1:
    line= i['lat'],i['lon']
    print line
A problem occurs if I add a key that does not exist in some of the dictionaries:
for i in list_1:
    line= i['lat'],i['lon'],i['phone']
    print line
I also tried to loop and use the map() function, but the results do not seem correct:
for i in list_1:
    line=map(lambda x1,x2:x1+','+x2+'\n',i['lat'],i['lon'])
    print line
Also tried:
for i in list_1:
    for k,v in i.items():
        if k=='addr:housenumber':
            print v
This time I think there would be too many if/else conditions to write.
The solution seems close, but I can't figure out the best way to do it.
I would look to use the csv module, in particular DictWriter. The fieldnames dictate the order in which the dictionary information is written out. Actually writing the header is optional:
import csv
fields = ['lat','lon','addr:street','addr:housenumber','addr:city','amenity','crossing',...]
with open('<file>', 'w') as f:
    writer = csv.DictWriter(f, fields)
    #writer.writeheader() # If you want a header
    writer.writerows(list_1)
If you really don't want to use the csv module, then you can simply iterate over the list of fields you want, in the order you want them:
fields = ['lat','lon','addr:street','addr:housenumber','addr:city','amenity','crossing',...]
for row in list_1:
    print(','.join(row.get(field, '') for field in fields))
If you can't or don't want to use csv you can do something like
order = ['lat','lon','addr:street','addr:housenumber',
'addr:city','amenity','crossing']
for entry in list_1:
    f.write(", ".join([entry.get(x, "") for x in order]) + "\n")
This will create a list with the values from the entry map in the order present in the order list, and default to "" if the value is not present in the map.
If your output is a csv file, I strongly recommend using the csv module because it will also escape values correctly and other csv file specific things that we don't think about right now.
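One detail worth checking with DictWriter: by default it raises ValueError when a row contains a key that is not in the field list (for example 'phone' or 'name' here). The Python 3 sketch below assumes you only want a subset of keys, so it passes `extrasaction="ignore"`; `restval` supplies the value for missing keys. The shortened data and field list are for illustration, and an in-memory buffer stands in for the output file.

```python
import csv
import io

list_1 = [
    {"lat": "00.6849879", "phone": "+3002201600", "amenity": "restaurant",
     "lon": "00.2855850", "name": "Telegraf"},
    {"lat": "00.6850230", "addr:housenumber": "6", "lon": "00.2844493",
     "addr:city": "XXX", "addr:street": "YYY."},
]
fields = ["lat", "lon", "addr:street", "addr:housenumber", "addr:city", "amenity"]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.DictWriter(
    buffer,
    fieldnames=fields,
    restval="",             # value written for keys missing from a row
    extrasaction="ignore",  # silently skip keys not listed in fields
    lineterminator="\n",
)
writer.writerows(list_1)
output = buffer.getvalue()
print(output)
```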
Thanks guys
I found a solution. Maybe it is not so elegant, but it works.
I made a list of node keys, looked each one up in every dictionary, and collected the values.
key_list=['lat','lon','addr:street','addr:housenumber','amenity','source','name','operator']
list=[{'lat': '00.6849879', 'phone': '+3002201600', 'amenity': 'restaurant', 'lon': '00.2855850', 'name': 'Telegraf'},{'lat': '00.6850230', 'addr:housenumber': '6', 'lon': '00.2844493', 'addr:city': 'XXX', 'addr:street': 'YYY.'},{'lat': '00.6860304', 'crossing': 'traffic_signals', 'lon': '00.2861978', 'highway': 'crossing'}]
Solution:
final_list=[]
for i in list:
    line=str()
    for ii in key_list:
        if ii in i:
            x=ii
            line=line+str(i[x])+','
        else:
            line=line+' '+','
    final_list.append(line)

How do I remove everything after a certain character in a value in a dictionary for all dictionaries in a group of dictionaries?

My goal is to remove all characters after a certain character in a value from a set of dictionaries.
I have imported a CSV file from my local machine and printed using the following code:
import csv
with open('C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader=csv.DictReader(csvfile)
    for row in reader:
        print row
I get a set of dictionaries that look like:
{'Pageviews_Aug':'145', 'URL':'http://www.domain.com/#fbid=12345'}
For any dictionary that includes a value with #fbid, I am trying to remove #fbid and any characters that come after it, for all dictionaries where this is true.
I have tried:
for key,value in row.items():
    if key == 'URL' and '#' in value or 'fbid' in value
        value.split('#')[0]
    print row
Didn't work.
Don't think rsplit will work as it removes only whitespace.
Fastest way I thought about is using rsplit()
out = text.rsplit('#fbid')[0]
Okay, so I'm guessing your problem isn't in removing the text that comes after the # but in getting to that string.
What is 'row'?
I'm guessing it's a dictionary with a single 'URL' key, am I wrong?
for key,value in row.items():
    if key == 'URL' and '#fbid' in value:
        print value.split('#')[0]
I don't quite get the whole format of your data.
If you want to edit a single variable in your dictionary, you don't have to iterate through all the items:
if 'URL' in row.keys():
    if '#fbid' in row['URL']:
        row['URL'] = row['URL'].rsplit('#fbid')[0]
That should work.
But I really think you should copy an example of your whole data (three items would suffice)
Use a regular expression:
>>> import re
>>> value = 'http://www.domain.com/#fbid=12345'
>>> re.sub(ur'#fbid.*','',value)
'http://www.domain.com/'
>>> value = 'http://www.domain.com/'
>>> re.sub(ur'#fbid.*','',value)
'http://www.domain.com/'
For your code, you could do something like this to get the answer in the same format as before:
import csv
import re
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        row['URL'] = re.sub(ur'#fbid.*', '', row['URL'])
        print row
Given your sample code, it looks like it doesn't work because you don't save the result of value.split('#')[0]. Do something like:
for key,value in row.items():
    if key == 'URL' and ('#' in value or 'fbid' in value):
        new_value = value.split('#')[0] # <-- here save the result of split in new_value
        row[key] = new_value # <-- here update the dict row
print row # instead of printing each time, print once at the end of the operation
This can be simplified to
if '#fbid' in row['URL']:
    row['URL'] = row['URL'].split('#fbid')[0]
because it only checks one key.
example
>>> row={'Pageviews_Aug':'145', 'URL':'http://www.domain.com/#fbid=12345'}
>>> if "#fbid" in row["URL"]:
...     row["URL"] = row['URL'].split("#fbid")[0]
...
>>> row
{'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/'}
>>>
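A Python 3 sketch that applies the same fix to a whole list of rows. The helper name `strip_fbid` and the second sample row are assumptions for illustration. `str.partition` returns the text before the marker, or the whole string when the marker is absent, so no separate `in` check is needed.

```python
rows = [
    {"Pageviews_Aug": "145", "URL": "http://www.domain.com/#fbid=12345"},
    {"Pageviews_Aug": "98", "URL": "http://www.domain.com/about"},  # hypothetical row
]


def strip_fbid(url, marker="#fbid"):
    """Return url without the marker and everything after it."""
    head, _, _ = url.partition(marker)
    return head


# Build cleaned copies; the original row dictionaries are left unchanged.
cleaned = [dict(row, URL=strip_fbid(row["URL"])) for row in rows]
print(cleaned)
```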
