RAKE split sentences function on a Python dictionary - python

How would I be able to apply this function to just the values within a Python dictionary:
import re

def split_sentences(text):
    """
    Utility function to return a list of sentences.
    @param text The text that must be split in to sentences.
    """
    sentence_delimiters = re.compile(u'[\\[\\]\n.!?,;:\t\\-\\"\\(\\)\\\'\u2019\u2013]')
    sentences = sentence_delimiters.split(text)
    return sentences
The code I have used to create the dictionary from a CSV file input:
import csv

with open('second_table.csv', mode='r') as infile:
    # Read in the CSV file
    reader = csv.reader(infile)
    # Skip the headers
    next(reader, None)
    # Iterate through each row to get the key-value pairs
    mydict = {rows[0]: rows[1] for rows in reader}
The python dictionary looks like so:
{'INC000007581947': '$BREM - CATIAV5 - Catia does not start',
'INC000007581991': '$SPAI - REACT - react',
'INC000007582037': 'access request',
'INC000007582095': '$HAMB - DVOBROWSER - ACCESS RIGHTS',
'INC000007582136': 'SIGLUM issue by opening a REACT request'}

mydict.values() gives you all the values in the dictionary. You can then iterate over them and use your function.
for value in mydict.values():
    split_sentences(value)

There are different solutions, depending on whether you want to create a new dictionary or simply update the one you already have.
To update the dictionary values:
mydict.update({k : split_sentences(v) for k, v in mydict.items()})
To create a new dictionary:
new_dict = {k : split_sentences(v) for k, v in mydict.items()}
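As a minimal sketch of the comprehension approach, the snippet below applies the function to two entries copied from the sample dictionary instead of reading the CSV file. The names `SENTENCE_DELIMITERS`, `sample` and `split_values` are only illustrative. Note that each fragment keeps its surrounding whitespace, because the function splits on the delimiter characters only.

```python
import re

# Same delimiter set as the split_sentences function in the question.
SENTENCE_DELIMITERS = re.compile(u'[\\[\\]\n.!?,;:\t\\-\\"\\(\\)\\\'\u2019\u2013]')

def split_sentences(text):
    """Return the list of fragments between delimiter characters."""
    return SENTENCE_DELIMITERS.split(text)

# Two entries copied from the sample dictionary (illustrative subset).
sample = {
    'INC000007581947': '$BREM - CATIAV5 - Catia does not start',
    'INC000007582037': 'access request',
}

# Build a new dictionary; the keys are untouched, only the values are split.
split_values = {k: split_sentences(v) for k, v in sample.items()}

for key, parts in split_values.items():
    print(key, parts)
```

If the padding is unwanted, calling `part.strip()` on each fragment inside the comprehension would be one way to remove it.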

Related

Convert Text to Dict [repetitive key in textline]

I have sample text file as below
asy1 10.20.0.1
byt 192.1.10.100
byt 192.1.10.101
byt 192.1.10.102
hps 10.30.1.50
hps 10.30.1.53
hps 10.30.1.54
hps 10.30.1.55
hps 10.30.1.56
zte 10.100.1.1
zte 10.100.1.2
When I run the script below:
mydict = {}
with open('devices.txt', 'r') as file:
    for line in file:
        name, ip = line.split()
        mydict[name] = ip.strip()
print(mydict)
it does not return all the lines from the text file:
{'hps': '10.30.1.56', 'zte': '10.100.1.2', 'byt': '192.1.10.102', 'asy1': '10.20.0.1'}
What am I missing here? Please advise. Thanks.
In a dictionary the keys must be unique, so when you do:
mydict[name] = ip.strip()
you overwrite the previous value for that key. Instead of having a single value per key, you could store a list of values:
mydict = {}
with open('devices.txt', 'r') as file:
    for line in file:
        name, ip = line.split()
        if name not in mydict:
            mydict[name] = []
        mydict[name].append(ip.strip())
print(mydict)
Output
{'asy1': ['10.20.0.1'], 'byt': ['192.1.10.100', '192.1.10.101', '192.1.10.102'], 'hps': ['10.30.1.50', '10.30.1.53', '10.30.1.54', '10.30.1.55', '10.30.1.56'], 'zte': ['10.100.1.1', '10.100.1.2']}
A second alternative would be to use setdefault instead:
mydict.setdefault(name, []).append(ip.strip())
A third option would be to use a defaultdict. If the values are unique consider using a set.
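The defaultdict and set suggestions could look roughly like the sketch below. It uses a few inline lines instead of devices.txt, and the duplicate `zte` line and the names `ips_as_lists` and `ips_as_sets` are assumptions added purely for illustration.

```python
from collections import defaultdict

# Inline stand-in for devices.txt (a subset of the sample lines).
lines = [
    "asy1 10.20.0.1",
    "byt 192.1.10.100",
    "byt 192.1.10.101",
    "zte 10.100.1.1",
    "zte 10.100.1.1",  # duplicate added on purpose to show the set behaviour
]

ips_as_lists = defaultdict(list)  # missing keys start as an empty list
ips_as_sets = defaultdict(set)    # missing keys start as an empty set

for line in lines:
    name, ip = line.split()
    ips_as_lists[name].append(ip)  # keeps order and duplicates
    ips_as_sets[name].add(ip)      # drops duplicates, order not guaranteed

print(dict(ips_as_lists))
print(dict(ips_as_sets))
```

The list version preserves the order of the file; the set version is only a good fit when order does not matter and repeated addresses should collapse into one.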
Many of those names are the same, but a dict can only have one value for a given key - if you try to add an IP to the dictionary with key hps, but there's already one in there, it will be overwritten. Maybe use a list instead?

Two dimensional dictionary with list as a value python

I'm writing a simple parser for exercise and I have a problem with saving downloaded data to a dictionary.
data = {"":{"":[]}}
with open("Training_01.txt", "r") as open_file:
    text = open_file.read()
    text = text.split("\n")
    for i in text:
        i = i.split("/")
        try:
            data[i[1]] = {i[2]:[].append(i[3])}
        except:
            print("Can't")
This is an example of the data that I want to parse:
/a/abbey/sun_aobrvxdhumowzajn.jpg
/a/abbey/sun_apstfzmbeiwbjqvb.jpg
/a/abbey/sun_apyilcssuybumhbu.jpg
/a/abbey/sun_arrohcvipmrghrzh.jpg
/a/abbey/sun_asgeghboyugsatii.jpg
/a/airplane_cabin/sun_blczihbhbntqccux.jpg
/a/airplane_cabin/sun_ayzaayjpoknjvpds.jpg
/a/airplane_cabin/sun_afuoinkozbbhqksk.jpg
/b/butte/sun_asfnwmuzhtjrztns.jpg
/b/butte/sun_ajzkngginlffsozz.jpg
/b/butte/sun_adonkmfgywrhpakt.jpg
/c/cabin/outdoor/sun_atqvmarllxqynnks.jpg
/c/cabin/outdoor/sun_acfcobswmnoyhyfi.jpg
/c/cabin/outdoor/sun_afgjdqosvakljsmc.jpg
I want to create a dictionary with "a", "b", "c" or any other letter as a key (I can't hard-code it), with a dictionary as the value that contains the place where the images were taken and a list of images.
But when I read my saved data I get None as a value:
print(data["a"])
Output: {'auto_factory': None}
Try to use defaultdict from python stdlib. It's very convenient in situations like this:
from collections import defaultdict
data = defaultdict(lambda: defaultdict(list))
with open("Training_01.txt", "r") as open_file:
    text = open_file.read()
    text = text.split("\n")
    for line in text:
        try:
            _, key, subkey, rem = line.split("/", 3)
            data[key][subkey].append(rem)
        except ValueError:
            # Raised when a line has fewer than four "/"-separated parts
            print("Can't")
print(data)
Explanation: the first time you access data (which is a dictionary) with a key that does not exist, a new entry is created for that key. This entry is itself a defaultdict, so the first time you access it with a key that does not exist, a new (nested this time) entry is created, and this entry is a list. You can then safely append a new element to that list.
UPD: Here is a way to implement the same requirement but without defaultdict:
data = {} # just a plain dict
# for ...:
data[key] = data.get(key, {}) # try to access the key, if it doesn't exist - create a new dict entry for such a key
data[key][subkey] = data[key].get(subkey, []) # same as above but for the sub key
data[key][subkey].append(rem) # finally do the job
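The two get() lines can also be collapsed with dict.setdefault, which is a substitution on my part rather than something the answer above uses. A minimal sketch on a few inline sample paths (the `paths` list stands in for the file, and the length check is an assumption about how malformed lines should be handled):

```python
# A few paths from the question, inlined so the sketch runs without the file.
paths = [
    "/a/abbey/sun_aobrvxdhumowzajn.jpg",
    "/a/abbey/sun_apstfzmbeiwbjqvb.jpg",
    "/a/airplane_cabin/sun_blczihbhbntqccux.jpg",
    "/c/cabin/outdoor/sun_atqvmarllxqynnks.jpg",
    "",  # blank line, as produced by a trailing newline
]

data = {}
for line in paths:
    parts = line.split("/", 3)
    if len(parts) != 4:
        continue  # skip lines that do not look like /letter/place/image
    _, key, subkey, rem = parts
    # setdefault returns the existing value, or stores and returns the default.
    data.setdefault(key, {}).setdefault(subkey, []).append(rem)

print(data)
```

Because the split is limited to three separators, a deeper path such as /c/cabin/outdoor/... keeps "outdoor/..." together as the image entry, matching the behaviour of the defaultdict version.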
Because data[i[1]] = {i[2]:[].append(i[3])} creates a new second-layer dictionary every time. In addition, list.append returns None, so the value stored under i[2] is None rather than a list.
This is a possible solution. It is not the cleanest solution, but it shows the logic step by step. It creates a new dict and list if the key is not in the last-layer dict, and it appends the value to the list if the dict already has the key.
data = {"":{"":[]}}
with open("Training_01.txt", "r") as open_file:
    text = open_file.read()
    text = text.split("\n")
    for i in text:
        i = i.split("/")
        try:
            key_1 = i[1]
            key_2 = i[2]
            value = i[3]
            if key_1 in data.keys():  # Whether the key i[1] is in the 1st layer of the dict
                if key_2 in data[key_1].keys():  # Whether the key i[2] is in the 2nd layer of the dict
                    # Yes, append to the list
                    data[key_1][key_2].append(value)
                else:
                    # No, create a new list
                    data[key_1][key_2] = [value]
            else:
                # i[1] is not in the 1st layer: create a 2nd-layer dict with i[2] as key and [i[3]] as value
                data[key_1] = {key_2: [value]}
        except IndexError:
            # Raised for lines with fewer than four "/"-separated parts
            print("Can't")
print(data['a'])

How to extract from dictionaries to only print certain variables python

I have a TSV file like this:
Name School Course
Nicole UVA Biology
Jenna GWU CS
From there, I only want to print the Name and the Course for each row. How would I go about this?
The code below is how I read the original TSV file into a dictionary per row.
import csv
data = csv.reader(open('data.tsv'), delimiter='\t')
fields = next(data)
for row in data:
    item = dict(zip(fields, row))
    print(item)
So now I get dictionaries like these:
{'Name':'Nicole.', 'School':'UVA.','Course':'Biology'}
{'Name':'Jenna.', 'School':'GWU','Course':'CS'}
{'Name':'Shan', 'School':'Columbia','Course':'Astronomy'}
{'Name':'BILL', 'School':'UMD.','Course':'Algebra'}
I only want to print the Name and the Course from that dictionary. How would I go about this?
I want to add code so that I'm only printing
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}
Please guide. Thank You
Just delete the key in the loop
for row in data:
    item = dict(zip(fields, row))
    del item['School']
    print(item)
The easiest way is to just remove the item['School'] entry before printing
for row in data:
    item = dict(zip(fields, row))
    del item['School']
    print(item)
But this only works if you know exactly what the dictionary looks like, it has no other entries that you don't want, and it already has a School entry. I would instead recommend that you build a new dictionary out of the old one, only keeping the Name and Course entries
for row in data:
    item = dict(zip(fields, row))
    # A dict comprehension needs "k: v", not "k, v"
    item = {k: v for k, v in item.items() if k in ('Name', 'Course')}
    print(item)
Maybe use a DictReader in the first place, and rebuild the row-dict only if key matches a pre-defined list:
import csv
keep = {"Name", "Course"}
data = csv.DictReader(open('data.tsv'), delimiter='\t')
for row in data:
    row = {k: v for k, v in row.items() if k in keep}
    print(row)
result:
{'Course': 'Biology', 'Name': 'Nicole'}
{'Course': 'CS', 'Name': 'Jenna'}
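For reference, a self-contained Python 3 sketch of the same DictReader idea, using an in-memory string in place of data.tsv. The `tsv_text` sample and the choice to index with `row[k]` (so a misspelled column name fails loudly) are assumptions for illustration, not part of the answer above.

```python
import csv
import io

# Inline stand-in for data.tsv (tab-separated, same columns as the question).
tsv_text = "Name\tSchool\tCourse\nNicole\tUVA\tBiology\nJenna\tGWU\tCS\n"

keep = ("Name", "Course")  # columns to keep, in output order

reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
# A column missing from the header raises KeyError here, surfacing typos early.
rows = [{k: row[k] for k in keep} for row in reader]

for row in rows:
    print(row)
```

With a real file, `open('data.tsv', newline='')` inside a `with` block would take the place of `io.StringIO(tsv_text)`.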
Based on the answer here:
filter items in a python dictionary where keys contain a specific string
print({k: v for k, v in item.items() if "Name" in k or "Course" in k})
You're better off using a library designed for these kinds of tasks (Pandas). A dictionary is great for storing key-value pairs, but it looks like you have spreadsheet-like tabular data, so you should choose a storage type that better reflects the data at hand. You could simply do the following:
import pandas as pd
df = pd.read_csv('myFile.csv', sep='\t')
print(df[['Name', 'Course']])
You'll find that as you start doing more complicated tasks, it's better to use well-written libraries than to kludge something together.
Replace your print(item) line with the line below. Note that the filter has to run over item.items(), so that each element is a (key, value) pair:
print(dict(filter(lambda e: e[0] != 'School', item.items())))
OUTPUT:
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}

Python: dictionary to collection

I have a file with 2 columns:
Anzegem Anzegem
Gijzelbrechtegem Anzegem
Ingooigem Anzegem
Aalst Sint-Truiden
Aalter Aalter
The first column is a town and the second column is the district of that town.
I made a dictionary of that file like this:
def readTowns(text):
    input = open(text, 'r')
    file = input.readlines()
    dict = {}
    verzameling = set()
    for line in file:
        tmp = line.split()
        dict[tmp[0]] = tmp[1]
    return dict
If I set a variable 'writeTown' equal to readTowns(text) and do writeTown['Anzegem'], I want to get a collection of {'Anzegem', 'Gijzelbrechtegem', 'Ingooigem'}.
Does anybody know how to do this?
I think you can just create another function that builds the data structure you need. Otherwise you will end up writing code that manipulates the dictionary returned by readTowns to get the data you want, so why not keep the code clean and create a separate function for that? Just build a dictionary that maps each district name to a list of towns and you are all set.
def writeTowns(text):
    input = open(text, 'r')
    file = input.readlines()
    dict = {}
    for line in file:
        tmp = line.split()
        dict[tmp[1]] = dict.get(tmp[1]) or []
        dict.get(tmp[1]).append(tmp[0])
    return dict

writeTown = writeTowns('file.txt')
print(writeTown['Anzegem'])
And if you are concerned about reading the same file twice, you can do something like this as well:
def readTowns(text):
    input = open(text, 'r')
    file = input.readlines()
    dict2town = {}
    town2dict = {}
    for line in file:
        tmp = line.split()
        dict2town[tmp[0]] = tmp[1]
        town2dict[tmp[1]] = town2dict.get(tmp[1]) or []
        town2dict.get(tmp[1]).append(tmp[0])
    return dict2town, town2dict

dict2town, town2dict = readTowns('file.txt')
print(town2dict['Anzegem'])
You could do something like this, although please have a look at @ubadub's answer; there are better ways to organise your data.
[town for town, region in dic.items() if region == 'Anzegem']
It sounds like you want to make a dictionary where the keys are the districts and the values are a list of towns.
A basic way to do this is:
def readTowns(text):
    with open(text, 'r') as f:
        lines = f.readlines()
    my_dict = {}
    for line in lines:
        tmp = line.split()
        if tmp[1] in my_dict:
            my_dict[tmp[1]].append(tmp[0])
        else:
            my_dict[tmp[1]] = [tmp[0]]
    return my_dict
The if/else blocks can also be replaced by Python's defaultdict subclass, but I've used the if/else statements here for readability.
Also, some other points: dict (and, in Python 2, file) are built-in types, so it is bad practice to overwrite these with your own local variables (notice I've changed dict to my_dict and file to lines in the code above).
If you build your dictionary as {town: district}, so the town is the key and the district is the value, you can't do this easily*, because a dictionary is not meant to be used in that way. Dictionaries allow you to easily find the values associated with a given key. So if you want to find all the towns in a district, you are better off building your dictionary as:
{district: [list_of_towns]}
So for example the district Anzegem would appear as {'Anzegem': ['Anzegem', 'Gijzelbrechtegem', 'Ingooigem']}
And of course the value is your collection.
*you could probably do it by iterating through the entire dict and checking where your matches occur, but this isn't very efficient.
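Since the question asks for a collection like {'Anzegem', 'Gijzelbrechtegem', 'Ingooigem'}, one possible sketch of the {district: towns} layout uses defaultdict(set). The function name `towns_by_district` and the inline `lines` list (standing in for the file) are hypothetical.

```python
from collections import defaultdict

# Inline stand-in for the two-column file: town first, then district.
lines = [
    "Anzegem Anzegem",
    "Gijzelbrechtegem Anzegem",
    "Ingooigem Anzegem",
    "Aalst Sint-Truiden",
    "Aalter Aalter",
]

def towns_by_district(lines):
    """Map each district to the set of towns that belong to it."""
    result = defaultdict(set)
    for line in lines:
        town, district = line.split()
        result[district].add(town)
    return dict(result)  # plain dict, so unknown districts raise KeyError

districts = towns_by_district(lines)
print(districts['Anzegem'])
```

Returning a plain dict at the end is a design choice: it avoids silently creating empty entries when a caller looks up a district that is not in the file.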

How do I remove everything after a certain character in a value in a dictionary for all dictionaries in a group of dictionaries?

My goal is to remove all characters after a certain character in a value from a set of dictionaries.
I have imported a CSV file from my local machine and printed using the following code:
import csv
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
I get a set of dictionaries that look like:
{'Pageviews_Aug':'145', 'URL':'http://www.domain.com/#fbid=12345'}
For any dictionary that includes a value with #fbid, I am trying to remove #fbid and any characters that come after it, for all dictionaries where this is true.
I have tried:
for key,value in row.items():
    if key == 'URL' and '#' in value or 'fbid' in value:
        value.split('#')[0]
    print(row)
Didn't work.
Don't think rsplit will work as it removes only whitespace.
Fastest way I thought about is using rsplit()
out = text.rsplit('#fbid')[0]
Okay, so I'm guessing your problem isn't in removing the text that comes after the # but in getting to that string.
What is 'row'?
I'm guessing it's a dictionary with a single 'URL' key, am I wrong?
for key,value in row.items():
    if key == 'URL' and '#fbid' in value:
        print(value.split('#')[0])
I don't quite get the whole format of your data.
If you want to edit a single variable in your dictionary, you don't have to iterate through all the items:
if 'URL' in row.keys():
    if '#fbid' in row['URL']:
        row['URL'] = row['URL'].rsplit('#fbid')[0]
That should work.
But I really think you should copy an example of your whole data (three items would suffice)
Use a regular expression:
>>> import re
>>> value = 'http://www.domain.com/#fbid=12345'
>>> re.sub(r'#fbid.*','',value)
'http://www.domain.com/'
>>> value = 'http://www.domain.com/'
>>> re.sub(r'#fbid.*','',value)
'http://www.domain.com/'
For your code, you could do something like this to get the answer in the same format as before:
import csv
import re
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        row['URL'] = re.sub(r'#fbid.*', '', row['URL'])
        print(row)
Given your sample code, it looks like it doesn't work because you don't save the result of value.split('#')[0]. Do something like:
for key,value in row.items():
    if key == 'URL' and '#' in value or 'fbid' in value:
        new_value = value.split('#')[0]  # <-- here save the result of split in new_value
        row[key] = new_value  # <-- here update the dict row
print(row)  # instead of printing each time, print once at the end of the operation
This can be simplified to:
if '#fbid' in row['URL']:
    row['URL'] = row['URL'].split('#fbid')[0]
because it only checks one key.
example
>>> row={'Pageviews_Aug':'145', 'URL':'http://www.domain.com/#fbid=12345'}
>>> if "#fbid" in row["URL"]:
...     row["URL"] = row['URL'].split("#fbid")[0]
...
>>> row
{'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/'}
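Putting the pieces together for several rows, a small sketch using str.partition, which is a swapped-in alternative to the split and regex approaches above. The `rows` list and the helper name `strip_fbid` are made up for illustration; in the real script the rows would come from csv.DictReader.

```python
# Rows shaped like the csv.DictReader output in the question (illustrative data).
rows = [
    {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'},
    {'Pageviews_Aug': '20', 'URL': 'http://www.domain.com/page'},
]

def strip_fbid(url, marker='#fbid'):
    """Return url with marker and everything after it removed."""
    # partition returns (before, marker, after); 'before' is the whole
    # string when the marker is absent, so no membership test is needed.
    return url.partition(marker)[0]

for row in rows:
    row['URL'] = strip_fbid(row['URL'])

print(rows)
```

URLs without the marker pass through unchanged, so the same loop can be applied to every row.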
