Reverse a dictionary - python

I'm trying to write a program for school. I'm a biotech major and this is a required course, but I'm not a programmer. So, this is probably easy for many, but difficult for me. Anyway, I have a text file with about 30 lines. Each line has a movie name listed first and actors who appeared in the movie, separated by commas following. Here's what I have so far:
InputName = input('What is the name of the file? ')
File = open(InputName, 'r+').readlines()
ActorLst = []
for line in File:
    MovieActLst = line.split(',')
    Movie = MovieActLst[0]
    Actors = MovieActLst[1:]
    for actor in Actors:
        if actor not in ActorLst:
            ActorLst.append(actor)
    MovieDict = {Movie: Actors for x in MovieActLst}
    print (MovieDict)
    print(len(MovieDict))
Output (shortened):
What is the name of the file? Movies.txt
{"Ocean's Eleven": ['George Clooney', 'Brad Pitt', 'Elliot Gould', 'Casey Affleck', 'Carl Reiner', 'Julia Roberts', 'Angie Dickinson', 'Steve Lawrence', 'Wayne Newton\n']}
1
{'Up in the Air': ['George Clooney', 'Sam Elliott', 'Jason Bateman\n']}
1
{'Iron Man': ['Robert Downey Jr', 'Jeff Bridges', 'Gwyneth Paltrow\n']}
1
{'The Big Lebowski': ['Jeff Bridges', 'John Goodman', 'Julianne Moore', 'Sam Elliott\n']}
1
I have created a dictionary (MovieDict) that contains a movie name for the key and a list of actors for the values. There are about 30 movie names (keys). I need to figure out how to iterate through this dictionary to essentially reverse it. I want a dictionary that contains an actor as a key and the movies they play in as the values.
However, I think I have created a list of dictionaries as well instead of one dictionary and now I have really confused myself! Any suggestions?

Trivial using collections.defaultdict:
from collections import defaultdict
reverse = defaultdict(list)
for movie, actors in MovieDict.items():
    for actor in actors:
        reverse[actor].append(movie)
The defaultdict class differs from dict in one way: when you try to access a key that does not exist, it creates the key and sets its value to an item created by the factory passed to the constructor (list in the above code). This avoids catching KeyError or checking whether the key is in the dictionary.
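As a quick check, here is the same loop applied to a small MovieDict typed in by hand from the question's output. The sample is trimmed and the trailing newlines are already stripped, so treat it as illustration data only:

```python
from collections import defaultdict

# A small sample built from the question's output, newlines already stripped
MovieDict = {
    "Ocean's Eleven": ['George Clooney', 'Brad Pitt', 'Julia Roberts'],
    'Up in the Air': ['George Clooney', 'Sam Elliott', 'Jason Bateman'],
    'Iron Man': ['Robert Downey Jr', 'Jeff Bridges', 'Gwyneth Paltrow'],
    'The Big Lebowski': ['Jeff Bridges', 'John Goodman', 'Sam Elliott'],
}

reverse = defaultdict(list)
for movie, actors in MovieDict.items():
    for actor in actors:
        # A missing actor key is created with an empty list, then appended to
        reverse[actor].append(movie)

print(reverse['George Clooney'])  # ["Ocean's Eleven", 'Up in the Air']
print(reverse['Sam Elliott'])     # ['Up in the Air', 'The Big Lebowski']
```

Each actor ends up as a key exactly once, and the movie lists come out in the order the movies were inserted into MovieDict (Python 3.7+ dicts keep insertion order).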
Putting this with Steven Rumbalski's loop results in:
from collections import defaultdict
in_fname = input('What is the name of the file? ')
movie_to_actors = {}
actors_to_movie = defaultdict(list)
# 'with' closes the file automatically; read-only mode is enough here
with open(in_fname) as in_file:
    for line in in_file:
        # assumes Python 3:
        movie, *actors = line.strip().split(',')
        # in Python 2 you can do: actors = line.strip().split(','); movie = actors.pop(0)
        movie_to_actors[movie] = list(actors)
        for actor in actors:
            actors_to_movie[actor].append(movie)
Some explanations about the code above.
Iterating over the lines of a file
File objects are iterable, so you can loop over them directly.
This means you can do:
for line in open('filename'):
instead of:
for line in open('filename').readlines():
(Also, the latter reads the whole file into memory and then splits the content into lines, while iterating over the file object reads one line at a time, so you may save a lot of RAM with big files.)
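A small illustration of iterating over a file, using io.StringIO as a stand-in for a real file (an assumption, so the snippet runs without Movies.txt). Each line you get back still ends in '\n', which is why the question's output shows 'Wayne Newton\n' and why the code above calls strip():

```python
import io

# io.StringIO behaves like an open text file, so it can stand in for Movies.txt
fake_file = io.StringIO("Iron Man,Robert Downey Jr,Jeff Bridges\n"
                        "Up in the Air,George Clooney,Sam Elliott\n")

raw_lines = []
clean_lines = []
for line in fake_file:           # one line per iteration, not the whole file at once
    raw_lines.append(line)       # the trailing '\n' is still part of the string
    clean_lines.append(line.strip().split(','))

print(repr(raw_lines[0]))        # 'Iron Man,Robert Downey Jr,Jeff Bridges\n'
print(clean_lines[1])            # ['Up in the Air', 'George Clooney', 'Sam Elliott']
```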
Tuple unpacking
To "unpack" a sequence into different variables you can use the "tuple unpacking" syntax:
>>> a,b = (0,1)
>>> a
0
>>> b
1
In Python 3 the syntax was extended (PEP 3132) to allow gathering a variable number of values into a single variable.
For example:
>>> head, *tail = (1, 2, 3, 4, 5)
>>> head
1
>>> tail
[2, 3, 4, 5]
>>> first, *mid, last = (0, 1, 2, 3, 4, 5)
>>> first
0
>>> mid
[1, 2, 3, 4]
>>> last
5
You can have only one "starred expression", so this does not work:
>>> first, *mid, center, *mid2, last =(0,1,2,3,4,5)
File "<stdin>", line 1
SyntaxError: two starred expressions in assignment
So basically, when there is a starred name on the left-hand side, Python puts into it everything that it wasn't able to assign to the other variables. Note that this means the variable may refer to an empty list:
>>> first, *mid, last = (0,1)
>>> first
0
>>> mid
[]
>>> last
1
Using defaultdict
defaultdict lets you give a default value to non-existent keys.
The class accepts a callable (roughly, a function or a class) as its argument and calls it to build a default value every time one is required:
>>> def factory():
...     print("Called!")
...     return None
...
>>> mydict = defaultdict(factory)
>>> mydict['test']
Called!
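To see exactly when the factory runs, here is a variation of the example above (the `calls` counter is only for illustration). The factory is called only when a missing key is read with `[]`; the `in` test and `.get()` never call it and never add keys:

```python
from collections import defaultdict

calls = 0

def factory():
    # Count how many times defaultdict asks for a default value
    global calls
    calls += 1
    return []

d = defaultdict(factory)

d['a'].append(1)      # 'a' is missing: factory() runs, then the new list is appended to
d['a'].append(2)      # 'a' exists now: factory() is not called again
present = 'b' in d    # membership test: no factory call, no new key
got = d.get('b')      # .get() also bypasses the factory and returns None

print(calls)          # 1
print(dict(d))        # {'a': [1, 2]}
print(present, got)   # False None
```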

reverse = {}
for key in MovieDict:
    for actor in MovieDict[key]:
        try:
            # list.append() works in place and returns None, so don't reassign its result
            reverse[actor].append(key)
        except KeyError:
            # first movie seen for this actor: start a new list holding that movie
            reverse[actor] = [key]
print(reverse)  # print() is a function in Python 3
That should do it.

Programming is about abstracting things, so try to write code in a way that doesn't depend on the specific problem. For example:
def csv_to_dict(seq, separator=','):
    dct = {}
    for item in seq:
        data = [x.strip() for x in item.split(separator)]
        if len(data) > 1:
            dct[data[0]] = data[1:]
    return dct
def flip_dict(dct):
    rev = {}
    for key, vals in dct.items():
        for val in vals:
            if val not in rev:
                rev[val] = []
            rev[val].append(key)
    return rev
Note how these two functions don't "know" anything about "input files", "actors", "movies" and so on, but still are able to solve your problem with two lines of code:
with open("movies.txt") as fp:
    print(flip_dict(csv_to_dict(fp)))
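Because csv_to_dict only needs an iterable of strings, you can try both functions without creating movies.txt. In this sketch a plain list of lines stands in for the file (the sample lines are illustrative), and the two functions are repeated so the snippet runs on its own:

```python
def csv_to_dict(seq, separator=','):
    dct = {}
    for item in seq:
        data = [x.strip() for x in item.split(separator)]
        if len(data) > 1:            # skips blank or single-field lines
            dct[data[0]] = data[1:]
    return dct

def flip_dict(dct):
    rev = {}
    for key, vals in dct.items():
        for val in vals:
            if val not in rev:
                rev[val] = []
            rev[val].append(key)
    return rev

# A list of strings behaves like the lines of an open file here
lines = [
    "Iron Man, Robert Downey Jr, Jeff Bridges\n",
    "The Big Lebowski, Jeff Bridges, John Goodman\n",
    "\n",                            # a blank line, ignored by csv_to_dict
]
movies = csv_to_dict(lines)
actors = flip_dict(movies)
print(actors['Jeff Bridges'])        # ['Iron Man', 'The Big Lebowski']
```

The strip() inside csv_to_dict also removes the spaces after each comma, so " Jeff Bridges" and "Jeff Bridges" end up as the same key.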

InputName = input('What is the name of the file? ')
with open(InputName, 'r') as f:
    actors_by_movie = {}
    movies_by_actor = {}
    for line in f:
        movie, *actors = line.strip().split(',')
        actors_by_movie[movie] = actors
        for actor in actors:
            movies_by_actor.setdefault(actor, []).append(movie)
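`dict.setdefault(key, default)` returns the value already stored under key, or inserts default and returns it when the key is missing; that is why the `.append()` in the loop above always has a list to work on. A tiny illustration with sample names:

```python
movies_by_actor = {}

# First call: key missing, so the empty list is inserted and returned
movies_by_actor.setdefault('Sam Elliott', []).append('Up in the Air')
# Second call: key present, so the stored list is returned and the new [] is discarded
movies_by_actor.setdefault('Sam Elliott', []).append('The Big Lebowski')

print(movies_by_actor)  # {'Sam Elliott': ['Up in the Air', 'The Big Lebowski']}
```

One difference from defaultdict: the default `[]` is built on every call, even when the key already exists, but it needs no import and the result is an ordinary dict.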

Per your naming conventions:
from collections import defaultdict
InputName = input('What is the name of the file? ')
File = open(InputName, 'rt').readlines()
ActorLst = []
ActMovieDct = defaultdict(list)
for line in File:
    MovieActLst = line.strip().split(',')
    Movie = MovieActLst[0]
    Actors = MovieActLst[1:]
    for actor in Actors:
        ActMovieDct[actor].append(Movie)
# print results
for actor, movies in ActMovieDct.items():
    print(actor, movies)

Related

Python - How do i build a dictionary from a text file?

For the class Data Structures and Algorithms at Tilburg University, I got this question in an in-class test:
Build a dictionary from testfile.txt with only unique keys, where if a key appears again, its value should be added to the total sum for that product class.
The text file looked like this (it was not a .csv file):
apples,1
pears,15
oranges,777
apples,-4
oranges,222
pears,1
bananas,3
So apples will be -3, and the output would be {"apples": -3, "oranges": 999, ...}
In the exams I am not allowed to import any external packages besides the usual ones (pcinput, math, etc.), and I am also not allowed to use the internet.
I have no idea how to accomplish this, and it seems to be a big gap in my Python skills: this kind of question is not covered in a 'dictionaries in Python' video on YouTube (maybe it would be too hard), but it is not covered in an expert course either, because there it would be too simple.
Hope you guys can help!
from collections import Counter
from sys import exit
from os.path import exists, isfile
## I did not finish it, but what I wanted to achieve was to build a list of the
## strings and their corresponding integers, then use the Counter method to add
## them together,
## by splitting each string at the comma.
filename = input("filename voor input: ")  # Dutch: "filename for input: "
if not isfile(filename):
    print(filename, "bestaat niet")  # Dutch: "does not exist"
    exit()
keys = []
values = []
with open(filename) as f:
    xs = f.read().split()
for i in xs:
    keys.append([i])
print(keys)
my_dict = {}
for i in range(len(xs)):
    my_dict[xs[i]] = xs.count(xs[i])
print(my_dict)
word_and_integers_dict = dict(zip(keys, values))
print(word_and_integers_dict)
values2 = my_dict.split(",")
for j in values2:
    print( value2 )
The output is this:
[['schijndel,-3'], ['amsterdam,0'], ['tokyo,5'], ['tilburg,777'], ['zaandam,5']]
{'zaandam,5': 1, 'tilburg,777': 1, 'amsterdam,0': 1, 'tokyo,5': 1, 'schijndel,-3': 1}
{}
So I got the dictionary from it, but I did not separate the values.
The error message is this:
28 values2 = my_dict.split(",") <-- here was the error
29 for j in values2:
30 print( value2 )
AttributeError: 'dict' object has no attribute 'split'
I don't understand what your code is actually doing; I think you don't know what your variables contain. But this is an easy problem to solve in Python: split into a list, split each item again, and add up the numbers:
>>> data = "apples,1 pears,15 oranges,777 apples,-4 oranges,222 pears,1 bananas,3"
>>> parts = data.split()
>>> parts
['apples,1', 'pears,15', 'oranges,777', 'apples,-4', 'oranges,222', 'pears,1', 'bananas,3']
Then split again. Behold the list comprehension: this is an idiomatic way to transform one list into another in Python. Note that the numbers are still strings, not ints.
>>> pairs = [s.split(',') for s in parts]
>>> pairs
[['apples', '1'], ['pears', '15'], ['oranges', '777'], ['apples', '-4'], ['oranges', '222'], ['pears', '1'], ['bananas', '3']]
Now you want to iterate over pairs, and sum all the same fruits. This calls for a dict:
>>> result = {}
>>> for fruit, countstr in pairs:
...     if fruit not in result:
...         result[fruit] = 0
...     result[fruit] += int(countstr)
...
>>> result
{'apples': -3, 'pears': 16, 'oranges': 999, 'bananas': 3}
This pattern of adding an element if it doesn't exist yet comes up frequently. You should check out defaultdict in the collections module; if you use it, you don't even need the if.
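For comparison, a sketch of the same summation with `collections.defaultdict(int)`: calling `int()` with no argument returns 0, so every new fruit starts at zero. collections is part of Python's standard library rather than an external package, though whether your exam allows it is up to your instructor. The data string is repeated here so the snippet runs on its own:

```python
from collections import defaultdict

data = "apples,1 pears,15 oranges,777 apples,-4 oranges,222 pears,1 bananas,3"

# int() returns 0, so each new fruit starts with a total of 0
result = defaultdict(int)
for pair in data.split():
    fruit, countstr = pair.split(',')
    result[fruit] += int(countstr)

print(dict(result))  # {'apples': -3, 'pears': 16, 'oranges': 999, 'bananas': 3}
```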
Let's walk through what you need to do. First, check that the file exists and read its contents into a variable. Second, parse each line: split the line on the comma, convert the number from a string to an integer, and then add the value to a dictionary. In this case I would recommend using defaultdict from collections, but we can also do it with a standard dictionary, as below.
from os.path import isfile
from sys import exit
filename = input("filename voor input: ")  # Dutch: "filename for input: "
if not isfile(filename):
    print(filename, "bestaat niet")  # Dutch: "does not exist"
    exit()
# this reads the file to a list, removing newline characters
with open(filename) as f:
    line_list = [x.strip() for x in f]
# create a dictionary
my_dict = {}
# update the value in the dictionary if it already exists,
# otherwise add it to the dictionary
for line in line_list:
    if not line:
        continue  # skip blank lines
    k, v_str = line.split(',')
    if k in my_dict:
        my_dict[k] += int(v_str)
    else:
        my_dict[k] = int(v_str)
# print the dictionary
table_str = '{:<30}{}'
print(table_str.format('Item', 'Count'))
print('=' * 35)
for k, v in sorted(my_dict.items()):
    print(table_str.format(k, v))

How do I skip part of each line while reading from a file in Python?

I am new to Python and can't figure out how to accomplish this.
Suppose file.txt contains following
iron man 1
iron woman 2
man ant 3
woman wonder 4
I want to read this file into a dictionary in the format below:
dict = { 'iron' : ['man', 'woman'], 'man' : ['ant'], 'woman' : ['wonder'] }
That is, the last part of each line is omitted when writing to the dictionary.
My second question: can I read this file into a dictionary such that
dict2 = { 'iron' : [('man', '1'), ('woman', '2')], 'man' : [('ant', '3')], 'woman' : [('wonder', '4')] } .
That is, the key iron will have 2 values, each of which is a tuple.
The second question is for an implementation of uniform cost search, so that I can access iron's children man and woman, with costs 1 and 2 for those children.
Thank you in advance
Here are both parts of your question. Since you're new to Python, just spend some time with it and you will learn how it behaves.
with open('file.txt', 'r') as data:
    k = data.read()
lst = k.splitlines()
print(lst)
dic = {}
dic2 = {}
for i in lst:
    p = i.split()
    if not p:
        continue  # skip blank lines
    if p[0] in dic:
        dic[p[0]].append(p[1])            # the child, not the cost
        dic2[p[0]].append((p[1], p[2]))
    else:
        dic[p[0]] = [p[1]]
        dic2[p[0]] = [(p[1], p[2])]
print(dic)
print(dic2)
You can use collections.defaultdict.
For your first question:
import collections
my_dict = collections.defaultdict(list)
with open('your_file') as f:
    for line in f:
        line = line.strip().split()
        if line:  # skip blank lines
            my_dict[line[0]].append(line[1])
print(my_dict)
Using the example above, you should be able to solve the second question.
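For the second question, the same idea works with a tuple in place of a single word. Below is a sketch that uses the sample lines from the question (a list stands in for file.txt). Converting the cost to int is an assumption on my part, chosen because uniform cost search needs to add costs together; keep the string if you really want ('man', '1'):

```python
import collections

# Stand-in for the lines of file.txt
lines = ["iron man 1", "iron woman 2", "man ant 3", "woman wonder 4"]

children = collections.defaultdict(list)  # first question: parent -> [child, ...]
edges = collections.defaultdict(list)     # second question: parent -> [(child, cost), ...]
for line in lines:
    parts = line.split()
    if len(parts) != 3:
        continue                          # skip blank or malformed lines
    parent, child, cost = parts
    children[parent].append(child)
    edges[parent].append((child, int(cost)))  # int so costs can be summed

print(dict(children))  # {'iron': ['man', 'woman'], 'man': ['ant'], 'woman': ['wonder']}
print(edges['iron'])   # [('man', 1), ('woman', 2)]
```

In a search loop, `edges[node]` then gives the children of node together with the cost of reaching each one.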
For the first question, what you want to do is read each line, split it on whitespace, and use a simple if rule to control how you add it to your dictionary:
my_dict = {}
with open('file.txt', 'r') as f:
    content = f.read()
for line in content.splitlines():
    item = line.split()
    if not item:
        continue  # skip blank lines, e.g. after a final newline
    if item[0] not in my_dict:
        my_dict[item[0]] = [item[1]]
    else:
        my_dict[item[0]].append(item[1])
The second question is pretty much the same, only with a slightly different assignment:
my_dict2 = {}
for line in content.splitlines():
    item = line.split()
    if not item:
        continue  # skip blank lines
    if item[0] not in my_dict2:
        my_dict2[item[0]] = [(item[1], item[2])]
    else:
        my_dict2[item[0]].append((item[1], item[2]))

Creating a program that compares two lists

I am trying to create a program that checks whether items from one list are not in another. It keeps returning lines saying that x value is not in the list. Any suggestions? Sorry about my code, it's quite sloppy.
# Searching Within an Array
# Putting .txt files into arrays
with open('Barcodes', 'r') as f:
    barcodes = [line.strip() for line in f]
with open('EAN Staging', 'r') as f:
    EAN_staging = [line.strip() for line in f]
# Arrays
list1 = barcodes
list2 = EAN_staging
# Main Code
fixed = -1
for x in list1:
    for variable in list1:  # Moves along each variable in the list, in turn
        if list1[fixed] in list2:  # If the term is in the list, then
            fixed = fixed + 1
            location = list2.index(list1[fixed])  # Finds the term in the list
            print ()
            print ("Found", variable ,"at location", location)  # Prints location of terms
Instead of lists, read the files as sets:
with open('Barcodes', 'r') as f:
barcodes = {line.strip() for line in f}
with open('EAN Staging', 'r') as f:
EAN_staging = {line.strip() for line in f}
Then all you need to do is calculate the difference between them:
diff = barcodes - EAN_staging  # or barcodes.difference(EAN_staging)
An extracted example:
a = {1, 2, 3}
b = {3, 4, 5}
print(a - b)
>> {1, 2}  # 1 and 2 are in a but not in b
print(a ^ b)
>> {1, 2, 4, 5}  # symmetric difference: elements in exactly one of the two sets
Note that if you are operating with sets, information about how many times an element is present will be lost. If you care about situations when an element is present in barcodes 3 times, but only 2 times in EAN_staging, you should use Counter from collections.
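If duplicates matter, here is a sketch with `collections.Counter`. Subtracting one Counter from another keeps only the positive counts, so the result says how many extra copies of each barcode the first list has. The barcode strings are made up for the example:

```python
from collections import Counter

barcodes = ['111', '111', '111', '222', '333']
EAN_staging = ['111', '111', '222', '444']

# Counter subtraction drops items whose count would be zero or negative
missing = Counter(barcodes) - Counter(EAN_staging)
print(missing)  # Counter({'111': 1, '333': 1})
```

Here '111' appears three times in barcodes but only twice in EAN_staging, so one copy is reported as missing; a plain set difference would not report it at all.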
Your code doesn't seem to quite answer your question. If all you want to do is see which elements aren't shared, I think sets are the way to go.
set1 = set(list1)
set2 = set(list2)
in_first_but_not_in_second = set1.difference(set2) # outputs a set
not_in_both = set1.symmetric_difference(set2) # outputs a set

Loop through entries in a list and create new list

I have a list of strings that looks like this:
name=['Jack','Sam','Terry','Sam','Henry',.......]
I want to create a new list with the logic shown below. I want to go to every entry in name and assign it a number if the entry is seen for the first time. If it is repeated (as in the case of 'Sam'), I want to assign it the corresponding number, include it in my new list and continue.
newlist = []
name[1] = 'Jack'
Jack = 1
newlist = ['Jack']
name[2] = 'Sam'
Sam = 2
newlist = ['Jack','Sam']
name[3] = 'Terry'
Terry = 3
newlist = ['Jack','Sam','Terry']
name[4] = 'Sam'
Sam = 2
newlist = ['Jack','Sam','Terry','Sam']
name[5] = 'Henry'
Henry = 5
newlist = ['Jack','Sam','Terry','Sam','Henry']
I know this can be done with something like
u,index = np.unique(name,return_inverse=True)
but for me it is important to loop through the individual entries of the list name and keep the logic above. Can someone help me with this?
Try using a dict and checking whether each key is already paired with a value:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    if entry not in vals:
        vals[entry] = i + 1
    i += 1  # counts every entry, so Henry gets 5, not 4
print(vals)
Result:
{'Jack': 1, 'Sam': 2, 'Terry': 3, 'Henry': 5}
Elements can be accessed by "index" (read: key) just like you would do for a list, except the "index" is whatever the key is; in this case, the keys are names.
>>> vals['Henry']
5
EDIT: If order is important, you can enter the items into the dict using the number as the key: in this way, you will know which owner is which based on their number:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    # Check if entry is a repeat
    if entry not in name[0:i]:
        vals[i + 1] = entry
    i += 1
print (vals)
print (vals[5])
This code uses the order in which they appear as the key. To make sure we don't overwrite or create duplicates, it checks if the current name has appeared before in the list (anywhere from 0 up to i, the current index in the name list).
In this way, it is still in the "sorted order" which you want. Instead of accessing items by the name of the owner you simply index by their number. This will give you the order you desire from your example.
Result:
>>> vals
{1: 'Jack', 2: 'Sam', 3: 'Terry', 5: 'Henry'}
>>> vals[5]
'Henry'
If you really want to create variables, you can use globals(). At module level, globals() returns the dictionary Python uses as the lookup table of global variable names and their values, so adding a key and value to it creates a variable. (Don't rely on the same trick with locals() inside a function: changes to the dictionary it returns are not guaranteed to affect the real local variables.)
nl = ['Jack','Sam','Terry','Sam','Henry']
var = globals()
for i, n in enumerate(nl, 1):
    if n not in var:
        var[n] = i
print(Jack)
1
print(Henry)
5
If the order of the original list is key, may I suggest two data structures, a dictionary and a new list:
nl = ['Jack','Sam','Terry','Sam','Henry']
d = {}
newlist = []
for i, n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
    newlist.append({n: d[n]})
newlist will then be:
[{'Jack': [1]}, {'Sam': [2]}, {'Terry': [3]}, {'Sam': [2]}, {'Henry': [5]}]
To walk it:
for names in newlist:
    for k, v in names.items():
        print('{} is number {}'.format(k, v))
NOTE: This does not make it easy to look up the number based on the name, as others suggested above; that would require more data structure logic. It does, however, keep the original list order while recording the position where each name was first found.
Edit: since order is important to you, use OrderedDict from the collections module. (In Python 3.7 and later, a plain dict also preserves insertion order.)
Use a dictionary. Iterate over your list with a for loop, and then check with an if statement whether the name is already in the dictionary. enumerate gives you the index of each name, but keep in mind that indices start from 0, so to match your question we add 1 to the index, giving the illusion that indexing begins at 1.
import collections
nl = ['Jack','Sam','Terry','Sam','Henry']
d = collections.OrderedDict()
for i, n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
print(d)
Output:
OrderedDict([('Jack', [1]), ('Sam', [2]), ('Terry', [3]), ('Henry', [5])])
EDIT:
The OrderedDict is still a dictionary, so you can use .items() to get the key-value pairs as tuples. The number is essentially stored in a list, so you can do this:
for i in d.items():
    print('{} = {}'.format(i[0], i[1][0]))  # `i[1]` is the list at that position in the tuple; `[0]` then gives its first element.
Output:
Jack = 1
Sam = 2
Terry = 3
Henry = 5
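If what you need in the end is the new list from the question, with each repeat carrying its first number (1, 2, 3, 2, 5), a plain dict can remember the first position of each name while the same loop builds the new list. This is a sketch; first_seen and newlist are illustrative names, and enumerate(..., 1) makes positions count from 1:

```python
name = ['Jack', 'Sam', 'Terry', 'Sam', 'Henry']

first_seen = {}   # name -> position where it first appeared (counting from 1)
newlist = []      # (name, number) pairs in the original order, repeats included
for position, entry in enumerate(name, 1):
    if entry not in first_seen:
        first_seen[entry] = position
    newlist.append((entry, first_seen[entry]))

print(newlist)
# [('Jack', 1), ('Sam', 2), ('Terry', 3), ('Sam', 2), ('Henry', 5)]
```

The dict answers "what is this name's number?", while newlist keeps the original order with repeats.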

Using a list as key-value pair to be inserted

Hello, I have a list that I wish to insert into a dictionary, but not with each element becoming a new entry in the dictionary: the list itself is 2 items long and should be used as a key-value pair.
Or (knowing Python, there are dozens of ways to do something, so maybe this isn't even necessary): the base problem is that I wish to split a string into 2 parts around a delimiter and use the left part as the key and the right part as the value:
for line in file:
    if "=" in line:
        tpair = line.split("=",1)
        constantsMap.update(tpair)
Of course I could do a manual split like:
for line in file:
    if "=" in line:
        p = line.find("=")
        constantsMap[line[:p]] = line[p+1:]
But that doesn't seem idiomatically "Pythonic", so I was wondering if there's a cleaner way?
You can use sequence unpacking here:
key,val = line.split("=", 1)
constantsMap[key] = val
See a demonstration below:
>>> line = "a=1"
>>> constantsMap = {}
>>> key,val = line.split("=", 1)
>>> constantsMap[key] = val
>>> constantsMap
{'a': '1'}
>>>
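Two other options, sketched with an in-memory list instead of the real file (the sample lines are made up). `dict.update()` accepts an iterable of key/value pairs, so wrapping the split result in a list makes the original `update(tpair)` attempt work; `str.partition()` splits at the first '=' and returns an empty separator when there is none, so no separate `"=" in line` check is needed:

```python
lines = ["WIDTH=80\n", "# a comment\n", "URL=http://example.com/?a=b\n"]

constantsMap = {}
for line in lines:
    if "=" in line:
        tpair = line.rstrip("\n").split("=", 1)
        constantsMap.update([tpair])   # a list holding one [key, value] pair

# The same with str.partition(): sep is '' when the line has no '='
constantsMap2 = {}
for line in lines:
    key, sep, val = line.rstrip("\n").partition("=")
    if sep:
        constantsMap2[key] = val

print(constantsMap)  # {'WIDTH': '80', 'URL': 'http://example.com/?a=b'}
```

Both versions split only at the first '=', so values that themselves contain '=' stay intact; rstrip("\n") keeps the file's newline out of the stored values.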
