How to count words in a list of lists? - python

I have a list of lists like this: [['Hello', 'Hi'], ["Hola", "Hi", "Ciao"], ["Ciao", "Hi"]].
What I want to do is to create a dictionary where the keys are every word from my list of lists and the values are the word count of words that appear only in the small lists where my key appeared.
Desired output:
dict = {'Hello': {'Hi':1}, 'Hi': {'Hello':1, 'Hola':1, 'Ciao':2},
'Hola':{'Hi':1, 'Ciao':1}, 'Ciao':{'Hola':1, 'Hi':2}}
Note: I know how to use Python and how to deal with data structures, but I am struggling with
the algorithm. I mean how many loops should I have and what my conditions should be?

Consider just one of your lists: ['Hello', 'Hi']. This produces two "pairs" in your output (Hi -> Hello) and (Hello -> Hi). To process one of the lists we're looking at something like:
for x in l:
    for y in l:
        if x != y:
            [update the count of x -> y]
(You could use itertools.combinations or itertools.permutations (depending on preference) to turn this into one loop.)
So how should we store the counts? As noted, the inner dictionaries are calling out to be instances of Counter, since it's basically a dictionary which defaults to 0 if the key is missing (meaning you don't have to check for the presence of the key, you can just increment the value). It would be convenient if your outer dictionary could be a dictionary that defaults to an empty Counter, which you can accomplish with defaultdict.
I'll leave it to you to write the code to update the counts and process all the lists, but hopefully this is enough to put you on the right path. (Both defaultdict and Counter are in collections. I initially found the defaultdict documentation kind of confusing; you'd create one with counts = defaultdict(Counter).)
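For reference, here is a minimal sketch of how the pieces described above could fit together. It assumes the words inside each inner list are unique, and the names lists, counts and result are only illustrative:

```python
from collections import Counter, defaultdict
from itertools import permutations

lists = [['Hello', 'Hi'], ['Hola', 'Hi', 'Ciao'], ['Ciao', 'Hi']]

# Outer dictionary: a missing word gets an empty Counter automatically.
counts = defaultdict(Counter)

for words in lists:
    # permutations(words, 2) yields every ordered pair taken from one inner list.
    for x, y in permutations(words, 2):
        if x != y:
            counts[x][y] += 1  # Counter treats a missing key as 0

# Convert to plain dictionaries for display.
result = {word: dict(neighbours) for word, neighbours in counts.items()}
print(result)
```

With the sample data from the question, this should reproduce the desired output.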

Related

How to convert the list into a list of dictionaries?

I have a list like this.
ls = ['Size:10,color:red,', 'Size:10,color: blue,']
I want to convert the list into this format.
[{'Size':'10','color':'red'}, {'Size':'10','color': 'blue'}]
What I have tried is:
[dict([pair.split(":", 1)]) for pair in ls]
# It gave me output like this.
[{'Size': '10,color:red,'}, {'Size': '10,color: blue,'}]
This method works if the list is like ['color:blue,'], but it doesn't work properly with the list above.
We can see that for pair in ls in your list comprehension is already doubtful, because elements of ls are not pairs. Each element actually contains a sequence of pairs.
There will be two loops needed here, one to iterate the outer list, and then another one to iterate within each value, since those values are actually strings consisting of multiple fields.
While this is possible with a nested list comprehension, it will be easier (and more readable) if you break the problem down into simpler parts rather than trying to fit it all into one line.
result = []
for text in ls:
    d = {}
    pairs = text.strip(",").split(",")
    for pair in pairs:
        key, val = pair.split(":")
        d[key] = val.strip()
    result.append(d)
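For comparison, the nested-comprehension version mentioned above might look roughly like this. It is only a sketch; stripping the key and splitting on the first colon only are extra precautions I've added, not requirements of the question:

```python
ls = ['Size:10,color:red,', 'Size:10,color: blue,']

result = [
    {
        key.strip(): val.strip()
        # inner generator: split each "key:value" field on the first colon only
        for key, val in (pair.split(":", 1) for pair in text.strip(",").split(","))
    }
    for text in ls  # outer loop: one dictionary per string in the list
]
print(result)
```

It does the same work as the explicit loops, but most readers will find the loop version easier to follow.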

Python 3 - list inside dictionary, if list contains a 3rd item do this

I am very new to Python, so sorry if this is a silly question. I have looked around, but I haven't been able to apply any example I found to my goal.
I have a dictionary with lists inside.
myDict = {'list1': ['item1', 'item2'], 'list2': ['item1', 'item2', 'item3']}
I am taking user input to decide which list to read. I want to print something with each item in the list, but my lists do not contain the same number of items, so I think I need an if statement to say something like:
if list selected has 3 items do this?
If you want to print something with each element of your dictionary you can do:
for k, v in myDict.items():
    for x in v:
        print(x)
This way you don't need to worry about the size.
Now, if you care about finding a specific list, i.e. the one with 3 elements, you can do:
if len(myDict[chosen_list]) > 2:
    print('found list with more than 2 items')
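Putting the two ideas together, a sketch could look like the following. The describe function and the message wording are made up for illustration, and chosen_list stands in for whatever the user typed:

```python
my_dict = {'list1': ['item1', 'item2'], 'list2': ['item1', 'item2', 'item3']}


def describe(chosen_list):
    """Build one message per item, plus an extra one if a third item exists."""
    items = my_dict[chosen_list]
    lines = ["{} contains {}".format(chosen_list, item) for item in items]
    if len(items) > 2:
        # Only reached for lists that really have a third element.
        lines.append("{} has a third item: {}".format(chosen_list, items[2]))
    return lines


# In a real script chosen_list would come from input(); it is hard-coded here.
for line in describe('list2'):
    print(line)
```

The loop handles lists of any length, so the length check is only needed for the part that depends on the third item.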

How to compare a python dictionary key with a part of another dictionary's key? something like a .contains() function

Most of my small-scale project worked fine using dictionaries, so changing it now would basically mean starting over.
Let's say I have two different dictionaries (dict1 and dict2).
One being:
{'the dog': 3, 'dog jumped': 4, 'jumped up': 1, 'up onto': 8, 'onto me': 13}
Second one being:
{'up': 12, 'dog': 22, 'jumped': 33}
I want to find wherever the first word of the first dictionary is equal to the word of the second one. These 2 dictionaries don't have the same length, like in the example. Then after I find them, divide their values.
So what I want to do, sort of using a bit of Java is:
for(int i = 0;i<dict1.length(),i++){
    for(int j = 0;j<dict2.length(),j++){
        if(dict1[i].contains(dict2[j]+" ") // not sure if this works, but this
                                           // would theoretically remove the
                                           // possibility of the word being the
                                           // second part of the 2 word element
            dict1[i] / dict2[j]
What I've tried so far is trying to make 4 different lists. A list for dict1 keys, a list for dict1 values and the same for dict2. Then I've realized I don't even know how to check if dict2 has any similar elements to dict1.
I've tried making an extra value in the dictionary (a sort of index), so it would kind of get me somewhere, but as it turns out dict2.keys() isn't iterable either. Which would in turn have me believe using 4 different lists and trying to compare it somehow using that is very wrong.
Dictionaries don't have any facilities at all to handle parts of keys. Keys are opaque objects. They are either there or not there.
So yes, you would loop over all the keys in the first dictionary, extract the first word, and then test if the other dictionary has that first word as a key:
for key, dict1_value in dict1.items():
    first_word = key.split()[0]  # split on whitespace, take the first result
    if first_word in dict2:
        dict2_value = dict2[first_word]
        print(dict1_value / dict2_value)
So this takes every key in dict1, splits off the first word, and tests if that word is a key in dict2. If it is, get the values and print the result.
If you need to test those first words more often, you could make this a bit more efficient by first building an index from first words to whole keys. Simply store the first word of every key of the first dictionary in a new dictionary:
first_to_keys = {}
for key in dict1:
    first_word = key.split()[0]
    # add key to a set for first_word (and create the set if there is none yet)
    first_to_keys.setdefault(first_word, set()).add(key)
Now first_to_keys is a dictionary of first words, pointing to sets of keys (so if the same first word appears more than once, you get all full keys, not just one of them). Build this index once, and keep it up to date each time you add or remove keys from dict1.
Now you can compare that mapping to the other dictionary:
for matching in first_to_keys.keys() & dict2.keys():
    dict2_value = dict2[matching]
    for dict1_key in first_to_keys[matching]:
        dict1_value = dict1[dict1_key]
        print(dict1_value / dict2_value)
This uses the keys from two dictionaries as sets; the dict.keys() object is a dictionary view that lets you apply set operations. & gives you the intersection of the two dictionary key sets, so all keys that are present in both.
You only need to use this second option if you need to get at those first words more often. It gives you a quick path in the other direction, so you could loop over dict2, and quickly go back to the first dictionary again.
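As a rough end-to-end sketch of the index approach with the sample data from the question (the ratios dictionary is my own addition, so the results are collected instead of printed):

```python
dict1 = {'the dog': 3, 'dog jumped': 4, 'jumped up': 1, 'up onto': 8, 'onto me': 13}
dict2 = {'up': 12, 'dog': 22, 'jumped': 33}

# Index: first word -> set of full dict1 keys starting with that word.
first_to_keys = {}
for key in dict1:
    first_to_keys.setdefault(key.split()[0], set()).add(key)

# Key views support set operations, so & gives the first words found in dict2.
ratios = {}
for matching in first_to_keys.keys() & dict2.keys():
    for dict1_key in first_to_keys[matching]:
        ratios[dict1_key] = dict1[dict1_key] / dict2[matching]

for phrase, ratio in sorted(ratios.items()):
    print(phrase, ratio)
```

Only 'dog jumped', 'jumped up' and 'up onto' end up in ratios, because 'the' and 'onto' are not keys of dict2.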
Here's a solution using the str.startswith method of strings:
for phrase, val1 in dict1.items():
    for word, val2 in dict2.items():
        # the trailing space stops 'up' from matching a phrase like 'upper lip'
        if phrase.startswith(word + " "):
            print(val1 / val2)

Ensure list of dicts has a dict with key for each key in list

Context:
I'm using an Ajax call to return some complex JSON from a python module. I have to use a list of keys and confirm that a list of single-item dicts contains a dict with each key.
Example:
mylist=['this', 'that', 'these', 'those']
mydictlist=[{'this':1},{'that':2},{'these':3}]
How do I know that mydictlist is missing the "those" key? Once I know that, I can append {'those':4} to mydictlist. Simply checking for "those" won't work since the list is dynamic. The data structure cannot change.
Thanks.
The simplest approach is to convert your search list to a set, then use differencing to determine what you're missing:
missing = set(mylist).difference(*mydictlist)
which sets missing to {'those'}.
Since the named set methods can take multiple arguments (and they need not be sets themselves), you can just unpack all the dicts as arguments to difference to subtract all of them from your set of desired keys at once.
If you do need to handle duplicates (to make sure you see each of the keys in mylist at least that many times in mydictlist's keys, so mylist might contain a value twice which must occur twice in the dicts), you can use collections and itertools to get remaining counts:
from collections import Counter
from itertools import chain

c = Counter(mylist)
c.subtract(chain.from_iterable(mydictlist))
# In 3.3+, easiest way to remove 0/negative counts
c = +c
# In pre-3.3 Python, replace c = +c with the line below to get the same effect slightly less efficiently
# c += Counter()
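To make the duplicate case concrete, here is a small sketch with made-up data in which 'those' is required twice but only supplied once (Python 3.3+ assumed for the unary plus):

```python
from collections import Counter
from itertools import chain

# Hypothetical data: 'those' must appear twice, but only one dict provides it.
mylist = ['this', 'that', 'those', 'those']
mydictlist = [{'this': 1}, {'that': 2}, {'those': 3}]

remaining = Counter(mylist)
# Iterating a dict yields its keys, so this subtracts one count per key seen.
remaining.subtract(chain.from_iterable(mydictlist))
# Unary plus drops zero and negative counts, leaving only what is still missing.
remaining = +remaining

print(remaining)
```

What is left in remaining is the number of dicts still needed for each key, here one more dict with the key 'those'.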
The most straightforward way is to iterate over both the containers and check:
for key in mylist:
    if not any(key in dic for dic in mydictlist):
        print(key, "missing")
However, if you have a lot of keys and/or dictionaries, this is not going to be efficient: it iterates over mydictlist once for each element in mylist, which is O(n*m). Instead, consider a set operation:
print(set(mylist).difference(*mydictlist))
The pandas package is a great way to handle list-of-dicts problems. It takes all the keys and makes them column headers; values with the same key populate the same column.
Check this out:
import pandas as pd
mydictlist=[{'this':1},{'that':2},{'these':3}]
# Convert data to a DataFrame
df = pd.DataFrame(mydictlist)
# List all the column header names and check if any of the key words are missing
df.columns

Can I compare the keys of two dictionaries that are not in the same order?

I apologize, this must be a basic question about using dictionaries. I'm learning Python, and my objective is to compare two dictionaries and recover the key and value entries that are identical in both. I understand that the order in dictionaries is not relevant the way it is when working with a list. But I adapted some code to compare my dictionaries and I just wanted to make sure that the order of the dictionaries does not matter.
The code I have written so far is:
def compare_dict(first,second):
    with open('Common_hits_python.txt', 'w') as file:
        for keyone in first:
            for keytwo in second:
                if keytwo == keyone:
                    if first[keyone] == second[keytwo]:
                        file.write(keyone + "\t" + first[keyone] + "\n")
Any recommendations would be appreciated. I apologize for the redundancy in the code above. But if someone could confirm that comparing two dictionaries this way does not require the keys to be in the same order, that would be great. Other ways of writing the function would be really appreciated as well.
Since you loop over both dictionaries and compare all the combinations, no, order doesn't matter. Every key in one dictionary is compared with every key in the other dictionary, eventually.
It is not a very efficient way to test for matching keys, however. Testing if a key is present is as simple as keyone in second, no need to loop over all the keys in second here.
Better still, you can use set intersections instead:
for key, value in first.viewitems() & second.viewitems():
    # loops over all key - value pairs that match in both.
    file.write('{}\t{}\n'.format(key, value))
This uses dictionary view objects; if you are using Python 3, then you can use first.items() & second.items() as dictionaries there return dictionary views by default.
Using dict.viewitems() as a set only works if the values are hashable too, but since you are treating your values as strings when writing to the file I assumed they were.
If your values are not hashable, you'll need to validate that the values match, but you can still use views and intersect just the keys:
for key in first.viewkeys() & second.viewkeys():
    # loops over all keys that match in both.
    if first[key] == second[key]:
        file.write('{}\t{}\n'.format(key, first[key]))
Again, in Python 3, use first.keys() & second.keys() for the intersection of the two dictionaries by keys.
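For Python 3, the two variants could be sketched as below. The function names and sample dictionaries are made up for illustration, and the results are returned rather than written to a file to keep the example short:

```python
def common_items_hashable(first, second):
    """Key/value pairs present in both dicts; values must be hashable."""
    return dict(first.items() & second.items())


def common_items_any(first, second):
    """Same result, but also works when the values are unhashable (e.g. lists)."""
    return {key: first[key]
            for key in first.keys() & second.keys()
            if first[key] == second[key]}


# The key order differs between the two dictionaries on purpose.
first = {'a': '1', 'b': '2', 'c': '3'}
second = {'c': '3', 'a': '1', 'b': 'x'}
print(common_items_hashable(first, second))

first_lists = {'a': [1], 'b': [2]}
second_lists = {'b': [2], 'a': [9]}
print(common_items_any(first_lists, second_lists))
```

Neither function depends on the order in which the keys were inserted.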
Your way of doing it is valid. As you loop through both dictionaries, the order of the keys does not matter.
You could do this instead, to optimize your code.
for keyone in first:
    if keyone in second:  # returns true if keyone is present in second.
        if first[keyone] == second[keyone]:
            file.write(keyone + "\t" + first[keyone] + "\n")
The keys of a dictionary are effectively a set, and Python already has a built-in set type with an efficient intersection method. This will produce a set of keys that are common to both dictionaries:
dict0 = {...}
dict1 = {...}
set0 = set(dict0)
set1 = set(dict1)
keys = set0.intersection(set1)
Your goal is to build a dictionary out of these keys, which can be done with a dictionary comprehension. It will require a condition to keep out the keys that have unequal values in the two original dictionaries:
new_dict = {k: dict0[k] for k in keys if dict0[k] == dict1[k]}
Depending on your intended use for the new dictionary, you might want to copy or deepcopy the old dictionary's values into the new one.
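To illustrate the point about copying, here is a small sketch with made-up dictionaries whose values are lists. The names shared and independent are only for illustration:

```python
from copy import deepcopy

dict0 = {'a': [1, 2], 'b': [3]}
dict1 = {'b': [3], 'a': [1, 2], 'c': [4]}

keys = set(dict0).intersection(dict1)

# Without copying, the new dictionary holds the very same list objects as dict0.
shared = {k: dict0[k] for k in keys if dict0[k] == dict1[k]}
# With deepcopy, the new dictionary gets its own independent lists.
independent = {k: deepcopy(dict0[k]) for k in keys if dict0[k] == dict1[k]}

dict0['a'].append(99)  # mutate the original after building both dictionaries

print(shared['a'])       # reflects the change made through dict0
print(independent['a'])  # unaffected by the change
```

If the values are immutable (strings, numbers, tuples of those), the copy makes no practical difference and can be skipped.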
