How to add a string to all integer values of a dictionary - python

I have made a dictionary with the following code. I want to add the string "English" to all values, but since the values are integers, this is not accepted.
key = ["I", "you", "we", "us", "they", "their"]
value = list(range(len(key)))
dictionary = dict(zip(key,value))
print(dictionary)
Output:
{'they': 4, 'I': 0, 'you': 1, 'we': 2, 'us': 3, 'their': 5}
I want the following output:
output = {'they': 'English 4', 'I': 'English 0', 'you': 'English 1', 'we': 'English 2', 'us': 'English 3', 'their': 'English 5'}

You mean having two different fields for every value (one is the index, one is the language)? To do that, you just turn each value into a tuple instead of a single value.
So value should contain ((0, "English"), (1, "English"), ... (len(key) - 1, "English"))
You can do that easily with enumerate:
value = enumerate(["English"] * len(key))
Output:
{'their': (5, 'English'), 'you': (1, 'English'), 'us': (3, 'English'), 'I': (0, 'English'), 'they': (4, 'English'), 'we': (2, 'English')}
(As you might have realized, enumerate(a) returns every item of a with its index attached, i.e. ((0, a[0]), (1, a[1]), (2, a[2]), ...).)

You can use a list comprehension. Change the code to:
value = range(len(key))
dictionary = dict(zip(key, ['English {}'.format(i) for i in value]))
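If the dictionary already exists, a dict comprehension over its items is another option. The following is only a minimal sketch; the name labeled is made up for illustration, and the f-string needs Python 3.6 or later.

```python
key = ["I", "you", "we", "us", "they", "their"]
dictionary = dict(zip(key, range(len(key))))

# Build a new dict whose values are the old integers prefixed with "English ".
labeled = {word: f"English {number}" for word, number in dictionary.items()}
print(labeled)
```

This leaves the original dictionary untouched and returns a new one with string values.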

Related

Build a dictionary with the words of a sentence as keys and the number of position of the words from 1 as values in python

I expect this output from the code below:
{'Tell': 1, 'a': 2, 'little': 3, 'more': 4, 'about': 5, 'yourself': 6, 'as': 7, 'a': 8, 'developer': 9}
But I get this output:
{'Tell': 1, 'a': 8, 'little': 3, 'more': 4, 'about': 5, 'yourself': 6, 'as': 7, 'developer': 9}
This is the code:
sentence = 'Tell a little more about yourself as a developer'
list_words = sentence.split()
d = {word: i for i, word in enumerate(list_words, 1)}
print(d)
What do you think is the problem? What is the code that gives the output I want?
You cannot have two identical keys in a dictionary so it is impossible to get your expected result where 'a' is present twice (once for 'a':2 and again for 'a':8).
Your output data structure could be a list of tuples instead of a dictionary:
r = [(word,i) for i,word in enumerate(list_words,1)]
[('Tell', 1), ('a', 2), ('little', 3), ('more', 4), ('about', 5),
('yourself', 6), ('as', 7), ('a', 8), ('developer', 9)]
Or, it could be a dictionary with a list of positions for each word:
d = dict()
for i,word in enumerate(list_words,1):
    d.setdefault(word,[]).append(i)
{'Tell': [1], 'a': [2, 8], 'little': [3], 'more': [4],
'about': [5], 'yourself': [6], 'as': [7], 'developer': [9]}
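The same grouping of positions per word can also be sketched with collections.defaultdict, which creates the empty list automatically the first time a word is seen. The variable name positions is illustrative only.

```python
from collections import defaultdict

sentence = 'Tell a little more about yourself as a developer'

# Each missing key starts out as an empty list, so we can append right away.
positions = defaultdict(list)
for i, word in enumerate(sentence.split(), 1):
    positions[word].append(i)

print(dict(positions))
```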
You need to access the index of each list item to get the order of the words in your sentence.
sentence = 'Tell a little more about yourself as a developer'
list_words = sentence.split()
words = [(value, index+1) for index, value in enumerate(list_words)]
print(words)
#output
[('Tell', 1), ('a', 2), ('little', 3), ('more', 4), ('about', 5), ('yourself', 6), ('as', 7), ('a', 8), ('developer', 9)]
Your requested output is a dictionary, but in a specific order. Python dictionaries don't support duplicate keys (a, a), which creates problems with getting this output. One way around this is to use the position as the key and the word as the value:
sentence = 'Tell a little more about yourself as a developer'
list_words = sentence.split()
words = [(value, index+1) for index, value in enumerate(list_words)]
dict_words = {}
for item in words:
    # key is the position, value is the word
    dict_words.update({item[1]:item[0]})
print(dict_words)
#output
{1: 'Tell', 2: 'a', 3: 'little', 4: 'more', 5: 'about', 6: 'yourself', 7: 'as', 8: 'a', 9: 'developer'}
sentence = 'Tell a little more about yourself as a developer'
list_words = sentence.split()
uniquewords = list(set(list_words))
# count how many times each unique word occurs in the sentence
d = {i: 0 for i in uniquewords}
for i in list_words:
    for j in uniquewords:
        if i == j:
            d[j] += 1
print(d)
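The snippet above counts how often each word occurs rather than recording positions. If a count is what you are after, here is a minimal sketch with collections.Counter from the standard library, which produces the same tallies in one pass. The name counts is illustrative.

```python
from collections import Counter

sentence = 'Tell a little more about yourself as a developer'

# Counter tallies how many times each word appears in the list.
counts = Counter(sentence.split())
print(counts)
```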
Maybe you printed the index of the letters instead of the index of the words.
You can try:
sentence = 'Tell a little more about yourself as a developer'
words_list = sentence.split()
words_dictionary = dict()
for word in words_list:
    words_dictionary[word] = words_list.index(word) + 1
print(words_dictionary)
#output :
# {'Tell': 1, 'a': 2, 'little': 3, 'more': 4, 'about': 5, 'yourself': 6, 'as': 7, 'developer': 9}
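Note that list.index returns the first position of a word, so a repeated word such as 'a' keeps its earliest position, and every call rescans the list from the start. A minimal sketch that gets the same result in a single pass, using dict.setdefault (the name first_position is made up for illustration):

```python
sentence = 'Tell a little more about yourself as a developer'

# setdefault only stores the position the first time a word is seen,
# so later duplicates do not overwrite it.
first_position = {}
for position, word in enumerate(sentence.split(), 1):
    first_position.setdefault(word, position)

print(first_position)
```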

Converting a dictionary to a mathematical expression

{'YOU': {'HE': {'EST': 8, 'OLM': 6}, 'SLO': {'WLR': 8}},
'ARE': {'KLP': {'EST': 6}, 'POL': {'WLR': 4}},
'DOING': {'TIS': {'OIL': 8}},
'GREAT': {'POL': {'EOL': 6}},
'WORK': {'KOE': {'RIW': 8, 'PNG': 4}, 'ROE': {'ERC': 8, 'WQD': 6}},
'KEEP': {'PAR': {'KOM': 8, 'RTW': 6}, 'PIL': {'XCE': 4, 'ACE': 8}},
'ROCKING': {'OUL': {'AZS': 6, 'RVX': 8}}}
I need to perform a calculation on the numbers in the dictionary.
E.g., for
{'YOU': {'HE': {'EST': 8, 'OLM': 6}, 'SLO': {'WLR': 8}},
'WORK': {'KOE': {'RIW': 8, 'PNG': 4}, 'ROE': {'ERC': 8, 'WQD': 6}}}
the output would be:
[(8+6)x8]+[(8+4)x(8+6)]
[14x8]+[12x14]
112+168
280
The following is the code I tried:
a = [tuple([k]+list(v.keys())+list(j.values())) for k,v in data.items() for i,j in v.items()]
and it gives:
[('YOU', 'HE', 'SLO', 8, 6),
('YOU', 'HE', 'SLO', 8),
('ARE', 'KLP', 'POL', 6),
('ARE', 'KLP', 'POL', 4),
('DOING', 'TIS', 8),
('GREAT', 'POL', 6),
('WORK', 'KOE', 'ROE', 8, 4),
('WORK', 'KOE', 'ROE', 8, 6),
('KEEP', 'PAR', 'PIL', 8, 6),
('KEEP', 'PAR', 'PIL', 4, 8),
('ROCKING', 'OUL', 6, 8)]
The rules aren't well-defined, but I'll give it a shot anyway. I am assuming you only want this calculation to apply to keys YOU and WORK in your nested dictionary. I think a list comprehension will get pretty complicated, and it's more readable to work with loops.
For each of the keys YOU and WORK, I summed each of the innermost sets of values (8+6 and 8 for YOU; 8+4 and 8+6 for WORK), multiplied these sums together (14*8 for YOU and 12*14 for WORK), then added the products together to get the result, 280.
dict_nested = {'YOU': {'HE': {'EST': 8, 'OLM': 6}, 'SLO': {'WLR': 8}},
'ARE': {'KLP': {'EST': 6}, 'POL': {'WLR': 4}},
'DOING': {'TIS': {'OIL': 8}},
'GREAT': {'POL': {'EOL': 6}},
'WORK': {'KOE': {'RIW': 8, 'PNG': 4}, 'ROE': {'ERC': 8, 'WQD': 6}},
'KEEP': {'PAR': {'KOM': 8, 'RTW': 6}, 'PIL': {'XCE': 4, 'ACE': 8}},
'ROCKING': {'OUL': {'AZS': 6, 'RVX': 8}}}
keys = ['YOU','WORK']
result = 0
for key in keys:
    inner_keys = dict_nested[key].keys()
    # multiply together the sums of the innermost values for each inner key
    inner_product = 1
    for inner_key in inner_keys:
        inner_product *= sum(list(dict_nested[key][inner_key].values()))
        # print(inner_product)
    result += inner_product
Output:
>>> result
280
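The same sum-of-products rule, as interpreted in this answer, can be sketched more compactly as a function. The name sum_of_products is hypothetical, and math.prod requires Python 3.8 or later.

```python
from math import prod

def sum_of_products(nested, keys):
    """For each selected top-level key, multiply the sums of its innermost
    dicts together, then add up those products."""
    return sum(
        prod(sum(leaf.values()) for leaf in nested[key].values())
        for key in keys
    )

data = {'YOU': {'HE': {'EST': 8, 'OLM': 6}, 'SLO': {'WLR': 8}},
        'WORK': {'KOE': {'RIW': 8, 'PNG': 4}, 'ROE': {'ERC': 8, 'WQD': 6}}}

total = sum_of_products(data, ['YOU', 'WORK'])
print(total)
```

Passing the keys as an argument keeps the assumption explicit that only some top-level keys take part in the calculation.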
NOTE
Avoid eval if at all possible; it is insecure ("eval is evil").
For more details about eval harmfulness (there are too many, I've just cherry-picked one) read here.
Some Inspiration Towards a Solution
As others smarter than me have noted, I haven't found any reasonable explanation of how the operators are assigned in the example you've provided.
However, here is a small attempt; I hope it helps you with the challenge.
So here you go:
import json
d = {'YOU': {'HE': {'EST': 8, 'OLM': 6}, 'SLO': {'WLR': 8}}, 'WORK': {'KOE': {'RIW': 8, 'PNG': 4}, 'ROE': {'ERC': 8, 'WQD': 6}}}
# Convert dictionary to a string
r = json.dumps(d)
# Convert string to char list
chars = list(r)
# Legal chars needed for computing
legal_chars = ['{', '}', ','] + [str(d) for d in range(10)]
# Filtering in only legal chars
filtered_chars = [x for x in chars if x in legal_chars]
# Replacing the {} with () and , with +
expression = ''.join(filtered_chars).replace('{', '(').replace('}', ')').replace(',', '+')
# Evaluating expression
result = eval(expression)
# (((8+6)+(8))+((8+4)+(8+6)))=48
print(f'{expression}={result}')
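The expression built above simply adds up every number in the nested dictionary, so the same total can be obtained without eval. A minimal sketch with a recursive helper follows; the name sum_leaves is made up for illustration.

```python
def sum_leaves(node):
    """Add up every number found at any depth of a nested dict."""
    if isinstance(node, dict):
        return sum(sum_leaves(child) for child in node.values())
    return node

d = {'YOU': {'HE': {'EST': 8, 'OLM': 6}, 'SLO': {'WLR': 8}},
     'WORK': {'KOE': {'RIW': 8, 'PNG': 4}, 'ROE': {'ERC': 8, 'WQD': 6}}}

total = sum_leaves(d)
print(total)
```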

Check if dict key is a substring of any other element in the dictionary in Python?

I've got a dictionary ngram_list as follows:
ngram_list = dict_items([
('back to back breeding', {'wordcount': 4, 'count': 3}),
('back breeding', {'wordcount': 2, 'count': 5}),
('several consecutive heats', {'wordcount': 3, 'count': 2}),
('how often should', {'wordcount': 3, 'count': 2}),
('often when breeding', {'wordcount': 3, 'count': 1})
])
I want to sort the list from the shortest wordcount to the largest and then loop through the dictionary and if the key is a substring of any other item, delete it (the substring item.)
Expected output:
ngram_list = dict_items([
('several consecutive heats', {'wordcount': 3, 'count': 2}),
('how often should', {'wordcount': 3, 'count': 2}),
('often when breeding', {'wordcount': 3, 'count': 1}),
('back to back breeding', {'wordcount': 4, 'count': 3})
])
First filter the input dict to get rid of unwanted items, then use the sorted function with a key to sort the items by wordcount, and finally build the dict with OrderedDict.
This uses a simple in to check for substrings only; you might need a regex if you want to match on exact word boundaries.
from collections import OrderedDict
ngram_dict = {
'back to back breeding': {'wordcount': 4, 'count': 3},
'back breeding': {'wordcount': 2, 'count': 5},
'several consecutive heats': {'wordcount': 3, 'count': 2},
'how often should': {'wordcount': 3, 'count': 2},
'often when breeding': {'wordcount': 3, 'count': 1}
}
# ngram items with the unwanted items filtered out
ngram_filter = [i for i in ngram_dict.items() if not any(i[0] in k and i[0] != k for k in ngram_dict.keys())]
final_dict = OrderedDict( sorted(ngram_filter, key=lambda x:x[1].get('wordcount')) )
# final_dict = OrderedDict([('several consecutive heats', {'count': 2, 'wordcount': 3}), ('how often should', {'count': 2, 'wordcount': 3}), ('often when breeding', {'count': 1, 'wordcount': 3}), ('back to back breeding', {'count': 3, 'wordcount': 4})])
All of this can be fitted into a one-liner, as below:
from collections import OrderedDict
final_dict = OrderedDict(
sorted((i for i in ngram_dict.items() if not any(i[0] in k and i[0] != k for k in ngram_dict.keys())),
key=lambda x:x[1].get('wordcount')) )
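For the whole-word variant mentioned earlier, one possible sketch uses re.search with \b word boundaries. The helper name contained_as_words is hypothetical, and a plain dict is assumed to keep insertion order (Python 3.7 or later).

```python
import re

ngram_dict = {
    'back to back breeding': {'wordcount': 4, 'count': 3},
    'back breeding': {'wordcount': 2, 'count': 5},
    'several consecutive heats': {'wordcount': 3, 'count': 2},
    'how often should': {'wordcount': 3, 'count': 2},
    'often when breeding': {'wordcount': 3, 'count': 1},
}

def contained_as_words(short, long):
    """True if `short` occurs inside a different phrase `long` on word boundaries."""
    pattern = r'\b' + re.escape(short) + r'\b'
    return short != long and re.search(pattern, long) is not None

# Drop every phrase that appears, as whole words, inside another phrase.
kept = [
    (phrase, info) for phrase, info in ngram_dict.items()
    if not any(contained_as_words(phrase, other) for other in ngram_dict)
]

# sorted is stable, so phrases with equal wordcount keep their original order.
final_dict = dict(sorted(kept, key=lambda item: item[1]['wordcount']))
print(list(final_dict))
```

With word boundaries, a key such as 'back' would no longer count as part of 'backyard breeding', which the plain in check would treat as a match.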

fast way to find occurrences in a list in python

I have a set of unique words called h_unique. I also have a 2D list of documents called h_tokenized_doc which has a structure like:
[ ['hello', 'world', 'i', 'am'],
['hello', 'stackoverflow', 'i', 'am'],
['hello', 'world', 'i', 'am', 'mr'],
['hello', 'stackoverflow', 'i', 'am', 'pycahrm'] ]
and h_unique as:
('hello', 'world', 'i', 'am', 'stackoverflow', 'mr', 'pycharm')
What I want is to find the occurrences of the unique words in the tokenized documents list.
So far I came up with this code but this seems to be VERY slow. Is there any efficient way to do this?
term_id = []
for term in h_unique:
    print(term)
    for doc_id, doc in enumerate(h_tokenized_doc):
        term_id.append([doc_id for t in doc if t == term])
In my case I have a document list of 7000 documents, structured like:
[ [doc1], [doc2], [doc3], ..... ]
It'll be slow because you're running through your entire document list once for every unique word. Why not try storing the unique words in a dictionary and appending to it for each word found?
unique_dict = {term: [] for term in h_unique}
for doc_id, doc in enumerate(h_tokenized_doc):
    for term_id, term in enumerate(doc):
        try:
            # Not sure what structure you want to keep it in here...
            # This stores a tuple of the doc, and position in that doc
            unique_dict[term].append((doc_id, term_id))
        except KeyError:
            # If the term isn't in h_unique, don't do anything
            pass
This runs through all the documents only once.
From your above example, unique_dict would be:
{'pycharm': [], 'i': [(0, 2), (1, 2), (2, 2), (3, 2)], 'stackoverflow': [(1, 1), (3, 1)], 'am': [(0, 3), (1, 3), (2, 3), (3, 3)], 'mr': [(2, 4)], 'world': [(0, 1), (2, 1)], 'hello': [(0, 0), (1, 0), (2, 0), (3, 0)]}
(Of course assuming the typo 'pycahrm' in your example was deliberate)
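If only the document ids are needed rather than the positions within each document, the same single-pass idea can be sketched as a small inverted index. The names index and term_docs are illustrative, and the sample data is copied from the question (including the 'pycahrm' spelling).

```python
from collections import defaultdict

h_tokenized_doc = [['hello', 'world', 'i', 'am'],
                   ['hello', 'stackoverflow', 'i', 'am'],
                   ['hello', 'world', 'i', 'am', 'mr'],
                   ['hello', 'stackoverflow', 'i', 'am', 'pycahrm']]
h_unique = ('hello', 'world', 'i', 'am', 'stackoverflow', 'mr', 'pycharm')

# One pass over the documents: record each document id once per distinct word.
index = defaultdict(list)
for doc_id, doc in enumerate(h_tokenized_doc):
    for term in set(doc):
        index[term].append(doc_id)

# Keep only the terms of interest; terms never seen get an empty list.
term_docs = {term: index.get(term, []) for term in h_unique}
print(term_docs['world'])
```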
term_id.append([doc_id for t in doc if t == term])
This will not append one doc_id for each matching term; it will append an entire list of potentially many identical values of doc_id. Surely you did not mean to do this.
Based on your sample code, term_id ends up as this:
[[0], [1], [2], [3], [0], [], [2], [], [0], [1], [2], [3], [0], [1], [2], [3], [], [1], [], [3], [], [], [2], [], [], [], [], []]
Is this really what you intended?
If I understood correctly, and based on your comment to the question where you say
yes because a single term may appear in multiple docs like in the above case for hello the result is [0,1, 2, 3] and for world it is [0, 2]
it looks like what you want to do is: for each of the words in the h_unique list (which, as mentioned, should be a set, or keys in a dict, which both have O(1) lookup), go through all the lists contained in the h_tokenized_doc variable and find the indexes of the lists in which the word appears.
If that's actually what you want to do, you could do something like the following:
#!/usr/bin/env python
h_tokenized_doc = [['hello', 'world', 'i', 'am'],
['hello', 'stackoverflow', 'i', 'am'],
['hello', 'world', 'i', 'am', 'mr'],
['hello', 'stackoverflow', 'i', 'am', 'pycahrm']]
h_unique = ['hello', 'world', 'i', 'am', 'stackoverflow', 'mr', 'pycharm']
# Initialize a dict with empty lists as the value and the items
# in h_unique the keys
results = {k: [] for k in h_unique}
for i, line in enumerate(h_tokenized_doc):
    for k in results:
        if k in line:
            results[k].append(i)
print(results)
Which outputs:
{'pycharm': [], 'i': [0, 1, 2, 3], 'stackoverflow': [1, 3],
'am': [0, 1, 2, 3], 'mr': [2], 'world': [0, 2],
'hello': [0, 1, 2, 3]}
The idea is to use the h_unique list as keys in a dictionary (the results = {k: [] for k in h_unique} part).
Keys in dictionaries have the advantage of a constant lookup time, which makes results[k] cheap. Note that the if k in line: part still scans the list line, which takes O(n). If the word (the key k) appears in the list, the index of that list within the matrix is appended to the dictionary of results.
I'm not certain this is what you want to achieve, though.
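If the documents are long, one option is to turn each document into a set first, so that the membership test no longer scans a list. This is only a sketch under the assumption that word order inside a document does not matter for the result; doc_sets is an illustrative name.

```python
h_tokenized_doc = [['hello', 'world', 'i', 'am'],
                   ['hello', 'stackoverflow', 'i', 'am'],
                   ['hello', 'world', 'i', 'am', 'mr'],
                   ['hello', 'stackoverflow', 'i', 'am', 'pycahrm']]
h_unique = ['hello', 'world', 'i', 'am', 'stackoverflow', 'mr', 'pycharm']

# Convert every document to a set once; `k in words` is then a hash lookup.
doc_sets = [set(doc) for doc in h_tokenized_doc]

results = {k: [] for k in h_unique}
for i, words in enumerate(doc_sets):
    for k in results:
        if k in words:
            results[k].append(i)

print(results)
```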
You can optimize your code by:
using just a single for loop, and
using generators and dictionaries: dictionaries for constant lookup time, as suggested previously, and generators because they produce values on the fly instead of building an intermediate list.
In [75]: h_tokenized_doc = [ ['hello', 'world', 'i', 'am'],
...: ['hello', 'stackoverflow', 'i', 'am'],
...: ['hello', 'world', 'i', 'am', 'mr'],
...: ['hello', 'stackoverflow', 'i', 'am', 'pycahrm'] ]
In [76]: h_unique = ('hello', 'world', 'i', 'am', 'stackoverflow', 'mr', 'pycharm')
In [77]: term_id = {k: [] for k in h_unique}
In [78]: for term in h_unique:
...: term_id[term].extend(i for i in range(len(h_tokenized_doc)) if term in h_tokenized_doc[i])
which yields the output
{'am': [0, 1, 2, 3],
'hello': [0, 1, 2, 3],
'i': [0, 1, 2, 3],
'mr': [2],
'pycharm': [],
'stackoverflow': [1, 3],
'world': [0, 2]}
A more descriptive solution would be
In [79]: for term in h_unique:
...: term_id[term].extend([(i,h_tokenized_doc[i].index(term)) for i in range(len(h_tokenized_doc)) if term in h_tokenized_doc[i]])
In [80]: term_id
Out[80]:
{'am': [(0, 3), (1, 3), (2, 3), (3, 3)],
'hello': [(0, 0), (1, 0), (2, 0), (3, 0)],
'i': [(0, 2), (1, 2), (2, 2), (3, 2)],
'mr': [(2, 4)],
'pycharm': [],
'stackoverflow': [(1, 1), (3, 1)],
'world': [(0, 1), (2, 1)]}

create nested dictionaries in python3

I would like to create a nested dict in python3. I have the following list (from a SQL query):
[('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music', datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)), ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none', datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)), ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student', datetime.date(2016, 9, 7))]
I would like to have the following output:
{madonna:{city:Portland, State:Oregon, Index: 0.70, Phone:+5551234, art:music, exp-date:2016, 9, 8, arrival-date:datetime.date(2016, 5, 23)},jackson:{city: Laredo, State:Texas........etc...}}
Can somebody show me some easy-to-understand code?
I tried:
from collections import defaultdict
usercheck = defaultdict(list)
for accname, div_ort, standort, raum, telefon, position, exp, dep in cur.fetchall():
    usercheck(accname).append[..]
but this doesn't work, and I can't think any further myself.
You can use a dict comprehension (defined here) to dynamically create a dictionary based on the elements of a list:
import datetime
sql_list = [
    ('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music', datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)),
    ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none', datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)),
    ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student', datetime.date(2016, 9, 7))
]
# the first element of each row becomes the key of the outer dictionary
sql_dict = {
    element[0]: {
        'city': element[1],
        'state': element[2],
        'index': element[3],
        'phone': element[4],
        'art': element[5],
    } for element in sql_list
}
Keep in mind that every item in the dictionary needs to have a key and a value, and in your example you have a few values with no key.
If you have a list of the columns, you can use the zip function:
from collections import defaultdict
import datetime
# list of columns returned from your database query
columns = ["city", "state", "index", "phone", "art", "exp-date", "arrival-date"]
usercheck = defaultdict(list)
for row in cur.fetchall():
    usercheck[row[0]] = defaultdict(list, zip(columns, row[1:]))
print(usercheck)
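This will output a dictionary like the following (shown as printed by Python 2; Python 3 prints <class 'list'> instead of <type 'list'>):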
defaultdict(<type 'list'>, {'madonna': defaultdict(<type 'list'>, {'city': 'Portland', 'art': 'music', 'index': '0.70', 'phone': '+5551234', 'state': 'Oregon', 'arrival-date': datetime.date(2016, 9, 1), 'exp-date': datetime.date(2016, 9, 8)}), 'jackson': defaultdict(<type 'list'>, {'city': 'Laredo', 'art': 'none', 'index': '2.03', 'phone': '+555345', 'state': 'Texas', 'arrival-date': datetime.date(2016, 5, 16), 'exp-date': datetime.date(2016, 5, 23)}), 'bohlen': defaultdict(<type 'list'>, {'city': 'P', 'art': 'PhD Student', 'index': '2.27', 'phone': '+555987', 'state': 'P', 'arrival-date': None, 'exp-date': datetime.date(2016, 9, 7)})})
When using defaultdict, the argument is the factory that creates the default value for a missing key, so for a nested dictionary it should be dict rather than list.
from collections import defaultdict
usercheck = defaultdict(dict)
for accname, div_ort, standort, raum, telefon, position, exp, dep in cur.fetchall():
    usercheck[accname]['city'] = div_ort
    usercheck[accname]['state'] = standort
    ...
The keys in the dictionary are referenced using [key], not (key).
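Since the snippets above depend on a database cursor, here is a self-contained sketch that uses the sample rows from the question in place of cur.fetchall(). It assumes the column names listed below (they are not confirmed by the question) and uses itertools.zip_longest so that a row with a missing trailing value, like 'bohlen', gets None for that column.

```python
import datetime
from itertools import zip_longest

# Sample rows copied from the question; they stand in for cur.fetchall().
rows = [
    ('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music',
     datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)),
    ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none',
     datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)),
    ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student',
     datetime.date(2016, 9, 7)),
]

# Assumed column names for everything after the account name.
columns = ["city", "state", "index", "phone", "art", "exp-date", "arrival-date"]

# Map each account name to a dict of its remaining columns;
# zip_longest fills columns missing from a short row with None.
usercheck = {
    row[0]: dict(zip_longest(columns, row[1:]))
    for row in rows
}

print(usercheck["bohlen"]["arrival-date"])
```

This relies on no row having more values than there are column names; a longer row would end up with a None key.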
