Is it possible to get a partial view of a dict in Python, analogous to pandas df.tail()/df.head()? Say you have a very long dict and you just want to check some of its elements (the beginning, the end, etc.). Something like:
dict.head(3) # To see the first 3 elements of the dictionary.
{[1,2], [2, 3], [3, 4]}
Thanks
Kinda strange desire, but you can get that by using this
from itertools import islice
# Python 2.x
dict(islice(mydict.iteritems(), 0, 2))
# Python 3.x
dict(islice(mydict.items(), 0, 2))
or for short dictionaries
# Python 2.x
dict(mydict.items()[0:2])
# Python 3.x
dict(list(mydict.items())[0:2])
Edit:
in Python 3.x:
Without using libraries it's possible to do it this way. Use the method:
.items()
which returns a view of the dictionary's key/value pairs.
The view has to be converted to a list first, otherwise slicing it raises TypeError: 'dict_items' object is not subscriptable. Slice the list with square brackets, then convert the result back to a dictionary.
dict(list(my_dict.items())[:3])
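The two approaches above can be wrapped into small helpers. This is only a minimal sketch, assuming Python 3.8+ (dicts keep insertion order and items() views support reversed()); the names head and tail are hypothetical, chosen to mirror the pandas methods.

```python
from itertools import islice

def head(d, n=5):
    """Return a new dict with the first n items of d (insertion order)."""
    return dict(islice(d.items(), n))

def tail(d, n=5):
    """Return a new dict with the last n items of d, in their original order."""
    # Walk the items backwards, take n of them, then restore the order.
    last = list(islice(reversed(d.items()), n))
    return dict(reversed(last))

sample = {"a": 1, "b": 2, "c": 3, "d": 4}
print(head(sample, 2))
print(tail(sample, 2))
```

islice never builds the full list of items, which matters for the "very long dict" case in the question.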
import itertools
def glance(d):
    return dict(itertools.islice(d.iteritems(), 3))
>>> x = {1:2, 3:4, 5:6, 7:8, 9:10, 11:12}
>>> glance(x)
{1: 2, 3: 4, 5: 6}
However:
>>> x['a'] = 2
>>> glance(x)
{1: 2, 3: 4, u'a': 2}
Notice that inserting a new element changed what the "first" three elements were in an unpredictable way. This is what people mean when they tell you dicts aren't ordered. You can get three elements if you want, but you can't know which three they'll be.
I know this question is 3 years old, but here is a Pythonic version (maybe simpler than the above methods) for Python 3.*:
[print(v) for i, v in enumerate(my_dict.items()) if i < n]
It will print the first n elements of the dictionary my_dict
One-upping @Neb's solution with a Python 3 dict comprehension:
{k: v for i, (k, v) in enumerate(my_dict.items()) if i < n}
It returns a dict rather than printouts
For those who would rather solve this problem with pandas dataframes. Just stuff your dictionary mydict into a dataframe, rotate it, and get the first few rows:
pd.DataFrame(mydict, index=[0]).T.head()
0 hi0
1 hi1
2 hi2
3 hi3
4 hi4
From the documentation:
CPython implementation detail: Keys and values are listed in an
arbitrary order which is non-random, varies across Python
implementations, and depends on the dictionary’s history of insertions
and deletions.
I've only toyed around at best with other Python implementations (eg PyPy, IronPython, etc), so I don't know for certain if this is the case in all Python implementations, but the general idea of a dict/hashmap/hash/etc is that the keys are unordered.
That being said, you can use an OrderedDict from the collections library. OrderedDicts remember the order of the keys as you entered them.
If the keys are sortable in some way, you can do this:
head = dict([(key, myDict[key]) for key in sorted(myDict.keys())[:3]])
Or perhaps:
head = dict(sorted(mydict.items(), key=lambda x: x[0])[:3])
Where x[0] is the key of each key/value pair.
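If the dict is large and only a few of the smallest keys are needed, heapq.nsmallest avoids sorting everything. A minimal sketch, assuming the keys are mutually comparable; sorted_head and the sample data are hypothetical names used only for illustration.

```python
import heapq
from operator import itemgetter

def sorted_head(d, n=3):
    """Return the n items of d with the smallest keys, as a dict."""
    # nsmallest returns the items in ascending key order.
    return dict(heapq.nsmallest(n, d.items(), key=itemgetter(0)))

my_dict = {"pear": 3, "apple": 1, "fig": 7, "banana": 2}
print(sorted_head(my_dict, 2))  # {'apple': 1, 'banana': 2}
```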
list(reverse_word_index.items())[:10]
Change the number from 10 to however many items of the dictionary reverse_word_index you want to preview
A quick and short solution can be this:
import pandas as pd
d = {"a": [1,2], "b": [2, 3], "c": [3, 4]}
pd.Series(d).head()
a [1, 2]
b [2, 3]
c [3, 4]
dtype: object
This gives back a dictionary:
dict(list(my_dictname.items())[0:n])
If you just want to have a glance of your dict, then just do:
list(freqs.items())[0:n]
Order of items in a dictionary is preserved in Python 3.7+, so this question makes sense.
To get a dictionary with only 10 items from the start you can use pandas:
d = {"a": [1,2], "b": [2, 3], "c": [3, 4]}
import pandas as pd
result = pd.Series(d).head(10).to_dict()
print(result)
This will produce a new dictionary.
d = {"a": 1,"b": 2,"c": 3}
for k, v in list(d.items())[:2]:
    print('{}:{}'.format(k, v))
a:1
b:2
I have a list within a dictionary within a dictionary. The data set is very large. How can I most quickly return the list nested in the two dictionaries if I am given a List that is specific to the key, dict pairs?
{"Dict1":{"Dict2": ['UNIOUE LIST'] }}
Is there an alternate data structure to use for this for efficiency?
I do not believe a more efficient data structure exists in Python. Simply retrieving the list using the regular indexing operator should be a very fast operation, even if both levels of dictionaries are very large.
nestedDict = {"Dict1":{"Dict2": ['UNIOUE LIST'] }}
uniqueList = nestedDict["Dict1"]["Dict2"]
My only thought for improving performance was to try flattening the data structure into a single dictionary with tuples for keys. This would take more memory than the nested approach since the keys in the top-level dictionary will be replicated for every entry in the second-level dictionaries, but it will only compute the hash function once for every lookup. But this approach is actually slower than the nested approach in practice:
nestedDict = {i: {j: ['UNIQUE LIST'] for j in range(1000)} for i in range(1000)}
flatDict = {(i, j): ['UNIQUE LIST'] for i in range(1000) for j in range(1000)}
import random
def accessNested():
    i = random.randrange(1000)
    j = random.randrange(1000)
    return nestedDict[i][j]
def accessFlat():
    i = random.randrange(1000)
    j = random.randrange(1000)
    return flatDict[(i,j)]
import timeit
print(timeit.timeit(accessNested))
print(timeit.timeit(accessFlat))
Output:
2.0440238649971434
2.302736301004188
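The timings above also include the cost of the random.randrange calls. A rough sketch of a variant that picks the keys ahead of time, so only the lookups are timed; the sizes, names, and repeat count here are assumptions, and the numbers will differ from machine to machine. Note that the flat version also skips building the key tuple inside the timed code, so it measures lookup alone.

```python
import random
import timeit

N = 200  # hypothetical size, kept small so the sketch runs quickly
nested = {i: {j: ["UNIQUE LIST"] for j in range(N)} for i in range(N)}
flat = {(i, j): ["UNIQUE LIST"] for i in range(N) for j in range(N)}

# Choose the keys once, outside the timed code.
random.seed(0)
keys = [(random.randrange(N), random.randrange(N)) for _ in range(10_000)]

def access_nested():
    for i, j in keys:
        nested[i][j]

def access_flat():
    for k in keys:
        flat[k]

nested_time = timeit.timeit(access_nested, number=20)
flat_time = timeit.timeit(access_flat, number=20)
print(nested_time, flat_time)
```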
The fastest way to access the list within the nested dictionary is,
d = {"Dict1":{"Dict2": ['UNIOUE LIST'] }}
print(d["Dict1"]["Dict2"])
Output :
['UNIOUE LIST']
If you need to iterate over the list inside the nested dictionary, you can use the following code as an example:
d = {"a":{"b": ['1','2','3','4'] }}
for i in d["a"]["b"]:
    print(i)
Output :
1
2
3
4
If I understand correctly, you want to access a nested dictionary structure if...
if I am given a List that is specific to the key
So, here you have a sample dictionary and key that you want to access
d = {'a': {'a': 0, 'b': 1},
'b': {'a': {'a': 2}, 'b': 3}}
key = ('b', 'a', 'a')
The lazy approach
This is fast if you know Python dictionaries already, no need to learn other stuff!
>>> value = d
>>> for level in key:
...     value = value[level]
...
>>> value
2
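The same walk can be written without an explicit loop using functools.reduce. A minimal sketch; get_nested is a hypothetical helper name, not part of any library.

```python
from functools import reduce
from operator import getitem

d = {'a': {'a': 0, 'b': 1},
     'b': {'a': {'a': 2}, 'b': 3}}
key = ('b', 'a', 'a')

def get_nested(mapping, path):
    """Follow path one level at a time, starting from mapping."""
    return reduce(getitem, path, mapping)

print(get_nested(d, key))  # 2
```

A missing level raises KeyError, just like plain indexing would.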
NestedDict from the ndicts package
If you pip install ndicts then you get the same "lazy approach" implementation in a nicer interface.
>>> from ndicts import NestedDict
>>> nd = NestedDict(d)
>>> nd[key]
2
>>> nd["b", "a", "a"]
2
This option is fast because you can't really write less code than nd[key] to get what you want.
Pandas dataframes
This is the solution that will give you performance. Lookups in dataframes should be quick, especially if you have a sorted index.
In this case we have hierarchical data with multiple levels, so I will create a MultiIndex first. I will use the NestedDict for ease, but anything else to flatten the dictionary will do.
>>> keys = list(nd.keys())
>>> values = list(nd.values())
>>> from pandas import DataFrame, MultiIndex
>>> index = MultiIndex.from_tuples(keys)
>>> df = DataFrame(values, index=index, columns=["Data"]).sort_index()
>>> df
Data
a a NaN 0
b NaN 1
b a a 2
b NaN 3
Use the loc method to get a row.
>>> df.loc[key]
Data 2
Name: (b, a, a), dtype: int64
I know this is a very efficient way in python 2, to intersect 2 dictionaries
filter(dict_1.has_key, dict_2.keys())
However has_key() was removed from Python3, so I can't really use the fast filter() and has_key() functions. What I'm doing right now is:
[key for key in dict_2 if key in dict_1]
But it seems a bit janky, on top of not being very readable. Is this really the new fastest way with Python 3, or is there a faster, cleaner way using filter()?
Instead of has_key in Python 2, you can use the in operator in Python 3.x. With filter, which gives a lazy iterator in 3.x, you can use dict.__contains__. There's also no need to call dict.keys:
res = filter(dict_1.__contains__, dict_2) # lazy
print(list(res))
# [2, 3]
An equivalent, but less aesthetic, lambda-based solution:
res = filter(lambda x: x in dict_1, dict_2) # lazy
A generator expression is a third alternative:
res = (x for x in dict_2 if x in dict_1) # lazy
For a non-lazy method, you can use set.intersection (or its syntactic sugar &):
res = set(dict_1) & set(dict_2) # {2, 3}
As you want the intersection of the keys, you could do:
d1 = {1 : 1, 2 : 2}
d2 = {1 : 3, 2 : 4, 3 : 5}
common = list(d1.keys() & d2.keys())
print(common)
Output
[1, 2]
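If the values are needed as well as the keys, the shared keys can drive a dict comprehension. A small sketch reusing d1 and d2 from above; the name both and the choice to pair the values in a tuple are assumptions made for illustration.

```python
d1 = {1: 1, 2: 2}
d2 = {1: 3, 2: 4, 3: 5}

# Keys views behave like sets, so & gives the shared keys.
shared = d1.keys() & d2.keys()

# Pair up the values from both dicts for every shared key.
both = {k: (d1[k], d2[k]) for k in sorted(shared)}
print(both)  # {1: (1, 3), 2: (2, 4)}
```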
I want to create a dictionary where the value is a list. Now with every new incoming key I want to start a new list.
If I simply try to append to the dictionary key value it shows an error, since the value is not yet declared a list, and I can't do this at the beginning since I don't know how many keys I'll have.
For example (Note that for MyDict keyList is variable, something not previously known)
keyList = [['a',1,2],['b',3,4]]
MyDict={}
for key in keyList:
    MyDict[key[0]].append(key[1:])
What I want to create is:
MyDict={'a': [[1, 2]], 'b': [[3, 4]]}
Is there a way to do this?
You can use setdefault.
Instead of MyDict[key[0]].append(key[1:])
Try using MyDict.setdefault(key[0],[]).append(key[1:])
So when you are trying to append value, if the list doesn't exist it will make a default list and then append value to it
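Put together with the loop from the question, that looks roughly like this. The third entry in keyList is an addition for illustration, to show a repeated key.

```python
keyList = [['a', 1, 2], ['b', 3, 4], ['a', 5, 6]]
MyDict = {}
for key in keyList:
    # setdefault returns the existing list, or stores and returns a new one
    MyDict.setdefault(key[0], []).append(key[1:])
print(MyDict)  # {'a': [[1, 2], [5, 6]], 'b': [[3, 4]]}
```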
What you need is defaultdict, which is supplied with Python exactly for this purpose:
from collections import defaultdict
keyList = [['a',1,2],['b',3,4]]
MyDict=defaultdict(list)
for key in keyList:
    MyDict[key[0]].append(key[1:])
print(MyDict)
Output
defaultdict(<class 'list'>, {'b': [[3, 4]], 'a': [[1, 2]]})
The defaultdict constructor takes a function that is called without arguments to create default values. Calling list without arguments creates an empty list.
defaultdict behaves in all other ways like a normal dictionary.
If, nevertheless, in the end you want a simple dict, then convert it using dict:
print(dict(MyDict))
gives
{'b': [[3, 4]], 'a': [[1, 2]]}
Use collections.defaultdict. Example:
from collections import defaultdict
d = defaultdict(lambda: [])
d['key'].append('value to be add in list')
Sidenote: Defaultdict takes a callable argument, which means that you can also create your dict in this way:
defaultdict(list)
So I tried to make the program store only the last 3 scores (values) for each key (name). However, the program either stores the first 3 scores and then never updates them, or it appends more values than it should.
The code I have so far:
#appends values if a key already exists
while tries < 3:
    d.setdefault(name, []).append(scores)
    tries = tries + 1
Though I could not fully understand your question, the concept that I derive from it is that, you want to store only the last three scores in the list. That is a simple task.
d.setdefault(name,[]).append(scores)
if len(d[name])>3:
    del d[name][0]
This code will check if the length of the list exceeds 3 for every addition. If it exceeds, then the first element (Which is added before the last three elements) is deleted
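As a self-contained sketch of the same idea, the check can live in a small function. record_score, the keep parameter, and the sample scores are hypothetical names and data, not from the question.

```python
def record_score(d, name, score, keep=3):
    """Append score under name; drop the oldest entry once there are more than keep."""
    scores = d.setdefault(name, [])
    scores.append(score)
    if len(scores) > keep:
        del scores[0]
    return scores

d = {}
for s in [4, 7, 9, 2, 8]:
    record_score(d, "alice", s)
print(d)  # {'alice': [9, 2, 8]}
```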
Use a collections.defaultdict + collections.deque with a max length set to 3:
from collections import deque,defaultdict
d = defaultdict(lambda: deque(maxlen=3))
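Then call d[name].append(score): if the key does not exist, the key/value pair is created; if it does exist, the score is simply appended. Once the deque holds 3 items, each append discards the oldest one.
Deleting an element from the start of a list is inefficient, since every remaining element has to be shifted.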
Demo:
from random import randint
for _ in range(10):
    for name in range(4):
        d[name].append(randint(1,10))
print(d)
defaultdict(<function <lambda> at 0x7f06432906a8>, {0: deque([9, 1, 1], maxlen=3), 1: deque([5, 5, 8], maxlen=3), 2: deque([5, 1, 3], maxlen=3), 3: deque([10, 6, 10], maxlen=3)})
One good way of keeping the last N items in Python is a deque with maxlen N, so in this case you can use the defaultdict and deque classes from the collections module.
example :
>>> from collections import defaultdict ,deque
>>> l=[1,2,3,4,5]
>>> d=defaultdict(deque)
>>> d['q']=deque(maxlen=3)
>>> for i in l:
...     d['q'].append(i)
...
>>> d
defaultdict(<type 'collections.deque'>, {'q': deque([3, 4, 5], maxlen=3)})
A slight variation on another answer, in case you want to extend the list stored under name with several scores at once:
d.setdefault(name,[]).extend(scores)
if len(d[name])>3:
    del d[name][:-3]
from collections import defaultdict
d = defaultdict(lambda:[])
d[key].append(val)
d[key] = d[key][-3:] # keep only the last 3 values
len(d[key])>2 or d[key].append(value) # one-line alternative; note that it stops appending once 3 values are stored
I've been experimenting trying to get this to work and I've exhausted every idea and web search. Nothing seems to do the trick. I need to sum the numbers in a defaultdict(list) and I just need the final result, but no matter what I do I can only get to the final result by iterating and returning all the partial sums leading up to it. What I've been trying, generally:
d = { key : [1,2,3] }
running_total = 0
#Iterate values
for value in d.itervalues():
    #iterate through list inside value
    for x in value:
        running_total += x
        print running_total
The result is :
1,3,6
I understand it's doing this because it's iterating through the for loop. What I don't get is how else I can get to each of these list values without using a loop. Or is there some sort of method I've overlooked?
To be clear i just want the final number returned e.g. 6
EDIT: I neglected a huge factor: the items in the list are timedelta objects, so I have to use .seconds to make them into integers for adding. The solutions below make sense and I've tried similar, but trying to throw the .seconds conversion into the sum statement throws an error.
d = { key : [timedelta_Obj1,timedelta_Obj2,timedelta_Obj3] }
I think this will work for you:
sum(td.seconds for sublist in d.itervalues() for td in sublist)
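One caveat worth checking: .seconds holds only the seconds component of a timedelta and ignores whole days. If the total duration is what's wanted (an assumption, since the question only mentions .seconds), a sketch in Python 3 syntax with made-up sample data could look like this:

```python
from datetime import timedelta

d = {'key': [timedelta(seconds=30), timedelta(days=1, seconds=5)]}

# .seconds only holds the seconds part (0..86399); whole days are ignored.
seconds_only = sum(td.seconds for sub in d.values() for td in sub)

# Adding the timedeltas themselves keeps the days; start from timedelta().
total = sum((td for sub in d.values() for td in sub), timedelta())

print(seconds_only)           # 35
print(total.total_seconds())  # 86435.0
```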
Try this approach:
from datetime import timedelta as TD
d = {'foo' : [TD(seconds=1), TD(seconds=2), TD(seconds=3)],
'bar' : [TD(seconds=4), TD(seconds=5), TD(seconds=6), TD(seconds=7)],
'baz' : [TD(seconds=8)]}
print sum(sum(td.seconds for td in values) for values in d.itervalues())
You could just sum each of the lists in the dictionary, then take one final sum of the returned list.
>>> d = {'foo' : [1,2,3], 'bar' : [4,5,6,7], 'foobar' : [10]}
# sum each value in the dictionary
>>> [sum(d[i]) for i in d]
[10, 6, 22]
# sum each of the sums in the list
>>> sum([sum(d[i]) for i in d])
38
If you don't want to iterate or to use comprehensions you can use this:
d = {'1': [1, 2, 3], '2': [3, 4, 5], '3': [5], '4': [6, 7]}
print(sum(map(sum, d.values())))
If you use Python 2 and your dict has a lot of keys, it's better to use imap (from itertools) and itervalues:
from itertools import imap
print sum(imap(sum, d.itervalues()))
Your question was how to get the value "without using a loop". Well, you can't. But there is one thing you can do: use the high performance itertools.
If you use chain you won't have an explicit loop in your code. chain manages that for you.
>>> data = {'a': [1, 2, 3], 'b': [10, 20], 'c': [100]}
>>> import itertools
>>> sum(itertools.chain.from_iterable(data.itervalues()))
136
If you have timedelta objects you can use the same recipe.
>>> from datetime import timedelta
>>> data = {'a': [timedelta(minutes=1),
...               timedelta(minutes=2),
...               timedelta(minutes=3)],
...         'b': [timedelta(minutes=10),
...               timedelta(minutes=20)],
...         'c': [timedelta(minutes=100)]}
>>> sum(td.seconds for td in itertools.chain.from_iterable(data.itervalues()))
8160
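The snippets above are Python 2 (itervalues). In Python 3 the same recipe should work with values(), which is already a lazy view. A minimal sketch with the same data; the variable names are chosen here for illustration.

```python
from datetime import timedelta
from itertools import chain

data = {'a': [1, 2, 3], 'b': [10, 20], 'c': [100]}
# chain.from_iterable flattens the lists without copying them.
total = sum(chain.from_iterable(data.values()))
print(total)  # 136

deltas = {'a': [timedelta(minutes=1), timedelta(minutes=2), timedelta(minutes=3)],
          'b': [timedelta(minutes=10), timedelta(minutes=20)],
          'c': [timedelta(minutes=100)]}
total_seconds = sum(td.seconds for td in chain.from_iterable(deltas.values()))
print(total_seconds)  # 8160
```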