Python - process only new element of dictionnary - python

I have a unique (unique keys) dictionnary that I update adding some new keys depending data on a webpage.
and I want to process only the new keys that may appear after a long time. Here is a piece of code to understand :
a = UniqueDict()
while 1:
webpage = update() # return a list
for i in webpage:
title = getTitle(i)
a[title] = new_value # populate only new title obtained because it's a unique dictionnary
if len(a) > 50:
a.clear() # just to clear dictionnary if too big
# Condition before entering this loop to process only new title entered
for element in a.keys():
process(element)
Is there a way to know only new keys added in the dictionnary (because most of the time, it will be the same keys and values so I don't want them to be processed) ?
Thank you.

What you might also do, is keep the processed keys in a set.
Then you can check for new keys by using set(d.keys()) - set_already_processed.
And add processed keys using set_already_processed.add(key)

You may want to use a OrderedDict:
Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added.

Make your own dict that tracks additions:
class NewKeysDict(dict):
"""A dict, but tracks keys that are added through __setitem__
only. reset() resets tracking to begin tracking anew. self.new_keys
is a set holding your keys.
"""
def __init__(self, *args, **kw):
super(NewKeysDict, self).__init__(*args, **kw)
self.new_keys = set()
def reset(self):
self.new_keys = set()
def __setitem__(self, key, value):
super(NewKeysDict, self).__setitem__(key, value)
self.new_keys.add(key)
d = NewKeysDict((i,str(i)) for i in range(10))
d.reset()
print(d.new_keys)
for i in range(5, 10):
d[i] = '{} new'.format(i)
for k in d.new_keys:
print(d[k])

(because most of the time, it will be the same keys and values so I don't want them to be processed)
You get complicate !
The keys are immutable and unique.
Each key is followed by a value separated, by a colon.
dict = {"title",title}
text = "textdude"
dict["keytext"]=text
This is add a value textdude, with the new key called "keytext".
For check, we use "in".
"textdude" in dict
He return true

Related

Python Remove Duplicate Dict

I am trying to find a way to remove duplicates from a dict list. I don't have to test the entire object contents because the "name" value in a given object is enough to identify duplication (i.e., duplicate name = duplicate object). My current attempt is this;
newResultArray = []
for i in range(0, len(resultArray)):
for j in range(0, len(resultArray)):
if(i != j):
keyI = resultArray[i]['name']
keyJ = resultArray[j]['name']
if(keyI != keyJ):
newResultArray.append(resultArray[i])
, which is wildly incorrect. Grateful for any suggestions. Thank you.
If name is unique, you should just use a dictionary to store your inner dictionaries, with name being the key. Then you won't even have the issue of duplicates, and you can remove from the list in O(1) time.
Since I don't have access to the code that populates resultArray, I'll simply show how you can convert it into a dictionary in linear time. Although the best option would be to use a dictionary instead of resultArray in the first place, if possible.
new_dictionary = {}
for item in resultArray:
new_dictionary[item['name']] = item
If you must have a list in the end, then you can convert back into a dictionary as such:
new_list = [v for k,v in new_dictionary.items()]
Since "name" provides uniqueness... and assuming "name" is a hashable object, you can build an intermediate dictionary keyed by "name". Any like-named dicts will simply overwrite their predecessor in the dict, giving you a list of unique dictionaries.
tmpDict = {result["name"]:result for result in resultArray}
newArray = list(tmpDict.values())
del tmpDict
You could shrink that down to
newArray = list({result["name"]:result for result in resultArray}.values())
which may be a bit obscure.

Why isn't all the data being stored?

I have a dictionary carrying key:value however it only saves the last iteration and discards the previous entries where is it being reset ?? This is the output from the ctr of iterations and the length of the dictionary
Return the complete Term and DocID Ref.
LENGTH:6960
CTR:88699
My code:
class IndexData:
def getTermDocIDCollection(self):
...............
for term in terms:
#TermDocIDCollection[term] = sourceFile['newid']
TermDocIDCollection[term] = []
TermDocIDCollection[term].append(sourceFile['newid'])
return TermDocIDCollection
The piece of code you've commented out does the following:
Sets a value to the key (removing whatever was there before, if it existed)
Sets a new value to the key (an empty list)
Appends the value set in step 1 to the new empty list
Sadly, it would do the same each iteration, so you'd end up with [last value] assigned to the key. The new code (with update) does something similar. In the old days you'd do this:
if term in TermDocIDCollection:
TermDocIDCollection[term].append(sourceFile['newid'])
else:
TermDocIDCollection[term] = [sourceFile['newid']]
or a variation of the theme using try-except. After collections was added you can do this instead:
from collections import defaultdict
# ... code...
TermDocIDCollection = defaultdict(list)
and you'd update it like this:
TermDocIDCollection[term].append(sourceFile['newid'])
no need to check if term exists in the dictionary. If it doesn't, the defaultdict type will first call to the constructor you passed (list) to create the initial value for the key

Pandas Dataframe to Dictionary with Multiple Keys

I am currently working with a dataframe consisting of a column of 13 letter strings ('13mer') paired with ID codes ('Accession') as such:
However, I would like to create a dictionary in which the Accession codes are the keys with values being the 13mers associated with the accession so that it looks as follows:
{'JO2176': ['IGY....', 'QLG...', 'ESS...', ...],
'CYO21709': ['IGY...', 'TVL...',.............],
...}
Which I've accomplished using this code:
Accession_13mers = {}
for group in grouped:
Accession_13mers[group[0]] = []
for item in group[1].iteritems():
Accession_13mers[group[0]].append(item[1])
However, now I would like to go back through and iterate through the keys for each Accession code and run a function I've defined as find_match_position(reference_sequence, 13mer) which finds the 13mer in in a reference sequence and returns its position. I would then like to append the position as a value for the 13mer which will be the key.
If anyone has any ideas for how I can expedite this process that would be extremely helpful.
Thanks,
Justin
I would suggest creating a new dictionary, whose values are another dictionary. Essentially a nested dictionary.
position_nmers = {}
for key in H1_Access_13mers:
position_nmers[key] = {} # replicate key, val in new dictionary, as a dictionary
for value in H1_Access_13mers[key]:
position_nmers[key][value] = # do something
To introspect the dictionary and make sure it's okay:
print position_nmers
You can iterate over the groupby more cleanly by unpacking:
d = {}
for key, s in df.groupby('Accession')['13mer']:
d[key] = list(s)
This also makes it much clearer where you should put your function!
... However, I think that it might be better suited to an enumerate:
d2 = {}
for pos, val in enumerate(df['13mer']):
d2[val] = pos

How to check if keys exists and retrieve value from Dictionary in descending priority

I have a dictionary and I would like to get some values from it based on some keys. For example, I have a dictionary for users with their first name, last name, username, address, age and so on. Let's say, I only want to get one value (name) - either last name or first name or username but in descending priority like shown below:
(1) last name: if key exists, get value and stop checking. If not, move to next key.
(2) first name: if key exists, get value and stop checking. If not, move to next key.
(3) username: if key exists, get value or return null/empty
#my dict looks something like this
myDict = {'age': ['value'], 'address': ['value1, value2'],
'firstName': ['value'], 'lastName': ['']}
#List of keys I want to check in descending priority: lastName > firstName > userName
keySet = ['lastName', 'firstName', 'userName']
What I tried doing is to get all the possible values and put them into a list so I can retrieve the first element in the list. Obviously it didn't work out.
tempList = []
for key in keys:
get_value = myDict.get(key)
tempList .append(get_value)
Is there a better way to do this without using if else block?
One option if the number of keys is small is to use chained gets:
value = myDict.get('lastName', myDict.get('firstName', myDict.get('userName')))
But if you have keySet defined, this might be clearer:
value = None
for key in keySet:
if key in myDict:
value = myDict[key]
break
The chained gets do not short-circuit, so all keys will be checked but only one used. If you have enough possible keys that the extra lookups matter, use the for loop.
Use .get(), which if the key is not found, returns None.
for i in keySet:
temp = myDict.get(i)
if temp is not None:
print temp
break
You can use myDict.has_key(keyname) as well to validate if the key exists.
Edit based on the comments -
This would work only on versions lower than 3.1. has_key has been removed from Python 3.1. You should use the in operator if you are using Python 3.1
If we encapsulate that in a function we could use recursion and state clearly the purpose by naming the function properly (not sure if getAny is actually a good name):
def getAny(dic, keys, default=None):
return (keys or default) and dic.get(keys[0],
getAny( dic, keys[1:], default=default))
or even better, without recursion and more clear:
def getAny(dic, keys, default=None):
for k in keys:
if k in dic:
return dic[k]
return default
Then that could be used in a way similar to the dict.get method, like:
getAny(myDict, keySet)
and even have a default result in case of no keys found at all:
getAny(myDict, keySet, "not found")

How to go from a values_list to a dictionary of lists

I have a django queryset that returns a list of values:
[(client pk, timestamp, value, task pk), (client pk, timestamp, value, task pk),....,].
I am trying to get it to return a dictionary of this format:
{client pk:[[timestamp, value],[timestamp, value],...,], client pk:[list of lists],...,}
The values_list may have multiple records for each client pk. I have been able to get dictionaries of lists for client or task pk using:
def dict_from_vl(vls_list):
keys=[values_list[x][3] for x in range(0,len(values_list),1)]
values = [[values_list[x][1], values_list[x][2]] for x in range(0,len(values_list),1)]
target_dict=dict(zip(keys,values))
return target_dict
However using this method, values for the same key write over previous values as it iterates through the values_list, rather than append them to a list. So this works great for getting the most recent if the values list is sorted oldest records to newest, but not for the purpose of creating a list of lists for the dict value.
Instead of target_dict=dict(zip(keys,values)), do
target_dict = defaultdict(list)
for i, key in enumerate(keys):
target_dict[k].append(values[i])
(defaultdict is available in the standard module collections.)
from collections import defaultdict
d = defaultdict(list)
for x in vls_list:
d[x].append(list(x[1:]))
Although I'm not sure if I got the question right.
I know in Python you're supposed to cram everything into a single line, but you could do it the old fashioned way...
def dict_from_vl(vls_list):
target_dict = {}
for v in vls_list:
if v[0] not in target_dict:
target_dict[v[0]] = []
target_dict[v[0]].append([v[1], v[2]])
return target_dict
For better speed, I suggest you don't create the keys and values lists separately but simply use only one loop:
tgt_dict = defaultdict(list)
for row in vas_list:
tgt_dict[row[0]].append([row[1], row[2]])

Categories

Resources