Searching for distinct dicts in a list - Python

With the following list of dict:
[ {'isn': '1', 'fid': '4', 'val': '1', 'ptm': '05/08/2019 14:22:39', 'sn': '111111', 'procesado': '0'},
{'isn': '1', 'fid': '4', 'val': '0', 'ptm': '05/08/2019 13:22:39', 'sn': '111111', 'procesado': '0'},
<...> ]
For each dict in the list, I need to check whether there is another element with:
equal fid
equal sn
distinct val (if val(elemX)=0 then val(elemY)=1)
distinct ptm (if val=0 of elemX then ptm of elemX < ptm of elemY)
This could be done the traditional way, with an outer for loop and an inner while loop, but that is not the optimal way to do it.
Trying to find a way to do that, I tried with something like this:
for p in lista:
    print([item for item in lista if ((item["sn"] == p["sn"]) & (item["val"] == 0) & (p["val"] == 1) & (
        datetime.strptime(item["ptm"], '%d/%m/%Y %H:%M:%S') < datetime.strptime(p["ptm"],'%d/%m/%Y %H:%M:%S')))])
But this does not work (and is not optimal either).

Just build a mapping from (fid, sn, val) to a list of candidates (the whole dict, its index, or just its ptm as shown below, depending on what output you need). As you add each dict, also check whether any of its opposite numbers (stored under (fid, sn, !val)) are already present, and do the ptm comparison if so:
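from datetime import datetime

seen={}
for d in dd:  # dd is the list of dicts
    f=d['fid']; s=d['sn']; v=int(d['val'])
    p=datetime.strptime(d['ptm'],'%d/%m/%Y %H:%M:%S')
    # compare against earlier entries with the same fid and sn but the opposite val
    for p0 in seen.get((f,s,not v),()):
        if p0!=p and (p0<p)==v: …
    seen.setdefault((f,s,v),[]).append(p)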
If you have a large number of values with the same key, you could use a tree to hasten the ptm comparisons, but that seems unlikely here. Using real data types for the individual values, and perhaps a namedtuple to contain them, would of course make this a lot nicer.
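A fuller version of that loop might look like the sketch below. It assumes val is always '0' or '1' and that ptm always uses the format from the question. It stores the whole dict next to its parsed ptm so that matching pairs can be returned. The names find_pairs, FMT and records are illustrative and do not come from the original answer.

```python
from datetime import datetime

FMT = '%d/%m/%Y %H:%M:%S'  # assumed timestamp format, taken from the question


def find_pairs(records):
    """Return (val0_dict, val1_dict) pairs that share fid and sn and where
    the val '0' entry has an earlier ptm than the val '1' entry."""
    seen = {}   # (fid, sn, val) -> list of (dict, parsed ptm)
    pairs = []
    for d in records:
        val = int(d['val'])  # assumption: val is '0' or '1'
        ptm = datetime.strptime(d['ptm'], FMT)
        # Look only at earlier entries with the same fid/sn and the opposite val.
        for other, other_ptm in seen.get((d['fid'], d['sn'], 1 - val), ()):
            if val == 1 and other_ptm < ptm:
                pairs.append((other, d))
            elif val == 0 and ptm < other_ptm:
                pairs.append((d, other))
        seen.setdefault((d['fid'], d['sn'], val), []).append((d, ptm))
    return pairs


lista = [
    {'isn': '1', 'fid': '4', 'val': '1', 'ptm': '05/08/2019 14:22:39', 'sn': '111111', 'procesado': '0'},
    {'isn': '1', 'fid': '4', 'val': '0', 'ptm': '05/08/2019 13:22:39', 'sn': '111111', 'procesado': '0'},
]
pairs = find_pairs(lista)
print(len(pairs))  # 1
```

Each dict is visited once, and it is only compared against entries that already share its fid and sn, so the work stays close to linear unless many entries share the same key.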

Related

Extracting Strings From a List

Hi, I'm fairly new to Python and need help extracting strings from a list. I am using Python in Visual Studio.
I have hundreds of similar lists of strings and need to extract specific information from them so I can add it to a table in columns; the aim is to automate this task using Python. I would like to extract the data between the headers 'Names', 'Ages' and 'Jobs'. The issue is that the number of names, ages and jobs varies a lot between the lists, so I would like to write a single piece of code that applies to all of them.
list_x = ['Names','Ashley','Lee','Poonam','Ages', '25', '35', '42', 'Jobs', 'Doctor', 'Teacher', 'Nurse']
I am struggling to extract
['Ashley', 'Lee', 'Poonam']
I have tried the following:
for x in list_x:
    if x == 'Names':
        for y in list_x:
            if y == 'Ages':
                print(list_x[x:y])
This, however, produces the following error:
"Exception has occurred: TypeError
slice indices must be integers or None or have an __index__ method"
Is there a way of doing this without specifying exact indices?
As the comment suggested, editing the data is the easiest way to go, but if you have to...
newList = oldList[oldList.index('Names') + 1:oldList.index("Ages")]
It just finds the indices of "Names" and "Ages" in the list, and extracts the bit between.
Lots can (and will) go wrong with this method though - if there's a name which is "Names", or if they are misspelt, etc.
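One of those failure modes, a missing or misspelt header, can at least be detected. The sketch below wraps the same index-based slice in a helper that returns None when either marker is absent. The function name between is hypothetical, and it still cannot tell a header apart from a value that happens to equal it.

```python
def between(items, start, end):
    """Return the elements strictly between the first `start` and the first
    `end` that follows it, or None if either marker is missing."""
    try:
        lo = items.index(start) + 1
        hi = items.index(end, lo)  # only search after the start marker
    except ValueError:
        return None
    return items[lo:hi]


list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
          'Jobs', 'Doctor', 'Teacher', 'Nurse']
print(between(list_x, 'Names', 'Ages'))    # ['Ashley', 'Lee', 'Poonam']
print(between(list_x, 'Names', 'Salary'))  # None
```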
For completeness' sake, it might not be a bad idea to use an approach like the one below.
First, build a list of indices of each of the desired headers:
list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42', 'Jobs', 'Doctor', 'Teacher', 'Nurse']
headers = ('Names', 'Ages', 'Jobs')
header_indices = [list_x.index(header) for header in headers]
print('indices:', header_indices) # [0, 4, 8]
Then, create a list of values for each header, which we can infer from the positions where each header shows up in the list:
values = {}
for i in range(len(header_indices)):
    header = headers[i]
    start = header_indices[i] + 1
    try:
        values[header] = list_x[start:header_indices[i + 1]]
    except IndexError:
        values[header] = list_x[start:]
And finally, we can display it for debugging purposes:
print('values:', values)
# {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']}
assert values['Names'] == ['Ashley', 'Lee', 'Poonam']
To make only a single O(N) pass over the list, we can instead use an approach like the one below, which builds a dict of the values in one for loop:
from collections import defaultdict
values = defaultdict(list)
header_idx = -1
for x in list_x:
    if x in headers:
        header_idx += 1
    else:
        values[headers[header_idx]].append(x)
print('values:', values)
# defaultdict(<class 'list'>, {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']})
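The counter-based loop relies on the headers appearing in the list in the same order as in headers. A possible variation, sketched here under the assumption that each header appears at most once, remembers the header it saw most recently instead, so the order no longer matters. The variable name current is illustrative.

```python
list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
          'Jobs', 'Doctor', 'Teacher', 'Nurse']
headers = {'Names', 'Ages', 'Jobs'}  # a set gives O(1) membership tests

values = {}
current = None  # header whose values are currently being collected
for x in list_x:
    if x in headers:
        current = x
        values.setdefault(current, [])
    elif current is not None:
        # Anything before the first header is ignored.
        values[current].append(x)

print(values['Names'])  # ['Ashley', 'Lee', 'Poonam']
```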

Searching List of non-standardized Dictionaries

I scraped a betting website API. Scraping this API returned JSON in what is essentially a list of dictionaries. I am trying to search through this list of dictionaries and return specific bets and odds. Below is my code:
Team1 = 'Lechia Gdańsk'
Team2 = 'Sokół Ostróda'
WDW = [{'id': 2853153615, 'label': '1', 'englishLabel': '1', 'odds': 1250, 'participant': 'Lechia Gdańsk', 'type': 'OT_ONE', 'betOfferId': 2244628386, 'changedDate': '2021-01-14T18:52:07Z', 'participantId': 1000020086, 'oddsFractional': '1/4', 'oddsAmerican': '-400', 'status': 'OPEN', 'cashOutStatus': 'ENABLED'}, {'id': 2853153626, 'label': 'X', 'englishLabel': 'X', 'odds': 5750, 'type': 'OT_CROSS', 'betOfferId': 2244628386, 'changedDate': '2021-01-14T18:52:07Z', 'oddsFractional': '19/4', 'oddsAmerican': '475', 'status': 'OPEN', 'cashOutStatus': 'ENABLED'}, {'id': 2853153638, 'label': '2', 'englishLabel': '2', 'odds': 7000, 'participant': 'Sokół Ostróda', 'type': 'OT_TWO', 'betOfferId': 2244628386, 'changedDate': '2021-01-14T18:52:07Z', 'participantId': 1001302448, 'oddsFractional': '6/1', 'oddsAmerican': '600', 'status': 'OPEN', 'cashOutStatus': 'ENABLED'}]
unisoccerWDWTeam1 = next((item['odds']/1000 for item in WDW if item['participant'] == Team1), None)
unisoccerWDWTeam2 = next((item['odds']/1000 for item in WDW if item['participant'] == Team2), None)
unisoccerWDWDraw = next((item['odds']/1000 for item in WDW if item['label'] == 'X'), None)
I use next() with a generator expression to search for a matching team name and return the odds from the list of dictionaries. The problem is that the dictionaries in the list are not uniform, so when my code reaches a dictionary without the key I'm searching for, it raises an error. In this case WDW[1] does not have a 'participant' key. Is there a way to avoid this problem, or to skip dictionaries that do not contain the key:value pair I am searching for? I am considering writing nested for loops, maybe with try/except blocks, but that seems like a slow and inelegant solution. Any suggestions or help would be much appreciated.
To simply avoid the exception, you can use item.get('participant'), which returns None if 'participant' doesn't exist rather than raising an exception (similar to the None default you are passing to your next() call).
Depending on how you implement the loop, this should allow you to silently skip dictionaries that don't contain the keys you are looking for.
This would be something like:
unisoccerWDWTeam1 = next((item['odds']/1000 for item in WDW if item.get('participant') == Team1), None)
Code not tested, but should get you started.
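For reference, here is a small self-contained sketch of the .get() approach. The WDW list is trimmed to the keys that matter, and the helper name find_odds is hypothetical, not part of the question or of any betting API.

```python
Team1 = 'Lechia Gdańsk'
Team2 = 'Sokół Ostróda'

# Trimmed copy of the data from the question; the draw entry has no 'participant'.
WDW = [
    {'label': '1', 'odds': 1250, 'participant': 'Lechia Gdańsk'},
    {'label': 'X', 'odds': 5750},
    {'label': '2', 'odds': 7000, 'participant': 'Sokół Ostróda'},
]


def find_odds(offers, key, wanted):
    """Return odds/1000 for the first offer whose `key` equals `wanted`,
    skipping offers that lack the key; None if nothing matches."""
    return next(
        (item['odds'] / 1000 for item in offers if item.get(key) == wanted),
        None,
    )


unisoccerWDWTeam1 = find_odds(WDW, 'participant', Team1)
unisoccerWDWTeam2 = find_odds(WDW, 'participant', Team2)
unisoccerWDWDraw = find_odds(WDW, 'label', 'X')
print(unisoccerWDWTeam1, unisoccerWDWDraw, unisoccerWDWTeam2)  # 1.25 5.75 7.0
```

Note that item.get(key) == wanted would also match a missing key if wanted were None, so this sketch assumes you never search for None.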

Fix Ordering in dictionary

I am trying to fix the ordering of a dictionary, but the output I get comes back in sorted order, which is not what I want.
test = {}
test['bbb'] = '0'
test['aaa'] = '1'
# Returns me {'aaa': '1', 'bbb': '0'} when I am expecting {'bbb': '0', 'aaa': '1'}
The above is a simple example, in which both aaa and bbb are queried from a list. I had thought this might resolve the ordering, but it did not.
Before Python 3.7, a plain dict does not guarantee that it preserves the order in which elements were added; on those versions you need to use collections.OrderedDict to get this behaviour. From Python 3.7 onward, the built-in dict preserves insertion order.
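As a minimal sketch of the OrderedDict suggestion, using the same keys as the question: an OrderedDict keeps keys in insertion order on every Python version that provides it, while a plain dict is only guaranteed to do so from Python 3.7 onward.

```python
from collections import OrderedDict

test = OrderedDict()
test['bbb'] = '0'
test['aaa'] = '1'
print(list(test.items()))  # [('bbb', '0'), ('aaa', '1')]

# A plain dict behaves the same way on Python 3.7+, but not on older versions.
plain = {}
plain['bbb'] = '0'
plain['aaa'] = '1'
print(list(plain))
```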

How do I turn list values into an array with an index that matches the other dic values?

Hoping someone can help me out. I've spent the past couple hours trying to solve this, and fair warning, I'm still fairly new to python.
This is a repost of a question I recently deleted; I misrepresented my code in the previous example. The correct example is:
I have a list of dictionaries, each containing a list, that looks similar to:
dic = [
    {
        'name': 'john',
        'items': ['pants_1', 'shirt_2','socks_3']
    },
    {
        'name': 'bob',
        'items': ['jacket_1', 'hat_1']
    }
]
I'm using .append for both 'name', and 'items', which adds the dic values into two new lists:
for x in dic:
    dic_name.append(dic['name'])
    dic_items.append(dic['items'])
I need to split the item value using '_' as the delimiter, so I've also split the values by doing:
name, items = ([i if i is None else i.split('_')[0] for i in dic_name],
               [i if i is None else i.split('_')[0] for i in chain(*dic_items)])
None is used in case there is no value. This provides me with a new list for name, items, with the delimiter used. Disregard the fact that I used '_' split for names in this example.
When I use this, the indexes for name and items no longer match. Do I need to put the listed items in an array to match the name index, and if so, how?
Ideally, I want name[0] (which is john), to also match items[0] (as an array of the items in the list, so pants, shirt, socks). This way when I refer to index 0 for name, it also grabs all the values for items as index 0. The same thing regarding the index used for bob [1], which should match his items with the same index.
@avinash-raj, thanks for your patience, as I've had to update my question to reflect more closely the code I'm working with.
I'm reading a little bit between the lines, but are you trying to just collapse the list and get rid of the field names, e.g.:
>>> dic = [{'name': 'john', 'items':['pants_1','shirt_2','socks_3']},
...        {'name': 'bob', 'items':['jacket_1','hat_1']}]
>>> data = {d['name']: dict(i.split('_') for i in d['items']) for d in dic}
>>> data
{'bob': {'hat': '1', 'jacket': '1'},
'john': {'pants': '1', 'shirt': '2', 'socks': '3'}}
Now the data is directly related, rather than indirectly related via a common index into two lists. If you want the dictionary split out again, you can always use zip:
>>> dic_name, dic_items = zip(*data.items())
>>> dic_name
('bob', 'john')
>>> dic_items
({'hat': '1', 'jacket': '1'}, {'pants': '1', 'shirt': '2', 'socks': '3'})
You need a list of dictionaries because the duplicate keys name and items are overwritten:
items = [[i.split('_')[0] for i in d['items']] for d in your_list]
names = [d['name'] for d in your_list] # then grab names from list
Alternatively, you can do this in one line with the built-in zip function and a generator expression, like so:
names, items = zip(*((i['name'], [j.split('_')[0] for j in i['items']]) for i in dic))
From Looping Techniques in the Tutorial:
for name, item in dic.items():
    names.append(name)
    items.append(item)
That will work if your dict is structured like
{'name':[item1]}
In the loop body of
for x in dic:
    dic_name.append(dic['name'])
    dic_items.append(dic['items'])
you'll probably want to access x (to which the items in dic will be assigned in turn) rather than dic.
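With that change applied, the loop might look like the sketch below. It also keeps each person's items together as one inner list, so the indexes of the two lists stay aligned, and it strips the '_' suffix as described in the question. This is one possible arrangement and not necessarily the layout the asker's real code needs.

```python
dic = [
    {'name': 'john', 'items': ['pants_1', 'shirt_2', 'socks_3']},
    {'name': 'bob', 'items': ['jacket_1', 'hat_1']},
]

dic_name = []
dic_items = []
for x in dic:
    # x is the current dict; dic is the whole list.
    dic_name.append(x['name'])
    # One inner list per person keeps dic_items[i] aligned with dic_name[i].
    dic_items.append([item.split('_')[0] for item in x['items']])

print(dic_name[0], dic_items[0])  # john ['pants', 'shirt', 'socks']
```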

Comparing values in dictionaries python

I have 2 nested dictionaries in Python that have this format:
1166869: {'probL2': '0.000', 'probL1': '0.000', 'pronNDiff_site': '1.000', 'StateBin': '0', 'chr': 'chrX', 'rangehist': '59254000-59255000', 'start_bin': '59254000', 'countL2': '4', 'countL1': '0'}
1166870: {'probL2': '0.148', 'probL1': '0.000', 'pronNDiff_site': '0.851', 'StateBin': '0', 'chr': 'chr2', 'rangehist': '59254000-59255000', 'start_bin': '59255000', 'countL2': '5', 'countL1': '15'}
1166871: {'probL2': '0.000', 'probL1': '0.000', 'pronNDiff_site': '1.000', 'StateBin': '0', 'chr': 'chrY', 'rangehist': '59290000-59291000', 'start_bin': '59290000', 'countL2': '1', 'countL1': '2'}
where 1166869, 1166870 and 1166871 represent a line in a file from where I read the data, and the rest of the keys are the data itself.
Now I want to make a list where I store all the different values in the key "chr" because there are some repeated ones.
How can I go through the dictionary and make the comparison between the 2 values? This code is not working:
for k in range(len(file_dict)):
    for j in range(len(file_dict)-1):
        if (file_dict[j]["chr"] != file_dict[k]["chr"]):
            list_chr.append(file_dict[j]["chr"])
Use a set, and just add all the items in one go:
chr = { v['chr'] for v in file_dict.itervalues() }
This uses a set comprehension to generate your set in one line of code.
Set comprehensions were introduced in Python 2.7; in earlier versions use:
chr = set(v['chr'] for v in file_dict.itervalues())
In Python 3, you'd need to replace .itervalues() by .values().
Your own code doesn't work because python dictionaries are not lists; you don't retrieve values by index, but by key. You'd have to change it to:
for key in file_dict:
    for other_key in file_dict:
        if key == other_key:
            continue
        if file_dict[key]['chr'] != file_dict[other_key]['chr']:
            list_chr.append(file_dict[key]['chr'])
but that is really inefficient, not to mention incorrect.
How about something along the lines of:
list_chr = list(set([val['chr'] for val in file_dict.values()]))
How does this work?
First, a list comprehension collects all the chr entries from the inner dicts.
These are then converted to a set, so that there are no duplicate entries.
The set is then converted to a list, if that's the format you prefer.
Please note that you may really want to keep the set, since a membership lookup is then O(1) on average instead of O(n).
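A small sketch of both options follows, using a trimmed file_dict that keeps only the 'chr' key plus one hypothetical extra row (1166872) to create a duplicate. The dict.fromkeys variant is an addition not mentioned in the answers above: it removes duplicates while keeping first-seen order, which a set does not promise. It relies on dict insertion order, so it assumes Python 3.7 or newer.

```python
# Trimmed data: only 'chr' is kept; row 1166872 is made up to show a duplicate.
file_dict = {
    1166869: {'chr': 'chrX'},
    1166870: {'chr': 'chr2'},
    1166871: {'chr': 'chrY'},
    1166872: {'chr': 'chrX'},
}

# Unordered unique values, with fast membership tests.
unique_chr = {v['chr'] for v in file_dict.values()}

# Unique values in first-seen order (Python 3.7+).
ordered_chr = list(dict.fromkeys(v['chr'] for v in file_dict.values()))

print(sorted(unique_chr))  # ['chr2', 'chrX', 'chrY']
print(ordered_chr)         # ['chrX', 'chr2', 'chrY']
```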
