Efficient looping through dictionary with keys as tuple - python

I have a very large dictionary with 200 million keys. Each key is a tuple of two integers. Given a "query integer", I want to find the key whose two integers enclose it, i.e. the query lies within the range defined by the tuple.
Currently, I loop through all the dictionary keys and check whether the integer lies within each tuple's range. It works, but each lookup takes around 1-2 minutes and I need to perform around 1 million such queries. An example of the dictionary and the code I have written follow:
Sample dictionary:
[{ (3547237440, 3547237503) : {'state': 'seoul teukbyeolsi', 'country': 'korea (south)', 'country_code': 'kr', 'city': 'seoul'} },
{ (403044176, 403044235) : {'state': 'california', 'country': 'united states', 'country_code': 'us', 'city': 'pleasanton'} },
{ (3423161600, 3423161615) : {'state': 'kansas', 'country': 'united states', 'country_code': 'us', 'city': 'lenexa'} },
{ (3640467200, 3640467455) : {'state': 'california', 'country': 'united states', 'country_code': 'us', 'city': 'san jose'} },
{ (853650485, 853650485) : {'state': 'colorado', 'country': 'united states', 'country_code': 'us', 'city': 'arvada'} },
{ (2054872064, 2054872319) : {'state': 'tainan', 'country': 'taiwan', 'country_code': 'tw', 'city': 'tainan'} },
{ (1760399104, 1760399193) : {'state': 'texas', 'country': 'united states', 'country_code': 'us', 'city': 'dallas'} },
{ (2904302140, 2904302143) : {'state': 'iowa', 'country': 'united states', 'country_code': 'us', 'city': 'hampton'} },
{ (816078080, 816078335) : {'state': 'district of columbia', 'country': 'united states', 'country_code': 'us', 'city': 'washington'} },
{ (2061589204, 2061589207) : {'state': 'zhejiang', 'country': 'china', 'country_code': 'cn', 'city': 'hangzhou'} }]
The code I have written:
import ipaddress
ipint = int(ipaddress.IPv4Address(ip))
for k in ip_dict.keys():
    # linear scan: test the query against every (start, end) key
    if k[0] <= ipint <= k[1]:
        print(ip_dict[k]['country'], ip_dict[k]['country_code'], ip_dict[k]['state'])
where ip is just an IP address string like '192.168.0.1'.
If anyone could provide a hint regarding a more efficient way to perform this task, it would be much appreciated.
Thanks

I suggest using another structure with good query complexity, such as a tree.
Maybe you can try this library I just found: https://pypi.org/project/rangetree/
As they say, it is optimized for lookups but not for insertions, so if you need to insert once and look up many times it should be OK.
Another solution is to use a list instead of a dict, sort it, and build an index over it. Then do a binary search (dichotomy) on this index for each query. This can be less efficient if the ranges are not regular, so I prefer the first solution.
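The sorted-list idea can be sketched with the standard bisect module. This is only a minimal sketch, assuming the ranges do not overlap (which seems plausible for an IP geolocation table but is not stated in the question). The small ip_dict sample and the lookup helper are illustrative names, not part of the original code.

```python
import bisect
import ipaddress

# Hypothetical sample in the question's format: {(start, end): info}.
ip_dict = {
    (403044176, 403044235): {"country": "united states", "city": "pleasanton"},
    (853650485, 853650485): {"country": "united states", "city": "arvada"},
    (3547237440, 3547237503): {"country": "korea (south)", "city": "seoul"},
}

# One-time preprocessing: sort the keys by start and keep the starts
# in a parallel list so bisect can search them.
ranges = sorted(ip_dict)
starts = [start for start, _ in ranges]


def lookup(ip):
    """Return the info dict for the range containing ip, or None."""
    ipint = int(ipaddress.IPv4Address(ip))
    # Index of the last range whose start is <= ipint.
    i = bisect.bisect_right(starts, ipint) - 1
    # Assumes non-overlapping ranges, so only this one candidate can match.
    if i >= 0 and ipint <= ranges[i][1]:
        return ip_dict[ranges[i]]
    return None


print(lookup("24.5.247.80"))
print(lookup("192.168.0.1"))
```

Sorting is paid once; each query is then a binary search over the starts instead of a scan over every key.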

Create an index for each of the 2 integers: a sorted list like this:
[(left_int, [list_of_row_ids_that_have_this_left_int]),
(another_greater_left_int, [...])]
You can then find all rows whose left int is less than or equal to the searched one in O(log n).
A binary search will do here.
Do the same for the right int, this time keeping the rows whose right int is greater than or equal to the searched one.
Keep the rest of the data in a list of tuples.
More in detail:
data = [( (3547237440, 3547237503), {'state': 'seoul'} ), ...]
left_idx = [(3547237440, [0,43]), (9547237440, [3])]
# 0, 43, 3 are indices in the data list
# search
max_left_idx = binary_search(left_idx, 3444444)
# now all rows referred to by left_idx[0] ... left_idx[max_left_idx] have a left int <= the query
min_right_idx = ...
# all rows referred to by right_idx[min_right_idx] ... right_idx[-1] have a right int >= the query
# intersect the two sets: the rows in both satisfy the range check
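The two-index idea might look roughly like this in runnable form. It is a sketch under assumptions: it reads the criterion as left <= query <= right, stores one row id per index entry rather than a list of ids per bound, and uses made-up data. The names by_left, by_right and find_rows are illustrative.

```python
import bisect

# Hypothetical rows in the answer's layout: ((left, right), payload).
data = [
    ((10, 20), {"city": "a"}),
    ((15, 30), {"city": "b"}),
    ((40, 50), {"city": "c"}),
]

# Row ids sorted by left bound and by right bound, with the bounds
# themselves kept in parallel lists for bisect.
by_left = sorted(range(len(data)), key=lambda i: data[i][0][0])
by_right = sorted(range(len(data)), key=lambda i: data[i][0][1])
lefts = [data[i][0][0] for i in by_left]
rights = [data[i][0][1] for i in by_right]


def find_rows(query):
    """Return the sorted ids of all rows whose range contains query."""
    # Rows with left <= query form a prefix of by_left.
    left_ok = set(by_left[:bisect.bisect_right(lefts, query)])
    # Rows with right >= query form a suffix of by_right.
    right_ok = set(by_right[bisect.bisect_left(rights, query):])
    return sorted(left_ok & right_ok)


print(find_rows(17))
print(find_rows(35))
```

Unlike a single sorted list, this also handles overlapping ranges. The binary searches are logarithmic, but building and intersecting the two sets is linear in the number of candidate rows, so it may still be slow for 200 million rows.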

Related

How to print specific key values from a list of dictionaries in Python? [closed]

I am trying to print specific key values from a list of dictionaries in Python. The list is as follows:
employees = [
{'name': 'Tanya', 'age': 20, 'birthday': '1990-03-10',
'job': 'Back-end Engineer', 'address': {'city': 'New York', 'country': 'USA'}},
{'name': 'Tim', 'age': 35, 'birthday': '1985-02-21', 'job': 'Developer', 'address': {'city': 'Sydney', 'country': 'Australia'}}]
How can I print out specific key values while looping through each dictionary? The output should be name, job and city.
A possible solution is the following:
employees = [
{'name': 'Tanya', 'age': 20, 'birthday': '1990-03-10',
'job': 'Back-end Engineer', 'address': {'city': 'New York', 'country': 'USA'}},
{'name': 'Tim', 'age': 35, 'birthday': '1985-02-21', 'job': 'Developer', 'address': {'city': 'Sydney', 'country': 'Australia'}}]
for employee in employees:
    print(f"{employee['name']}, {employee['job']}, {employee['address']['city']}")
Prints
Tanya, Back-end Engineer, New York
Tim, Developer, Sydney
for i in employees:
    print("name:{0}, job:{1}, city:{2}".format(i['name'],i['job'],i['address']['city']))

Parsing deeply (multiple) nested JSON blocks with Python 3.6.8

I've seen several related "nested json in Python" questions but the syntax for this Corona virus JSON data is giving me problems. Here's a sample:
{"recovered":524855,"list":[
{"countrycode":"US","country":"United States of America","state":"South Carolina","latitude":"34.22333378","longitude":"-82.46170658","confirmed":15228,"deaths":568},
{"countrycode":"US","country":"United States of America","state":"Louisiana","latitude":"30.2950649","longitude":"-92.41419698","confirmed":43612,"deaths":2957}
]}
If I just want to get to Louisiana, here's what I was trying:
import json
import requests
url = "https://covid19-data.p.api.com/us"
headers = {
'x-api-key': "<api-key>",
'x-api-host': "covid19-data.p.api.com"
}
response = requests.request("GET", url, headers=headers)
coronastats = json.loads(response.text)
la_deaths = coronastats["list"][0]["countrycode"]["US"]["country"]["United States of America"]["state"]["Louisiana"]["deaths"]
print("Value: %s" % la_deaths)
I get: "TypeError: string indices must be integers"
This is obviously a list (I'm a detective and deduced that a variable named "list" might be a list) but the long key-value list is throwing me off (I think).
The problem is that once you get the first element of the list, you're left with only a depth-one dictionary. The data isn't as nested as you think it is. You reach a string right away and then try to index it with the "US" string, which raises the exception.
In [2]: data
Out[2]:
{'recovered': 524855,
'list': [{'countrycode': 'US',
'country': 'United States of America',
'state': 'South Carolina',
'latitude': '34.22333378',
'longitude': '-82.46170658',
'confirmed': 15228,
'deaths': 568},
{'countrycode': 'US',
'country': 'United States of America',
'state': 'Louisiana',
'latitude': '30.2950649',
'longitude': '-92.41419698',
'confirmed': 43612,
'deaths': 2957}]}
In [3]: data["list"][0]
Out[3]:
{'countrycode': 'US',
'country': 'United States of America',
'state': 'South Carolina',
'latitude': '34.22333378',
'longitude': '-82.46170658',
'confirmed': 15228,
'deaths': 568}
In [7]: data["list"][0]["countrycode"]
Out[7]: 'US'
In [8]: type(data["list"][0]["countrycode"])
Out[8]: str
In [9]: data["list"][0]["countrycode"]["asdf"]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-cb9dbc39ef82> in <module>
----> 1 data["list"][0]["countrycode"]["asdf"]
TypeError: string indices must be integers
To get to a specific state, what you want to do is FIND that state in the list, for example with this code:
In [14]: [f"{row['state']}: {row['deaths']} deaths. Wear a mask!" for row in data["list"] if row["state"] == "Louisiana"]
Out[14]: ['Louisiana: 2957 deaths. Wear a mask!']
You can also use filter, pandas, and a million other solutions to sort through a table.
Try this:
coronastats = json.loads(response.text)
[row["deaths"] for row in coronastats["list"] if row["state"] == "Louisiana"]
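If only one row is expected, or the same data will be queried many times, the lookups above can be written a little differently. This is a sketch using a trimmed copy of the sample data from the question instead of a live API call. The names la_row and by_state are illustrative.

```python
# Trimmed copy of the sample payload from the question.
coronastats = {
    "recovered": 524855,
    "list": [
        {"countrycode": "US", "state": "South Carolina",
         "confirmed": 15228, "deaths": 568},
        {"countrycode": "US", "state": "Louisiana",
         "confirmed": 43612, "deaths": 2957},
    ],
}

# next() returns the first matching row, or the default (None) if
# no row matches.
la_row = next(
    (row for row in coronastats["list"] if row["state"] == "Louisiana"),
    None,
)
la_deaths = la_row["deaths"] if la_row is not None else None
print("Value: %s" % la_deaths)

# For repeated lookups, index the rows by state once.
by_state = {row["state"]: row for row in coronastats["list"]}
print(by_state["South Carolina"]["deaths"])
```

The dictionary keyed by state assumes each state appears only once in the list; with duplicates, later rows would overwrite earlier ones.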

Find value of dictionary inside a dictionary

I have some data like this:
{'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]}
From my scarce knowledge of Python this looks like a dictionary within a dictionary.
But either way how can I use "NY" as a key to fetch the value "New York"?
It's a dictionary with one key-value pair. The value is a list of dictionaries.
d = {'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]}
To find the name for an abbreviation you should iterate over the dictionaries in the list and then compare the abbrev-value for a match:
for city in d['cities']: # iterate over the inner list
    if city['abbrev'] == 'NY': # check for a match
        print(city['name']) # print the matching "name"
Instead of the print you can also save the dictionary containing the abbreviation, or return it.
When a dataset isn't shaped for your needs, instead of using it as-is you can build another dictionary from it with a dictionary comprehension, taking each key and value from the fixed keys ('abbrev' and 'name') of the sub-dictionaries.
d = {'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]}
newd = {sd["abbrev"]:sd["name"] for sd in d['cities']}
print(newd)
results in:
{'NY': 'New York', 'BO': 'Boston'}
and of course: print(newd['NY']) yields New York
Once the dictionary is built, you can reuse it as many times as you need with great lookup speed. Build other specialized dictionaries from the original dataset whenever needed.
Use next and filter the sub dictionaries based upon the 'abbrev' key:
d = {'cities': [{'abbrev': 'NY', 'name': 'New York'},
                {'abbrev': 'BO', 'name': 'Boston'}]}
city_name = next(city['name'] for city in d['cities']
                 if city['abbrev'] == 'NY')
print(city_name)
Output:
New York
I think that I understand your problem.
'NY' is a value, not a key.
Maybe you need something like {'cities': {'NY': 'New York', 'BO': 'Boston'}}, so you could type myvar['cities']['NY'] and it would return 'New York'.
If you have to use x = {'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]} you could create a function:
def search(abbrev):
    for cities in x['cities']:
        if cities['abbrev'] == abbrev:
            return cities['name']
Output:
>>> search('NY')
'New York'
>>> search('BO')
'Boston'
PS: I use Python 3.6.
With this variant you can search by either abbreviation or name and get both back:
def search(abbrev):
    for cities in x['cities']:
        if cities['abbrev'] == abbrev: return cities['name'], cities['abbrev']
        if cities['name'] == abbrev: return cities['name'], cities['abbrev']

Strip Quote from key Using DictReader

I am currently reading data from a CSV file and I want to turn it into a dictionary of key-value pairs.
I was able to do that using csv.DictReader. But is there any way to strip the quotes from the keys?
It prints out like this:
{'COUNTRY': 'Germany', 'price': '49', 'currency': 'EUR', 'ID': '1', 'CITY': 'Munich'}
{'COUNTRY': 'United Kingdom', 'price': '40', 'currency': 'GBP', 'ID': '2', 'CITY': 'London'}
{'COUNTRY': 'United Kingdom', 'price': '40', 'currency': 'GBP', 'ID': '3', 'CITY': 'Liverpool'}
Is there any way to make it look like this?
{COUNTRY: 'Germany', price: '49', currency: 'EUR', ID: '1', CITY: 'Munich'}
{COUNTRY: 'United Kingdom', price: '40', currency: 'GBP', ID: '2', CITY: 'London'}
{COUNTRY: 'United Kingdom', price: '40', currency: 'GBP', ID: '3', CITY: 'Liverpool'}
import csv
input_file = csv.DictReader(open("201611022225.csv"))
for row in input_file:
    print(row)
Python uses quotes to indicate that an object is a string when printing it inside a container. In your case, the dictionary uses strings as keys, so when you print it, the quotes are shown. But the quotes are not actually stored as part of the data; they only indicate the data type.
For example, if you write a key itself (rather than the printed dictionary) to a text file and open it later, it will not show any quotes.
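If the goal is only the display format shown in the question, the rows can be formatted by hand instead of relying on the dictionary's own printing. This is a minimal sketch: the in-memory CSV stands in for the real file, and format_row is a hypothetical helper name.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for the file in the question.
sample = io.StringIO(
    "ID,CITY,COUNTRY,price,currency\n"
    "1,Munich,Germany,49,EUR\n"
)


def format_row(row):
    """Render a row as {KEY: 'value', ...} with unquoted keys."""
    # str formatting for the key drops the quotes; !r keeps them on the value.
    pairs = ", ".join(f"{key}: {value!r}" for key, value in row.items())
    return "{" + pairs + "}"


formatted = [format_row(row) for row in csv.DictReader(sample)]
for line in formatted:
    print(line)
```

The result is a plain string meant for display; it is no longer a dictionary and is not valid Python or JSON syntax.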

Get an ordered dictionary from an ordinary one

What is a good way to get an ordered dictionary from a regular dictionary? I need the keys (and these keys are known ahead of time) to be in a certain order. I will be "dump"ing a list of these dictionaries into a JSON file and need things ordered a certain way.
--- Edited and added the following
For instance, I have a dictionary ...
employee = { 'phone': '1234567890', 'department': 'HR', 'country': 'us', 'name': 'Smith' }
When I dump it into JSON format, I would like it to print out as
{ 'name': 'Smith', 'department': 'HR', 'country': 'us', 'phone': '1234567890'}
Since the keys and their order are known ahead of time, build an OrderedDict by looking up each key in that order:
from collections import OrderedDict
order = ("name","department","country","phone")
employee = { 'phone': '1234567890', 'department': 'HR', 'country': 'us', 'name': 'Smith' }
od = OrderedDict((k, employee[k]) for k in order)
But if you dump to a JSON file and load it again with the default settings, you will not get an OrderedDict back, and on Python versions before 3.7 the order is not guaranteed to be maintained. When you dump, it will look like:
{"name": "Smith", "department": "HR", "country": "us", "phone": "1234567890"}
But on those older versions the loaded result may not be in the same order, because normal dicts had no guaranteed order, like below:
{'phone': '1234567890', 'name': 'Smith', 'country': 'us', 'department': 'HR'}
If you are just trying to store the dicts to use again and want to maintain order, you can pickle:
import pickle
with open("foo.pkl","wb") as f:
    pickle.dump(od,f)
with open("foo.pkl","rb") as f:
    d = pickle.load(f)
print(d)
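As an alternative to pickling, the order can also be kept across a JSON round trip by asking the json module to rebuild each object as an OrderedDict. This sketch uses json.dumps and json.loads on a string rather than a file to stay self-contained; the variable names text and restored are illustrative.

```python
import json
from collections import OrderedDict

order = ("name", "department", "country", "phone")
employee = {"phone": "1234567890", "department": "HR",
            "country": "us", "name": "Smith"}
od = OrderedDict((k, employee[k]) for k in order)

# Serialization follows the OrderedDict's insertion order.
text = json.dumps(od)
print(text)

# object_pairs_hook receives the key/value pairs in document order,
# so the loaded object is an OrderedDict with the same key order.
restored = json.loads(text, object_pairs_hook=OrderedDict)
print(list(restored))
```

The same object_pairs_hook argument is accepted by json.load when reading from a file.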
You could do something like the following: collect the keys in order in a list of strings, traverse the list, look each key up in the dictionary, and create an ordered dictionary:
from collections import OrderedDict
def makeOrderedDict(dictToOrder, keyOrderList):
    # build (key, value) pairs in the requested key order
    tupleList = []
    for key in keyOrderList:
        tupleList.append((key, dictToOrder[key]))
    return OrderedDict(tupleList)
