Python: Compare values from 2 dictionaries with the same key

Hello, I am new to Python and I have a question about dictionaries.
Let's say we have a dictionary:
Cars = {"Audi": {"Wheels": 4, "Body": 1, "Other": 20}, "Ford": {"Wheels": 2, "Body": 3, "Other": 10}, "BMW": {"Wheels": 5, "Body": 0.5, "Other": 30}}
And another dictionary:
Materials = {"Wheels": 30, "Body": 5, "Other": 110}
I want to return the number of cars I can produce with the materials I have, so:
def production(car, Materials):
    return

production("Audi", Materials)
My output in this example should be the number 5, because there are only 5 body parts to use.
I was thinking to make it somehow like this: divide the values from Materials by the values from the car's dict, write the results to another list, and then return the minimum of that list.
More examples:
production("BMW", Materials)
3.0 # because the value of the key Other is 110, and for 3 cars we need 90 Other
production("Ford", Materials)
1.0 # because the value of the key Body is 3, and for 1 car we need 3 Body
I thank you in advance for everything.

If what you want is to see how many of any given car can be created without actually affecting the contents of Materials, you could write your method like so:
def number_of_units_creatable(car_key):
    required_parts = Cars[car_key]
    return min(Materials["Wheels"] // required_parts["Wheels"],
               Materials["Body"] // required_parts["Body"],
               Materials["Other"] // required_parts["Other"])
In production, you'd want to add conditional guards to check whether your Cars and Materials have all the required keys. You'll get an exception if you try to get the value for a key that doesn't exist.
This will allow you to figure out the maximum number of any given car you can create with the resources available in Materials.
I'd strongly recommend you not use nested dicts like this, though - this design would be greatly helped by creating, say, a Materials class, and storing this as your value rather than another dictionary. abarnert has a little more on this in his post.
Another note, prompted by abarnert - it's an extremely bad idea to rely on a shared, static set of keys between two separate dictionaries. What happens if you want to build, say, an armored car, and now you need a gun? Either you have to add Gun: 0 within the required attributes of every car, or you'll run into an exception. Every single car will require an entry for every single part required by each and every car in existence, and a good deal of those will signify nothing other than the fact that the car doesn't need it. As it stands, your design is both very constraining and brittle - chances are good it'll break as soon as you try and add something new.
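By way of illustration, here is a minimal sketch of the class-based design suggested above (the class and method names are my own invention, not from the question):

```python
from dataclasses import dataclass, field

@dataclass
class BillOfMaterials:
    # parts needed to build one unit; any part not listed defaults to 0
    parts: dict = field(default_factory=dict)

    def needs(self, part):
        return self.parts.get(part, 0)

audi = BillOfMaterials({"Wheels": 4, "Body": 1, "Other": 20})
print(audi.needs("Gun"))  # 0 -- no KeyError for parts this car doesn't use
```

Because unknown parts default to 0, adding a Gun to one vehicle doesn't force a Gun: 0 entry into every other vehicle's part list.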

If the set of possible materials is a static collection—that is, it can only have "Wheels", "Body", and "Other"—then you really ought to be using a class rather than a dict, as furkle's answer suggests, but you can fake it with your existing data structure, as his answer shows.
However, if the set of possible materials is open-ended, then you don't want to refer to them one by one explicitly; you want to loop over them. Something like:
for material, count in car.items():
In this case:
return min(Materials[material] // count for material, count in car.items())
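Put together as a full function, that looks like the sketch below (assuming the Cars and Materials dictionaries from the question):

```python
Cars = {"Audi": {"Wheels": 4, "Body": 1, "Other": 20},
        "Ford": {"Wheels": 2, "Body": 3, "Other": 10}}
Materials = {"Wheels": 30, "Body": 5, "Other": 110}

def production(car, materials):
    # one min() over however many parts this car happens to need
    return min(materials[part] // count for part, count in Cars[car].items())

print(production("Audi", Materials))  # 5
print(production("Ford", Materials))  # 1
```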

You can iterate over the materials and decrement the values until there aren't enough parts left for another car (note the decrement has to be by the required amount for each part, not by 1):
def production(car, materials):
    count = 0
    while all(materials[part] >= need for part, need in Cars[car].items()):
        for part, need in Cars[car].items():
            materials[part] -= need
        count += 1
    return count
If you don't want to change the materials dict, work on a copy of it instead:
def production(car, materials):
    count = 0
    remaining = dict(materials)
    while all(remaining[part] >= need for part, need in Cars[car].items()):
        for part, need in Cars[car].items():
            remaining[part] -= need
        count += 1
    return count

Related

Separating from a List of Dictionaries in Python with optional parameters

I am attempting to match a very long list of Python dictionaries. What I'm looking for is to append dicts from this list into a new list based on the values of the keys of the dict. An example of what I have is:
A list of 1000+ dictionaries structured like this:
{'regions': ['north', 'south'],
 'age': 35,
 'name': 'john',
 'cars': ['ford', 'kia']}
I want to sort and match through this list using almost all the keys and append the matching dicts to a new list. Sometimes I might be searching with only age, whereas other times I will be searching with regions & name, all the way to searching with all keys like age, name, regions, & cars, because all parameters to search with are optional.
I currently use for loops to sort through it, but as I add more and more optional parameters, it gets slower and more complex. Is there an easier way to accomplish what I am doing?
An example of what the user would send is:
regions: north
age: 10
And it would return a list of all dictionaries with north as a region and 10 as the age.
I think this one is pretty open ended, especially because you suggest that you want this to be extensible as you add more keys etc. and haven't really discussed your operational requirements. But here are a few thoughts:
Third party modules
Are these dictionaries going to get any more nested? Or is it going to always be 'key -> value' or 'key -> [list, of, values]'?
If you can accept a chunky dependency, you might consider something like pandas, which we normally think of as representing tables, but which can certainly manage nesting to some degree.
For example:
from functools import partial
from typing import Dict

import pandas as pd
from pandas import DataFrame

def matcher(comparator, target=None) -> bool:
    """
    matcher
    Checks whether a value matches or contains a target
    (won't match if the target is a substring of the value)
    """
    if target == comparator:  # simple case, return True immediately
        return True
    if isinstance(comparator, str):
        return False  # doesn't match exactly and string => no match
    try:  # handle looking in collections
        return target in comparator
    except TypeError:  # if it fails, you know there's no match
        return False

def search(data: DataFrame, query: Dict) -> DataFrame:
    """
    search
    Pass in a DataFrame and a query in the form of a dictionary
    of keys and values to match, for example:
    {"age": 42, "regions": "north", "cars": "ford"}
    Returns a matching subset of the data
    """
    # each element of the resulting list is a boolean series
    # corresponding to a dictionary key
    masks = [
        data[key].map(partial(matcher, target=value)) for key, value in query.items()
    ]
    # collapse the masks down to a single boolean series indicating
    # whether ALL conditions are met for each record
    mask = pd.concat(masks, axis="columns").all(axis="columns")
    return data.loc[mask]

if __name__ == "__main__":
    data = DataFrame(your_big_list)
    query = {"age": 35, "regions": "north", "cars": "ford"}
    results = search(data, query)
    list_results = results.to_dict(orient="records")
Here list_results would restore the filtered data to the original format, if that's important to you.
I found that the matcher function had to be surprisingly complicated; I kept thinking of edge-cases (like: we need to support searching in a collection, but the in operator can also find substrings, which isn't what we want ... unless it is, of course!).
But at least all that logic is walled off in there. You could write a series of unit tests for it, and if you extend your schema in future you can then alter the function accordingly and check the tests still pass.
The search function then is purely for nudging pandas into doing what you want with the matcher.
match case
In Python 3.10 the new match case statement might allow you to very cleanly encapsulate the matching logic.
Performance
A pretty fundamental issue here is that if you care about performance (and I got the sense that this was secondary to maintainability for you) then:
- the bigger the data get, the slower things will be
- Python is already not fast, generally speaking
You could possibly improve things by building some sort of index for your data. Ultimately, however, it's always going to be more reliable to use a specialist tool. That's going to be some sort of database.
The precise details will depend on your requirements. e.g. are these data going to become horribly unstructured? Are any fields going to be text that will need to be properly indexed in something like Elasticsearch/Solr?
A really light-touch solution that you could implement in the short term with Python would be to:
- chuck that data into SQLite
- rely on SQL for the searching
I am suggesting SQLite since it runs out-of-the-box and just in a single local file:
from sqlalchemy import create_engine
engine = create_engine("sqlite:///mydb.sql")
# ... that's it, we can now connect to a SQLite DB/the 'mydb.sql' file that will be created
... but the drawback is that it won't support array-like data. Your options are:
1. use PostgreSQL instead and take the hit of running a DB with more firepower
2. normalise those data
I don't think option 2 would be too difficult. Something like:
REGIONS
id | name
----------
1 | north
2 | south
3 | east
4 | west
CUSTOMERS
id | age | name
---------------
...
REGION_LINKS
customer_id | region_id
-----------------------
1 | 1
1 | 2
I've called the main data table 'customers' but you haven't mentioned what these data really represent, so that's more by way of example.
Then your SQL queries could get built and executed using sqlalchemy's ORM capabilities.
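Even without the ORM, the normalised layout can be exercised directly with the standard library's sqlite3; a sketch, with table and column names following the example schema above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, name TEXT);
    CREATE TABLE region_links (customer_id INTEGER, region_id INTEGER);
""")
conn.execute("INSERT INTO regions VALUES (1, 'north'), (2, 'south')")
conn.execute("INSERT INTO customers VALUES (1, 35, 'john')")
conn.execute("INSERT INTO region_links VALUES (1, 1), (1, 2)")

# find all customers linked to the 'north' region
rows = conn.execute("""
    SELECT c.name, c.age FROM customers c
    JOIN region_links l ON l.customer_id = c.id
    JOIN regions r ON r.id = l.region_id
    WHERE r.name = ?
""", ("north",)).fetchall()
print(rows)  # [('john', 35)]
```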
I made some code to do this:
test = [
    {'regions': ['south'], 'age': 35, 'name': 'john', 'cars': ['ford']},
    {'regions': ['north'], 'age': 15, 'name': 'michael', 'cars': ['kia']},
    {'regions': ['north', 'south'], 'age': 20, 'name': 'terry', 'cars': ['ford', 'kia']},
    {'regions': ['East', 'south'], 'age': 35, 'name': 'user', 'cars': ['other', 'kia']},
    {'regions': ['East', 'south'], 'age': 75, 'name': 'john', 'cars': ['other']},
]
def Finder(inputs: list, regions: list = None, age: int = None, name: str = None, cars: list = None) -> list:
    output = []
    for input in inputs:
        valid = True
        if regions is not None and valid: valid = all(i in input["regions"] for i in regions)
        if age is not None and valid: valid = input["age"] == age
        if name is not None and valid: valid = input["name"] == name
        if cars is not None and valid: valid = all(i in input["cars"] for i in cars)
        if valid:
            output.append(input)
    return output
print(Finder(test))
print(Finder(test, age = 25))
print(Finder(test, regions = ["East"]))
print(Finder(test, cars = ["ford","kia"]))
print(Finder(test, name = "john", regions = ["south"]))
This function just checks all the parameters for each input to see whether it is valid, and puts all the valid inputs in an output list.

Testing if a dictionary key ends in a letter?

tools = {"Wooden_Sword1": 10, "Bronze_Helmet1": 20}
I have code written to add items; I'm adding an item like so:
tools[key_to_find] = int(b)
The key_to_find is the tool and b is the durability, and I need a way so that if Wooden_Sword1 already exists when I'm adding, it adds a Wooden_Sword2 instead. This has to work with the other items as well.
As user3483203 and ShadowRanger commented, it's probably a bad idea to use numbers in your key string as part of the data. Manipulating those numbers will be awkward, and there are better alternatives. For instance, rather than storing a single value for each numbered key, use simple keys and store a list. The index into the list will take the place of the number in the key.
Here's how you could implement it:
tools = {"Wooden_Sword" : [10], "Bronze_Helmet" : [20]}
Add a new wooden sword with durability 10:
tools.setdefault("Wooden_Sword", []).append(10)
Find how many bronze helmets we have:
helmets = tools.get("Bronze_Helmet", [])
print("we have {} helmets".format(len(helmets)))
Find the first bronze helmet with a non-zero durability, and reduce it by 1:
helmets = tools.get("Bronze_Helmet", [])
for i, durability in enumerate(helmets):
    if durability > 0:
        helmets[i] -= 1
        break
else:  # this runs if the break statement was never reached and the loop ran to completion
    take_extra_damage()  # or whatever
You could simplify some of this code by using a collections.defaultdict instead of a regular dictionary, but if you learn how to use get and setdefault it's not too hard to get by with the regular dict.
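For comparison, a sketch of the same bookkeeping with collections.defaultdict (item names taken from the question):

```python
from collections import defaultdict

tools = defaultdict(list)
tools["Wooden_Sword"].append(10)   # no setdefault needed
tools["Wooden_Sword"].append(10)   # a second sword
tools["Bronze_Helmet"].append(20)

print(len(tools["Wooden_Sword"]))  # 2
```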
To ensure a key name is not taken yet, and add a number if it is, create the new name and test. Then increment the number if it is already in your list. Just repeat until none is found.
In code:
def next_name(basename, lookup):
    if basename not in lookup:
        return basename
    number = 1
    while basename + str(number) in lookup:
        number += 1
    return basename + str(number)
While this code does what you ask, you may want to look at other methods. A possible drawback is that there is no association between, say, WoodenShoe1 and WoodenShoe55 – if 'all wooden shoes' need their price increased, you'd have to iterate over all possible names between 1 and 55, just in case these existed at some time.
From what I understand of the question, your keys have 2 parts: "Name" and "ID". The ID is just an integer that starts at 1, so you can initialize a counter for every name:
numOfWoodenSwords = 0
And to add to the array:
numOfWoodenSwords += 1
tools["wooden_sword" + str(numOfWoodenSwords)] = int(b)
If you need to have an unknown amount of tools, I recommend looking at the re module: https://docs.python.org/3/library/re.html.
Or you could iterate over tools.keys to see if the entry exists.
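As a sketch of the re idea: scan the existing keys for the highest numeric suffix of a base name and build the next one (the key style follows the question's Wooden_Sword1; next_key is a made-up helper name):

```python
import re

def next_key(tools, base):
    # collect the numeric suffixes of keys that start with this base name
    pattern = re.compile(re.escape(base) + r"(\d+)$")
    numbers = [int(m.group(1)) for key in tools if (m := pattern.match(key))]
    return base + str(max(numbers, default=0) + 1)

tools = {"Wooden_Sword1": 10, "Bronze_Helmet1": 20}
print(next_key(tools, "Wooden_Sword"))  # Wooden_Sword2
print(next_key(tools, "Iron_Shield"))   # Iron_Shield1
```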
You could write a function that determines if a character is a letter:
def is_letter(char):
    return 65 <= ord(char) <= 90 or 97 <= ord(char) <= 122
Then when you are looking at a key in your dictionary, simply:
if is_letter(key[-1]):
    ...
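Alternatively, the built-in string methods cover this without the magic ord() ranges (isalpha() alone also accepts non-ASCII letters, hence the isascii() guard):

```python
def is_letter(char):
    # same check as the ord() version, restricted to ASCII letters
    return char.isascii() and char.isalpha()

print(is_letter("a"))  # True
print(is_letter("1"))  # False
```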

How do I replace lines in a file using data contained elsewhere in the same file?

Let's say I have a file called 'Food' listing the names of some foods and their prices. Some of these items are raw ingredients, and others are made from different amounts of them. For example, I might manually list the price of eggs as 1 and find that the omelette has a default price of 10, but then find that an omelette only needs 5 eggs, so I would need the program to read the price of eggs, find the line containing the omelette, and replace it with "omelette: " + str(5*eggs). I may also need to add extra ingredients/items of food, e.g. a pile of omelettes which is made from 5 omelettes. The basic goal is to make it possible to just edit the value of eggs and have the values of omelette and pileofomelettes update. I've started the code simply by creating a list of the lines contained within the file.
with open("Food.txt") as g:
    foodlist = g.readlines()
The file 'Food.txt' would be in the following format:
eggs: 5
omelette: 20
pileofomelettes: 120
etc...
and after the code runs it should look like
eggs: 5
omelette: 25
pileofomelettes: 125
I would code the relations manually, since they would be so unlikely to ever change (and even if they did, it would be fairly easy for me to go in and change the coefficients),
and they would be read by Python in list format as something like
'['egg 2\n', 'flour 1\n', 'butter 1\n', 'sugar 3\n', 'almond 5\n', 'cherry 8\n']'
I have searched for search/replace algorithms that can search for a specific phrase and replace it with another specific phrase, but I don't know how I'd apply one if the line is subject to change (the user could change the raw ingredient values if he wanted to update all the values related to them). One solution I can think of involves converting them into a dictionary format, with them all listed as string-integer value pairs, so that I could just replace the integer part of each pair based on the integer values stored within other string-integer pairs, but, being inexperienced, I don't know how I'd convert the list (or the raw file itself, even better) into a dictionary.
Any advice on how to carry out steps of this program would be greatly appreciated :)
EDIT - in the actual application of the program, it doesn't matter what order the items are listed in in the final file, so if I listed all the raw ingredients in one place and all of the composite items in another (with a large space in between them in case more raw items need to be added), then I could just re-write the entire second half of the file in an arbitrary order with no problem - so long as the line positions of the raw ingredients remain the same.
Okay, I would suggest making a relations text file which you can parse, in case you think the relations can later change, or just so that your code is easier to read and modify. This can then be parsed to find the required relations between raw ingredients and complexes. Let it be "relations.txt", of the type:
omelette: 5 x eggs + 1 x onions
pileofomelettes: 6 x omelette
Here, you can put arbitrary number of ingredients of the type:
complex: number1 x ingredient1 + number2 x ingredient2 + ...
and so on.
And your food.txt contains prices of all ingredients and complexes:
eggs: 2
onions: 1
omelette: 11.0
pileofomelettes: 60
Now we can see that the value for pileofomelettes is intentionally not mapped correctly here. So we will run the code below, and you can also change numbers and see the results.
#!/usr/bin/python
'''This program takes in a relations file and a food text file as inputs
and can be used to update the food text file based on changes in either of these'''
relations = {}
foodDict = {}
# Mapping ingredients to each other in the relations dictionary
with open("relations.txt") as r:
    relationlist = r.read().splitlines()
for relation in relationlist:
    item, equatedTo = relation.split(': ')
    ingredientsAndCoefficients = equatedTo.split(' + ')
    listIngredients = []
    for ingredient in ingredientsAndCoefficients:
        coefficient, item2 = ingredient.split(' x ')
        # A list of pairs of amount and type of ingredient
        listIngredients.append((float(coefficient), item2))
    relations.update({item: listIngredients})
# Creating a food dictionary with values from food.txt and mapping to the relations dictionary
with open("food.txt") as g:
    foodlist = g.read().splitlines()
for item in foodlist:
    food, value = item.split(': ')
    foodDict.update({food: value})
for food in relations.keys():
    # (Raw) ingredients with no left-hand side in relations.txt will not change here.
    value = 0.
    for item2 in range(len(relations[food])):
        # Calculating the new value for the complex here.
        value += relations[food][item2][0] * float(foodDict[relations[food][item2][1]])
    foodDict.update({food: value})
# Rewriting food.txt with the new dictionary values
with open("food.txt", 'w') as g:
    for key in sorted(foodDict.keys()):
        g.write(key + ': ' + str(foodDict[key]) + '\n')
        print(key + ': ' + str(foodDict[key]))
And it comes out to be:
eggs: 2
onions: 1
omelette: 11.0
pileofomelettes: 66.0
You can change the price of eggs to 5 in the food.txt file, and
eggs: 5
onions: 1
omelette: 26.0
pileofomelettes: 156.0
How does your program know the components of each item? I suggest that you keep two files: one with the cost of atomic items (eggs) and another with recipes (omelette <= 5 eggs).
Read both files. Store the atomic costs, remembering how many of these items you have, atomic_count. Extend this table from the recipes file, one line at a time. If the recipe you're reading consists entirely of items with known costs, then compute the cost and add that item to the "known" list. Otherwise, append the recipe to a "later" list and continue.
When you reach the end of both input files, you will have a list of known costs, and a few other recipes that depended on items farther down the recipe file. Now cycle through this "unknown" list until either (a) it's empty, or (b) you don't have anything left with all its ingredients known. In case (b), you have something wrong with your input: either an ingredient with no definition, or a circular dependency. Print the remaining recipes list and debug your input files.
In case (a), you are now ready to print your Food.txt list. Go through your "known" list and write out one item or recipe at a time. When you get to item [atomic_count], write out a second file, a new recipe list. This is your old recipe list, but in a useful top-down order. In the future, you won't have any "unknown" recipes after the first pass.
For future changes ... don't bother. You have only 173 items, and the list sounds unlikely to grow past 500. When you change or add an item, just hand-edit the file and rerun the program. That will be faster than the string-replacement algorithm you're trying to write.
In summary, I suggest that you do just the initial computation problem, which is quite a bit simpler than adding the string update. Don't do incremental updates; redo the whole list from scratch. For such a small list, the computer will do this faster than you can write and debug the extra coding.
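A sketch of that known/unknown cycle (the data shapes and function name are my own; a recipe maps an item to its ingredient counts):

```python
def resolve_costs(atomic, recipes):
    """atomic: {'eggs': 1}; recipes: {'omelette': {'eggs': 5}, ...}"""
    known = dict(atomic)
    pending = dict(recipes)
    while pending:
        progressed = False
        for name, parts in list(pending.items()):
            if all(ing in known for ing in parts):  # every ingredient costed?
                known[name] = sum(n * known[ing] for ing, n in parts.items())
                del pending[name]
                progressed = True
        if not progressed:  # case (b): missing definition or circular dependency
            raise ValueError(f"unresolvable recipes: {sorted(pending)}")
    return known

costs = resolve_costs({"eggs": 1},
                      {"omelette": {"eggs": 5},
                       "pileofomelettes": {"omelette": 5}})
print(costs)  # {'eggs': 1, 'omelette': 5, 'pileofomelettes': 25}
```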
I'm still not really sure what you are asking but this is what I came up with...
from collections import OrderedDict

food_map = {'omelette': {'eggs': 5, 'price': None}, 'pileofomelettes': {'eggs': 25, 'price': None}, 'eggs': {'price': 5}}

with open('food.txt') as f:
    data = f.read().splitlines()
data = OrderedDict([(x[0], int(x[1])) for x in [x.split(': ') for x in data]])

for key, val in data.items():
    if key == 'eggs':
        continue
    food_rel = food_map.get(key, {})
    val = food_rel.get('eggs', 1) * food_map.get('eggs', {}).get('price', 1)
    data[key] = val

with open('out.txt', 'w') as f:
    f.write('\n'.join(['{0}: {1}'.format(key, val) for key, val in data.items()]))

ArcMap Field Calculator Program to create Unique ID's

I'm using the Field Calculator in ArcMap and
I need to create a unique ID for every storm drain in my county.
An ID Should look something like this: 16-I-003
The first number is the municipal number which is in the column/field titled "Munic"
The letter is using the letter in the column/field titled "Point"
The last number is simply just 1 to however many drains there are in a municipality.
So far I have:
rec = 0
def autoIncrement():
    global rec
    pStart = 1
    pInterval = 1
    if rec == 0:
        rec = pStart
    else:
        rec = rec + pInterval
    return "16-I-" + '{0:03}'.format(rec)
So you can see that I have manually been typing in the municipal number, the letter, and the hyphens. But I would like to use the fields: Munic and Point so I don't have to manually type them in each time it changes.
I'm a beginner when it comes to python and ArcMap, so please dumb things down a little.
I'm not familiar with the ArcMap, so can't directly help you, but you might just change your function to a generator as such:
def StormDrainIDGenerator():
    rec = 0
    while rec < 99:
        rec += 1
        yield "16-I-" + '{0:03}'.format(rec)
If you are ok with that, then parameterize the generator to accept the Munic and Point values and use them in your formatting string. You probably should also parameterize the ending value as well.
Use of a generator will allow you to drop it into any later expression that accepts an iterable, so you could create a list of such simply by saying list(StormDrainIDGenerator()).
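Parameterising the generator as suggested might look like this (a sketch; in ArcMap the Munic and Point values would come from the fields):

```python
def storm_drain_ids(munic, point, end=99):
    # yields IDs like '16-I-001' for one municipality and point letter
    for rec in range(1, end + 1):
        yield "{0}-{1}-{2:03}".format(munic, point, rec)

ids = list(storm_drain_ids(16, "I", end=3))
print(ids)  # ['16-I-001', '16-I-002', '16-I-003']
```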
Is your question on how to get Munic and Point values into the string ID? using .format()?
I think you can use following code to do that.
def autoIncrement(a, b):
    global rec
    pStart = 1
    pInterval = 1
    if rec == 0:
        rec = pStart
    else:
        rec = rec + pInterval
    r = "{0}-{1}-{2:03}".format(a, b, rec)
    return r
and call
autoIncrement( !Munic! , !Point! )
The r = "{0}-{1}-{2:03}".format(a, b, rec) line just replaces the {}s with the values of the variables a, b, and rec, where a and b are actually the values of Munic and Point passed to the function.

Python data structure recommendation?

I currently have a structure that is a dict: each value is a list that contains numeric values, and each key is what (to borrow a SQL idiom) you could call a primary key made up of three values: a year, a player identifier, and a team identifier.
So you can get a unique row by passing in a value for the year, player ID, and team ID like so:
statline = stats[(2001, 'SEA', 'suzukic01')]
Which yields something like
[305, 20, 444, 330, 45]
I'd like to alter this data structure so it can be quickly summed by any one of these three keys: so you could easily slice the totals for a given index in the numeric lists by passing in ONE of year, player ID, or team ID, plus the index. I want to be able to do something like
hr_total = stats[year=2001, idx=3]
Where that idx of 3 corresponds to the third column in the numeric list(s) that would be retrieved.
Any ideas?
Read up on Data Warehousing. Any book.
Read up on Star Schema Design. Any book. Seriously.
You have several dimensions: Year, Player, Team.
You have one fact: score
You want to have a structure like this.
You then want to create a set of dimension indexes like this.
years = collections.defaultdict( list )
players = collections.defaultdict( list )
teams = collections.defaultdict( list )
Your fact table can be a collections.namedtuple, or you can use a simple class like this.
class ScoreFact( object ):
    def __init__( self, year, player, team, score ):
        self.year = year
        self.player = player
        self.team = team
        self.score = score
        years[self.year].append( self )
        players[self.player].append( self )
        teams[self.team].append( self )
Now you can find all items in a given dimension value. It's a simple list attached to a dimension value.
years['2001'] are all scores for the given year.
players['SEA'] are all scores for the given player.
etc. You can simply use sum() to add them up. A multi-dimensional query is something like this.
[ x for x in players['SEA'] if x.year == '2001' ]
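And since each dimension index is just a list of facts, sum() gives totals along a dimension directly. A self-contained sketch repeating the class above (the sample scores are invented):

```python
import collections

years = collections.defaultdict(list)
players = collections.defaultdict(list)
teams = collections.defaultdict(list)

class ScoreFact(object):
    def __init__(self, year, player, team, score):
        self.year = year
        self.player = player
        self.team = team
        self.score = score
        years[self.year].append(self)
        players[self.player].append(self)
        teams[self.team].append(self)

ScoreFact('2001', 'SEA', 'M', 305)
ScoreFact('2001', 'SEA', 'M', 20)
ScoreFact('2002', 'SEA', 'M', 99)

# total score for player 'SEA' in 2001
total = sum(x.score for x in players['SEA'] if x.year == '2001')
print(total)  # 325
```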
Put your data into SQLite, and use its relational engine to do the work. You can create an in-memory database and not even have to touch the disk.
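A minimal in-memory sketch of that idea (the column names and sample numbers are illustrative, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # never touches the disk
conn.execute("CREATE TABLE stats (year INT, team TEXT, player TEXT, hr INT)")
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", [
    (2001, 'SEA', 'suzukic01', 8),
    (2001, 'SEA', 'boonebr01', 37),
    (2002, 'SEA', 'suzukic01', 8),
])
# total for 2001, summed over all players and teams
(hr_total,) = conn.execute(
    "SELECT SUM(hr) FROM stats WHERE year = ?", (2001,)).fetchone()
print(hr_total)  # 45
```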
The syntax stats[year=2001, idx=3] is invalid Python and there is no way you can make it work with those square brackets and "keyword arguments"; you'll need to have a function or method call in order to accept keyword arguments.
So, say we make it a function, to be called like wells(stats, year=2001, idx=3). I imagine the idx argument is mandatory (which is very peculiar given the call, but you give no indication of what could possibly mean to omit idx) and exactly one of year, playerid, and teamid must be there.
With your current data structure, wells can already be implemented:
def wells(stats, year=None, playerid=None, teamid=None, idx=None):
    if idx is None:
        raise ValueError('idx must be specified')
    specifiers = [(i, x) for i, x in enumerate((year, playerid, teamid)) if x is not None]
    if len(specifiers) != 1:
        raise ValueError('Exactly one of year, playerid, teamid, must be given')
    ikey, keyv = specifiers[0]
    return sum(v[idx] for k, v in stats.items() if k[ikey] == keyv)
of course, this is O(N) in the size of stats -- it must examine every entry in it. Please measure correctness and performance with this simple implementation as a baseline. An alternative solution (much speedier in use, but requiring much time for preparation) is to put three dicts of lists (one each for year, playerid, and teamid) to the side of stats, each entry indicating (or copying, but I think indicating by full key may suffice) all entries of stats that match that ikey / keyv pair. But it's not clear at this time whether this implementation may not be premature, so please try first with the simple-minded idea!-)
def getSum(d, year, idx):
    total = 0
    for key in d.keys():
        if key[0] == year:
            total += d[key][idx]
    return total
This should get you started. I have made the assumption in this code that ONLY year will be asked for, but it should be easy enough for you to adapt it to check for the other parameters as well.
Cheers
