I have a list of dictionaries. Sample data is below; the real list has n entries like these.
datas = [{"_id":"1234as", "Total students":"123,321", "TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"}]
I tried
for data in datas:
    for i in data.values():
        re.sub('[^A-Za-z0-9]+', '', i)
        datas.append(i)
I just want to remove the comma (,) from the TotalStudents and TotalPresent values and store the cleaned values back in datas.
Edit 1
My list of dictionaries also contains values like these:
datas = [{"_id":"1234as","Totalstudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
Here the value for the key TotalStudents is "NA" and the value for TotalPresent is "". Is there a way to replace "NA" or "" with "0" wherever they appear?
If you want to replace the values of specific keys, make sure that the keys are the same because the first dict in your example has Total Students but the second has TotalStudents.
Try this:
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "Total Students": "343,431", "TotalPresent": "541,656"}]
for d in datas:
    d["Total Students"] = d["Total Students"].replace(",", "")
    d["TotalPresent"] = d["TotalPresent"].replace(",", "")
print(datas)
# output: [{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'}, {'_id': '1234asas', 'Total Students': '343431', 'TotalPresent': '541656'}]
If you want to remove commas from the values of all the keys, you can try this (but bear in mind that in this case all the values of your dict must be strings):
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "Total Students": "343,431", "TotalPresent": "541,656"}]
for d in datas:
    for k in d:
        d[k] = d[k].replace(",", "")
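If some values might not be strings, the loop above would fail on them. Here is a minimal sketch of one way to guard against that, assuming you simply want to leave non-string values untouched (the mixed sample data is made up for illustration):

```python
# Sample data with mixed value types (hypothetical, for illustration only).
datas = [
    {"_id": "1234as", "Total Students": "123,321", "TotalPresent": 321345},
    {"_id": "1234asas", "Total Students": "343,431", "TotalPresent": None},
]

for d in datas:
    for k, v in d.items():
        # Only str has .replace(); ints, None, etc. are left as they are.
        if isinstance(v, str):
            d[k] = v.replace(",", "")

print(datas)
```

Reassigning existing keys while iterating over d.items() is safe here because no keys are added or removed.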
You can iterate over the key, value pairs in each dictionary and, after removing the commas, assign the cleaned value back to that key.
import re
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "TotalStudents": "343,431", "TotalPresent": "541,656"}]
for data in datas:
    for key, value in data.items():
        print(key, value)
        value = re.sub('[^A-Za-z0-9]+', '', value)
        data[key] = value
print(datas)
Result
_id 1234as
Total Students 123,321
TotalPresent 321,345
_id 1234asas
TotalStudents 343,431
TotalPresent 541,656
[{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'},
{'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}]
This is one way to make your code work. Note that it rewrites every value, not just the two totals. If necessary, add your own checks to make it more selective.
EDIT
To catch the "NA" and "" values I have added some if statements. It's simple and stays close to your own code.
import re
datas = [{"_id":"1234as","TotalStudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
for data in datas:
    print(data)
    for key, value in data.items():
        if key == "TotalStudents":
            if value == "NA":
                value = "0"
            else:
                value = re.sub('[^A-Za-z0-9]+', '', value)
        elif key == "TotalPresent":
            if not value:
                value = "0"
            else:
                value = re.sub('[^A-Za-z0-9]+', '', value)
        data[key] = value
print()
for data in datas:
    print(data)
Result
{'_id': '1234as', 'TotalStudents': '123321', 'TotalPresent': '321345'}
{'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}
{'_id': '9934 asas', 'TotalStudents': '0', 'TotalPresent': '0'}
To make the code more efficient you can place the new values directly in data. In this case you no longer replace the "_id" with its own value.
import re
datas = [{"_id":"1234as","TotalStudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
for data in datas:
    print(data)
    for key, value in data.items():
        if key == "TotalStudents":
            if value == "NA":
                data[key] = "0"
            else:
                data[key] = re.sub('[^A-Za-z0-9]+', '', value)
        elif key == "TotalPresent":
            if not value:
                data[key] = "0"
            else:
                data[key] = re.sub('[^A-Za-z0-9]+', '', value)
print()
for data in datas:
    print(data)
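If the two keys should be treated the same way, the branches can be folded into one small helper. This is only a sketch: clean_count, PLACEHOLDERS and NUMERIC_KEYS are names I made up, and it assumes that both "NA" and "" should become "0" for either key, which is slightly broader than the code above.

```python
# Hypothetical names, not from the original answer.
PLACEHOLDERS = {"NA", ""}
NUMERIC_KEYS = ("TotalStudents", "TotalPresent")


def clean_count(value):
    """Return "0" for placeholder values, otherwise the value without commas."""
    if value in PLACEHOLDERS:
        return "0"
    return value.replace(",", "")


datas = [{"_id": "1234as", "TotalStudents": "123,321", "TotalPresent": "321,345"},
         {"_id": "1234asas", "TotalStudents": "343,431", "TotalPresent": "541,656"},
         {"_id": "9934 asas", "TotalStudents": "NA", "TotalPresent": ""}]

for data in datas:
    for key in NUMERIC_KEYS:
        # Skip dicts that do not have the key at all.
        if key in data:
            data[key] = clean_count(data[key])

for data in datas:
    print(data)
```

Because only the listed keys are touched, "_id" is never rewritten.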
re.sub does not work in place; it returns the altered str. More generally, because strs are immutable, functions that process them never modify them in place. A solution using re.sub might look like this:
import re
datas = [{"_id":"1234as","Total Students":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"}]
cleandatas = []
for data in datas:
    cleandatas.append({k:re.sub('[^A-Za-z0-9]+', '', v) for k,v in data.items()})
print(cleandatas)
Output:
[{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'}, {'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}]
I used a dict comprehension to create new, cleaned dicts.
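To see the immutability point in isolation, here is a tiny sketch (the variable names are mine): calling re.sub without keeping the return value changes nothing.

```python
import re

original = "123,321"

# The return value is thrown away here, so nothing visible happens.
re.sub(",", "", original)
unchanged = original

# Keeping the return value is what actually gives you the cleaned string.
cleaned = re.sub(",", "", original)

print(unchanged, cleaned)
```

The same applies to str.replace and every other str method.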
I've two lists containing dictionaries. I want to compare certain fields in each of these dictionaries.
current_list = [{"name": "Bill","address": "Home", "age": 23, "accesstime":11:14:01},
{"name": "Fred","address": "Home", "age": 26, "accesstime":11:57:43},
{"name": "Nora","address": "Home", "age": 33, "accesstime":11:24:14}]
backup_list = [{"name": "Bill","address": "Home", "age": 23, "accesstime":13:34:24},
{"name": "Fred","address": "Home", "age": 26, "accesstime":13:34:26},
{"name": "Nora","address": "Home", "age": 33, "accesstime":13:35:14}]
The lists and their dictionaries should be in the same order, and I just want to compare certain key/value pairs, such as name, address and age, and ignore the access time. What I have so far compares every key with every other key. So I just want to compare
current_list:dictionary[0][name] -> backup_list:dictionary[0][name] and then
current_list:dictionary[0][address] -> backup_list:dictionary[0][address]
and so on.
for x in current_list:
    for y in backup_list:
        for k, v in x.items():
            for kk, vv in y.items():
                if k == kk:
                    print("Match: {0}".format(kk))
                    break
                elif k != kk:
                    print("No match: {0}".format(kk))
Current output
Match name with name
No Match address with name
Match address with address
No Match age with name
No Match age with address
Match age with age
No Match dateRegistered with name
No Match dateRegistered with address
No Match dateRegistered with age
Match dateRegistered with dateRegistered
Preferred output
Match name with name
Match address with address
Match age with age
Due to a requirement change, my list became a list of ElementTree XML elements.
So instead of the above list, it becomes
backup_list = ["<Element 'New' at 0x0000000002698C28>, <Element 'Update' at 0x0000000002698CC8>, <Element 'New' at 0x0000000002698CC8>"]
Where each ElementTree element is an XML element containing:
{"name": "Nora", "address": "Home", "age": 33, "dateRegistered": 20140812}
Based on the answer below, this seems to satisfy my requirements so far:
value_to_compare = ["name", "address", "age"]
for i, elem in enumerate(current_list):
    backup_dict = backup_list[i]
    if elem.tag == "New":
        for key in value_to_compare:
            try:
                print("Match {0} {1} == {2}:".format(key, backup_dict.attrib[key], elem.attrib[key]))
            except KeyError:
                print("key {} not found".format(key))
            except:
                raise
    else:
        continue
I don't know if I fully understood your question but I think the following code should do the trick:
compare_arguments = ["name", "age", "address"]
for cl, bl in zip(current_list, backup_list):
    for ca in compare_arguments:
        if cl[ca] == bl[ca]:
            print("Match {0} with {0}".format(cl[ca]))
    print("-" * 10)
The code above zips the two lists so that each dictionary is paired with its counterpart at the same position. A separate list names the fields you want to compare. The inner loop goes over those fields and prints each one whose values match.
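If you also want to know which fields did not match, the same zip idea can be wrapped in a small generator. This is a sketch only: compare_fields is a name I made up, the access times are written as strings so the sample data is valid Python, and one address is changed on purpose to show a mismatch.

```python
current_list = [
    {"name": "Bill", "address": "Home", "age": 23, "accesstime": "11:14:01"},
    {"name": "Fred", "address": "Home", "age": 26, "accesstime": "11:57:43"},
]
backup_list = [
    {"name": "Bill", "address": "Home", "age": 23, "accesstime": "13:34:24"},
    {"name": "Fred", "address": "Away", "age": 26, "accesstime": "13:34:26"},
]


def compare_fields(current, backup, fields):
    """Yield (index, field, matched) for each selected field of each paired dict."""
    for i, (cur, bak) in enumerate(zip(current, backup)):
        for field in fields:
            # .get() avoids a KeyError, but note that a field missing from
            # BOTH dicts compares as equal (None == None).
            yield i, field, cur.get(field) == bak.get(field)


results = list(compare_fields(current_list, backup_list, ["name", "address", "age"]))
for i, field, matched in results:
    label = "Match" if matched else "No match"
    print("{0} {1} with {1} (record {2})".format(label, field, i))
```

The accesstime key is ignored simply because it is not in the list of fields.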
Someone has already made a module called deepdiff that does this and sooo much more! Refer to this answer for their detailed explanation!
First - install it
pip install deepdiff
Then - enjoy
# of course import it
from deepdiff import DeepDiff
current_list, backup_list = [...], [...]  # values stated in question.
for c, b in zip(current_list, backup_list):
    dif = DeepDiff(c, b)
    for key in ["name", "age", "address"]:
        try:
            # DeepDiff paths look like "root['name']" (note the closing bracket)
            assert dif['values_changed'][f"root['{key}']"]
            # replace the line below with pass to leave out non-matching values, as in your desired output
            print(f"No Match {key} with {key}")
        except KeyError:
            print(f"Match {key} with {key}")
Results: - as expected
Match name with name
Match address with address
Match age with age
Match name with name
Match address with address
Match age with age
Match name with name
Match address with address
Match age with age
Final Note
This module has soo much else you can utilize such as type changes, key changes/removals/additions, an extensive text comparison, and searches as well. Definitely well worth a look into.
~GL on your project!
Simply compare with this-
for current in current_list:
    for backup in backup_list:
        for a in backup:
            for b in current:
                if a == b:
                    if a == "name" or a== "age" or a== "address" :
                        if backup[a] == current[b]:
                            print (backup[a])
                            print (current[b])
I do not understand the rationale for your data structure, but I think this will do the trick:
value_to_compare = ["name", "address", "age"]
for i, elem in enumerate(current_list):
    backup_dict = backup_list[i]
    for key in value_to_compare:
        try:
            print("Match {}: {} with {}".format(key, elem[key], backup_dict[key]))
        except KeyError:
            print("key {} not found".format(key))
            # may be a raise here.
        except:
            raise
You can compare all corresponding fields with this code:
for dct1, dct2 in zip(current_list, backup_list):
    for k, v in dct1.items():
        if k == "accesstime":
            continue
        if v == dct2[k]:
            print("Match: {0} with {0}".format(k))
        else:
            print("No match: {0} with {0}".format(k))
Note that the values of your "accesstime" keys are not valid Python objects!
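One way to make those values valid, assuming they are meant to be times of day, is to write them as strings and parse them when needed. A minimal sketch using one record from the question (the string form and the parsed list are my additions):

```python
from datetime import datetime, time

# The bare 11:14:01 from the question is not valid Python; a string is.
current_list = [
    {"name": "Bill", "address": "Home", "age": 23, "accesstime": "11:14:01"},
]

# Parse each string into a datetime.time object.
parsed = [datetime.strptime(d["accesstime"], "%H:%M:%S").time()
          for d in current_list]

print(parsed[0])
```

With the values stored as strings or time objects, the comparison loop above runs as written.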
If you are happy to use a 3rd party library, this kind of task can be more efficiently implemented, and in a more structured way, via Pandas:
import pandas as pd
res = pd.merge(pd.DataFrame(current_list),
               pd.DataFrame(backup_list),
               on=['name', 'address', 'age'],
               how='outer',
               indicator=True)
print(res)
  accesstime_x address  age  name accesstime_y _merge
0     11:14:01    Home   23  Bill     13:34:24   both
1     11:57:43    Home   26  Fred     13:34:26   both
2     11:24:14    Home   33  Nora     13:35:14   both
The result _merge = 'both' for each row indicates the combination of ['name', 'address', 'age'] occurs in both lists but, in addition, you get to see the accesstime from each input.
You can use the built-in zip function to iterate over both lists simultaneously.
elements_to_compare = ["name", "age", "address"]
for dic1, dic2 in zip(current_list, backup_list):
    for element in elements_to_compare:
        if dic1[element] == dic2[element]:
            print("Match {0} with {0}".format(element))
I have a script that parses a yaml file and extracts key/value pairs and prints them, but I keep getting single quotes in the output.
How do I get rid of the quote marks?
YML Snippet
AsNum:
  description: Local AS for BGP global
  format: string
  type: string
Function from Script
def getVals(dict):
    for key,value in dict.items():
        #print(keys)
        if isDict(value):
            if key != "properties" and key != "items":
                print(key)
            getVals(value)
        else:
            print("key: ", key, " value: ", value)
Example Output
AsNum
('key: ', 'type', ' value: ', 'string')
('key: ', 'description', ' value: ', 'Local AS for BGP global')
('key: ', 'format', ' value: ', 'string')
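The parentheses and quotes in that output look like what Python 2 shows when the print statement is handed a parenthesised, comma-separated group, which it treats as a tuple. Assuming that is the cause, a minimal sketch of one fix is to build a single string before printing. format_vals and spec are names I made up, isinstance stands in for the isDict helper that is not shown, and the dict mirrors the YML snippet:

```python
spec = {
    "AsNum": {
        "description": "Local AS for BGP global",
        "format": "string",
        "type": "string",
    }
}


def format_vals(mapping, lines=None):
    """Collect one output line per key, recursing into nested dicts."""
    if lines is None:
        lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            if key not in ("properties", "items"):
                lines.append(key)
            format_vals(value, lines)
        else:
            # One formatted string prints the same on Python 2 and 3,
            # with no tuple parentheses or quote marks.
            lines.append("key: {0} value: {1}".format(key, value))
    return lines


lines = format_vals(spec)
print("\n".join(lines))
```

Running the original script under Python 3, where print is a function, should also avoid the tuple display.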
I am a Python beginner. After practicing ex39 of Learn Python The Hard Way, I tried to write a dictionary of the cities in my home country.
Here is what I wrote:
states = {
    'Orangon': 'OR',
    'Florida': 'FL',
    'California': 'CA',
    'New York': 'NY',
    'Michigan': 'MI',
}
for state, abbrev in states.items():
    print "%s is abbreviated %s" % (state, abbrev)
print states.get('Florida')
print states.get('California')
cities = {
    'New Taipei': 'NTP',
    'Taipei': 'TP',
    'Kaohsiung': 'KHU',
    'Taichung': 'TAC',
    'Taoyuan': 'TYN',
    'Tainan': 'TNA',
    'Hsinchu': 'HSC',
    'Keelung': 'KLG',
    'Chiayi': 'CYI',
    'Changhua': 'CHA',
    'Pingtung': 'PTG',
    'Zhubei': 'ZBI',
    'Yuanlin': 'Yln',
    'Douliu': 'Dlu',
    'Taitung': 'TAT',
    'Hualien': 'HUl',
    'Toufen': 'TFE',
    'Yilan': 'Yln',
    'Miaoli': 'Mli',
    'Magong': 'Mgn',
}
for cities, abbrev in cities.items():
    print "%s is %s" % (cities, abbrev)
print cities.get('Magong')
The last line raises an error:
Traceback (most recent call last):
  File "ex39.2.py", line 27, in <module>
    print cities.get('Magong')
AttributeError: 'str' object has no attribute 'get'
I don't understand why print states.get('California') works but print cities.get('Magong') raises an error.
In your for loop you are assigning a string to the variable cities:
for cities, abbrev in cities.items():
    print "%s is %s" % (cities, abbrev)
Thus, after the for loop, cities is no longer a dict but a string (the last key that was iterated).
Solution: use a different variable in your loop:
for city, abbrev in cities.items():
    print "%s is %s" % (city, abbrev)
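Here is a small sketch of the rebinding, written in Python 3 syntax and with a shortened dict (type_after_loop and lookup are names I added for illustration):

```python
cities = {"Taipei": "TP", "Magong": "Mgn"}

# cities.items() is evaluated once, so the loop itself runs fine,
# but every pass rebinds the name `cities` to the current key.
for cities, abbrev in cities.items():
    pass

type_after_loop = type(cities).__name__  # the last key, which is a str

# With a distinct loop variable the dict keeps its name.
cities = {"Taipei": "TP", "Magong": "Mgn"}
for city, abbrev in cities.items():
    print("%s is %s" % (city, abbrev))

lookup = cities.get("Magong")
print(type_after_loop, lookup)
```

The states loop in the question never hit this problem because its loop variable is called state, not states.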
I'm writing a program using dictionaries nested within a list. I want to print the name of each dictionary when looping through the list, but don't know how to do that without calling the entire contents of the dictionary. Here is my code:
sam = {
    'food' : 'tortas',
    'country' : 'mexico',
    'song' : 'Dream On',
}
dave = {
    'food' : 'spaghetti',
    'country' : 'USA',
    'song' : 'Sweet Home Alabama',
}
people = [sam, dave]
for person in people:
    for key, value in sorted(person.items()):
        print( #person's name +
              "'s favorite " + key + " is " + value + ".")
Here is the output:
's favorite country is mexico.
's favorite food is tortas.
's favorite song is Dream On.
's favorite country is USA.
's favorite food is spaghetti.
's favorite song is Sweet Home Alabama.
Everything works, I just need the names of my dictionaries to print. What's the solution?
The (more) correct way of doing this is to construct a dict of dicts instead, such as:
people = {'sam': {'food' : 'tortas',
                  'country' : 'mexico',
                  'song' : 'Dream On',
                  },
          'dave': {'food' : 'spaghetti',
                   'country' : 'USA',
                   'song' : 'Sweet Home Alabama',
                   }
          }
Then you can simply do the following:
for name, person in people.items():
    for key, value in sorted(person.items()):
        print(name + "'s favorite " + key + " is " + value + ".")
This will print the following:
dave's favorite country is USA.
dave's favorite food is spaghetti.
dave's favorite song is Sweet Home Alabama.
sam's favorite country is mexico.
sam's favorite food is tortas.
sam's favorite song is Dream On.
As a side note, it is more readable to use string formatting in your print statement:
print("{0}'s favorite {1} is {2}".format(name, key, value))
What you are basically trying to do is print the name of a variable. This is not recommended. If you really want to do it, you should take a look at this post:
How can you print a variable name in python?
What I would do is store the name of each dictionary inside the list. You could do this by changing 'people = [sam, dave]' to 'people = [["sam", sam], ["dave", dave]]'. This way, person[0] is the name of the person, and person[1] contains the information.
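A minimal sketch of that idea, using the dictionaries from the question; I used tuples instead of inner lists and collect the output in a lines list, both of which are my own choices:

```python
sam = {'food': 'tortas', 'country': 'mexico', 'song': 'Dream On'}
dave = {'food': 'spaghetti', 'country': 'USA', 'song': 'Sweet Home Alabama'}

# Each entry pairs a name with the matching dictionary.
people = [("sam", sam), ("dave", dave)]

lines = []
for name, person in people:
    for key, value in sorted(person.items()):
        lines.append(name + "'s favorite " + key + " is " + value + ".")

print("\n".join(lines))
```

Unpacking each pair in the for statement avoids having to index person[0] and person[1].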
The simplest way is to store the name as a string that maps to the matching variable identifier:
people = {'sam':sam, 'dave':dave}
for name, person in people.items():
    for key, value in sorted(person.items()):
        print(name + "'s favorite " + key + " is " + value + ".")
If you really don't like the idea of typing each name twice, you could 'inline' the dictionaries:
people = {
    'sam':{
        'food' : 'tortas',
        'country' : 'mexico',
        'song' : 'Dream On',
    },
    'dave':{
        'food' : 'spaghetti',
        'country' : 'USA',
        'song' : 'Sweet Home Alabama',
    }
}
Finally, if you can rely on those variables being in the global namespace and are more concerned with just making it work than purity of practice, you can find them this way:
people = ['sam', 'dave']
for name in people:
    person = globals()[name]
    for key, value in sorted(person.items()):
        print(name + "'s favorite " + key + " is " + value + ".")
Values in a list aren't really variables any more. They aren't referred to by a name in some namespace, but by an integer indicating their offsets from the front of the list (0, 1, ...).
If you want to associate each dict of data with some name, you have to do it explicitly. There are two general options, depending on what's responsible for tracking the name: the collection of people, or each person in the collection.
The first and easiest is collections.OrderedDict. Unlike a plain dict on older Python versions (insertion order is only guaranteed from Python 3.7 on), it preserves the order of the people in your list.
from collections import OrderedDict
sam = {
    'food': 'tortas',
    'country': 'Mexico',
    'song': 'Dream On',
}
dave = {
    'food': 'spaghetti',
    'country': 'USA',
    'song': 'Sweet Home Alabama',
}
# The OrderedDict stores each person's name.
people = OrderedDict([('Sam', sam), ('Dave', dave)])
for name, data in people.items():
    # Name is a key in the OrderedDict.
    print('Name: ' + name)
    for key, value in sorted(data.items()):
        print('  {0}: {1}'.format(key.title(), value))
Alternatively, you can store each person's name in his or her own dict... assuming you're allowed to change the contents of those dictionaries. (Also, you wouldn't want to add anything to the data dictionary that would require you to change / update the data more than you already do. Since most people change their favorite food or song much more often than they change their name, this is probably safe.)
sam = {
    # Each dict has a new key: 'name'.
    'name': 'Sam',
    'food': 'tortas',
    'country': 'Mexico',
    'song': 'Dream On',
}
dave = {
    'name': 'Dave',
    'food': 'spaghetti',
    'country': 'USA',
    'song': 'Sweet Home Alabama',
}
people = [sam, dave]
for data in people:
    # Name is a value in the dict.
    print('Name: ' + data['name'])
    for key, value in sorted(data.items()):
        # Have to avoid printing the name again.
        if 'name' != key:
            print('  {0}: {1}'.format(key.title(), value))
Note that how you print the data depends on whether you store the name in the collection (OrderedDict variant), or in each person's dict (list variant).
Thanks for the great input. This program is for a practice example in "Python Crash Course" by Eric Matthes, so the inefficient "dictionaries inside list" format is intentional. That said, I got a lot out of your comments, and altered my code to get the desired output:
sam = {
    #Added a 'name' key-value pair.
    'name' : 'sam',
    'food' : 'tortas',
    'country' : 'mexico',
    'song' : 'Dream On',
}
dave = {
    'name' : 'dave',
    'food' : 'spaghetti',
    'country' : 'USA',
    'song' : 'Sweet Home Alabama',
}
people = [sam, dave]
for person in people:
    for key, value in sorted(person.items()):
        #Added if statement to prevent printing the name.
        if key != 'name':
            print(person['name'].title() + "'s favorite " + key + " is " + value + ".")
    #Added a blank line at the end of each for loop.
    print('\n')
Here is the output:
Sam's favorite country is mexico.
Sam's favorite food is tortas.
Sam's favorite song is Dream On.
Dave's favorite country is USA.
Dave's favorite food is spaghetti.
Dave's favorite song is Sweet Home Alabama.
Thanks again, all who provided insightful answers.