DataFrame.replace by nested dict - python

I have a huge DataFrame in which a lot of entries need to be changed. So I've created a translation dict, which has the following structure:
{'Data.Bar Layer': {'1': 0,
'1.E': 21,
'2': 13,
'2.E': 22,
'3': 14,
'3.E': 24,
'4': 15,
'4.E': 23,
'B': 16,
'CL1': 1,
'CL2': 2,
'CL2a': 6,
'CL3': 3,
'CL3a': 4,
'CL4': 5,
'E': 18,
'L1': 7,
'L2': 8,
'L2a': 12,
'L3': 9,
'L3a': 10,
'L4': 11,
'T': 17,
'T&B': 19,
'T+B': 20},
'Data.Bar Type': {'N': 0, 'R': 1},
'Data.Defined No. Bars': {'No': 0, 'Yes': 1},
'Data.Design Option': {'-1': 0, 'Main Model': 1},...}
The first key corresponds to the DataFrame column and the second key to the value that needs to be changed, e.g. in column Data.Bar Layer all '1' should become 0. This is the dictionary layout that the documentation of pandas.DataFrame.replace describes.
However, the same values have to be exchanged multiple times, which (I guess) leads to the error:
Replacement not allowed with overlapping keys and values
Here is a snippet of the DataFrame. Is there any workaround to avoid this error? I tried some approaches with apply and map, but unfortunately they didn't work.
Thanks in advance and kind regards,
Max

There might be a more Pythonic way, but this code works for me:
for col in your_dict.keys():
    # replace column by column and assign the result back to the frame
    df[col] = df[col].replace(your_dict[col])
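A per-column lookup is another possible workaround for the overlapping-keys error. The sketch below assumes pandas is installed and uses a small made-up frame; translate_columns is a hypothetical helper, not a pandas function. Each cell is looked up exactly once, so a value that is both a key and a replacement cannot be translated twice.

```python
import pandas as pd

def translate_columns(df, translation):
    """Return a copy of df with each listed column translated cell by cell."""
    out = df.copy()
    for col, mapping in translation.items():
        if col in out.columns:
            # Look each value up once; values without an entry stay as they are.
            out[col] = out[col].map(lambda v, m=mapping: m.get(v, v))
    return out

# Made-up sample data, only for illustration.
df = pd.DataFrame({
    "Data.Bar Layer": ["1", "1.E", "T+B"],
    "Data.Bar Type": ["N", "R", "N"],
})
translation = {
    "Data.Bar Layer": {"1": 0, "1.E": 21, "T+B": 20},
    "Data.Bar Type": {"N": 0, "R": 1},
}
result = translate_columns(df, translation)
print(result)
```

The helper returns a new frame instead of changing df in place, which makes it easy to compare the result with the original data.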

Related

Issues transforming tuple to denormalized dataframe

I have a tuple which is a list of 200 dicts:
eg:
mytuple= ([{'reviewId': '1234', 'userName': 'XXX', 'userImage': 'imagelink', 'content': 'AAA', 'score': 1, 'thumbsUpCount': 1, 'reviewCreatedVersion': '3.31.0', 'at': datetime.datetime(2022, 12, 1, 11, 49, 34), 'replyContent': "replycontent", 'repliedAt': datetime.datetime(2022, 12, 1, 12, 19, 51)},
{'reviewId': '5678', 'userName': 'S L', 'userImage': 'imagelink2', 'content': "content2", 'score': 1, 'thumbsUpCount': 0, 'reviewCreatedVersion': '3.31.0', 'at': datetime.datetime(2022, 11, 29, 12, 27, 46), 'replyContent': "replycontent2", 'repliedAt': datetime.datetime(2022, 11, 29, 12, 30, 40)}])
Ideally, I'd like to transform this into a dataframe with the following column headers:
reviewId | userName | userImage
1234     | XXXX     | imagelink
5678     | S L      | imagelink2
and so on with the column headers as the key and the columns containing the values.
mytuple was initially of size 2, from which I removed the second index and brought it down to just a list of dicts.
I tried different possibilities which include:
df=pd.DataFrame(mytuple)
df=pd.DataFrame.from_dict(mytuple)
df=pd.json_normalize(mytuple)
However, in all these cases, I get a dataframe as below
1               | 2               | 3  | 4
{'reviewId':..} | {'reviewId':..} | {} | {}
I'd like to understand where I'm going wrong. Thanks in advance!

How do I compute the checksum of a Singapore car license plate with Python

I have researched the checksum of Singapore car license plates online. For the license plate SBA 1234, I need to convert all the characters except the leading S to numbers, with A being 1, B being 2, and so on. SBA 1234 is stored as a text string. How do I convert B and A to numbers for the checksum calculation while making sure that the original values B and A do not change? The conversion of B and A to numbers is only for the calculation.
How do I do this conversion in Python? Please help. Thank you.
There are multiple ways to create a dictionary that maps A through Z to the values 1 through 26. One simple way is:
value = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", range(1,27)))
An alternative is to use the ord() function.
ord('A') is 65, so subtracting 64 from a capital letter's code gives its position in the alphabet. You can build the same dictionary with a comprehension like this:
atoz = {chr(i): i - 64 for i in range(ord("A"), ord("A") + 26)}
This produces the dictionary
{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8, 'I': 9, 'J': 10, 'K': 11, 'L': 12, 'M': 13, 'N': 14, 'O': 15, 'P': 16, 'Q': 17, 'R': 18, 'S': 19, 'T': 20, 'U': 21, 'V': 22, 'W': 23, 'X': 24, 'Y': 25, 'Z': 26}
You can look up a character in the dictionary to get its value from 1 through 26.
Alternatively, you can use ord(x) - 64 directly to get the value of a letter. If x is A, you will get 1. Similarly, if x is Z, the value will be 26.
So you can write the code that converts the Singapore number plate directly. Given:
snp = 'SBA 1234'
you can get the values with
snp_num = [ord(snp[1]) - 64, ord(snp[2]) - 64, int(snp[4]), int(snp[5]), int(snp[6]), int(snp[7])]
This will result in
[2, 1, 1, 2, 3, 4]
The string snp itself is not modified. I hope one of these options will work for you. Then pass these numbers to your checksum function to do the calculation.
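The index-by-index conversion can also be written as a small loop so that it does not depend on fixed positions. This is only a sketch: plate_to_numbers is a made-up helper name, and it assumes that the first character is the prefix to skip and that the remaining letters are A to Z. It does not compute the checksum itself.

```python
def plate_to_numbers(plate):
    """Map letters to 1..26 and digits to ints, skipping the first character."""
    numbers = []
    for ch in plate[1:]:  # plate[0] is the 'S' prefix, which is skipped
        if ch.isalpha():
            numbers.append(ord(ch.upper()) - 64)  # 'A' -> 1, ..., 'Z' -> 26
        elif ch.isdigit():
            numbers.append(int(ch))
        # anything else (such as the space) is ignored
    return numbers

snp = 'SBA 1234'
snp_num = plate_to_numbers(snp)
print(snp_num)  # [2, 1, 1, 2, 3, 4]
print(snp)      # SBA 1234
```

Because strings are immutable and the helper only reads from plate, the original text keeps its letters B and A.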

Difference of list of dictionaries

I've searched quite a lot but I haven't found any question similar to this one.
I have two lists of dictionaries in the following format:
data1 = [
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
data2 = [
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
]
desired output:
final_data = [
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
I want only dictionaries which are in data1 and not in data2.
Until now, whenever I found a match in two nested for loops, I popped the dictionary out of the list, but that does not seem like a good approach to me. How can I achieve the desired output?
It doesn't have to be time efficient, since there will be at most tens of dictionaries in each list.
Current implementation:
counter_i = 0
for i in range(len(data1)):
    counter_j = 0
    for j in range(len(data2)):
        if data1[i-counter_i]['id'] == data2[j-counter_j]['id'] and data1[i-counter_i]['date_time'] == data2[j-counter_j]['date_time']:
            data1.pop(i-counter_i)
            data2.pop(j-counter_j)
            counter_i += 1
            counter_j += 1
            break
If performance is not an issue, why not:
for d in data2:
    try:
        data1.remove(d)
    except ValueError:
        pass
list.remove checks for object equality, not identity, so will work for dicts with equal keys and values. Also, list.remove only removes one occurrence at a time.
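If data1 should stay intact, the same idea can be applied to a copy. The following is a minimal sketch using the sample data from the question; list_difference is a hypothetical helper name.

```python
import datetime

def list_difference(data1, data2):
    """Return the items of data1 left after removing one match per item of data2."""
    remaining = list(data1)  # shallow copy, so data1 itself is untouched
    for d in data2:
        try:
            remaining.remove(d)  # removes only the first equal dict
        except ValueError:
            pass  # d has no remaining match in data1
    return remaining

t1 = datetime.datetime(2020, 4, 3, 12, 34, 40)
t2 = datetime.datetime(2020, 4, 3, 16, 14, 21)
data1 = [
    {'id': 4, 'date_time': t1},
    {'id': 4, 'date_time': t1},
    {'id': 6, 'date_time': t1},
    {'id': 7, 'date_time': t2},
]
data2 = [
    {'id': 4, 'date_time': t1},
    {'id': 6, 'date_time': t1},
]
final_data = list_difference(data1, data2)
print(final_data)
```

Only one of the two id 4 entries is removed, which matches the desired output in the question.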
schwobaseggl's answer is probably the cleanest solution (just make a copy before removing if you need to keep data1 intact).
But if you want to use a set difference... well, dicts are not hashable, because their underlying data could change and lead to issues (the same reason why lists and sets are not hashable either).
However, you can put all the key-value pairs of a dict into a frozenset to represent the dictionary (assuming the dictionary values are hashable, as schwobaseggl points out). Frozensets are hashable, so you can add those to a set and do a normal set difference, then reconstruct the dictionaries at the end :D. Note that a set also collapses duplicates, so repeated dictionaries in data1 are not counted separately.
I don't actually recommend doing it, but here we go:
final_data = [
    dict(s)
    for s in set(
        frozenset(d.items()) for d in data1
    ).difference(
        frozenset(d.items()) for d in data2
    )
]
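You can go either way. Note that both methods drop every dictionary in data1 that also appears in data2, including duplicates: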
Method 1:
#using filter and lambda function
final_data = filter(lambda i: i not in data2, data1)
final_data = list(final_data)
Method 2:
# using list comprehension to perform task
final_data = [i for i in data1 if i not in data2]

Assigning a variable, then a value, to a nested dictionary without modifying the original variable [duplicate]

To the point: I have two events:
a = {'key': 'a', 'time': datetime.datetime(2020, 2, 15, 11, 18, 18, 982000)}
b = {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000)}
My goal is to nest one event inside the other like this:
a['key2'] = b
and this is result:
{'key': 'a', 'time': datetime.datetime(2020, 2, 15, 11, 18, 18, 982000), 'key2': {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000)}}
But when I assign a new key to the nested dict, it works, but it also modifies variable b. Result:
a['key2']['nestedkey'] = {'somekey': 'somevalue'}
{'key': 'a', 'time': datetime.datetime(2020, 2, 15, 11, 18, 18, 982000), 'key2': {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000), 'nestedkey': {'somekey': 'somevalue'}}}
{'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000), 'nestedkey': {'somekey': 'somevalue'}}
Can someone explain why variable b is getting modified? And is there any way to do this without modifying it?
In Python, assigning an object does not make a copy of it by default. So when you do a['key2'] = b, a['key2'] just holds a reference to the same object as b. Whether you modify b or a['key2'], you are modifying the same object.
To make a copy you can use deepcopy:
import copy
a['key2'] = copy.deepcopy(b)
Then it will work as you expect: modifying a['key2'] will not modify b.
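The difference between sharing a reference and deep copying can be checked with a short experiment. This is a sketch based on the question's data; the names a_shared and a_copied are made up for illustration.

```python
import copy
import datetime

b = {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000)}

# Plain assignment: both names refer to the same dict object.
a_shared = {'key': 'a'}
a_shared['key2'] = b
print(a_shared['key2'] is b)  # True

# Deep copy: the nested dict is an independent object.
a_copied = {'key': 'a'}
a_copied['key2'] = copy.deepcopy(b)
a_copied['key2']['nestedkey'] = {'somekey': 'somevalue'}
print(a_copied['key2'] is b)  # False
print('nestedkey' in b)       # False
```

The identity check with is shows directly whether two names point at one object, which is what decides whether a change is visible through both.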
This happens because the variable b is used by reference. Basically, a['key2'] = b says that a['key2'] points to the location in memory where b is stored, so when changes are made through a['key2'] or through b, the same data is being changed.
To avoid this, you can make a deep copy of b and assign that to a['key2'] like so:
import copy
a['key2'] = copy.deepcopy(b)
This should give you your desired results.
For more details about how copying works, see the documentation of Python's copy module.

How to retrieve the top-5 largest values (integers) from a dictionary of key-value pairs?

I have created 3 dictionaries from a function that iterates through a dataframe of popular iOS apps. The 3 dictionaries contain key-value pairs based on how often each key occurs in the dataframe. From these dictionaries, I want to retrieve the 5 largest values of each dictionary as well as the corresponding keys. These are the results from the dataframe iterations. Obviously I can see this manually, but I want Python to determine the 5 largest.
Prices: {0.0: 415, 4.99: 10, 2.99: 13, 0.99: 31, 1.99: 13, 9.99: 1, 3.99: 2, 6.99: 3}
Genres: {'Productivity': 9, 'Shopping': 12, 'Reference': 3, 'Finance': 6, 'Music': 19, 'Games': 308, 'Travel': 6, 'Sports': 7, 'Health & Fitness': 8, 'Food & Drink': 4, 'Entertainment': 19, 'Photo & Video': 25, 'Social Networking': 21, 'Business': 4, 'Lifestyle': 4, 'Weather': 8, 'Navigation': 2, 'Book': 4, 'News': 2, 'Utilities': 12, 'Education': 5}
Content Ratings: {'4+': 304, '12+': 100, '9+': 54, '17+': 30}
You can sort each dictionary's keys by their values and then slice the top 5 (this returns the keys; look up the counts with Prices[key]):
sorted(Prices, key=Prices.get, reverse=True)[:5]
Same for the other two dicts.
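If the keys and their counts are needed together, one option is heapq.nlargest from the standard library applied to the items. This is a sketch using two of the dictionaries from the question; top_n is a hypothetical helper name.

```python
import heapq
from operator import itemgetter

def top_n(counts, n=5):
    """Return the n (key, value) pairs with the largest values, largest first."""
    return heapq.nlargest(n, counts.items(), key=itemgetter(1))

prices = {0.0: 415, 4.99: 10, 2.99: 13, 0.99: 31, 1.99: 13, 9.99: 1, 3.99: 2, 6.99: 3}
content_ratings = {'4+': 304, '12+': 100, '9+': 54, '17+': 30}

top_prices = top_n(prices)
top_ratings = top_n(content_ratings)
print(top_prices)
print(top_ratings)
```

When a dictionary has fewer than n entries, as with the content ratings here, all of its pairs are returned in descending order of value.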
You can also accomplish this using itemgetter:
prices= {0.0: 415, 4.99: 10, 2.99: 13, 0.99: 31, 1.99: 13, 9.99: 1, 3.99: 2, 6.99: 3}
genres= {'Productivity': 9, 'Shopping': 12, 'Reference': 3, 'Finance': 6, 'Music': 19, 'Games': 308, 'Travel': 6, 'Sports': 7, 'Health & Fitness': 8, 'Food & Drink': 4, 'Entertainment': 19, 'Photo & Video': 25, 'Social Networking': 21, 'Business': 4, 'Lifestyle': 4, 'Weather': 8, 'Navigation': 2, 'Book': 4, 'News': 2, 'Utilities': 12, 'Education': 5}
contentRatings= {'4+': 304, '12+': 100, '9+': 54, '17+': 30}
from operator import itemgetter
arr = [prices, contentRatings, genres]
for test_dict in arr:
    # printing original dictionary
    print("The original dictionary is : " + str(test_dict))
    # 5 largest values in dictionary
    # Using sorted() + itemgetter() + items()
    res = dict(sorted(test_dict.items(), key=itemgetter(1), reverse=True)[:5])
    # printing result
    print("The top 5 value pairs are " + str(res))
