Pandas: Need to reconcile 3 data frames

I've been banging my head against this. Here's a sample of data frames I have:
df_users:
Email Roles
johndoe0#example.com
johndoe1#example.com
johndoe2#example.com
johndoe3#example.com
johndoe4#example.com
df_groups:
Group1 Group2 Group3
johndoe0#example.com johndoe4#example.com johndoe0#example.com
johndoe1#example.com johndoe1#example.com
johndoe4#example.com johndoe2#example.com
johndoe3#example.com
df_roles:
        Group1  Group2  Group3
Role 1  True    False   False
Role 2  False   True    True
Role 3  False   False   True
Here's what I need the output to be:
Email Roles
johndoe0#example.com Role 1
johndoe0#example.com Role 2
johndoe0#example.com Role 3
johndoe1#example.com Role 1
johndoe2#example.com Role 2
johndoe3#example.com Role 3
johndoe2#example.com Role 2
johndoe2#example.com Role 3
johndoe3#example.com Role 2
johndoe3#example.com Role 3
johndoe4#example.com Role 1
I tried doing this in Excel. One of the challenges is that in the actual data, there are hundreds of emails, >100 roles, and dozens of groups.
What I'm thinking of is something like:
# Prepare yourself for bad pseudo-code
for each user in df_users:
    for each group in df_groups:
        if the user is in the group:
            if the row value is 'True' in df_roles:
                add the value of the first column in df_roles to the 'Roles' column in df_users...
                if a role already exists, append it with a ',' delimiter
Hopefully that shows what I'm going for.

Try this:
import pandas as pd
# list that will hold (user, role) tuples
users_roles = []
# build one set of emails per group column so membership tests are fast
def create_sets(users_groups):
    list_sets = []
    for col in users_groups.columns:
        # drop missing cells and empty strings left by the ragged columns
        s = set(users_groups[col].dropna())
        s.discard('')
        list_sets.append(s)
    return list_sets
all_groups_sets = create_sets(df_groups)
for user in df_users['Email']:
    for idx, group_set in enumerate(all_groups_sets):
        if user in group_set:
            for i in range(len(df_roles)):
                # assumes df_roles has the same column order as df_groups
                if df_roles.iloc[i, idx]:
                    users_roles.append((user, 'Role ' + str(i + 1)))
final = pd.DataFrame(users_roles, columns=['Email', 'Role'])
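A vectorized alternative, offered as a sketch rather than a drop-in replacement: reshape both frames to long form and merge them on the group name. The sample frames below are made up for illustration (placeholder emails, and role names assumed to be the index of df_roles), so adjust them to the real layout.

```python
import pandas as pd

# Illustrative stand-ins for the frames in the question (assumed layout).
df_groups = pd.DataFrame({
    "Group1": ["a@example.com", "b@example.com", None],
    "Group2": ["c@example.com", None, None],
    "Group3": ["a@example.com", "c@example.com", None],
})
df_roles = pd.DataFrame(
    {
        "Group1": [True, False, False],
        "Group2": [False, True, True],
        "Group3": [False, False, True],
    },
    index=["Role 1", "Role 2", "Role 3"],
)

# One row per (Group, Email) membership; empty cells are dropped.
members = df_groups.melt(var_name="Group", value_name="Email").dropna(subset=["Email"])

# One row per (Role, Group) pair, keeping only the pairs flagged True.
granted = (
    df_roles.rename_axis("Role")
    .reset_index()
    .melt(id_vars="Role", var_name="Group", value_name="granted")
)
granted = granted[granted["granted"]]

# Joining on Group gives every role a user inherits from any of their groups.
result = (
    members.merge(granted[["Group", "Role"]], on="Group")[["Email", "Role"]]
    .drop_duplicates()
    .sort_values(["Email", "Role"])
    .reset_index(drop=True)
)
print(result)
```

If one comma-delimited string per user is preferred, as in the question's pseudo-code, result.groupby("Email")["Role"].agg(", ".join) should collapse the rows accordingly.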

Related

Extract data from nested JSON | Pandas

I'm dealing with a nested JSON in order to extract data about transactions from my database using pandas.
My JSON can have one of these contents :
{"Data":{"Parties":[{"ID":"JackyID","Role":12}],"NbIDs":1}} #One party identified
{"Data":{"Parties":[{"ID":"JackyID","Role":12},{"ID":"SamNumber","Role":10}],"NbIDs":2}} #Two Parties identified
{"Data":{"Parties":[],"NbIDs":0}} #No parties identified
{"Data": None} #No data
When looking to extract the values of ID (ID of the party - String datatype) and Role (Int datatype - refer to buyers when Role=12 and sellers when Role=10) and write it in a pandas dataframe, I'm using the following code :
for i,row in df.iterrows():
    json_data = json.dumps(row['Data'])
    data = pd_json.loads(json_data)
    data_json = json.loads(data)
    df['ID'] = pd.json_normalize(data_json, ['Data', 'Parties'])['ID']
    df['Role'] = pd.json_normalize(data_json, ['Data', 'Parties'])['Role']
Now when trying to check its values and give every Role its corresponding ID:
for i,row in df.iterrows():
    if row['Role'] == 12:
        df.at[i,'Buyer'] = df.at[i,'ID']
    elif row['Role'] == 10:
        df.at[i,'Seller'] = df.at[i,'ID']
df = df[['Buyer', 'Seller']]
The expected df result for the given scenario should be as below :
{"Data":{"Parties":[{"ID":"JackyID","Role":12}],"NbIDs":1}} #Transaction 1
{"Data":{"Parties":[{"ID":"JackyID","Role":12},{"ID":"SamNumber","Role":10}],"NbIDs":2}} #Transaction 2
{"Data":{"Parties":[],"NbIDs":0}} #Transaction 3
{"Data": None} #Transaction 4
>>print(df)
Buyer | Seller
------------------
JackyID| #Transaction 1 we have info about the buyer
JackyID| SamNumber #Transaction 2 we have infos about the buyer and the seller
| #Transaction 3 we don't have any infos about the parties
| #Transaction 4 we don't have any infos about the parties
What is the correct way to do so ?

You can treat case 4, where there is no Data, as a special case with empty Parties:
df = pd.DataFrame(data['Data']['Parties'] if data['Data'] else [], columns=['ID', 'Role'])
df['Role'] = df['Role'].map({10: 'Seller', 12: 'Buyer'})
Then add any missing values for Role:
df = df.set_index('Role').reindex(['Seller', 'Buyer'], fill_value=pd.NA).T
print(df)
# Case 1
Role Seller Buyer
ID <NA> JackyID
# Case 2
Role Seller Buyer
ID SamNumber JackyID
# Case 3
Role Seller Buyer
ID <NA> <NA>
# Case 4
Role Seller Buyer
ID <NA> <NA>
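To get one Buyer/Seller row per transaction, as the question asks, the same idea can be applied record by record. This is a minimal sketch in plain Python: extract_parties and ROLE_NAMES are hypothetical names, and the "no data" case is assumed to arrive as JSON null (the question writes it as None).

```python
import json

# Role codes taken from the question: 12 is the buyer, 10 is the seller.
ROLE_NAMES = {12: "Buyer", 10: "Seller"}

def extract_parties(raw):
    """Return {'Buyer': ..., 'Seller': ...} for one transaction record."""
    record = json.loads(raw) if isinstance(raw, str) else raw
    # Treat a missing or null "Data" the same as an empty party list.
    data = (record or {}).get("Data") or {}
    out = {"Buyer": None, "Seller": None}
    for party in data.get("Parties") or []:
        name = ROLE_NAMES.get(party.get("Role"))
        if name is not None:
            out[name] = party.get("ID")
    return out

samples = [
    '{"Data":{"Parties":[{"ID":"JackyID","Role":12}],"NbIDs":1}}',
    '{"Data":{"Parties":[{"ID":"JackyID","Role":12},{"ID":"SamNumber","Role":10}],"NbIDs":2}}',
    '{"Data":{"Parties":[],"NbIDs":0}}',
    '{"Data": null}',
]
rows = [extract_parties(s) for s in samples]
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) would then give a frame with Buyer and Seller columns, one row per transaction.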

How to get highest value in django model for each objects

An admin can add different challenges. Each challenge has many candidates, and each candidate may have many likes. I want to show the winner of each challenge, so I need to find which candidate has the most likes. How can I get that? Is there something like .count()?
How would I use it, and in which model?
For example:
challenges
1 first_contest
2 second_contest
candidates
id name contest
1 jhon first_contest
2 sara second_contest
3 abi first_contest
candidates likes
id user_id candidate_id
1 1 1
2 2 2
3 1 1
In this case candidate 1 (Jhon) gets 2 likes, so Jhon wins the first contest. In the second contest, Sara gets 1 like. So I need to show the winner of each contest. How can I do that?
models.py:
class Challenge(models.Model):
    title = models.CharField(max_length=50)
class Candidates(models.Model):
    owner = models.ForeignKey(User, on_delete=models.CASCADE)
    image = models.FileField(upload_to="challenge_candidates/")
    def likes_count(self):
        return self.likes.all().count()
class CandidateLikes(models.Model):
    like = models.CharField(max_length=10)
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='candidate_likes')
    contest_candidates = models.ForeignKey(Candidates, on_delete=models.CASCADE, related_name='likes')
Sorry for my poor English. Thank you.

You first need a relationship between your CandidateLikes model and your Challenge model so that you can filter by challenge. A foreign key relation could be sufficient. Then you can add this query to your views:
winner = CandidateLikes.objects.filter(challenge__title="your_challenge_name").order_by("like")
Notice that a challenge field must exist on your CandidateLikes model, since we are filtering by it.

I think you are missing a relationship between Challenge and Candidates. Let's say you add a challenge field to Candidates:
class Candidates(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    challenge = models.ForeignKey(Challenge, on_delete=models.CASCADE, related_name='candidates')
Then you can query the winner of each challenge, i.e. the candidate with the highest like count, with a subquery like this:
from django.db.models import Count
from django.db.models import OuterRef, Subquery
candidates = Candidates.objects.annotate(like_count=Count('likes')).filter(challenge=OuterRef('pk')).order_by('-like_count')
queryset = Challenge.objects.annotate(winner=Subquery(candidates.values('owner__username')[:1]))
This will give you a Challenge queryset with an annotated winner, like:
{'id': 1, 'title': 'Challenge 1', 'winner': 'username'}
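For intuition only, here is what that annotate-and-order query computes, written as plain Python over the sample tables from the question. This is a sketch, not Django code: the dictionaries stand in for the database rows, and ties are resolved in favor of the candidate seen first, which is an assumption.

```python
from collections import Counter

# Sample rows from the question: candidate id -> (name, contest).
candidates = {
    1: ("jhon", "first_contest"),
    2: ("sara", "second_contest"),
    3: ("abi", "first_contest"),
}
# (user_id, candidate_id) pairs from the likes table.
likes = [(1, 1), (2, 2), (1, 1)]

# Equivalent of annotate(like_count=Count('likes')).
like_counts = Counter(candidate_id for _, candidate_id in likes)

# Equivalent of ordering by -like_count and taking the first row per challenge.
winners = {}
for candidate_id, (name, contest) in candidates.items():
    count = like_counts.get(candidate_id, 0)
    best = winners.get(contest)
    if best is None or count > best[1]:
        winners[contest] = (name, count)

print(winners)
```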

Discord py How to add role by name not id

How can I add a role by name rather than by ID? Here group is a string:
await message.author.add_roles(message.author, group)

You can use discord.utils.get function for this:
role_name = "role" # specify role name here
role = discord.utils.get(message.guild.roles, name=role_name)
if role is not None:
    await message.author.add_roles(role)

Tortoise ORM filter with logical operators

I have two tables
class User(models.Model):
    id = fields.BigIntField(pk=True)
    name = fields.CharField(max_length=100)
    tags: fields.ManyToManyRelation["Tag"] = fields.ManyToManyField(
        "models.Tag", related_name="users", through="user_tags"
    )
class Tag(models.Model):
    id = fields.BigIntField(pk=True)
    name = fields.CharField(max_length=100)
    value = fields.CharField(max_length=100)
    users: fields.ManyToManyRelation[User]
Let's assume this dummy data
#users
bob = await User.create(name="bob")
alice = await User.create(name="alice")
#tags
foo = await Tag.create(name="t1", value="foo")
bar = await Tag.create(name="t2", value="bar")
#m2m
await bob.tags.add(foo)
await alice.tags.add(foo, bar)
Now I want to count users who have both tags foo and bar, which is alice in this case, so it should be 1.
The below query will give me a single level of filtering, but how do I specify that the user should have both foo and bar in their tags ?
u = await User.filter(tags__name="t1", tags__value="foo").count()

Tortoise-ORM provides Q objects for complicated queries with logical operators like | (or) and & (and).
Your query could be written like this:
u = await User.filter(Q(tags__name="t1") &
                      (Q(tags__value="foo") | Q(tags__value="bar"))).count()

You cannot group_by on an annotated field in Tortoise ORM as of now, so here is a solution that filters on the annotated count, which acts as a having clause:
u = await (
    User.filter(Q(tags__value="foo") | Q(tags__value="bar"))
    .annotate(count=Count("id"))
    .filter(count=2)
)
The idea is to keep only the records whose count equals the number of required tags, which is 2 in this case (bar, foo).
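The counting idea can be checked in plain Python, independent of the ORM. In this sketch user_tags is a made-up in-memory stand-in for the many-to-many table, holding the dummy data from the question.

```python
# Tag values each user must have.
required = {"foo", "bar"}

# Stand-in for the user/tag relation built from the question's dummy data.
user_tags = {
    "bob": {"foo"},
    "alice": {"foo", "bar"},
}

# A user qualifies when the number of matching tags equals the number required,
# which mirrors filtering on count == 2 after the OR filter.
matching = [
    name for name, tags in user_tags.items()
    if len(tags & required) == len(required)
]
print(len(matching), matching)
```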

Format of Discord.Role

I'm making a Discord bot, but in the part where the bot determines the permission level of the author, it does not recognise roles. Variables like "owner" and "admin" hold the ID of the role. What is the format of discord.Role?
I've tried making classes with an id and a name.
perms = 0
if moderator in message.author.roles:
    perms = 1
if admin in message.author.roles:
    perms = 2
if owner in message.author.roles:
    perms = 3
if muted in message.author.roles:
    perms = -1
Right now it outputs perms as 0 even though it should be 3 as my role is "owner".

According to the discord.py documentation, member.roles (or message.author.roles in your case) returns a list of Role class instances, not role IDs.
You can read about the Role class in the documentation as well.
If you want to check whether a member has a role with a specified ID, you can build a list of their role IDs first:
perms = 0
ids = [role.id for role in message.author.roles]
if moderator in ids:
    perms = 1
if admin in ids:
    perms = 2
if owner in ids:
    perms = 3
if muted in ids:
    perms = -1
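The difference between the two checks can be reproduced without a bot. In this sketch the Role dataclass is a hypothetical stand-in for discord.Role (only id and name are modelled) and the IDs are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    """Hypothetical stand-in for discord.Role; not the real class."""
    id: int
    name: str

OWNER_ID = 333  # made-up role ID
member_roles = [Role(111, "member"), Role(OWNER_ID, "owner")]

# Comparing an int against Role objects never matches, so this is False.
id_in_roles = OWNER_ID in member_roles

# Comparing against the collected IDs works as intended.
role_ids = {role.id for role in member_roles}
id_in_ids = OWNER_ID in role_ids

print(id_in_roles, id_in_ids)
```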
