Plotting names with Chinese characters throws an error

I have a problem. I want to plot the 5 most frequent names. Unfortunately, the names contain not only Latin letters but also Chinese characters. As soon as I draw the plot, I get:
C:\Users\user\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 32422 missing from current font.
How can I get rid of this warning so that the names are drawn correctly?
import pandas as pd
import seaborn as sns
d = {'id': [1, 2, 3, 4, 5],
     'name': ['Max Power', 'Jessica', '约翰·多伊', '哈拉尔量杯', 'Frank High'],
     }
df = pd.DataFrame(data=d)
print(df)
df_count = df['name'].value_counts()[:5]
ax = sns.barplot(x=df_count.index, y=df_count)
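The warning means the font matplotlib is currently using has no glyph for those characters, so the usual remedy is to switch to a font that covers Chinese text. A minimal sketch, assuming at least one of the candidate fonts below is installed; the candidate list and both helper names are illustrative and not part of matplotlib:

```python
# Candidate font families that cover Chinese characters. Which of them exist
# depends on the machine, so this list is an assumption to adapt.
CJK_CANDIDATES = ("Microsoft YaHei", "SimHei", "PingFang SC",
                  "Noto Sans CJK SC", "Arial Unicode MS")


def pick_cjk_font(installed, candidates=CJK_CANDIDATES):
    """Return the first candidate present in `installed`, or None."""
    installed = set(installed)
    for name in candidates:
        if name in installed:
            return name
    return None


def use_cjk_font():
    """Put an installed CJK-capable font first in matplotlib's sans-serif list."""
    import matplotlib.pyplot as plt
    from matplotlib import font_manager

    # Names of all fonts matplotlib has discovered on this system.
    installed = {font.name for font in font_manager.fontManager.ttflist}
    chosen = pick_cjk_font(installed)
    if chosen is not None:
        plt.rcParams["font.family"] = "sans-serif"
        plt.rcParams["font.sans-serif"] = [chosen] + list(plt.rcParams["font.sans-serif"])
    return chosen
```

Calling use_cjk_font() once before sns.barplot(...) should be enough. If it returns None, none of the candidates was found and a font with Chinese coverage has to be installed first.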

Related

How to convert datatype of the columns?

I picked up part of the code from here and expanded it a bit. However, I am not able to convert the data types of the Basket and Count columns for further processing.
For example, the Basket and Count columns are int64, and I would like to change them to float64.
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output
# creating a DataFrame
df = pd.DataFrame({'Basket': [1, 2, 3],
                   'Name': ['Apple', 'Orange', 'Count'],
                   'id': [111, 222, 333]})
vardict = df.columns
select_variable = widgets.Dropdown(
    options=vardict,
    value=vardict[0],
    description='Select variable:',
    disabled=False,
    button_style=''
)
def get_and_plot(b):
    # note: without parentheses this line only names the function, it does not call it
    clear_output
    s = select_variable.value
    col_dtype = df[s].dtypes
    print(col_dtype)
display(select_variable)
select_variable.observe(get_and_plot, names='value')
Thanks in advance.
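For the conversion itself, DataFrame.astype is the usual tool. A minimal sketch on the frame from the question; since the posted frame has no Count column, the id column stands in for it here, which is an assumption about the intended data:

```python
import pandas as pd

df = pd.DataFrame({'Basket': [1, 2, 3],
                   'Name': ['Apple', 'Orange', 'Count'],
                   'id': [111, 222, 333]})

# Convert several columns at once by passing a {column: dtype} mapping.
# astype returns a new DataFrame, so assign the result back.
df = df.astype({'Basket': 'float64', 'id': 'float64'})

# A single column can be converted the same way.
df['Basket'] = df['Basket'].astype('float64')

print(df.dtypes)
```

Inside the callback, the same call could be applied to the selected column, for example df[s] = df[s].astype('float64'), provided the column holds numeric values.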

Python Trouble for ESPN FF

First time posting here, but having trouble with some code that I'm using to pull fantasy football data from ESPN. I pulled this from Steven Morse's blog (https://stmorse.github.io/journal/espn-fantasy-v3.html) and it appears to work EXCEPT for one error that I'm getting. The error is:
File "<ipython-input-65-56a5896c1c3c>", line 3, in <listcomp>
game['away']['teamId'], game['away']['totalPoints'],
KeyError: 'away'
I've looked in the dictionary and found that 'away' is in there. What I can't figure out is why 'home' works but not 'away'. Here is the code I'm using. Any help is appreciated:
import requests
import pandas as pd
url = 'https://fantasy.espn.com/apis/v3/games/ffl/seasons/2020/segments/0/leagues/721579?view=mMatchupScore'
r = requests.get(url,
cookies={"swid": "{1E653FDE-DA4A-4CC6-A53F-DEDA4A6CC663}",
"espn_s2": "AECpfE9Zsvwwsl7N%2BRt%2BAPhSAKmSs%2F2ZmQVuHJeKG8LGgLBDfRl0j88CvzRFsrRjLmjzASAdIUA9CyKpQJYBfn6avgXoPHJgDiCqfDPspruYqHNENjoeGuGfVqtPewVJGv3rBJPFMp1ugWiqlEzKiT9IXTFAIx3V%2Fp2GBuYjid2N%2FFcSUlRlr9idIL66tz2UevuH4F%2FP6ytdM7ABRCTEnrGXoqvbBPCVbtt6%2Fu69uBs6ut08ApLRQc4mffSYCONOqW1BKbAMPPMbwgCn1d5Ruubl"})
d = r.json()
df = [[
    game['matchupPeriodId'],
    game['away']['teamId'], game['away']['totalPoints'],
    game['home']['teamId'], game['home']['totalPoints']
] for game in d['schedule']]
df = pd.DataFrame(df, columns=['Week', 'Team1', 'Score1', 'Team2', 'Score2'])
df['Type'] = ['Regular' if w<=14 else 'Playoff' for w in df['Week']]

Seems like some of the games in the schedule don't have an away team:
{'home': {'adjustment': 0.0,
'cumulativeScore': {'losses': 0, 'statBySlot': None, 'ties': 0, 'wins': 0},
'pointsByScoringPeriod': {'14': 102.7},
'teamId': 1,
'tiebreak': 0.0,
'totalPoints': 102.7},
'id': 78,
'matchupPeriodId': 14,
'playoffTierType': 'WINNERS_BRACKET',
'winner': 'UNDECIDED'}
For nested json data like this, it's often easier to use pandas.json_normalize which flattens the data structure and gives you a data frame with lots of columns with names like home.cumulativeScore.losses etc.
df = pd.json_normalize(r.json()['schedule'])
Then you can reshape the dataframe by dropping columns you don't care about and so on.
df = pd.json_normalize(r.json()['schedule'])
column_names = {
    'matchupPeriodId': 'Week',
    'away.teamId': 'Team1',
    'away.totalPoints': 'Score1',
    'home.teamId': 'Team2',
    'home.totalPoints': 'Score2',
}
df = df.reindex(columns=column_names).rename(columns=column_names)
df['Type'] = ['Regular' if w<=14 else 'Playoff' for w in df['Week']]
For the games where there's no away team, pandas will populate those columns with NaN values.
df[df.Team1.isna()]
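To try this without calling the ESPN API, here is a small self-contained sketch. The two-game schedule is invented sample data shaped like the response above; the second game deliberately has no 'away' entry:

```python
import pandas as pd

# Invented two-game sample; the second game has no 'away' entry.
schedule = [
    {'matchupPeriodId': 1,
     'away': {'teamId': 2, 'totalPoints': 98.4},
     'home': {'teamId': 1, 'totalPoints': 110.2}},
    {'matchupPeriodId': 14,
     'home': {'teamId': 1, 'totalPoints': 102.7}},
]

column_names = {
    'matchupPeriodId': 'Week',
    'away.teamId': 'Team1',
    'away.totalPoints': 'Score1',
    'home.teamId': 'Team2',
    'home.totalPoints': 'Score2',
}

# Flatten the nested dicts into dotted column names, then keep and rename
# only the columns of interest.
df = pd.json_normalize(schedule)
df = df.reindex(columns=list(column_names)).rename(columns=column_names)

# Rows without an away team have NaN in Team1 and Score1.
missing_away = df[df['Team1'].isna()]
print(missing_away)
```

Because reindex creates any requested column that is absent, this also works when no game in the response has an away team.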

Python ValueError from np.where create flag based on one condition

If the city has been mentioned in cities_specific I would like to create a flag in the cities_all data. It's just a minimal example and in reality I would like to create multiple of these flags based on multiple data frames. That's why I tried to solve it with isin instead of a join.
However, I am running into ValueError: Length of values (3) does not match length of index (7).
# import packages
import pandas as pd
import numpy as np
# create minimal data
cities_specific = pd.DataFrame({'city': ['Melbourne', 'Cairns', 'Sydney'],
                                'n': [10, 4, 8]})
cities_all = pd.DataFrame({'city': ['Vancouver', 'Melbourne', 'Athen', 'Vienna', 'Cairns',
                                    'Berlin', 'Sydney'],
                           'inhabitants': [675218, 5000000, 664046, 1897000, 150041, 3769000, 5312000]})
# get value error
# how can this be solved differently?
cities_all.assign(in_cities_specific=np.where(cities_specific.city.isin(cities_all.city), '1', '0'))
# that's the solution I would like to get
expected_solution = pd.DataFrame({'city': ['Vancouver', 'Melbourne', 'Athen', 'Vienna', 'Cairns',
                                           'Berlin', 'Sydney'],
                                  'inhabitants': [675218, 5000000, 664046, 1897000, 150041, 3769000, 5312000],
                                  'in_cities': [0, 1, 0, 0, 1, 0, 1]})

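I think the two frames are swapped in the condition. isin is called on cities_specific.city, so the result has 3 values, while cities_all has 7 rows; that mismatch is what the ValueError reports. Call isin on cities_all.city instead.
Here are some alternatives: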
cities_all.assign(
in_cities_specific=np.where(cities_all.city.isin(cities_specific.city), '1', '0')
)
or
cities_all["in_cities_specific"] = (
    cities_all["city"].isin(cities_specific["city"]).astype(int).astype(str)
)
or
condlist = [cities_all["city"].isin(cities_specific["city"])]
choicelist = ["1"]
cities_all["in_cities_specific"] = np.select(condlist, choicelist,default="0")
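The length mismatch is easy to check directly, and the same pattern extends to several flags. A minimal sketch; the lookups dict is a hypothetical stand-in for the multiple data frames mentioned in the question:

```python
import pandas as pd

cities_specific = pd.DataFrame({'city': ['Melbourne', 'Cairns', 'Sydney'],
                                'n': [10, 4, 8]})
cities_all = pd.DataFrame({'city': ['Vancouver', 'Melbourne', 'Athen', 'Vienna',
                                    'Cairns', 'Berlin', 'Sydney'],
                           'inhabitants': [675218, 5000000, 664046, 1897000,
                                           150041, 3769000, 5312000]})

# isin returns one boolean per row of the Series it is called on.
swapped = cities_specific['city'].isin(cities_all['city'])   # 3 values
correct = cities_all['city'].isin(cities_specific['city'])   # 7 values
print(len(swapped), len(correct))

# One flag column per lookup frame (hypothetical mapping of flag name to frame).
lookups = {'in_cities_specific': cities_specific}
flagged = cities_all.copy()
for flag_name, lookup in lookups.items():
    flagged[flag_name] = flagged['city'].isin(lookup['city']).astype(int)
print(flagged)
```

Adding another flag is then a matter of adding one more entry to lookups.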

Compare two Pandas dataframe for addition of any new rows with respect to the column

I am writing a parser that watches a pseudo-table in a web application and pushes a notification whenever rows are added.
How the pseudo-table works: the table on the website changes from time to time and gains new rows. The page is highly dynamic and sometimes changes existing rows as well. The pseudo-table assigns IDs automatically according to the sort order. To be precise, sorting is alphabetical, so Adam would get ID 1, Bob 2, and Coul 3. But if a person named Caul is added, Caul gets ID 3 and Coul becomes 4. This breaks every method I have tried so far.
I am now trying to compare two pandas DataFrames to detect added rows and return only the newly added ones. I do not want to return existing rows that were changed. I tried concat followed by removing duplicates, but this returns duplicate rows wherever there was any minor change in the data.
TL;DR EXAMPLE
Input
d1 = {'#': [1, 2, 3], 'Name': ['James Bourne', 'Steve Johns', 'Steve Jobs']}
d2 = {'#': [1, 2, 3, 4], 'Name': ['James Bourne', 'Steve Jobs', 'Great Guy', 'Steve Johns']}
df_1 = pd.DataFrame(data=d1)
df_2 = pd.DataFrame(data=d2)
# ... code
Output should be
3 Great Guy

You could try a simpler solution:
df_2[~df_2.Name.isin(df_1.Name)].dropna()
Output:
   #       Name
2  3  Great Guy

Merge the DataFrames with how='outer', then compare the merged Name column against the original names from df_1:
>>> merged = pd.merge(df_1,df_2,on='Name', how = 'outer')
>>> [x for x in enumerate(merged.Name) if x[1] not in list(df_1.Name)]
Results in: [(3, 'Great Guy')]

I found the subset parameter of drop_duplicates.
d1 = {'#': [1, 2, 3], 'Name': ['James Bourne', 'Steve Johns', 'Steve Jobs']}
d2 = {'#': [1, 2, 3, 4], 'Name': ['James Bourne', 'Steve Jobs', 'Great Guy', 'Steve Johns']}
df_1 = pd.DataFrame(data=d1)
df_2 = pd.DataFrame(data=d2)
df_1 = df_1.set_index('#')
df_2 = df_2.set_index('#')
df = pd.concat([df_1,df_2]).drop_duplicates(subset=['Name'], keep=False)
df
results in
        Name
#
3  Great Guy
This solves my question.
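Another way to express "rows in the new table whose Name is not in the old one" is an anti-join via merge with indicator=True. A minimal sketch, assuming Name identifies a person; the helper name added_rows is made up for illustration:

```python
import pandas as pd


def added_rows(old, new, key='Name'):
    """Return rows of `new` whose `key` value does not occur in `old`."""
    old_keys = old[[key]].drop_duplicates()
    # indicator=True adds a '_merge' column saying where each row was found.
    merged = new.merge(old_keys, on=key, how='left', indicator=True)
    only_new = merged[merged['_merge'] == 'left_only']
    return only_new.drop(columns='_merge')


df_1 = pd.DataFrame({'#': [1, 2, 3],
                     'Name': ['James Bourne', 'Steve Johns', 'Steve Jobs']})
df_2 = pd.DataFrame({'#': [1, 2, 3, 4],
                     'Name': ['James Bourne', 'Steve Jobs', 'Great Guy', 'Steve Johns']})

result = added_rows(df_1, df_2)
print(result)
```

Like the approaches above, this ignores the shifting # column entirely. A row whose Name itself was edited would still be reported as new.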

Making a barchart in pandas with filtered data

I have a CSV file that has a bunch of different columns. The columns that I am interested in are 'Item', 'OrderDate' and 'Units'.
In my IDE I am trying to generate a bar chart of the number of 'Pencil's sold on each individual 'OrderDate'. What I am trying to do is look down through the 'Item' column using pandas and check whether the item is a pencil; if it is, add it to the graph, and if it is not, do nothing.
I think I have made the code a bit long-winded.
I have the code going down through the 'Item' column and checking whether it is a pencil, but I can't figure out what to do next.
import pandas as pd
import matplotlib.pyplot as plt
d = {'item': pd.Series(['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil', 'The moon', 'Wish you were here album']),
     'OrderDate': pd.Series(['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020',
                             '5/16/2020', '5/17/2020', '5/16/2020', '5/16/2020', '5/17/2020']),
     'Units': pd.Series([4, 3, 2, 1, 3, 2, 4, 2, 3])}
df = pd.DataFrame.from_dict(d)
df.plot(kind='bar', x='OrderDate', y='Units')
item_col = df['item']  # the key in d is lowercase 'item'
pencil_binary = item_col.str.count('Pencil')
for entry in item_col:
    if entry == 'Pencil':
        print("i am a pencil")
    else:
        print("i am not a pencil")
print(df)
plt.plot()
plt.show()

If I understood correctly, you want to plot the number of pencils sold per day. For that, you can filter the dataframe to keep only the rows about pencils and then draw a bar chart.
Here's reproducible code that assumes that all rows have different dates:
import pandas as pd
import matplotlib.pyplot as plt
d = {'item': pd.Series(['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil', 'The moon', 'Wish you were here album']),
     'OrderDate': pd.Series(['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020',
                             '5/16/2020', '5/17/2020', '5/16/2020', '5/16/2020', '5/17/2020']),
     'Units': pd.Series([4, 3, 2, 1, 3, 2, 4, 2, 3])}
df = pd.DataFrame.from_dict(d)
# This dataframe only has pencils
df_pencils = df[df['item'] == 'Pencil']
# Sum the units per date and plot the resulting Series
df_pencils.groupby('OrderDate')['Units'].sum().plot(kind='bar')
df.plot(kind='bar', x='OrderDate', y='Units')
The groupby groups all rows with the same date and, for each group, adds up the Units sold.
In fact, when you do this:
df_pencils.groupby('OrderDate')['Units'].sum()
this is the output:
OrderDate
5/15/2020 4
5/16/2020 5
Name: Units, dtype: int64
If you want a one-liner, it's:
df[df['item'] == 'Pencil'].groupby('OrderDate')['Units'].sum().plot(kind='bar')
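One refinement worth considering: the dates above are strings, so the bars are ordered as text rather than by calendar date. A minimal sketch that parses the dates first; the helper name units_per_day is made up, and the sample data is trimmed so every column has the same length:

```python
import pandas as pd

# Same kind of data as above, trimmed so every column has the same length.
df = pd.DataFrame({
    'item': ['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil'],
    'OrderDate': ['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020', '5/16/2020'],
    'Units': [4, 3, 2, 1, 3],
})


def units_per_day(frame, item_name):
    """Total units of `item_name` per order date, in chronological order."""
    subset = frame[frame['item'] == item_name].copy()
    # Parse month/day/year strings into real timestamps before grouping.
    subset['OrderDate'] = pd.to_datetime(subset['OrderDate'], format='%m/%d/%Y')
    return subset.groupby('OrderDate')['Units'].sum().sort_index()


pencils = units_per_day(df, 'Pencil')
print(pencils)
```

Calling pencils.plot(kind='bar') followed by plt.show() then draws the chart with the dates in chronological order.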
