Most efficient way to plot Histogram using dictionary? [duplicate] - python

This question already has answers here:
How to plot keys and values from dictionary in histogram
(3 answers)
Closed 2 years ago.
I am analyzing chess games in python and am trying to generate a histogram of number of moves per game. The goal is to obtain something similar to this visualization, where there seems to be on average 70 moves per game:
Currently I have an unordered dict:
{'106': 38,
'100': 46,
'65': 58,
'57': 47,
'54': 31,
'112': 29,
'93': 35,
'91': 44,
...
'109': 35,
'51': 26}
where the keys denote number of moves, and values denote number of games.
I am finding it stupidly difficult to extract the dictionary data for plotting the histogram. I have tried extracting to a dataframe but am unable to get it in a matplotlib/seaborn readable format.

Does this work for you?
import matplotlib.pyplot as plt
d = {'106': 38, ...}  # renamed from `dict` to avoid shadowing the built-in
plt.bar([int(k) for k in d.keys()], d.values())
plt.show()

First, we sort your dictionary with a dict comprehension:
d = {'106': 38,
'100': 46,
'65': 58,
'57': 47,
'54': 31,
'112': 29,
'93': 35,
'91': 44,
'109': 35,
'51': 26}
sorted_d = {key: d[key] for key in sorted(d, key=int)}
And now we display it with a barplot:
import seaborn as sns
sns.barplot(x=list(sorted_d.keys()), y=list(sorted_d.values()))
This is the result you get:
If you don't sort the dictionary, you get this instead:
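A different route, if you only need the distribution: plt.hist accepts per-value weights, so you can pass each move count once and weight it by its game count, with no manual sorting needed. A sketch (the bin width of 5 is my own choice):

```python
import matplotlib.pyplot as plt

d = {'106': 38, '100': 46, '65': 58, '57': 47, '54': 31}

# Each distinct move count appears once, weighted by its game count.
moves = [int(k) for k in d]
plt.hist(moves, bins=range(min(moves), max(moves) + 5, 5),
         weights=list(d.values()))
plt.xlabel('moves per game')
plt.ylabel('games')
plt.show()
```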

Related

Calculating multiple percentages from a column and creating a loop based on that in pandas

I have a master file which stores employee data; each employee can receive a monthly bonus based on their salary, between x% and y%.
import pandas as pd
from docxtpl import DocxTemplate  # needed for DocxTemplate below
masterfile = {'Name': ['Joe', 'Jack', 'William', 'Avarel'], 'Base Salary': [100, 80, 60, 40], 'Min Bonus': [80, 80, 70, 60], 'Max Bonus': [120, 120, 110, 100], 'Local Currency': ['EURO', 'EURO', '$', '$']}
df = pd.DataFrame(masterfile)
print(df)
df["Base Salary"] = df["Base Salary"].apply('{:,.2f}'.format)
Name = df["Name"].values
Base = df["Base Salary"].values
localc = df["Local Currency"].values
zipped = zip(Name, Base, localc)
for a, b, c in zipped:
    doc = DocxTemplate(file)  # `file` is the path to the .docx template
    context = {"Name": a, "Base": b, "localc": c}
    doc.render(context)
    doc.save('{}.docx'.format(a))
With this code I can export to different .docx files, but only with the base salary. What I am trying to do is calculate each bonus percentage between the min and max bonus (i.e. for Joe: 80%, 81%, 82%, and so on up to 120%) and create a separate column for each percentage.
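Leaving the docx part aside, the column-per-percentage step could be sketched like this (column names such as '80%' and the NaN-outside-range behaviour are my own assumptions about the desired output):

```python
import pandas as pd

masterfile = {'Name': ['Joe', 'Jack', 'William', 'Avarel'],
              'Base Salary': [100, 80, 60, 40],
              'Min Bonus': [80, 80, 70, 60],
              'Max Bonus': [120, 120, 110, 100]}
df = pd.DataFrame(masterfile)

# One column per whole percentage; cells outside an employee's
# [Min Bonus, Max Bonus] range are left as NaN.
for pct in range(int(df['Min Bonus'].min()), int(df['Max Bonus'].max()) + 1):
    in_range = (df['Min Bonus'] <= pct) & (pct <= df['Max Bonus'])
    df[f'{pct}%'] = (df['Base Salary'] * pct / 100).where(in_range)
```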

IndexingError: Too many indexers while using iloc

I have a dataframe from which I am trying to add attributes to my graph edges. The dataframe has a mean_travel_time column, which is going to be the attribute for my edges.
Plus, I have a data list consisting of (source node, destination node) tuples, like this:
[(1160, 2399),
(47, 1005)]
Now, while using set_edge_attribute to add attributes, I need my data into a dictionary:
{(1160, 2399):1434.67,
(47, 1005):2286.10,
}
I did something like this:
data_dict = {}  # empty dictionary
for i in data:
    data_dict[i] = df1['mean_travel_time'].iloc[i]  # adding values
But I am getting an error saying "too many indexers".
Can anyone help me out with this error?
Please provide your data in a format easy to copy:
df = pd.DataFrame({
'index': [1, 9, 12, 18, 26],
'sourceid': [1160, 70, 1190, 620, 1791],
'dstid': [2399, 1005, 4, 103, 1944],
'month': [1] * 5,
'distance': [1434.67, 2286.10, 532.69, 593.20, 779.05]
})
If you are trying to iterate through a list of edges such as (1,2) you need to set an index for your DataFrame first:
df1.set_index(['sourceid', 'dstid'])
You could then access specific edges:
df.set_index(['sourceid', 'dstid']).loc[(1160, 2399)]
Or use a list of edges:
edges = zip(df['sourceid'], df['dstid'])
df.set_index(['sourceid', 'dstid']).loc[edges]
But you don't need to do any of this because, in fact, you can get your entire dict all in one go:
df.set_index(['sourceid', 'dstid'])['mean_travel_time'].to_dict()
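Run end to end on the sample frame above (which has a distance column rather than mean_travel_time), the one-liner produces exactly the {(source, dst): value} mapping that set_edge_attributes expects:

```python
import pandas as pd

df = pd.DataFrame({
    'sourceid': [1160, 70, 1190],
    'dstid': [2399, 1005, 4],
    'distance': [1434.67, 2286.10, 532.69],
})

# Build the {(source, dst): value} mapping in one step.
attrs = df.set_index(['sourceid', 'dstid'])['distance'].to_dict()
```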

How do I get the list in another format? [duplicate]

This question already has answers here:
How to group dataframe rows into list in pandas groupby
(17 answers)
Closed 2 years ago.
I have a data frame from which I only want the rows that contain a certain value. I've already implemented that. What I want now is the item list grouped by user; what I get instead is every single element of the data frame in its own list. How do I get a list like [[user1_item1, ..., user1_itemn], ..., [usern_item1, ..., usern_itemn]]?
d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)
print(df)
users = df.loc[df.itemid == 715, "userid"]
df_new = df.loc[df.userid.isin(users)]
list_new = df_new[['itemid']].values.tolist()
# What I get
[[715],[845],[98],[85],[715]]
# What I want
[[715,845,98],[85,715]]
You may use a groupby operation
list_new = df_new.groupby("userid")['itemid'].apply(list).tolist()
print(list_new) # [[715, 845, 98], [85, 715]]
The intermediate operation is
list_new = df_new.groupby("userid")['itemid'].apply(list)
print(list_new)
userid
0 [715, 845, 98]
2 [85, 715]
Name: itemid, dtype: object
If you want to do all of your code in one line, you can use list comprehension:
[x for x in [*df.groupby('userid')['itemid'].apply(list)] if 715 in x]
[[715, 845, 98], [85, 715]]
The code:
[*df.groupby('userid')['itemid'].apply(list)]
is equivalent to
df_new.groupby("userid")['itemid'].apply(list).tolist()
and the remaining part just loops through the master list generated above, checking whether 715 is in any of the sublists, where x is each sublist in the code above.
First, we need to group our data by userid. Grouping is very important in many applications, for example in machine-learning preprocessing:
Example: suppose our data is collected from sensors at stations located in various parts of a state, measuring pressure and temperature, with, say, three stations Station-1, Station-2, and Station-3. In many practical scenarios we have missing values. If we use the entire data set to fill them, we may not get good results; but if we fill each station's missing values using only that station's own data, we can do better, since conditions differ between stations yet are similar at any one station.
ans = df.groupby('userid')['itemid'].apply(list)
userid
0 [715, 845, 98]
1 [12324]
2 [85, 715]
3 [2112, 85]
4 [2112, 852, 102]
Name: itemid, dtype: object
Each row gives all of that user's itemids.
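As a variation on the answers above, the filtering step itself can also be done with groupby().filter, keeping only the groups that contain item 715. A sketch equivalent to the isin approach:

```python
import pandas as pd

d = {'userid': [0, 0, 0, 1, 2, 2],
     'itemid': [715, 845, 98, 12324, 85, 715]}
df = pd.DataFrame(d)

# Keep only the users whose group contains 715, then collect each
# remaining user's items into a list.
kept = df.groupby('userid').filter(lambda g: (g['itemid'] == 715).any())
list_new = kept.groupby('userid')['itemid'].apply(list).tolist()
```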

Python - How to create an array from a sql command with just the values?

I want to create an array from the data I got from my database.
My database looks something like this:
name, age, height, birth_month
John, 57, 2.11, April
Rico, 57, 1.05, June
Max, 57, 1.50, December
Lisa, 35, 1.23, July
Beth, 21, 1.66, July
Luna, 89, 2.3, July
Now I want an array excluding name and putting birth_month first.
Here is the array I want to create:
data = [[April, 57, 2.11],
[June, 57, 1.05],
[December, 57, 1.50],
[July, 35, 1.23],
[July, 21, 1.66],
[July, 89, 2.3 ]]
So what I tried to do was this:
mycursor = mydb.cursor()
mycursor.execute("SELECT birth_month, age, height FROM customers")
print(mycursor.fetchall())
But this is not exactly what I want. How can I get the array formatted just as I want?
This is what I get:
[[(u'April', Decimal('57'), Decimal('2.11')),
(u'June', Decimal('57'), Decimal('1.05')),
(u'December', Decimal('57'), Decimal('1.50')),
(u'July', Decimal('35'), Decimal('1.23')),
(u'July', Decimal('21'), Decimal('1.66')),
(u'July', Decimal('89'), Decimal('2.3'))]]
What I want is for the "Decimal" and "u'" wrappers to disappear, and for every row to have its own separate [].
mycursor.execute("SELECT birth_month, age, height FROM customers")
print(mycursor.fetchall())
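One way to get plain values, assuming the connector returns Decimal and unicode wrappers as shown: convert each fetched tuple yourself with a list comprehension (the rows below are hard-coded for illustration in place of mycursor.fetchall()):

```python
from decimal import Decimal

rows = [('April', Decimal('57'), Decimal('2.11')),
        ('June', Decimal('57'), Decimal('1.05'))]

# Cast Decimal to int/float and turn each tuple into its own list.
data = [[month, int(age), float(height)] for month, age, height in rows]
```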

Call multiple Rows in Pandas

I am trying to drop multiple rows from my data.
I can drop rows using:
dt=dt.drop([40,41,42,43,44,45])
But I was wondering if there is a simpler way. I tried:
dt=dt.drop([40:45])
But sadly it did not work.
I recommend np.r_:
import numpy as np
df.drop(np.r_[40:50+1])
In case you want to drop two ranges at the same time:
np.r_[40:50+1,1:4+1]
Out[719]: array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 1, 2, 3, 4])
Assuming you want to drop a range of positions:
df.drop(df.index[40: 46])
This doesn't assume the indices are integers.
You can use:
dt = dt.drop(range(40,46))
or
dt.drop(range(40,46), inplace=True)
You could generate the list based on a range:
dt = dt.drop([x for x in range(40, 46)])
Or just:
dt = dt.drop(range(40, 46))
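Note that drop works on index labels, so range(40, 46) removes the labels 40 through 45 inclusive. A quick check with a toy frame:

```python
import pandas as pd

dt = pd.DataFrame({'a': range(100)})

# Drops the rows labelled 40..45 inclusive; label 46 survives.
dt = dt.drop(range(40, 46))
```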
