Combine Bar and line plots in one chart for matplotlib [duplicate] - python

I am trying to plot a chart with the 1st and 2nd columns of data as bars and a line overlay for the 3rd column of data.
I have tried the following code, but it creates 2 separate charts and I would like everything on one chart.
left_2013 = pd.DataFrame({'month': ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'],
'2013_val': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 6]})
right_2014 = pd.DataFrame({'month': ['jan', 'feb'], '2014_val': [4, 5]})
right_2014_target = pd.DataFrame({'month': ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'],
'2014_target_val': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]})
df_13_14 = pd.merge(left_2013, right_2014, how='outer')
df_13_14_target = pd.merge(df_13_14, right_2014_target, how='outer')
df_13_14_target[['month','2013_val','2014_val','2014_target_val']].head(12)
plt.figure()
df_13_14_target[['month','2014_target_val']].plot(x='month',linestyle='-', marker='o')
df_13_14_target[['month','2013_val','2014_val']].plot(x='month', kind='bar')
This is what I currently get

The DataFrame plotting methods return a matplotlib AxesSubplot or list of AxesSubplots. (See the docs for plot, or boxplot, for instance.)
You can then pass that same Axes to the next plotting method (using ax=ax) to draw on the same axes:
ax = df_13_14_target[['month','2014_target_val']].plot(x='month',linestyle='-', marker='o')
df_13_14_target[['month','2013_val','2014_val']].plot(x='month', kind='bar',
ax=ax)
import pandas as pd
import matplotlib.pyplot as plt
left_2013 = pd.DataFrame(
{'month': ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep',
'oct', 'nov', 'dec'],
'2013_val': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 6]})
right_2014 = pd.DataFrame({'month': ['jan', 'feb'], '2014_val': [4, 5]})
right_2014_target = pd.DataFrame(
{'month': ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep',
'oct', 'nov', 'dec'],
'2014_target_val': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]})
df_13_14 = pd.merge(left_2013, right_2014, how='outer')
df_13_14_target = pd.merge(df_13_14, right_2014_target, how='outer')
ax = df_13_14_target[['month', '2014_target_val']].plot(
x='month', linestyle='-', marker='o')
df_13_14_target[['month', '2013_val', '2014_val']].plot(x='month', kind='bar',
ax=ax)
plt.show()
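For comparison, the same overlay can be sketched with matplotlib alone by creating one Axes and calling both bar and plot on it. The bar width, the offsets, and the variable names below are illustrative assumptions rather than part of the answer above, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
val_2013 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 6]
val_2014 = [4, 5]  # only jan and feb are available
target_2014 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

positions = list(range(len(months)))
width = 0.4  # illustrative bar width

fig, ax = plt.subplots()
# Two bar series drawn side by side on the same Axes.
ax.bar([p - width / 2 for p in positions], val_2013, width, label='2013_val')
ax.bar([p + width / 2 for p in positions[:len(val_2014)]], val_2014, width,
       label='2014_val')
# The line is drawn on the very same Axes, so it overlays the bars.
ax.plot(positions, target_2014, linestyle='-', marker='o', color='black',
        label='2014_target_val')
ax.set_xticks(positions)
ax.set_xticklabels(months)
ax.legend()
# Call plt.show() interactively, or fig.savefig(...) to write a file.
```

Because every call goes through the same ax object, there is no second figure for matplotlib to create.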

Related

Sort by key (Month) using RDDs in Pyspark

I have this RDD and want to sort it by month (Jan --> Dec). How can I do it in PySpark?
Note: I don't want to use spark.sql or a DataFrame.
+-----+-----+
|Month|count|
+-----+-----+
| Oct| 1176|
| Sep| 1167|
| Dec| 2084|
| Aug| 1126|
| May| 1176|
| Jun| 1424|
| Feb| 1286|
| Nov| 1078|
| Mar| 1740|
| Jan| 1544|
| Apr| 1080|
| Jul| 1237|
+-----+-----+
You can use rdd.sortBy with a helper dictionary built from Python's calendar module, or create your own month dictionary:
import calendar
d = {i:e for e,i in enumerate(calendar.month_abbr[1:],1)}
#{'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, 'Jul': 7,
#'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
myrdd.sortBy(keyfunc=lambda x: d.get(x[0])).collect()
[('Jan', 1544),
('Feb', 1286),
('Mar', 1740),
('Apr', 1080),
('May', 1176),
('Jun', 1424),
('Jul', 1237),
('Aug', 1126),
('Sep', 1167),
('Oct', 1176),
('Nov', 1078),
('Dec', 2084)]
myList = myrdd.collect()
my_list_dict = dict(myList)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
newList = []
for m in months:
    newList.append((m, my_list_dict[m]))
print(newList)
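The key function does not depend on Spark, so it can be checked locally on a plain list before it is passed to rdd.sortBy. The sketch below assumes English month abbreviations (calendar.month_abbr follows the current locale) and uses a small hypothetical sample named rows.

```python
import calendar

# Map 'Jan' -> 1, ..., 'Dec' -> 12 (assumes an English locale).
month_order = {abbr: num for num, abbr in enumerate(calendar.month_abbr[1:], 1)}

# Hypothetical sample in the same (month, count) shape as the RDD rows.
rows = [('Oct', 1176), ('Sep', 1167), ('Dec', 2084), ('Jan', 1544)]

# Same key as in rdd.sortBy: look up the month's position by its name.
sorted_rows = sorted(rows, key=lambda row: month_order[row[0]])
print(sorted_rows)
```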

pass both string and list to pandas .isin method

I am trying to pass both a string and a list to the pandas .isin() method. Here is my code below
overall_months = ['APR', 'JUL', 'NOV', 'MAR', 'FEB', 'AUG', 'SEP', 'OCT', 'JAN', 'DEC', 'MAY',
'JUN', ['APR', 'JUL', 'NOV', 'MAR', 'FEB', 'AUG', 'SEP', 'OCT', 'JAN', 'DEC', 'MAY', 'JUN']]
for mon in overall_months:
    temp_df = df.month.isin([mon])
The issue here is that .isin([]) is fine for each iteration where mon is a string, but overall_months[-1] is a list, and you cannot wrap a list in the .isin([]) syntax. I've tried this, but I cannot remove the double quotes, because my understanding is that strings are immutable:
str(overall_months[-1]).replace('[', '').replace(']','')
This produces: "'APR', 'JUL', 'NOV', 'MAR', 'FEB', 'AUG', 'SEP', 'OCT', 'JAN', 'DEC', 'MAY', 'JUN'"
It could be passed to my syntax if it was: 'APR', 'JUL', 'NOV', 'MAR', 'FEB', 'AUG', 'SEP', 'OCT', 'JAN', 'DEC', 'MAY', 'JUN'
Any help in the best way to accomplish this?
You can check if the element is a list with isinstance:
for mon in overall_months:
    if not isinstance(mon, list):
        mon = [mon]
    tmp_df = df.month.isin(mon)
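If the same normalization is needed in several places, it can be pulled into a small helper. The name as_list is hypothetical, and the sketch uses plain Python only; each resulting list is what would be passed to df.month.isin(...).

```python
def as_list(value):
    """Return value unchanged if it is a list, otherwise wrap it in one."""
    return value if isinstance(value, list) else [value]

overall_months = ['APR', 'JUL', ['APR', 'JUL', 'NOV']]

# Every element becomes a list, so each can be handed to .isin() directly.
normalized = [as_list(mon) for mon in overall_months]
print(normalized)
```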

How to create a python dict with a default value from a list?

I have a list of names (say, months). How can I create a dict in which every name maps to the same value (say, 0), without a comprehension if that is possible?
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
What I need is:
months_values = {'Jan': 0, 'Feb': 0, 'Mar': 0, 'Apr': 0, 'May': 0, 'Jun': 0}
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
months_dict = dict.fromkeys(months,0)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
dict1 = dict.fromkeys(months, 0)
print(dict1)
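One caveat worth knowing: dict.fromkeys stores the same value object under every key. That is harmless for an immutable default such as 0, but with a mutable default such as a list, all keys end up sharing one list. A short sketch of the difference:

```python
months = ['Jan', 'Feb', 'Mar']

counts = dict.fromkeys(months, 0)
counts['Jan'] += 1            # rebinds only the 'Jan' entry
print(counts)

shared = dict.fromkeys(months, [])
shared['Jan'].append('x')     # mutates the single list shared by all keys
print(shared)

# One way to get an independent list per key (this does use a comprehension).
separate = {m: [] for m in months}
separate['Jan'].append('x')
print(separate)
```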

How to fill missing values in a list that belong to another list using Python?

Consider lists like these:
month_list = ['Mar', 'Aug', 'Okt', 'Nov']
value_for_each_month = [4, 10, 8, 5]
So, each value belongs to the month in the month_list, e.g. 'Mar' --> 4, 'Aug' --> 10 and so on..
Now, how to fill both lists in Python to achieve this result:
month_list_new = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
value_for_each_month_new = [0, 0, 4, 0, 0, 0, 0, 10, 0, 8, 5, 0]
Create a dictionary mapping month names to values...
>>> month_list = ['Mar', 'Aug', 'Okt', 'Nov']
>>> value_for_each_month = [4, 10, 8, 5]
>>> month_values = dict(zip(month_list, value_for_each_month))
>>> month_values
{'Aug': 10, 'Mar': 4, 'Nov': 5, 'Okt': 8}
... then use that dict in a list comprehension:
>>> month_list_new = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
>>> value_for_each_month_new = [month_values.get(m, 0) for m in month_list_new]
>>> value_for_each_month_new
[0, 0, 4, 0, 0, 0, 0, 10, 0, 0, 5, 0]
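Note that the output above has 0 in the 'Oct' position because the input spells the month 'Okt', so the lookup misses. To get the result the question asks for, the keys would need to be normalized first. A minimal sketch, assuming a hand-written alias table (the aliases dict is hypothetical and would need one entry per nonstandard spelling):

```python
month_list = ['Mar', 'Aug', 'Okt', 'Nov']
value_for_each_month = [4, 10, 8, 5]

# Hypothetical alias table mapping nonstandard spellings to standard ones.
aliases = {'Okt': 'Oct'}

month_values = {aliases.get(m, m): v
                for m, v in zip(month_list, value_for_each_month)}

month_list_new = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
value_for_each_month_new = [month_values.get(m, 0) for m in month_list_new]
print(value_for_each_month_new)
```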

python appending elements to a list from a list

I would like to create a list that takes elements alternately from 2 separate lists in Python.
I had the following idea but it doesn't seem to work:
t1 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
t2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
t3= [len(t1)+len(t2)]
a = 0
while a < len(t1)+len(t2):
    t3.extend(t1[a])
    t3.extend(t2[a])
    a = a + 1
print t3
So basically I would like ['Jan', 31, 'Feb', 28, 'Mar', 31, etc.]
The shortest solution may be:
list(sum(zip(t2, t1), ()))
In Python you don't need to "reserve capacity" for a list. Just write
t3 = []
In fact, t3 = [len(t1)+len(t2)] doesn't even create a list of length 24; it creates a list with a single entry, [24].
t1[a] and t2[a] are elements you want to add to the list. To add an element, you use the .append method:
t3.append(t1[a])
t3.append(t2[a])
.extend is used to add a list (in fact, any iterable) to a list, e.g.
t3.extend([t1[a], t2[a]])
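The difference can be seen in a short sketch. It also shows why the loop in the question fails: extend needs an iterable, so passing an integer raises TypeError, while passing a string adds its characters one by one.

```python
t3 = []
t3.append('Jan')        # adds one element
t3.append(31)
print(t3)

t4 = []
t4.extend(['Jan', 31])  # adds every item of the iterable
print(t4)

t5 = []
t5.extend('Jan')        # a string is iterable, so each character is added
print(t5)

try:
    [].extend(31)       # an int is not iterable
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```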
The problem itself can be solved easily using list comprehensions.
[a for l in zip(t2, t1) for a in l]
Many other improvements could be made (e.g. using a for loop instead of a while loop). You could take it to http://codereview.stackexchange.com.
(BTW, this code does not handle leap years.)
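On the leap-year remark: if the day counts should depend on the year, they can be computed rather than hard-coded. The following is a sketch using the standard calendar module. The function name interleave_month_days is hypothetical, and calendar.month_abbr yields 'Sep' rather than 'Sept' and follows the current locale.

```python
import calendar
from itertools import chain

def interleave_month_days(year):
    """Return ['Jan', 31, 'Feb', 28 or 29, ...] for the given year."""
    names = calendar.month_abbr[1:]
    # monthrange returns (weekday of the first day, number of days in month).
    days = [calendar.monthrange(year, m)[1] for m in range(1, 13)]
    return list(chain.from_iterable(zip(names, days)))

print(interleave_month_days(2023)[:4])
print(interleave_month_days(2024)[:4])
```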
Here you go:
t1 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
t2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
t3 = list()
for i, j in zip(t1, t2):
    t3.append(j)  # month name first, as in the desired output
    t3.append(i)  # then its number of days
print(t3)
Just zip the lists and flatten the result.
>>> from itertools import chain
>>> t1 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
>>> t2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
... 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
>>> list(chain(*zip(t2, t1)))
['Jan', 31, 'Feb', 28, 'Mar', 31, 'Apr', 30, 'May', 31, 'Jun', 30, 'Jul', 31, 'Aug', 31, 'Sept', 30, 'Oct', 31, 'Nov', 30, 'Dec', 31]
Without chain:
>>> [x for tup in zip(t2, t1) for x in tup]
['Jan', 31, 'Feb', 28, 'Mar', 31, 'Apr', 30, 'May', 31, 'Jun', 30, 'Jul', 31, 'Aug', 31, 'Sept', 30, 'Oct', 31, 'Nov', 30, 'Dec', 31]
You probably need to read more about Python lists and their methods. t3 = [len(t1)+len(t2)] is not necessary at all. I guess you have a C background and are trying to initialize the list with a size. In Python you don't have to set the list size up front (a list grows automatically). Lists also keep their items in the order in which you add them, so you don't need a tuple to preserve the sequence.
Happy coding!
t1 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
t2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
arr = []
for i in range(12):
    arr.append(t2[i])
    arr.append(t1[i])
print(arr)
Output -
['Jan', 31, 'Feb', 28, 'Mar', 31, 'Apr', 30, 'May', 31, 'Jun', 30, 'Jul', 31, 'Aug', 31, 'Sept', 30, 'Oct', 31, 'Nov', 30, 'Dec', 31]
You can alternatively write -
import itertools
arr = list(itertools.chain.from_iterable(zip(t2, t1)))
In Python, you don't need to create a list with a fixed length as you would an array in some other languages, so the third line should just be t3 = [].
Also, the extend() method is used to concatenate lists. To add a single new value, you need to use the append() method instead.
Python is a dynamically typed language: the type of a name is determined by the value assigned to it.
So basically you can do it this way:
t1 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
t2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
t3 = []
for a in range(len(t1)):
    t3.append(t2[a])  # month name first, as in the desired output
    t3.append(t1[a])  # then its number of days
print(t3)
