NOTE FROM BOKEH MAINTAINER: The bokeh.charts API including TimeSeries was deprecated and removed a long time ago. This question is not relevant as-is to any recent or future versions of Bokeh. To plot time series, use the stable and supported bokeh.plotting API. Some examples can be found here.
I am trying to plot a time series with categories:
xaxis_values: startTime
yaxis_values: count
groupby: day
Each day has 24 hourly rows, and the entire dataset covers more than 100 days. I am trying to make a few types of plots:
Group by day of the week and sum the counts for each hour of startTime, which would give 7 time series in one graph.
Separate by day, i.e. every Mon, Tue, Wed and so on, for however many days n there are, and plot a 24-hour time series for each.
Group by hour irrespective of the day, i.e. 00:00:00, 01:00:00 and so on.
What is the best way to get a good visualization with Bokeh or seaborn?
Input:
2004-01-05,22:00:00,23:00:00,Mon,18944,790
2004-01-05,23:00:00,00:00:00,Mon,17534,750
2004-01-06,00:00:00,01:00:00,Tue,17262,747
2004-01-06,01:00:00,02:00:00,Tue,19072,777
2004-01-06,02:00:00,03:00:00,Tue,18275,785
2004-01-06,03:00:00,04:00:00,Tue,13589,757
2004-01-06,04:00:00,05:00:00,Tue,16053,735
2004-01-06,05:00:00,06:00:00,Tue,11440,636
2004-01-06,06:00:00,07:00:00,Tue,5972,513
2004-01-06,07:00:00,08:00:00,Tue,3424,382
2004-01-06,08:00:00,09:00:00,Tue,2696,303
2004-01-06,09:00:00,10:00:00,Tue,2350,262
2004-01-06,10:00:00,11:00:00,Tue,2309,254
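The sketch below shows the three aggregations described above using pandas only. It assumes the CSV has no header row, as in the sample, and reuses the column names from the question's code. Which grouping matches which plot is an interpretation of the question. Each resulting frame has one column per series, and those columns can then be drawn as lines with the bokeh.plotting API or with seaborn. Only the first four sample rows are embedded, so the example is self-contained.

```python
import io

import pandas as pd

# First rows of the question's one_hour.csv. The file has no header row,
# so header=None plus explicit names keeps the first data row as data.
CSV = """2004-01-05,22:00:00,23:00:00,Mon,18944,790
2004-01-05,23:00:00,00:00:00,Mon,17534,750
2004-01-06,00:00:00,01:00:00,Tue,17262,747
2004-01-06,01:00:00,02:00:00,Tue,19072,777
"""

cols = ["date", "startTime", "endTime", "day", "count", "unique"]
df = pd.read_csv(io.StringIO(CSV), header=None, names=cols)

# Combine date and startTime into one timestamp, then extract the hour.
df["start"] = pd.to_datetime(df["date"] + " " + df["startTime"])
df["hour"] = df["start"].dt.hour

# 1. One series per weekday: summed count for each hour of that weekday.
by_weekday = df.pivot_table(index="hour", columns="day",
                            values="count", aggfunc="sum")

# 2. One 24-hour series per calendar date.
by_date = df.pivot_table(index="hour", columns="date",
                         values="count", aggfunc="sum")

# 3. Hour of day, irrespective of the date.
by_hour = df.groupby("hour")["count"].sum()

print(by_weekday)
print(by_hour)
```

Using pivot_table with a columns argument produces a wide frame. In that frame, each weekday or date is already its own column, which is the shape most plotting libraries expect for "one line per category".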
Code (reference: here):
import numpy as np
import pandas as pd
from bokeh.charts import TimeSeries, show, output_file, vplot
output_file("timeseries.html")
data_one = pd.read_csv('one_hour.csv')
data_one.columns = ['date', 'startTime', 'endTime', 'day', 'count', 'unique']
data = dict(data_one=data_one['count'])
tsline = TimeSeries(data,
x='startTime', y='count',
color=['day'], title="Timeseries", ylabel='count', legend=True)
show(vplot(tsline))
Error:
Traceback (most recent call last):
File "date_graph.py", line 10, in <module>
data = dict(data_one=data_one['count'])
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'count'
Edit: after changing that line to
data = dict(data_one=data_one['count'].tolist())
Error:
Traceback (most recent call last):
File "date_graph.py", line 12, in <module>
tsline = TimeSeries(data, x='startTime', y='count', color=['startTime'], title="Timeseries", ylabel='count', legend=True)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builders/timeseries_builder.py", line 102, in TimeSeries
return create_and_build(builder_type, data, **kws)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builder.py", line 67, in create_and_build
chart.add_builder(builder)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/chart.py", line 149, in add_builder
builder.create(self)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builder.py", line 518, in create
chart.add_renderers(self, renderers)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/chart.py", line 144, in add_renderers
self.renderers += renderers
File "/usr/local/lib/python2.7/dist-packages/bokeh/core/property_containers.py", line 18, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/bokeh/core/property_containers.py", line 77, in __iadd__
return super(PropertyValueList, self).__iadd__(y)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builders/line_builder.py", line 230, in yield_renderers
x=group.get_values(self.x.selection),
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/data_source.py", line 173, in get_values
return self.data[selection]
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'startTime'
Related
I am trying to make a histogram of data from an Excel file. I used the following code:
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_excel('file.xlsx',header=1, parse_cols="Q")
plt.hist(data, bin=10)
plt.show()
But it gives this error:
Traceback (most recent call last):
File "<ipython-input-1-0ffa7ab287c3>", line 4, in <module>
plt.hist(data, bin=10)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\pyplot.py", line 2890, in hist
stacked=stacked, **kwargs)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\axes\_axes.py", line 5562, in hist
if isinstance(x, np.ndarray) or not iterable(x[0]):
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.py", line 1678, in __getitem__
return self._getitem_column(key)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.py", line 1685, in _getitem_column
return self._get_item_cache(key)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 1052, in _get_item_cache
values = self._data.get(item)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\internals.py", line 2565, in get
loc = self.items.get_loc(item)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\index.py", line 1181, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas\index.c:3656)
File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas\index.c:3534)
File "hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11911)
File "hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11864)
KeyError: 0
This is what my data looks like:
Does anybody have an idea how I can fix this?
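Here is a hedged sketch of what goes wrong, assuming the pandas and matplotlib versions from the traceback. plt.hist probes x[0], and on a DataFrame that looks up a column labelled 0, which produces KeyError: 0. Passing a single column (a 1-D Series) avoids this, and the keyword is bins, not bin. The column name value and the numbers are hypothetical stand-ins for the Excel data. The example uses np.histogram so it runs without a display; plt.hist(values, bins=10) takes the same arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel('file.xlsx', header=1, parse_cols="Q");
# the real column name in the file is unknown, "value" is assumed here.
data = pd.DataFrame({"value": [3, 7, 7, 1, 9, 4, 4, 4, 6, 2]})

# data[0] looks up a column *labelled* 0, which does not exist.
# This is the lookup that fails inside plt.hist when it is given a DataFrame.
try:
    data[0]
except KeyError as exc:
    print("KeyError:", exc)  # prints KeyError: 0

# Select the single column as a 1-D Series instead.
values = data.iloc[:, 0]

# Same `bins` keyword as plt.hist(values, bins=10).
counts, edges = np.histogram(values, bins=10)
print(counts.sum(), len(edges))  # 10 11
```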
I have a file with 3,502,379 rows and 3 columns. The following script is supposed to run, but it raises an error on the date-handling line:
import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas
path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
This is the error that occurs:
Traceback (most recent call last):
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\data\script.py", line 15, in <module>
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
The '\ufeffDATE' in the first column name shows that your CSV file starts with a UTF-8 byte order mark (BOM), so it must be read with an encoding that strips it.
So try this when reading your CSV:
df = pd.read_csv(path, sep=';', encoding='utf-8-sig')
or, as @EdChum suggested:
df = pd.read_csv(path, sep=';', encoding='utf-16')
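Which variant is right depends on how the file was saved: utf-8-sig for a UTF-8 file with a BOM, and utf-16 for a UTF-16 file (the utf-16 codec also consumes the BOM).
P.S. This answer shows how to deal with BOMs.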
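The self-contained sketch below shows the difference, using made-up CSV bytes with a UTF-8 BOM in place of data_prices.csv. It reads DATE as a string so that the format='%Y%m%d' parse from the question applies unchanged. The dtype argument is an addition that is not in the question.

```python
import codecs
import io

import pandas as pd

# Hypothetical CSV bytes: a UTF-8 BOM followed by ';'-separated data
# with a DATE column like the question's (values are made up).
raw = codecs.BOM_UTF8 + b"DATE;PRICE\n20160101;1.5\n20160102;1.7\n"

# Decoding with plain utf-8 keeps the BOM as U+FEFF glued to the first name.
first_line = raw.decode("utf-8").splitlines()[0]
print(repr(first_line.split(";")[0]))

# utf-8-sig drops the BOM, so the column really is called 'DATE'.
df = pd.read_csv(io.BytesIO(raw), sep=";", encoding="utf-8-sig",
                 dtype={"DATE": str})
df["DATE"] = pd.to_datetime(df["DATE"], format="%Y%m%d")
print(df.columns.tolist())  # ['DATE', 'PRICE']

# Defensive alternative if the encoding cannot be changed:
# strip any stray BOM characters from the column labels after reading.
df.columns = df.columns.str.lstrip("\ufeff")
```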
I want to group the "Count" column of a DataFrame by another column, "State". I would like to output a list of lists, where each sublist holds the counts for one state.
Example output: [[120,200], [40, 20, 40], ...]
Here 120 and 200 would be the counts for, say, the state of California.
I tried the following:
df_new = df[['State']].groupby(['Count']).to_list()
I get KeyError: 'Count'.
Traceback:
Traceback (most recent call last):
File "C:\Users\Michael\workspace\UCIIntrotoPythonDA\src\Michael_Madani_week3.py", line 84, in <module>
getStateCountsDF(filepath)
File "C:\Users\Michael\workspace\UCIIntrotoPythonDA\src\Michael_Madani_week3.py", line 81, in getStateCountsDF
df_new = df[['State']].groupby(['Count']).to_list()
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\generic.py", line 3159, in groupby
sort=sort, group_keys=group_keys, squeeze=squeeze)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\groupby.py", line 1199, in groupby
return klass(obj, by, **kwds)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\groupby.py", line 388, in __init__
level=level, sort=sort)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2148, in _get_grouper
in_axis, name, gpr = True, gpr, obj[gpr]
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
return self._getitem_column(key)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
values = self._data.get(item)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\internals.py", line 2851, in get
loc = self.items.get_loc(item)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\index.py", line 1572, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)
File "pandas\hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)
File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)
KeyError: 'Count'
I feel like this should be a simple line of code, what am I doing wrong here?
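The KeyError comes from df[['State']], which keeps only the State column, so there is no Count column left for groupby to find. You also want to group by State, not by Count. It is possible as a one-liner: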
import pandas as pd
df = pd.DataFrame.from_dict({"State": ["ny", "or", "ny", "nm"],
"Counts": [100,300,200,400]})
list_new = df.groupby("State")["Counts"].apply(list).tolist()
print(list_new)
[[400], [100, 200], [300]]
You should read the groupby documentation to see what the result of the grouping is and how to change it (http://pandas.pydata.org/pandas-docs/stable/groupby.html).
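The sketch below reproduces the mistake and the fix with the column names from the question (Count rather than Counts). The data is hypothetical and chosen to match the example output.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({"State": ["CA", "CA", "NY", "NY", "NY"],
                   "Count": [120, 200, 40, 20, 40]})

# df[['State']] keeps only the State column, so 'Count' is gone
# before groupby sees it: that is the KeyError: 'Count'.
print(df[["State"]].columns.tolist())  # ['State']

# Group by State, take the Count column, and collect each group into a list.
counts_per_state = df.groupby("State")["Count"].apply(list).tolist()
print(counts_per_state)
```

Groups come out in sorted key order (CA before NY) because groupby sorts keys by default.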
I have searched and cannot find anyone else with this problem. I am trying to create a pivot table summarizing a CSV file and then email that pivot table to myself. I have already written the code for this process, but it does not work for every file. I keep getting a KeyError on a column name, yet if I delete all columns and rows that are not part of the table, it miraculously works.
Here is my code:
df = pandas.read_csv('/path/to/file', encoding='utf-8')
pivot = pandas.pivot_table(df,index=['ClientID','ClientName','Branch'],
values=['EmailAddress'],aggfunc='count',margins=True)
pivotlocation = '/path/to/save'
pivot.to_csv(pivotlocation)
For the life of me, I cannot figure out what is going wrong, or why this works on some files and not others.
Also, here is the error that is thrown:
Traceback (most recent call last):
File "C:\Users\rfulton\Desktop\Automation\Reports\UniversalUpload.py", line 86, in create_pivot
pivot = pandas.pivot_table(df,index=columns,values=aggvalue,aggfunc='count',margins=True)
File "C:\Python34\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\Python34\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\Python34\lib\site-packages\pandas\tools\pivot.py", line 114, in pivot_table
grouped = data.groupby(keys)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 2898, in groupby
sort=sort, group_keys=group_keys, squeeze=squeeze)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 1193, in groupby
return klass(obj, by, **kwds)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 383, in __init__
level=level, sort=sort)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 2131, in _get_grouper
in_axis, name, gpr = True, gpr, obj[gpr]
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1780, in __getitem__
return self._getitem_column(key)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1787, in _getitem_column
return self._get_item_cache(key)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1068, in _get_item_cache
values = self._data.get(item)
File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 2849, in get
loc = self.items.get_loc(item)
File "C:\Python34\lib\site-packages\pandas\core\index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3807)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3687)
File "pandas\hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12310)
File "pandas\hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12261)
KeyError: 'ClientID'
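As I stated above, if I delete all cells outside the bounds of the table, this error is no longer thrown. However, I am not sure how to do this with the csv or pandas modules.
It turns out that the issue was the encoding of the file.
Setting the encoding to utf-8-sig fixed it. That codec strips a leading UTF-8 byte order mark (BOM), which would otherwise stay attached to the first column name and make a lookup of that name fail.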
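Below is a hedged sketch of the fixed pivot step. The hypothetical file contents begin with a UTF-8 BOM. The column names come from the question; the rows and values are made up.

```python
import codecs
import io

import pandas as pd

# Hypothetical file contents: a UTF-8 BOM followed by the question's columns.
raw = codecs.BOM_UTF8 + (
    "ClientID,ClientName,Branch,EmailAddress\n"
    "C1,Acme,North,a@example.com\n"
    "C1,Acme,North,b@example.com\n"
    "C2,Beta,South,c@example.com\n"
).encode("utf-8")

# utf-8-sig removes the BOM so the first header is exactly 'ClientID'.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
# Printing the labels is a quick way to spot hidden characters in headers.
print(df.columns.tolist())

pivot = pd.pivot_table(df, index=["ClientID", "ClientName", "Branch"],
                       values=["EmailAddress"], aggfunc="count", margins=True)
print(pivot)  # last row is the 'All' margin with the total count
```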
I have the following code in Python:
d1=pd.read_excel("data1.xlsx",sheetname=0, index_col = "Time", parse_date=True)
d1=d1.sort_index()
for val in d1.columns:
start_his=d1[str(val)]
data1.xlsx contains:
Time Start High Low End 5
05/12/2014 28,000 31,500 27,400 29,900 29,740.00
28/11/2014 29,450 29,450 27,950 28,250 30,190.00
21/11/2014 30,500 30,500 28,100 29,300 30,840.00
14/11/2014 31,750 31,750 29,600 29,900 31,200.00
07/11/2014 32,250 32,750 30,500 31,350 31,620.00
31/10/2014 31,800 33,000 31,300 32,150 32,230.00
24/10/2014 31,300 32,750 30,800 31,500 32,680.00
17/10/2014 32,000 32,150 30,550 31,100 33,270.00
10/10/2014 34,550 34,550 32,000 32,000 33,920.00
02/10/2014 35,000 35,000 32,400 34,400 34,560.00
26/09/2014 34,600 35,000 33,600 34,400 34,770.00
19/09/2014 34,350 34,750 32,200 34,450 35,070.00
I get output for the first 4 columns (Start, High, Low and End), but the 5th column raises a KeyError. If I change the numeric header 5 to any other (alphanumeric) value, it works correctly, so it seems numbers are not accepted as column headers. Please help me solve this error.
This is the error I get:
Traceback (most recent call last):
File "ARIMA & Byesian.py", line 94, in <module>
start_his=d1[str(val)]
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1780, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1787, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1068, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 2849, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3812)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3692)
File "pandas/hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12299)
File "pandas/hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12250)
KeyError: '5'
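You are using str(val), so the column label 5 becomes the string '5'. read_excel keeps the numeric header as the integer 5, so d1['5'] does not exist. Use val directly, as in:
# sheet_name / parse_dates are the keyword names in current pandas
d1 = pd.read_excel("data1.xlsx", sheet_name=0, index_col="Time", parse_dates=True)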
d1=d1.sort_index()
for val in d1.columns:
start_his=d1[val]
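Here is a minimal illustration, using a hypothetical frame in place of data1.xlsx. The header 5 is the integer 5, so indexing with val works while str(val) does not. If you prefer string labels, converting all headers to strings once is an alternative.

```python
import pandas as pd

# Hypothetical frame mimicking data1.xlsx: the numeric header 5 is
# stored as the integer 5, not the string '5'.
d1 = pd.DataFrame({"Start": [28000, 29450], "End": [29900, 28250],
                   5: [29740.0, 30190.0]})
print(d1.columns.tolist())  # ['Start', 'End', 5]

for val in d1.columns:
    start_his = d1[val]  # works for 'Start', 'End' and 5 alike
    # d1[str(val)] would fail on the last label: d1['5'] -> KeyError: '5'

# Alternative: convert every header to a string once, then use string keys.
d1.columns = d1.columns.astype(str)
print(d1["5"].tolist())  # [29740.0, 30190.0]
```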