ValueError when making histogram of excel data


I am trying to make a histogram of data from an Excel file. I used the following code:
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_excel('file.xlsx',header=1, parse_cols="Q")
plt.hist(data, bin=10)
plt.show()
But it gives this error:
Traceback (most recent call last):
File "<ipython-input-1-0ffa7ab287c3>", line 4, in <module>
plt.hist(data, bin=10)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\pyplot.py", line 2890, in hist
stacked=stacked, **kwargs)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\axes\_axes.py", line 5562, in hist
if isinstance(x, np.ndarray) or not iterable(x[0]):
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.py", line 1678, in __getitem__
return self._getitem_column(key)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.py", line 1685, in _getitem_column
return self._get_item_cache(key)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 1052, in _get_item_cache
values = self._data.get(item)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\internals.py", line 2565, in get
loc = self.items.get_loc(item)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\index.py", line 1181, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas\index.c:3656)
File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas\index.c:3534)
File "hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11911)
File "hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11864)
KeyError: 0
This is what my data looks like.
Does anybody have an idea how I can fix this?

Related

Python - zipping and list

I started coding in Python a couple of weeks ago and am getting my hands dirty. However, I cannot get past a problem with zipping and lists.
Here's my code:
import pandas as pd
df_reader = pd.read_csv('Indicators.csv', chunksize=1000)
df_urb_pop = next(df_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode']=='CEB']
zipped = zip(df_pop_ceb['Population, total'], df_pop_ceb['Urban population (% of total)'])
pops_list = list(zipped)
print(pops_list)
This is the error I've been getting:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'Population, total'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mubashirsultan/PycharmProjects/TECH6360/Iterators practice.py", line 9, in <module>
zipped = zip(df_pop_ceb['Population, total'], df_pop_ceb['Urban population (% of total)'])
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'Population, total'
Process finished with exit code 1
I'm not quite sure what mistake I've made. A little help would be appreciated. Thanks.

I downloaded Indicators.csv from the link you provided. The header of the CSV file contains only these keys:
import pandas as pd
df_reader = pd.read_csv('Indicators.csv')
df_reader.keys()
Output:
Index([u'CountryName', u'CountryCode', u'IndicatorName', u'IndicatorCode',
u'Year', u'Value'],
dtype='object')
There are no keys 'Population, total' and 'Urban population (% of total)' among the columns, so you probably used the wrong data source (CSV file).
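To illustrate the point, here is a minimal sketch using only the standard library. It assumes the file is in the long format shown in the output above, where names such as 'Population, total' would appear as values in the IndicatorName column rather than as column headers. The sample rows, country names, indicator codes and numbers are all invented for illustration; only the header line matches the output above.

```python
import csv
import io

# Invented sample in the same column layout as the header shown above.
sample = io.StringIO(
    'CountryName,CountryCode,IndicatorName,IndicatorCode,Year,Value\n'
    'Region A,CEB,"Population, total",code-a,1960,100.0\n'
    'Region A,CEB,Urban population (% of total),code-b,1960,50.0\n'
    'Region B,ARB,"Population, total",code-a,1960,200.0\n'
)

reader = csv.DictReader(sample)
# The real column names; the indicator names are not among them.
print(reader.fieldnames)

wanted = ('Population, total', 'Urban population (% of total)')
by_year = {}
for row in reader:
    # Select rows by the *value* of IndicatorName instead of by column name.
    if row['CountryCode'] == 'CEB' and row['IndicatorName'] in wanted:
        by_year.setdefault(row['Year'], {})[row['IndicatorName']] = float(row['Value'])

# Pair the two indicators per year, keeping only years that have both.
pops_list = [
    (vals[wanted[0]], vals[wanted[1]])
    for year, vals in sorted(by_year.items())
    if all(name in vals for name in wanted)
]
print(pops_list)
```

Printing the header first (or `df.columns` in pandas) is a quick way to confirm which keys actually exist before indexing with them.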

Fbprophet quickstart example - KeyError: 'ds'

I am following the quickstart example and have a problem when trying to fit the model at:
m.fit(df);
The terminal shows:
Traceback (most recent call last):
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'ds'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "prophexample.py", line 10, in <module>
m.fit(df);
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/fbprophet/forecaster.py", line 484, in fit
self.history_dates = pd.to_datetime(df['ds']).sort_values()
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/indexes/base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'ds'
When I print the head, tail or entire df it's fine:
ds y
0 2007-12-10 9.590761
1 2007-12-11 8.519590
2 2007-12-12 8.183677
3 2007-12-13 8.072467
4 2007-12-14 7.893572
Is this because I'm not using a notebook or am I missing something else? Thanks

I found that the CSV file didn't have all the dates: they skip from 7/13/08 to 7/31/08. Once I put in the missing dates with some random y values, it was fine. Maybe there's a setting or command to ignore missing dates...

TimeSeries plots with Bokeh

NOTE FROM BOKEH MAINTAINER: The bokeh.charts API including TimeSeries was deprecated and removed a long time ago. This question is not relevant as-is to any recent or future versions of Bokeh. To plot time series, use the stable and supported bokeh.plotting API. Some examples can be found here.
I am trying to plot a time series with categories.
xaxis_values: startTime
yaxis_values: count
groupby: day
Every day has 24 hourly data points, and the entire dataset covers more than 100 days. I am trying to produce a few types of plots:
1. Group by day and sum all the counts of every hour from startTime, which will give 7 time series plots in one graph.
2. Separate by day, i.e. every Mon, Tue, Wed and so on, for however many days there are, and plot a 24-hour time series.
3. Group by hour irrespective of day, i.e. 00:00:00, 01:00:00 and so on.
What is the best way to get a good visualization with bokeh or seaborn?
Input:
2004-01-05,22:00:00,23:00:00,Mon,18944,790
2004-01-05,23:00:00,00:00:00,Mon,17534,750
2004-01-06,00:00:00,01:00:00,Tue,17262,747
2004-01-06,01:00:00,02:00:00,Tue,19072,777
2004-01-06,02:00:00,03:00:00,Tue,18275,785
2004-01-06,03:00:00,04:00:00,Tue,13589,757
2004-01-06,04:00:00,05:00:00,Tue,16053,735
2004-01-06,05:00:00,06:00:00,Tue,11440,636
2004-01-06,06:00:00,07:00:00,Tue,5972,513
2004-01-06,07:00:00,08:00:00,Tue,3424,382
2004-01-06,08:00:00,09:00:00,Tue,2696,303
2004-01-06,09:00:00,10:00:00,Tue,2350,262
2004-01-06,10:00:00,11:00:00,Tue,2309,254
Code:
import numpy as np
import pandas as pd
from bokeh.charts import TimeSeries, show, output_file, vplot
output_file("timeseries.html")
data_one = pd.read_csv('one_hour.csv')
data_one.columns = ['date', 'startTime', 'endTime', 'day', 'count', 'unique']
data = dict(data_one=data_one['count'])
tsline = TimeSeries(data,
x='startTime', y='count',
color=['day'], title="Timeseries", ylabel='count', legend=True)
show(vplot(tsline))
Error:
Traceback (most recent call last):
File "date_graph.py", line 10, in <module>
data = dict(data_one=data_one['count'])
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'count'
Edit: After changing the line to
data = dict(data_one=data_one['count'].tolist())
Error:
Traceback (most recent call last):
File "date_graph.py", line 12, in <module>
tsline = TimeSeries(data, x='startTime', y='count', color=['startTime'], title="Timeseries", ylabel='count', legend=True)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builders/timeseries_builder.py", line 102, in TimeSeries
return create_and_build(builder_type, data, **kws)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builder.py", line 67, in create_and_build
chart.add_builder(builder)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/chart.py", line 149, in add_builder
builder.create(self)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builder.py", line 518, in create
chart.add_renderers(self, renderers)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/chart.py", line 144, in add_renderers
self.renderers += renderers
File "/usr/local/lib/python2.7/dist-packages/bokeh/core/property_containers.py", line 18, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/bokeh/core/property_containers.py", line 77, in __iadd__
return super(PropertyValueList, self).__iadd__(y)
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/builders/line_builder.py", line 230, in yield_renderers
x=group.get_values(self.x.selection),
File "/usr/local/lib/python2.7/dist-packages/bokeh/charts/data_source.py", line 173, in get_values
return self.data[selection]
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'startTime'
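
Since bokeh.charts is no longer available, the grouping itself can be done before any plotting. The following is a minimal sketch of one reading of the first and third groupings described in the question (total count per weekday, and total count per start hour), using only the standard library and the first four rows of the Input above. The column order follows the names assigned in the question's code; plotting is deliberately left out.

```python
import csv
import io
from collections import defaultdict

# First four rows of the Input shown in the question.
rows_text = """2004-01-05,22:00:00,23:00:00,Mon,18944,790
2004-01-05,23:00:00,00:00:00,Mon,17534,750
2004-01-06,00:00:00,01:00:00,Tue,17262,747
2004-01-06,01:00:00,02:00:00,Tue,19072,777
"""
# The file has no header row, so the column names are supplied explicitly.
columns = ['date', 'startTime', 'endTime', 'day', 'count', 'unique']

count_by_day = defaultdict(int)   # weekday -> summed count
count_by_hour = defaultdict(int)  # start hour -> summed count
for row in csv.DictReader(io.StringIO(rows_text), fieldnames=columns):
    count_by_day[row['day']] += int(row['count'])
    count_by_hour[row['startTime']] += int(row['count'])

print(dict(count_by_day))
print(dict(count_by_hour))
```

Each resulting mapping can then be passed as x and y sequences to whichever plotting library is in use.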

Python pandas producing error when trying to access 'DATE' column on large data set

I have a file with 3'502'379 rows and 3 columns. The following script is supposed to be executed but raises an error in the date handling line:
import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas
path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
This is the error that occurs:
Traceback (most recent call last):
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\data\script.py", line 15, in <module>
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'

The '\ufeffDATE' in the first column name shows that your CSV file starts with a Byte Order Mark (BOM), so it must be read with an encoding that strips it.
For a UTF-8 file with a BOM, try this when reading your CSV:
df = pd.read_csv(path, sep=';', encoding='utf-8-sig')
or, as @EdChum suggested, if the file is actually UTF-16 encoded:
df = pd.read_csv(path, sep=';', encoding='utf-16')
Use whichever variant matches the file's real encoding.
PS: this answer shows how to deal with BOMs.
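
As a minimal sketch of why the 'DATE' lookup fails, the following standard-library example builds a small in-memory CSV that starts with a UTF-8 BOM and decodes it two ways. The PRICE column and the data row are invented for illustration. With plain 'utf-8' the BOM survives as '\ufeff' in front of the first header name; with 'utf-8-sig' it is stripped.

```python
import codecs
import csv
import io

# Invented two-column sample; only the leading BOM and the DATE name matter here.
raw = codecs.BOM_UTF8 + b'DATE;PRICE\n20160101;1.5\n'

def header(encoding):
    """Decode the raw bytes with the given encoding and return the header row."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text), delimiter=';'))

print(header('utf-8'))      # first name still carries the BOM character
print(header('utf-8-sig'))  # BOM removed, first name is plain 'DATE'
```

Printing the column names with repr (for example `print(list(data.columns))` in pandas) makes an otherwise invisible BOM show up.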

Error creating pivot tables in pandas

I have searched and cannot find anyone else with this problem. I am trying to create a pivot table summarizing a csv file, and then email that pivot to myself. I have already built out the code to perform this process, but it is not working universally. I keep getting a KeyError on my column name, but if I delete all columns and rows that are not part of the table it miraculously works.
Here is my code:
df = pandas.read_csv('/path/to/file', encoding='utf-8')
pivot = pandas.pivot_table(df,index=['ClientID','ClientName','Branch'],
values=['EmailAddress'],aggfunc='count',margins=True)
pivotlocation = '/path/to/save'
pivot.to_csv(pivotlocation)
For the life of me, I cannot figure out what is going wrong, or why this works on some files and not others.
Also, here is the error that is thrown:
Traceback (most recent call last):
File "C:\Users\rfulton\Desktop\Automation\Reports\UniversalUpload.py", line 86, in create_pivot
pivot = pandas.pivot_table(df,index=columns,values=aggvalue,aggfunc='count',margins=True)
File "C:\Python34\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\Python34\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\Python34\lib\site-packages\pandas\tools\pivot.py", line 114, in pivot_table
grouped = data.groupby(keys)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 2898, in groupby
sort=sort, group_keys=group_keys, squeeze=squeeze)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 1193, in groupby
return klass(obj, by, **kwds)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 383, in __init__
level=level, sort=sort)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 2131, in _get_grouper
in_axis, name, gpr = True, gpr, obj[gpr]
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1780, in __getitem__
return self._getitem_column(key)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1787, in _getitem_column
return self._get_item_cache(key)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1068, in _get_item_cache
values = self._data.get(item)
File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 2849, in get
loc = self.items.get_loc(item)
File "C:\Python34\lib\site-packages\pandas\core\index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3807)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3687)
File "pandas\hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12310)
File "pandas\hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12261)
KeyError: 'ClientID'
As I stated above, if I delete all cells outside the bounds of the table, this error is no longer thrown. However, I am not sure of how to do this with the csv or pandas modules.

Turns out that the issue was the encoding of the file.
Setting the encoding to utf-8-sig fixed the issue.
