Python - zipping and list

I started coding in Python a couple of weeks ago and am getting my hands dirty. However, I can't get past a problem with zip and list.
Here's my code:
import pandas as pd
df_reader = pd.read_csv('Indicators.csv', chunksize=1000)
df_urb_pop = next(df_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode']=='CEB']
zipped = zip(df_pop_ceb['Population, total'], df_pop_ceb['Urban population (% of total)'])
pops_list = list(zipped)
print(pops_list)
This is the error I've been getting:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'Population, total'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mubashirsultan/PycharmProjects/TECH6360/Iterators practice.py", line 9, in <module>
zipped = zip(df_pop_ceb['Population, total'], df_pop_ceb['Urban population (% of total)'])
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'Population, total'
Process finished with exit code 1
I'm not quite sure what mistake I've made. A little help would be appreciated. Thanks.

I downloaded Indicators.csv from the link you provided. The header of the CSV file contains only these keys:
import pandas as pd
df_reader = pd.read_csv('Indicators.csv')
df_reader.keys()
Output:
Index([u'CountryName', u'CountryCode', u'IndicatorName', u'IndicatorCode',
u'Year', u'Value'],
dtype='object')
There are no keys 'Population, total' or 'Urban population (% of total)' in it, so you probably used the wrong data source (CSV file).
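A quick way to catch this kind of KeyError early is to compare the names you plan to index with the names actually present in the header. The sketch below uses only the standard library; the helper name `missing_columns` and the one-line sample are illustrative assumptions, with the header copied from the output above.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in required if name not in header]


# Header line as reported above; no data rows are needed for the check.
sample = "CountryName,CountryCode,IndicatorName,IndicatorCode,Year,Value\n"

wanted = ["CountryCode", "Population, total", "Urban population (% of total)"]
print(missing_columns(sample, wanted))
# ['Population, total', 'Urban population (% of total)']
```

With a DataFrame already loaded, the same idea should carry over as `[c for c in wanted if c not in df.columns]`.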

Related

Get the last value of dataframe's column with indexing

I am trying to get the last value of column named "mass".
df = pd.read_csv('0000.csv', names=["chi", "mass"])
df_1 = pd.read_csv('0001.csv', names=['chi', 'mass'])
x=df['mass'][-1]
print(x)
It raised this error:
Traceback (most recent call last):
File "plot.py", line 22, in <module>
df2=df['mass'][-1]
File "/usr/lib64/python2.7/site-packages/pandas/core/series.py", line 557, in __getitem__
result = self.index.get_value(self, key)
File "/usr/lib64/python2.7/site-packages/pandas/core/index.py", line 1790, in get_value
return self._engine.get_value(s, k)
File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3204)
File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:2903)
File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6525)
File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6463)
KeyError: -1
However, I did succeed in getting the first value using df['mass'][0]. Why does it not work for index -1? I know there are alternative methods such as iloc, iat and values, but I think plain [] indexing would be the simplest if it worked.
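A likely explanation, sketched here under the assumption that the frame has the default integer index: `[]` on a Series then treats an integer as a label, not a position. The label 0 exists, the label -1 does not, hence `KeyError: -1`. The values below are made up and only stand in for the "mass" column.

```python
import pandas as pd

# Made-up values standing in for the two columns read from 0000.csv.
df = pd.DataFrame({"chi": [0.1, 0.2, 0.3], "mass": [10.0, 20.0, 30.0]})

# The default index holds the labels 0, 1, 2, so -1 is not a valid label.
try:
    df["mass"][-1]
except KeyError as exc:
    print("label lookup failed:", exc)

# Position-based access: the last row, whatever its label is.
last_by_position = df["mass"].iloc[-1]

# Label-based access: look up the last label explicitly.
last_by_label = df["mass"].loc[df.index[-1]]

print(last_by_position, last_by_label)
```

`iloc` is usually the safer choice for "last row" because it does not depend on what the index labels happen to be.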

Fbprophet quickstart example - KeyError: 'ds'

I'm following the quickstart example and run into a problem when fitting the model at:
m.fit(df);
The terminal shows:
Traceback (most recent call last):
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'ds'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "prophexample.py", line 10, in <module>
m.fit(df);
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/fbprophet/forecaster.py", line 484, in fit
self.history_dates = pd.to_datetime(df['ds']).sort_values()
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/core/internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "/home/cartier/miniconda2/envs/prophet/lib/python3.6/site-packages/pandas/indexes/base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'ds'
When I print the head, tail or entire df it's fine:
ds y
0 2007-12-10 9.590761
1 2007-12-11 8.519590
2 2007-12-12 8.183677
3 2007-12-13 8.072467
4 2007-12-14 7.893572
Is this because I'm not using a notebook or am I missing something else? Thanks
I found that the CSV file didn't have all the dates: they skip from 7/13/08 to 7/31/08. Once I put in the missing dates with some random y values, it was fine. Maybe there's a setting or command to ignore missing dates...
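To locate such gaps before fitting, one option is to list every calendar day missing between the first and last date. This is a standard-library sketch only; the helper `missing_dates` is hypothetical, and the sample assumes the dates above are month/day/year in 2008.

```python
from datetime import date, timedelta


def missing_dates(dates):
    """Return every calendar day absent between the earliest and latest date."""
    present = set(dates)
    start, end = min(present), max(present)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [day for day in all_days if day not in present]


# Illustrative dates with a gap like the one described above.
sample = [date(2008, 7, 12), date(2008, 7, 13), date(2008, 7, 31)]
gaps = missing_dates(sample)
print(len(gaps), gaps[0], gaps[-1])
# 17 2008-07-14 2008-07-30
```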

Python pandas producing error when trying to access 'DATE' column on large data set

I have a file with 3'502'379 rows and 3 columns. The following script is supposed to run, but it raises an error in the date-handling line:
import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas
path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
This is the error that occurs:
Traceback (most recent call last):
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\data\script.py", line 15, in <module>
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
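The '\ufeffDATE' in the first column name shows that your CSV file starts with a Byte Order Mark (BOM). The column is therefore named '\ufeffDATE' rather than 'DATE', and the file must be read with an encoding that strips the BOM.
So try this when reading your CSV:
df = pandas.read_csv(path, sep=';', encoding='utf-8-sig')
or, as @EdChum suggested:
df = pandas.read_csv(path, sep=';', encoding='utf-16')
Use the variant that matches the file's actual encoding: 'utf-8-sig' for a UTF-8 file with a BOM, 'utf-16' for a UTF-16 file.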
PS this answer shows how to deal with BOMs
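The effect of the BOM on the first column name can be reproduced in plain Python, without pandas. The bytes below are a made-up one-line header, and `header_of` is a hypothetical helper. Decoding with 'utf-8' leaves U+FEFF glued to the first name, while 'utf-8-sig' removes it.

```python
import csv
import io

# Made-up header bytes that start with the UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbfDATE;PRICE\n"


def header_of(data, encoding):
    """Decode the bytes and return the first CSV row (the column names)."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text), delimiter=";"))


plain = header_of(raw, "utf-8")      # BOM stays attached to the first name
clean = header_of(raw, "utf-8-sig")  # BOM is stripped while decoding

print(plain)  # ['\ufeffDATE', 'PRICE']
print(clean)  # ['DATE', 'PRICE']
print("DATE" in plain, "DATE" in clean)  # False True
```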

Error creating pivot tables in pandas

I have searched and cannot find anyone else with this problem. I am trying to create a pivot table summarizing a csv file, and then email that pivot to myself. I have already built out the code to perform this process, but it is not working universally. I keep getting a KeyError on my column name, but if I delete all columns and rows that are not part of the table it miraculously works.
Here is my code:
df = pandas.read_csv('/path/to/file', encoding='utf-8')
pivot = pandas.pivot_table(df,index=['ClientID','ClientName','Branch'],
values=['EmailAddress'],aggfunc='count',margins=True)
pivotlocation = '/path/to/save'
pivot.to_csv(pivotlocation)
For the life of me, I cannot figure out what is going wrong, or why this works on some files and not others.
Also, here is the error that is thrown:
Traceback (most recent call last):
File "C:\Users\rfulton\Desktop\Automation\Reports\UniversalUpload.py", line 86, in create_pivot
pivot = pandas.pivot_table(df,index=columns,values=aggvalue,aggfunc='count',margins=True)
File "C:\Python34\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\Python34\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\Python34\lib\site-packages\pandas\tools\pivot.py", line 114, in pivot_table
grouped = data.groupby(keys)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 2898, in groupby
sort=sort, group_keys=group_keys, squeeze=squeeze)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 1193, in groupby
return klass(obj, by, **kwds)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 383, in __init__
level=level, sort=sort)
File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 2131, in _get_grouper
in_axis, name, gpr = True, gpr, obj[gpr]
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1780, in __getitem__
return self._getitem_column(key)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1787, in _getitem_column
return self._get_item_cache(key)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1068, in _get_item_cache
values = self._data.get(item)
File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 2849, in get
loc = self.items.get_loc(item)
File "C:\Python34\lib\site-packages\pandas\core\index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3807)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3687)
File "pandas\hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12310)
File "pandas\hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12261)
KeyError: 'ClientID'
As I stated above, if I delete all cells outside the bounds of the table, this error is no longer thrown. However, I am not sure of how to do this with the csv or pandas modules.
Turns out that the issue was the encoding of the file.
Setting the encoding to utf-8-sig fixed the issue.
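If some input files carry a BOM and others do not, which may be why the script worked on some files only, the encoding can be chosen per file by inspecting the first few bytes. This is a sketch using the standard `codecs` constants; the helper name `sniff_bom_encoding` and the fallback to 'utf-8' are assumptions.

```python
import codecs


def sniff_bom_encoding(first_bytes, default="utf-8"):
    """Guess a codec name from a leading BOM; fall back to `default`."""
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if first_bytes.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


print(sniff_bom_encoding(b"\xef\xbb\xbfClientID,ClientName"))  # utf-8-sig
print(sniff_bom_encoding(b"ClientID,ClientName"))              # utf-8
```

In the script above this could be used as `encoding=sniff_bom_encoding(open(path, 'rb').read(4))` when calling `pandas.read_csv`, with `path` being the CSV location.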

Key Error from Pandas

I have the following code in python.
d1=pd.read_excel("data1.xlsx",sheetname=0, index_col = "Time", parse_date=True)
d1=d1.sort_index()
for val in d1.columns:
start_his=d1[str(val)]
The data1.xlsx contains
Time Start High Low End 5
05/12/2014 28,000 31,500 27,400 29,900 29,740.00
28/11/2014 29,450 29,450 27,950 28,250 30,190.00
21/11/2014 30,500 30,500 28,100 29,300 30,840.00
14/11/2014 31,750 31,750 29,600 29,900 31,200.00
07/11/2014 32,250 32,750 30,500 31,350 31,620.00
31/10/2014 31,800 33,000 31,300 32,150 32,230.00
24/10/2014 31,300 32,750 30,800 31,500 32,680.00
17/10/2014 32,000 32,150 30,550 31,100 33,270.00
10/10/2014 34,550 34,550 32,000 32,000 33,920.00
02/10/2014 35,000 35,000 32,400 34,400 34,560.00
26/09/2014 34,600 35,000 33,600 34,400 34,770.00
19/09/2014 34,350 34,750 32,200 34,450 35,070.00
I get output for the first four columns (Start, High, Low and End), but the fifth column raises a KeyError. If I change the numeric header '5' to any other value (e.g. alphanumeric), it works correctly, but numbers in the column header are not accepted. Please help me solve this error.
The error I get is:
Traceback (most recent call last):
File "ARIMA & Byesian.py", line 94, in <module>
start_his=d1[str(val)]
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1780, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1787, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1068, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 2849, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3812)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3692)
File "pandas/hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12299)
File "pandas/hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12250)
KeyError: '5'
The header 5 is read as the integer 5, but str(val) turns it into the string '5', which is not a column label. Use val directly, as in:
d1=pd.read_excel("data1.xlsx",sheetname=0, index_col = "Time", parse_date=True)
d1=d1.sort_index()
for val in d1.columns:
start_his=d1[val]
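The mismatch between the integer label and its string form can be seen on a small frame. This is a sketch: the frame is built by hand from the first data row shown above, not read from data1.xlsx, and `d1_str` is a hypothetical name for a copy whose labels are all strings.

```python
import pandas as pd

# Same mix of labels as in the question: four strings and the integer 5.
d1 = pd.DataFrame([[28000, 31500, 27400, 29900, 29740.0]],
                  columns=["Start", "High", "Low", "End", 5])

print(5 in d1.columns, "5" in d1.columns)  # True False

# Option 1: index with each label exactly as it is stored.
by_original = {val: d1[val] for val in d1.columns}

# Option 2: convert every label to a string once; then d1_str[str(val)] is safe.
d1_str = d1.copy()
d1_str.columns = [str(c) for c in d1_str.columns]
print(d1_str["5"].iloc[0])
```

Option 2 is only worth it if later code really needs string labels; otherwise indexing with `val` directly is the smaller change.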
