Dummy Variable Column Head not found? - python

I am attempting logistic regression for a class, and I got stuck on something very strange, I haven't seen this error before. I am a complete python noob so I'd appreciate being very specific in details. Please dumb it down for me!
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
cancer=pd.read_csv('C:\\Users\\kpic1\\Desktop\\Spring 2019\\IDIS 450\\Second exam\\Provided documents\\Cancer_Research_Q2.csv')
cancer.set_index('PatientID', inplace=True)
cancer=pd.get_dummies(cancer, columns=['Class'], drop_first=True)
cancer.head()
from sklearn.model_selection import train_test_split
cancer.rename(columns={'Class_Malignant':'Malignant'}, inplace=True)
cancer.head()
The table that is returned shows the dummy variable on the very end still being named "Class_Malignant."
When I try to print just this column using
print(cancer['Class_Malignant'])
I get the following long error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Class_Malignant'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-15-09d1356e98ac> in <module>
1 #for some reason, "Class_Malignant" is not registering as a column head?
----> 2 print(cancer['Class_Malignant'])
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2686 return self._getitem_multilevel(key)
2687 else:
-> 2688 return self._getitem_column(key)
2689
2690 def _getitem_column(self, key):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2693 # get column
2694 if self.columns.is_unique:
-> 2695 return self._get_item_cache(key)
2696
2697 # duplicate columns & possible reduce dimensionality
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
2487 res = cache.get(item)
2488 if res is None:
-> 2489 values = self._data.get(item)
2490 res = self._box_item_values(item, values)
2491 cache[item] = res
~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
4113
4114 if not isna(item):
-> 4115 loc = self.items.get_loc(item)
4116 else:
4117 indexer = np.arange(len(self.items))[isna(self.items)]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3078 return self._engine.get_loc(key)
3079 except KeyError:
-> 3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081
3082 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Class_Malignant'
From what I can tell from this, the column name "Class_Malignant" does not exist in the data set? Anyone have any ideas what's going on?

Related

Not able to display the column of a dataframe

When I am trying to print a single column of my data set it is showing errors
KeyError Traceback (most recent call
last) ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in
get_loc(self, key, method, tolerance) 2645 try:
-> 2646 return self._engine.get_loc(key) 2647 except KeyError:
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call
last) in
----> 1 data['Label']
~\anaconda3\lib\site-packages\pandas\core\frame.py in
getitem(self, key) 2798 if self.columns.nlevels > 1: 2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key) 2801 if is_integer(indexer): 2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in
get_loc(self, key, method, tolerance) 2646 return
self._engine.get_loc(key) 2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key)) 2649
indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
data['Label']
It could be possible that the column name is having trailing spaces. Just try to print the column names & verify.
print(data.columns)
or try to print the columns after
data.columns = data.columns.str.strip()
If you have DataFrame and would like to access or select a specific few rows/columns from that DataFrame, you can use square brackets.
Now suppose that you want to select a column from the data(as per your question) DataFrame.
data["Label"]
But if you are unaware of the columns. You can get a column list and then display column data.
columns = data.columns.values.tolist()
data[columns[index]]

i'm a new learner in jupyter and don't understand this error, i'm looking to remove the $ sign

MKT['Income'] = MKT['Income'].str.replace('$', '')
i get this error [enter image description here][1]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Income'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-22-5ea6ddc03a8c> in <module>
----> 1 MKT['Income'] = MKT['Income'].str.replace('$', '')
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'Income'
[1]: https://i.stack.imgur.com/5WszY.png

Why I can't create two columns with different moving average windows on Pandas?

I'm sincerely out of clue with this one. I've been trying to create a couple of columns from a dataframe but I get the ValueError: Wrong number of items passed 2, placement implies 1 . Somehow I can create one, doesn't matter if it is the window=7 or the window=14 but only allowed to create one. Here's my code:
import pandas as pd
from datetime import datetime, timedelta
suspects_url = 'https://raw.githubusercontent.com/mariorz/covid19-mx-time-series/master/data/covid19_suspects_mx.csv'
suspects = pd.read_csv(suspects_url, index_col=0)
suspects = suspects.loc['Colima']
suspects = pd.DataFrame(suspects)
suspects.index = pd.to_datetime(suspects.index, format='%d-%m-%Y')
suspects['suspects_ma_7'] = suspects.rolling(window=7).mean()
suspects['suspects_ma_14'] = suspects.rolling(window=14).mean()
suspects.columns = ['suspects','suspects_ma_7','suspects_ma_14']
suspects
And this is the error I am getting:
KeyError Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'suspects_ma_14'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in set(self, item, value)
1070 try:
-> 1071 loc = self.items.get_loc(item)
1072 except KeyError:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'suspects_ma_14'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-6-4138f1be9342> in <module>
5 suspects.index = pd.to_datetime(suspects.index, format='%d-%m-%Y')
6 suspects['suspects_ma_7'] = suspects.rolling(window=7).mean()
----> 7 suspects['suspects_ma_14'] = suspects.rolling(window=14).mean()
8 suspects.columns = ['suspects','suspects_ma_7','suspects_ma_14']
9
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
2936 else:
2937 # set column
-> 2938 self._set_item(key, value)
2939
2940 def _setitem_slice(self, key, value):
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py in _set_item(self, key, value)
2999 self._ensure_valid_index(value)
3000 value = self._sanitize_column(key, value)
-> 3001 NDFrame._set_item(self, key, value)
3002
3003 # check if we are modifying a copy
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in _set_item(self, key, value)
3622
3623 def _set_item(self, key, value) -> None:
-> 3624 self._data.set(key, value)
3625 self._clear_item_cache()
3626
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in set(self, item, value)
1072 except KeyError:
1073 # This item wasn't present, just insert at end
-> 1074 self.insert(len(self.items), item, value)
1075 return
1076
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in insert(self, loc, item, value, allow_duplicates)
1179 new_axis = self.items.insert(loc, item)
1180
-> 1181 block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
1182
1183 for blkno, count in _fast_count_smallints(self._blknos[loc:]):
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)
3045 values = DatetimeArray._simple_new(values, dtype=dtype)
3046
-> 3047 return klass(values, ndim=ndim, placement=placement)
3048
3049
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
122
123 if self._validate_ndim and self.ndim and len(self.mgr_locs) != len(self.values):
--> 124 raise ValueError(
125 f"Wrong number of items passed {len(self.values)}, "
126 f"placement implies {len(self.mgr_locs)}"
ValueError: Wrong number of items passed 2, placement implies 1
How can I solve this issue?
I insist, at my attempts with only one suspects_ma_# it works. But when I'm trying to create both, I just get the error.
When you first run suspects['suspects_ma_7'] = suspects.rolling(window=7).mean() you automatically transform your Series into a DataFrame.
So, for running the second rolling approach, use:
suspects['suspects_ma_7'] = suspects.Colima.rolling(window=7).mean()
Note the "suspects.Colima" in the code above.

Money Flow Index keyerror

I've obtained the historical values from an example stock (Apple in this case) and was following an example I saw online, however, when their code succeeded mine failed because of some keyerror?
Could anyone tell/show me what's wrong and hopefully how to fix it? Error is:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-60-d89407b24e87> in <module>
3
4 for i in range(1, len(typical_price)):
----> 5 if typical_price[i] > typical_price[i-1]:
6 positive_flow.append(money_flow[i-1])
7 negative_flow.append(0)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
Code is:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
import pandas_datareader as pdr
stocks = ['AAPL']
data_close = pdr.get_data_yahoo(stocks, start='2020-01-01')['Close']
data_high = pdr.get_data_yahoo(stocks, start='2020-01-01')['High']
data_low = pdr.get_data_yahoo(stocks, start='2020-01-01')['Low']
data_volume = pdr.get_data_yahoo(stocks, start='2020-01-01')['Volume']
typical_price = (data_close + data_high + data_low)/3;
money_flow = typical_price * data_volume;
positive_flow = []
negative_flow = []
for i in range(1, len(typical_price)):
if typical_price[i] > typical_price[i-1]:
positive_flow.append(money_flow[i-1])
negative_flow.append(0)
elif typical_price[i] < typical_price[i-1]:
positive_flow.append(0)
negative_flow.append(money_flow[i-1])
else:
positive_flow.append(0)
negative_flow.append(0)
Error appears when I run the final part of the code where I try to retrieve the positive and negative moneyflow for my MFI algorithm.
Use iloc indexing
for i in range(1, len(typical_price)):
if typical_price.iloc[i].item() > typical_price.iloc[i-1].item():
positive_flow.append(money_flow.iloc[i-1])
negative_flow.append(0)
elif typical_price.iloc[i].item() < typical_price.iloc[i-1].item():
positive_flow.append(0)
negative_flow.append(money_flow.iloc[i-1])
else:
positive_flow.append(0)
negative_flow.append(0)
typical_price and money_flow probably has datetime as index not integers. If you want access row by integer-location then you can use iloc

When I applied (NLTK) stop words to a data frame it showing an error?

Reviews Label
0 Bromwell High is a cartoon comedy. It ran at t... Positive
1 Homelessness (or Houselessness as George Carli... Positive
2 Brilliant over-acting by Lesley Ann Warren. Be... Positive
The above one is my data frame with columns: Reviews and Label When I excecuted the code below : `
nltk.download('stopwords') This is used to update stop words.
from nltk.corpus import stopwords
stop = stopwords.words('english')
final_without_stopwords = final[['Reviews','Label']].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)])).str.replace('[^\w\s]','')
print(final_without_stopwords)`
Result:
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('Reviews', 'Label')
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-52-cb4ca290db84> in <module>()
5 #final['Reviews'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop_words)]))
6
----> 7 final_without_stopwords = final['Reviews','Label'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)])).str.replace('[^\w\s]','')
8 print(final_without_stopwords)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2686 return self._getitem_multilevel(key)
2687 else:
-> 2688 return self._getitem_column(key)
2689
2690 def _getitem_column(self, key):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2693 # get column
2694 if self.columns.is_unique:
-> 2695 return self._get_item_cache(key)
2696
2697 # duplicate columns & possible reduce dimensionality
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
2487 res = cache.get(item)
2488 if res is None:
-> 2489 values = self._data.get(item)
2490 res = self._box_item_values(item, values)
2491 cache[item] = res
~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
4113
4114 if not isna(item):
-> 4115 loc = self.items.get_loc(item)
4116 else:
4117 indexer = np.arange(len(self.items))[isna(self.items)]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3078 return self._engine.get_loc(key)
3079 except KeyError:
-> 3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081
3082 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('Reviews', 'Label')
enter code here
**
Actually I want to apply stop words to my data frame which only has two columns. When I excecuted this code with single column (Reviews) it worked well
but when I excecuted with two columns (Reviews & Label) it is showing
some error. Any suggestions how to handle this code with both columns.
**
If you want to apply a function elementwise to the dataframe, use applymap:
A simplified example:
import pandas as pd
stop = set(['a','the','i','is'])
df = pd.DataFrame( {'sentence1':['i am a boy','i am a girl'],
'sentence2':['Bromwell High is a cartoon comedy','i am a girl']})
df[['sentence1','sentence2']].applymap(lambda x: ' '.join(i for i in x.split() if i not in stop))
sentence1 sentence2
0 am boy Bromwell High cartoon comedy
1 am girl am girl
If you want to reassign the values without stopwords into your dataframe, use:
df[['sentence1','sentence2']] = df[['sentence1','sentence2']].applymap(lambda x: ' '.join(i for i in x.split() if i not in stop))

Categories

Resources