Pandas - selecting several rows and columns with date as key - python

I am trying to select several specific rows and columns from this DataFrame:
              Open    High     Low   Close    Volume  Dividends  Stock Splits
Date
2020-07-17  387.95  388.59  383.36  385.31  23046700          0             0
2020-07-20  385.67  394.00  384.25  393.43  22579500          0             0
2020-07-21  396.69  397.00  386.97  388.00  25911500          0             0
2020-07-22  386.77  391.90  386.41  389.09  22215400          0             0
2020-07-23  387.99  388.31  384.25  385.17   4554225          0             0
Selecting a range of consecutive rows together with one specific column works:
hist["2020-07-20":"2020-07-22"]["Close"]
Date
2020-07-20 393.43
2020-07-21 388.00
2020-07-22 389.09
Name: Close, dtype: float64
When I try to select a range of consecutive columns as well, I get this error:
hist["2020-07-20":"2020-07-22", "Open":"Close"]
TypeError Traceback (most recent call last)
<ipython-input-25-57b43e76004f> in <module>
----> 1 hist["2020-07-20":"2020-07-22", "Open":"Close"]
c:\users\polzi\appdata\local\programs\python\python37\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
c:\users\polzi\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2644 )
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '(slice('2020-07-20', '2020-07-22', None), slice('Open', 'Close', None))' is an invalid key
I also tried to select several rows that are NOT consecutive, which does not work either:
hist["2020-07-20","2020-07-22"]["Low"]
KeyError Traceback (most recent call last)
c:\users\polzi\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('2020-07-20', '2020-07-22')
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-26-aefccd2025a5> in <module>
----> 1 hist["2020-07-20","2020-07-22"]["Low"]
c:\users\polzi\appdata\local\programs\python\python37\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
c:\users\polzi\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('2020-07-20', '2020-07-22')
How can I select several specific rows and columns that are not adjacent to each other?

hist[["Open","High","Low","Close"]]["2020-07-20":"2020-07-22"]
will give you a DataFrame restricted to the preselected columns and the given date range.
You can also select the columns by position:
hist[hist.columns[0:4]]["2020-07-20":"2020-07-22"]
If the rows are not consecutive, you can use:
hist[hist.index.isin(["2020-07-20","2020-07-22"])][hist.columns[0:4]]
If both rows and columns are arbitrary, you can use:
hist[hist.index.isin(["2020-07-20","2020-07-22"])][["Open","Close"]]
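The same selections can also be written with .loc, which takes a row selector and a column selector in a single call. The following is a minimal sketch, assuming hist has a DatetimeIndex; the frame is rebuilt by hand from the price columns of the sample rows in the question, so it is an illustration rather than the asker's actual data.

```python
import pandas as pd

# Hand-built copy of the sample rows from the question (price columns only).
hist = pd.DataFrame(
    {
        "Open": [387.95, 385.67, 396.69, 386.77, 387.99],
        "High": [388.59, 394.00, 397.00, 391.90, 388.31],
        "Low": [383.36, 384.25, 386.97, 386.41, 384.25],
        "Close": [385.31, 393.43, 388.00, 389.09, 385.17],
    },
    index=pd.to_datetime(
        ["2020-07-17", "2020-07-20", "2020-07-21", "2020-07-22", "2020-07-23"]
    ).rename("Date"),
)

# Consecutive rows and consecutive columns: label slices include both ends.
block = hist.loc["2020-07-20":"2020-07-22", "Open":"Close"]

# Arbitrary rows and arbitrary columns: pass lists of labels.
rows = pd.to_datetime(["2020-07-20", "2020-07-22"])
picked = hist.loc[rows, ["Open", "Close"]]

print(block)
print(picked)
```

Compared with chaining two [ ] operations, a single .loc call states the row and column selection together and avoids building an intermediate DataFrame.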

Related

How to select columns from different tables based on other facture to create a new dataframe python

I have two DataFrames that both contain countries:
1- the first has 183 rows
2- the second has 156 rows
Each of them contains important information that the other does not.
I need one column from the first and one column from the second.
My goal is to create a single DataFrame containing both columns that I need, plus the names of the countries that both DataFrames have in common.
This is what I did and the message that I got:
for i in range(183) :
    for j in range(156):
        if df['Country'][i]==df_happy['Country or region'][j]:
            df.drop(i,axis=0,inplace=True)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-25-e078ef71e219> in <module>
1 for i in range(183) :
2 for j in range(156):
----> 3 if df['Country'][i]==df_happy['Country or region'][j]:
4 df.drop(i,axis=0,inplace=True)
/opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
869 key = com.apply_if_callable(key, self)
870 try:
--> 871 result = self.index.get_value(self, key)
872
873 if not is_scalar(result):
/opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4403 k = self._convert_scalar_indexer(k, kind="getitem")
4404 try:
-> 4405 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4406 except KeyError as e1:
4407 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1
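You can merge both DataFrames:
newdf=df.merge(df_happy,how='left', left_on='Country', right_on='Country or region')
With how='left' every row of df is kept; use how='inner' to keep only the countries present in both DataFrames.
You can then drop the columns you do not need ('B' and 'C' are placeholders for their names). drop returns a new DataFrame, so assign the result:
newdf=newdf.drop(columns=['B', 'C'])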
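As an illustration of the merge approach, here is a minimal sketch that keeps only the countries common to both tables. The two small frames and the column names GDP and Score are made up for the example; only the key columns 'Country' and 'Country or region' come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables; only the key column names
# ('Country', 'Country or region') are taken from the question.
df = pd.DataFrame({"Country": ["Finland", "Chad", "Peru"], "GDP": [1.0, 2.0, 3.0]})
df_happy = pd.DataFrame(
    {"Country or region": ["Peru", "Finland", "Japan"], "Score": [5.7, 7.8, 5.9]}
)

# An inner merge keeps only countries present in both tables.
common = df.merge(
    df_happy, how="inner", left_on="Country", right_on="Country or region"
)

# Keep the country name plus the one column needed from each side.
result = common[["Country", "GDP", "Score"]]
print(result)
```

Selecting the wanted columns at the end is an alternative to dropping the unwanted ones; which reads better depends on how many columns each table has.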

KeyError Traceback (most recent call last) ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)

I was working with the house-price dataset from Kaggle and trying to clear the NaN values from the column named 'Alley'.
for column in missing_data.columns.values.tolist():
    print(column)
    print(missing_data[column].value_counts())
    print("")
Id
False 1460
Name: Id, dtype: int64
MSSubClass
False 1460
Name: MSSubClass, dtype: int64
MSZoning
False 1460
Name: MSZoning, dtype: int64
LotFrontage
False 1201
True 259
Name: LotFrontage, dtype: int64
LotArea
False 1460
Name: LotArea, dtype: int64
Street
False 1460
Name: Street, dtype: int64
Alley
True 1369
False 91
Name: Alley, dtype: int64
LotShape
False 1460
Name: LotShape, dtype: int64
LandContour
False 1460
Name: LandContour, dtype: int64
Utilities
False 1460
Name: Utilities, dtype: int64
LotConfig
False 1460
Name: LotConfig, dtype: int64
These are some of the values I got from running the above code. I replaced the NaN values in LotFrontage with its mean and wanted to replace the NaN values in 'Alley' with its most frequent value.
But when I write this code I get an error:
train['Alley'].value_counts()
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Alley'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-72-8e1e57b44782> in <module>
1 #replace nan values in Alley with frequency
----> 2 train['Alley'].value_counts()
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Alley'
Why am I getting this error when the column named Alley exists?
It seems that you probably assigned a different value to the variable train at some earlier point, so the next time you index it the lookup fails. For example, train may now hold a pandas Series, or a DataFrame that no longer contains 'Alley', instead of the original DataFrame.
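One way to check this hypothesis is to inspect the object before indexing it. The sketch below uses a made-up miniature of the training data and simulates one possible accidental reassignment; the actual cause in the asker's notebook is not known.

```python
import pandas as pd

# Hypothetical miniature of the training data; the real file has many more columns.
train = pd.DataFrame(
    {"LotFrontage": [65.0, None, 80.0], "Alley": [None, "Grvl", None]}
)

# Simulate an accidental reassignment: `train` becomes a Series, not a DataFrame.
train = train["LotFrontage"].fillna(train["LotFrontage"].mean())

# Checks to run before indexing by column name.
is_frame = isinstance(train, pd.DataFrame)
has_alley = is_frame and "Alley" in train.columns
print(type(train).__name__, has_alley)
```

If the check shows that train is no longer a DataFrame with an 'Alley' column, reloading the data and rerunning the notebook cells in order is the usual remedy.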

Why does DataFrame row selection syntax df[:2] work but not if the syntax df[1]?

I have the following data:
data = pd.DataFrame(np.arange(16).reshape(4, 4), index = ['Ohio', 'Colorado', 'Utah', 'New York'], columns = ['one', 'two', 'three', 'four'])
If I run:
data[:2]
the output will be:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
If I run: data[1], the following error will show up:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-81-c402bf503b75> in <module>
----> 1 data[1]
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
Why could I run data[:2] but not data[1]? It doesn't make sense to me. Thank you in advance:-)
The indexing operator [ ] looks up a column by the name you enter.
In your example data[1], there is no column named 1, hence the KeyError.
But when you pass slice notation (:) inside the indexing operator, its behavior changes from "searching for columns" to "selecting rows in the given range".
So data[:2] is a slicing operation that returns the first two rows by position. Note also that because you set index = ['Ohio', 'Colorado', 'Utah', 'New York'], the rows have no default integer labels (0, 1, 2, ...). To get a single row by position use data.iloc[1], and to get it by label use data.loc['Colorado'].
If you enter a column name, as in data['one'], you will get:
Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int64
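The different behaviors of the indexing operator can be compared side by side. This is a small sketch built on the DataFrame from the question; the variable names are chosen for the example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

first_two = data[:2]              # slice inside [ ]: rows by position
second_row = data.iloc[1]         # a single row by position
same_row = data.loc["Colorado"]   # the same row by label
column = data["one"]              # scalar key inside [ ]: column by name

print(second_row.tolist())
```

Using .iloc and .loc explicitly makes it clear whether a row is meant by position or by label, which avoids relying on the special-case slicing behavior of [ ].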

Error in converting datas from string to int

I have a DataFrame column cleaned_bp['VISITCODE'] which looks like:
0 1
1 2
2 3
3 6
4 9
5 12
6 15
where the non-index column consists of strings.
I wanted to convert them to integers by doing:
for i in range(len(cleaned_bp['VISITCODE'])):
    cleaned_bp['VISITCODE'][i] = int(cleaned_bp['VISITCODE'][i])
but I get this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-42-4d6508c1abda> in <module>()
1 for i in range(len(cleaned_bp['VISITCODE'])):
----> 2 cleaned_bp['VISITCODE'][i] = int(cleaned_bp['VISITCODE'][i])
~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
599 key = com._apply_if_callable(key, self)
600 try:
--> 601 result = self.index.get_value(self, key)
602
603 if not is_scalar(result):
~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2475 try:
2476 return self._engine.get_value(s, k,
-> 2477 tz=getattr(series.dtype, 'tz', None))
2478 except KeyError as e1:
2479 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 13
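What am I doing wrong here?
Try:
for i in range(len(cleaned_bp['VISITCODE'])):
    cleaned_bp['VISITCODE'].iloc[i] = int(cleaned_bp['VISITCODE'].iloc[i])
This uses the position of each element rather than its index label. The KeyError: 13 suggests that the index has no label 13, for example because rows were dropped earlier. Be aware that this is chained assignment, which can raise SettingWithCopyWarning and may not modify cleaned_bp in newer pandas versions.
With pandas you can also convert the whole column at once. astype returns a new Series, so assign it back:
cleaned_bp['VISITCODE'] = cleaned_bp['VISITCODE'].astype(int)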
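To make the difference between labels and positions concrete, here is a minimal sketch with made-up data whose index has a gap. It assumes the original KeyError came from such a gap, which the question does not confirm.

```python
import pandas as pd

# Hypothetical data: string codes on an index with a gap (label 2 is missing),
# as happens after rows are dropped.
cleaned_bp = pd.DataFrame({"VISITCODE": ["1", "2", "3", "6"]}, index=[0, 1, 3, 4])

# A label-based lookup of the missing label would raise KeyError:
# cleaned_bp["VISITCODE"][2]

# Vectorized conversion, independent of the index labels.
cleaned_bp["VISITCODE"] = cleaned_bp["VISITCODE"].astype(int)

# Alternative: renumber the rows 0..n-1 so labels and positions agree.
renumbered = cleaned_bp.reset_index(drop=True)
print(cleaned_bp["VISITCODE"].tolist(), list(renumbered.index))
```

The vectorized astype call avoids the Python-level loop entirely, so the index labels never come into play.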

Numpy TypeError: an integer is required

This may be a rather specific question, but I don't know whom else to ask, so I hope somebody can help. Thanks! I have installed Python using Anaconda and am using Jupyter Notebook. I have two CSV files of data.
products.head()
   ID_FUPID   FUPID
0         1  674563
1         2  674597
2         3  674606
3         4  694776
4         5  694788
Products contains the product ID and the product number.
ratings.head()
   ID_CUSTOMER  ID_FUPID  RATING
0            1       216       1
1            2       390       1
2            3       851       5
3            4      5897       1
4            5      9341       1
Ratings contains the customer ID, the product ID and the rating the customer gave to the product.
I have created table as:
M = ratings.pivot_table(index=['ID_CUSTOMER'],columns=['ID_FUPID'],values='RATING')
This shows the data correctly as a matrix with product IDs as columns and customer IDs as rows.
I wanted to compute the Pearson correlation between products, so here is the pearson function:
import numpy as np
def pearson(s1, s2):
    """Take two pd.Series objects and return their Pearson correlation."""
    s1_c = s1 - s1.mean()
    s2_c = s2 - s2.mean()
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))
When I try to compute pearson(M['17'], M['21']) I get the following errors:
TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2441 try:
-> 2442 return self._engine.get_loc(key)
2443 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '17'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-277-d4ead225b6ab> in <module>()
----> 1 pearson(M['17'], M['21'])
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
1962 return self._getitem_multilevel(key)
1963 else:
-> 1964 return self._getitem_column(key)
1965
1966 def _getitem_column(self, key):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
1969 # get column
1970 if self.columns.is_unique:
-> 1971 return self._get_item_cache(key)
1972
1973 # duplicate columns & possible reduce dimensionality
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1643 res = cache.get(item)
1644 if res is None:
-> 1645 values = self._data.get(item)
1646 res = self._box_item_values(item, values)
1647 cache[item] = res
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3588
3589 if not isnull(item):
-> 3590 loc = self.items.get_loc(item)
3591 else:
3592 indexer = np.arange(len(self.items))[isnull(self.items)]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2442 return self._engine.get_loc(key)
2443 except KeyError:
-> 2444 return self._engine.get_loc(self._maybe_cast_indexer(key))
2445
2446 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '17'
I would really appreciate any help! Thanks a million.
The following line appears in two places in the error message:
KeyError: '17'
This indicates there is no key '17' in M. This is likely because the column labels of M are integers (they come from ID_FUPID), while you are accessing the DataFrame with a string. The call to pearson might instead be:
pearson(M[17], M[21])
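The effect of integer column labels can be reproduced on a small scale. The sketch below uses invented ratings (three customers, two products) with the column names from the question; it is an illustration and not the asker's data.

```python
import numpy as np
import pandas as pd


def pearson(s1, s2):
    """Take two pd.Series objects and return their Pearson correlation."""
    s1_c = s1 - s1.mean()
    s2_c = s2 - s2.mean()
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))


# Hypothetical ratings, made up for illustration.
ratings = pd.DataFrame(
    {
        "ID_CUSTOMER": [1, 1, 2, 2, 3, 3],
        "ID_FUPID": [17, 21, 17, 21, 17, 21],
        "RATING": [1, 2, 3, 4, 5, 6],
    }
)
M = ratings.pivot_table(index=["ID_CUSTOMER"], columns=["ID_FUPID"], values="RATING")

# The column labels are integers, so M['17'] would raise KeyError.
print(M.columns.dtype)

# Integer keys match the column labels.
r = pearson(M[17], M[21])
print(r)
```

Printing M.columns (or its dtype) is a quick way to see which key type the DataFrame expects before indexing it.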
