Printing multiple columns in Pandas (Python)

I'm new to Python and the Pandas module, and I can't seem to get this to work.
I'm using a CSV file containing the month and total rainfall for Singapore.
Below is my code:
df = pd.read_csv('rainfall-monthly-total.csv')
print ((df['total_rainfall'])[df.total_rainfall == df['total_rainfall'].max()])
print ((df['month'])[df.total_rainfall == df['total_rainfall'].max()])
print ((df['total_rainfall', 'month'])[df.total_rainfall == df['total_rainfall'].max()])
The first two statements work fine. But something is wrong with the third and I can't find out why. Below is the output.
"/Users/xxxx/PycharmProjects/Phyton for Finance/venv/bin/python" "/Users/xxxx/PycharmProjects/Phyton for Finance/Panda Tutorial.py"
299 765.9
Name: total_rainfall, dtype: float64
299 2006-12
Name: month, dtype: object
Traceback (most recent call last):
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('total_rainfall', 'month')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/xxxx/PycharmProjects/Phyton for Finance/Panda Tutorial.py", line 16, in <module>
print ((df['total_rainfall', 'month'])[df.total_rainfall == df['total_rainfall'].max()])
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('total_rainfall', 'month')
Process finished with exit code 1
I'm using PyCharm with Python 3.7.
How do I get Python to print both columns for that particular month?

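Try this:
print(df[['total_rainfall', 'month']][df.total_rainfall == df['total_rainfall'].max()])
With single brackets, pandas looks for one column whose label is the tuple ('total_rainfall', 'month'), which is why the KeyError shows that tuple. To select several columns you need to pass a list, so convert the single square brackets to double:
['total_rainfall', 'month']
to
[['total_rainfall', 'month']]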

Easy. You need to pass a list of the columns you want to print, so use df.loc to filter your data frame by the condition and select the columns in one step:
print(df.loc[df.total_rainfall == df['total_rainfall'].max(), ['total_rainfall', 'month']])
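To make the difference between the two kinds of key concrete, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for rainfall-monthly-total.csv: only the 2006-12 row matches the output in the question, the other rows are made up.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; only the 2006-12 row comes from the question.
df = pd.DataFrame({
    "month": ["2006-10", "2006-11", "2006-12"],
    "total_rainfall": [100.0, 200.0, 765.9],
})

# Boolean mask that is True only for the wettest month.
is_wettest = df["total_rainfall"] == df["total_rainfall"].max()

# A tuple key is looked up as ONE column label, so this raises KeyError.
try:
    df["total_rainfall", "month"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list key selects several columns; .loc takes the row mask and the column list together.
wettest = df.loc[is_wettest, ["total_rainfall", "month"]]
print(wettest)
```

Computing the mask once and reusing it also avoids repeating the max() expression in every print statement.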

Related

saving coordinates from Dataframe as Polygons (shapely.geometry) AttributeError

I want to create a Polygon from a list of coordinates:
import pandas as pd
from shapely.geometry import Point, Polygon
data = pd.read_csv('path.csv', sep=';')
The data is in the following format:
Suburb  features_geometry_x  features_geometry_y
1       50.941840            6.9595637
1       50.941845            6.9595698
3       50.94182             6.9595632
4       50.9418837           6.9595958
with several rows for suburb 1, 3 and 4
#create a polygon
I = data.loc[data['Suburb'] == 1]
I['coordinates'] = list(zip(I['features_geometry_x'], I['features_geometry_y']))
poly_i = Polygon(I['coordinates'])
The code above works fine, but if I do the same thing for suburb 3 or 4, it yields the following error:
L = data.loc[data['Suburb'] == 3]
L['coordinates'] = list(zip(L['features_geometry_x'], L['features_geometry_y']))
poly_l = Polygon(L['coordinates'])
File "shapely/speedups/_speedups.pyx", line 252, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute '__array_interface__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/folders/j6/wgg72kmx145f3krf14nzjfq40000gn/T/ipykernel_4092/214655495.py", line 3, in
poly_l = Polygon(Lindenthal['coordinates'])
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 261, in __init__
ret = geos_polygon_from_py(shell, holes)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 539, in geos_polygon_from_py
ret = geos_linearring_from_py(shell)
File "shapely/speedups/_speedups.pyx", line 344, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 942, in __getitem__
return self._get_value(key)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
Please help :)

I think the issue here is that you need more than one data point to create a polygon, whereas suburbs 3 and 4 each have only a single point in the data shown.
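The traceback also hints at a second possible factor: it ends in KeyError: 0 raised from the Series lookup, which suggests something asked the Series for label 0. A filtered Series keeps its original row labels, so label 0 exists for the first suburb in the file but not for later ones. The sketch below only illustrates that pandas behavior on hypothetical data; it does not call shapely, and whether passing a plain list resolves the error there is an assumption to check.

```python
import pandas as pd

# Hypothetical data shaped like the question's file; the values are illustrative only.
data = pd.DataFrame({
    "Suburb": [1, 1, 1, 3, 3, 3],
    "features_geometry_x": [50.9418, 50.9419, 50.9420, 50.9421, 50.9422, 50.9423],
    "features_geometry_y": [6.9595, 6.9596, 6.9597, 6.9598, 6.9599, 6.9600],
})

# .copy() avoids assigning a new column to a view of the original frame.
subset = data.loc[data["Suburb"] == 3].copy()
subset["coordinates"] = list(zip(subset["features_geometry_x"], subset["features_geometry_y"]))

# The filtered Series keeps the original row labels (3, 4, 5), so label 0 is missing.
try:
    subset["coordinates"][0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# A plain list is indexed by position, whatever the original labels were.
coords = subset["coordinates"].tolist()
print(label_zero_found, coords[0])
```

If this is what happens in your case, building the polygon from the list (coords here) rather than from the Series would be the thing to try.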

Problem accessing pandas data that is represented with commas?

I have the following lines:
data = pd.read_csv("file.csv", sep=";", encoding='ISO-8859-1', engine = 'python')
test = str(data['information'])
I'm trying to access csv column that contains data in a cell like so: "1000,10500,2500"
I get an error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Vastuualue'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/erik.ilonen/Desktop/Projekti_csv_data/Toinen_testiohjelma/toinen_datan_kasittely_ohjelma.py", line 12, in <module>
test = str(dataAlkuperainen['Vastuualue'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'information'

Your separator is not right.
sep should be a comma, not a semicolon, so use sep="," instead of sep=";".
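A quick way to check whether the separator is the culprit is to look at the column names pandas actually parsed. The sketch below uses a hypothetical two-column file held in memory (the id column and the file contents are assumptions, not taken from the question) and reads it with both separators.

```python
import io
import pandas as pd

# Hypothetical file contents: comma-separated fields, with one quoted cell that itself contains commas.
raw = 'id,information\n1,"1000,10500,2500"\n'

# With the wrong separator the whole header line becomes a single column name.
wrong = pd.read_csv(io.StringIO(raw), sep=";")
# With the right separator the header splits into the expected names.
right = pd.read_csv(io.StringIO(raw), sep=",")

print(list(wrong.columns))
print(list(right.columns))

# The quoted cell arrives as one string and can then be split into numbers.
values = [int(v) for v in right.loc[0, "information"].split(",")]
print(values)
```

If the name you are looking for shows up only as part of a longer, merged column name, the separator passed to read_csv does not match the file.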

KeyError: 'class_name' in python3.7/site-packages/pandas/core/indexes/base.py

I am trying to use a GitHub repo and I get the following error in its Python source files.
I looked at related posts but couldn't figure out the exact problem.
Here's the error that I see:
File "/home/kgarg8/kgarg8-workspace/few-shot/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'class_name'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
File "/home/kgarg8/kgarg8-workspace/few-shot/experiments/proto_nets.py", line 62, in <module> background = dataset_class('background')
File "/home/kgarg8/kgarg8-workspace/few-shot/few_shot/datasets.py", line 31, in __init__
self.unique_characters = sorted(self.df['class_name'].unique())
File "/home/kgarg8/kgarg8-workspace/few-shot/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/home/kgarg8/kgarg8-workspace/few-shot/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/home/kgarg8/kgarg8-workspace/few-shot/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/home/kgarg8/kgarg8-workspace/few-shot/venv/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/kgarg8/kgarg8-workspace/few-shot/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'class_name'
Here's the relevant code snippet:
# proto_nets.py
if args.dataset == 'omniglot':
    n_epochs = 40
    dataset_class = OmniglotDataset
    num_input_channels = 1
    drop_lr_every = 20
...
background = dataset_class('background')
# datasets.py
class OmniglotDataset(Dataset):
    def __init__(self, subset):
        if subset not in ('background', 'evaluation'):
            raise(ValueError, 'subset must be one of (background, evaluation)')
        self.subset = subset
        self.df = pd.DataFrame(self.index_subset(self.subset))
        self.df = self.df.assign(id=self.df.index.values)
        self.unique_characters = sorted(self.df['class_name'].unique())
You can assume I'm a neophyte; any pointers for debugging further would be appreciated.
I suspect the problem is caused by a Python/Pandas version mismatch.
I am running pandas==0.23.4 and python==3.7.3.

The error comes from the way you compute the unique values (self.unique_characters), particularly at df['class_name']. That expression looks for a column named class_name, and your DataFrame evidently has no such column. Instead, I believe you can achieve your goal as follows:
self.unique_characters = sorted(self.df.index.unique())
Since your problem is not reproducible, my answer is based on my general evaluation of the issue. Please comment if this does not solve the issue.
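One way a column can go missing in code like this is when the list passed to pd.DataFrame is empty, for example because no data files were found. That is only a guess about this case: the question does not show index_subset, so the function below is a hypothetical stand-in that returns either no records or one made-up record.

```python
import pandas as pd

def index_subset(found_files):
    # Hypothetical stand-in for the repo's index_subset (not shown in the question).
    if not found_files:
        return []
    return [{"class_name": "char_a", "filepath": "a.png"}]

# Empty input: the frame has no columns except the 'id' added by assign().
empty_df = pd.DataFrame(index_subset(False))
empty_df = empty_df.assign(id=empty_df.index.values)
print(list(empty_df.columns))

# Checking before indexing turns the long traceback into a clear message.
has_class_name = "class_name" in empty_df.columns
if not has_class_name:
    print("no 'class_name' column; was any data found?")

# Non-empty input: the column exists and the original expression works.
full_df = pd.DataFrame(index_subset(True))
unique_characters = sorted(full_df["class_name"].unique())
```

Printing self.df.columns just before the failing line in datasets.py would show quickly whether this is what is happening.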

Python, Panda.read_excel Problem reading multiple sets of Data from one sheet

I have an Excel sheet with two sets of data (picture). I want to plot those with matplotlib in Python and import them with pandas. I narrowed down my script to make it quicker to read.
script:
import matplotlib.pyplot as plt
import pandas as pd
Tabelle = pd.read_excel("C:\\Users\\alexk\\Dropbox\\WW\\WW Master\\1. Semester\\WW2\\WW2 Kernfachpraktikum\\KFP2\\Ergebnisse.xlsx","Tabelle1")
x = Tabelle["Number1"]
y = Tabelle["Value1"]
x2=Tabelle["Number2"]
y2=Tabelle["Value2"]
plt.bar(x, y)
plt.bar(x2,y2)
plt.show()
End of script.
In the script it's possible to plot x and y when x2 and y2 are commented out. When I try to read or plot x2 and y2, I get an error.
Error code:
Traceback (most recent call last):
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\indexes\base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Number2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/alexk/PycharmProjects/Vickers.py", line 8, in <module>
x2=Tabelle["Number2"]
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Number2'
Process finished with exit code 1
End of error code.
With other excel sheets this process worked fine. What am I missing? Does it have something to do with the excel sheet?

Reading a file with pandas and use correlation coefficients on two columns

I have a file like the following, with no header:
0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
I want to read this file into a pandas DataFrame and compute the correlation coefficient between the second and third columns.
I am trying the following:
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', header=None)
df['1'].corr(df['2'])
It pops up with a huge error message. Am I not treating the columns properly? Any suggestion or hint?
Error message
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3063, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2685, in __getitem__
return self._getitem_column(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2692, in _getitem_column
return self._get_item_cache(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
values = self._data.get(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3065, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'

You will have to specify the separator, which is a space, when reading the file. Then access the columns by their integer labels rather than by strings. The code below should work.
df = pd.read_csv('test.txt', sep=' ', header=None)
df[1].corr(df[2])
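For reference, here is a self-contained version of the same idea that reads the three sample rows from the question out of an in-memory string instead of a file. Using sep=r"\s+" (any run of whitespace) instead of a single space is my assumption, in case the real file pads its columns with more than one space.

```python
import io
import pandas as pd

# The three sample rows from the question, held in memory instead of a file.
raw = """0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
"""

# header=None makes pandas label the columns with the integers 0, 1, 2.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(list(df.columns))

# Integer labels, not the strings '1' and '2', select the second and third columns.
r = df[1].corr(df[2])
print(r)
```

Series.corr computes the Pearson correlation by default.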

Roy, what is the file extension? Is it .csv? If it is, you should add it to the end of the file name, like pd.read_csv('COLVAR_hbondnohead.csv', header=None)

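You don't have columns named '1' and '2': with header=None the columns are labeled with the integers 0, 1 and 2. So you have to name the columns first (reindex would only add empty columns, so assign the names instead).
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', sep=' ', header=None)
df.columns = ['1', '2', '3']
then
df['2'].corr(df['3'])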
