Query on Pandas DataFrame

I'm new to Python and I'm trying to get a subset of rows and columns from a DataFrame:
In [1]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
In [2]:
example=DataFrame(np.random.rand(6,5),columns=['a','b','c','d','e'])
In [3]:
example.a=[2,4,6,8,10,12]  # a list keeps the values in order; a set is unordered
In [4]:
example
Out[4]:
    a         b         c         d         e
0   2  0.225608  0.023888  0.535053  0.953350
1   4  0.803721  0.741708  0.256522  0.062574
2   6  0.354936  0.597274  0.801495  0.763515
3   8  0.204974  0.870951  0.220088  0.446273
4  10  0.673855  0.693210  0.494213  0.842049
5  12  0.516609  0.038669  0.972165  0.183945
In [5]:
example[['a','b','d','e']].query('a==10')
Out[5]:
    a         b         d         e
4  10  0.673855  0.494213  0.842049
In [6]:
example[['b','d','e']].query('a==10')
.....
UndefinedVariableError: name 'a' is not defined
The first case worked, but the second query raised an error. Do you know why this error shows up? Thank you very much.

example[['b','d','e']] is a new DataFrame that contains only columns b, d and e, so column a is no longer there for query to find.
To get other columns from the row where a==10, swap the order of the query and the column selection. The query runs first and returns only the matching row, and the column selection is then applied to that result (the example below selects b, c and d):
In[113]: example.query('a==10')[['b','c','d']]
Out[113]:
          b         c         d
4  0.439672  0.181699  0.770421

When you create the second selection example[['b','d','e']], you effectively drop 'a' from the dataframe:
example[['b','d','e']]
          b         d         e
0  0.910757  0.565006  0.284420
1  0.601034  0.697879  0.983803
2  0.516938  0.829621  0.471825
3  0.896217  0.663177  0.093502
4  0.277488  0.796543  0.643166
5  0.594420  0.759634  0.164800
So you're trying to access a column that doesn't exist. In other words, if you want to query a column from a dataframe, you need to include it in your selection before querying it.
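
The ordering described in these answers can be sketched as follows. This is a minimal example that rebuilds a frame shaped like the one in the question, so the random values will differ from the output shown there. The .loc variant is an equivalent alternative that the answers do not mention.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
example = pd.DataFrame(np.random.rand(6, 5), columns=["a", "b", "c", "d", "e"])
example["a"] = [2, 4, 6, 8, 10, 12]

# Filter on 'a' while the column still exists, then keep only the wanted columns.
by_query = example.query("a == 10")[["b", "d", "e"]]

# Equivalent boolean-mask form: row filter and column selection in one step.
by_loc = example.loc[example["a"] == 10, ["b", "d", "e"]]

print(by_query)
```

Both forms filter the full frame first, so the column used in the condition does not have to appear in the result.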

Related

How to convert first column of dataframe in to its headers

I have dataframe df:
0
0 a
1 b
2 c
3 d
4 e
The output should be:
a b c d e
0
1
2
3
4
5
I want the values in that column (a, b, c, d, e) to become the header of my dataframe.
Could anyone help?

If df is a pandas DataFrame, you can solve this with pandas itself:
First convert the content of the first column to a list, then create a new dataframe that uses that list as its columns.
import pandas as pd
headers = df[0].tolist()  # df[0] is the content of the first column
dfSolved = pd.DataFrame([], columns=headers)  # empty frame with those headers

It would help to provide more details, such as the index and values of the expected output and the operation you want to perform, so that we can give a solution specific to your case.
Here is a solution:
import pandas as pd
import io
import numpy as np
data_string = """ columns_name
0 a
1 b
2 c
3 d
4 e
"""
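df = pd.read_csv(io.StringIO(data_string), sep=r'\s+')  # raw string for the regex separator
# Solution: one row of NaN placeholders, with the column values as headers
df_result = pd.DataFrame(data=[[np.nan]*5],
                         columns=df['columns_name'].tolist())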
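
If the goal is the frame shown in the question, with a to e as headers and empty rows 0 to 5, one possible sketch is to pass both an index and the column list. The six-row index is an assumption read off the expected output in the question, and headers and result are illustrative names.

```python
import pandas as pd

# The single-column frame from the question.
df = pd.DataFrame({0: ["a", "b", "c", "d", "e"]})

# Use the values of column 0 as the new header.
headers = df[0].tolist()

# No data is passed, so every cell is NaN; the index supplies rows 0..5.
result = pd.DataFrame(index=range(6), columns=headers)

print(result)
```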

New column in dataset based on the previous value of an item

I have this dataset
In [4]: df = pd.DataFrame({'A':[1, 2, 3, 4, 5]})
In [5]: df
Out[5]:
A
0 1
1 2
2 3
3 4
4 5
I want to add a new column to the dataset based on the previous value of the item, like this:
A  New Column
1
2  1
3  2
4  3
5  4
I tried to use apply with iloc, but it didn't work.
Can you help?
Thank you

You can use the shift function to build the new column. It moves every element of the given column down by one position and puts NaN in the first element.
import pandas as pd
df['New_Col'] = df['A'].shift()
OR
If you would like to fill the NaN with zero, use the same approach and add fillna:
import pandas as pd
df['New_Col'] = df['A'].shift().fillna(0)
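
A small sketch of what shift produces on the sample data. The fill_value argument in the last assignment is an alternative to fillna(0) that is not part of the answer. It is shown on the assumption that integer output is preferred, since shift followed by fillna(0) leaves a float column. The column name New_Col_Int is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

# Each row receives the value from the row above; the first row has none (NaN).
df["New_Col"] = df["A"].shift()

# Supplying the fill value during the shift avoids the NaN entirely,
# so the column keeps its integer dtype.
df["New_Col_Int"] = df["A"].shift(fill_value=0)

print(df)
```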

I have a value in a table but when I try to verify if the value is there it returns false [duplicate]

In Python to check if a value is in a list you can simply do the following:
>>> 9 in [1,2,3,6,9]
True
I would like to do the same for a Pandas DataFrame but unfortunately Pandas does not recognise that notation:
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3,4],[5,6,7,8]],columns=["a","b","c","d"])
>>> df
   a  b  c  d
0  1  2  3  4
1  5  6  7  8
>>> 7 in df
False
How would I achieve this using Pandas DataFrame without iterating through each column/row or anything complicated?

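Basically you have to check the underlying array of values rather than the column labels, so:
7 in df.values
x in df checks whether x is one of the column labels, because iterating over a DataFrame yields its column names:
for x in df:
    print(x, end=' ')
out: a b c d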
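
The difference between the two membership checks can be sketched like this. The isin form is an alternative that the answer does not mention; it is included on the assumption that a pandas-only expression is wanted.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "c", "d"])

in_labels = 7 in df          # tests the column labels, so this is False
label_hit = "a" in df        # "a" is a column label, so this is True
in_values = 7 in df.values   # tests the underlying NumPy array of cell values

# Same value check using pandas only: mark matching cells, then reduce twice.
in_isin = bool(df.isin([7]).any().any())

print(in_labels, label_hit, in_values, in_isin)
```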

Python pandas Reading specific values from HDF5 files using read_hdf and HDFStore.select

I created an HDF5 file with a simple dataset that looks like this:
>>> pd.read_hdf('STORAGE2.h5', 'table')
A B
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
I created it using this script:
import pandas as pd
import scipy as sp
from pandas.io.pytables import Term
store = pd.HDFStore('STORAGE2.h5')
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df_tl.to_hdf('STORAGE2.h5','table',append=True)
I know I can select columns using
x = pd.read_hdf('STORAGE2.h5', 'table', columns=['A'])
or
x = store.select('table', where = 'columns=A')
How would I select all rows where column 'A' equals 3, or rows where column 'A' holds a specific string such as 'foo'? In pandas dataframes I would use df[df["A"]==3] or df[df["A"]=='foo'].
Also, does it make a difference in efficiency whether I use read_hdf() or store.select()?

You need to specify data_columns= when writing the file (you can also pass True to make all columns searchable).
(FYI, mode='w' starts the file over and is only there for my example.)
In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])
In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]:
A B
3 3 3
4 4 4
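
As a rough sketch of the same idea with store.select, which the question also asks about, the function below writes the sample frame with A as a data column and then filters on it. It assumes the optional PyTables dependency is installed. The function name write_and_select and the path passed to it are illustrative only.

```python
import pandas as pd


def write_and_select(path):
    """Write the sample frame to `path`, then return the rows where A == 3."""
    df_tl = pd.DataFrame({"A": list(range(5)), "B": list(range(5))})

    # format="table" plus data_columns makes column A usable in `where`.
    # mode="w" overwrites any existing file at `path`.
    df_tl.to_hdf(path, key="table", mode="w", format="table", data_columns=["A"])

    # HDFStore works as a context manager, so the file is closed afterwards.
    with pd.HDFStore(path, mode="r") as store:
        return store.select("table", where="A == 3")
```

Calling write_and_select('STORAGE2.h5') would replace that file because of mode='w'. For a string column the value goes in quotes inside the expression, for example where='A == "foo"'.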
