I am trying to select a value from a dataframe. But the problem is the output is with data type and column name.
Here is my data frame which i am reading from a csv file,
Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2
And here is my testing code-
dataFrame=pd.read_csv("test.csv")
value = dataFrame.loc[dataFrame['Name'] == 'rasberry']['Code']
print(value)
strvalue=str(value)
if(strvalue=="1"):
print("got it")
The expected ouput of value would be 1 but it is
2 1\nName: Code, dtype: int64
and that's why the if condition is not working. How can I get the specific value?
I am using pandas
The value you get is a Series object. You can use .iloc to extract the value from it:
value.iloc[0]
# 1
Or you can use .values to extract the underlying numpy array and then use index to extract the value:
value.values[0]
# 1
Break It Down
dataFrame['Name'] returns a pd.Series
dataFrame['Name'] == 'rasberry' returns a pd.Series with dtype bool
dataFrame.loc[dataFrame['Name'] == 'rasberry'] uses the boolean pd.Series to slice dataFrame returning a pd.DataFrame that is a subset of dataFrame
dataFrame.loc[dataFrame['Name'] == 'rasberry']['code'] is a pd.Series that is the column named 'code' in the sliced dataframe from step 3.
If you expect the elements in the 'Name' column to be unique, then this will be a one row pd.Series.
You want the element inside but at this point it's the difference between 'value' and ['value']
Setup
from io import StringIO
txt = """Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2"""
Solution(s)
use iloc to grab first value
dataFrame=pd.read_csv(StringIO(txt))
value = dataFrame.query('Name == "rasberry"').Code.iloc[0]
print(value)
use iat to grab first value
dataFrame=pd.read_csv(StringIO(txt))
value = dataFrame.query('Name == "rasberry"').Code.iat[0]
print(value)
specify index column when reading in csv and use loc
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.loc['rasberry', 'Code']
print(value)
specify index column when reading in csv and use at
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.at['rasberry', 'Code']
print(value)
specify index column when reading in csv and use get_value
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.get_value('rasberry', 'Code')
print(value)
specify the index column when reading the csv and squeeze into a series if only one non index column exists
series=pd.read_csv(StringIO(txt), index_col='Name', squeeze=True)
value = series.rasberry
print(value)
Related
I have pandas dataframe which contains value in below format. How to filter dataframe which matches the 'd6d4e77e-b8ec-467a-ba06-1c6079aa2d82' in any of the value of type list part of PathDSC column
i tried
def get_projects_belongs_to_root_project(project_df, root_project_id):
filter_project_df = project_df.query("root_project_id in PathDSC")
it didn't work i got empty dataframe
Assuming the values of PathDSC column are lists of strings, you can check row-wise if each list contains the wanted value and mask those rows using Series.apply. Then select only those rows using boolean indexing.
def get_projects_belongs_to_root_project(project_df, root_project_id):
mask = project_df['PathDSC'].apply(lambda lst: root_project_id in lst)
filter_project_df = project_df[mask]
# ...
root_project_id = 'd6d4e77e-b8ec-467a-ba06-1c6079aa2d82'
df = df[df['PathDSC'].str.contains(root_project_id)]
I have following problem with a dataframe in python:
I have a dataframe with an ID column (which is not the index) and other columns.
Now I want to write a code that gives back a new dataframe with all rows that have the same value in columnx, as the requested item ID. It should also contain all columns of the dataframe df.
def subset(itemID):
columnxValue = df[df['ID'] == itemID]['columnx']
subset = df[df['columnx'] == columnxValue]
return subset
If I do it like this I always get the Error "Can only compare identically-labeled Series Objects
I changed the question to be more clear.
You can use .loc as follows:
def subset(itemID):
columnValueRequest = df.loc[df['ID'] == itemID, 'columnx'].iloc[0]
subset1 = df[df['columnx'] == columnValueRequest]
return subset1
As you want to get a value, instead of a Series for the variable columnValueRequest, you have to further use .iloc[0] to get the (first) value.
Do you mean something like this?
You give the ItemID as an argument to the subset function. Then it checks if the ItemID corresponds to the value in the ID column. It returns the value from columnx from the row where ItemID is equal to the value in the ID column.
def subset(itemID):
columnValueRequest = df[df['ID'] == itemID][columnx]
subset = df[df[columnx] == columnValueRequest]
return subset
I am trying to extract a value out of a dataframe and put it into a variable. Then later I will record that value into an Excel workbook.
First I run a SQL query and store into a df:
df = pd.read_sql(strSQL, conn)
I am looping through another list of items and looking them up in the df. They are connected by MMString in the df and MMConcat from the list of items I'm looping through.
dftemp = df.loc[df['MMString'] == MMConcat]
Category = dftemp['CategoryName'].item()
I get the following error at the last line of code above. ValueError: can only convert an array of size 1 to a Python scalar
In the debug console, when I run that last line of code but not store it to a variable, I get what looks like a string value. For example, 'Pickup Truck'.
How can I simply store the value that I'm looking up in the df to a variable?
Index by row and column with loc to return a series, then extract the first value via iat:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].iat[0]
Alternatively, get the first value from the NumPy array representation:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].values[0]
The docs aren't helpful, but pd.Series.item just calls np.ndarray.item and only works for a series with one value:
pd.Series([1]).item() # 1
pd.Series([1, 2]).item() # ValueError: can only convert an array of size 1
I'm using Pandas 0.20.3 in my python 3.X. I want to add one column in a pandas data frame from another pandas data frame. Both the data frame contains 51 rows. So I used following code:
class_df['phone']=group['phone'].values
I got following error message:
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
class_df.dtypes gives me:
Group_ID object
YEAR object
Terget object
phone object
age object
and type(group['phone']) returns pandas.core.series.Series
Can you suggest me what changes I need to do to remove this error?
The first 5 rows of group['phone'] are given below:
0 [735015372, 72151508105, 7217511580, 721150431...
1 []
2 [735152771, 7351515043, 7115380870, 7115427...
3 [7111332015, 73140214, 737443075, 7110815115...
4 [718218718, 718221342, 73551401, 71811507...
Name: phoen, dtype: object
In most cases, this error comes when you return an empty dataframe. The best approach that worked for me was to check if the dataframe is empty first before using apply()
if len(df) != 0:
df['indicator'] = df.apply(assign_indicator, axis=1)
You have a column of ragged lists. Your only option is to assign a list of lists, and not an array of lists (which is what .value gives).
class_df['phone'] = group['phone'].tolist()
The error of the Question-Headline
"ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series"
might as well occur if for what ever reason the table does not have any rows.
Instead of using an if-statement, you can use set result_type argument of apply() function to "reduce".
df['new_column'] = df.apply(func, axis=1, result_type='reduce')
The data assigned to a column in the DataFrame must be a single dimension array. For example, consider a num_arr to be added to a DataFrame
num_arr.shape
(1, 126)
For this num_arr to be added to a DataFrame column, It should be reshaped....
num_arr = num_arr.reshape(-1, )
num_arr.shape
(126,)
Now I could set this arr as a DataFrame column
df = pd.DataFrame()
df['numbers'] = num_arr
I am trying to use pandas data-frame as a parameter table which is loaded in the beginning of my application run.
Structure of the csv that is being loaded into the data-frame is as below :
param_name,param_value
source_dir,C:\Users\atiwari\Desktop\EDIFACT\source_dir
So the column names would be param_name and param_values.
How do i go about selecting the value from param_value where param_name == 'source_dir'?
I tried the below but it returns a data-frame with index not a string value:
param_df.loc[param_df['param_name']=='source_dir']['param_value']
It return Series:
s = param_df.loc[param_df['param_name']=='source_dir', 'param_value']
But if need DataFrame:
df = param_df.loc[param_df['param_name']=='source_dir', ['param_value']]
For scalar need convert Series by selecting by [] - select first value by 0. Also works iat.
Series.item need Series with values else get error if empty Series:
val = s.values[0]
val = s.iat[0]
val = s.item()