I am trying to extract a value out of a dataframe and put it into a variable. Then later I will record that value into an Excel workbook.
First I run a SQL query and store into a df:
df = pd.read_sql(strSQL, conn)
I am looping through another list of items and looking them up in the df. They are connected by MMString in the df and MMConcat from the list of items I'm looping through.
dftemp = df.loc[df['MMString'] == MMConcat]
Category = dftemp['CategoryName'].item()
I get the following error at the last line of code above. ValueError: can only convert an array of size 1 to a Python scalar
In the debug console, when I run that last line of code but not store it to a variable, I get what looks like a string value. For example, 'Pickup Truck'.
How can I simply store the value that I'm looking up in the df to a variable?
Index by row and column with loc to return a series, then extract the first value via iat:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].iat[0]
Alternatively, get the first value from the NumPy array representation:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].values[0]
The docs aren't helpful, but pd.Series.item just calls np.ndarray.item and only works for a series with one value:
pd.Series([1]).item() # 1
pd.Series([1, 2]).item() # ValueError: can only convert an array of size 1
Related
Normally when splitting a value which is a string, one would simply do:
string = 'aabbcc'
small = string[0:2]
And that's simply it. I thought it would be the same thing for a dataframe by doing:
df = df['Column'][Range][Length of Desired value]
df = df['Date'][0:4][2:4]
Note: Every string in the column have the same length and are all integers written as a string data type
If I use the code above the program just throws the Range and takes [2:4] as the range which is weird.
When doing this individually it works:
df2 = df['Column'][index][2:4]
So right now I had to make a loop that goes one by one and append it to a new Dataframe.
To do the operation element wise, you can use apply (see link):
df['Column'][0:4].apply(lambda x : x[2:4])
When you did df2 = df['Column'][0:4][2:4], you are doing the same as df2 = df['Column'][2:4].
You're getting the indexes 0 to 4 of df and then getting the indexes 2 to 4 of that one.
I have a certain data to clean, it's some keys where the keys have six leading zeros that I want to get rid of, and if the keys are not ending with "ABC" or it's not ending with "DEFG", then I need to clean the currency code in the last 3 indexes. If the key doesn't start with leading zeros, then just return the key as it is.
To achieve this I wrote a function that deals with string as below:
def cleanAttainKey(dirtyAttainKey):
if dirtyAttainKey[0] != "0":
return dirtyAttainKey
else:
dirtyAttainKey = dirtyAttainKey.strip("0")
if dirtyAttainKey[-3:] != "ABC" and dirtyAttainKey[-3:] != "DEFG":
dirtyAttainKey = dirtyAttainKey[:-3]
cleanAttainKey = dirtyAttainKey
return cleanAttainKey
Now I build a dummy data frame to test it but it's reporting errors:
data frame
df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
columns=["dirtyKey","amount"])
I need to get a new column called "cleanAttainKey" in the df, then modify each value in the "dirtyKey" using the "cleanAttainKey" function, then assign the cleaned key to the new column "cleanAttainKey", however it seems pandas doesn't support this type of modification.
# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and get into the new column of cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
df['cleanAttainKey'][i] = cleanAttainKey(vpAttainKeyList[i])
I am getting the below error message:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The result should be the same as the df2 below:
df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
columns=["dirtyKey","cleanAttainKey","amount"])
df2
Is there any better way to modify the dirty keys and get a new column with the clean keys in Pandas?
Thanks
Here is the culprit:
df['cleanAttainKey'][i] = cleanAttainKey(vpAttainKeyList[i])
When you use extract of the dataframe, Pandas reserves the ability to choose to make a copy or a view. It does not matter if you are just reading the data, but it means that you should never modify it.
The idiomatic way is to use loc (or iloc or [i]at):
df.loc[i, 'cleanAttainKey'] = cleanAttainKey(vpAttainKeyList[i])
(above assumes a natural range index...)
I'm using Pandas 0.20.3 in my python 3.X. I want to add one column in a pandas data frame from another pandas data frame. Both the data frame contains 51 rows. So I used following code:
class_df['phone']=group['phone'].values
I got following error message:
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
class_df.dtypes gives me:
Group_ID object
YEAR object
Terget object
phone object
age object
and type(group['phone']) returns pandas.core.series.Series
Can you suggest me what changes I need to do to remove this error?
The first 5 rows of group['phone'] are given below:
0 [735015372, 72151508105, 7217511580, 721150431...
1 []
2 [735152771, 7351515043, 7115380870, 7115427...
3 [7111332015, 73140214, 737443075, 7110815115...
4 [718218718, 718221342, 73551401, 71811507...
Name: phoen, dtype: object
In most cases, this error comes when you return an empty dataframe. The best approach that worked for me was to check if the dataframe is empty first before using apply()
if len(df) != 0:
df['indicator'] = df.apply(assign_indicator, axis=1)
You have a column of ragged lists. Your only option is to assign a list of lists, and not an array of lists (which is what .value gives).
class_df['phone'] = group['phone'].tolist()
The error of the Question-Headline
"ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series"
might as well occur if for what ever reason the table does not have any rows.
Instead of using an if-statement, you can use set result_type argument of apply() function to "reduce".
df['new_column'] = df.apply(func, axis=1, result_type='reduce')
The data assigned to a column in the DataFrame must be a single dimension array. For example, consider a num_arr to be added to a DataFrame
num_arr.shape
(1, 126)
For this num_arr to be added to a DataFrame column, It should be reshaped....
num_arr = num_arr.reshape(-1, )
num_arr.shape
(126,)
Now I could set this arr as a DataFrame column
df = pd.DataFrame()
df['numbers'] = num_arr
I am trying to select a value from a dataframe. But the problem is the output is with data type and column name.
Here is my data frame which i am reading from a csv file,
Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2
And here is my testing code-
dataFrame=pd.read_csv("test.csv")
value = dataFrame.loc[dataFrame['Name'] == 'rasberry']['Code']
print(value)
strvalue=str(value)
if(strvalue=="1"):
print("got it")
The expected ouput of value would be 1 but it is
2 1\nName: Code, dtype: int64
and that's why the if condition is not working. How can I get the specific value?
I am using pandas
The value you get is a Series object. You can use .iloc to extract the value from it:
value.iloc[0]
# 1
Or you can use .values to extract the underlying numpy array and then use index to extract the value:
value.values[0]
# 1
Break It Down
dataFrame['Name'] returns a pd.Series
dataFrame['Name'] == 'rasberry' returns a pd.Series with dtype bool
dataFrame.loc[dataFrame['Name'] == 'rasberry'] uses the boolean pd.Series to slice dataFrame returning a pd.DataFrame that is a subset of dataFrame
dataFrame.loc[dataFrame['Name'] == 'rasberry']['code'] is a pd.Series that is the column named 'code' in the sliced dataframe from step 3.
If you expect the elements in the 'Name' column to be unique, then this will be a one row pd.Series.
You want the element inside but at this point it's the difference between 'value' and ['value']
Setup
from io import StringIO
txt = """Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2"""
Solution(s)
use iloc to grab first value
dataFrame=pd.read_csv(StringIO(txt))
value = dataFrame.query('Name == "rasberry"').Code.iloc[0]
print(value)
use iat to grab first value
dataFrame=pd.read_csv(StringIO(txt))
value = dataFrame.query('Name == "rasberry"').Code.iat[0]
print(value)
specify index column when reading in csv and use loc
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.loc['rasberry', 'Code']
print(value)
specify index column when reading in csv and use at
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.at['rasberry', 'Code']
print(value)
specify index column when reading in csv and use get_value
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.get_value('rasberry', 'Code')
print(value)
specify the index column when reading the csv and squeeze into a series if only one non index column exists
series=pd.read_csv(StringIO(txt), index_col='Name', squeeze=True)
value = series.rasberry
print(value)
I have a data set with several dozen columns and am sorting two columns in question by Max value and storing the result in a variable to print it later to a report. How do I only return the Two columns so that they are on the same as my string "Max". Below is the method I am using which returns the ID # in my variable also.
#Create DF
prim1 = mru[['Time', 'Motion:MRU']]
# Sort
prim1 = prim1.sort(['Motion:MRU'], ascending=True)
primmin = prim1['Motion:MRU'].min()
print 'Max: ', prim1[:1]
Basically what you see printed will be a pandas Series in the form of :
<index> <value>
If you want just the value then you access the numpy array data attribute by doing this:
print 'Max: ', prim1[:1].values[0]
This will return a numpy array with a single element and then to access the scalar value you subscript the single value using [0]