I have the following scenario in Python code:

valuesarray = ['None','FULL']  # values of the column 'GLASS' of dataframe df
keysarray = ['GLASS','SUBJECT']

The value 'GLASS' comes from some upstream code, and I assign it to keyvalue:

keyvalue = keysarray[0]

I am trying to get all the records from the dataframe where GLASS matches None or FULL:

df = df[df.keyvalue.isin(valuesarray)]

But I am getting an error: keyvalue is not a column of the dataframe.

How do I access the value of the variable keyvalue in this scenario? Any ideas?
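A minimal sketch of the usual fix, with dummy data based on the arrays above: attribute access like df.keyvalue looks for a column literally named "keyvalue", while bracket indexing evaluates the variable first.

```python
import pandas as pd

# dummy data mirroring the arrays in the question
df = pd.DataFrame({'GLASS': ['None', 'FULL', 'HALF'],
                   'SUBJECT': ['a', 'b', 'c']})
valuesarray = ['None', 'FULL']
keysarray = ['GLASS', 'SUBJECT']
keyvalue = keysarray[0]

# df.keyvalue would look for a column literally named "keyvalue";
# df[keyvalue] resolves the variable to the column name 'GLASS'
df = df[df[keyvalue].isin(valuesarray)]
```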
So I have a PySpark dataframe that I want to add another column to, using the value from the Section_1 column to look up its corresponding value in a Python dictionary. Basically, use the value from the Section_1 cell as the key, and fill the new column with the value from the dictionary, like below.
Original dataframe

DataId   ObjId      Name         Object  Section_1
My data  Data name  Object name  rd.111  rd.123
Python Dictionary
object_map= {'rd.123' : 'rd.567'}
Where Section_1 has a value of rd.123, I will search the dictionary for the key 'rd.123' and want to return its value rd.567 and place that in the new column.
Desired DataFrame

DataId   ObjId      Name         Object  Section_1  Section_2
My data  Data name  Object name  rd.111  rd.123     rd.567
Right now I get this error with my current code, and I don't really know what I did wrong, as I am not too familiar with PySpark:
There is an incorrect call to a Column object in your code. Please
review your code.
Here is my code that I am currently using where object_map is the python dictionary.
test_df = output.withColumn('Section_2', object_map.get(output.Section_1.collect()))
You can try this (adapted from this answer, with added null handling):

from itertools import chain
from pyspark.sql.functions import create_map, lit, when

object_map = {'rd.123': 'rd.567'}
mapping_expr = create_map([lit(x) for x in chain(*object_map.items())])

df1 = df.filter(df['Section_1'].isNull()).withColumn('Section_2', lit(None))
df2 = df.filter(df['Section_1'].isNotNull()).withColumn(
    'Section_2',
    when(
        df['Section_1'].isNotNull(),
        mapping_expr[df['Section_1']]
    )
)
result = df1.unionAll(df2)
I have some data to clean: keys with six leading zeros that I want to get rid of. If a key does not end with "ABC" and does not end with "DEFG", I also need to strip the currency code in its last 3 characters. If the key doesn't start with leading zeros, just return it as is.
To achieve this I wrote a function that deals with the string as below:

def cleanAttainKey(dirtyAttainKey):
    # keys without leading zeros are returned unchanged
    if dirtyAttainKey[0] != "0":
        return dirtyAttainKey
    # lstrip so only the leading zeros are removed
    dirtyAttainKey = dirtyAttainKey.lstrip("0")
    # "DEFG" is four characters, so compare the last four
    if dirtyAttainKey[-3:] != "ABC" and dirtyAttainKey[-4:] != "DEFG":
        dirtyAttainKey = dirtyAttainKey[:-3]
    return dirtyAttainKey
Now I build a dummy dataframe to test it, but it reports errors:

df = pd.DataFrame({'dirtyKey': ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
                   'amount': [100, 101, 102]},
                  columns=["dirtyKey", "amount"])
I need a new column called "cleanAttainKey" in df: run each value of "dirtyKey" through the cleanAttainKey function and assign the cleaned key to the new column. However, it seems pandas doesn't support this type of modification.
# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""

# clean the keys into the new column cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
    df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])
I am getting the below error message:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The result should be the same as the df2 below:
df2 = pd.DataFrame({'dirtyKey': ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
                    'amount': [100, 101, 102],
                    'cleanAttainKey': ["12345ABC", "12345DEFG", "23456DEFG"]},
                   columns=["dirtyKey", "cleanAttainKey", "amount"])
df2
Is there any better way to modify the dirty keys and get a new column with the clean keys in Pandas?
Thanks
Here is the culprit:

df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

When you take an extract of a dataframe, pandas reserves the right to return either a copy or a view. That does not matter if you are just reading the data, but it means you should never modify the extract.

The idiomatic way is to use loc (or iloc, at, iat):

df.loc[i, 'cleanAttainKey'] = cleanAttainKey(dirtyAttainKeyList[i])

(the above assumes a natural range index...)
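As for a better way overall: the loop can be replaced with Series.map, which applies the cleaning function element-wise and avoids chained assignment entirely. A sketch using the dummy data from the question (with the "DEFG" comparison widened to four characters so the expected df2 output holds):

```python
import pandas as pd

def cleanAttainKey(dirtyAttainKey):
    # keys without leading zeros are returned unchanged
    if dirtyAttainKey[0] != "0":
        return dirtyAttainKey
    dirtyAttainKey = dirtyAttainKey.lstrip("0")
    # keep keys ending in "ABC" or "DEFG"; otherwise drop the currency code
    if dirtyAttainKey[-3:] != "ABC" and dirtyAttainKey[-4:] != "DEFG":
        dirtyAttainKey = dirtyAttainKey[:-3]
    return dirtyAttainKey

df = pd.DataFrame({'dirtyKey': ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
                   'amount': [100, 101, 102]})

# element-wise application, no chained assignment
df['cleanAttainKey'] = df['dirtyKey'].map(cleanAttainKey)
```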
I have two dataframes A & B with one column in common, of string type (str): country_name.

I converted dataframe B into a dictionary object. B has only two columns, both of string type, so both the keys and the values are strings.

My task is to find the value for a given key in dataframe B; the key comes from dataframe A via the common column.
Here is my code...
I have tried multiple options, but nothing works for me, e.g.:
name = count_list.get(key, "")
name = count_list['Value'][key]
A = pd.DataFrame(columns=['index', 'name', 'score'])
B = pd.DataFrame(columns=['name', 'Value'])
B = B.to_dict()

for index, row in A.iterrows():
    try:
        key = str(row['name']).lower()
        name = B.get(key, "")
        score1.append(row['score'])
        country_list.append(name)
    except TypeError:
        print(row)
    except IndexError:
        print(row)
I want to get the exact value against a key in the data frame B.
both key and Value columns are of string type.
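For what it's worth, the likely culprit is B.to_dict(), which returns a nested dict keyed by column name rather than a flat key -> value mapping. A sketch with made-up sample data (the country names and codes below are placeholders, not from the question):

```python
import pandas as pd

# hypothetical stand-ins for the real A and B
A = pd.DataFrame({'name': ['India', 'France'], 'score': [1, 2]})
B = pd.DataFrame({'name': ['india', 'france'], 'Value': ['IN', 'FR']})

# B.to_dict() would give {'name': {...}, 'Value': {...}}; build a
# flat mapping instead, lower-casing keys to match the lookup
country_map = dict(zip(B['name'].str.lower(), B['Value']))

# look up each key from A, defaulting to "" when missing
names = [country_map.get(str(n).lower(), "") for n in A['name']]
```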
I am trying to extract a value out of a dataframe and put it into a variable. Then later I will record that value into an Excel workbook.
First I run a SQL query and store into a df:
df = pd.read_sql(strSQL, conn)
I am looping through another list of items and looking them up in the df. They are matched on MMString in the df and MMConcat from the list of items I'm looping through.
dftemp = df.loc[df['MMString'] == MMConcat]
Category = dftemp['CategoryName'].item()
I get the following error at the last line of code above. ValueError: can only convert an array of size 1 to a Python scalar
In the debug console, when I run that last line of code without storing it to a variable, I get what looks like a string value. For example, 'Pickup Truck'.
How can I simply store the value that I'm looking up in the df to a variable?
Index by row and column with loc to return a series, then extract the first value via iat:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].iat[0]
Alternatively, get the first value from the NumPy array representation:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].values[0]
The docs aren't helpful, but pd.Series.item just calls np.ndarray.item and only works for a series with one value:
pd.Series([1]).item() # 1
pd.Series([1, 2]).item() # ValueError: can only convert an array of size 1
I am trying to use a pandas dataframe as a parameter table which is loaded at the beginning of my application run.
The structure of the csv that is loaded into the dataframe is as below:
param_name,param_value
source_dir,C:\Users\atiwari\Desktop\EDIFACT\source_dir
So the column names would be param_name and param_value.
How do i go about selecting the value from param_value where param_name == 'source_dir'?
I tried the below, but it returns a result with an index, not a plain string value:
param_df.loc[param_df['param_name']=='source_dir']['param_value']
It returns a Series:
s = param_df.loc[param_df['param_name']=='source_dir', 'param_value']
But if need DataFrame:
df = param_df.loc[param_df['param_name']=='source_dir', ['param_value']]
For a scalar, select the first value from the Series, either through the underlying values array with [0] or with iat. Series.item also works, but raises an error if the Series is empty:
val = s.values[0]
val = s.iat[0]
val = s.item()
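Putting the answer together with the CSV from the question, a runnable sketch (the CSV is rebuilt inline here instead of being read from disk):

```python
import io
import pandas as pd

# rebuild the parameter table from the CSV layout in the question
csv_text = (
    "param_name,param_value\n"
    "source_dir,C:\\Users\\atiwari\\Desktop\\EDIFACT\\source_dir\n"
)
param_df = pd.read_csv(io.StringIO(csv_text))

# boolean mask on param_name, then take the scalar from param_value
val = param_df.loc[param_df['param_name'] == 'source_dir', 'param_value'].iat[0]
```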