Using json_normalize I created a DataFrame; the data I parsed includes a list of dictionaries, so I used the parameters record_path and meta.
The issue is that I get a KeyError when calling a column (that exists) from the DataFrame.
The column exists: when I print .columns it shows up, but when I call it, it throws the error.
I'm using pandas 1.3.5.
audiences = response['audiences']
audiences_df = pd.json_normalize(response['audiences'],
                                 record_path=['fees'],
                                 meta=['audience_id ', 'audience_name '],
                                 errors='ignore')
print(audiences_df.columns)
print(audiences_df["audience_id"])
The response I get is
if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'audience_id'
The KeyError you're encountering is most likely caused by the trailing spaces in the meta parameter: the columns created are named 'audience_id ' and 'audience_name ' (with a space), so the lookup for 'audience_id' fails.
Try this instead:
meta=['audience_id', 'audience_name']
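A minimal sketch of the corrected call, using made-up data shaped like the question's response (the real structure is assumed):

```python
import pandas as pd

# Hypothetical data shaped like the question's response
response = {'audiences': [
    {'audience_id': 1, 'audience_name': 'A', 'fees': [{'fee': 10}]},
]}

audiences_df = pd.json_normalize(
    response['audiences'],
    record_path=['fees'],
    meta=['audience_id', 'audience_name'],  # no trailing spaces
    errors='ignore',
)
print(audiences_df.columns)
print(audiences_df['audience_id'])  # no KeyError now
```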
I have a dictionary that is the output of an NFL play-by-play API. To clean this up, I wanted to use the json_normalize function as follows:
pd.json_normalize(data, record_path=['periods', 'pbp'])
When I do this, the table looks like this:
As you can see, there's another events layer that I'd like to access in the dataframe. However, when I add 'events' to the above code, I get the following KeyError:
What am I missing about this? Is there not a way to get deeper into the full dictionary? Any suggestions would be great!
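For what it's worth, json_normalize can walk a nested path by listing each level in record_path. A minimal sketch with made-up play-by-play data (the real structure is assumed); if this still raises a KeyError, it often means some pbp entries lack an 'events' key:

```python
import pandas as pd

# Made-up data shaped like the described play-by-play response
data = {'periods': [
    {'number': 1, 'pbp': [
        {'clock': '15:00', 'events': [{'type': 'kickoff'}, {'type': 'return'}]},
    ]},
]}

# Each key in record_path descends one level into the nested lists
events_df = pd.json_normalize(data, record_path=['periods', 'pbp', 'events'])
print(events_df)
```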
This question already has answers here:
Get key name from Python KeyError exception
(5 answers)
Closed last year.
I am writing a helper function where I need to know which key is missing when a KeyError is raised.
Consider this:
def foo(x: dict):
    try:
        y = x['A']
    except KeyError as ke:
        return ke

foo(dict())
As intended, the code above raises a KeyError. More specifically, the repr is KeyError('A').
I want to access that "A" that is missing, but it does not seem to be an attribute of the KeyError object.
Any ideas?
Ps. The reason I'm doing this is because I want to write a function of the form
def check_calculation_dependency_on_data(calculation: callable, data: pd.DataFrame) -> list[str]:
    '''Returns the mandatory columns data needs to contain for the calculation to work'''
I have a version of the above where data starts with all columns and progressively removes them one at a time (inside a try/except block) to list the columns that raise a KeyError when missing. However, the code would be much more efficient if it started with an empty DataFrame and added the missing columns one by one until no KeyError is raised.
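A sketch of that incremental approach, assuming the missing key is available as ke.args[0] and that the calculation only raises KeyError for missing columns:

```python
import pandas as pd

def check_calculation_dependency_on_data(calculation, data: pd.DataFrame) -> list[str]:
    '''Returns the mandatory columns data needs to contain for the calculation to work'''
    required = []
    trial = pd.DataFrame(index=data.index)  # start empty, add columns on demand
    while True:
        try:
            calculation(trial)
            return required
        except KeyError as ke:
            missing = ke.args[0]            # the key that was missing
            required.append(missing)
            trial[missing] = data[missing]

# Usage with a toy calculation that needs columns 'a' and 'b'
data = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
print(check_calculation_dependency_on_data(lambda df: df['a'] + df['b'], data))  # ['a', 'b']
```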
Already answered here: Link
map1 = {}
try:
    x = map1['key2']
except KeyError as e:
    print(e.args[0])
I am trying to use Python's DataFrame.Get_Value(Index, ColumnName) to get the value of a column, and it keeps throwing the following error:
"'[10004]' is an invalid key", where 10004 is the index value.
This is how the DataFrame looks:
I have successfully used get_value before. I don't know what's wrong with this DataFrame.
First, pandas.DataFrame.get_value is deprecated (and it should be get_value, not Get_Value — Python is case-sensitive). It's better to use a non-deprecated accessor such as .loc or .at instead:
df.loc[10004, 'Column_Name']
# Or:
df.at[10004, 'Column_Name']
Your issue might be that you have 10004 stored as a string instead of an integer. Try surrounding the index with quotes (df.loc['10004', 'Column_Name']). You can check this easily with df.index.dtype and seeing whether it returns dtype('O').
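A small illustration of that string-index case, with a made-up one-row DataFrame:

```python
import pandas as pd

# Made-up DataFrame whose index labels are strings, not integers
df = pd.DataFrame({'Column_Name': [1.5]}, index=['10004'])

print(df.index.dtype)                  # object, i.e. dtype('O') -> string labels
# df.loc[10004, 'Column_Name'] would raise a KeyError here
print(df.loc['10004', 'Column_Name'])  # 1.5
```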
Very simple code using spark + python:
df = spark.read.option("header","true").csv(file_name)
df = df.fillna(0)
but error occured:
pyspark.sql.utils.AnalysisException: u'Cannot resolve column name
"cp_com.game.shns.uc" among (ProductVersion, IMEI, FROMTIME, TOTIME,
STATISTICTIME, TimeStamp, label, MD5, cp_com.game.shns.uc,
cp_com.yunchang....
What's wrong with it? cp_com.game.shns.uc is among the list.
Spark treats the dot character in column names as a struct-field accessor, check issue, so you need to replace the dots with underscores before working on the CSV.
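A sketch of the rename step on its own; in PySpark you would apply it to the real DataFrame with df.toDF(*[c.replace('.', '_') for c in df.columns]) before calling fillna:

```python
# Column names taken from the error message in the question
columns = ['ProductVersion', 'IMEI', 'cp_com.game.shns.uc']

# Replace dots with underscores so Spark no longer parses them as struct accessors
safe_names = [c.replace('.', '_') for c in columns]
print(safe_names)  # ['ProductVersion', 'IMEI', 'cp_com_game_shns_uc']
```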
I get the following error when attempting to read in a json file that was written using DataFrame.to_json().
ValueError: If using all scalar values, you must pass an index
The format of the json is {index: value, index: value, ...} and came from a one column dataframe.
Here is the json file on dropbox:
https://www.dropbox.com/s/md1awxetkri0nb3/pandas_json_ulines.json
The call I've tried is:
pd.read_json('pandas_json_ulines.json')
with various orient values, but since I did not explicitly set orient in the to_json call, I don't think it should be necessary.
Does anyone have an idea what I'm doing wrong? Thanks.
Turns out it was not a DataFrame after all, rather a Series. Needed to specify typ='series' in the read_json call.
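A round-trip sketch with made-up data showing typ='series' (wrapping the payload in StringIO, which newer pandas versions expect for literal JSON strings):

```python
import pandas as pd
from io import StringIO

# A single column's worth of data written as a Series, as in the question
s = pd.Series([1.0, 2.0], index=['a', 'b'])
payload = s.to_json()  # '{"a":1.0,"b":2.0}' -- the {index: value, ...} shape described above

# Reading it back as a Series avoids the "all scalar values" ValueError
restored = pd.read_json(StringIO(payload), typ='series')
print(restored)
```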