I need to reshape a time series table, for example from A into B:
A
no,A,B,B_sub
1,start,val_s,val_s_sub
2,study,val_st,val_st_sub
3,work,val_w,val_w_sub
4,end,val_e,val_e_sub
5,start,val_s1,val_s1_sub
6,end,val_e1,val_e1_sub
7,start,val_s2,val_s2_sub
8,work,val_w1,val_w1_sub
9,end,val_e2,val_e2_sub
B
,start,,study,,work,,end,
,B,B_sub,B,B_sub,B,B_sub,B,B_sub
4-1,val_s,val_s_sub,val_st,val_st_sub,val_w,val_w_sub,val_e,val_e_sub
6-5,val_s1,val_s1_sub,,,,,val_e1,val_e1_sub
9-7,val_s2,val_s2_sub,,,val_w1,val_w1_sub,val_e2,val_e2_sub
I tried to use the pivot table function of the Python pandas library, but there is no common string to use as an index in my table.
Can I get a hint? I'm lost.
Does this get you close enough?
df_a['grp'] = (df_a['A'] == 'start').cumsum()
df_a.set_index(['grp','A']).unstack('A')
Output:
no B B_sub
A end start study work end start study work end start study work
grp
1 4.0 1.0 2.0 3.0 val_e val_s val_st val_w val_e_sub val_s_sub val_st_sub val_w_sub
2 6.0 5.0 NaN NaN val_e1 val_s1 NaN NaN val_e1_sub val_s1_sub NaN NaN
3 9.0 7.0 NaN 8.0 val_e2 val_s2 NaN val_w1 val_e2_sub val_s2_sub NaN val_w1_sub
Going a little further with reshaping and renaming:
df_r = df_a.set_index(['grp','A']).unstack('A')
steps = df_r[('no', 'end')].astype(int).astype(str).str.cat(df_r[('no', 'start')].astype(int).astype(str), sep='-')
df_r.set_index(steps)[['B', 'B_sub']].swaplevel(0,1, axis=1).sort_index(level=0, axis=1)
Output:
A end start study work
B B_sub B B_sub B B_sub B B_sub
(no, end)
4-1 val_e val_e_sub val_s val_s_sub val_st val_st_sub val_w val_w_sub
6-5 val_e1 val_e1_sub val_s1 val_s1_sub NaN NaN NaN NaN
9-7 val_e2 val_e2_sub val_s2 val_s2_sub NaN NaN val_w1 val_w1_sub
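For reference, the whole answer can be reproduced end-to-end; this is a self-contained sketch in which df_a is rebuilt from the question's sample values. The key trick is the cumulative-sum group marker: every 'start' row begins a new block, so summing the boolean flag numbers the blocks 1, 2, 3, ...

```python
import pandas as pd

# Rebuild table A from the question
df_a = pd.DataFrame({
    'no': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'A': ['start', 'study', 'work', 'end', 'start', 'end', 'start', 'work', 'end'],
    'B': ['val_s', 'val_st', 'val_w', 'val_e', 'val_s1', 'val_e1', 'val_s2', 'val_w1', 'val_e2'],
})
df_a['B_sub'] = df_a['B'] + '_sub'

# Every 'start' row begins a new block, so the cumulative sum of the
# boolean marker assigns group numbers 1, 2, 3, ...
df_a['grp'] = (df_a['A'] == 'start').cumsum()

# Pivot each group's rows into columns keyed by the 'A' label
wide = df_a.set_index(['grp', 'A']).unstack('A')
```

Missing steps within a group (e.g. no 'study' row in group 2) simply come out as NaN in the widened frame.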
I manually create a DataFrame:
import pandas as pd
df_articles1 = pd.DataFrame({'Id' : [4,5,8,9],
'Class':[
{'encourage': 1, 'contacting': 1},
{'cardinality': 16, 'subClassOf': 3},
{'get-13.5.1': 1},
{'cardinality': 12, 'encourage': 1}
]
})
I export it to a csv file to import after splitting it:
df_articles1.to_csv(f"""{path}articles_split.csv""", index = False, sep=";")
I can split it with pd.json_normalize():
df_articles1 = pd.json_normalize(df_articles1['Class'])
I import its csv file to a DataFrame:
df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";")
But calling pd.json_normalize(df_articles2['Class']) on the loaded data fails with:
AttributeError: 'str' object has no attribute 'values'
That is because to_csv() stores the data in your 'Class' column as strings, not as dictionaries/JSON. So after loading the saved data:
df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";")
convert the column back to its original form with eval() and apply():
df_articles2['Class'] = df_articles2['Class'].apply(eval)
Finally:
resultdf = pd.json_normalize(df_articles2['Class'])
Printing resultdf now gives the desired output.
While the accepted answer works, using eval is bad practice.
To parse a string column that looks like JSON/dict, use one of the following options (last one is best, if possible).
ast.literal_eval (better)
import ast
objects = df2['Class'].apply(ast.literal_eval)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
# Id encourage contacting cardinality subClassOf get-13.5.1
# 0 4 1.0 1.0 NaN NaN NaN
# 1 5 NaN NaN 16.0 3.0 NaN
# 2 8 NaN NaN NaN NaN 1.0
# 3 9 1.0 NaN 12.0 NaN NaN
json.loads (even better)
import json
objects = df2['Class'].apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
# Id encourage contacting cardinality subClassOf get-13.5.1
# 0 4 1.0 1.0 NaN NaN NaN
# 1 5 NaN NaN 16.0 3.0 NaN
# 2 8 NaN NaN NaN NaN 1.0
# 3 9 1.0 NaN 12.0 NaN NaN
If the strings are single quoted, use str.replace to convert them to double quotes (and thus valid JSON) before applying json.loads:
objects = df2['Class'].str.replace("'", '"').apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
pd.json_normalize before pd.to_csv (recommended)
If possible, when you originally save to CSV, just save the normalized JSON (not raw JSON objects):
df1 = df1[['Id']].join(pd.json_normalize(df1['Class']))
df1.to_csv('df1_normalized.csv', index=False, sep=';')
# Id;encourage;contacting;cardinality;subClassOf;get-13.5.1
# 4;1.0;1.0;;;
# 5;;;16.0;3.0;
# 8;;;;;1.0
# 9;1.0;;12.0;;
This is a more natural CSV workflow (rather than storing/loading object blobs):
df2 = pd.read_csv('df1_normalized.csv', sep=';')
# Id encourage contacting cardinality subClassOf get-13.5.1
# 0 4 1.0 1.0 NaN NaN NaN
# 1 5 NaN NaN 16.0 3.0 NaN
# 2 8 NaN NaN NaN NaN 1.0
# 3 9 1.0 NaN 12.0 NaN NaN
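Putting the recommended round trip together in one self-contained sketch (using io.StringIO in place of a file on disk, so no path is needed):

```python
import io
import pandas as pd

df1 = pd.DataFrame({'Id': [4, 5, 8, 9],
                    'Class': [{'encourage': 1, 'contacting': 1},
                              {'cardinality': 16, 'subClassOf': 3},
                              {'get-13.5.1': 1},
                              {'cardinality': 12, 'encourage': 1}]})

# Normalize BEFORE saving, so the CSV holds plain scalar columns
flat = df1[['Id']].join(pd.json_normalize(df1['Class']))
buf = io.StringIO()
flat.to_csv(buf, index=False, sep=';')

# Reading it back then needs no eval/json parsing at all
buf.seek(0)
df2 = pd.read_csv(buf, sep=';')
```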
I am trying to read a delimited text file into a dataframe in Python. The delimiter is not being identified when I use pd.read_table. If I explicitly set sep = ' ', I get an error: Error tokenizing data. C error. Notably, the defaults work when I use np.loadtxt().
Example:
pd.read_table('http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt',
comment = '%',
header = None)
0
0 1850 1 -0.777 0.412 NaN NaN...
1 1850 2 -0.239 0.458 NaN NaN...
2 1850 3 -0.426 0.447 NaN NaN...
3 1850 4 -0.680 0.367 NaN NaN...
4 1850 5 -0.687 0.298 NaN NaN...
If I set sep = ' ', I get another error:
pd.read_table('http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt',
comment = '%',
header = None,
sep = ' ')
ParserError: Error tokenizing data. C error: Expected 2 fields in line 78, saw 58
Looking up this error, people suggest using header=None (already done) and setting sep explicitly, but the explicit sep is what causes the problem: Python Pandas Error tokenizing data. I looked at line 78 and can't see any problems. If I set error_bad_lines=False, I get an empty df, suggesting there is a problem with every entry.
Notably this works when I use np.loadtxt():
pd.DataFrame(np.loadtxt('http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt',
comments = '%'))
0 1 2 3 4 5 6 7 8 9 10 11
0 1850.0 1.0 -0.777 0.412 NaN NaN NaN NaN NaN NaN NaN NaN
1 1850.0 2.0 -0.239 0.458 NaN NaN NaN NaN NaN NaN NaN NaN
2 1850.0 3.0 -0.426 0.447 NaN NaN NaN NaN NaN NaN NaN NaN
3 1850.0 4.0 -0.680 0.367 NaN NaN NaN NaN NaN NaN NaN NaN
4 1850.0 5.0 -0.687 0.298 NaN NaN NaN NaN NaN NaN NaN NaN
This suggests to me that there isn't something wrong with the file, but rather with how I am calling pd.read_table(). I looked through the documentation for np.loadtxt() in the hope of setting the sep to the same value, but that just shows: delimiter=None (https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html).
I'd prefer to be able to import this as a pd.DataFrame, setting the names, rather than having to import as a matrix and then convert to pd.DataFrame.
What am I getting wrong?
This one is quite tricky. Please try out the snippet below:
import pandas as pd

url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
df = pd.read_csv(url,
                 sep=r'\s+',
                 comment='%',
                 header=None,
                 names=('Year', 'Month', 'M.Anomaly', 'M.Unc.',
                        'A.Anomaly', 'A.Unc.', '5y.Anomaly', '5y.Unc.',
                        '10y.Anomaly', '10y.Unc.', '20y.Anomaly', '20y.Unc.'))
The issue is that the file has 77 rows of commented text for 'Global Average Temperature Anomaly with Sea Ice Temperature Inferred from Air Temperatures'.
Two of those rows are headers.
There's a bunch of data, then two more header rows, and a new set of data for 'Global Average Temperature Anomaly with Sea Ice Temperature Inferred from Water Temperatures'.
This solution separates the two tables in the file into separate dataframes.
This is not as concise as the other answer, but the data is properly separated into different dataframes.
The headers were a pain; it would probably be easier to create a custom header manually and skip the lines of code that separate the headers from the text.
The important point is separating the air and ice data.
import requests
import pandas as pd
import math
# read the file with requests
url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
response = requests.get(url)
data = response.text
# convert data into a list
data = [d.strip().replace('% ', '') for d in data.split('\n')]
# specify the data from the ranges in the file
air_header1 = data[74].split() # not used
air_header2 = [v.strip() for v in data[75].split(',')]
# combine the 2 parts of the header into a single header
air_header = air_header2[:2] + [f'{air_header1[math.floor(i/2)]}_{v}' for i, v in enumerate(air_header2[2:])]
air_data = [v.split() for v in data[77:2125]]
h2o_header1 = data[2129].split() # not used
h2o_header2 = [v.strip() for v in data[2130].split(',')]
# combine the 2 parts of the header into a single header
h2o_header = h2o_header2[:2] + [f'{h2o_header1[math.floor(i/2)]}_{v}' for i, v in enumerate(h2o_header2[2:])]
h2o_data = [v.split() for v in data[2132:4180]]
# create the dataframes
air = pd.DataFrame(air_data, columns=air_header)
h2o = pd.DataFrame(h2o_data, columns=h2o_header)
Without the header code
Simplify the code, by using a manual header list.
import pandas as pd
import requests
# read the file with requests
url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
response = requests.get(url)
data = response.text
# convert data into a list
data = [d.strip().replace('% ', '') for d in data.split('\n')]
# manually created header
headers = ['Year', 'Month', 'Monthly_Anomaly', 'Monthly_Unc.',
'Annual_Anomaly', 'Annual_Unc.',
'Five-year_Anomaly', 'Five-year_Unc.',
'Ten-year_Anomaly', 'Ten-year_Unc.',
'Twenty-year_Anomaly', 'Twenty-year_Unc.']
# separate the air and h2o data
air_data = [v.split() for v in data[77:2125]]
h2o_data = [v.split() for v in data[2132:4180]]
# create the dataframes
air = pd.DataFrame(air_data, columns=headers)
h2o = pd.DataFrame(h2o_data, columns=headers)
air
Year Month Monthly_Anomaly Monthly_Unc. Annual_Anomaly Annual_Unc. Five-year_Anomaly Five-year_Unc. Ten-year_Anomaly Ten-year_Unc. Twenty-year_Anomaly Twenty-year_Unc.
0 1850 1 -0.777 0.412 NaN NaN NaN NaN NaN NaN NaN NaN
1 1850 2 -0.239 0.458 NaN NaN NaN NaN NaN NaN NaN NaN
2 1850 3 -0.426 0.447 NaN NaN NaN NaN NaN NaN NaN NaN
h2o
Year Month Monthly_Anomaly Monthly_Unc. Annual_Anomaly Annual_Unc. Five-year_Anomaly Five-year_Unc. Ten-year_Anomaly Ten-year_Unc. Twenty-year_Anomaly Twenty-year_Unc.
0 1850 1 -0.724 0.370 NaN NaN NaN NaN NaN NaN NaN NaN
1 1850 2 -0.221 0.430 NaN NaN NaN NaN NaN NaN NaN NaN
2 1850 3 -0.443 0.419 NaN NaN NaN NaN NaN NaN NaN NaN
I am having a little problem with pandas.concat.
Namely, I am concatenating a dataframe with 3 series. The dataframe and 2 of the series concatenate as expected. One series, however, is being attached to the bottom of my new dataframe instead of as a column.
Here is my minimal working example. To get the output below, run it on the titanic Kaggle dataset.
#INCLUDED ONLY SO MY CODE WILL RUN ON YOUR MACHINE. IGNORE.
def bin_dump(data, increment):
if data <= increment:
return f'0 - {increment}'
if data % increment == 0:
return f'{data - increment} - {data}'
else:
m = data % increment
a = data - m
b = data + (increment - m)
return f'{a} - {b}'
#INCLUDED SO MY CODE WILL RUN ON YOUR MACHINE. IGNORE
train_df['AgeGroup'] = train_df.apply(lambda x: bin_dump(x.Age, 3), axis=1)
# THE PROBLEM IS ACTUALLY IN THIS METHOD:
def plot_dists(X, Y, input_df, percent_what):
totals = input_df[X].value_counts()
totals.name = 'totals'
df = pd.Series(totals.index).str.extract(r'([0-9]+)').astype('int64')
df.columns=['index']
values = pd.Series(totals.index, name=X)
percentages = []
for group, total in zip(totals.index, totals):
x = input_df.loc[(input_df[X] == group)&(input_df[Y] == 1), Y].sum()
percent = 1 - x/total
percentages.append(percent)
percentages = pd.Series(percentages, name='Percentages')
# THE PROBLEM IS HERE:
df = pd.concat([df, values, totals, percentages], axis=1).set_index('index').sort_index(axis=0)
return df
output looks like this:
AgeGroup totals Percentages
index
0.0 0 - 3 NaN 0.333333
3.0 3.0 - 6.0 NaN 0.235294
6.0 6.0 - 9.0 NaN 0.666667
9.0 9.0 - 12.0 NaN 0.714286
12.0 12.0 - 15.0 NaN 0.357143
15.0 15.0 - 18.0 NaN 0.625000
18.0 18.0 - 21.0 NaN 0.738462
21.0 21.0 - 24.0 NaN 0.57534
. . . .
. . . .
. . . .
NaN NaN 11.0 NaN
NaN NaN 15.0 NaN
NaN NaN 9.0 NaN
NaN NaN 6.0 NaN
So, the 'totals' are being appended as a dataframe on the bottom.
In addition to trying to fix this concat/append issue, I'd welcome any suggestions on how to optimize my code. This is my first go at building my own tool for visualizing data (I cut out the plotting part because it's not really part of the question).
Check this out: did you try changing concat to merge? merge joins on column values rather than aligning on the index, which sidesteps the mismatch between the labeled index of totals and the RangeIndex of the other series.
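A small sketch of what is likely going on: pd.concat(..., axis=1) aligns on the index, so a series indexed by the AgeGroup labels cannot line up with series carrying a default RangeIndex, and its values end up in extra rows at the bottom. Resetting the index first makes them align (toy data, standing in for the question's totals and percentages):

```python
import pandas as pd

totals = pd.Series([10, 20], index=['0 - 3', '3 - 6'], name='totals')
percentages = pd.Series([0.3, 0.2], name='Percentages')  # RangeIndex 0, 1

# Misaligned: 4 rows, because '0 - 3' and 0 are different index labels
misaligned = pd.concat([totals, percentages], axis=1)

# Aligned: drop the label index so both series share a RangeIndex
aligned = pd.concat([totals.reset_index(drop=True), percentages], axis=1)
```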
When I run the code below I get the error:
TypeError: 'NoneType' object has no attribute 'getitem'
import pyarrow
import pandas
import pyarrow.parquet as pq
df = pq.read_table("file.parquet").to_pandas()
df = df.iloc[1:,:]
df = df.dropna (how="any", inplace = True) # modifies it in place, creates new dataset without NAN
average_age = df["_c2"].mean()
print average_age
The dataframe looks like this:
_c0 _c1 _c2
0 RecId Class Age
1 1 1st 29
2 2 1st NA
3 3 1st 30
If I print the df after calling the dropna method, I get 'None'.
Shouldn't it be creating a new dataframe without the 'NA' in it, which would then allow me to get the average age without throwing an error?
As per the OP's comment, the NA is a string rather than NaN, so dropna() is no good here. One of many possible options for filtering out the string value 'NA' is:
df = df[df["_c2"] != "NA"]
A better option that catches inexact matches (e.g. with trailing spaces), as suggested by @DJK in the comments:
df = df[~df["_c2"].str.contains('NA')]
This one should remove any non-numeric strings rather than only 'NA':
df = df[df["_c2"].apply(lambda x: x.isnumeric())]
This will also work. If the NA in your df is NaN (np.nan), it will not affect getting the mean of the column; it only matters if your NA is the string 'NA':
(df.apply(pd.to_numeric,errors ='coerce',axis=1)).describe()
Out[9]:
_c0 _c1 _c2
count 3.0 0.0 2.000000
mean 2.0 NaN 29.500000
std 1.0 NaN 0.707107
min 1.0 NaN 29.000000
25% 1.5 NaN 29.250000
50% 2.0 NaN 29.500000
75% 2.5 NaN 29.750000
max 3.0 NaN 30.000000
More info
df.apply(pd.to_numeric, errors='coerce', axis=1)  # all non-numeric objects become NaN and will not affect the mean
Out[10]:
_c0 _c1 _c2
0 NaN NaN NaN
1 1.0 NaN 29.0
2 2.0 NaN NaN
3 3.0 NaN 30.0
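As for the None in the original code: dropna(..., inplace=True) modifies the frame in place and returns None, so assigning its result back replaces df with None. A minimal sketch of the pitfall and the two correct spellings:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'_c2': [29.0, np.nan, 30.0]})

# Pitfall: with inplace=True the method returns None
result = df.copy().dropna(inplace=True)

# Fix A: assign the returned copy (no inplace)
clean = df.dropna()

# Fix B: call inplace without assigning the result
df2 = df.copy()
df2.dropna(inplace=True)
```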
Given the following DataFrame:
Category Area Country Code Function Last Name LanID Spend1 Spend2 Spend3 Spend4 Spend5
0 Bisc EE RU02,UA02 Mk Smith df3432 1.0 NaN NaN NaN NaN
1 Bisc EE RU02 Mk Bibs fdss34 1.0 NaN NaN NaN NaN
2 Bisc EE UA02,EURASIA Mk Crow fdsdr43 1.0 NaN NaN NaN NaN
3 Bisc WE FR31 Mk Ellis fdssdf3 1.0 NaN NaN NaN NaN
4 Bisc WE BE32,NL31 Mk Mower TOZ1720 1.0 NaN NaN NaN NaN
5 Bisc WE FR31,BE32,NL31 LKU Elan SKY8851 1.0 1.0 1.0 1.0 1.0
6 Bisc SE IT31 Mk Bobret 3dfsfg 1.0 NaN NaN NaN NaN
7 Bisc SE GR31 Mk Concept MOSGX009 1.0 NaN NaN NaN NaN
8 Bisc SE RU02,IT31,GR31,PT31,ES31 LKU Solar MSS5723 1.0 1.0 1.0 1.0 1.0
9 Bisc SE IT31,GR31,PT31,ES31 Mk Brix fdgd22 NaN 1.0 NaN NaN NaN
10 Choc CE RU02,CZ31,SK31,PL31,LT31 Fin Ocoser 43233d NaN 1.0 NaN NaN NaN
11 Choc CE DE31,AT31,HU31,CH31 Fin Smuth 4rewf NaN 1.0 NaN NaN NaN
12 Choc CE BG31,RO31,EMA Fin Momocs hgghg2 NaN 1.0 NaN NaN NaN
13 Choc WE FR31,BE32,NL31 Fin Bruntly ffdd32 NaN NaN NaN NaN 1.0
14 Choc WE FR31,BE32,NL31 Mk Ofer BROGX011 NaN 1.0 1.0 NaN NaN
15 Choc WE FR31,BE32,NL31 Mk Hem NZJ3189 NaN NaN NaN 1.0 1.0
16 G&C NE UA02,SE31 Mk Cre ORY9499 1.0 NaN NaN NaN NaN
17 G&C NE NO31 Mk Qlyo XVM7639 1.0 NaN NaN NaN NaN
18 G&C NE GB31,NO31,SE31,IE31,FI31 Mk Omny LOX1512 NaN 1.0 1.0 NaN NaN
I would like to get it exported into a nested Dict with the below structure:
{RU02: {Bisc: {EE: {Mkt: {Spend1: {df3432: Smith}
{fdss34: Bibs}
{Bisc: {SE: {LKU: {Spend1: {MSS5723: Solar}
{Spend2: {MSS5723: Solar}
{Spend3: {MSS5723: Solar}
{Spend4: {MSS5723: Solar}
{Spend5: {MSS5723: Solar}
{Choc: {CE: {Fin: {Spend2: {43233d: Ocoser}
.....
{UA02: {Bisc: {EE: {Mkt: {Spend1: {df3432: Smith}
{ffdsdr43: Crow}
{G&C: {NE: {Mkt: {Spend1: {ORY9499: Cre}
.....
So essentially, in this Dict I'm trying to track, for each Country Code, the list of Last Names + LanIDs per Spend category (Spend1, Spend2, etc.) and their attributes (Function, Category, Area).
The DataFrame is not very large (less than 200rows), but it contains almost all types of combinations between Category/Area/Country Code as well as LastNames and their Spend categories (many-to-many).
My challenge is that I'm unable to figure out how to conceptualise the steps needed to prepare the DataFrame for export to a Dict...
What I have figured out so far is that I need:
a way to slice the contents of the "Country Code" column on the "," separator: DONE
to create new columns for the unique Country Codes, with 1 in each row where that country code is present: DONE
to set the index of the DataFrame recursively to each of the newly added columns
to move into a new DataFrame the rows for each Country Code where there is data
to export all the new DataFrames to Dicts, and then merge them
Not sure if the last three steps are the best way to go about this though, as I'm still having difficulty understanding how pd.DataFrame.to_dict should be configured for my case (if that's even possible)...
I'd highly appreciate help on the coding side, but also a brief explanation of your thought process for each stage.
Here is how far I got on my own:
#keeping track of initial order of columns
initialOrder = list(df.columns.values)
# split the Country Code by ","
CCodeNoCommas= [item for items in df['Country Code'].values for item in items.split(",")]
# add only the UNIQUE Country Codes -via set- as new columns in the DataFrame,
#with NaN for row values
df = pd.concat([df,pd.DataFrame(columns=list(set(CCodeNoCommas)))])
# reordering columns to have the newly added ones at the end
reordered = initialOrder + [c for c in df.columns if c not in initialOrder]
df = df[reordered]
# replace NaN with 1 in the newly added columns (Country Codes), where the same Country code
# exists in the initial column "Country Code"; do this for each row
CCodeUniqueOnly = set(CCodeNoCommas)
for c in CCodeUniqueOnly:
CCodeIsPresent_rowIndex = df.index[df['Country Code'].str.contains(c)]
#print (CCodeIsPresent_rowIndex)
df.loc[CCodeIsPresent_rowIndex, c] = 1
# no clue what do do next ??
If you re-shape your dataframe into the right format, you can use the handy recursive dictionary function from @DSM's answer to this question. The goal is to get a dataframe where each row contains only one "entry": a unique combination of the columns you're interested in.
First, you need to split your country code strings into lists:
df['Country Code'] = df['Country Code'].str.split(',')
And then expand those lists into multiple rows (using @RomanPekar's technique from this question):
s = df.apply(lambda x: pd.Series(x['Country Code']),axis=1) \
.stack().reset_index(level=1, drop=True)
s.name = 'Country Code'
df = df.drop('Country Code', axis=1).join(s).reset_index(drop=True)
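(On pandas 0.25 or newer, this split-and-stack step can be written more simply with DataFrame.explode, which repeats each row once per list element. A sketch on two hypothetical toy rows:)

```python
import pandas as pd

df = pd.DataFrame({'Country Code': ['RU02,UA02', 'FR31'],
                   'Last Name': ['Smith', 'Ellis']})

# Split the comma-joined codes into lists, then one row per code
df['Country Code'] = df['Country Code'].str.split(',')
df = df.explode('Country Code').reset_index(drop=True)
```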
Then you can reshape the Spend* columns into rows, where there's a row for each Spend* column where the value is not nan.
spend_cols = ['Spend1', 'Spend2', 'Spend3', 'Spend4', 'Spend5']
df = df.groupby('Country Code') \
.apply(lambda g: g.join(pd.DataFrame(g[spend_cols].stack()) \
.reset_index(level=1)['level_1'])) \
.reset_index(drop=True)
Now you have a dataframe where each level in your nested dictionary is its own column. So you can use this recursive dictionary function:
def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d
And apply it only to the columns you want to produce the nested dictionary, listed in the order in which they should nest:
cols = ['Country Code', 'Category', 'Area', 'Function', 'level_1', 'LanID', 'Last Name']
d = recur_dictify(df[cols])
That should produce your desired result.
All in one piece:
df['Country Code'] = df['Country Code'].str.split(',')
s = df.apply(lambda x: pd.Series(x['Country Code']),axis=1) \
.stack().reset_index(level=1, drop=True)
s.name = 'Country Code'
df = df.drop('Country Code', axis=1).join(s).reset_index(drop=True)
spend_cols = ['Spend1', 'Spend2', 'Spend3', 'Spend4', 'Spend5']
df = df.groupby('Country Code') \
.apply(lambda g: g.join(pd.DataFrame(g[spend_cols].stack()) \
.reset_index(level=1)['level_1'])) \
.reset_index(drop=True)
def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d
cols = ['Country Code', 'Category', 'Area', 'Function', 'level_1', 'LanID', 'Last Name']
d = recur_dictify(df[cols])
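A quick way to convince yourself the recursive function works is to run it on a tiny frame (hypothetical toy columns; .iloc replaces the long-deprecated .ix indexer):

```python
import pandas as pd

def recur_dictify(frame):
    # One column left: return the scalar (or array of values)
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    # Otherwise group by the first column and recurse on the rest
    grouped = frame.groupby(frame.columns[0])
    return {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}

toy = pd.DataFrame({'Country': ['RU02', 'RU02', 'FR31'],
                    'Spend': ['Spend1', 'Spend2', 'Spend1'],
                    'Name': ['Smith', 'Solar', 'Ellis']})
d = recur_dictify(toy)
```

Each column becomes one level of nesting, in left-to-right order, so here d maps Country -> Spend -> Name.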