This question is a follow-up to a previous one, so I posted it as a new question.
Suppose my dataframe B is something like:
ID   category        words                      bucket_id
1    audi            a4, a6                     94
2    bugatti         veyron, chiron             86
3    mercedez        s-class, e-class           79
4    dslr            canon, nikon               69
5    apple           iphone,macbook,ipod        51
6    finance         sales,loans,sales price    12
7    politics        trump, election, votes     77
8    entertainment   spiderman,thor, ironmen    88
9    music           beiber, rihana,drake       14
...  ...             ...                        ...
I want each mapped category along with its corresponding ID as a dictionary, something like:
{'id': 2, 'term': 'bugatti', 'bucket_id': 86}
{'id': 3, 'term': 'mercedez', 'bucket_id': 79}
{'id': 6, 'term': 'finance', 'bucket_id': 12}
{'id': 7, 'term': 'politics', 'bucket_id': 77}
{'id': 9, 'term': 'music', 'bucket_id': 14}
Edit:
I only want to map keywords that exactly match a complete comma-delimited token in the words column, not substrings or matches that occur alongside other words.
EDIT:
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3],
                   'category': ['bugatti', 'entertainment', 'mercedez'],
                   'words': ['veyron,chiron', 'spiderman,thor,ironmen',
                             's-class,e-class,s-class'],
                   'bucket_id': [94, 86, 79]})
print(df)
ID category words bucket_id
0 1 bugatti veyron,chiron 94
1 2 entertainment spiderman,thor,ironmen 86
2 3 mercedez s-class,e-class,s-class 79
A = ['veyron','s-class','derman']
idx = [i for i, x in enumerate(df['words']) for y in x.split(',') if y in A]
print(idx)
[0, 2, 2]
L = (df.loc[idx, ['ID', 'category', 'bucket_id']]
       .rename(columns={'category': 'term'})
       .to_dict(orient='records'))
print(L)
[{'ID': 1, 'term': 'bugatti', 'bucket_id': 94},
{'ID': 3, 'term': 'mercedez', 'bucket_id': 79},
{'ID': 3, 'term': 'mercedez', 'bucket_id': 79}]
I have the following list:
a = [{'cluster_id': 0,
      'points': [{'id': 1, 'name': 'Alice', 'lat': 52.523955, 'lon': 13.442362},
                 {'id': 2, 'name': 'Bob', 'lat': 52.526659, 'lon': 13.448097}]},
     {'cluster_id': 0,
      'points': [{'id': 1, 'name': 'Alice', 'lat': 52.523955, 'lon': 13.442362},
                 {'id': 2, 'name': 'Bob', 'lat': 52.526659, 'lon': 13.448097}]},
     {'cluster_id': 1,
      'points': [{'id': 3, 'name': 'Carol', 'lat': 52.525626, 'lon': 13.419246},
                 {'id': 4, 'name': 'Dan', 'lat': 52.52443559865125, 'lon': 13.41261723049818}]},
     {'cluster_id': 1,
      'points': [{'id': 3, 'name': 'Carol', 'lat': 52.525626, 'lon': 13.419246},
                 {'id': 4, 'name': 'Dan', 'lat': 52.52443559865125, 'lon': 13.41261723049818}]}]
I would like to convert this list into a dataframe with the following columns:
cluster_id
id
name
lat
lon
so that I can save it as a CSV. I tried a couple of solutions that I found, such as:
pd.concat([pd.DataFrame(l) for l in a],axis=1).T
But it didn't work as I expected.
What am I doing wrong?
Thanks
You can use pd.json_normalize:
df = pd.json_normalize(a, record_path='points', meta='cluster_id')
print(df)
id name lat lon cluster_id
0 1 Alice 52.523955 13.442362 0
1 2 Bob 52.526659 13.448097 0
2 1 Alice 52.523955 13.442362 0
3 2 Bob 52.526659 13.448097 0
4 3 Carol 52.525626 13.419246 1
5 4 Dan 52.524436 13.412617 1
6 3 Carol 52.525626 13.419246 1
7 4 Dan 52.524436 13.412617 1
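Since the stated goal was to save the result as a CSV, the normalized frame can be written out directly; the column order can also be adjusted to match the requested list ('clusters.csv' below is just a placeholder filename):

```python
import pandas as pd

a = [{'cluster_id': 0,
      'points': [{'id': 1, 'name': 'Alice', 'lat': 52.523955, 'lon': 13.442362},
                 {'id': 2, 'name': 'Bob', 'lat': 52.526659, 'lon': 13.448097}]}]

df = pd.json_normalize(a, record_path='points', meta='cluster_id')

# reorder so cluster_id comes first, matching the requested column list
df = df[['cluster_id', 'id', 'name', 'lat', 'lon']]

# 'clusters.csv' is a placeholder filename
df.to_csv('clusters.csv', index=False)
```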
I have a simple dataframe like the one below (the original post showed before/after images), and I want to reduce it to the unique Department/name combinations:
I use:
import pandas as pd
data = {'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason', 'River', 'Kate', 'David', 'Jack', 'David'],
        'Department': ['Sales', 'Operation', 'Operation', 'Sales', 'Operation', 'Sales', 'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
        'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
        'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4]}
df = pd.DataFrame(data)
final = (df.pivot_table(index=['Department', 'name'], values='Weight lost',
                        aggfunc='count', fill_value=0)
           .stack(dropna=False)
           .reset_index(name='Weight_lost_count'))
del final['level_2']
del final['Weight_lost_count']
print(final)
The 'final' line seems to involve unnecessary steps. What would be a better way to write it?
Try groupby with head
out = df.groupby(['Department','name']).head(1)
Isn't this just drop_duplicates?
df[['Department','name']].drop_duplicates()
Output:
Department name
0 Sales Jason
1 Operation Molly
2 Operation Tina
4 Operation Amy
6 Operation River
7 Sales Kate
8 Finance David
9 Finance Jack
And to exactly match the final:
(df[['Department','name']].drop_duplicates()
.sort_values(by=['Department','name'])
)
Output:
Department name
8 Finance David
9 Finance Jack
4 Operation Amy
1 Operation Molly
6 Operation River
2 Operation Tina
0 Sales Jason
7 Sales Kate
I need to iterate (a vectorized operation is not possible) over a very large dataframe (10 million x 70). df.iterrows and direct access via df.loc[i, col] are way too slow. In the past I would first turn the dataframe into a dictionary of dictionaries, which allows me to iterate very quickly. However, this method takes up a lot of memory and is no longer feasible for my current data.
I need to sacrifice some lookup speed to save memory. What is the best way to do this? Would turning my dataframe into a dictionary of row Series, {index: Series}, work?
Do you mean something like this:
In [1112]: pd.DataFrame(df.reset_index().to_dict(orient='records'))
Out[1112]:
index id block check
0 0 6 25 yes
1 1 6 32 no
2 2 9 18 yes
3 3 12 17 no
4 4 15 23 yes
5 5 15 11 yes
6 6 15 15 yes
In [1113]: df.reset_index().to_dict(orient='records')
Out[1113]:
[{'index': 0, 'id': 6, 'block': 25, 'check': 'yes'},
{'index': 1, 'id': 6, 'block': 32, 'check': 'no'},
{'index': 2, 'id': 9, 'block': 18, 'check': 'yes'},
{'index': 3, 'id': 12, 'block': 17, 'check': 'no'},
{'index': 4, 'id': 15, 'block': 23, 'check': 'yes'},
{'index': 5, 'id': 15, 'block': 11, 'check': 'yes'},
{'index': 6, 'id': 15, 'block': 15, 'check': 'yes'}]
you could just do this (thanks @oppressionslayer for the example df):
df
id block check
0 6 25 yes
1 6 32 no
2 9 18 yes
3 12 17 no
4 15 23 yes
5 15 11 yes
6 15 15 yes
df.to_dict('index')
output:
{0: {'id': 6, 'block': 25, 'check': 'yes'}, 1: {'id': 6, 'block': 32, 'check': 'no'}, 2: {'id': 9, 'block': 18, 'check': 'yes'}, 3: {'id': 12, 'block': 17, 'check': 'no'}, 4: {'id': 15, 'block': 23, 'check': 'yes'}, 5: {'id': 15, 'block': 11, 'check': 'yes'}, 6: {'id': 15, 'block': 15, 'check': 'yes'}}
if you specifically (for some reason) want it to be {index: Series}, you could do this, which can be accessed the same way (i.e. df_name[i][col]):
df.T.to_dict('series')
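If the underlying goal is fast row-wise iteration without the memory cost of materialising the whole frame as a dict, df.itertuples() is often a good middle ground: it yields lightweight namedtuples lazily and is usually much faster than iterrows. This is a general suggestion, not something benchmarked on a 10 million x 70 frame. A sketch on the example df above:

```python
import pandas as pd

df = pd.DataFrame({'id': [6, 6, 9], 'block': [25, 32, 18],
                   'check': ['yes', 'no', 'yes']})

# itertuples yields one namedtuple per row; fields are accessed by name,
# and no full copy of the frame is kept in memory
checked = [row.id for row in df.itertuples(index=False) if row.check == 'yes']
print(checked)  # [6, 9]
```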
One of the columns of my pandas dataframe looks like this
>> df
Item
0 [{"id":A,"value":20},{"id":B,"value":30}]
1 [{"id":A,"value":20},{"id":C,"value":50}]
2 [{"id":A,"value":20},{"id":B,"value":30},{"id":C,"value":40}]
I want to expand it as
A B C
0 20 30 NaN
1 20 NaN 50
2 20 30 40
I tried:
dfx = pd.DataFrame()
for i in range(df.shape[0]):
    df1 = pd.DataFrame(df.Item[i]).T
    header = df1.iloc[0]
    df1 = df1[1:]
    df1 = df1.rename(columns=header)
    dfx = dfx.append(df1)
But this takes a lot of time as my data is huge. What is the best way to do this?
My original json data looks like this:
[
    {
        '_id': '5b1284e0b840a768f5545ef6',
        'device': '0035sdf121',
        'customerId': '38',
        'variantId': '31',
        'timeStamp': datetime.datetime(2018, 6, 2, 11, 50, 11),
        'item': [{'id': A, 'value': 20},
                 {'id': B, 'value': 30},
                 {'id': C, 'value': 50}]
    },
    {
        '_id': '5b1284e0b840a768f5545ef6',
        'device': '0035sdf121',
        'customerId': '38',
        'variantId': '31',
        'timeStamp': datetime.datetime(2018, 6, 2, 11, 50, 11),
        'item': [{'id': A, 'value': 20},
                 {'id': B, 'value': 30},
                 {'id': C, 'value': 50}]
    },
    .............
]
I agree with @JeffH: you should really look at how you are constructing the DataFrame.
Assuming you are getting this data from somewhere out of your control, you can convert it to your desired DataFrame with:
In []:
pd.DataFrame(df['Item'].apply(lambda r: {d['id']: d['value'] for d in r}).values.tolist())
Out[]:
A B C
0 20 30.0 NaN
1 20 NaN 50.0
2 20 30.0 40.0
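Since the data originally arrives as a list of records each carrying an 'item' list, it may be simpler to flatten those records before ever building the stringly-typed Item column. A sketch with pd.json_normalize plus pivot_table; the variable name records is assumed, and the ids are quoted strings here even though the question's snippet shows them bare:

```python
import pandas as pd

# hypothetical trimmed record; the real ones also carry device,
# customerId, variantId and timeStamp fields
records = [{'_id': '5b1284e0b840a768f5545ef6',
            'item': [{'id': 'A', 'value': 20},
                     {'id': 'B', 'value': 30},
                     {'id': 'C', 'value': 50}]}]

# one row per item entry, keeping the parent _id as metadata
flat = pd.json_normalize(records, record_path='item', meta='_id')

# spread the ids into columns, one row per record
wide = flat.pivot_table(index='_id', columns='id', values='value')
print(wide)
```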
This question already has answers here:
Comparing List against Dict - return key if value matches list
(2 answers)
Closed 6 months ago.
I have dictionaries inside a list:
L= [{'id': 3, 'term': 'bugatti', 'bucket_id': 'ad_3'},
{'id': 4, 'term': 'mercedez', 'bucket_id': 'ad_4'},
{'id': 8, 'term': 'entertainment', 'bucket_id': 'ad_8'},
{'id': 8, 'term': 'entertainment', 'bucket_id': 'ad_8'},
{'id': 9, 'term': 'music', 'bucket_id': 'ad_9'}]
and another list:
words=['bugatti', 'entertainment', 'music','politics']
I want to match the elements of the list words against the key term and get the corresponding dictionaries. The expected output is:
new_list= [{'id': 3, 'term': 'bugatti', 'bucket_id': 'ad_3'},
{'id': 8, 'term': 'entertainment', 'bucket_id': 'ad_8'},
{'id': 8, 'term': 'entertainment', 'bucket_id': 'ad_8'},
{'id': 9, 'term': 'music', 'bucket_id': 'ad_9'}]
What I have tried:
for d in L:
    for k, v in d.items():
        for w in words:
            if v == w:
                print(k, v)
gives me only:
term bugatti
term entertainment
term entertainment
term music
Using a list comprehension.
Ex:
L= [{'id': 3, 'term': 'bugatti', 'bucket_id': 'ad_3'},
{'id': 4, 'term': 'mercedez', 'bucket_id': 'ad_4'},
{'id': 8, 'term': 'entertainment', 'bucket_id': 'ad_8'},
{'id': 8, 'term': 'entertainment', 'bucket_id': 'ad_8'},
{'id': 9, 'term': 'music', 'bucket_id': 'ad_9'}]
words=['bugatti', 'entertainment', 'music','politics']
print([i for i in L if i["term"] in words])
Output:
[{'bucket_id': 'ad_3', 'id': 3, 'term': 'bugatti'},
{'bucket_id': 'ad_8', 'id': 8, 'term': 'entertainment'},
{'bucket_id': 'ad_8', 'id': 8, 'term': 'entertainment'},
{'bucket_id': 'ad_9', 'id': 9, 'term': 'music'}]
You can use a list comprehension, but I've included the full loop so you can see the logic more clearly:
new_l = [i for i in L if i['term'] in words]
Full loop:
new_l = []
for i in L:
    if i['term'] in words:
        new_l.append(i)
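One small refinement if words ever gets long: convert it to a set first, so each membership test is O(1) instead of a scan of the whole list. A sketch on a trimmed version of the data:

```python
L = [{'id': 3, 'term': 'bugatti', 'bucket_id': 'ad_3'},
     {'id': 4, 'term': 'mercedez', 'bucket_id': 'ad_4'},
     {'id': 9, 'term': 'music', 'bucket_id': 'ad_9'}]
words = ['bugatti', 'entertainment', 'music', 'politics']

wanted = set(words)  # constant-time lookups instead of scanning the list
new_l = [d for d in L if d['term'] in wanted]
print(new_l)
```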
print([d for d in L if d["term"] in words])
The problem is that you are printing (k, v), which is just the key and value of one dictionary entry. If you want the whole dictionary, you have to put the dictionary itself in the print statement:
for d in L:
    for k, v in d.items():
        for w in words:
            if v == w:
                print(d)