I'm new to Python; can anyone help me with this?
For example, I have a DataFrame:
data = pd.DataFrame({'a': [1,1,2,2,2,3], 'b': [12,22,23,34,44,55], 'c': ['a','','','','c',''], 'd': ['','b','b','a','a','']})
I want to group by a, ignoring the differences in b, and combine the values in c and d, so the result looks like:
data = ({'a':[1,2,3],'c':['a','c',''],'d':['b','baa','']})
How can I do this?
Your question is a bit difficult to understand, but if I guess correctly, this could be a solution:
data = {'a': [1,1,2,2,2,3], 'b': [12,22,23,34,44,55], 'c':['a','','','','c',''], 'd':['','b','b','a','a','']}
new_data = {k: list(set(v)) for k, v in data.items()}
{'a': [1, 2, 3],
'b': [34, 12, 44, 55, 22, 23],
'c': ['', 'a', 'c'],
'd': ['', 'a', 'b']}
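Note that set() does not preserve the original order of the values. If order matters, a minimal tweak (assuming Python 3.7+ dict ordering) is to deduplicate with dict.fromkeys instead:
new_data = {k: list(dict.fromkeys(v)) for k, v in data.items()}
# {'a': [1, 2, 3], 'b': [12, 22, 23, 34, 44, 55], 'c': ['a', '', 'c'], 'd': ['', 'b', 'a']}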
You need groupby + agg
data.groupby('a').agg({'b': 'sum', 'c': lambda x: ''.join(x), 'd': lambda x: ''.join(x)}).reset_index()
Out[54]:
   a    d  c    b
0  1    b  a   34
1  2  baa  c  101
2  3           55
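If you are on pandas 0.25 or newer, a minimal sketch of the same aggregation written with named aggregation (same column names as above, result column order made explicit):
import pandas as pd

data = pd.DataFrame({'a': [1, 1, 2, 2, 2, 3],
                     'b': [12, 22, 23, 34, 44, 55],
                     'c': ['a', '', '', '', 'c', ''],
                     'd': ['', 'b', 'b', 'a', 'a', '']})

# Named aggregation keeps the output column names explicit.
out = data.groupby('a', as_index=False).agg(
    b=('b', 'sum'),
    c=('c', ''.join),
    d=('d', ''.join),
)
print(out)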
I am trying to convert the DataFrame below to a dictionary.
I want to group by the first column and collect each repeating sequence into its own dictionary. For example:
Example 1:
   n1 v1  v2
2   A  C   3
3   A  D   4
4   A  C   5
5   A  D   6
Expected output:
{'A': [{'C':'3','D':'4'},{'C':'5','D':'6'}]}
Example 2:
n1 n2 v1 v2
s1 A C 3
s1 A D 4
s1 A C 5
s1 A D 6
s1 B P 6
s1 B Q 3
Expected Output:
{'s1': {'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}], 'B': {'P': 6, 'Q': 3}}}
So basically C and D repeat as a sequence; I want to club each C/D pair into one dictionary and make a list of those dictionaries if the pair occurs multiple times.
Please note, I am currently using the code below:
def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1: return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d
This returns:
{'s1': {'A': {'C': array(['3', '5'], dtype=object), 'D': array(['4', '6'], dtype=object)}, 'B': {'E': '5', 'F': '6'}}}
Also, there can be another series s2 with E, F, G, E, F, G repeating, and some X and Y having single values.
Let's create a function dictify which creates a dictionary with top-level keys from the n1 column and clubs the repeating occurrences of values in column v1 into separate sub-dictionaries:
from collections import defaultdict

def dictify(df):
    dct = defaultdict(list)
    for k, g in df.groupby(['n1', df.groupby(['n1', 'v1']).cumcount()]):
        dct[k[0]].append(dict([*g[['v1', 'v2']].values]))
    return dict(dct)
dictify(df)
{'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}]}
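A quick note on the grouping trick above, shown as a minimal sketch on the Example 1 frame: groupby(['n1', 'v1']).cumcount() numbers the repeated (n1, v1) pairs, so the first C/D pair gets 0 and the second gets 1, which is what splits them into separate sub-dictionaries.
import pandas as pd

df = pd.DataFrame({'n1': ['A', 'A', 'A', 'A'],
                   'v1': ['C', 'D', 'C', 'D'],
                   'v2': [3, 4, 5, 6]})

# Each repeated (n1, v1) pair gets an increasing counter: 0, 0, 1, 1
print(df.groupby(['n1', 'v1']).cumcount().tolist())   # [0, 0, 1, 1]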
UPDATE:
In case there is a variable number of primary grouping keys, i.e. [n1, n2, ...], we can use a more generic method:
import numpy as np

def update(dct, keys, val):
    k, *rest = keys
    dct[k] = update(dct.get(k, {}), rest, val) if rest \
        else [*np.hstack([dct[k], [val]])] if k in dct else val
    return dct

def dictify(df, keys):
    dct = dict()
    for k, g1 in df.groupby(keys):
        for _, g2 in g1.groupby(g1.groupby('v1').cumcount()):
            update(dct, k, dict([*g2[['v1', 'v2']].values]))
    return dict(dct)
dictify(df, ['n1', 'n2'])
{'s1': {'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}], 'B': {'P': 6, 'Q': 3}}}
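For reference, a minimal sketch of calling it on the Example 2 data (column names as in the question; the s2/X/Y case mentioned above would be handled the same way):
import pandas as pd

df = pd.DataFrame({'n1': ['s1'] * 6,
                   'n2': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'v1': ['C', 'D', 'C', 'D', 'P', 'Q'],
                   'v2': [3, 4, 5, 6, 6, 3]})

print(dictify(df, ['n1', 'n2']))
# {'s1': {'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}], 'B': {'P': 6, 'Q': 3}}}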
Here is a compact function built around a single dictionary comprehension that solves your problem:
def df_to_dict(df):
    return {name: [dict(x.to_dict('split')['data'])
                   for _, x in d.drop('name', axis=1).groupby(d.index // 2)]
            for name, d in df.groupby('name')}
Here is an example:
df = pd.DataFrame({'name': ['A'] * 4,
                   'v1': ['C', 'D'] * 2,
                   'v2': [3, 4, 5, 6]})
print(df_to_dict(df))
Output:
{'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}]}
I have a data set like this:
data = ({'A': ['John', 'Dan', 'Tom', 'Mary'], 'B': [1, 3, 4, 5], 'C': ['Tom', 'Mary', 'Dan', 'Mike'], 'D': [3, 4, 6, 12]})
Here, Dan in column A has the corresponding number 3 in B, and Dan in column C has the corresponding number 6 in D.
I would like to create 2 new columns, one with the name Dan and the other with 9 (3+6).
Desired Output
data = ({'A': ['John', 'Dan', 'Tom', 'Mary'], 'B': [1, 3, 4, 5], 'C': ['Tom', 'Mary', 'Dan', 'Mike'], 'D': [3, 4, 6, 12], 'E': ['Dan', 'Tom', 'Mary'], 'F': [9, 7, 9], 'G': ['John', 'Mike'], 'H': [1, 12]})
For the names John and Mike, which appear only once, I want two more columns (G and H) with their values unchanged.
I have tried using some for loops and .loc, but I am not anywhere close.
Thanks!
df = data[['A', 'B']]              # data is assumed to be a DataFrame built from the dict above
_df = data[['C', 'D']].copy()
_df.columns = ['A', 'B']           # align the names so the two frames can be stacked
df = pd.concat([df, _df]).groupby('A', as_index=False)['B'].sum()
df.columns = ['E', 'F']
data = data.merge(df, how='left', left_on='A', right_on='E')
Although you can join on column C too, that's something you have to choose. Alternatively, if you want just columns E and F, skip the last line.
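For reference, after the groupby/sum step the intermediate frame should look roughly like this (totals computed from the data in the question, e.g. Dan = 3 + 6 = 9):
print(df)
#       E   F
# 0   Dan   9
# 1  John   1
# 2  Mary   9
# 3  Mike  12
# 4   Tom   7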
You can try this:
import pandas as pd
data = {'A': ['John', 'Dan', 'Tom', 'Mary'], 'B': [1, 3, 4, 5], 'C': ['Tom', 'Mary', 'Dan', 'Mike'], 'D': [3, 4, 6, 12]}
df = pd.DataFrame(data)
df = df.rename(columns={"C": "A", "D": "B"})         # give both name/value column pairs the same labels
df = df.stack().reset_index(0, drop=True).rename_axis("index").reset_index()
df = df.pivot(index=df.index // 2, columns="index")  # every two stacked rows form one (A, B) pair
df.columns = map(lambda x: x[1], df.columns)         # flatten the pivoted columns
df = df.groupby("A", as_index=False).sum()           # sum the values per name
Outputs:
>>> df
A B
0 Dan 9
1 John 1
2 Mary 9
3 Mike 12
4 Tom 7
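If you also want the totals attached to the original frame as new columns E and F (as in the desired output), a minimal sketch following the merge idea from the previous answer (the column names E/F are taken from the question):
out = pd.DataFrame(data).merge(df.rename(columns={'A': 'E', 'B': 'F'}),
                               how='left', left_on='A', right_on='E')
print(out)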
I am trying to create a dictionary from a DataFrame where the key sometimes has multiple values.
For example:
df
ID value
A 10
B 45
C 20
C 30
D 20
E 10
E 70
E 110
F 20
And I want the dictionary to look like:
dic = {'A': 10,
'B': 45,
'C':[20,30],
'D': 20,
'E': [10,70,110],
'F': 20}
I tried using the following code:
dic=df.set_index('ID').T.to_dict('list')
But it returned a dictionary with only one value per ID:
{'A': 10,
'B': 45,
'C': 30,
'D': 20,
'E': 110,
'F': 20}
I'm assuming the right way to go about it is with some kind of loop appending to an empty dictionary but I'm not sure what the proper syntax would be.
My actual DataFrame is much longer than this, so what would I use to convert the DataFrame to the dictionary?
Thanks!
Example DataFrame:
df = pd.DataFrame({'ID':['A', 'B', 'B'], 'value': [1,2,3]})
df_tmp = df.groupby('ID')['value'].apply(list).reset_index()
dict(zip(df_tmp['ID'], df_tmp['value']))
Outputs:
{'A': [1], 'B': [2, 3]}
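If you want keys with a single row to map to a plain scalar rather than a one-element list (as in your desired dictionary), a minimal post-processing sketch:
d = df.groupby('ID')['value'].agg(list).to_dict()
d = {k: v[0] if len(v) == 1 else v for k, v in d.items()}
print(d)   # {'A': 1, 'B': [2, 3]} for the example frame above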
Let us suppose I have created 3 lists and I want to create a dictionary from them, e.g.
a= ['A', 'B', 'C', 'D']
b =[1, 2, 3, 4]
c = [9, 8, 7, 6]
Now what I want is to create a dictionary like this:
{'A': {'1': '9'}, 'B': {'2': '8'}, 'C': {'3': '7'}, 'D': {'4': '6'}}
Is it possible? Can someone help me with this?
You can create the dictionary from the zipped lists and convert the int values to strings, if I understood your question properly:
dct = {x: {str(y): str(z)} for x, y, z in zip(a,b,c)}
Output:
{'A': {'1': '9'}, 'C': {'3': '7'}, 'B': {'2': '8'}, 'D': {'4': '6'}}
You can also use map() here:
a = ['A', 'B', 'C', 'D']
b = [1, 2, 3, 4]
c = [9, 8, 7, 6]
dct = dict(map(lambda x, y, z : (x, {str(y): str(z)}), a, b, c))
print(dct)
Which outputs:
{'A': {'1': '9'}, 'B': {'2': '8'}, 'C': {'3': '7'}, 'D': {'4': '6'}}
{ a[x]: {b[x]: c[x]} for x in range(len(a))}
or, if you really want the keys and values as strings as in your example:
{ a[x]: {str(b[x]): str(c[x])} for x in range(len(a))}
a = ['A', 'B', 'C', 'D'] # don't forget the quotation marks
b = [1, 2, 3, 4]
c = [9, 8, 7, 6]
res = dict()
for i, index_a in enumerate(a):
    res[index_a] = {str(b[i]): str(c[i])}
Edit: Alternatively, as a one-liner with a generator expression (mainly for the voters in here, as it's more advanced Python and harder to read):
res = dict((a[i], {str(b[i]): str(c[i])}) for i in range(len(a)))
You can try this:
a= ['A', 'B', 'C', 'D']
b =[1, 2, 3, 4]
c = [9, 8, 7, 6]
new_data = dict([[a, dict([map(str, i)])] for a, i in zip(a, zip(b, c))])
Output:
{'A': {'1': '9'}, 'C': {'3': '7'}, 'B': {'2': '8'}, 'D': {'4': '6'}}
Or
new_data = dict(zip(a, map(lambda x:dict([x]), zip(map(str, b), map(str, c)))))
Assuming what you want is for the items of a to be the keys of the outer dictionary, and b and c the key and value of the inner dicts:
d = {k: {x: y} for k, x, y in zip(a, b, c)}
Update:
However, in your example x and y are strings, so if that's what you want:
d = {k: {str(x): str(y)} for k, x, y in zip(a, b, c)}
Are you looking for something like this?
a= ['A', 'B', 'C', 'D']
b =[1, 2, 3, 4]
c = [9, 8, 7, 6]
new_dict={}
set(map(lambda x,y,z:(new_dict.__setitem__(x,{y,z})),a,b,c))
print(new_dict)
output:
{'D': {4, 6}, 'A': {9, 1}, 'B': {8, 2}, 'C': {3, 7}}
I have a list of dictionaries in "my_list" as follows:
my_list = [{'Id': '100', 'A': [val1, val2], 'B': [val3, val4], 'C': [val5, val6]},
           {'Id': '200', 'A': [val7, val8], 'B': [val9, val10], 'C': [val11, val12]},
           {'Id': '300', 'A': [val13, val14], 'B': [val15, val16], 'C': [val17, val18]}]
I want to write this list into a CSV file as follows:
ID, A, AA, B, BB, C, CC
100, val1, val2, val3, val4, val5, val6
200, val7, val8, val9, val10, val11, val12
300, val13, val14, val15, val16, val17, val18
Does anyone know how can I handle it?
Tablib should do the trick.
I'll leave here the example from their front page (which you can adapt to the CSV format):
>>> data = tablib.Dataset(headers=['First Name', 'Last Name', 'Age'])
>>> for i in [('Kenneth', 'Reitz', 22), ('Bessie', 'Monke', 21)]:
... data.append(i)
>>> print(data.export('json'))
[{"Last Name": "Reitz", "First Name": "Kenneth", "Age": 22}, {"Last Name": "Monke", "First Name": "Bessie", "Age": 21}]
>>> print(data.export('yaml'))
- {Age: 22, First Name: Kenneth, Last Name: Reitz}
- {Age: 21, First Name: Bessie, Last Name: Monke}
>>> data.export('xlsx')
<censored binary data>
>>> data.export('df')
First Name Last Name Age
0 Kenneth Reitz 22
1 Bessie Monke 21
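Adapted to the structure of your my_list, a minimal sketch (assuming the val… placeholders hold real values, and out.csv is just an example file name) could look like this:
import tablib

dataset = tablib.Dataset(headers=['ID', 'A', 'AA', 'B', 'BB', 'C', 'CC'])
for row in my_list:
    # Flatten each dict into one CSV row: Id followed by the two values from A, B and C.
    dataset.append([row['Id'], *row['A'], *row['B'], *row['C']])

with open('out.csv', 'w', newline='') as f:
    f.write(dataset.export('csv'))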
You could do this... (replacing print with a csv writerow as appropriate)
print(['ID', 'A', 'AA', 'B', 'BB', 'C', 'CC'])
for row in my_list:
    out_row = []
    out_row.append(row['Id'])
    for v in row['A']:
        out_row.append(v)
    for v in row['B']:
        out_row.append(v)
    for v in row['C']:
        out_row.append(v)
    print(out_row)
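For completeness, a minimal sketch of the same idea actually writing the rows with the standard csv module (the file name out.csv is just an example):
import csv

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['ID', 'A', 'AA', 'B', 'BB', 'C', 'CC'])
    for row in my_list:
        writer.writerow([row['Id'], *row['A'], *row['B'], *row['C']])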
You can use pandas to do the trick:
my_list = [{'Id': '100', 'A': [val1, val2], 'B': [val3, val4], 'C': [val5, val6]},
{'Id': '200', 'A': [val7, val8], 'B': [val9, val10], 'C': [val11, val12]},
{'Id': '300', 'A': [val13, val14], 'B': [val15, val16], 'C': [val17, val18]}]
import pandas as pd

index = ['Id', 'A', 'AA', 'B', 'BB', 'C', 'CC']
df = pd.DataFrame(data=my_list)
for letter in ['A', 'B', 'C']:
    first = []
    second = []
    for a in df[letter].values.tolist():
        first.append(a[0])
        second.append(a[1])
    df[letter] = first           # first value of each pair, e.g. column A
    df[letter * 2] = second      # second value of each pair, e.g. column AA
df = df.reindex(columns=index)   # reindex_axis was removed from pandas; reindex(columns=...) does the same
df.to_csv('out.csv')
This produces the following output as dataframe:
Id A AA B BB C CC
0 100 1 2 3 4 5 6
1 200 7 8 9 10 11 12
2 300 13 14 15 16 17 18
and this is the out.csv-file:
,Id,A,AA,B,BB,C,CC
0,100,1,2,3,4,5,6
1,200,7,8,9,10,11,12
2,300,13,14,15,16,17,18
See the pandas documentation for DataFrame.to_csv, which writes a DataFrame to a comma-separated values (CSV) file.