Create different dataframes from a main dataframe and order by frequency - python

I have a dataframe which looks like this:

a  b  name
1  1  abc
2  2  xyz
3  3  abc
4  4  dfg

Now I need to create multiple dataframes based on the names, so that df_abc holds all the rows for name "abc", and so on. I tried using a for loop, but I'm new to Python and was not able to solve it. Thanks!
df_abc

a  b  name
1  1  abc
3  3  abc

You can use .groupby, which yields (name, group) tuples. With a dict comprehension you can then access each of these dataframes as my_dict["abc"]:

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "name": ["abc", "xyz", "abc", "dfg"]}
)
my_dict = {name: group for name, group in df.groupby("name")}
for name, df_val in my_dict.items():
    print(f"df:\n{df_val}\n")
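For completeness, retrieving one of the per-name frames then looks like this (df_abc as named in the question):

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "name": ["abc", "xyz", "abc", "dfg"]}
)
my_dict = {name: group for name, group in df.groupby("name")}

# Each group keeps its original row index, so df_abc matches the expected output.
df_abc = my_dict["abc"]
```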

You could create a dictionary of dataframes holding the different sets of data, filtered by the unique values in the 'name' column. Then you can reference each dataframe as you would reference a dictionary entry. See the example below:
import pandas as pd
from io import StringIO
d = """
a b name
1 1 abc
2 2 xyz
3 3 abc
4 4 dfg
"""
df = pd.read_csv(StringIO(d), sep=" ")[['a', 'b', 'name']]

dfs = {}
for item in df['name'].unique():    # unique() avoids refiltering duplicate names
    dfs[item] = df.loc[df['name'] == item]
>>> dfs.keys()
dict_keys(['abc', 'xyz', 'dfg'])
>>> dfs['abc']
a b name
0 1 1 abc
2 3 3 abc
>>> dfs['xyz']
a b name
1 2 2 xyz
>>> dfs['dfg']
a b name
3 4 4 dfg
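The question title also mentions ordering by frequency; if that means iterating the names from most to least frequent, value_counts can drive the dict construction (a minimal sketch):

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "name": ["abc", "xyz", "abc", "dfg"]}
)

# value_counts() sorts names from most to least frequent, so building the
# dict in that order yields frequency-ordered keys.
counts = df["name"].value_counts()
dfs = {name: df[df["name"] == name] for name in counts.index}
```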

Related

how to transform a dict of lists to a dataframe in python?

I have a dict in python like this:

d = {"a": [1,2,3], "b": [4,5,6]}

I want to transform it into a dataframe like this:

  letter  number
       a       1
       a       2
       a       3
       b       4
       b       5
       b       6
I have tried this code:

df = pd.DataFrame.from_dict(d, orient='index').T

but this gave me:

   a  b
0  1  4
1  2  5
2  3  6
You can read your data in as you already have and then .melt it. When passed no id_vars or value_vars, melt turns each of your columns into its own rows:
import pandas as pd
d = {"a": [1,2,3], "b": [4,5,6]}
out = pd.DataFrame(d).melt(var_name='letter', value_name='value')
print(out)
letter value
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
To use 'letter' and 'number' as column labels you could use:
a2 = [[key, val] for key, x in d.items() for val in x]
dict2 = pd.DataFrame(a2, columns = ['letter', 'number'])
which gives
letter number
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
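Since melt accepts a value_name parameter, the 'letter'/'number' labels from the question can also be produced directly, without a second step (a small variation on the melt snippet above):

```python
import pandas as pd

d = {"a": [1, 2, 3], "b": [4, 5, 6]}

# var_name/value_name set the output column labels in one call.
out = pd.DataFrame(d).melt(var_name="letter", value_name="number")
print(out)
```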
Yet another possible solution:

(pd.Series(d, name='numbers')
   .rename_axis('letters')
   .reset_index()
   .explode('numbers', ignore_index=True))
Output:
letters numbers
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
This will yield what you want (there might be a simpler way though):

import pandas as pd
my_dict = {"a": [1,2,3], "b": [4,5,6]}
my_list = [[key, val] for key in my_dict for val in my_dict[key]]
df = pd.DataFrame(my_list, columns=['letter', 'number'])
df
# Out[106]:
# letter number
# 0 a 1
# 1 a 2
# 2 a 3
# 3 b 4
# 4 b 5
# 5 b 6

updating and combining two Pandas DataFrames

I would like to update a Pandas DataFrame by summation: where an ID exists in both DataFrames its values are summed, and where an ID exists in only one, that row should be included as-is. For example, say there are two DataFrames like this:
import pandas as pd
d1 = pd.DataFrame({'ID': ["A", "B", "C", "D"], "value": [2, 3, 4, 5]})
d2 = pd.DataFrame({'ID': ["B", "D", "E"], "value": [1, 3, 2]})
Then, the final output that I would like to produce is as follows:
ID value
0 A 2
1 B 4
2 C 4
3 D 8
4 E 2
Do you have any ideas on this? I have tried to do it with update or concat functions, but this is not the way for producing the results that I want to produce. Thanks in advance.
Use concat and aggregate sum:
df = pd.concat([d1, d2]).groupby('ID', as_index=False).sum()
print (df)
ID value
0 A 2
1 B 4
2 C 4
3 D 8
4 E 2
Another idea, if ID is unique within each DataFrame: convert ID to the index and use DataFrame.add:
df = d1.set_index('ID').add(d2.set_index('ID'), fill_value=0).reset_index()
print (df)
ID value
0 A 2.0
1 B 4.0
2 C 4.0
3 D 8.0
4 E 2.0
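Since fill_value promotes the result to float (as the output above shows), casting back is a one-liner if integer values are wanted; a small follow-up sketch:

```python
import pandas as pd

d1 = pd.DataFrame({'ID': ["A", "B", "C", "D"], "value": [2, 3, 4, 5]})
d2 = pd.DataFrame({'ID': ["B", "D", "E"], "value": [1, 3, 2]})

# fill_value=0 substitutes 0 for IDs missing on one side, but the alignment
# step promotes the column to float, so cast back to int afterwards.
df = (d1.set_index('ID')
        .add(d2.set_index('ID'), fill_value=0)
        .astype(int)
        .reset_index())
print(df)
```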

Pandas get data from a list of column names

I have a pandas dataframe df and a list of column names columns, like so:
df = pd.DataFrame({
    'A': ['b','b','c','d'],
    'C': ['b1','b2','c1','d2'],
    'B': list(range(4))})
columns = ['A','B']
Now I want to get all the data from these columns of the dataframe in one single series, like so:

b
0
b
1
c
2
d
3
This is what I tried:
srs = pd.Series()
srs.append(df[column].values for column in columns)
But it is throwing this error:
TypeError: cannot concatenate object of type '<class 'generator'>';
only Series and DataFrame objs are valid
How can I fix this issue?
I think you can use numpy.ravel:

import numpy as np

srs = pd.Series(np.ravel(df[columns]))
print (srs)
0 b
1 0
2 b
3 1
4 c
5 2
6 d
7 3
dtype: object
Or DataFrame.stack with Series.reset_index and drop=True:
srs = df[columns].stack().reset_index(drop=True)
If a different order is acceptable, it is possible to use DataFrame.melt:
srs = df[columns].melt()['value']
print (srs)
0 b
1 b
2 c
3 d
4 0
5 1
6 2
7 3
Name: value, dtype: object
You could do:
from itertools import chain
import pandas as pd
df = pd.DataFrame({
    'A': ['b','b','c','d'],
    'C': ['b1','b2','c1','d2'],
    'B': list(range(4))})
columns = ['A','B']
res = pd.Series(chain.from_iterable(df[columns].to_numpy()))
print(res)
Output
0 b
1 0
2 b
3 1
4 c
5 2
6 d
7 3
dtype: object
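The generator-based append from the question can also be repaired with pd.concat, which does accept a list of Series; note this yields the column-wise order (as with stack/melt), not ravel's row-interleaved order:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['b', 'b', 'c', 'd'],
    'C': ['b1', 'b2', 'c1', 'd2'],
    'B': list(range(4))})
columns = ['A', 'B']

# Concatenate the selected columns end to end; ignore_index renumbers 0..n-1.
srs = pd.concat([df[c] for c in columns], ignore_index=True)
print(srs)
```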

Creating multiple dataframes from a dictionary in a loop

I have a dictionary like the below
d = {'a':'1,2,3','b':'3,4,5,6'}
I want to create dataframes from it in a loop, such as
a = 1,2,3
b = 3,4,5,6
Creating a single dataframe that can reference dictionary keys such as df['a'] does not work for what I am trying to achieve. Any suggestions?
Try this to get a list of dataframes:
>>> import pandas as pd
>>> import numpy as np
>>> dfs = [pd.DataFrame(np.array(b.split(',')), columns=list(a)) for a,b in d.items()]
gives the following output
>>> dfs[0]
a
0 1
1 2
2 3
>>> dfs[1]
b
0 3
1 4
2 5
3 6
To convert your dictionary into a list of DataFrames, run:
lst = [pd.Series(v.split(','), name=k).to_frame()
       for k, v in d.items()]
Then, for your sample data, lst[0] contains:
a
0 1
1 2
2 3
and lst[1]:
b
0 3
1 4
2 5
3 6
Hope this helps:

dfs = []
for key, value in d.items():
    df = pd.DataFrame({key: value.split(',')})
    dfs.append(df)
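If dataframes keyed by name are more convenient than a positional list, a dict comprehension gives the same split (a minimal sketch):

```python
import pandas as pd

d = {'a': '1,2,3', 'b': '3,4,5,6'}

# One single-column DataFrame per key; split(',') turns each string into values.
dfs = {k: pd.DataFrame({k: v.split(',')}) for k, v in d.items()}
```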

grouping rows in a list of lists in pandas

I have a dataframe that looks like this:

ID  Description
 1  A
 1  B
 1  C
 2  A
 2  C
 3  A
I would like to group by the ID column and get the description as a list of lists, like this:

ID  Description
 1  [["A"],["B"],["C"]]
 2  [["A"],["C"]]
 3  [["A"]]
I tried df.groupby('ID')['Description'].apply(list), but this creates only the "first level" of lists.
This is slightly different from #jezrael's answer in that the listifying of strings is done via map. In addition, calling reset_index() adds "Description" explicitly to the output.
import pandas as pd
df = pd.DataFrame([[1, 'A'], [1, 'B'], [1, 'C'], [2, 'A'], [2, 'C'], [3, 'A']], columns=['ID', 'Description'])
df.groupby('ID')['Description'].apply(list).apply(lambda x: list(map(list, x))).reset_index()
# ID Description
# 1 [[A], [B], [C]]
# 2 [[A], [C]]
# 3 [[A]]
You need to create inner lists:
print (df)
ID Description
0 1 Aas
1 1 B
2 1 C
3 2 A
4 2 C
5 3 A
df = df['Description'].apply(lambda x: [x]).groupby(df['ID']).apply(list).reset_index()
Another solution, similar to #jp_data_analysis's, with one apply:
df = df.groupby('ID')['Description'].apply(lambda x: [[y] for y in x]).reset_index()
And a pure Python solution:

a = list(zip(df['ID'], df['Description']))
d = {}
for k, v in a:
    d.setdefault(k, []).append([v])

df = pd.DataFrame({'ID': list(d.keys()), 'Description': list(d.values())},
                  columns=['ID', 'Description'])
print (df)
ID Description
0 1 [[Aas], [B], [C]]
1 2 [[A], [C]]
2 3 [[A]]
