Create multiple dataframes from list of dictionaries [duplicate] - python

I want to split the following dataframe based on column ZZ
df =
   N0_YLDF  ZZ        MAT
0  6.286333   2  11.669069
1  6.317000   6  11.669069
2  6.324889   6  11.516454
3  6.320667   5  11.516454
4  6.325556   5  11.516454
5  6.359000   6  11.516454
6  6.359000   6  11.516454
7  6.361111   7  11.516454
8  6.360778   7  11.516454
9  6.361111   6  11.516454
As output, I want a new DataFrame with the N0_YLDF column split into 4, one new column for each unique value of ZZ. How do I go about this? I can do groupby, but do not know what to do with the grouped object.
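For the literal output the question asks for (one N0_YLDF column per unique ZZ value), a pivot gets there directly; the answers below split into separate frames instead. A sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'N0_YLDF': [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    'ZZ':      [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    'MAT':     [11.669069, 11.669069] + [11.516454] * 8,
})

# One column per unique ZZ value; rows keep their original index,
# so positions belonging to other groups are NaN.
wide = df.pivot(columns='ZZ', values='N0_YLDF')
print(wide)
```

This keeps all ten rows, so each of the four columns is mostly NaN; drop the NaNs per column if you want compact series.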

gb = df.groupby('ZZ')
[gb.get_group(x) for x in gb.groups]

There is another alternative: the groupby object is iterable and yields (key, frame) pairs, so a list comprehension can pick out the second element (the frame).
dfs = [x for _, x in df.groupby('ZZ')]

In R there is a dataframe method called split. This is for all the R users out there:
def split(df, group):
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]
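A quick check of that helper on a small frame (a sketch; the column values are made up):

```python
import pandas as pd

def split(df, group):
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]

df = pd.DataFrame({'N0_YLDF': [6.286333, 6.317000, 6.320667],
                   'ZZ': [2, 6, 5]})

# One sub-DataFrame per unique ZZ value, in sorted key order.
parts = split(df, 'ZZ')
print(len(parts))
```

Note the list order follows the sorted group keys, not the original row order.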

Store them in a dict, which gives you access to the group DataFrames by their group keys.
d = dict(tuple(df.groupby('ZZ')))
d[6]
# N0_YLDF ZZ MAT
#1 6.317000 6 11.669069
#2 6.324889 6 11.516454
#5 6.359000 6 11.516454
#6 6.359000 6 11.516454
#9 6.361111 6 11.516454
If you need only a subset of the DataFrame, in this case just the 'N0_YLDF' Series, you can use a dict comprehension instead.
d = {idx: gp['N0_YLDF'] for idx, gp in df.groupby('ZZ')}
d[6]
#1 6.317000
#2 6.324889
#5 6.359000
#6 6.359000
#9 6.361111
#Name: N0_YLDF, dtype: float64

Related

How to make new Dataframes by each column values? (outputs should be Dataframes) [duplicate]


Create different dataframes according to a column value in Pandas [duplicate]


How do I transform the series by taking a span of 15 elements and averaging them?

I have this pandas code (sorry for my English...):
dataset = read_csv('data.csv', header=None)
dataset.plot(figsize=(12,6))
The file data.csv contains data like this:
0
0     2481.05700
1     2481.05955
2     2481.06895
3     2481.06770
4     2481.06075
...
3053  2481.80190
3054  2481.78990
3055  2481.79275
3056  2481.78220
3057  2481.76360
I need to transform the series so that each span of 15 elements is averaged.
I've seen the resample method used for similar tasks. But how to use it correctly for this situation, I don't know.
I think you're looking for something like this, which groups the dataframe into chunks of 15 rows and averages each:
dataset.groupby(dataset.index//15).mean()
Example:
df = pd.DataFrame({"A":range(10)})
df
# A
#0 0
#1 1
#2 2
#3 3
#4 4
#5 5
#6 6
#7 7
#8 8
#9 9
df.groupby(df.index//2).mean()
# A
#0 0.5
#1 2.5
#2 4.5
#3 6.5
#4 8.5
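Since the question mentions resample: that method needs a datetime-like index, so it only applies here if you fabricate one. A sketch, assuming each row represents one second (the dates and frequency are made up for illustration):

```python
import pandas as pd

s = pd.Series(range(30), name='value')

# resample() requires a datetime-like index; treat each row as one
# second, then average over 15-second bins.
s.index = pd.date_range('2024-01-01', periods=len(s), freq='s')
binned = s.resample('15s').mean()
print(binned)
```

For plain positional data, the `groupby(index // 15)` approach above is simpler; resample pays off when the data really is a time series with gaps or irregular spacing.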

Pandas merging columns by reverse complement string

So I am stuck on how to approach a data manipulation technique in pandas. I have an example dataframe below with a sum of 25 counts in each row.
I would like to merge column names by the reverse complement sequence.
AA  CC  GG  AT  TT
 4   7   0   9   5
 3   8   5   5   2
 8   6   2   8   1
The columns "AA" and "TT" are reverse complements of each other, as are "CC" and "GG". The desired output:
AA/TT  CC/GG  AT
    9      7   9
    5     13   5
    9      8   8
How can I match the reverse complement of a column name and merge that pair of columns into one?
Note: I already have a function to find the reverse complement of a string.
I'd suggest just creating a new frame using pd.concat:
new_df = pd.concat([df[['AA', 'TT']].sum(axis=1).rename('AA/TT'),
                    df[['CC', 'GG']].sum(axis=1).rename('CC/GG'),
                    df['AT']], axis=1)
>>> new_df
AA/TT CC/GG AT
0 9 7 9
1 5 13 5
2 9 8 8
More generally, you could do it in a list comprehension. Given the reverse complements:
reverse_complements = [['AA', 'TT'], ['CC', 'GG']]
Find those values in your original dataframe columns that are not in reverse complements (there might be a better way here, but this works; note pd.np was removed from recent pandas, so import numpy directly):
import numpy as np
reverse_complements.append(df.columns.difference(
    np.array(reverse_complements).flatten()))
And use pd.concat with a list comprehension:
new_df = pd.concat([df[x].sum(axis=1).rename('/'.join(x)) for x in reverse_complements],
                   axis=1)
>>> new_df
AA/TT CC/GG AT
0 9 7 9
1 5 13 5
2 9 8 8
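The question says a reverse-complement function already exists; building the pairs from it, rather than hard-coding them, could look like this sketch (the `revcomp` helper here is a stand-in for the asker's own function):

```python
import pandas as pd

# Minimal stand-in for the asker's reverse-complement function.
def revcomp(seq):
    comp = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(comp[b] for b in reversed(seq))

df = pd.DataFrame({'AA': [4, 3, 8], 'CC': [7, 8, 6], 'GG': [0, 5, 2],
                   'AT': [9, 5, 8], 'TT': [5, 2, 1]})

# Pair each column with its reverse complement, visiting each pair once.
pieces, seen = [], set()
for col in df.columns:
    if col in seen:
        continue
    rc = revcomp(col)
    if rc != col and rc in df.columns:
        # Sum the pair into one merged column named "X/Y".
        pieces.append(df[[col, rc]].sum(axis=1).rename(f'{col}/{rc}'))
        seen.update({col, rc})
    else:
        # Palindromic (self-complementary) or unpaired columns pass through.
        pieces.append(df[col])
        seen.add(col)

new_df = pd.concat(pieces, axis=1)
print(new_df)
```

Note that 'AT' is its own reverse complement, so it falls through the palindrome branch unchanged, matching the desired output.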

Pandas DataFrame stack multiple column values into single column

Assuming the following DataFrame:
key.0 key.1 key.2 topic
1 abc def ghi 8
2 xab xcd xef 9
How can I combine the values of all the key.* columns into a single column 'key', that's associated with the topic value corresponding to the key.* columns? This is the result I want:
topic key
1 8 abc
2 8 def
3 8 ghi
4 9 xab
5 9 xcd
6 9 xef
Note that the number of key.N columns is variable on some external N.
You can melt your dataframe:
>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
topic variable key
0 8 key.0 abc
1 9 key.0 xab
2 8 key.1 def
3 9 key.1 xcd
4 8 key.2 ghi
5 9 key.2 xef
It also gives you the source of the key.
From v0.20, melt is a first class function of the pd.DataFrame class:
>>> df.melt('topic', value_name='key').drop('variable', axis=1)
topic key
0 8 abc
1 9 xab
2 8 def
3 9 xcd
4 8 ghi
5 9 xef
After trying various ways, I find the following is more or less intuitive, provided stack's magic is understood:
# keep topic as index, stack other columns 'against' it
stacked = df.set_index('topic').stack()
# set the name of the new series created
df = stacked.reset_index(name='key')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)
The resulting dataframe is as required:
topic key
0 8 abc
1 8 def
2 8 ghi
3 9 xab
4 9 xcd
5 9 xef
You may want to print intermediary results to understand the process in full. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').
OK, since one of the current answers was marked as a duplicate of this question, I will answer here.
By using wide_to_long (the last argument just names the column that receives the key.* suffixes):
pd.wide_to_long(df, ['key'], 'topic', 'age').reset_index().drop('age', axis=1)
Out[123]:
topic key
0 8 abc
1 9 xab
2 8 def
3 9 xcd
4 8 ghi
5 9 xef
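Both melt and wide_to_long emit rows grouped by the key.* suffix, while the question's desired table is grouped by topic. If that exact order matters, a stable sort after melting reproduces it; a sketch on the example frame:

```python
import pandas as pd

df = pd.DataFrame({'key.0': ['abc', 'xab'], 'key.1': ['def', 'xcd'],
                   'key.2': ['ghi', 'xef'], 'topic': [8, 9]})

# Melt, drop the suffix column, then stable-sort so rows sharing a
# topic stay in their original key.0, key.1, key.2 order.
long = (df.melt('topic', value_name='key')
          .drop(columns='variable')
          .sort_values('topic', kind='mergesort')
          .reset_index(drop=True))
print(long)
```

The `kind='mergesort'` choice matters: a stable sort preserves the relative order of equal topics, which is what keeps abc before def before ghi.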
