Transpose fields in a CSV with Python (NumPy or Pandas)

I need to do something like this:

Input:
ID  20170101  20170106  20170111
A   0.31      0.1       0.2
B   0.3       0.2       0.1
C   0.11      0.12      0.13
D   0.3       0.3       0.4

Desired output:
ID  DATES     NDVI_mean
A   20170101  0.31
A   20170106  0.1
A   20170111  0.2
B   20170101  0.3
B   20170106  0.2
B   20170111  0.1
C   20170101  0.11
C   20170106  0.12
C   20170111  0.13
D   20170101  0.3
D   20170106  0.3
D   20170111  0.4

Description: I have one column with "ID" and many columns of dates, each holding NDVI values. I need to move every date into one column named "DATES" and the values for those dates into another column named "NDVI_mean"; the ID field has to be repeated as many times as there are date columns.
I can't use the arcpy "Transpose Fields" tool, only free code.
Please, help me.
Thank you

You can use the melt function:
In [1611]: df
Out[1611]:
  ID  20170101  20170106  20170111
0  A      0.31      0.10      0.20
1  B      0.30      0.20      0.10
2  C      0.11      0.12      0.13
3  D      0.30      0.30      0.40

In [1613]: pd.melt(df, id_vars='ID', var_name='Date', value_name="NDVI_mean").sort_values('ID')
Out[1613]:
   ID      Date  NDVI_mean
0   A  20170101       0.31
4   A  20170106       0.10
8   A  20170111       0.20
1   B  20170101       0.30
5   B  20170106       0.20
9   B  20170111       0.10
2   C  20170101       0.11
6   C  20170106       0.12
10  C  20170111       0.13
3   D  20170101       0.30
7   D  20170106       0.30
11  D  20170111       0.40
Let me know if it works.
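For a fully self-contained run from a CSV file, here is a minimal end-to-end sketch; the file names ndvi.csv and ndvi_long.csv are hypothetical placeholders for your actual paths:

import pandas as pd

# read the wide table: one ID column plus one column per date
df = pd.read_csv('ndvi.csv')

# melt all date columns into two columns, matching the desired output names
long_df = (df.melt(id_vars='ID', var_name='DATES', value_name='NDVI_mean')
             .sort_values('ID')
             .reset_index(drop=True))

long_df.to_csv('ndvi_long.csv', index=False)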

Related

Reallocate the fraction of weights above threshold to the other weights while maintaining the sum per group

I have a dataframe df1 with Date and ID as the index and a Weight column. I want to set an upper weight limit (30%) on the weights per date. The weights on each day add up to 100%, and after capping a weight it can happen that the next biggest weight then exceeds the 30% limit. Is there a way to account for that without doing several iterations? The weights that are not above the cap should absorb the excess, so that each date still sums to 100% after the capped weights are set to the maximum.
df1:
Date        ID  Weight
2023-01-30  A   0.45    <-- over max weight of 30%
2023-01-30  B   0.25
2023-01-30  C   0.15
2023-01-30  D   0.10
2023-01-30  E   0.05
2023-01-31  A   0.55
2023-01-31  B   0.25
2023-01-31  C   0.20
2023-01-31  D   0.00
2023-01-31  E   0.00
df1 after a naive proportional reallocation (which shows the problem):
Date        ID  Weight  Weight_upper
2023-01-30  A   0.45    0.300   <-- set to max weight
2023-01-30  B   0.25    0.318   <-- now bigger than max weight
2023-01-30  C   0.15    0.191
2023-01-30  D   0.10    0.127   (example calculation: 0.1 * (1 - 0.3)/(0.25+0.15+0.1+0.05))
2023-01-30  E   0.05    0.060
2023-01-31  A   0.55    0.300
2023-01-31  B   0.25    0.389
2023-01-31  C   0.20    0.311
2023-01-31  D   0.00    0.000
2023-01-31  E   0.00    0.000
For reproducibility:
df = pd.DataFrame({
    'Date': ['2023-01-30', '2023-01-30', '2023-01-30', '2023-01-30', '2023-01-30',
             '2023-01-31', '2023-01-31', '2023-01-31', '2023-01-31', '2023-01-31'],
    'ID': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
    'Weight': [0.45, 0.25, 0.15, 0.1, 0.05, 0.55, 0.25, 0.2, 0, 0]})
df.set_index('Date')
Many thanks for your help!
The logic is unclear, so I'll assume you want to reallocate the fraction of the weights above the desired max (0.3) to the other weights, in a way that doesn't push any other weight above the threshold.
I would compute the difference to the threshold, then split the values into above/below and allocate the excess weight proportionally to the available headroom of each weight below the threshold:
max_weight = 0.3

df2 = df.assign(diff=df['Weight'].sub(max_weight),
                mask=lambda d: d['diff'].gt(0),
                above=lambda d: d['diff'].where(d['mask']),
                below=lambda d: d['diff'].mask(d['mask']),
                )
g = df2.groupby('Date')

df['Weight_upper'] = (df2['below']
                      .div(g['below'].transform('sum'))
                      .mul(g['above'].transform('sum'))
                      .add(df['Weight'])
                      .fillna(max_weight)
                      )
print(df)
Output:
         Date ID  Weight  Weight_upper
0  2023-01-30  A    0.45      0.300000
1  2023-01-30  B    0.25      0.261538
2  2023-01-30  C    0.15      0.184615
3  2023-01-30  D    0.10      0.146154
4  2023-01-30  E    0.05      0.107692
5  2023-01-31  A    0.55      0.300000
6  2023-01-31  B    0.25      0.266667
7  2023-01-31  C    0.20      0.233333
8  2023-01-31  D    0.00      0.100000
9  2023-01-31  E    0.00      0.100000

Intermediates:
         Date ID  Weight  diff   mask  above  below  Weight_upper
0  2023-01-30  A    0.45  0.15   True   0.15    NaN      0.300000
1  2023-01-30  B    0.25 -0.05  False    NaN  -0.05      0.261538
2  2023-01-30  C    0.15 -0.15  False    NaN  -0.15      0.184615
3  2023-01-30  D    0.10 -0.20  False    NaN  -0.20      0.146154
4  2023-01-30  E    0.05 -0.25  False    NaN  -0.25      0.107692
5  2023-01-31  A    0.55  0.25   True   0.25    NaN      0.300000
6  2023-01-31  B    0.25 -0.05  False    NaN  -0.05      0.266667
7  2023-01-31  C    0.20 -0.10  False    NaN  -0.10      0.233333
8  2023-01-31  D    0.00 -0.30  False    NaN  -0.30      0.100000
9  2023-01-31  E    0.00 -0.30  False    NaN  -0.30      0.100000
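As a quick sanity check (an addition of mine, not part of the original answer), you can verify that each date still sums to 1 and that no reallocated weight ends up above the cap. The second property holds as long as the total headroom below the cap on a date is at least the total excess above it, which is guaranteed here since 5 IDs at a 30% cap leave room for 150%:

tol = 1e-9
# per-date weights should still total 1 after the reallocation
assert df.groupby('Date')['Weight_upper'].sum().sub(1).abs().lt(tol).all()
# no weight should exceed the cap
assert df['Weight_upper'].le(max_weight + tol).all()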

Convert pandas dataframe from rows to columns [duplicate]

I have the following data frame:
ID  value  freq
A   a      0.1
A   b      0.12
A   c      0.19
B   a      0.15
B   b      0.2
B   c      0.09
C   a      0.39
C   b      0.15
C   c      0.01
and I would like to get the following:
ID  freq_a  freq_b  freq_c
A   0.1     0.12    0.19
B   0.15    0.2     0.09
C   0.39    0.15    0.01
Any ideas how to easily do this?
Using pivot:
df.pivot(index='ID', columns='value', values='freq').add_prefix('freq_').reset_index()
output:
value ID  freq_a  freq_b  freq_c
0      A    0.10    0.12    0.19
1      B    0.15    0.20    0.09
2      C    0.39    0.15    0.01
Use pivot_table:
out = df.pivot_table('freq', 'ID', 'value').add_prefix('freq_') \
.rename_axis(columns=None).reset_index()
print(out)
# Output
ID freq_a freq_b freq_c
0 A 0.10 0.12 0.19
1 B 0.15 0.20 0.09
2 C 0.39 0.15 0.01
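For reference, here is a self-contained sketch that rebuilds the sample frame and runs the pivot answer; note that pivot raises a ValueError on duplicate (ID, value) pairs, whereas pivot_table aggregates them (mean by default):

import pandas as pd

# rebuild the sample frame from the question
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'value': ['a', 'b', 'c'] * 3,
    'freq': [0.1, 0.12, 0.19, 0.15, 0.2, 0.09, 0.39, 0.15, 0.01],
})

out = df.pivot(index='ID', columns='value', values='freq').add_prefix('freq_').reset_index()
print(out)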

Converting all object columns to float except for one column

Let's say in the dataframe df there is:
a b c d
ana 31% 26% 29%
bob 52% 45% 9%
cal 11% 6% 23%
dan 29% 12% 8%
where all data types under a, b, c and d are objects. I want to convert b, c and d to their decimal forms with:
df.columns = df.columns.str.rstrip('%').astype('float') / 100.0
but I don't know how to exclude column a.
You can use update with to_numeric:
df.update(df.apply(lambda x : pd.to_numeric(x.str.rstrip('%'),errors='coerce'))/100)
df
Out[128]:
a b c d
0 ana 0.31 0.26 0.29
1 bob 0.52 0.45 0.09
2 cal 0.11 0.06 0.23
3 dan 0.29 0.12 0.08
Use Index.drop to get all columns except a, then DataFrame.replace, convert to floats and divide by 100:
cols = df.columns.drop('a')
df[cols] = df[cols].replace('%', '', regex=True).astype('float') / 100.0
print (df)
a b c d
0 ana 0.31 0.26 0.29
1 bob 0.52 0.45 0.09
2 cal 0.11 0.06 0.23
3 dan 0.29 0.12 0.08
Or you can convert the first column to the index with DataFrame.set_index, so all columns except a are processed:
df = df.set_index('a').replace('%', '', regex=True).astype('float') / 100.0
print (df)
b c d
a
ana 0.31 0.26 0.29
bob 0.52 0.45 0.09
cal 0.11 0.06 0.23
dan 0.29 0.12 0.08
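To test either answer end to end, here is a runnable sketch with the sample data rebuilt from the question; printing df.dtypes confirms that a stays object while b, c and d become float64:

import pandas as pd

df = pd.DataFrame({
    'a': ['ana', 'bob', 'cal', 'dan'],
    'b': ['31%', '52%', '11%', '29%'],
    'c': ['26%', '45%', '6%', '12%'],
    'd': ['29%', '9%', '23%', '8%'],
})

cols = df.columns.drop('a')  # every column except 'a'
df[cols] = df[cols].replace('%', '', regex=True).astype('float') / 100.0
print(df)
print(df.dtypes)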

sum column based on level selected in column header

I have a pd.DataFrame that looks like this. Note that the column names represent levels.
df
PC 0 1 2 3
0 PC_1 0.74 0.25 0.1 0.0
1 PC_1 0.72 0.26 0.1 0.1
2 PC_2 0.80 0.18 0.2 0.0
3 PC_3 0.79 0.19 0.1 0.1
I want to create another 4 columns next to the existing ones and fill them based on the selected level.
For example, if level = 1, df should look like this:
df
  PC    0     1     2    3    0_1  1_1          2_1  3_1
0 PC_1  0.74  0.25  0.1  0.0  0.0  (0.74+0.25)  0.1  0.0
1 PC_1  0.72  0.26  0.1  0.1  0.0  (0.72+0.26)  0.1  0.1
2 PC_2  0.80  0.18  0.2  0.0  0.0  (0.80+0.18)  0.2  0.0
3 PC_3  0.79  0.19  0.1  0.1  0.0  (0.79+0.19)  0.1  0.1
If level=3,
df
PC 0 1 2 3 0_3 1_3 2_3 3_3
0 PC_1 0.74 0.25 0.1 0.0 0.0 0.0 0.0 sum(0.74+0.25+0.1+0.0)
1 PC_1 0.72 0.26 0.1 0.1 0.0 0.0 0.0 sum(0.72+0.26+0.1+0.1)
2 PC_2 0.80 0.18 0.2 0.0 0.0 0.0 0.0 sum(0.80+0.18+0.20+0.0)
3 PC_3 0.79 0.19 0.1 0.1 0.0 0.0 0.0 sum(0.79+0.19+0.1+0.1)
I don't know how to solve the problem and am looking for help.
Thank you in advance.
Set 'PC' as the index to make things easier. We zero everything before your column, cumsum up to the column, and keep everything after your column as is.
df = df.set_index('PC')

def add_sum(df, level):
    i = df.columns.get_loc(level)
    df_add = (pd.concat([pd.DataFrame(0, index=df.index, columns=df.columns[:i]),
                         df.cumsum(1).iloc[:, i],
                         df.iloc[:, i+1:]],
                        axis=1)
                .add_suffix(f'_{level}'))
    return pd.concat([df, df_add], axis=1)

add_sum(df, '1')  # pass 1 (int) if the column labels are ints
0 1 2 3 0_1 1_1 2_1 3_1
PC
PC_1 0.74 0.25 0.1 0.0 0 0.99 0.1 0.0
PC_1 0.72 0.26 0.1 0.1 0 0.98 0.1 0.1
PC_2 0.80 0.18 0.2 0.0 0 0.98 0.2 0.0
PC_3 0.79 0.19 0.1 0.1 0 0.98 0.1 0.1
add_sum(df, '3')
0 1 2 3 0_3 1_3 2_3 3_3
PC
PC_1 0.74 0.25 0.1 0.0 0 0 0 1.09
PC_1 0.72 0.26 0.1 0.1 0 0 0 1.18
PC_2 0.80 0.18 0.2 0.0 0 0 0 1.18
PC_3 0.79 0.19 0.1 0.1 0 0 0 1.18
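For comparison, here is a sketch of the same idea that edits a copy in place instead of concatenating three pieces (add_sum_inplace is a hypothetical name, not from the answer above):

def add_sum_inplace(df, level):
    i = df.columns.get_loc(level)
    out = df.copy()
    out.iloc[:, :i] = 0                       # zero every column before the level
    out.iloc[:, i] = df.iloc[:, :i+1].sum(1)  # running total up to and including the level
    return pd.concat([df, out.add_suffix(f'_{level}')], axis=1)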
As you wrote based on level selected in column header in the title, I understand that:

- there is no "external" level variable,
- the level (how many columns to sum) results just from the source column name.

So the task is actually to "concatenate" both of your expected results (you presented only how to compute columns 1_1 and 3_3) and to compute the other new columns the same way.
The solution is surprisingly concise.
Run the following one-liner:
df = df.join(df.iloc[:, 1:].cumsum(axis=1)
               .rename(lambda name: str(name) + '_1', axis=1))
Details:

- df.iloc[:, 1:] - take all rows, starting from column 1 (column numbers start from 0).
- cumsum(axis=1) - compute the cumulative sum, horizontally.
- rename(..., axis=1) - rename the columns.
- lambda name: str(name) + '_1' - lambda function to compute each new column name.
- The result so far - the new columns.
- df = df.join(...) - join with the original DataFrame and save the result back under df.
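Applied to the sample frame from the question (with PC still a regular column), this one-liner would produce the following, derived by hand; note that, unlike the tables in the question, the columns before the selected level keep their running sums instead of being zeroed:

  PC    0     1     2    3    0_1   1_1   2_1   3_1
0 PC_1  0.74  0.25  0.1  0.0  0.74  0.99  1.09  1.09
1 PC_1  0.72  0.26  0.1  0.1  0.72  0.98  1.08  1.18
2 PC_2  0.80  0.18  0.2  0.0  0.80  0.98  1.18  1.18
3 PC_3  0.79  0.19  0.1  0.1  0.79  0.98  1.08  1.18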

python pandas change dataframe to pivoted columns

I have a dataframe that looks as following:
Type Month Value
A 1 0.29
A 2 0.90
A 3 0.44
A 4 0.43
B 1 0.29
B 2 0.50
B 3 0.14
B 4 0.07
I want to change the dataframe to following format:
Type A B
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
Is this possible?
Use set_index + unstack
df.set_index(['Month', 'Type']).Value.unstack()
Type A B
Month
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
To match your exact output
df.set_index(['Month', 'Type']).Value.unstack().rename_axis(None)
Type A B
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
Pivot solution:
In [70]: df.pivot(index='Month', columns='Type', values='Value')
Out[70]:
Type A B
Month
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
In [71]: df.pivot(index='Month', columns='Type', values='Value').rename_axis(None)
Out[71]:
Type A B
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
You have a long-format table which you want to transform to wide format.
This is natively handled in pandas:
df.pivot(index='Month', columns='Type', values='Value')
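For completeness, a self-contained sketch with the sample data rebuilt from the question:

import pandas as pd

df = pd.DataFrame({
    'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Month': [1, 2, 3, 4, 1, 2, 3, 4],
    'Value': [0.29, 0.90, 0.44, 0.43, 0.29, 0.50, 0.14, 0.07],
})

wide = df.pivot(index='Month', columns='Type', values='Value')
print(wide)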
