I have a .csv file with lines like this:
result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
I need to make them look like this:
time value
0 2022-10-24T12:12:35Z 44.61
1 2022-10-24T12:12:40Z 17.33
2 2022-10-24T12:12:45Z 41.20
3 2022-10-24T12:12:51Z 33.49
4 2022-10-24T12:12:56Z 55.68
I will need that for my anomaly detection code so I don't have to manually delete columns and so on, at least not all of them. I can't do it with the program that runs the machine that collects the wattage data.
I tried this, but it doesn't work well enough:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
df = pd.pivot(df, index = '_time', columns = '_field', values = '_value')
df = df.interpolate(method='linear')  # not necessary (and must be assigned back to take effect)
It gives this output:
0
9 83.908
10 80.342
11 79.178
12 75.621
13 72.826
... ...
73522 10.726
73523 5.241
Here is the canonical way to select a subset of columns in pandas:
df = df[['_time', '_value']]
You can simply use the usecols keyword argument of pandas.read_csv:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv', usecols=["_time", "_value"])
NB: If you need to read the entire file and only then select a subset of columns, the pandas core developers suggest using pandas.DataFrame.loc. Otherwise, with the df = df[subset_of_cols] syntax, the moment you start doing operations on the new sub-dataframe you will get a warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a
slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] =
value instead
So, in your case you can use:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df.loc[:, ["_time", "_value"]] #instead of df[["_time", "_value"]]
Another option is pandas.DataFrame.copy:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df[["_time", "_value"]].copy()
.read_csv has a usecols parameter to specify which columns you want in the DataFrame.
df = pd.read_csv(f, header=0, usecols=['_time', '_value'])
print(df)
_time _value
0 2022-10-24T12:12:35Z 44.61
1 2022-10-24T12:12:40Z 17.33
2 2022-10-24T12:12:45Z 41.20
3 2022-10-24T12:12:51Z 33.49
4 2022-10-24T12:12:56Z 55.68
5 2022-10-24T12:12:57Z 55.68
6 2022-10-24T12:13:02Z 25.92
7 2022-10-24T12:13:08Z 5.71
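If you also want the plain time / value headers shown in the question's target layout, a small follow-up sketch (the rename is the only step beyond usecols):
import pandas as pd

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv',
                 usecols=['_time', '_value'])
df = df.rename(columns={'_time': 'time', '_value': 'value'})
print(df.head())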
I am trying to concatenate rows into one value for every set of 4 rows.
I have 11 values: the first 4 values should become one concatenated row, rows 5 to 8 another, and the last 3 rows the final one, even though that last group does not have four values.
df_in = pd.DataFrame({'Column_IN': ['text 1','text 2','text 3','text 4','text 5','text 6','text 7','text 8','text 9','text 10','text 11']})
and my expected output is as follows:
df_out = pd.DataFrame({'Column_OUT': ['text 1&text 2&text 3&text 4','text 5&text 6&text 7&text 8','text 9&text 10&text 11']})
I have tried to get my desired output df_out as below:
df_2 = df_in.iloc[:-7].agg('&'.join).to_frame()
What modification is required to get the desired output?
Try using groupby and agg:
>>> df_in.groupby(df_in.index // 4).agg('&'.join)
Column_IN
0 text 1&text 2&text 3&text 4
1 text 5&text 6&text 7&text 8
2 text 9&text 10&text 11
>>>
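If the result should also carry the Column_OUT header from the expected df_out, the same groupby can feed a rename; a small sketch:
import pandas as pd

df_in = pd.DataFrame({'Column_IN': ['text 1', 'text 2', 'text 3', 'text 4',
                                    'text 5', 'text 6', 'text 7', 'text 8',
                                    'text 9', 'text 10', 'text 11']})

# group every 4 consecutive rows, join with '&', then rename the column
df_out = (df_in.groupby(df_in.index // 4)
               .agg('&'.join)
               .rename(columns={'Column_IN': 'Column_OUT'}))
print(df_out)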
For example, consider the data frame given below:
Timestamp in_speed
1625638530 268.78
1625638590 262.75
1625638650 265.43
1625638710 270.67
1625638770 261.13
1625638830 265.49
1625638890 266.51
1625638950 270.54
1625639010 275.12
1625639070 267.62
1625639130 267.20
1625639190 265.29
1625639250 261.95
1625639310 264.39
1625639370 270.76
1625639430 291.18
I want to extract the whole row containing the maximum value for every 7 rows. Hence, the desired output will be:
1625638710 270.67
1625639010 275.12
1625639430 291.18
Use DataFrameGroupBy.idxmax to get the indices of the maximal values, then select the rows with DataFrame.loc:
df = df.loc[df.groupby(df.index // 7)['in_speed'].idxmax()]
# alternative for a non-default index (requires: import numpy as np)
# df = df.loc[df.groupby(np.arange(len(df)) // 7)['in_speed'].idxmax()]
print (df)
Timestamp in_speed
3 1625638710 270.67
8 1625639010 275.12
15 1625639430 291.18
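For reference, a self-contained sketch of the same approach, rebuilding the sample frame from the question:
import pandas as pd

df = pd.DataFrame({
    'Timestamp': [1625638530, 1625638590, 1625638650, 1625638710,
                  1625638770, 1625638830, 1625638890, 1625638950,
                  1625639010, 1625639070, 1625639130, 1625639190,
                  1625639250, 1625639310, 1625639370, 1625639430],
    'in_speed': [268.78, 262.75, 265.43, 270.67, 261.13, 265.49, 266.51,
                 270.54, 275.12, 267.62, 267.20, 265.29, 261.95, 264.39,
                 270.76, 291.18],
})

# one row per block of 7: the row holding the block's maximum in_speed
out = df.loc[df.groupby(df.index // 7)['in_speed'].idxmax()]
print(out)   # rows 3, 8 and 15 -> 270.67, 275.12 and 291.18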
I was looking for a similar question but did not find a solution for what I want to do; any help is welcome.
Here is the code to build an example of my DataFrame:
import pandas as pd
L = [[0.1998,'IN TIME,IN TIME','19708,19708','MR SD#5 W/Z SD#6 X/Y',20.5],
[0.3983,'LATE,IN TIME','11206,18054','MR SD#4 A/B SD#1 C/D',19.97]]
df = pd.DataFrame(L,columns=['Time','status','F_nom','info','Delta'])
Output: a DataFrame with columns Time, status, F_nom, info and Delta, where each 'info' cell packs two SD# entries.
I would like to create two new rows for each row of my main dataframe, based on the 'info' column.
As we can see in the 'info' column, each row contains two different SD# entries; I would like to have only one SD# per row.
I would also like to keep the corresponding values of the columns Time, status, F_nom and Delta.
Finally, I want to create a new column 'type_info' that contains the specific string for each SD# (W/Z, A/B, etc.), all while keeping the index of my main dataframe.
The desired result is one row per SD#, each with its own 'type_info' value.
I hope I was clear enough; thank you in advance.
Use:
#split values by comma or whitespace
df['status'] = df['status'].str.split(',')
df['F_nom'] = df['F_nom'].str.split(',')
info = df.pop('info').str.split()
#select values by indexing
df['info'] = info.str[1::2]
df['type_info'] = info.str[2::2]
#reshape to Series
s = df.set_index(['Time','Delta']).stack()
#create new DataFrame and reshape to expected output
df1 = (pd.DataFrame(s.values.tolist(), index=s.index)
.stack()
.unstack(2)
.reset_index(level=2, drop=True)
.reset_index())
print (df1)
Time Delta status F_nom info type_info
0 0.1998 20.50 IN TIME 19708 SD#5 W/Z
1 0.1998 20.50 IN TIME 19708 SD#6 X/Y
2 0.3983 19.97 LATE 11206 SD#4 A/B
3 0.3983 19.97 IN TIME 18054 SD#1 C/D
Another solution:
df['status'] = df['status'].str.split(',')
df['F_nom'] = df['F_nom'].str.split(',')
info = df.pop('info').str.split()
df['info'] = info.str[1::2]
df['type_info'] = info.str[2::2]
from itertools import chain
lens = df['status'].str.len()
df = pd.DataFrame({
'Time' : df['Time'].values.repeat(lens),
'status' : list(chain.from_iterable(df['status'].tolist())),
'F_nom' : list(chain.from_iterable(df['F_nom'].tolist())),
'info' : list(chain.from_iterable(df['info'].tolist())),
'Delta' : df['Delta'].values.repeat(lens),
'type_info' : list(chain.from_iterable(df['type_info'].tolist())),
})
print (df)
Time status F_nom info Delta type_info
0 0.1998 IN TIME 19708 SD#5 20.50 W/Z
1 0.1998 IN TIME 19708 SD#6 20.50 X/Y
2 0.3983 LATE 11206 SD#4 19.97 A/B
3 0.3983 IN TIME 18054 SD#1 19.97 C/D
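If a newer pandas is available, DataFrame.explode offers a shorter route to the same result; a sketch assuming pandas >= 1.3 (needed to explode several columns at once) and the same preprocessing as in the answers above:
import pandas as pd

L = [[0.1998, 'IN TIME,IN TIME', '19708,19708', 'MR SD#5 W/Z SD#6 X/Y', 20.5],
     [0.3983, 'LATE,IN TIME', '11206,18054', 'MR SD#4 A/B SD#1 C/D', 19.97]]
df = pd.DataFrame(L, columns=['Time', 'status', 'F_nom', 'info', 'Delta'])

# turn the packed columns into equal-length lists, as in the answers above
df['status'] = df['status'].str.split(',')
df['F_nom'] = df['F_nom'].str.split(',')
parts = df.pop('info').str.split()
df['info'] = parts.str[1::2]        # SD#5, SD#6 / SD#4, SD#1
df['type_info'] = parts.str[2::2]   # W/Z, X/Y / A/B, C/D

# explode all list columns together; the original index is repeated per row
out = df.explode(['status', 'F_nom', 'info', 'type_info'])
print(out)
The exploded frame repeats the original index (0, 0, 1, 1); add .reset_index(drop=True) if a fresh 0..n index is preferred.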
My data looks like this:
pd.read_csv('/Users/admin/desktop/007538839.csv').head()
105586.18
0 105582.910
1 105585.230
2 105576.445
3 105580.016
4 105580.266
I want to move that 105586.18 to index 0, because right now it is the column name. And after that I want to name this column 'flux'. I've tried
pd.read_csv('/Users/admin/desktop/007538839.csv', sep='\t', names = ["flux"])
but it did not work, probably because the dataframe is not in the right format.
How can I achieve that?
For me, your code works fine:
import pandas as pd
temp=u"""105586.18
105582.910
105585.230
105576.445
105580.016
105580.266"""
# after testing, replace io.StringIO(temp) with '/Users/admin/desktop/007538839.csv'
# (io.StringIO is the stdlib replacement for the removed pd.compat.StringIO)
import io
df = pd.read_csv(io.StringIO(temp), sep='\t', names=["flux"])
print (df)
flux
0 105586.180
1 105582.910
2 105585.230
3 105576.445
4 105580.016
5 105580.266
To overwrite the original file with the same data under the new header flux:
df.to_csv('/Users/admin/desktop/007538839.csv', index=False)
Try this:
df = pd.read_csv('/Users/admin/desktop/007538839.csv', header=None)
df.columns = ['flux']
header=None is your friend here.
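Both hints can also be combined into a single call; a minimal sketch using the path from the question:
import pandas as pd

# header=None keeps the first value as data; names labels the single column
df = pd.read_csv('/Users/admin/desktop/007538839.csv', header=None, names=['flux'])
print(df.head())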