FIlrer csv table to have just 2 columns. Python pandas pd .pd

FIlrer csv table to have just 2 columns. Python pandas pd .pd - python

i got .csv file with lines like this :
result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
I need to make them look like this:
time value
0 2022-10-24T12:12:35Z 44.61
1 2022-10-24T12:12:40Z 17.33
2 2022-10-24T12:12:45Z 41.20
3 2022-10-24T12:12:51Z 33.49
4 2022-10-24T12:12:56Z 55.68
I will need that for my anomaly detection code so I dont have to manualy delete columns and so on. At least not all of them. I cant do it with the program that works with the mashine that collect wattage info.
I tried this but it doeasnt work enough:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
df = pd.pivot(df, index = '_time', columns = '_field', values = '_value')
df.interpolate(method='linear') # not neccesary
It gives this output:
0
9 83.908
10 80.342
11 79.178
12 75.621
13 72.826
... ...
73522 10.726
73523 5.241

Here is the canonical way to project down to a subset of columns in the pandas ecosystem.
df = df[['_time', '_value']]

You can simply use the keyword argument usecols of pandas.read_csv :
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv', usecols=["_time", "_value"])
NB: If you need to read the entire data of your (.csv) and only then select a subset of columns, Pandas core developers suggest you to use pandas.DataFrame.loc. Otherwise, by using df = df[subset_of_cols] synthax, the moment you'll start doing some operations on the (new?) sub-dataframe, you'll get a warning :
SettingWithCopyWarning:
A value is trying to be set on a copy of a
slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] =
value instead
So, in your case you can use :
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df.loc[:, ["_time", "_value"]] #instead of df[["_time", "_value"]]
Another option is pandas.DataFrame.copy,
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df[["_time", "_value"]].copy()

.read_csv has a usecols parameter to specify which columns you want in the DataFrame.
df = pd.read_csv(f,header=0,usecols=['_time','_value'] )
print(df)
_time _value
0 2022-10-24T12:12:35Z 44.61
1 2022-10-24T12:12:40Z 17.33
2 2022-10-24T12:12:45Z 41.20
3 2022-10-24T12:12:51Z 33.49
4 2022-10-24T12:12:56Z 55.68
5 2022-10-24T12:12:57Z 55.68
6 2022-10-24T12:13:02Z 25.92
7 2022-10-24T12:13:08Z 5.71

Related

Extract values within the quotes signs into two separate columns with python

How can i extract the values within the quotes signs into two separate columns with python. The dataframe is given below:
df = pd.DataFrame(["'FRH02';'29290'", "'FRH01';'29300'", "'FRT02';'29310'", "'FRH03';'29340'",
"'FRH05';'29350'", "'FRG02';'29360'"], columns = ['postcode'])
df
postcode
0 'FRH02';'29290'
1 'FRH01';'29300'
2 'FRT02';'29310'
3 'FRH03';'29340'
4 'FRH05';'29350'
5 'FRG02';'29360'
i would like to get an output like the one below:
postcode1 postcode2
FRH02 29290
FRH01 29300
FRT02 29310
FRH03 29340
FRH05 29350
FRG02 29360
i have tried several str.extract codes but havent been able to figure this out. Thanks in advance.

Finishing Quang Hoang's solution that he left in the comments:
import pandas as pd
df = pd.DataFrame(["'FRH02';'29290'",
"'FRH01';'29300'",
"'FRT02';'29310'",
"'FRH03';'29340'",
"'FRH05';'29350'",
"'FRG02';'29360'"],
columns = ['postcode'])
# Remove the quotes and split the strings, which results in a Series made up of 2-element lists
postcodes = df['postcode'].str.replace("'", "").str.split(';')
# Unpack the transposed postcodes into 2 new columns
df['postcode1'], df['postcode2'] = zip(*postcodes)
# Delete the original column
del df['postcode']
print(df)
Output:
postcode1 postcode2
0 FRH02 29290
1 FRH01 29300
2 FRT02 29310
3 FRH03 29340
4 FRH05 29350
5 FRG02 29360

You can use Series.str.split:
p1 = []
p2 = []
for row in df['postcode'].str.split(';'):
p1.append(row[0])
p2.append(row[1])
df2 = pd.DataFrame()
df2["postcode1"] = p1
df2["postcode2"] = p2

Querying a list object from API and returning it into dataframe - issues with format

I have the below script that returns data in a list format per quote of (i). I set up an empty list, and then query with the API function get_kline_data, and pass each output into my klines_list with the .extend function
klines_list = []
a = ["REQ-ETH","REQ-BTC","XLM-BTC"]
for i in a:
klines = client.get_kline_data(i, '5min', 1619317366, 1619317606)
klines_list.extend([i,klines])
klines_list
klines_list then returns data in this format;
['REQ-ETH',
[['1619317500',
'0.0000491',
'0.0000491',
'0.0000491',
'0.0000491',
'5.1147',
'0.00025113177']],
'REQ-BTC',
[['1619317500',
'0.00000219',
'0.00000219',
'0.00000219',
'0.00000219',
'19.8044',
'0.000043371636']],
'XLM-BTC',
[['1619317500',
'0.00000863',
'0.00000861',
'0.00000863',
'0.00000861',
'653.5693',
'0.005629652673']]]
I then try to convert it into a dataframe;
import pandas as py
df = py.DataFrame(klines_list)
And this is the result;
0
0 REQ-ETH
1 [[1619317500, 0.0000491, 0.0000491, 0.0000491,...
2 REQ-BTC
3 [[1619317500, 0.00000219, 0.00000219, 0.000002...
4 XLM-BTC
5 [[1619317500, 0.00000863, 0.00000861, 0.000008..
The structure of the DF is incorrect and it seems to be due to the way I have put my list together.
I would like the quantitative data in a column corresponding to the correct entry in list a, not in rows. Also, the ticker data, or list a, ("REQ-ETH/REQ-BTC") etc should be in a separate column. What would be a good way to go about restructuring this?
Edit: #Ynjxsjmh
This is the output when following the suggestion below for appending a dictionary within the for loop
REQ-ETH REQ-BTC XLM-BTC
0 [1619317500, 0.0000491, 0.0000491, 0.0000491, ... NaN NaN
1 NaN [1619317500, 0.00000219, 0.00000219, 0.0000021... NaN
2 NaN NaN [1619317500, 0.00000863, 0.00000861, 0.0000086...

pandas.DataFrame() can accept a dict. It will construct the dict key as column header, dict value as column values.
import pandas as pd
a = ["REQ-ETH","REQ-BTC","XLM-BTC"]
klines_data = {}
for i in a:
klines = client.get_kline_data(i, '5min', 1619317366, 1619317606)
klines_data[i] = klines[0]
# ^
# |
# Add a key to klines_data
df = pd.DataFrame(klines_data)
print(df)
REQ-ETH REQ-BTC XLM-BTC
0 1619317500 1619317500 1619317500
1 0.0000491 0.00000219 0.00000863
2 0.0000491 0.00000219 0.00000861
3 0.0000491 0.00000219 0.00000863
4 0.0000491 0.00000219 0.00000861
5 5.1147 19.8044 653.5693
6 0.00025113177 0.000043371636 0.005629652673
If the length of klines is not equal, you can use
df = pd.DataFrame.from_dict(klines_data, orient='index').T

How to extract values from a list in Python and put into a dataframe

I have trained a model and have asked the model to produce the coefficients:
modelcoeffs = model.fit(X_train, y_train).coef_
coeffslist = list(modelcoeffs)
which yiels me for example:
print(coeffslist):
[0.17005542 0.72965947 0.6833308 0.02509676]
I am trying to split these 4 coefficients out however they dont seem to be individual elements?
does anyone know how to split these into four numbers?
I am trying to get:
df['1'] = coeffslist[0]
df['2'] = coeffslist[1]
df['3'] = coeffslist[2]
df['4'] = coeffslist[3]
But it gives me NaN in the df. Does anyone have any ideas? thanks!
UPDATE
I am basically trying to get the coeffs to append to a df
print(df)
1 2 3 4
.... ..... ..... .....
0.17005542 0.72965947 0.6833308 0.02509676

This coeffslist doesn't look like a valid Python structure, it's missing commas.
But you might try this:
import pandas as pd
df = pd.DataFrame([0.17005542, 0.72965947, 0.6833308, 0.02509676])
print(df)
Output:
0
0 0.170055
1 0.729659
2 0.683331
3 0.025097
To get the coefs as row try this:
import pandas as pd
df = pd.DataFrame(columns=list("1234"))
df.loc[len(df)] = [0.17005542, 0.72965947, 0.6833308, 0.02509676]
print(df)
Output:
1 2 3 4
0 0.170055 0.729659 0.683331 0.025097
And if you want to add another row (append) of coefs, just do this:
df.loc[1] = [0.17005542, 0.72965947, 0.6833308, 0.02509676]
print(df)
Output:
1 2 3 4
0 0.170055 0.729659 0.683331 0.025097
1 0.170055 0.729659 0.683331 0.025097

you can convert [0.17005542 0.72965947 0.6833308 0.02509676] to a sting, split it on space, convert to float again and then append to a dataframe.
str_list= str(coeffslist[0])
float_list= [float(x) for x in str_list.split()]
df=pd.DataFrame(columns=['1','2','3','4'])
a_series = pd.Series(float_list, index = df.columns)
df = df.append(a_series, ignore_index=True)

Shift down one row then rename the column

My data is looking like this:
pd.read_csv('/Users/admin/desktop/007538839.csv').head()
105586.18
0 105582.910
1 105585.230
2 105576.445
3 105580.016
4 105580.266
I want to move that 105568.18 to the 0 index because now it is the column name. And after that I want to name this column 'flux'. I've tried
pd.read_csv('/Users/admin/desktop/007538839.csv', sep='\t', names = ["flux"])
but it did not work, probably because the dataframe is not in the right format.
How can I achieve that?

For me your code working very nice:
import pandas as pd
temp=u"""105586.18
105582.910
105585.230
105576.445
105580.016
105580.266"""
#after testing replace 'pd.compat.StringIO(temp)' to '/Users/admin/desktop/007538839.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep='\t', names = ["flux"])
print (df)
flux
0 105586.180
1 105582.910
2 105585.230
3 105576.445
4 105580.016
5 105580.266
For overwrite original file with same data with new header flux:
df.to_csv('/Users/admin/desktop/007538839.csv', index=False)

Try this:
df=pd.read_csv('/Users/admin/desktop/007538839.csv',header=None)
df.columns=['flux']
header=None is the friend of yours.

Name a column added with pandas dataframe

l have the following csv file that l process as follow
import pandas as pd
df = pd.read_csv('file.csv', sep=',',header=None)
id ocr raw_value
00037625-4706-4dfe-a7b3-de8c47e3a28d A 3
000a7b30-4c4f-4756-a757-f688ccc55d5d A /c
000b08e3-4129-4fd2-8ec0-23d00fe38a45 A yes
00196436-12bc-4024-b623-25bac586d314 A know
001b8c43-3e73-43c1-ba4f-df5edb10dfac A hi
002882ca-48bb-4161-a75a-cf0ec984d650 A fd
003b2890-3727-4c79-955a-f74ec6945ed7 A Sensible
004d9025-86f0-4f8c-9720-01e3385c5e77 A 2015
Now l want to add a new column :
df['val']=None
for img in images:
id, ext = img.rsplit('.',1)
idx = df[df[0] ==id].index.values
df.loc[df.index[idx], 'val'] = id
When l write df in a new file as follow :
df.to_csv('new_file.csv', sep=',',encoding='utf-8')
l noticed that the column is correctly added and filled. But the column remains without name and it's supposed to be named val
id ocr raw_value
00037625-4706-4dfe-a7b3-de8c47e3a28d A 3 4
000a7b30-4c4f-4756-a757-f688ccc55d5d A /c 3
000b08e3-4129-4fd2-8ec0-23d00fe38a45 A yes 1
00196436-12bc-4024-b623-25bac586d314 A know 8
001b8c43-3e73-43c1-ba4f-df5edb10dfac A hi 9
002882ca-48bb-4161-a75a-cf0ec984d650 A fd 10
003b2890-3727-4c79-955a-f74ec6945ed7 A Sensible 14
How to set set to the last column added ?
EDIT1:
print(df.head())
0 1 2 3
0 id ocr raw_value manual_raw_value
1 00037625-4706-4dfe-a7b3-de8c47e3a28d ABBYY 03 03
2 000a7b30-4c4f-4756-a757-f688ccc55d5d ABBYY y/c y/c
3 000b08e3-4129-4fd2-8ec0-23d00fe38a45 ABBYY armoire armoire
4 00196436-12bc-4024-b623-25bac586d314 ABBYY point point
val
0 None
1 93
2 yic
3 armoire
4 point

Need only read_csv, because sep=',' is by default and can be omit and header=None is used if csv have no header:
df = pd.read_csv('file.csv')
Problem is your first row was not parsed to columns names, but to first data row.

df = pd.read_csv('file.csv', sep=',', header=0, index_col=0)
should allow you to simplify the next portion to
df['val']=None
for img in images:
image_id, ext = img.rsplit('.',1)
df.loc[image_id, 'val'] = image_id
If you don't need the image_id as index afterwards, use df.reset_index(inplace=True)

one easy way...
before to_csv:
df.columns.value[3]="val"

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

FIlrer csv table to have just 2 columns. Python pandas pd .pd - python

Here is the canonical way to project down to a subset of columns in the pandas ecosystem. df = df[['_time', '_value']]

Related

Extract values within the quotes signs into two separate columns with python

Querying a list object from API and returning it into dataframe - issues with format

How to extract values from a list in Python and put into a dataframe

Shift down one row then rename the column

Name a column added with pandas dataframe

Categories

Resources