I need to calculate a corrected moment (Mz,correct) given by the sum of a moment (Mz) and a force (Fx) multiplied by its arm (300.56), because I need to change the reference system and move everything to the new one. Fx and Mz come from the same source file (.dat). This is the script I tried to write:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#-----------------------input file-------------------------------
filename = 'drag_time_series'  # source file name
# path to the DAT file
df = pd.read_csv(rf"C:\Users\suemack528\Desktop\OneDrive - Università degli Studi di Padova\deme\unipd\magistrale\TESI\impalcato\drag\{filename}.dat",
                 header=1, delim_whitespace=True)
df = df.round(decimals=3)
#-----for "cycle" to obtain correct moment-----
for i in range(len('Mz')):
    Mz,correct[i]=df('Mz') + df('Fx')*300.56
I think this is not correct and it raises an error. How can I write this script better? I'm working in Spyder.
To get a column from a pandas dataframe you can use [] (instead of ()). For more information regarding the use of indexes and selecting data from a dataframe you can check the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#basics
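As a sketch of how the whole calculation could look without an explicit loop (assuming the column names in the .dat file really are 'Mz' and 'Fx'; 'Mz_correct' is just a name chosen here for the new column, and the file path is shortened for the example):
import pandas as pd
filename = 'drag_time_series'
# column access uses square brackets; arithmetic on columns is vectorized,
# so no for loop over the rows is needed
df = pd.read_csv(f"{filename}.dat", header=1, delim_whitespace=True)
df['Mz_correct'] = df['Mz'] + df['Fx'] * 300.56
print(df[['Mz', 'Fx', 'Mz_correct']].head())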
I am having issues adding tooltips to my folium.features.GeoJson. I can't get columns to display from the dataframe when I select them.
feature = folium.features.GeoJson(df.geometry,
                                  name='Location',
                                  style_function=style_function,
                                  tooltip=folium.GeoJsonTooltip(fields=[df.acquired], aliases=["Time"], labels=True))
ax.add_child(feature)
For some reason when I run the code above it responds with
Name: acquired, Length: 100, dtype: object is not available in the data. Choose from: ().
I can't seem to link the data to my tooltip.
I have made your code a MWE by including some data.
There are two key issues with your code:
You need to pass properties, not just geometry, to folium.features.GeoJson(). Hence df is passed instead of df.geometry.
folium.GeoJsonTooltip() takes a list of property (column) names, not an array of values. Hence ["acquired"] is passed instead of an array of values from a dataframe column.
There is also an implied issue with your code: all dataframe columns need to contain values that can be serialised to JSON. Hence the conversion of acquired to a string and the drop().
import geopandas as gpd
import pandas as pd
import shapely.wkt
import io
import folium
df = pd.read_csv(io.StringIO("""ref;lanes;highway;maxspeed;length;name;geometry
A3015;2;primary;40 mph;40.68;Rydon Lane;MULTILINESTRING ((-3.4851169 50.70864409999999, -3.4849879 50.7090007), (-3.4857269 50.70693379999999, -3.4853034 50.7081574), (-3.488620899999999 50.70365289999999, -3.4857269 50.70693379999999), (-3.4853034 50.7081574, -3.4851434 50.70856839999999), (-3.4851434 50.70856839999999, -3.4851169 50.70864409999999))
A379;3;primary;50 mph;177.963;Rydon Lane;MULTILINESTRING ((-3.4763853 50.70886769999999, -3.4786112 50.70811229999999), (-3.4746017 50.70944449999999, -3.4763853 50.70886769999999), (-3.470350900000001 50.71041779999999, -3.471219399999999 50.71028909999998), (-3.465049699999999 50.712158, -3.470350900000001 50.71041779999999), (-3.481215600000001 50.70762499999999, -3.4813909 50.70760109999999), (-3.4934747 50.70059599999998, -3.4930204 50.7007898), (-3.4930204 50.7007898, -3.4930048 50.7008015), (-3.4930048 50.7008015, -3.4919513 50.70168349999999), (-3.4919513 50.70168349999999, -3.49137 50.70213669999998), (-3.49137 50.70213669999998, -3.4911565 50.7023015), (-3.4911565 50.7023015, -3.4909108 50.70246919999999), (-3.4909108 50.70246919999999, -3.4902349 50.70291189999999), (-3.4902349 50.70291189999999, -3.4897693 50.70314579999999), (-3.4805021 50.7077218, -3.4806265 50.70770150000001), (-3.488620899999999 50.70365289999999, -3.4888806 50.70353719999999), (-3.4897693 50.70314579999999, -3.489176800000001 50.70340539999999), (-3.489176800000001 50.70340539999999, -3.4888806 50.70353719999999), (-3.4865751 50.70487679999999, -3.4882604 50.70375799999999), (-3.479841700000001 50.70784459999999, -3.4805021 50.7077218), (-3.4882604 50.70375799999999, -3.488620899999999 50.70365289999999), (-3.4806265 50.70770150000001, -3.481215600000001 50.70762499999999), (-3.4717096 50.71021009999998, -3.4746017 50.70944449999999), (-3.4786112 50.70811229999999, -3.479841700000001 50.70784459999999), (-3.471219399999999 50.71028909999998, -3.4717096 50.71021009999998))"""),
sep=";")
df = gpd.GeoDataFrame(df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326")
df["acquired"] = pd.date_range("8-feb-2022", freq="1H", periods=len(df))
def style_function(x):
    return {"color": "blue", "weight": 3}
ax = folium.Map(
    location=[sum(df.total_bounds[[1, 3]]) / 2, sum(df.total_bounds[[0, 2]]) / 2],
    zoom_start=12,
)
# datetime is not JSON serializable...
df["tt"] = df["acquired"].dt.strftime("%Y-%b-%d %H:%M")
feature = folium.features.GeoJson(df.drop(columns="acquired"),
                                  name='Location',
                                  style_function=style_function,
                                  tooltip=folium.GeoJsonTooltip(fields=["tt"], aliases=["Time"], labels=True))
ax.add_child(feature)
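ax is a folium.Map, so in a Jupyter notebook simply evaluating ax in a cell renders the map; from a plain script it can be written to an HTML file instead (the file name here is just an example):
ax.save("map_with_tooltips.html")  # open the saved HTML file in a browser to view the map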
I have concentration data for every day from 2005 until 2018. I want to read a column from each of three different files and combine them into one dataframe, so I can plot them.
Data:file 1
time, mean_OMNO2d_003_ColumnAmountNO2CloudScreened
2005-01-01,-1.267651e+30
2005-01-02,4.90778397e+15
...
2018-12-31,-1.267651e+30
Data:file 2
time, OMNO2d_003_ColumnAmountNO2TropCloudScreened
2005-01-01,-1.267651e+30
2005-01-02,3.07444147e+15
...
Data:file 3
time, OMSO2e_003_ColumnAmountSO2_PBL
2005-01-01,-1.267651e+30
2005-01-02,-0.0144000314
...
I want to plot time and mean_OMNO2d_003_ColumnAmountNO2CloudScreened, OMNO2d_003_ColumnAmountNO2TropCloudScreened, OMSO2e_003_ColumnAmountSO2_PBL into one graph.
import glob
import pandas as pd
file_list = glob.glob('*.csv')
no = []
no2 = []
so2 = []
for f in file_list:
    df = pd.read_csv(f, skiprows=8, parse_dates=['time'], index_col='time')
    df.columns = ['no', 'no2', 'so2']
    no.append([df["no"]])
    no2.append([df["no2"]])
    so2.append([df["so2"]])
How do I solve the problem?
This is very doable. I had a similar problem with 3 files all in one plot. My understanding is that you want to compare levels of NO, NO2, and SO2, that each column is in comparable order, and that you want to compare across rows. If you are ok with importing matplotlib and numpy, something like this may work for you:
import numpy as np
import matplotlib.pyplot as plt
NO = np.asarray(df["no"])
NO2 = np.asarray(df["no2"])
SO2 = np.asarray(df["so2"])
timestamp = np.asarray(df["your_time_stamp"])
plt.plot(timestamp, NO)
plt.plot(timestamp, NO2)
plt.plot(timestamp, SO2)
plt.savefig(name_of_plot)
This will need some adjusting for your specific data frame, but I hope you see what I am getting at!
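Building on that, here is a minimal sketch of the combining step itself, assuming each of the three CSV files contributes a single concentration column after the time index and reusing the read_csv arguments from the question:
import glob
import pandas as pd
import matplotlib.pyplot as plt
frames = []
for f in glob.glob('*.csv'):
    # each file contributes a 'time' index plus one concentration column
    frames.append(pd.read_csv(f, skiprows=8, parse_dates=['time'], index_col='time'))
combined = pd.concat(frames, axis=1)  # align the three series on their time index
combined.plot()  # one line per column on a shared time axis
plt.ylabel('column amount')
plt.show()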
I tried to run a function through multiple data frames, but I have a problem with it. My main questions are:
1) I tried to run a defined function over zip(df1, df2, df3, ...) so that the outputs are new DF1, DF2, DF3, ...; however, I failed. Is it possible to run a function through multiple dataframes with "zip" so that the outputs are also dataframes?
2) If zip() is not an option, how do I make my function run in a loop? Currently I have just three dataframes and they are easy to handle separately, but I would like to know how to handle it when I have 50, 100, or even more dataframes.
Here are my codes:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#import scipy.stats as ss
# *********** 3 City Temperature files from NOAA ***********
# City 1
df1 = pd.pandas.read_csv('https://docs.google.com/spreadsheets/d/1Uj5N363dEVJZ9WVy2a_kkbJKJnyyE5qnEqOfzO0UCQE/gviz/tq?tqx=out:csv')
# City 2
df2 = pd.pandas.read_csv('https://docs.google.com/spreadsheets/d/13CgTdDCDzB_3WIYIRVMeLu6E36xzHSzRR5T_Ku0vThA/gviz/tq?tqx=out:csv')
# City 3
df3 = pd.pandas.read_csv('https://docs.google.com/spreadsheets/d/17pNZFIaV_NpQfSed-msIGu9jzzqF6JBvCZrBRiU2ZkQ/gviz/tq?tqx=out:csv')
def CleanDATA(data):
    data = data.drop(columns=['Annual'])
    data = data.drop(data.index[29:-1])
    data = data.drop(data.index[-1])
    monthname = []
    Temp = []
    for row in range(0, len(data)):
        for col in range(1, 13):
            #monthname.append(str(col)+"-"+str(data['Year'][row]))
            monthname.append(str(data['Year'][row]) + str(col))
            Temp.append(data.iloc[row, col])
    df0 = pd.DataFrame()
    df0['Month'] = monthname
    df0['Temperature'] = Temp
    df0['Month'] = pd.to_datetime(df0['Month'], format='%Y.0%m')  # change the date format
    df0['Month'] = pd.to_datetime(df0['Month']).dt.date  # remove time, only keep date
    data = df0[df0.applymap(np.isreal).all(1)]  # remove non-numerical rows
    return data
data1 = CleanDATA(df1)
data2 = CleanDATA(df2)
data3 = CleanDATA(df3)
Also, I found an issue with Pandas while reading the following excel file:
https://drive.google.com/file/d/1V9fKpACbLrSi0NfB0FHSgc96PQerKkUF/view?usp=sharing (This is city 1 temperature data from 1990-2019)
2019 is ongoing, so the NOAA stations only provide information up to this May. The Excel data labels all missing data with "M". I noticed that once a column contains an "M", I cannot use boxplot directly even though I have already dropped the 2019 row. The Spyder console says items [Jun to Dec] are missing (and the weird thing is that I can use the same data for an XY line plot). To plot the boxplot, I have to manually remove the 2019 information (1 row) in Excel and then read the new file.
I would do it using dictionaries (or lists or other iterable).
cities = {'city1': 'https://...', 'city2': 'https://...', 'city3': 'https://...'}
df = {}
data = {}
for city, url in cities.items():
    df[city] = pd.read_csv(url)
    data[city] = CleanDATA(df[city])
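After the loop, each cleaned dataframe can be looked up by its key, which keeps working the same way no matter how many cities you add (the plotting call is just an illustration, using the 'Month' and 'Temperature' columns produced by CleanDATA):
data['city1'].plot(x='Month', y='Temperature', title='city1')
# or iterate over all of them
for city, cleaned in data.items():
    print(city, cleaned.shape)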
I'm trying to calculate tracking error for a number of different benchmarks versus a fund that I'm looking at (tracking error is defined as the standard deviation of the percent difference between the fund and benchmark). The time series for the fund and all the benchmarks are in a dataframe that I'm reading from an Excel file, and what I have so far is this (with the idea that arg1 represents all the benchmarks and is then applied using applymap), but it's returning a KeyError. Any suggestions?
import pandas as pd
import numpy as np
data = pd.read_excel('File_Path.xlsx')
def index_analytics(arg1):
    tracking_err = np.std((data['Fund'] - data[arg1]) / data[arg1])
    return tracking_err
data.applymap(index_analytics)
There are a few things that need to be fixed. First, applymap passes each individual value from all the columns to your calling function (index_analytics). So arg1 is an individual scalar value from your dataframe, and data[arg1] is always going to raise a KeyError unless all your values happen to also be column names.
You also shouldn't need to use apply to do this. Assuming your benchmarks are in the same dataframe, you should be able to do something like this for each benchmark. Next time, include a sample of your dataframe.
df['Benchmark1_result'] = (df['Fund'] - df['Benchmark1']) / df['Benchmark1']
And if you want to calculate all the standard deviations for all the benchmarks you can do this
# assume benchmark_columns is a list of all the benchmark column names
benchmark_columns = ['Benchmark1', 'Benchmark2']  # fill in your benchmark column names
# one tracking error per benchmark column (std taken down the rows)
np.std((df['Fund'].values[:, None] - df[benchmark_columns].values) / df[benchmark_columns].values, axis=0)
Assuming you're following this definition of tracking error, i.e. the square root of the sum of the squared active returns (portfolio return minus benchmark return):
import pandas as pd
import numpy as np
# Example DataFrame
df = pd.DataFrame({'Portfolio_Returns': [5.00, 1.67], 'Bench_Returns': [2.89, .759]})
df['Active_Return'] = df['Portfolio_Returns'] - df['Bench_Returns']
print(df.head())
list_ = df['Active_Return']
temp_ = []
for val in list_:
    x = val**2
    temp_.append(x)
tracking_error = np.sqrt(sum(temp_))
print(f"Tracking Error is: {tracking_error}")
Or if you want it more compact (because apparently the cool kids do it):
df = pd.DataFrame({'Portfolio_Returns': [5.00, 1.67], 'Bench_Returns': [2.89, .759]})
tracking_error = np.sqrt(sum([val**2 for val in df['Portfolio_Returns'] - df['Bench_Returns']]))
print(f"Tracking Error is: {tracking_error}")
I am creating a python script using pandas to read through a file which has multiple row values.
Once read, I need to build an array of these values and then assign it to a dataframe row value.
The code I have used is
import re
import numpy as np
import pandas as pd
master_data = pd.DataFrame()
temp_df = pd.DataFrame()
new_df = pd.DataFrame()
for f in data:
    ## Reading the file in pandas, which is in Excel format
    file_df = pd.read_excel(f)
    filename = file_df['Unnamed: 1'][2]
    ## Skipping the first rows to get the required reading values
    column_names = ['start_time', 'xxx_value']
    data_df = pd.read_excel(f, names=column_names, skiprows=25)
    array = np.array([])
    for i in data_df.iterrows():
        array = np.append(array, i[1][1])
    temp_df['xxx_value'] = [array]
    temp_df['Filename'] = filename
    temp_df['sub_id'] = temp_df['Filename'].str.split('_', 1).str[1].str.strip()
    temp_df['sen_site'] = temp_df['Filename'].str.split('_', 1).str[0].str.strip()
    temp_df['sampling_interval'] = 15
    temp_df['start_time'] = data_df['start_time'][2]
    new_df = new_df.append(temp_df)
new_df.index = new_df.index + 1
new_df = new_df.sort_index()
new_df.index.name = 'record_id'
new_df = new_df.drop("Filename", 1)  ## dropping the Filename as it is not needed in PostgreSQL
## Rearrange to PostgreSQL format
column_new_df = new_df.columns.tolist()
column_new_df.insert(4, column_new_df.pop(column_new_df.index('xxx_value')))
new_df = new_df.reindex(columns=column_new_df)
print(new_df)
This code is not working when I try to insert the array data into Postgresql.
It gives me an error stating:
ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'numpy.ndarray'
I am not sure where the problem is, as I can't see in your code the part where you insert the data into Postgres.
My guess though is that you are giving Postgres a NumPy array: psycopg2 can't handle NumPy data types, but it should be fairly easy to convert them to native Python types that work with psycopg2 (e.g. by using the .tolist() method); it is difficult to give more precise information without the code.
In my opinion, the most effective way would be to make psycopg2 always aware of np.ndarray(s). One could do that by registering an adapter:
import numpy as np
from psycopg2.extensions import register_adapter, AsIs
def addapt_numpy_array(numpy_array):
    return AsIs(tuple(numpy_array))
register_adapter(np.ndarray, addapt_numpy_array)
To help working with numpy in general, my default addon to scripts/libraries dependent on psycopg2 is:
import numpy as np
from psycopg2.extensions import register_adapter, AsIs
def addapt_numpy_float64(numpy_float64):
    return AsIs(numpy_float64)
def addapt_numpy_int64(numpy_int64):
    return AsIs(numpy_int64)
def addapt_numpy_float32(numpy_float32):
    return AsIs(numpy_float32)
def addapt_numpy_int32(numpy_int32):
    return AsIs(numpy_int32)
def addapt_numpy_array(numpy_array):
    return AsIs(tuple(numpy_array))
register_adapter(np.float64, addapt_numpy_float64)
register_adapter(np.int64, addapt_numpy_int64)
register_adapter(np.float32, addapt_numpy_float32)
register_adapter(np.int32, addapt_numpy_int32)
register_adapter(np.ndarray, addapt_numpy_array)
otherwise there would be some issues even with numerical types.
I got the adapter trick from this other stackoverflow entry.
Convert each numpy array element to its equivalent list using apply and tolist first, and then you should be able to write the data to Postgres:
df['column_name'] = df['column_name'].apply(lambda x: x.tolist())
We can also address the issue by extracting one element at a time. Assuming a dataframe temp_df with a sub_id column of type numpy.int64, we can extract a single value with iloc and item(), as in temp_df.iloc[0]['sub_id'].item(), and push that into the DB.
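A tiny illustration of what .item() does (the dataframe here is made up just to show the type conversion):
import numpy as np
import pandas as pd
temp_df = pd.DataFrame({'sub_id': np.array([101, 102], dtype=np.int64)})
value = temp_df.iloc[0]['sub_id']  # numpy.int64, which psycopg2 cannot adapt by default
native = value.item()              # plain Python int, safe to pass as a query parameter
print(type(value), type(native))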