Mean of a Row in Python (Reading CSV files using Pandas) - python

I am completely new to Python. How can I fix my code?
I can't get it to work: for some reason, it only calculates the mean of a column.
import numpy as np
import pandas as pd
rainfall = pd.read_csv('rainfall.csv', low_memory=False, parse_dates=True, header=None)
mean_rainfall = rainfall[0].mean()
print(mean_rainfall)
[Image: screenshot of rainfall.csv]

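The pandas DataFrame mean function takes an axis parameter that tells it whether to average each column or each row.
Check here: pandas.DataFrame.mean.
The default is axis=0, which computes the mean of each column. Note also that rainfall[0] selects column 0 first, so your code only ever averages that one column.
Try this:
mean_rainfall = rainfall.mean(axis=1)   # mean of every row
or, for the first row only:
mean_rainfall = rainfall.iloc[0].mean()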
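A minimal sketch of the difference between the two axes, using a small made-up DataFrame in place of rainfall.csv (read with header=None, so the columns are labelled 0, 1, 2, ...):

```python
import pandas as pd

# Hypothetical stand-in for rainfall.csv read with header=None.
rainfall = pd.DataFrame([[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]])

col_means = rainfall.mean()               # axis=0 (default): one mean per column
row_means = rainfall.mean(axis=1)         # axis=1: one mean per row
first_row_mean = rainfall.iloc[0].mean()  # mean of the first row only

print(col_means.tolist())   # [2.5, 3.5, 4.5]
print(row_means.tolist())   # [2.0, 5.0]
print(first_row_mean)       # 2.0
```

If the real file also contains non-numeric columns (for example dates), passing numeric_only=True to mean restricts the calculation to the numeric columns.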

Related

Using pyMannKendall python package for testing the trend for gridded rainfall data

I am using the pyMannKendall Python package to test the trend in gridded rainfall data. I have successfully carried out the trend analysis for all the grids, but now I want to write the results for all the grids to a CSV file. I am new to coding and am facing a problem. My code is below.
import pandas as pd
import pymannkendall as mk
import csv
import numpy as np
df = pd.read_csv("pr_1979_2018.csv", index_col = "Year")
for i in df.columns:
    res = mk.hamed_rao_modification_test(df[i])
    new_df=pd.DataFrame(data=res, index= ['trend','h','p','z', 'Tau',
        's','var_s','slope', 'intercept'], columns=['stats'], dtype=None)
    new_df.to_csv("Mk_2.csv")
When I run this code I only get a single column in my CSV file; however, I want the results for all the columns in my resulting CSV file. Please help.
You can swap rows and columns with the pandas transpose before exporting.
Try this:
new_df = new_df.transpose()   # equivalent to new_df.T
new_df.to_csv("Mk_2.csv", header=True)
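Transposing changes the layout, but the file can only contain every grid if the result of every test is kept rather than overwritten on each loop iteration. A minimal sketch of collecting all results into one DataFrame, assuming each test result is a tuple whose fields follow the question's index list. To keep it runnable without pymannkendall, stand_in_test is a hypothetical placeholder returning dummy values, and the grid data is made up:

```python
from collections import namedtuple

import pandas as pd

# Field names taken from the question's index list.
FIELDS = ['trend', 'h', 'p', 'z', 'Tau', 's', 'var_s', 'slope', 'intercept']
Result = namedtuple('Result', FIELDS)


def stand_in_test(series):
    """Hypothetical placeholder for mk.hamed_rao_modification_test(series).

    Returns dummy values so this sketch runs without pymannkendall; 'slope'
    here is just the end-to-end change per step, not Sen's slope.
    """
    slope = float(series.iloc[-1] - series.iloc[0]) / (len(series) - 1)
    trend = 'increasing' if slope > 0 else 'decreasing' if slope < 0 else 'no trend'
    return Result(trend, slope != 0, 0.0, 0.0, 0.0, 0.0, 0.0, slope, float(series.iloc[0]))


# Hypothetical gridded data: one column per grid cell, indexed by Year.
df = pd.DataFrame({'grid_1': [1.0, 2.0, 3.0], 'grid_2': [3.0, 2.0, 1.0]},
                  index=pd.Index([1979, 1980, 1981], name='Year'))

results = {}
for col in df.columns:
    res = stand_in_test(df[col])  # real data: mk.hamed_rao_modification_test(df[col])
    results[col] = list(res)      # store every column's result instead of overwriting it

# One column per grid, one row per statistic.
new_df = pd.DataFrame(results, index=FIELDS)
csv_text = new_df.to_csv()        # pass a path such as "Mk_2.csv" to write a file
print(csv_text)
```

If you prefer one row per grid, call new_df.T before writing, which is where the transpose comes in.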

Label encoding in Pandas

I am working with a dataset that has numerical and categorical values. I found a solution for the numerical values, so the next step is to label-encode the categorical values. To do that, I wrote these lines of code:
import pandas as pd
dataset_categorical = dataset.select_dtypes(include = ['object'])
new_column = dataset_categorical.astype('category')
After executing the last line in Jupyter I get no error, but the values are not converted into encoded values.
This line also works when I try it on a single column, but not on the whole data frame.
Can anybody help me solve this problem?
df1 = {
'Name':['George','Andrea','micheal','maggie','Ravi',
'Xien','Jalpa'],
'Is_Male':[1,0,1,0,1,1,0]}
df1 = pd.DataFrame(df1,columns=['Name','Is_Male'])
# Typecast to a categorical column in pandas
df1['Is_Male'] = df1.Is_Male.astype('category')
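astype('category') only changes the dtype; the integer labels themselves are exposed through the .cat.codes accessor, which exists on a Series, not on a whole DataFrame. A minimal sketch applying it column by column to every non-numeric column, on a small hypothetical frame standing in for the question's dataset (the City and Age columns are made up):

```python
import pandas as pd

# Hypothetical stand-in for the question's dataset.
dataset = pd.DataFrame({
    'Name': ['George', 'Andrea', 'micheal', 'maggie'],
    'City': ['Oslo', 'Rome', 'Oslo', 'Lima'],
    'Age': [31, 25, 40, 28],
})

# Non-numeric columns (the string columns here, like include=['object']).
dataset_categorical = dataset.select_dtypes(exclude='number')

# Convert each column to category, then take its integer codes.
encoded = dataset_categorical.apply(lambda s: s.astype('category').cat.codes)

# Put the encoded columns back next to the numeric ones.
dataset[encoded.columns] = encoded
print(dataset)
```

Categories are sorted, so the codes follow the sorted order of the values (for example, 'Lima' becomes 0 in City).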

Find all min and max values in a column of a Python pandas DataFrame and save them in a new DataFrame

I want to find all the local minima and maxima in a column and save the whole row in a new dataframe.
See the example code below. I know there are groupby and the like.
How do I do this properly and create a cycle column that increases by 1? Finally, I only want to take the time of each minimum and then save it.
import pandas as pd
import numpy as np
l = list(np.linspace(0,10,12))
data = [('time',l),
('A',[0,5,0.6,-4.8,-0.3,4.9,0.2,-4.7,0.5,5,0.1,-4.6]),
('B',[ 0,300,20,-280,-25,290,30,-270,40,300,-10,-260]),
]
df = pd.DataFrame.from_dict(dict(data))
print(df)
data_1 = [('cycle',[1,2,3]),
('delta_time',[2.727273,6.363636 ,10.000000]),
('A_max',[5,4.9,5]),
('A_min',[-4.8,-4.7,-4.6]),
('B_min',[-280,-270,-260]),
('B_max',[300,290,300]),
]
df_1 = pd.DataFrame.from_dict(dict(data_1))
print(df_1)
Any help is much appreciated.
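One possible approach, sketched under an assumed definition: a local maximum (minimum) is a value strictly greater (smaller) than the value before it and the value after it, where the last row only has to beat its predecessor and the first row is never counted. This rule reproduces the expected df_1 for the sample data, but it is an assumption, not a general definition; local_max and local_min are hypothetical helper names, and the summary assumes every column has the same number of maxima and minima:

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'time': np.linspace(0, 10, 12),
    'A': [0, 5, 0.6, -4.8, -0.3, 4.9, 0.2, -4.7, 0.5, 5, 0.1, -4.6],
    'B': [0, 300, 20, -280, -25, 290, 30, -270, 40, 300, -10, -260],
})


def local_max(s):
    # Greater than the previous value (NaN for the first row, so it is never
    # selected) and than the next value (-inf after the last row).
    return (s > s.shift(1)) & (s > s.shift(-1).fillna(-np.inf))


def local_min(s):
    # Mirror image of local_max.
    return (s < s.shift(1)) & (s < s.shift(-1).fillna(np.inf))


# Whole rows at the local extrema of column A, saved as new DataFrames.
a_max_rows = df[local_max(df['A'])]
a_min_rows = df[local_min(df['A'])]

# Summary table: one row per cycle, time taken from the minimum of A.
df_1 = pd.DataFrame({
    'cycle': np.arange(1, len(a_min_rows) + 1),
    'delta_time': a_min_rows['time'].to_numpy(),
    'A_max': a_max_rows['A'].to_numpy(),
    'A_min': a_min_rows['A'].to_numpy(),
    'B_min': df.loc[local_min(df['B']), 'B'].to_numpy(),
    'B_max': df.loc[local_max(df['B']), 'B'].to_numpy(),
})
print(df_1)
```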

Python Combining two columns into one based on each columns value

I am working from this dataset and I would like to combine yr_built and yr_renovated into one column, preferably into yr_built, based on this rule: if the value in yr_renovated is greater than 0, I would like to keep that value; otherwise, yr_built's value.
Can you please help me on this?
Thank you!
Here you go. You need pandas for the dataframe; then create a new column with numpy.where, which takes 'yr_renovated' where it is greater than zero and 'yr_built' otherwise.
import pandas as pd
import numpy as np
df = pd.read_csv('https://raw.githubusercontent.com/Jonasyao/Machine-Learning-Specialization-University-of-Washington-/master/Regression/Assignment_four/kc_house_data.csv', on_bad_lines='skip')
df = df[['yr_built','yr_renovated','date','bedrooms']].copy()
df['MyYear'] = np.where(df['yr_renovated'] > 0, df['yr_renovated'], df['yr_built'])
df
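A self-contained sketch that runs without downloading the dataset, on a small hypothetical frame with the two relevant columns. It also shows Series.where, which keeps the calling Series where the condition holds and takes the other values elsewhere, so the result can go straight into yr_built as the question prefers:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with the two columns from the question.
df = pd.DataFrame({'yr_built': [1955, 1951, 1933, 1965],
                   'yr_renovated': [0, 1991, 0, 2005]})

# np.where: yr_renovated where it is > 0, otherwise yr_built.
df['MyYear'] = np.where(df['yr_renovated'] > 0, df['yr_renovated'], df['yr_built'])

# pandas-only equivalent that overwrites yr_built in place.
df['yr_built'] = df['yr_renovated'].where(df['yr_renovated'] > 0, df['yr_built'])
print(df)
```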

Plot diagram in Pandas from CSV without headers

I am new to plotting charts in Python. I've been told to use Pandas for that, with the following code. Right now it assumes the CSV file has headers (time, speed, etc.). How can I change it for a CSV file that doesn't have headers (the data starts at row 0)?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("P1541350772737.csv")
#df.head(5)
df.plot(figsize=(15,5), kind='line',x='timestamp', y='speed') # line plot
You can specify x and y by the index of the columns, you don't need names of the columns for that:
Very simple: df.plot(figsize=(15,5), kind='line',x=0, y=1)
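This assumes the x column is the first and the y column the second, and so on; columns are numbered from 0. Read the file with header=None so that the first row is kept as data rather than used as the header.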
[Figures: example plot using column positions, and the same result using column names instead of positions]
I may have misinterpreted your question, but I'll do my best.
The problem seems to be that you have to read a CSV that has no header, but you want to add column names. I would use this code:
cols=['time', 'speed', 'something', 'else']
df = pd.read_csv('useful_data.csv', names=cols, header=None)
With this change, the plotting code you used should work, as long as the x and y names you pass appear in cols. I would also suggest looking at matplotlib for your graph.
You can try
df = pd.read_csv("P1541350772737.csv", header=None)
With the names keyword argument you can set arbitrary column headers; passing names implicitly behaves like header=None, i.e. data is read from row 0.
You might also want to check the doc https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Pandas is focused on data structures and data analysis tools; its plotting support uses Matplotlib as the backend. If you're interested in building different types of plots in Python, you might want to check Matplotlib out.
Back to Pandas: by default, Pandas assumes that the first row of your CSV is a header. If your file doesn't have a header, you can pass header=None as a parameter, pd.read_csv("P1541350772737.csv", header=None), and then plot it. The columns will then be labelled 0, 1, 2, ..., so refer to them by those labels (e.g. x=0, y=1) rather than by names such as 'timestamp'.
The full list of parameters that you can pass to Pandas for reading a CSV can be found in the Pandas read_csv documentation; you'll find a lot of useful options there (such as skipping rows, defining the index column, etc.).
Happy coding!
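To see what the first-row-as-header default does in practice, here is a small sketch that reads the same made-up headerless CSV text three ways (the values and the column names timestamp and speed are hypothetical), using io.StringIO in place of a file:

```python
import io

import pandas as pd

# Hypothetical headerless CSV content: timestamp, speed.
raw = "1541350772737,10.5\n1541350772837,11.0\n1541350772937,11.8\n"

default = pd.read_csv(io.StringIO(raw))                 # first data row becomes the header
no_header = pd.read_csv(io.StringIO(raw), header=None)  # columns labelled 0, 1
named = pd.read_csv(io.StringIO(raw), names=['timestamp', 'speed'])  # implies header=None

print(len(default), len(no_header), len(named))  # 2 3 3
print(list(no_header.columns), list(named.columns))
```

With the named frame, the original call works as named.plot(kind='line', x='timestamp', y='speed'); with the header=None frame, use x=0, y=1 instead.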
For most commands you will find help in the respective documentation. Looking at pandas.read_csv, you'll find an argument names:
names : array-like, default None
List of column names to use. If file contains no header row, then you should explicitly
pass header=None.
So you will want to give your columns the names under which they should appear in the dataframe.
As an example, suppose you have this data file, data.txt:
1, 2
3, 4
5, 6
Then you can do
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("data.txt", names=["A", "B"], header=None)
print(df)
df.plot(x="A", y="B")
plt.show()
which outputs
   A  B
0  1  2
1  3  4
2  5  6
