I have an Excel file with several columns.
From these columns I want to plot the ones whose names look like this:
IVOF_1_H, IVOF_1_L, IVOF_2_H, IVOF_2_L,... Those columns will be on the y axis. For the x axis the column will always be the same.
I do not know how many of those columns I have in the file. I only know that the number is increasing. Is there any way to check how many of those IVOF columns I have and plot them?
In general, there is a limit on the number of those IVOF columns, and I don't mind setting up my script so that all of them get plotted (if they exist), but then I don't know how to keep the code from crashing if one of those columns is missing.
You can filter your data frame by its column name:
import pandas as pd
df = pd.read_excel('sample.xlsx')
df = df.filter(regex=("IVOF.*"))
#plot the first row
df.iloc[0].plot(kind="bar")
#plot all rows
df.plot(kind="bar")
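If the x-axis column also has to survive the filtering, one possible variant is to collect the matching column names first and pass them to plot. This is only a sketch: the DataFrame below is made-up stand-in data for the Excel sheet, and "time" is an assumed name for the x column. Because the list is built from the columns that actually exist, a missing IVOF column (here IVOF_2_L) does not cause an error.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet; "time" is an assumed x-axis column name.
df = pd.DataFrame({
    "time": [0, 1, 2],
    "IVOF_1_H": [1.0, 2.0, 3.0],
    "IVOF_1_L": [0.5, 1.5, 2.5],
    "IVOF_2_H": [2.0, 2.5, 3.5],
    "other": [9, 9, 9],
})

# Keep only the IVOF columns that are actually present, however many there are.
ivof_cols = [c for c in df.columns if str(c).startswith("IVOF_")]
print(len(ivof_cols), ivof_cols)

# Plot them against the shared x column (needs matplotlib installed):
# df.plot(x="time", y=ivof_cols)
```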
simple example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[2,4,4],[4,3,3],[5,9,1]]),columns=['A','B1','B2'])
df = df.filter(regex=("B.*"))
df.plot(kind="bar")
The result is a grouped bar chart of columns B1 and B2, one group per row.
I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I was trying to use groupby. But I think this will not be the right way.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code. The output of which is:
What can I do to just have one entry of 'file1' under 'filename'?
Eventually I need to write this to a csv file.
Thank you
If you only have one entry in a column, the remaining entries will be NaN. You could then filter out the NaNs with something like df = df.loc[df["filename"].notnull()]
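To illustrate the NaN padding, here is a minimal sketch that builds a frame from lists of unequal length by wrapping each list in a pd.Series, then writes it out as CSV text. The data mirrors the question's sample with a single 'file1' entry; the dict-of-Series construction is just one way to do it, not necessarily what the asker needs.

```python
import pandas as pd

# Lists of different lengths; 'filename' has only one entry.
data = {"filename": ["file1"], "variables": ["a", "b"]}

# Wrapping each list in a Series lets pandas align on the index and pad with NaN.
df = pd.DataFrame({name: pd.Series(values) for name, values in data.items()})
print(df)

# NaN is written as an empty field by default.
csv_text = df.to_csv(index=False)
print(csv_text)
```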
I am trying to analyse wind speed data from a lidar by creating a dataframe in which the columns are the investigated heights and the single row holds the number of NaNs at each height. My script creates the dataframe and names the columns as required, but it doesn't write the number of NaNs into the corresponding cells. Any idea what the problem might be?
df=pd.read_csv(fileApath,delimiter=',',skiprows=1)
heights = ['123','98','68','65','57','48','39','38','29','18','10']
nanvalues_speed=pd.DataFrame()
for i in heights:
    nanvalues_speed[i+'m']=pd.notnull(df['Horizontal Wind Speed (m/s) at '+i+'m']).sum()
The function you are looking for is pandas.DataFrame.isna()
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1,2,3,np.nan ,5],
'b': ['a',np.nan ,'c','d','e']})
df.isna().sum()
The pandas.DataFrame.isna() function returns a boolean object of the same size, indicating which values in the DataFrame are NA.
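Applied to the lidar question, a minimal sketch might look like the following. The small DataFrame is made-up stand-in data with only two of the heights; the column naming follows the question. The counts are collected in a dict first and turned into a one-row DataFrame afterwards, since assigning a scalar to a column of an empty DataFrame leaves that column empty.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lidar file, with two of the heights only.
df = pd.DataFrame({
    "Horizontal Wind Speed (m/s) at 123m": [5.1, np.nan, 4.8],
    "Horizontal Wind Speed (m/s) at 98m": [np.nan, np.nan, 4.2],
})
heights = ["123", "98"]

# Count the NaNs per height column.
counts = {
    h + "m": int(df["Horizontal Wind Speed (m/s) at " + h + "m"].isna().sum())
    for h in heights
}

# One row, one column per height.
nanvalues_speed = pd.DataFrame([counts])
print(nanvalues_speed)
```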
I'm trying to learn python and have been trying to figure out how to create a sum column of my data. I want to sum all other columns. I create the new column but all sum values are zero. The data can be found here. My code is below, thank you for the help:
import pandas as pd
#Importing csv file to chinaimport_df dataframe
filename=r'C:\Users\Ing PC\Documents\Intro to Data Analysis\Final Project\CHINA_DOLLAR_IMPORTS.csv'
chinaimport_df = pd.read_csv(filename)
# Removing all rows that contain only zeros; thresh is used since the first column is words
chinaimport_df = chinaimport_df.dropna(how='all',axis=0, thresh=2)
#Convert NANs to zeros
chinaimport_df=chinaimport_df.fillna(0)
#create a list of columns excluding the first column, to make sum func work later
col_list= list(chinaimport_df)
col_list.remove('Commodity')
print(col_list)
#adding column that sums
chinaimport_df['Total'] = chinaimport_df[col_list].sum(axis=1)
chinaimport_df.to_csv("output.csv", index=False)
IIUC this should do it.
import pandas as pd
df = pd.read_csv('CHINA_DOLLAR_IMPORTS.csv')
df['Total'] = df.replace(r',',"", regex=True).iloc[:, 1:].astype(float).sum(axis=1)
df.to_csv('output.csv', index=False)
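The line above only makes sense if the numbers were read in as strings containing thousands separators, which is an assumption here since the actual file is not shown. A small sketch with made-up values (the column names "2016" and "2017" are hypothetical) shows the same idea step by step:

```python
import pandas as pd

# Hypothetical data: numeric values stored as strings with thousands separators.
df = pd.DataFrame({
    "Commodity": ["Toys", "Steel"],
    "2016": ["1,200", "300"],
    "2017": ["2,500", "1,000"],
})

# Strip the commas from every column except the first, then convert to float.
numeric = df.iloc[:, 1:].replace(",", "", regex=True).astype(float)

# Row-wise sum over the converted columns.
df["Total"] = numeric.sum(axis=1)
print(df)
```

If that assumption holds, passing thousands=',' to pd.read_csv may be an alternative worth trying, so the columns are parsed as numbers on load.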
I am new to plotting charts in python. I've been told to use Pandas for that, using the following command. Right now it is assumed the csv file has headers (time,speed, etc). But how can I change it to when the csv file doesn't have headers? (data starts from row 0)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("P1541350772737.csv")
#df.head(5)
df.plot(figsize=(15,5), kind='line',x='timestamp', y='speed') # scatter plot
You can specify x and y by the position of the columns; you don't need the column names for that:
Very simple: df.plot(figsize=(15,5), kind='line',x=0, y=1)
This works if the x column is first and the y column is second, and so on; columns are numbered from 0.
For example:
The same result with the names of the columns instead of positions:
I may have misinterpreted your question, but I'll do my best.
The problem seems to be that you have to read a csv that has no header, but you want to add one. I would use this code:
cols=['time', 'speed', 'something', 'else']
df = pd.read_csv('useful_data.csv', names=cols, header=None)
For your plot, the code you used should be fine with my correction. I would also suggest looking at matplotlib for drawing your graph.
You can try
df = pd.read_csv("P1541350772737.csv", header=None)
With the names kwarg you can set arbitrary column headers; this silently implies header=None, i.e. reading data from row 0.
You might also want to check the doc https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
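A quick way to see the difference without touching a real file is to read from an in-memory string. This is just a sketch: the three data rows and the names "timestamp" and "speed" are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical headerless data: two columns, three rows.
csv_text = "0.0,10.5\n1.0,11.2\n2.0,9.8\n"

# header=None: the first row is treated as data, columns are labeled 0, 1, ...
df_default = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_default.columns.tolist())

# names=...: same effect on the first row, but with your own column labels.
df_named = pd.read_csv(io.StringIO(csv_text), names=["timestamp", "speed"])
print(df_named)

# Plotting would then be (needs matplotlib installed):
# df_default.plot(x=0, y=1)
# df_named.plot(x="timestamp", y="speed")
```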
Pandas is mainly focused on data structures and data analysis tools, but it also supports plotting by using Matplotlib as its backend. If you're interested in building different types of plots in Python, you might want to check Matplotlib out.
Back to Pandas: it assumes that the first row of your csv is a header. However, if your file doesn't have a header you can pass header=None as a parameter, as in pd.read_csv("P1541350772737.csv", header=None), and then plot it as you are doing right now, referring to the columns by their integer labels (0, 1, ...) instead of by name.
The full list of parameters that you can pass to Pandas for reading a csv can be found in the Pandas read_csv documentation; you'll find a lot of useful options there (such as skipping rows, defining the index column, etc.).
Happy coding!
For most commands you will find help in the respective documentation. Looking at pandas.read_csv you'll find an argument names
names : array-like, default None
List of column names to use. If file contains no header row, then you should explicitly
pass header=None.
So you will want to give your columns names by which they appear in the dataframe.
As an example: Suppose you have this data file
1, 2
3, 4
5, 6
Then you can do
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("data.txt", names=["A", "B"], header=None)
print(df)
df.plot(x="A", y="B")
plt.show()
which outputs
   A  B
0  1  2
1  3  4
2  5  6
I have a csv file with 8 columns in it. I want to plot a graph between 2 columns using matplotlib. One of the columns has repetitive values. I want to take the mean of the values from the other column which has same corresponding value in the first column.
How can I do it?
This isn't really specific to matplotlib. Pandas has nice support for this kind of data mangling. Read your csv file into a Pandas dataframe:
import pandas as pd
df = pd.read_csv('data.csv')
Then, assuming the column you want to group by is named 'key' and the column whose values you want to take means of is named 'value', you can do:
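# Mean of 'value' for each distinct 'key'; the result is a Series indexed by 'key'
grouped = df.groupby('key')['value'].mean()
# Plots the means on the y axis against 'key' on the x axis
grouped.plot()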
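To check what the grouping produces before plotting, here is a tiny sketch with made-up numbers. The column names 'key' and 'value' follow the answer's assumption; 'label' is a hypothetical extra column standing in for the other columns in the file.

```python
import pandas as pd

# Hypothetical data: 'key' has repeated values, 'value' is what gets averaged.
df = pd.DataFrame({
    "key": [1, 1, 2, 2, 2],
    "value": [10.0, 20.0, 3.0, 6.0, 9.0],
    "label": ["a", "b", "c", "d", "e"],
})

# Select the column before taking the mean so non-numeric columns are ignored.
means = df.groupby("key")["value"].mean()
print(means)

# Plot mean value against key (needs matplotlib installed):
# means.plot()
```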