Hi, I have a function that plots time series data for a given argument (in my case, a country name). Some of the columns have NaN values, and when I try to plot them I can't because of those NaN values. How can I solve this problem?
This is the code that loads the dataframe, along with the function I'm using:
import io
import requests
import pandas as pd
import matplotlib.pyplot as plt

url2 = 'https://spreadsheets.google.com/pub?key=phAwcNAVuyj1jiMAkmq1iMg&output=xls'
source = io.BytesIO(requests.get(url2).content)
income = pd.read_excel(source)
income.head()
income.set_index("GDP per capita", inplace=True)

def gdpchange(country):
    # Select the row for one country and plot it as a line
    dfff = income.loc[country]
    dfff.T.plot(kind='line')
    plt.legend([country])
Now, if I want to plot all of them on one graph, it gives an error because of NaN values in some columns. Any suggestions?
for ctr in income.index.values:
    gdpchange(ctr)
You can drop the NaN values with DataFrame.dropna():
income.dropna(inplace=True)
This statement drops every row of the income dataframe that contains at least one NaN value.
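Note that dropping rows on the whole dataframe removes every country that is missing even a single year. A gentler option, sketched below, is to drop only the rows that are entirely empty, or to drop the NaN values per country just before plotting. The small table and the helper name gdp_series are made up for illustration; they are not the real spreadsheet or part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real spreadsheet: countries as rows, years as columns
income = pd.DataFrame(
    {1990: [1.0, np.nan, np.nan], 2000: [2.0, 5.0, np.nan], 2010: [3.0, 6.0, np.nan]},
    index=pd.Index(["Aland", "Borduria", "Syldavia"], name="GDP per capita"),
)

# dropna() with defaults keeps only rows without any NaN
all_complete = income.dropna()

# how="all" removes only the rows in which every value is NaN
has_data = income.dropna(how="all")

def gdp_series(country):
    # Keep only the years that actually have a value for this country
    return income.loc[country].dropna()

# Each per-country series can then be plotted, e.g. gdp_series(ctr).plot(label=ctr)
for ctr in has_data.index:
    print(ctr, gdp_series(ctr).to_dict())
```

With this approach a country with partial data still gets a line covering the years it does have.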
Related
I want to create a graph with lines represented by my label
so in this example picture, each line represents a distinct label
The data looks something like this where the x-axis is the datetime and the y-axis is the count.
datetime, count, label
1656140642, 12, A
1656140643, 20, B
1656140645, 11, A
1656140676, 1, B
Because I have a lot of data, I want to aggregate it by 1 hour or even 1 day chunks.
I'm able to generate the above picture with
# df is dataframe here, result from pandas.read_csv
df.set_index("datetime").groupby("label")["count"].plot()
and I can get a time-range average with
df.set_index("datetime").groupby(pd.Grouper(freq='2min')).mean().plot()
but I'm unable to get both rules applied. Can someone point me in the right direction?
You can use the .pivot function to create a convenient structure where datetime is the index, the different labels are the columns, and count supplies the values.
df.set_index('datetime').pivot(columns='label', values='count')
output:
label          A     B
datetime
1656140642  12.0   NaN
1656140643   NaN  20.0
1656140645  11.0   NaN
1656140676   NaN   1.0
Once your data is in this format, you can aggregate over the index (with groupby, resample, or whatever suits you), and the aggregation is applied to each column separately. Plotting the result then draws one line per column.
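Putting the two steps together might look like the sketch below. It assumes the datetime column holds Unix timestamps in seconds (as the sample rows suggest) and uses a one-day bin; both are assumptions to adjust for the real data.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "datetime": [1656140642, 1656140643, 1656140645, 1656140676],
    "count": [12, 20, 11, 1],
    "label": ["A", "B", "A", "B"],
})

# Assumption: the values are epoch seconds; resample needs a real datetime index
df["datetime"] = pd.to_datetime(df["datetime"], unit="s")

# One column per label, indexed by time
wide = df.pivot(index="datetime", columns="label", values="count")

# Average each label within one-day bins; NaN cells are ignored by mean()
daily = wide.resample("1D").mean()
print(daily)

# daily.plot() would then draw one line per label
```

Because mean() skips NaN, the gaps that pivot introduces do not distort the per-label averages.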
I am trying to replace the NaN values in the 'price' column of my dataset. I tried using:
avg_price = car.groupby('make')['price'].agg(np.mean) # calculating average value of the price of each car company model
new_price= car['price'].fillna(avg_price,inplace=True)
car['price']=new_price
The code runs without any error, but on checking, I can still see the NaN values in the dataset. A snapshot of the dataset is attached below:
Are you trying to fill the NaN with a grouped (by make) average? Will this work?
df.loc[df.price.isnull(), 'price'] = df.groupby('make').price.transform('mean')
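For context, the original attempt fails for two reasons: fillna(..., inplace=True) returns None, so assigning its result back overwrites the column, and avg_price is indexed by make rather than by row, so it does not line up with the rows of car. A minimal sketch of the transform approach on made-up data (the makes and prices below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data for illustration
car = pd.DataFrame({
    "make": ["audi", "audi", "bmw", "bmw", "bmw"],
    "price": [10000.0, np.nan, 20000.0, 30000.0, np.nan],
})

# transform('mean') returns one value per row, aligned with car's index,
# holding the mean price of that row's make (NaN prices are skipped)
group_mean = car.groupby("make")["price"].transform("mean")

# fillna without inplace returns a new Series that can be assigned back
car["price"] = car["price"].fillna(group_mean)
print(car)
```

If a make has no known price at all, its group mean is NaN and those rows stay unfilled.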
Good evening everyone!
I have a problem with NaN values in Python with pandas.
I am working on a database with information on different countries. I cannot simply drop all of my NaN values, or I would lose too much data.
I wish to replace the NaN values based on some condition.
The dataframe I am working on
What I would like is to create a new column that would take the existing values of a column (Here: OECDSTInterbkRate) and replace all its NaN values based on a specific condition.
For example, I want to replace the NaN corresponding to Australia with the moving average of the values I already have for Australia.
Same thing for every other country for which I am missing values (Replace NaN observations in this column for France by the moving average of the values I already have for France, etc.).
What piece of code do you think I could use?
Thank you very much for your help !
Maybe you can try something like this: df.fillna(df.mean(), inplace=True)
Replace df.mean() with your mean values.
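Since df.mean() is computed over the whole column, it mixes all countries together. A per-country version could be sketched as below, using groupby with a rolling mean. The column name Country, the window of 3 observations, and the sample values are assumptions made for illustration; the rows are assumed to be sorted by time within each country.

```python
import numpy as np
import pandas as pd

# Made-up data; "Country" is an assumed column name
df = pd.DataFrame({
    "Country": ["Australia"] * 4 + ["France"] * 4,
    "OECDSTInterbkRate": [1.0, np.nan, 3.0, 5.0, 2.0, 4.0, np.nan, np.nan],
})

def fill_with_rolling_mean(s, window=3):
    # Trailing moving average over the known values; min_periods=1 lets the
    # average be computed even when part of the window is NaN
    rolling = s.rolling(window, min_periods=1).mean()
    return s.fillna(rolling)

# New column: existing values kept, NaN replaced per country
df["OECDSTInterbkRate_filled"] = (
    df.groupby("Country")["OECDSTInterbkRate"].transform(fill_with_rolling_mean)
)
print(df)
```

The moving average here is computed from the original observations only, so a filled value never feeds into a later fill. A NaN whose whole window is empty stays NaN.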
So, what I am trying to do is fill in the NaN values of a DataFrame with the correct values found in a second DataFrame. It would be something like this:
df={"Name":["Lennon","Mercury","Jagger"],"Band":["The Beatles", "Queen", NaN]}
df2={"Name":["Jagger"],"Band":["The Rolling Stones"]}
So, I have this command to know which rows have at least one NaN:
inds = list(pd.isnull(dfinal).any(1).nonzero()[0].astype(int))
I thought this would be useful in a for loop (I didn't succeed there).
And then I tried this:
result=df.join(dfinal, on=["Name"])
But it gives me the following error
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
I checked, and both "Name" Series contain string values, so I am unable to solve this.
Keep in mind that there are more columns, and a row that has one NaN will likely have about 7 NaN values.
Is there a way to solve this?
Thanks in advance!
Map and Fillna()
We can fill the missing values in your target df with the matching values from the second dataframe, looked up by the Name column.
df["Band"] = df["Band"].fillna(df["Name"].map(df2.set_index("Name")["Band"]))
print(df)
      Name                Band
0   Lennon         The Beatles
1  Mercury               Queen
2   Jagger  The Rolling Stones
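Since the question mentions that several columns can be NaN in the same row, here is a sketch that fills all shared columns at once by aligning both frames on Name. The extra Country column is invented for illustration, and the sketch assumes each Name appears only once in df2.

```python
import numpy as np
import pandas as pd

# Sample frames; "Country" is a hypothetical extra column
df = pd.DataFrame({
    "Name": ["Lennon", "Mercury", "Jagger"],
    "Band": ["The Beatles", "Queen", np.nan],
    "Country": ["UK", "UK", np.nan],
})
df2 = pd.DataFrame({
    "Name": ["Jagger"],
    "Band": ["The Rolling Stones"],
    "Country": ["UK"],
})

# With Name as the index on both sides, fillna aligns on index and columns,
# so every NaN cell is looked up in df2 by (Name, column)
filled = df.set_index("Name").fillna(df2.set_index("Name")).reset_index()
print(filled)
```

Cells that already have a value in df are left untouched, and names missing from df2 simply stay NaN.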
I have an original data frame with information from real estate properties. To fill nan values in the column price per m2 in usd I have made a multi-index pivot table that has the mean of the price per m2 sliced by property type, place and surface covered in m2.
Now I want to iterate over the original data frame's price per m2 in USD column to fill the NaN values with the ones I computed in the pivot table.
Pivot table code:
df6 = df4.pivot_table( values=['price_usd_per_m2'],
index=['cuartiles_superficie_cubierta'],
columns=['localidad','property_type'],
aggfunc=['mean'])
I'm not sure my understanding is correct. Would you mind showing what your data table looks like? Based on your question, you have calculated the mean values and want to use them to replace the NaN values in the original table, the one from before pivoting.
Could you fill the NaN values with the means while the data is in the pivoted layout, and only then transform it back to the original structure?
Apologies if my answer is not helpful; I am still learning how to solve this kind of problem, and I will also learn from others who give advice on this question.
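One way to avoid the round trip through the pivot table, offered only as a sketch, is to compute the same group means directly on the original frame with groupby and transform, then fill from them. The column names come from the pivot code in the question; the rows below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names taken from the pivot_table call in the question
df4 = pd.DataFrame({
    "property_type": ["apartment", "apartment", "apartment", "house", "house"],
    "localidad": ["A", "A", "A", "B", "B"],
    "cuartiles_superficie_cubierta": ["Q1", "Q1", "Q1", "Q2", "Q2"],
    "price_usd_per_m2": [1000.0, np.nan, 2000.0, 1500.0, np.nan],
})

keys = ["cuartiles_superficie_cubierta", "localidad", "property_type"]

# Mean price of each row's (quartile, place, type) group, aligned row by row
group_mean = df4.groupby(keys)["price_usd_per_m2"].transform("mean")

# Fill only the missing prices; known prices are kept as they are
df4["price_usd_per_m2"] = df4["price_usd_per_m2"].fillna(group_mean)
print(df4)
```

This gives the same means as the pivot table, but already lined up with the original rows, so no iteration is needed.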