dataframe values converted as 'nan' after applied df.iloc() - python

nan values
I ran into a problem after runnning: pd.DataFrame(), the whole data-frame became 'nan' (empty). I could not reverse this again. I also assigned the data-frame columns names, but their values also disappeared:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('PuntaCapi.csv', header=None, sep='\n')
df = df[0].str.split(',', expand=True)
df.to_csv("PuntaCapi.tab",sep="\t",header=None, index=False)
print(df)
Akim =df.iloc[:,0:1]
A= pd.DataFrame(data =Akim ,columns=['Akim'])
veriler2 = pd.DataFrame(data = df, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])
print(veriler2)
Please view the following results from the above mentioned code:
[![Spyder View DataFrame code [][2]][2]1
There is no nan value into the csv file. But after .iloc[], entire dataframe became nan value. I have tried solve the problem but I could not. I need help to solve problem
enter image description here

I do not understand your question.
You read data using pd.read_csv('PuntaCapi.csv', header=None, sep='\n') and save it as df, but you modify df as df[0].str.split(',', expand=True), which directly impact on the result.
Try this code.
df = pd.read_csv('PuntaCapi.csv', header=None, sep='\n')
veriler2 = pd.DataFrame(data = df.values, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])

Related

DataFrame returns empty after .update()

I am trying to create a new DataFrame which contains a calculation from an original DF.
To that purpose, I run a for loop with the calc for each column, but I am still getting the empty original DF and I don't see where is the source of the error.
May I ask for some help here?
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
df2.update(df["Volume"][i] * df["Close"][i])
df2
I expected to create a new DF which contains the original index but with the calculation obtained from original DF
I think this is what you are looking to do:
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
df2[i] = df["Volume"][i] * df["Close"][i]
df2

How to assign new column to existing DataFrame in pandas

I'm new to pandas. I'm trying to add new columns to my existing DataFrame but It's not getting assigned don't know why can anyone explain me what I'm missing this is what i tried
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df.assign(test3="Hello")
print("After",df.columns)
Output
Before Index(['test', 'test2'], dtype='object')
After Index(['test', 'test2'], dtype='object')
Pandas assign method returns a new modified dataframe with a new column, it does not modify it in place.
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df = df.assign(test3="Hello") # <--- Note the variable reassingment
print("After",df.columns)

When trying to drop a column from my dataset using pandas, i get the error "['churn'] not found in axis"

I want the x to be all the columns except "churn" column.
But when i do the below i get the "['churn'] not found in axis" error, eventhough i can see the column name when i write "print(list(df.column))"
Here is my code:
import pandas as pd
import numpy as np
df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0)
print(df.head())
print(df.columns)
print(len(df.columns))
x = df.drop(["churn"], axis=1) ## this is the part it gives the error
I am adding a snippet of my dataset as well:
account_length;area_code;international_plan;voice_mail_plan;number_vmail_messages;total_day_minutes;total_day_calls;total_day_charge;total_eve_minutes;total_eve_calls;total_eve_charge;total_night_minutes;total_night_calls;total_night_charge;total_intl_minutes;total_intl_calls;total_intl_charge;number_customer_service_calls;churn;
1;KS;128;area_code_415;no;yes;25;265.1;110;45.07;197.4;99;16.78;244.7;91;11.01;10;3;2.7;1;no
2;OH;107;area_code_415;no;yes;26;161.6;123;27.47;195.5;103;16.62;254.4;103;11.45;13.7;3;3.7;1;no
3;NJ;137;area_code_415;no;no;0;243.4;114;41.38;121.2;110;10.3;162.6;104;7.32;12.2;5;3.29;0;no
I see that your df snippet is separeted with ';' (semi colon). If that is what your actual data looks like, then probably your csv is being read wrong. Please try adding sep=';' to read_csv function.
df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0, sep=';')
I also suggest print df.columns again and check if there is a leading or trailing whitespace in the column name for churn.

pandas dataframe replace function

I want to replace "?" with NaN in Python.
The following code does not work, and I am not sure what is the reason.
import pandas as pd;
import numpy as np;
col_names = ['BI_RADS', 'age','shape','margin','density','severity']
dataset = pd.read_csv('mammographic_masses.data.txt', names = col_names)
dataset.replace("?", np.NaN)
After executing the above code, I still get those question marks in the dataset.
The format of the dataset looks like the followings:
5,67,3,5,3,1
4,43,1,1,?,1
5,58,?,5,3,1
4,28,1,1,3,0
5,74,1,5,?,1
Use inplace=True
Ex:
dataset.replace("?", np.NaN, inplace=True)

Sum all columns in each column in csv using Pandas

The program I have written generally has done what I've wanted it to do - for the most part. To add totals of each column. My dataframe uses the csv file format. My code is below:
import pandas as pd
import matplotlib.pyplot
class ColumnCalculation:
"""This houses the functions for all the column manipulation calculations"""
def max_electricity(self):
df.set_index('Date', inplace=True)
df.loc['Total'] = df.sum()
print(df)
df = pd.read_csv("2011-onwards-city-elec-consumption.csv")
ColumnCalculation.max_electricity(df)
Also my dataset (I didn't know how to format it properly)
The code nicely adds up all totals into a total column at the bottom of each column, except when it comes to the last column(2017)(image below):
I am not sure the program does is, I've tried to use different formatting options like .iloc or .ix but it doesn't seem to make a difference. I have also tried adding each column individually (below):
def max_electricity(self):
df.set_index('Date', inplace=True)
df.loc['Total', '2011'] = df['2011'].sum()
df.loc['Total', '2012'] = df['2012'].sum()
df.loc['Total', '2013'] = df['2013'].sum()
df.loc['Total', '2014'] = df['2014'].sum()
df.loc['Total', '2015'] = df['2015'].sum()
df.loc['Total', '2016'] = df['2016'].sum()
df.loc['Total', '2017'] = df['2017'].sum()
print(df)
But I receive an error, as I assume this would be too much? I've tried to figure this out for a good hour and a bit.
Your last column isn't being parsed as floats, but strings.
To fix this, try casting to numeric before summing:
import locale
locale.setlocale(locale.LC_NUMERIC, '')
df['2017'] = df['2017'].map(locale.atoi)
Better still, try reading in the data as numeric data. For example:
df = pd.read_csv('file.csv', sep='\t', thousands=',')

Categories

Resources