I need to create a DataFrame outside a function, like a global DataFrame, and define its header (column names).
import pandas as pd
import datetime
global df = pd.DataFrame(columns = ['Time','Call'])
Now I have 2 functions as below.
def a():
    #Checking whether the df is available
    if df is None:
        df = pd.DataFrame(columns = ['Time','Call'])
    #Appending
    now = datetime.datetime.now()
    timestamp1 = now.strftime("%Y-%m-%d %H:%M:%S")
    call_num = 1
    df = df.append({'Time':'timestamp1','Call': call_num}, ignore_index=True)
Below is the main function.
def main():
    a()
    print(1) #Do something
    a()
main()
My main requirement is that no values should be passed from a(), which is called in main().
How can I achieve this, so that the DataFrame ends up holding two date-time entries?
My current code does not work and raises an error.
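One possible way to get this working is sketched below. It rests on three observations: `global df = ...` is not valid Python syntax, assigning to `df` inside `a()` makes the name local to that function unless it is declared global, and `'timestamp1'` in quotes stores the literal string rather than the variable. The sketch sidesteps the rebinding problem by adding rows in place with `df.loc`, which is a choice made here for illustration (it also avoids `DataFrame.append`, which was removed in pandas 2.0).

```python
import datetime

import pandas as pd

# Module-level ("global") DataFrame with the header defined up front.
df = pd.DataFrame(columns=["Time", "Call"])


def a():
    # df is only modified in place here, never rebound, so no `global`
    # declaration is needed. Rebinding (df = ...) would require `global df`.
    now = datetime.datetime.now()
    timestamp1 = now.strftime("%Y-%m-%d %H:%M:%S")
    call_num = 1
    # Add one row at the next integer position.
    df.loc[len(df)] = [timestamp1, call_num]


def main():
    a()
    print(1)  # Do something
    a()


main()
print(df)
```

After `main()` returns, `df` holds two rows, one per call to `a()`, and nothing is passed into or out of `a()`.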
I'm trying to export dataframes that are iteratively created based on the column value. The idea is that I would use both the column value to dictate the folder as well as filtering the dataframe.
To create the dataframes iteratively I'm using exec(). The example follows below. The idea is to be able to run df.to_json('dfName/'+datetime.today().strftime('%d-%m-%Y')+'.json') iteratively, where dfName would change to a, b, c in turn. I'm sorry if this is a duplicate; I haven't found anything similar so far.
from datetime import datetime
import pandas as pd
data1 = ['a', 'a', 'a','b','b','b','c','c','c']
data2 = [1,2,3,4,5,6,7,8,9]
data3 = [10,11,12,13,14,15,16,17,18]
data = {
    'Name':data1,
    'data2':data2,
    'data3':data3}
df = pd.DataFrame(data)
for test in df.Name.unique():
    exec(test + "=df[df['Name'] == test]")
You can do it without filters using groupby():
from datetime import datetime
import pandas as pd
data1 = ['a', 'a', 'a','b','b','b','c','c','c']
data2 = [1,2,3,4,5,6,7,8,9]
data3 = [10,11,12,13,14,15,16,17,18]
data = {
    'Name':data1,
    'data2':data2,
    'data3':data3}
df = pd.DataFrame(data)
for name, n_df in df.groupby('Name'):
    # do what you need... n_df.to_csv() etc...
    print(name)
    print(n_df)
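To connect the groupby() loop back to the export described in the question, here is a rough sketch that writes each group to a JSON file in a folder named after the group. The temporary base directory and the `written` list are assumptions added for illustration so the example has no side effects on the working directory; in practice you would likely use your own output path.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "data2": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "data3": [10, 11, 12, 13, 14, 15, 16, 17, 18],
})

# Assumption: write under a temporary directory instead of the current folder.
base_dir = tempfile.mkdtemp()
stamp = datetime.today().strftime("%d-%m-%Y")

written = []  # paths of the files created, kept only for inspection
for name, n_df in df.groupby("Name"):
    folder = os.path.join(base_dir, name)
    os.makedirs(folder, exist_ok=True)  # one folder per group value
    path = os.path.join(folder, stamp + ".json")
    n_df.to_json(path)
    written.append(path)

print(written)
```

No exec() is needed because the group key is available as `name` inside the loop.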
Newbie at dealing with classes.
I have some dataframe objects I want to transform, but I'm having trouble manipulating them with classes. Below is an example. The goal is to transpose a dataframe and reassign it to its original variable name. In this case, the dataframe is assets.
import pandas as pd
from requests import get
import numpy as np
html = get("https://www.cbn.gov.ng/rates/Assets.asp").text
table = pd.read_html(html,skiprows=[0,1])[2]
assets = table[1:13]
class Array_Df_Retitle:
    def __init__(self,df):
        self.df = df
    def change(self):
        self.df = self.df.transpose()
        self.df.columns = self.df[0]
        return self.df
However, calling assets = Array_Df_Retitle(assets).change() simply yields an error:
KeyError: 0
I'd like to know where I'm getting things wrong.
I made a few changes to your code. The problem is coming from self.df[0]. This means you are selecting the column named 0. However, after transposing, you will not have any column named 0. You will have a row instead.
import pandas as pd
from requests import get
import numpy as np
html = get("https://www.cbn.gov.ng/rates/Assets.asp").text
table = pd.read_html(html,skiprows=[0,1])[2]
assets = table[1:13]
class Array_Df_Retitle:
    def __init__(self,df):
        self.df = df
    def change(self):
        self.df = self.df.dropna(how='all').transpose()
        self.df.columns = self.df.loc[0,:]
        return self.df.drop(0).reset_index(drop=True)
Array_Df_Retitle(assets).change()
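The same behaviour can be reproduced without the web request. The small table below is made-up data (an assumption for illustration only), sliced the same way as `assets` so that the row labels start at 1. After transposing, those labels become the column names, which is why column 0 does not exist while row 0 does.

```python
import pandas as pd

# Made-up stand-in for the scraped table; default column labels are 0, 1, 2.
table = pd.DataFrame([
    ["Header", "x", "y"],
    ["Cash", 1, 2],
    ["Loans", 3, 4],
])
assets = table[1:3]  # row labels are now 1 and 2

transposed = assets.transpose()  # columns are labelled 1 and 2, rows 0, 1, 2

# Selecting column 0 fails because no column carries that label.
try:
    transposed[0]
    raised = False
except KeyError:
    raised = True

# Row 0 exists, so use it as the header and then drop it.
transposed.columns = transposed.loc[0]
result = transposed.drop(0).reset_index(drop=True)
print(result)
```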
I have 8 functions that I would like to run under one main() function. The process starts with importing from a file and creating a df and then doing some cleaning operations on that df under a new function. I have copied in the basic structure including the three starting functions and then a main() function. What I am unsure about is how to 'carry' the result of loader() to clean_data() and then the result of clean_data() to operation_one() in the right way. At the moment I get an error that df is not defined. Thank you for your help!
def loader():
    import pandas as pd
    import numpy as np
    df = pd.read_excel('file_example.xlsx')
    return df
def clean_data():
    del df['column_7']
    return df
def operation_one():
    del df['column_12']
    return df
def main():
    loader()
    clean_data()
    operation_one()
    with pd.ExcelWriter("file.xlsx") as writer:
        df.to_excel(writer, sheet_name='test' , index=False)
if __name__ == "__main__":
    main()
Your main function just tells the other functions to run. Each function has its own local variables, which exist only inside the function that defines them. So when loader() runs, it returns the value of df to the line that called it, inside main(). To store that value in main(), write df = loader(). When you call the next functions, you need to pass this value in so they can operate on it, for example clean_data(df). For that to work, redefine def clean_data(): to accept a parameter, like this: def clean_data(df):
Here is a slightly cleaned-up version:
import pandas as pd
import numpy as np
def loader():
    df = pd.read_excel('file_example.xlsx')
    return df
def clean_data(df):
    del df['column_7']
    return df
def operation_one(df):
    del df['column_12']
    return df
def main():
    df = loader()
    df = clean_data(df)
    df = operation_one(df)
    with pd.ExcelWriter("file.xlsx") as writer:
        df.to_excel(writer, sheet_name='test', index=False)
if __name__ == "__main__":
    main()
I hope this was somewhat helpful as it is my first question answered here.
You need to make sure to assign variables for the function return values. That is how you "carry" the result. You also need to pass in those variables as function arguments as you proceed. Adding a function parameter for the filename in loader() rather than hardcoding the file in the function is probably something you'll want to think about too.
import pandas as pd
import numpy as np
def loader():
    df = pd.read_excel('file_example.xlsx')
    return df
def clean_data(df):
    del df['column_7']
    return df
def operation_one(df):
    del df['column_12']
    return df
def main():
    df = loader()
    df = clean_data(df)
    df = operation_one(df)
    with pd.ExcelWriter("file.xlsx") as writer:
        df.to_excel(writer, sheet_name='test' , index=False)
if __name__ == "__main__":
    main()
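As a rough sketch of the filename suggestion, loader() could take the path as a parameter. The `path` parameter, the `run_pipeline` helper and the in-memory `sample` frame are assumptions introduced here so the chaining can be tried without an Excel file. This variant also uses `drop(columns=...)`, which returns a new frame instead of modifying the one passed in.

```python
import pandas as pd


def loader(path):
    # Hypothetical parameterised loader; reading .xlsx files needs an
    # Excel engine such as openpyxl to be installed.
    return pd.read_excel(path)


def clean_data(df):
    # Return a new frame without column_7; the input is left untouched.
    return df.drop(columns=["column_7"])


def operation_one(df):
    return df.drop(columns=["column_12"])


def run_pipeline(df):
    # Each step receives the previous result and hands its own result on.
    df = clean_data(df)
    df = operation_one(df)
    return df


# In-memory stand-in for loader('file_example.xlsx').
sample = pd.DataFrame({"column_1": [1, 2], "column_7": [3, 4], "column_12": [5, 6]})
result = run_pipeline(sample)
print(list(result.columns))
```

With a real file, the first step would be `df = loader('file_example.xlsx')` followed by `run_pipeline(df)`.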
This is probably a Python fundamentals question that has been asked before, but I spent an hour looking for the answer and couldn't find it, so I am hoping for someone's input. I am used to writing a class, where I can use self to access a variable set in one method from another method. How do I capture that variable without writing a class? Is there a way? Thanks!
import pandas as pd
file_Name = 'test.xlsx'
def read_file():
    df = pd.read_excel(file_Name)
    return df
read_file()
def clean_data():
    text_data = df['some_column_name'].str.replace(';',',') # How to get df from read_file() function?
    return text_data
clean_data()
You're overthinking it:
df = read_file()
clean_data() # Uses the global variable df capturing the return value of read_file
Of course, clean_data should take an argument rather than using a global variable.
def clean_data(f):
    text_data = f['some_column_name'].str.replace(';', ',')
    return text_data
f = read_file()
clean_data(f)
Call the first function and save the returned dataframe in a variable df. Then call the second function (clean_data) and pass this df inside it as argument.
Use this:
import pandas as pd
file_Name = 'test.xlsx'
def read_file():
    df = pd.read_excel(file_Name)
    return df
df = read_file()
def clean_data(df):
    text_data = df['some_column_name'].str.replace(';', ',')
    return text_data
clean_data(df)
In general... you can use global variables. But with how your method is set up, you should just do
df = read_file()
inside of your clean_data() method. Then use df from there. Notice df is just the local name for the result of calling read_file(), you can call it anything.
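A minimal sketch of that last suggestion follows. Here read_file() returns a small made-up frame instead of reading test.xlsx; that stand-in is an assumption so the example runs without a file.

```python
import pandas as pd


def read_file():
    # Stand-in for pd.read_excel(file_Name); the data is invented.
    return pd.DataFrame({"some_column_name": ["a;b", "c;d"]})


def clean_data():
    # Call read_file() here and bind its result to a local name.
    df = read_file()
    return df["some_column_name"].str.replace(";", ",")


text_data = clean_data()
print(text_data.tolist())
```

The trade-off is that clean_data() is now tied to read_file(), whereas passing the frame in as an argument keeps the two functions independent.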
I am trying to put together a pandas dataframe for a school project, and to do so I am hitting an API repeatedly. I can't figure out exactly why I am getting the same dataframe back over and over, apart from the column title. Any help is much appreciated.
Code is as follows:
a.py
import json
import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
tmp = []
tmp_1 = []
def fetchdata(ticker):
    url = 'https://api.iextrading.com/1.0/stock/'
    time = '/chart/5y'
    get = url + ticker + time
    data = requests.get(get).json()
    length = len(data)
    # i = i + 1
    for j in range(0, length):
        date = data[j]['date']
        closing = data[j]['close']
        x = tmp.append(date)
        y = tmp_1.append(closing)
    df = pd.DataFrame(x)
    df[ticker] = tmp_1
    df_1 = df.loc[1:1000]
    return df_1
b.py
import pandas as pd
import numpy as np
from slizzy import fetchdata
df_appl_1 = fetchdata('aapl')
df_appl_2 = fetchdata('aapl')
df_appl_3 = fetchdata('aapl')
df_gold = fetchdata('gld')
print df_appl_1
print df_gold
Move your list declarations into your function:
def fetchdata(ticker):
    tmp = []
    tmp_1 = []
As it stands, after the first call to your function, these lists are not cleared out (because they're globals), so you successively query the same 1000 elements each time.
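The effect of local lists can be illustrated without calling the API. In the sketch below, `build_frame` is a hypothetical helper that takes already-parsed records of the shape the question's code expects (dicts with 'date' and 'close' keys), and the sample records are invented. It also builds the frame from the lists directly, since list.append() returns None and so `x` in the original code is always None.

```python
import pandas as pd


def build_frame(ticker, data):
    # Fresh lists on every call, so nothing carries over between tickers.
    dates = []
    closes = []
    for row in data:
        dates.append(row["date"])
        closes.append(row["close"])
    # Build the frame from the lists themselves, not from append()'s
    # return value (which is None).
    return pd.DataFrame({"date": dates, ticker: closes})


# Invented sample records standing in for requests.get(...).json().
aapl = build_frame("aapl", [
    {"date": "2018-01-02", "close": 1.0},
    {"date": "2018-01-03", "close": 2.0},
])
gld = build_frame("gld", [{"date": "2018-01-02", "close": 3.0}])

print(aapl)
print(gld)
```

Each call returns only the rows for its own ticker, because the lists no longer accumulate across calls.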