Insert selected pandas DataFrame data into an Excel file with Python

I have a data frame from pandas with one column and multiple rows,
and each row holds multiple values, like ({buy_quantity:0, symbol:nse123490,....}).
I want to insert only some selected fields from it into an Excel sheet, using a pandas data frame and the Python xlwings library. Please help me.
import pandas as pd
import xlwings as xw
wb = xw.Book('Easy_Algo.xlsx')
ts = wb.sheets['profile']
pdata=sas.get_profile()  # sas is the asker's API client object (not shown)
df = pd.DataFrame(pdata)
ts.range('A1').value = df[['symbol','product','avg price','buy avg']]
Please help me: how do I insert only the selected data into Excel?

Considering that the dataframe below is named df and the type of the column positions is dict, you can use the code below to transform the keys to columns and values to rows.
out = df.join(pd.DataFrame(df.pop('positions').values.tolist()))
out.to_excel('Easy_Algo.xlsx', sheet_name='profile', index=False) # store the result in an Excel file/spreadsheet (sheet_name must be a string)
Note: Make sure to add the two lines below if the values in the column positions are strings rather than dicts.
import ast
df['positions']=df['positions'].apply(ast.literal_eval)
# A sample dataframe for testing:
import pandas as pd
import ast
# A list (not a set) keeps the rows in order; pandas rejects unordered sets
string_dict = ['{"Symbol": "NIFTY2292218150CE NFO", "Produc": "NRML", "Avg. Price": 18.15, "Buy Avg": 0}',
               '{"Symbol": "NIFTY22SEP18500CE NFO", "Produc": "NRML", "Avg. Price": 20.15, "Buy Avg": 20.15}',
               '{"Symbol": "NIFTY22SEP16500PE NFO", "Produc": "NRML", "Avg. Price": 16.35, "Buy Avg": 16.35}']
df = pd.DataFrame(string_dict, columns=['positions'])
df['positions']=df['positions'].apply(ast.literal_eval)
out = df.join(pd.DataFrame(df.pop('positions').values.tolist()))
>>> print(out)
                  Symbol  Produc  Avg. Price  Buy Avg
0  NIFTY2292218150CE NFO    NRML       18.15     0.00
1  NIFTY22SEP18500CE NFO    NRML       20.15    20.15
2  NIFTY22SEP16500PE NFO    NRML       16.35    16.35

If I understood correctly, you want only those columns written to an Excel file:
df = df[['symbol','product','avg price','buy avg']]
df.to_excel("final.xlsx")
df.to_excel("final.xlsx", index=False) # use this instead to drop the default index generated by pandas
I hope this helps.
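The two answers above can be combined: expand the dicts into columns first, then select. Below is a minimal sketch of that idea, assuming each row of a column named positions already holds a dict. The records and field values are made up for illustration, and the real field names returned by the API may differ.

```python
import pandas as pd

# Hypothetical records shaped like the question's rows (one dict per row);
# the real field names returned by the API may differ.
df = pd.DataFrame({"positions": [
    {"symbol": "nse123490", "product": "NRML", "avg price": 18.15,
     "buy avg": 0, "buy_quantity": 0},
    {"symbol": "nse567890", "product": "MIS", "avg price": 20.15,
     "buy avg": 20.15, "buy_quantity": 50},
]})

# Expand each dict into its own columns, then keep only the wanted ones
expanded = pd.json_normalize(df["positions"].tolist())
selected = expanded[["symbol", "product", "avg price", "buy avg"]]
print(selected)
```

From here, selected.to_excel("final.xlsx", index=False), or assigning selected to ts.range('A1').value as in the question, would write just these columns.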

Related

Convert "Price" column values in a CSV file from "46.25 lacs" to 4625000 using python

I have a CSV file. And in this CSV file I have a column named Price. These price values are like 6.35 crore and 27.2 lacs. I want to convert these values to actual values (63500000 and 2720000) with integer data type.
This is what I have done till now
import pandas as pd
df = pd.DataFrame({'selling_price' : ['5.5 Lakh*', '5.7 Lakh*', '3.5 Lakh*', '3.15 Lakh*'],
'new-price':['Rs.7.11-7.48 Lakh*','Rs.10.14-13.79 Lakh*','Rs.5.16-6.94 Lakh*','Rs.6.54-6.63 Lakh*',]})
df = pd.DataFrame({'selling_price' :[int(float(str(x).strip(' Lakh*'))*100000) for x in df['selling_price'].to_list()]})
print(df)
This gives me the actual values, but I cannot figure out how to apply it to a CSV file. Any improvement or better solution would be highly appreciated. Thanks.
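One possible way to apply the conversion to a CSV column is sketched below. It assumes the column is named Price and that every value is a number followed by a single unit word. The helper to_rupees, the UNITS table, and the sample CSV text are hypothetical names and data introduced for illustration. Decimal is used instead of float so that the multiplication is exact.

```python
import io
from decimal import Decimal

import pandas as pd

# Multipliers for the unit words (lakh = 100,000; crore = 10,000,000)
UNITS = {
    "lakh": 100_000, "lakhs": 100_000, "lac": 100_000, "lacs": 100_000,
    "crore": 10_000_000, "crores": 10_000_000,
}

def to_rupees(text):
    """Convert strings such as '27.2 lacs' or '5.5 Lakh*' to an int."""
    number, unit = text.replace("*", "").split()
    return int(Decimal(number) * UNITS[unit.lower()])

# Hypothetical CSV content; with a real file, pass its path to pd.read_csv
csv_text = "Price\n6.35 crore\n27.2 lacs\n46.25 lacs\n"
df = pd.read_csv(io.StringIO(csv_text))
df["Price"] = df["Price"].map(to_rupees)
print(df["Price"].tolist())  # [63500000, 2720000, 4625000]
```

The converted frame can then be saved again with df.to_csv if needed. Values that do not match the assumed "number unit" pattern would raise an error here and need separate handling.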

How to filter out Column data From Multiple rows data?

Good evening, everyone. I got the following JSON file from Walmart with their product items and prices.
I loaded it in a Jupyter notebook with pandas, into a DataFrame with custom columns.
This is what I want to do: create new columns named min price and max price and load the data into them.
How can I do that?
I also want the offer price, as some items don't have minPrice and maxPrice.
EDIT: here is the Python code:
import json
import pandas as pd
with open("walmart.json") as f:
    data = json.load(f)
walmart = data["items"]
wdf = pd.DataFrame(walmart,columns=["productId","primaryOffer"])
print(wdf.loc[0,"primaryOffer"])
pd.set_option('display.max_colwidth', None)
print(wdf)
Here is the JSON File:
https://pastebin.com/sLGCFCDC
The following code snippet on top of your code would achieve the required task:
min_prices = []
max_prices = []
offer_prices = []
for i, row in wdf.iterrows():
    if 'showMinMaxPrice' in row['primaryOffer']:
        min_prices.append(row['primaryOffer']['minPrice'])
        max_prices.append(row['primaryOffer']['maxPrice'])
        offer_prices.append('N/A')
    else:
        min_prices.append('N/A')
        max_prices.append('N/A')
        offer_prices.append(row['primaryOffer']['offerPrice'])
wdf['minPrice'] = min_prices
wdf['maxPrice'] = max_prices
wdf['offerPrice'] = offer_prices
Here we are checking for the 'showMinMaxPrice' element from the json in the column named 'primaryOffer'. For cases where the minPrice and maxPrice is available, the offerPrice is shown as 'N/A' and vice-versa. These are first stored in lists and later added to the dataframe as columns.
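A variation on the same idea is sketched below: it looks up each key with dict.get, so absent prices come back as missing values rather than the string 'N/A'. The two items are made-up stand-ins for the Walmart JSON, which is assumed to store primaryOffer as a dict with the keys named in the answer.

```python
import pandas as pd

# Hypothetical items mimicking the structure described in the question
items = [
    {"productId": "A1",
     "primaryOffer": {"showMinMaxPrice": True, "minPrice": 5.0, "maxPrice": 9.0}},
    {"productId": "B2",
     "primaryOffer": {"offerPrice": 12.5}},
]
wdf = pd.DataFrame(items, columns=["productId", "primaryOffer"])

# dict.get returns None when a key is absent, so no if/else branch is needed
for key in ("minPrice", "maxPrice", "offerPrice"):
    wdf[key] = wdf["primaryOffer"].apply(lambda offer, k=key: offer.get(k))

print(wdf[["productId", "minPrice", "maxPrice", "offerPrice"]])
```

Missing values can still be detected with pd.isna, and can be replaced afterwards with fillna('N/A') if the text marker is preferred for display.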

Load csv into pandas dataframe from Pydrill Query

I am able to load a CSV into a pandas dataframe, but each row is stuck in a list. How can I load directly into a pandas dataframe from Pydrill, or unlist the dataframe's columns and data? I've tried unlisting, and it puts everything into a list of lists.
I've used to_dataframe(), but I can't find documentation on whether I can pass a delimiter. pd.DataFrame doesn't work because of the Pydrill query.
reviews = drill.query("SELECT * FROM hdfs.datasets.`titanic_ML/titanic.csv` LIMIT 1000", timeout=30)
print(reviews)
import pandas as pd
df2 = reviews.to_dataframe()
df2.rename(columns=df2.iloc[0])
headers = df2.iloc[0]
print(headers)
new_df = pd.DataFrame(df2.values[1:], columns=headers)
new_df.head()
The results cast everything into a list.
["pclass","sex","age","sibsp","parch","fare","embarked","survived"]
0 ["3","1","38.0","0","0","7.8958","1","0"]
1 ["1","1","42.0","0","0","26.55","1","0"]
2 ["3","0","9.0","4","2","31.275","1","0"]
3 ["3","1","27.0","0","0","7.25","1","0"]
4 ["1","1","41.0","0","0","26.55","1","0"]
I'd like to get everything into a normal pandas dataframe.
The solution I found is below. It doesn't unlist the dataframe, but it is an alternative way around the problem.
import psycopg2
connect_str = "dbname='dbname' user='dsa_ro_user' host='host database'"
conn = psycopg2.connect(connect_str)
SQL = "SELECT * "
SQL += " FROM train"
df = pd.read_sql(SQL,conn)
df.head()
Try using table functions, as described in the O’Reilly text: Chapter 4. Querying Delimited Data. This will delimit the file and use the first row as your column names. Note: because everything is read as text, you may need to cast your values to floats if you want to do arithmetic in your SELECT or WHERE clauses.
This should get you what you want:
sql="""
SELECT *
FROM table(hdfs.datasets.`/titanic_ML/titanic.csv`(
type => 'text',
extractHeader => true,
fieldDelimiter => ',')
) LIMIT 1000
"""
rows = drill.query(sql, timeout=30)
df = rows.to_dataframe()
df.head()
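Since the table function reads every field as text, the resulting DataFrame columns are likely to hold strings. Here is a minimal sketch of converting them on the pandas side, using a few made-up rows in place of the real Drill result:

```python
import pandas as pd

# Hypothetical rows shaped like the query result: every value is a string
df = pd.DataFrame({
    "pclass": ["3", "1"],
    "age": ["38.0", "42.0"],
    "fare": ["7.8958", "26.55"],
})

# pd.to_numeric parses each column; whole numbers become ints, the rest floats
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

If some columns contain non-numeric text, apply the conversion only to the numeric columns, or pass errors="coerce" to pd.to_numeric to turn unparseable values into NaN.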

Loop for multiple dataframes with a function

I tried to run a function over multiple data frames, but I ran into problems. My main questions are:
1) I tried to run my function with zip(df1, df2, df3, ...) so that the outputs would be new dataframes DF1, DF2, DF3, ..., but it failed. Is it possible to use zip to run a function over multiple dataframes and get dataframes back?
2) If zip() is not an option, how can I run my function in a loop? Currently I have only three dataframes, which are easy to handle separately, but I would like to know how to handle 50, 100, or even more.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#import scipy.stats as ss
# *********** 3 City Temperature files from NOAA ***********
# City 1
df1 = pd.read_csv('https://docs.google.com/spreadsheets/d/1Uj5N363dEVJZ9WVy2a_kkbJKJnyyE5qnEqOfzO0UCQE/gviz/tq?tqx=out:csv')
# City 2
df2 = pd.read_csv('https://docs.google.com/spreadsheets/d/13CgTdDCDzB_3WIYIRVMeLu6E36xzHSzRR5T_Ku0vThA/gviz/tq?tqx=out:csv')
# City 3
df3 = pd.read_csv('https://docs.google.com/spreadsheets/d/17pNZFIaV_NpQfSed-msIGu9jzzqF6JBvCZrBRiU2ZkQ/gviz/tq?tqx=out:csv')
def CleanDATA(data):
    data = data.drop(columns=['Annual'])
    data = data.drop(data.index[29:-1])
    data = data.drop(data.index[-1])
    monthname=[]
    Temp=[]
    for row in range(0,len(data)):
        for col in range(1,13):
            #monthname.append(str(col)+"-"+str(data['Year'][row]))
            monthname.append(str(data['Year'][row])+str(col))
            Temp.append(data.iloc[row,col])
    df0=pd.DataFrame()
    df0['Month']=monthname
    df0['Temperature']=Temp
    df0['Month']=pd.to_datetime(df0['Month'],format='%Y.0%m') #change the date form
    df0['Month'] = pd.to_datetime(df0['Month']).dt.date # remove time, only keep date
    data =df0[df0.applymap(np.isreal).all(1)] # remove non-numerical
    return data
data1 = CleanDATA(df1)
data2 = CleanDATA(df2)
data3 = CleanDATA(df3)
Also, I found an issue with pandas while reading the following Excel file:
https://drive.google.com/file/d/1V9fKpACbLrSi0NfB0FHSgc96PQerKkUF/view?usp=sharing (This is city 1 temperature data from 1990-2019)
2019 is ongoing, so the NOAA stations only provide data through this May. The Excel data labels all missing values with "M". I noticed that once a column contains an "M", I cannot use boxplot directly, even if I have already dropped the 2019 row. The Spyder console says "items [Jun to Dec]" are missing (and the weird thing is that I can use the same data for an XY line plot). To draw the boxplot, I have to manually remove the 2019 row in Excel and then read the new file.
I would do it using dictionaries (or lists, or another iterable).
cities = {'city1': 'https://...', 'city2': 'https://...', 'city3': 'https://...'}
df = {}
data = {}
for city, url in cities.items():
    df[city] = pd.read_csv(url)
    data[city] = CleanDATA(df[city])
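The dictionary approach also gives one place to deal with the "M" markers mentioned in the question. The sketch below is a simplified illustration: the in-memory CSV text, the column layout, and the helper load_city are invented for the example, and it relies on the na_values parameter of pd.read_csv to treat "M" as missing so that the columns stay numeric.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV text standing in for the NOAA files
raw = {
    "city1": "Year,Jan,Feb\n2018,30.1,32.5\n2019,31.0,M\n",
    "city2": "Year,Jan,Feb\n2018,40.2,41.0\n2019,M,M\n",
}

def load_city(text):
    # na_values turns the "M" placeholders into NaN while parsing
    return pd.read_csv(io.StringIO(text), na_values=["M"])

# One dictionary entry per city, however many cities there are
frames = {city: load_city(text) for city, text in raw.items()}

for city, frame in frames.items():
    print(city, frame.dtypes.to_dict())
```

With real data, the values of raw would be the URLs or file paths, passed straight to pd.read_csv in place of the StringIO wrapper.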

Merge CSV files in Python: only keep 1 column of each file and name the columns with original file names

I have 7 csv files of 7 stocks. Each file shares the same format, of columns and rows.
I have applied different ways to merge these files into 1 dataframe but still don't succeed (loop, using glob, etc). I want to keep the "Date" column as the index for the dataframe, and the "High" column of each file next to each other. Then the "High" columns are renamed based on the stock names.
import pandas as pd
FDX = pd.read_csv("../Data/FDX.csv")
GOOGL = pd.read_csv("../Data/GOOGL.csv")
IBM = pd.read_csv("../Data/IBM.csv")
KO = pd.read_csv("../Data/KO.csv")
MS = pd.read_csv("../Data/MS.csv")
NOK = pd.read_csv("../Data/NOK.csv")
XOM = pd.read_csv("../Data/XOM.csv")
stocks = pd.DataFrame({"FDX": FDX["High"],
"GOOGL": GOOGL["High"],
"IBM": IBM["High"],
"KO": KO["High"],
"MS": MS["High"],
"NOK": NOK["High"],
"XOM": XOM["High"]
})
stocks.head()
The code I wrote has errors. Is there any way to do it?
Thank you for your answers!
If they all have the same date range, this would work:
MergeList = [[FDX,'FDX'],[GOOGL,'GOOGL'],[IBM,'IBM'],[KO,'KO'],[MS,'MS'],[NOK,'NOK'],[XOM,'XOM']]
NewList = []
for df_t,col_name in MergeList:
    df_t = df_t[['Date','High']]       # keep only the two needed columns
    df_t.columns = ['Date',col_name]   # rename High to the stock name
    NewList.append(df_t)
Merge = NewList[0]
for df_t in NewList[1:]:
    Merge = pd.merge(Merge,df_t,on='Date')
Merge = Merge.set_index('Date')        # use Date as the index
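Another option is pd.concat with a dictionary of Series, where the dictionary keys become the column names. This is only a sketch: the two small frames below are made-up stand-ins for the CSV files, and the extra Low column is there to show that only High is kept.

```python
import pandas as pd

# Hypothetical miniature frames standing in for the CSV files
frames = {
    "FDX": pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"],
                         "High": [150.0, 151.5], "Low": [148.0, 149.0]}),
    "IBM": pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"],
                         "High": [135.9, 134.2], "Low": [134.0, 133.1]}),
}

# Index each frame by Date, take its High column, and line the columns up
highs = pd.concat(
    {name: frame.set_index("Date")["High"] for name, frame in frames.items()},
    axis=1,
)
print(highs)
```

With the real files, frames could be built by calling pd.read_csv on each path, as in the question. Dates that appear in only some files produce NaN in the other columns, because concat aligns on the index with an outer join by default.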
