I am creating a azure function. This function locally have a csv file. I read it via given code
csv = f'{context.function_directory}/new_csv.csv'
df=pd.read_csv(csv)
after reading the data, I want to make some changes to this csv (like adding some columns). Please suggest some code how can I write this updated csv/dataframe in same directory and with the same name.
you just have to write back to the same file (or a new file)...
csv = f'{context.function_directory}/new_csv.csv'
df=pd.read_csv(csv)
# make edits to the dataframe
make a new csv or write to existing csv
df.to_csv()
Here is a link:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
Related
How do I make a dataset that shows historic data from snapshots?
I have a csv-file that is updated and overwritten with new snapshot data once a day. I would like to make a python-script that regularly updates the snapshot data with the current snapshots.
One way I thought of was the following:
import pandas as pd
# Read csv-file
snapshot = pd.read_csv('C:/source/snapshot_data.csv')
# Try to read potential trend-data
try:
historic = pd.read_csv('C:/merged/historic_data.csv')
# Merge the two dfs and write back to historic file-path
historic.merge(snapshot).to_csv('C:/merged/historic_data.csv')
except:
snapshot.to_csv('C:/merged/historic_data.csv')
However, I don't like the fact that I use a try-function to get the historic data if the file-path exists or write the snapshot data to the historic path if the path doesn't exist.
Is there anyone that knows a better way of creating a trend dataset?
You can use os module to check if the file exists and mode argument in to_csv function to append data to the file.
The code below will:
Read from snapshot.csv.
Checks if the historic.csv file exists.
If it exists then save the headers else dont save header.
Save the file. If the file already exists, new data will be appended to the file instead of overwriting it.
import os
import pandas as pd
# Read snapshot file
snapshot = pd.read_csv("snapshot.csv")
# Check if historic data file exists
file_path = "historic.csv"
header = not os.path.exists(file_path) # whether header needs to written
# Create or append to the historic data file
snapshot.to_csv(file_path, header=header, index=False, mode="a")
you could easily one line it by utilising the mode parameter in `to_csv'.
pandas.read_csv('snapshot.csv').to_csv('historic.csv', mode='a')
It will create the file if it doesn't already exist, or will append if it does.
What happens if you don't have a new snapshot file? You might want to wrap that in a try... except block. The pythonic way is typically ask for forgiveness instead of permission.
I wouldn't even both with an external library like pandas as the standard library has all you need to 'append' to a file.
with open('snapshot.csv', 'r') as snapshot:
with open('historic.csv', 'a') as historic:
for line in new_file.readline():
historic_file.write(line)
My Code :
path="C:/MasterData?"
for view in Workbook.views:
server.views.populate_csv(view)
with.open(path+'Output.xlsx','wb') as f:
f.write(b''.join(view.csv)
In Tableau,I am iterating each worksheets in dasboard.I am generatng csv view and save it as excel file.Is there any possible i can create single workbook with 3 seperate sheeets(iterate through each view)
Possible:
Method 1:Use XML structure
1)If Organisation policy doesnt allow you to install aditional library.
Built our own python Excel using using XmL and save it as xls format.
2)Incase your allowed to use library named "Excelwriter","openpyxl" that will be more helpful
Is there a way to automate this process?
1- create new excel file
2- paste data which is ale+redy in clipboard to this new excel file with python
3- save this excel file
I have tried to find this function in openpyxl and xlsxwriter but there is only function for writing exact data to defined cells.
edit: the data in clipboard is the result from program run before this script
thanks
This method gets data copied from the clipboard and converts it into a DataFrame. You can then select whichever column you want from the DataFrame and use the to_csv method to write to the Excel file.
pandas.read_clipboard(sep='\\s+', **kwargs)
DataFrame.to_csv()
Check out the documentation :
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_clipboard.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
I have a spark dataframe (hereafter spark_df) and I'd like to convert that to .csv format. I tried two following methods:
spark_df_cut.write.csv('/my_location/my_file.csv')
spark_df_cut.repartition(1).write.csv("/my_location/my_file.csv", sep=',')
where I get no error message for any of them and both get completed [it seems], but I cannot find any output .csv file in the target location! Any suggestion?
I'm on a cloud-based Jupyternotebook using spark '2.3.1'.
spark_df_cut.write.csv('/my_location/my_file.csv')
//will create directory named my_file.csv in your specified path and writes data in CSV format into part-* files.
We are not able to control the names of files while writing the dataframe, look for directory named my_file.csv in your location (/my_location/my_file.csv).
In case if you want filename ending with *.csv then you need to rename using fs.rename method.
spark_df_cut.write.csv save the files as part files. there is no direct solution available in spark to save as .csv file that can be opened directly with xls or some other. but there are multiple workarounds available one such work around is to convert spark Dataframe to panda Dataframe and use to_csv method like below
df = spark.read.csv(path='game.csv', sep=',')
pdf = df.toPandas()
pdf.to_csv(path_or_buf='<path>/real.csv')
this will save the data as .csv file
and another approach is using open the file using hdfs command and cat that to a file.
please post if you need more help
I have following N number of invoice data in Excel and I want to create CSV of that file so that it can be imported whenever needed...so how can I archive this?
Here is a screenshot:
Assuming you have a Folder "excel" full of Excel Files within your Project-Directory and you also have another folder "csv" where you intend to put your generated CSV Files, you could pretty much easily batch-convert all the Excel Files in the "excel" Directory into "csv" using Pandas.
It will be assumed that you already have Pandas installed on your System. Otherwise, you could do that via: pip install pandas. The fairly commented Snippet below illustrates the Process:
# IMPORT DATAFRAME FROM PANDAS AS WELL AS PANDAS ITSELF
from pandas import DataFrame
import pandas as pd
import os
# OUR GOAL IS:::
# LOOP THROUGH THE FOLDER: excelDir.....
# AT EACH ITERATION IN THE LOOP, CHECK IF THE CURRENT FILE IS AN EXCEL FILE,
# IF IT IS, SIMPLY CONVERT IT TO CSV AND SAVE IT:
for fileName in os.listdir(excelDir):
#DO WE HAVE AN EXCEL FILE?
if fileName.endswith(".xls") or fileName.endswith(".xlsx"):
#IF WE DO; THEN WE DO THE CONVERSION USING PANDAS...
targetXLFile = os.path.join(excelDir, fileName)
targetCSVFile = os.path.join(csvDir, fileName) + ".csv"
# NOW, WE READ "IN" THE EXCEL FILE
dFrame = pd.read_excel(targetXLFile)
# ONCE WE DONE READING, WE CAN SIMPLY SAVE THE DATA TO CSV
pd.DataFrame.to_csv(dFrame, path_or_buf=targetCSVFile)
Hope this does the Trick for you.....
Cheers and Good-Luck.
Instead of putting total output into one csv, you could go with following steps.
Convert your excel content to csv files or csv-objects.
Each object will be tagged with invoice id and save into dictionary.
your dictionary data structure could be like {'invoice-id':
csv-object, 'invoice-id2': csv-object2, ...}
write custom function which can reads your csv-object, and gives you
name,product-id, qty, etc...
Hope this helps.