Export Redshift table data to CSV file tabs using Lambda Python

I have a table metric_data that has data in the below format:
I want to export this data into a CSV file in S3 with separate tabs for components, so I will have one file with three tabs: COMP-01, COMP-02, COMP-03.
The UNLOAD function can export all the data from the table to one CSV file, but how can I export the data as separate tabs in the CSV file? Below is the UNLOAD command I am using:
unload ('select * from mydb.metric_data')
to 's3://mybucket/demo/folder/file.xlsx'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
This command generates one CSV file with all the data from the table. How can I export the data as separate sheets in a single CSV file?
UPDATE: as CSV doesn't support multiple sheets, I am trying to implement the same with Excel. So I updated the UNLOAD command to generate an Excel file, and it produces one file with all the table data.

You can't. The CSV file format doesn't support tabs / sheets. You will need to convert the CSV file to a different format (like .xls for example) that does support multiple sheets in one file.
Also the UNLOAD command you posted will produce multiple files, not just one. You can add the PARALLEL OFF option to make one file but this will only work for output files less than 5GB.
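As the answer says, the conversion has to happen after the unload. A minimal sketch with pandas, assuming the unloaded CSV has a column (here called component, an invented name) holding values like COMP-01/COMP-02/COMP-03, might look like:

```python
import pandas as pd

def csv_to_sheets(csv_path, xlsx_path, group_col="component"):
    """Split one CSV into an Excel workbook with one sheet per group value.

    Assumes the unloaded CSV has a column (``component`` here) that
    identifies the groups; adjust the name to match the real table.
    """
    df = pd.read_csv(csv_path)
    with pd.ExcelWriter(xlsx_path) as writer:
        for name, group in df.groupby(group_col):
            # Excel limits sheet names to 31 characters
            group.to_excel(writer, sheet_name=str(name)[:31], index=False)

# Example: csv_to_sheets("file000.csv", "metric_data.xlsx")
```

Each distinct component value becomes its own sheet in the workbook.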

Related

exporting to csv converts text to date

From Python i want to export to csv format a dataframe
The dataframe contains two columns like this
So when i write this :
df['NAME'] = df['NAME'].astype(str) # or .astype('string')
df.to_csv('output.csv',index=False,sep=';')
The excel output in csv format returns this :
and Excel reads the value "MAY8218" as the date "may-18", while I want it to be read as the text "MAY8218".
I've tried many ways but none of them works. I don't want an alternative like putting quotation marks to the left and the right of the value.
Thanks.
If you want to export the dataframe to use it in excel just export it as xlsx. It works for me and maintains the value as string in the original format.
df.to_excel('output.xlsx',index=False)
The CSV format is a text format. The file contains no hint for the type of the field. The problem is that Excel has the worst possible support for CSV files: it assumes that CSV files always use its own conventions when you try to read one. In short, one Excel implementation can only read correctly what it has written...
That means that you cannot prevent Excel from interpreting the CSV data the way it wants, at least when you open a CSV file. Fortunately you have other options:
import the CSV file instead of opening it. This time you have options to configure the way the file should be processed.
use LibreOffice Calc for processing CSV files. LibreOffice is a little behind Microsoft Office on most points, except for CSV handling, where it has excellent support.
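Excel aside, if the file is going to be read back into Python rather than opened in a spreadsheet, pandas can be told to skip type inference entirely with dtype=str; a small sketch (the sample data here is made up):

```python
import io

import pandas as pd

# Reading the CSV back with dtype=str keeps every field as text, so a
# value such as "MAY8218" is never coerced into a date or number.
csv_data = io.StringIO("NAME;VALUE\nMAY8218;10\n")
df = pd.read_csv(csv_data, sep=";", dtype=str)
print(df["NAME"].iloc[0])  # MAY8218
```

Note that dtype=str also keeps numeric-looking fields like "10" as strings, which is usually what you want when round-tripping through CSV.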

Extracting specific data from multiple text files to excel file using python

I have multiple text files that contain data like file1, file2, file3. This is just an example; I am wondering how to populate specific data in an Excel sheet like this excel sheet.
I am new to Python, and combining text files with Excel through Python is why I am finding it hard to approach.
Basically what you need is to parse the files and write a new file in the CSV format for use in Excel:
file1 -> PythonScript.py -> excel.csv
File Parser Python Tutorial
The .csv file looks like this: you have a header and the data separated by commas.
excel.csv:
Name,Data
hibiscus_3,54k
hibiscus_7,67k
Rose_3,87MB
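A minimal sketch of that parse-and-write step, assuming the input lines look like hibiscus_3: 54k (the real file format will differ, so adjust the regular expression):

```python
import csv
import re

def parse_to_csv(txt_paths, out_csv):
    """Collect name/value pairs from text files into one CSV for Excel.

    The input format is an assumption (lines like ``hibiscus_3: 54k`` or
    ``Rose_3 = 87MB``); tune the pattern to match the real files.
    """
    pattern = re.compile(r"(\w+)\s*[:=]\s*(\S+)")
    rows = []
    for path in txt_paths:
        with open(path) as f:
            for line in f:
                m = pattern.search(line)
                if m:
                    rows.append(m.groups())
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Data"])  # header row for Excel
        writer.writerows(rows)
```

Running it over file1, file2, file3 and opening the resulting excel.csv in Excel gives one row per matched line.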
Hope I could help you.

Converting spark dataframe to flatfile .csv

I have a spark dataframe (hereafter spark_df) and I'd like to convert that to .csv format. I tried two following methods:
spark_df_cut.write.csv('/my_location/my_file.csv')
spark_df_cut.repartition(1).write.csv("/my_location/my_file.csv", sep=',')
where I get no error message for any of them and both get completed [it seems], but I cannot find any output .csv file in the target location! Any suggestion?
I'm on a cloud-based Jupyternotebook using spark '2.3.1'.
spark_df_cut.write.csv('/my_location/my_file.csv')
This will create a directory named my_file.csv in the specified path and write the data in CSV format into part-* files.
You cannot control the names of the files Spark writes, so look for a directory named my_file.csv in your location (/my_location/my_file.csv).
If you want a file name ending with *.csv, then you need to rename the part file yourself, for example with fs.rename.
spark_df_cut.write.csv saves the data as part files. There is no direct option in Spark to save a single .csv file that can be opened directly with Excel or similar tools, but there are several workarounds. One is to convert the Spark DataFrame to a pandas DataFrame and use its to_csv method, like below:
df = spark.read.csv(path='game.csv', sep=',')
pdf = df.toPandas()
pdf.to_csv(path_or_buf='<path>/real.csv')
This will save the data as a .csv file.
Another approach is to open the part files with an HDFS command and cat them into a single file.
Please post if you need more help.
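The renaming mentioned above can also be done from plain Python when the output directory is on a local filesystem; a sketch (the directory and file names here are assumptions):

```python
import glob
import os
import shutil

def promote_part_file(output_dir, final_path):
    """Copy the single part-* file Spark wrote into output_dir to a
    friendly name. Assumes the DataFrame was written with
    repartition(1)/coalesce(1), so exactly one part file exists."""
    parts = glob.glob(os.path.join(output_dir, "part-*"))
    if len(parts) != 1:
        raise RuntimeError("expected one part file, found %d" % len(parts))
    shutil.copy(parts[0], final_path)
    return final_path

# Example: promote_part_file("/my_location/my_file.csv", "/my_location/real.csv")
```

The _SUCCESS marker and hidden .crc files Spark leaves behind do not match the part-* pattern, so they are ignored.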

export multiple sheets into one CSV file

I imported a table from an Excel file that contains many sheets.
All the sheets need data wrangling.
I have finished the data wrangling of the first five sheets, which are called data1, data2, data3, data4, data5.
I can export any one of them (data1 to data5) to a CSV file. The problem is how to export all of them into one CSV file with different sheets.
The code I used is:
data3.to_csv('new.csv',index = False)
data2.to_csv('new.csv',index = False)
data1.to_csv('new.csv',index = False)
and so on.
But each call overwrites the CSV file written by the previous one.
The short answer is that the CSV file format doesn't have any equivalent of sheets. You can either write the 5 sets to 5 separate csv files, or append them one after the other into a single file.
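A sketch of the append option with pandas, adding a column (here called sheet, an invented name) so each row remembers which frame it came from:

```python
import pandas as pd

def append_frames_csv(frames, out_csv):
    """Stack several wrangled DataFrames one after another in a single CSV.

    CSV has no notion of sheets, so the frames are concatenated; the added
    ``sheet`` column records which frame each row belonged to.
    """
    labeled = [df.assign(sheet="data%d" % i)
               for i, df in enumerate(frames, start=1)]
    pd.concat(labeled, ignore_index=True).to_csv(out_csv, index=False)
```

If real sheets are required, writing an .xlsx with pd.ExcelWriter and a distinct sheet_name per frame produces one workbook instead of one CSV.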

importing binary data into Python and then exporting to Excel

I would like to upload a large number of binary values from a file (a .phys file) into Python and then export these values into Excel for graphing purposes. Excel only supports ~32,000 rows at a time, but I have up to 3mil values in some cases. I am able to upload the data set into Python using
f = open("c:\DR005289_F00001.PHYS", "rb")
How do I then export this file to Excel in a format which Excel can support? For example, how could I break up the data into columns? I don't care how many values are in each column, it can be an arbitrary break depending on what Excel can support.
This has served me well: use xlwt to put all the data into the file.
I would create a list of lists to break the data into columns, then write each list (pick a length, 10k?) to the Excel file.
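A stdlib sketch of the read-and-chunk part, assuming the .phys file holds little-endian 32-bit floats (the actual layout may differ, so adjust the format string); the resulting columns could then be written out with xlwt:

```python
import struct

def read_floats(path, fmt="<f"):
    """Read a binary file as a flat list of little-endian 32-bit floats.

    The value format is an assumption; change ``fmt`` to match the real
    .phys layout (e.g. "<d" for 64-bit doubles).
    """
    size = struct.calcsize(fmt)
    values = []
    with open(path, "rb") as f:
        while chunk := f.read(size):
            if len(chunk) < size:
                break  # ignore a trailing partial record
            values.append(struct.unpack(fmt, chunk)[0])
    return values

def to_columns(values, rows_per_column=10_000):
    """Break a flat list into column-sized slices short enough for Excel."""
    return [values[i:i + rows_per_column]
            for i in range(0, len(values), rows_per_column)]
```

Each inner list from to_columns maps to one spreadsheet column, e.g. via repeated ws.write(row, col, value) calls in xlwt.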

Categories

Resources