Trouble with to_csv saving groupby dataframe - python

I want to save a grouped DataFrame to a CSV file, but the file does not keep the grouped layout.
This is the data:
dataframe image
I ran df.groupby(['Date','Name']).sum() and got:
output image of groupby dataframe
When I save it to a CSV file with df.to_csv("abcd.csv"), the file looks like this:
csv file image
But I want the saved file to look like this:
Excel file image showing the output I want
Please tell me the solution.
Thank you.

CSV files are plain text. By convention, each row is on its own line and the fields within a row are separated by commas. A CSV file therefore cannot represent merged cells, so each group key is written out on every row.
To get the formatting you want, write the DataFrame to an Excel file instead of a CSV:
gdf = df.groupby(['Date','Name']).sum()
gdf.to_excel("<path_to_file>")
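Writing .xlsx files requires an Excel engine such as openpyxl, which you may need to install explicitly (xlwt only handles the legacy .xls format and is no longer supported by recent pandas versions):
pip install openpyxl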
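To see why the CSV cannot keep the grouped layout, here is a minimal sketch. The sample rows and the Value column are made up for illustration; only the Date and Name keys come from the question. It shows that to_csv writes the group keys as ordinary columns, repeating them on every row.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the question's screenshot.
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
    "Name": ["A", "A", "A", "B"],
    "Value": [1, 2, 3, 4],
})

# Grouping by two keys produces a DataFrame with a two-level index.
gdf = df.groupby(["Date", "Name"]).sum()

# Write to an in-memory buffer so the CSV text can be inspected directly.
buffer = io.StringIO()
gdf.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The printed text contains one flat row per (Date, Name) pair. The visual grouping seen in a notebook is only a display feature of the two-level index, which is why an Excel file is needed to keep it.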

Related

exporting to csv converts text to date

I want to export a DataFrame to CSV format from Python.
The DataFrame contains two columns, like this:
When I write this:
df['NAME'] = df['NAME'].astype(str) # or .astype('string')
df.to_csv('output.csv',index=False,sep=';')
the CSV file opened in Excel shows this:
Excel reads the value "MAY8218" as the date "may-18", while I want it to be read as "MAY8218".
I've tried many approaches, but none of them works. I don't want a workaround such as putting quotation marks to the left and right of the value.
Thanks.
If you want to export the DataFrame for use in Excel, just export it as xlsx. It works for me and keeps the value as a string in its original format.
df.to_excel('output.xlsx',index=False)
The CSV format is a text format, and the file contains no hint about the type of each field. The problem is that Excel has the worst possible support for CSV files: when you open one, it assumes the file follows its own conventions. In short, an Excel installation can only reliably read the CSV files it has written itself.
That means you cannot prevent Excel from interpreting the CSV data the way it wants, at least when you open a CSV file. Fortunately, you have other options:
Import the CSV file instead of opening it. The import dialog lets you configure how the file should be processed.
Use LibreOffice Calc for processing CSV files. LibreOffice is a little behind Microsoft Office on most points, except for CSV handling, where its support is excellent.
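A quick way to confirm that the file itself is fine, and that the date conversion happens only inside Excel, is to read the CSV back as plain text. This is a minimal sketch; the QTY column and the second row are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical two-column frame; only the "MAY8218" value comes from the question.
df = pd.DataFrame({"NAME": ["MAY8218", "JUN0119"], "QTY": [1, 2]})

# Write with the same options as in the question, but to an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=";")
csv_text = buffer.getvalue()
print(csv_text)

# Read it back with every column forced to string: the value is unchanged.
roundtrip = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=str)
print(roundtrip["NAME"].tolist())
```

Since the text in the file is still "MAY8218", the fix has to be applied on the Excel side (importing with the column set to text) or by switching to the xlsx format.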

How to extract data from a specific column in the first CSV file to another column in another CSV file?

I have two different CSV files, which I have imported using pd.read_csv.
The two files have different header names. I would like to copy the column under the header "Model" in the first CSV file into the second CSV file under the header "Product".
I tried the following code, but it raised a ValueError:
writer=df1[df1['Model']==df2['Product']]
Would appreciate any help.
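Try joining the DataFrames on the index using pandas.DataFrame.join, then export the result as a CSV using pandas.DataFrame.to_csv. Note that join returns a new DataFrame rather than modifying df1 in place, so keep the result:
joined = df1.join(df2)
joined.to_csv('./df2.csv')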
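If the goal is only to copy one column across, a plain assignment may be simpler than a join. The sketch below uses made-up data and assumes both files have the same number of rows in the same order. The Price and Stock columns are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two files loaded with pd.read_csv.
df1 = pd.DataFrame({"Model": ["X1", "X2", "X3"], "Price": [10, 20, 30]})
df2 = pd.DataFrame({"Product": [None, None, None], "Stock": [5, 6, 7]})

# Copy the values positionally, ignoring the index labels.
# This assumes the rows of df1 and df2 line up one to one.
df2["Product"] = df1["Model"].to_numpy()

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df2.to_csv(index=False)
print(csv_text)
```

The original attempt, df1[df1['Model']==df2['Product']], compares the two columns element by element instead of copying anything, and pandas raises a ValueError when the two Series do not share the same index labels.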

How do I set up the DataFrame structure automatically when I read Excel data into Python?

My data in Excel is not separated by ","; the Twitter data is split into columns. When I load it in Python, it is automatically turned into a DataFrame, and the tweets are not shown in full. How can I overcome this?
If you have the file open in Excel, the easiest solution is to save a copy as a CSV:
File -> Save As -> open the dropdown and select CSV.
But pandas can also read Excel files directly. This is the better option if you have a lot of files and don't want to convert all of them.
df = pd.read_excel(<file>)
Now, if you're saying it isn't .xlsx and also not .csv, but you know the delimiter, then:
df = pd.read_csv(<file>, delimiter='\t') # for tab delimited, but you can change '\t' to any delimiter
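As a small illustration of the delimiter case, the sketch below parses tab-separated text from an in-memory string, so no file is needed. The column names and the tweet are invented. It also assumes that "not shown in full" refers to pandas truncating long strings when printing, which the display.max_colwidth option controls.

```python
import io

import pandas as pd

# Hypothetical tab-separated data; commas inside the tweet stay in one column.
tsv_text = (
    "user\ttweet\n"
    "alice\tA fairly long tweet, with commas, that should stay in one column\n"
)

df = pd.read_csv(io.StringIO(tsv_text), delimiter="\t")

# Temporarily lift the column width limit so long text is printed in full.
with pd.option_context("display.max_colwidth", None):
    print(df)
```

The data in the DataFrame is complete either way; the option only changes how it is displayed.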

How to read an Excel file with grouped data

I have a .xlsx file that looks like this:
I want to transform it into this one:
I'm not sure how to do it using python, because pandas cannot properly read the original file.

Converting spark dataframe to flatfile .csv

I have a Spark DataFrame (hereafter spark_df) and I'd like to convert it to .csv format. I tried the following two methods:
spark_df_cut.write.csv('/my_location/my_file.csv')
spark_df_cut.repartition(1).write.csv("/my_location/my_file.csv", sep=',')
Neither produces an error message and both appear to complete, but I cannot find any output .csv file in the target location. Any suggestions?
I'm on a cloud-based Jupyter notebook using Spark 2.3.1.
spark_df_cut.write.csv('/my_location/my_file.csv')
# creates a directory named my_file.csv at the specified path and writes the data in CSV format into part-* files
You cannot control the names of the files when writing the DataFrame, so look for a directory named my_file.csv at your location (/my_location/my_file.csv).
If you want a file name ending in .csv, you need to rename the part file, for example with the fs.rename method.
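As a rough sketch of the rename step, the helper below moves the single part file out of the output directory. It assumes the output was written with repartition(1) to a local filesystem path that Python can see; on HDFS or cloud storage, the filesystem API would be needed instead. The function name collect_single_csv and the simulated directory are hypothetical.

```python
import glob
import os
import shutil
import tempfile


def collect_single_csv(spark_output_dir, target_path):
    """Move the lone part-* file out of a Spark output directory."""
    part_files = sorted(glob.glob(os.path.join(spark_output_dir, "part-*")))
    if len(part_files) != 1:
        raise ValueError(f"expected exactly one part file, found {len(part_files)}")
    shutil.move(part_files[0], target_path)
    return target_path


# Simulate the directory layout Spark leaves behind (no Spark needed here).
work_dir = tempfile.mkdtemp()
out_dir = os.path.join(work_dir, "my_file.csv")
os.mkdir(out_dir)
with open(os.path.join(out_dir, "part-00000-abc.csv"), "w") as f:
    f.write("a,b\n1,2\n")
open(os.path.join(out_dir, "_SUCCESS"), "w").close()

result = collect_single_csv(out_dir, os.path.join(work_dir, "real.csv"))
print(result)
```

The helper refuses to guess when there is more than one part file, since concatenating them would need extra care with headers.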
spark_df_cut.write.csv saves the output as part files. Spark has no direct way to write a single .csv file that can be opened directly in Excel or a similar tool, but there are several workarounds. One is to convert the Spark DataFrame to a pandas DataFrame and use its to_csv method, as below (this collects all the data on the driver, so it only suits data that fits in memory):
df = spark.read.csv(path='game.csv', sep=',')
pdf = df.toPandas()
pdf.to_csv(path_or_buf='<path>/real.csv')
This saves the data as a single .csv file.
Another approach is to use an HDFS command to cat the part files into a single file.
Please post if you need more help.
