I've created a CSV file with only the column names and saved it using the pandas library. This file will be used as a historical record whose rows are loaded one at a time at different moments. To add rows to this previously created CSV, I convert each record to a DataFrame and then call to_csv() with mode='a' as a parameter in order to append the record to the existing file. The problem is that I would like an index to be generated automatically in the file every time I add a new row. I already know that when I import this file as a DataFrame, an index is generated automatically, but that only exists inside the Python session; when I open the CSV with Excel, for example, the file doesn't have an index.
When writing your DataFrame to CSV, pass index=True to the to_csv method. This ensures that the index of your DataFrame is written explicitly to the CSV file.
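A minimal sketch of that pattern, assuming a hypothetical history.csv with a single value column; offsetting each new row's index by the number of rows already on disk keeps the index continuous across appends instead of restarting at 0 on every run:

import pandas as pd

path = 'history.csv'  # hypothetical file name

# first write: create the file with the header row and the index column
pd.DataFrame(columns=['value']).to_csv(path, index=True)

# later appends: offset the new row's index by the rows already on disk
existing = len(pd.read_csv(path, index_col=0))
new_row = pd.DataFrame({'value': [42]}, index=[existing])
new_row.to_csv(path, mode='a', header=False, index=True)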
I have two different CSV files which I have imported using pd.read_csv.
The files have different header names. I would like to export the column under the header ["Model"] in the first CSV file into the second CSV file under the header ["Product"].
I have tried the following code, but it produced a ValueError:
writer=df1[df1['Model']==df2['Product']]
Would appreciate any help.
Try joining the DataFrames on the index using pandas.DataFrame.join, then export the result as a CSV using pandas.DataFrame.to_csv:
joined = df1.join(df2)  # join returns a new DataFrame rather than modifying df1
joined.to_csv('./df2.csv')
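If the two files simply line up row by row, a direct column assignment may be all you need. A minimal sketch, assuming first.csv and second.csv are your two files and their row order matches:

import pandas as pd

df1 = pd.read_csv('first.csv')   # contains the 'Model' column
df2 = pd.read_csv('second.csv')  # should gain a 'Product' column

# copy by position with .values so differing indexes don't misalign the rows
df2['Product'] = df1['Model'].values
df2.to_csv('second.csv', index=False)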
I have a CSV file that has a table with information that I'd like to reference in another table. To give you a better perspective, I have the following example:
"ID","Name","Flavor"
"45fc754d-6a9b-4bde-b7ad-be91ae60f582","account1-test1","m1.medium"
"83dbc739-e436-4c9f-a561-c5b40a3a6da5","account3-test2","m1.tiny"
"ef68fcf3-f624-416d-a59b-bb8f1aa2a769","account1-test3","m1.medium"
I would like to add columns that reference the Name column, pulling the customer name into one column and the rest of the info into another, for example:
"ID","Name","Flavor","Customer","Misc"
"45fc754d-6a9b-4bde-b7ad-be91ae60f582","account1-test1","m1.medium","account1","test1"
"83dbc739-e436-4c9f-a561-c5b40a3a6da5","account3-test2","m1.tiny","account3,"test2"
"ef68fcf3-f624-416d-a59b-bb8f1aa2a769","account1-test3","m1.medium","account1","test3"
The task here is to have a Python script that opens the original CSV file and creates a new CSV file with the added columns. Any ideas? I've been having trouble parsing the Name column successfully.
import pandas as pd

data = pd.read_csv('your_file.csv')
# n=1 splits only on the first dash, in case a name contains further dashes
data[['Customer', 'Misc']] = data.Name.str.split('-', n=1, expand=True)

Now you can save it back to a CSV file with:

data.to_csv('another_file.csv', index=False)
Have you tried opening your CSV file as a pandas DataFrame? This can be done with:
df = pd.read_csv('input_data.csv')
If the Customer and Misc columns are part of another CSV file, you can load that file with the same method as above (naming it df2) and then append the column with the following:
df['Customer'] = df2['Customer']
You can then output the DataFrame as a csv file with the following:
df.to_csv('output_data_name.csv')
I have multiple scripts, and each of them has a DataFrame. I want to export one column from each script/DataFrame into a single CSV.
I can create a CSV from my first script with one column:
Vorhersagen = pd.DataFrame(columns=["solar_prediction"])
Vorhersagen["solar_prediction"] = Erzeugung["erzeugung_predicted"]
Vorhersagen.to_csv(r"H:/.../Vorhersagen_2017.csv")
Now I have a CSV (called "Vorhersagen_2017") with the column "solar_prediction". But how can I add another column (from another script) to the same CSV as a second column? (The columns have the same length.)
If I understood correctly, you want to update the CSV file by running different scripts. If this is the case, then I would just read the file, append the new column, and save the file again. Something like:
df = pd.read_csv('Vorhersagen_2017.csv', ...)
df2 = pd.concat([df, df1], axis=1)  # df1 is the DataFrame created by your second script
df2.to_csv(...)
Then you would have to run this iteratively in all your scripts.
However, I think it is more efficient to import all your scripts as modules in a main script and run them from there. From that main script you could easily concatenate the various columns and save them as a CSV all at once, as sketched below.
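A minimal sketch of that main-script approach; the module and function names (script_solar, script_wind, get_predictions) are hypothetical placeholders for your own scripts:

import pandas as pd

# hypothetical modules; each is assumed to expose a function returning a pd.Series
import script_solar
import script_wind

Vorhersagen = pd.concat(
    [
        script_solar.get_predictions().rename('solar_prediction'),
        script_wind.get_predictions().rename('wind_prediction'),
    ],
    axis=1,
)
Vorhersagen.to_csv(r'H:/.../Vorhersagen_2017.csv')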
I made a program to save two arrays into a CSV file using a pandas DataFrame in Python, so that I could record all the data.
I tried the code listed below.
import time
import pandas as pd

U_8 = []
start = []
U_8.append(d)  # d holds the data value being recorded
start.append(str(time.time()))
x = pd.DataFrame({'1st': U_8, 'Time Stamp': start})
export_csv = x.to_csv(r'/home/pi/Frames/q8.csv', index=None, header=True)
Every time the program is closed and run again, it overwrites the previous values stored in the CSV file. I expected it to save the new values along with the previous ones. How can I keep both the past and present values in this CSV file?
In order to append to a CSV instead of overwriting it, pass mode='a' to df.to_csv. The default mode is 'w', which overwrites any existing CSV with the same filename. Plain appending, however, writes the column headers again as well, and they will appear as values in your CSV. To avoid that, pass header=False in your subsequent runs: df.to_csv('q8.csv', mode='a', header=False).
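A minimal sketch of that append pattern, reusing the question's file path and a placeholder value standing in for the recorded data; checking whether the file already exists decides when the header should be written:

import os
import time
import pandas as pd

# placeholder value in place of your measured data
x = pd.DataFrame({'1st': [0.5], 'Time Stamp': [str(time.time())]})

# write the header only on the very first run, then append ever after
write_header = not os.path.exists(r'/home/pi/Frames/q8.csv')
x.to_csv(r'/home/pi/Frames/q8.csv', mode='a', header=write_header, index=None)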
Another way is to read your original DataFrame back in and use pd.concat to join it with your new results (a sketch follows the list below). The workflow is then:
Read the original csv into a DataFrame.
Create a DataFrame with new results.
Concatenate the two DataFrames.
Write the resulting DataFrame to csv.
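And a sketch of that read/concat/write workflow under the same assumptions:

import time
import pandas as pd

old = pd.read_csv(r'/home/pi/Frames/q8.csv')  # 1. read the original csv
new = pd.DataFrame({'1st': [0.7],             # 2. new results (placeholder value)
                    'Time Stamp': [str(time.time())]})
combined = pd.concat([old, new], ignore_index=True)     # 3. concatenate the two
combined.to_csv(r'/home/pi/Frames/q8.csv', index=None)  # 4. write back to csv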
As an aside, assigning the return value of .to_csv to a variable is not necessary; df.to_csv('q8.csv') exports to CSV all the same.
I'm working with the latest version of Spark (2.1.1). I read multiple CSV files into a DataFrame with spark.read.csv.
After processing this DataFrame, how can I save it to output CSV files with specific names?
For example, there are 100 input files (in1.csv,in2.csv,in3.csv,...in100.csv).
The rows that belong to in1.csv should be saved as in1-result.csv, the rows that belong to in2.csv as in2-result.csv, and so on. (The default file names look like part-xxxx-xxxxx, which is not readable.)
I have seen partitionBy(col), but it looks like it can only partition by a column.
Another question: I want to plot my DataFrame, but Spark has no built-in plotting library. Many people use df.toPandas() to convert to pandas and plot it. Is there a better solution? My data is very big, so toPandas() will cause a memory error. I'm working on a server and want to save the plot as an image instead of showing it.
I suggest the following approach for writing the DataFrame into directories specific to each input file:
in a loop over each input file:
read the CSV file
add a new column holding the input file name, using the withColumn transformation
union all the DataFrames using the union transformation
then do the required preprocessing
save the result using partitionBy, providing the column with the input file information, so that rows related to the same input file are saved in the same output directory
The code could look like:

from pyspark.sql.functions import lit

all_df = None
for file in files:  # where files is the list of input CSV files that you want to read
    df = spark.read.csv(file)
    # withColumn returns a new DataFrame, so assign it back; lit() turns the
    # file name into a literal column value
    df = df.withColumn("input_file", lit(file))
    if all_df is None:
        all_df = df
    else:
        all_df = all_df.union(df)

# do preprocessing on all_df, producing `result`

# partitionBy expects a column name; rows from each input file end up in a
# subdirectory such as input_file=in1.csv/ inside outdir
result.write.partitionBy("input_file").csv(outdir)