Attempting to add a column heading to the newly created csv file - python

I'm trying to add a header to the CSV file that I create in the code below.
There's only one column in the CSV file I'm trying to create. The data frame is built from this array:
[0.6999346, 0.6599296, 0.69770324, 0.71822715, 0.68585426, 0.6738229, 0.70231324, 0.693281, 0.7101939, 0.69629824]
I just want to create a CSV file with a header row above those values.
Please help me with detailed code; I'm new to coding.
I tried this
df = pd.DataFrame(c)
df.columns = ['Confidence values']
pd.DataFrame(c).to_csv('/Users/sunny/Desktop/objectdet/final.csv',header= True , index= True)
But the CSV file I get doesn't have the 'Confidence values' header.

Try this
import pandas as pd
array = [0.6999346, 0.6599296, 0.69770324, 0.71822715, 0.68585426, 0.6738229, 0.70231324, 0.693281, 0.7101939, 0.69629824]
df = pd.DataFrame(array)
df.columns = ['Confidence values']
df.to_csv('final.csv', index=True, header=True)
Your call pd.DataFrame(c) creates a new DataFrame that doesn't have your header (its only column is named 0), while df is the DataFrame you gave the header to.
You are writing the DataFrame without the header to the CSV; that's why the header doesn't appear in your file. All you need to do is replace pd.DataFrame(c) with df.
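The sketch below shows the same fix in a slightly shorter form. It names the column when the DataFrame is built, using the `columns=` argument, so there is no separate `df.columns` assignment to forget. `c` is the question's array. The answer's `index=True` also writes the row labels 0 to 9 as an extra, unnamed first column. If you want only the one column, pass `index=False` as shown.

```python
import pandas as pd

# The ten confidence values from the question.
c = [0.6999346, 0.6599296, 0.69770324, 0.71822715, 0.68585426,
     0.6738229, 0.70231324, 0.693281, 0.7101939, 0.69629824]

# columns= names the single column while the frame is being created.
df = pd.DataFrame(c, columns=["Confidence values"])

# With no path argument, to_csv returns the CSV text instead of writing a file.
# index=False leaves out the 0..9 row labels, so the output has exactly one column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write the file itself, pass the path as the first argument, for example `df.to_csv('/Users/sunny/Desktop/objectdet/final.csv', index=False)`.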

Related

How to preserve complicated excel header formats when manipulating data using Pandas Python?

I am parsing a large Excel data file into another one; however, the headers are very irregular. I tried read_excel's skiprows option, and that did not work. I also tried to include the header in
df = pd.read_excel(user_input, header= [1:3], sheet_name = 'PN Projection'), but then I get this error: "ValueError: cannot join with no overlapping index names." To get around this, I tried to name the columns by location, and that did not work either.
When I run the code shown below, everything works fine, but past column "U" the header titles all become "Unnamed: N" placeholders. I understand this is because pandas treats the first row as the header (and those cells are empty), but how do I fix this? Is there a way to preserve the headers without manually typing in the format for each cell? Any and all help is appreciated, thank you!
The code I am trying to run:
#!/usr/bin/env python
import sys
import os
import pandas as pd
#load source excel file
user_input = input("Enter the path of your source excel file (omit 'C:'): ")
#reads the source excel file
df = pd.read_excel(user_input, sheet_name = 'PN Projection')
#Filtering dataframe
#Filters out rows with 'EOL' in column 'item status' and 'xcvr' in 'description'
df = df[~(df['Item Status'] == 'EOL')]
df = df[~(df['Description'].str.contains("XCVR", na=False))]
#Filters in rows with "XC" or "spartan" in 'description' column
df = df[(df['Description'].str.contains("XC", na=False) | df['Description'].str.contains("Spartan", na=False))]
print(df)
#Saving to a new spreadsheet called Filtered Data
df.to_excel('filtered_data.xlsx', sheet_name='filtered_data')
If you do not need the top 2 rows, then:
df = pd.read_excel(user_input, sheet_name = 'PN Projection', skiprows=range(0, 2))
This has worked for me when handling several strangely formatted files. Let me know if this isn't what you're looking for, or if there are additional issues.
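Here is a small sketch of why `skiprows` fixes the "Unnamed" headers. So that it runs without an Excel engine installed, `read_csv` on an in-memory string stands in for `read_excel`. Both accept `skiprows` and label empty header cells as "Unnamed: N". The sheet contents are hypothetical. `skiprows=2` is equivalent to `range(0, 2)`.

```python
import io
import pandas as pd

# Hypothetical sheet contents: two junk rows sit above the real header row.
text = (
    "PN Projection report,,\n"
    ",,\n"
    "Item Status,Description,Qty\n"
    "Active,XC7A35T FPGA,10\n"
    "EOL,Spartan-6 board,5\n"
)

# Without skiprows, the first line is taken as the header, so its empty
# cells are given placeholder names like 'Unnamed: 1'.
naive = pd.read_csv(io.StringIO(text))
print(naive.columns.tolist())

# skiprows=2 drops the first two lines, so the third line becomes the header.
df = pd.read_csv(io.StringIO(text), skiprows=2)
print(df.columns.tolist())
```

With the real file, the same idea is `pd.read_excel(user_input, sheet_name='PN Projection', skiprows=2)`, adjusted to however many rows sit above the header.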

How to insert data into a specific cell in csv with python?

I am trying to insert data into a specific cell of a CSV file.
The existing file has "Customer" in cell A1.
The output I want: the data in cell A1 ("Customer") is replaced with new data ("Name").
My code is as follows.
import pandas as pd
#The existing CSV file
file_source = r"C:\Users\user\Desktop\Customer.csv"
#Read the existing CSV file
df = pd.read_csv(file_source)
#Insert"Name"into cell A1 to replace "Customer"
df[1][0]="Name"
#Save the file
df.to_csv(file_source, index=False)
And it doesn't work. Please help me find the bug.
"Customer" is the column header, not a data cell, so you need to rename the column:
df = df.rename(columns={'Customer': 'Name'})
I am assuming you want to work with the CSV as if it had no header. If that's the case, add header=None when reading it, so that "Customer" is read as the value in row 0, column 0 rather than as a column name:
import pandas as pd
#The existing CSV file
file_source = r"C:\Users\user\Desktop\Customer.csv"
#Read the existing CSV file
df = pd.read_csv(file_source,header=None) #notice this line is now different
#Insert"Name"into cell A1 to replace "Customer"
df.iloc[0, 0] = "Name" #row 0, column 0 is cell A1; .iloc sets it in place
#Save the file
df.to_csv(file_source, index=False, header=False) #made this header less too
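Both answers produce the same file, as the sketch below shows. The contents of Customer.csv are hypothetical here: a Customer column plus an Amount column, read from a string instead of a path. The first approach renames the header. The second reads the file without a header and edits row 0, column 0.

```python
import io
import pandas as pd

# Hypothetical stand-in for Customer.csv.
text = "Customer,Amount\nAlice,10\nBob,20\n"

# Approach 1: the first line is the header, so rename the column.
df1 = pd.read_csv(io.StringIO(text))
df1 = df1.rename(columns={"Customer": "Name"})
out1 = df1.to_csv(index=False)

# Approach 2: read without a header, so the first line is data row 0,
# then overwrite cell A1 (row 0, column 0) and write without a header.
df2 = pd.read_csv(io.StringIO(text), header=None)
df2.iloc[0, 0] = "Name"
out2 = df2.to_csv(index=False, header=False)

print(out1)
print(out1 == out2)
```

Renaming is the simpler choice when the first line really is a header. Reading with header=None is more general, because it lets you change any cell with the same `.iloc[row, col]` pattern.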

How to remove row headers from a DataFrame

This is my .csv file:
,n,bubble sort,insertion sort,quick sort,tim sort
0,10,9.059906005859375e-06,5.0067901611328125e-06,1.9073486328125e-05,1.9073486328125e-06
1,50,0.0001659393310546875,8.487701416015625e-05,5.3882598876953125e-05,3.0994415283203125e-06
2,100,0.0006668567657470703,0.0003230571746826172,0.00011801719665527344,7.867813110351562e-06
3,500,0.028728008270263672,0.011162996292114258,0.0013577938079833984,6.008148193359375e-05
4,1000,0.11858582496643066,0.049070119857788086,0.0027892589569091797,0.000141143798828125
5,5000,2.022613048553467,0.8588027954101562,0.011118888854980469,0.0006251335144042969
I'm a bit confused about how to remove the row headers (the 0, 1, 2, ... column on the left), since DataFrame is what adds them. Here is the line that builds the DataFrame, and what I tried:
df = pd.DataFrame(timming)
df = pd.DataFrame(timming , header=None)
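`pd.DataFrame` takes no `header` argument. The row headers are the DataFrame's index, and you leave them out when writing with `index=False`. The sketch below assumes, hypothetically, that `timming` is a dict of lists. Only two of the columns are shown.

```python
import io
import pandas as pd

# Hypothetical stand-in for the question's `timming` data: a dict of lists.
timming = {
    "n": [10, 50, 100],
    "bubble sort": [9.06e-06, 1.66e-04, 6.67e-04],
}
df = pd.DataFrame(timming)

# The 0, 1, 2 labels come from the index; index=False keeps to_csv from
# writing them as the leading unnamed column.
csv_text = df.to_csv(index=False)
print(csv_text)

# A file that already contains that column can be read with index_col=0,
# which turns it back into the index instead of treating it as data.
existing = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(existing.columns.tolist())
```

In practice, writing with `df.to_csv("timings.csv", index=False)` (hypothetical file name) avoids the extra column in the first place.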

Write Dataframe row to excel sheet using Pandas

How do I save returned row from dataframe into excel sheet?
Story: I am working with a large txt file (1.7M rows) containing postal codes for Canada. I created a DataFrame and extracted the values I need into it. One column of the DataFrame is the province ID (df['PID']). I created a list of the unique values found in that PID column, and I am successfully creating the 13 sheets, each named after a unique PID, in a new Excel spreadsheet.
Problem: Each sheet only contains the headers, and not the values of the row.
I am having trouble writing the matching row to the sheet. Here is my code:
import pandas as pd
# parse text file into dataframe
path = 'the_file.txt'
df = pd.read_csv(path, sep='\t', header=None, names=['ORIG', 'PID','PCODE'], encoding='iso-8859-1')
# extract characters to fill values
df['ORIG'] = df['ORIG']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]
# create list of unique province ID's
prov_ids = df['PID'].unique().tolist()
prov_ids_string = map(str, prov_ids)
# create new excel file
writer = pd.ExcelWriter('CanData.xlsx', engine='xlsxwriter')
for id in prov_ids_string:
    mydf = df.loc[df.PID==id]
    # NEED TO WRITE VALUES FROM ROW INTO SHEET HERE*
    mydf.to_excel(writer, sheet_name=id)
writer.save()
I know where the writing should happen, but I haven't gotten the correct result. How can I write only the rows which have matching PID's to their respective sheets?
Thank you
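The problem is that prov_ids_string contains strings, while df['PID'] holds integers (because of .astype(int)), so df.PID==id never matches and every sheet gets only the header row. Loop over the integer IDs and convert to str only for the sheet name. The following should work: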
import pandas as pd
import xlsxwriter  # engine='xlsxwriter' below needs this package installed
# parse text file into dataframe
path = 'the_file.txt'
df = pd.read_csv(path, sep='\t', header=None, names=['ORIG', 'PID', 'PCODE'], encoding='iso-8859-1')
# extract characters to fill values
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]
# create list of unique province IDs; keep them as ints so they match df['PID']
prov_ids = df['PID'].unique().tolist()
# create new excel file; the with-block saves and closes it
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('./CanData.xlsx', engine='xlsxwriter') as writer:
    for idx in prov_ids:
        # int == int, so the matching rows are actually selected
        mydf = df.loc[df.PID == idx]
        # sheet names must be strings
        mydf.to_excel(writer, sheet_name=str(idx))
For example data:
df = pd.DataFrame()
df['ORIG'] = ['aaaaaa111111111111111111111',
'bbbbbb2222222222222222222222']
df['ORIG'] = df['ORIG']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]
print(df)
With this example data, the workbook gets sheets named 11 and 22, and sheet 11 contains only the row whose PID is 11.
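Another option is a sketch built on `groupby`, which avoids filtering once per ID. Iterating over `df.groupby("PID")` yields one (key, rows) pair per province, and the key keeps the column's integer type, so the int/str mismatch cannot come up. Only the sheet name is converted to a string. The `sheets` dict is a hypothetical intermediate, used so the grouping can be checked without writing a file.

```python
import pandas as pd

# The answer's example data.
df = pd.DataFrame()
df["ORIG"] = ["aaaaaa111111111111111111111", "bbbbbb2222222222222222222222"]
df["PID"] = df["ORIG"].str[11:13].astype(int)
df["PCODE"] = df["ORIG"].str[:6]

# One entry per province: sheet name (str) -> the rows for that PID.
sheets = {str(pid): group for pid, group in df.groupby("PID")}

for name, group in sheets.items():
    print(name, group["PCODE"].tolist())

# Writing is then one to_excel call per sheet, e.g.:
# with pd.ExcelWriter("CanData.xlsx", engine="xlsxwriter") as writer:
#     for name, group in sheets.items():
#         group.to_excel(writer, sheet_name=name)
```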

How to get rid of "changing" rows above headers (length changes every time, but headers and data are always the same)

I have the following csv file:
There are about 6-8 rows at the top of the file, above the header. I know how to make a new DataFrame in pandas and filter the data:
df = pd.read_csv('payments.csv')
df = df[df["type"] == "Order"]
print(df.groupby('sku').size())
df = df[df["marketplace"] == "amazon.com"]
print(df.groupby('sku').size())
df = df[df["promotional rebates"] > ((df["product sales"] + df["shipping credits"])*-.25)]
print(df.groupby('sku').size())
df.to_csv("out.csv")
My issue is with the headers. I need to:
1. look for the row that has date/time and another field, and
2. make a new DF excluding the rows above it.
That way I do not have to change my code if the number of rows before the header keeps changing.
What is the best approach to make sure the code doesn't break on such changes, as long as the header row exists and a few of its fields match? I'm open to any suggestions.
Consider a CSV file like this:
random line content
another random line
yet another one
datetime, settelment id, type
dd, dd, dd
You can use the following to compute the header's line number:
# load the first 20 rows of the csv file as a one-column dataframe
# to look for the header; "|" does not occur in the file, so each
# whole line lands in column 0
df = pd.read_csv("csv_file.csv", sep="|", header=None, nrows=20)
# use a regular expression to check which row holds the header;
# the following produces a Series of booleans, True where the row
# matches the regex "datetime.+settelment id.+type"
indices = df.iloc[:,0].str.contains("datetime.+settelment id.+type")
# get the row index of the header
header_index = df[indices].index.values[0]
and then read the CSV file using that row as the header:
# header counts rows the same way the first read did (blank lines are
# skipped in both), so it points straight at the header row
df = pd.read_csv("csv_file.csv", header=header_index)
Reproducible example:
import pandas as pd
from io import StringIO
st = """
random line content
another random line
yet another one
datetime, settelment id, type
dd, dd, dd
"""
df = pd.read_csv(StringIO(st), sep="|", header=None, nrows=20)
indices = df.iloc[:,0].str.contains("datetime.+settelment id.+type")
header_index = df[indices].index.values[0]
df = pd.read_csv(StringIO(st), header=header_index)
print(df)
print("columns")
print(df.columns)
print("shape")
print(df.shape)
Output:
datetime settelment id type
0 dd dd dd
columns
Index(['datetime', ' settelment id', ' type'], dtype='object')
shape
(1, 3)
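Below is a variation on the approach above. It is a sketch, and the helper name `read_after_header` is hypothetical. Instead of matching a regex against the first 20 rows, it scans every line for a row whose fields include all of a few required names. That row's line number then goes straight to `skiprows`. The required names "date/time" and "type" come from the question's description, and the sample export is made up. The sketch assumes the preamble lines don't contain quoted commas, because it splits lines naively on ",".

```python
import io
import pandas as pd


def read_after_header(text, required=("date/time", "type")):
    """Hypothetical helper: parse CSV text whose header row sits below a
    varying number of junk lines.

    The header is taken to be the first line whose comma-separated fields
    include every name in `required` (compared case-insensitively).
    """
    for i, line in enumerate(text.splitlines()):
        fields = [field.strip().lower() for field in line.split(",")]
        if all(name in fields for name in required):
            # skiprows=i drops the i lines before the header, so line i
            # becomes the header; skipinitialspace strips the blank after
            # each comma from both names and values.
            return pd.read_csv(io.StringIO(text), skiprows=i, skipinitialspace=True)
    raise ValueError(f"no header row containing all of {required}")


# Hypothetical export with a variable preamble above the header.
sample = (
    "Payments report\n"
    "Account: 123, generated nightly\n"
    "date/time, settlement id, type, sku\n"
    "2020-01-01, 1, Order, A1\n"
    "2020-01-02, 2, Refund, B2\n"
)
df = read_after_header(sample)
print(df.columns.tolist())
print(df[df["type"] == "Order"])
```

Because the header is located by its field names rather than its position, the preamble can grow or shrink without any code change. The filtering steps from the question can then run on the returned DataFrame as before.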
