Quoting in CSV from Pandas - python

I am working on a project where I need to manipulate certain text files and write them back out as text files. A sample file looks like this:
As you can see, the headers are quoted, like "A". When I use the following code
import pandas as pd
df = pd.read_csv("Test doc.txt",sep =";")
df.to_csv("Output.txt",sep=";",index = None)
I get the output as
Now the headers appear as A; the " characters are gone. How do I write the file in exactly the same format as before?
I also tried
df.to_csv("Output.txt",sep=";",index = None, header = ["'A'","'B'","'C'"])
But this gives me
Now the header is 'A' but still not in the original format.
If I try
df.to_csv("Output.txt",sep=";",index = None, header = ['"A"','"B"','"C"'])
Now it looks like

import csv
df.to_csv("Output.txt",sep=";", index=None, quoting=csv.QUOTE_NONNUMERIC)
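As a rough illustration of what this option does, the sketch below builds a small frame in memory (the values are made up, since the sample file is not shown) and writes it to a string instead of a file. With csv.QUOTE_NONNUMERIC, every non-numeric field, including the header names, is wrapped in the quote character, while numbers are left bare.

```python
import csv
import pandas as pd

# Hypothetical data standing in for the contents of "Test doc.txt".
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# With no path given, to_csv returns the CSV text as a string.
text = df.to_csv(sep=";", index=False, quoting=csv.QUOTE_NONNUMERIC)

lines = text.splitlines()
header_line = lines[0]   # the quoted header row
first_row = lines[1]     # numeric values stay unquoted
print(header_line)
print(first_row)
```

Keep in mind that any string values in the data rows would be quoted as well, which may or may not match the original file.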

Change the default quote char.
df.to_csv("Output.txt", sep=';', index=None, quotechar="'", header=['"A"','"B"','"C"'])
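To see why changing the quote character helps, here is a small sketch with made-up data that compares the two calls on an in-memory frame. Under the default quote character, a header that itself contains " has to be quoted and escaped by doubling; once the quote character is ', the double quotes are ordinary characters and pass through unchanged.

```python
import pandas as pd

# Hypothetical data; only the header handling matters here.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
quoted_names = ['"A"', '"B"', '"C"']

# Default quotechar ('"'): embedded double quotes get doubled and wrapped.
default_header = df.to_csv(
    sep=";", index=False, header=quoted_names
).splitlines()[0]

# quotechar="'": the double quotes are no longer special.
single_header = df.to_csv(
    sep=";", index=False, header=quoted_names, quotechar="'"
).splitlines()[0]

print(default_header)
print(single_header)
```

One caveat with this approach: a data value that contains ; or ' would then be wrapped in single quotes in the output.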

Related

How to create function to import CSV?

I'd like to create a function in Python to import CSV files from github, by just inputting the file name.
I tried the following code, but no dataframe is created. Can anyone give me a hand? Million thanks
import pandas as pd
def in_csv(file_name):
    file = 'https://raw.githubusercontent.com/USER/file_name.csv'
    file_name = pd.read_csv(file, header = 0)
in_csv('csv_name')
There are many ways to read a CSV, but with a pandas.DataFrame:
import pandas as pd
def in_csv(file_name):
    file_path = f'https://raw.githubusercontent.com/USER/{file_name}'
    df = pd.read_csv(file_path, header = 0)
    return df
df = in_csv('csv_name')
print(df.head())
Thanks @Alian and @Aaron
Just add '.csv' after {file_name} and it works perfectly.
import pandas as pd
def in_csv(file_name):
    file_path = f'https://raw.githubusercontent.com/USER/{file_name}.csv'
    df = pd.read_csv(file_path, header = 0)
    return df
df = in_csv('csv_name')
To do this, you probably want to use a Python 3 f-string rather than a regular string. In this case, change the first line of your function to:
file = f'https://raw.githubusercontent.com/USER/{file_name}.csv'
The f'{}' syntax uses the value of the variable or expression within the brackets instead of the string literal you included.
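The difference is easy to check without touching the network. In this minimal sketch, build_csv_url is a hypothetical helper name, and USER is the placeholder from the question rather than a real account.

```python
def build_csv_url(file_name):
    # The f-string substitutes the value of file_name into the URL.
    return f"https://raw.githubusercontent.com/USER/{file_name}.csv"

# A regular string keeps the characters "file_name" literally.
literal = 'https://raw.githubusercontent.com/USER/file_name.csv'
formatted = build_csv_url("csv_name")

print(literal)
print(formatted)
```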

How to preserve complicated excel header formats when manipulating data using Pandas Python?

I am parsing a large Excel data file into another one; however, the headers are very irregular. I tried to use read_excel with skiprows, and that did not work. I also tried to include the header in
df = pd.read_excel(user_input, header= [1:3], sheet_name = 'PN Projection'), but then I get this error: "ValueError: cannot join with no overlapping index names." To get around this, I tried to name the columns by location, and that did not work either.
When I run the code as shown below, everything works fine, but past column "U" the header titles become "unnamed1, 2, ...". I understand this is because pandas treats the first row (which is empty) as the header, but how do I fix this? Is there a way to preserve the headers without manually typing in the format for each cell? Any and all help is appreciated, thank you!
small section of the excel file header
the code I am trying to run
#!/usr/bin/env python
import sys
import os
import pandas as pd
#load source excel file
user_input = input("Enter the path of your source excel file (omit 'C:'): ")
#reads the source excel file
df = pd.read_excel(user_input, sheet_name = 'PN Projection')
#Filtering dataframe
#Filters out rows with 'EOL' in column 'item status' and 'xcvr' in 'description'
df = df[~(df['Item Status'] == 'EOL')]
df = df[~(df['Description'].str.contains("XCVR", na=False))]
#Filters in rows with "XC" or "spartan" in 'description' column
df = df[(df['Description'].str.contains("XC", na=False) | df['Description'].str.contains("Spartan", na=False))]
print(df)
#Saving to a new spreadsheet called Filtered Data
df.to_excel('filtered_data.xlsx', sheet_name='filtered_data')
If you do not need the top 2 rows, then:
df = pd.read_excel(user_input, sheet_name='PN Projection', skiprows=range(0, 2))
This has worked for me when handling several strangely formatted files. Let me know if this isn't what you're looking for, or if there are additional issues.
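The effect of skiprows can be tried without a spreadsheet. The sketch below uses read_csv on made-up in-memory text purely as a stand-in, on the assumption that skiprows behaves the same way in read_excel (both functions document the parameter). The sheet contents are invented, not taken from the asker's file.

```python
import io
import pandas as pd

# Made-up contents: two unwanted rows above the real header row.
raw = (
    "report title,\n"
    ",\n"
    "Item Status,Description\n"
    "Active,XC7 part\n"
    "EOL,Spartan 3\n"
)

# Without skiprows, the first line is taken as the header,
# so the empty second cell becomes an "Unnamed: 1" column.
without_skip = pd.read_csv(io.StringIO(raw))

# Skipping rows 0 and 1 makes the third line the header.
with_skip = pd.read_csv(io.StringIO(raw), skiprows=range(0, 2))

print(list(without_skip.columns))
print(list(with_skip.columns))
```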

Bug on read CSV format, asserting the header in Jupyter notebook [duplicate]

I am importing a .csv file in Python with pandas.
Here is the file format of the .csv:
a1;b1;c1;d1;e1;...
a2;b2;c2;d2;e2;...
.....
Here is how I read it:
from pandas import *
csv_path = "C:...."
data = read_csv(csv_path)
Now when I print the file, I get this:
0 a1;b1;c1;d1;e1;...
1 a2;b2;c2;d2;e2;...
And so on... So I need help reading the file and splitting the values into columns on the semicolon character ;.
read_csv takes a sep param, in your case just pass sep=';' like so:
data = read_csv(csv_path, sep=';')
The reason it failed in your case is that the default value is ',' so it scrunched up all the columns as a single column entry.
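A quick way to see this is to parse the same text twice and compare the shapes. This is a minimal sketch on in-memory text modelled on the sample rows in the question, not on the actual file.

```python
import io
import pandas as pd

# Two semicolon-separated lines, as in the question's sample.
raw = "a1;b1;c1\na2;b2;c2\n"

# Default sep=',' finds no commas, so each line is one field.
as_comma = pd.read_csv(io.StringIO(raw))

# sep=';' splits each line into three fields.
as_semi = pd.read_csv(io.StringIO(raw), sep=";")

print(as_comma.shape)
print(as_semi.shape)
```

In both cases the first line is consumed as the header, which is why only one data row remains.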
In response to Morris' question above:
"Is there a way to programatically tell if a CSV is separated by , or ; ?"
This will tell you:
import pandas as pd
df_comma = pd.read_csv(your_csv_file_path, nrows=1,sep=",")
df_semi = pd.read_csv(your_csv_file_path, nrows=1, sep=";")
if df_comma.shape[1]>df_semi.shape[1]:
    print("comma delimited")
else:
    print("semicolon delimited")
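An alternative to parsing the file twice is the standard library's csv.Sniffer, which guesses the delimiter from a text sample. The sketch below is one possible way to use it; detect_delimiter is a hypothetical helper name, and the samples are made up. pandas can also do the detection itself when called with sep=None and engine="python".

```python
import csv

def detect_delimiter(sample, candidates=",;"):
    # Restrict the guess to the candidate characters.
    # sniff() raises csv.Error if it cannot decide.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter

semi = detect_delimiter("a1;b1;c1\na2;b2;c2\na3;b3;c3\n")
comma = detect_delimiter("a1,b1,c1\na2,b2,c2\na3,b3,c3\n")

print(repr(semi))
print(repr(comma))
```

For a real file, the sample would typically be the first few kilobytes read from it; the sniffer is a heuristic, so unusual files may still need an explicit sep.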

Python Datacompy library: how to save report string into a csv file?

I'm comparing between two data frames using Datacompy, but how can I save the final result as an excel sheet or csv file? I got a string as an output, but how can I save it as a CSV.
import pandas as pd
df1_1=pd.read_csv('G1-1.csv')
df1_2=pd.read_csv('G1-2.csv')
import datacompy
compare = datacompy.Compare(
df1_1,
df1_2,
join_columns='SAMPLED CONTENT (URL to content)',
)
print(compare.report())
I have tried this, and it worked for me:
with open('//Path', mode='w', encoding='utf-8') as report_file:
    report_file.write(compare.report())
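Since the report is just a string, saving it is ordinary text-file writing. The sketch below shows the pattern in isolation: save_report is a hypothetical helper, and report_text is a placeholder standing in for the value returned by compare.report(), so that the example runs without datacompy installed.

```python
import os
import tempfile

def save_report(report_text, path):
    # mode="w" creates the file, or truncates it if it already exists.
    with open(path, mode="w", encoding="utf-8") as report_file:
        report_file.write(report_text)

# Placeholder text; in practice this would be compare.report().
report_text = "placeholder report text\nsecond line\n"

out_path = os.path.join(tempfile.gettempdir(), "compare_report.txt")
save_report(report_text, out_path)

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as fh:
    saved = fh.read()
print(saved)
```

The report is free-form text rather than rows and columns, so giving the file a .csv extension will not make it open as a table in a spreadsheet.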
If you are just using pandas, you can try pandas's own way of writing to CSV:
df = pd.DataFrame([['yy','rr'],['tt', 'rr'],['cc', 'rr']], index=range(3), columns=['a', 'b'])
df.to_csv('compare.csv')
I haven't used datacompy, but I suggest turning your results into a DataFrame; then you can use to_csv.
This works fine for me too. Full code:
compare = datacompy.Compare(
    Oracle_DF1, PostgreSQL_DF2,
    join_columns=['c_transaction_cd','c_anti_social_force_req_id'],  # You can also specify a list of columns
    abs_tol=0,
    rel_tol=0,
    df1_name='Oracle Source',
    df2_name='PostgrSQL Reference'
)
compare.matches(ignore_extra_columns=False)
Report = compare.report()
csvFileToWrite = r'D://Postgres_Problem_15Feb21//Oracle_PostgreSQLDataFiles//Sample//summary.csv'
# mode='w' creates the file if it does not exist and overwrites it otherwise
with open(csvFileToWrite, mode='w', encoding='utf-8') as report_file:
    report_file.write(Report)

How to read and query headers of txt using Pandas?

I am writing a script to read a txt file using Pandas.
I need to query on particular headers.
Reading Excel works, but I cannot read the txt file.
import pandas as pd
#df=pd.read_excel('All.xlsx','Sheet1',dtype={'num1':str},index=False) #works
df=pd.read_csv('read.txt',dtype={'PHONE_NUMBER_1':str}) # doesn't work
array=['A','C']
a = df['NAME'].isin(array)
b = df[a]
print(b)
Try this syntax; you are not using the correct key:
df=pd.read_csv('read.txt',dtype={'BRAND_NAME_1':str})
You can try this:
import pandas as pd
df = pd.read_table("input.txt", sep=" ", names=['BRAND_NAME_1'], dtype={'BRAND_NAME_1':str})
You can read the txt file and then use astype on the column.
Read the file:
df = pd.read_csv('file.txt', names = ['PHONE_NUMBER_1', 'BRAND_NAME_1'])
names: the column names
Assign the type:
df['PHONE_NUMBER_1'] = df['PHONE_NUMBER_1'].astype(str)
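One thing worth checking when choosing between dtype at read time and astype afterwards is what happens to leading zeros. The sketch below uses made-up rows and assumes a file with no header line and two columns; the column names are borrowed from the question.

```python
import io
import pandas as pd

# Made-up contents: phone numbers with leading zeros, no header row.
raw = "0123,A\n0456,B\n0789,C\n"
names = ["PHONE_NUMBER_1", "NAME"]

# dtype=str at read time keeps the text exactly as written.
typed = pd.read_csv(io.StringIO(raw), names=names,
                    dtype={"PHONE_NUMBER_1": str})

# Converting afterwards is too late: the column was already
# parsed as integers, so the leading zeros are gone.
late = pd.read_csv(io.StringIO(raw), names=names)
late["PHONE_NUMBER_1"] = late["PHONE_NUMBER_1"].astype(str)

# Query rows by the values of a named column, as in the question.
selected = typed[typed["NAME"].isin(["A", "C"])]

print(list(selected["PHONE_NUMBER_1"]))
print(list(late["PHONE_NUMBER_1"]))
```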
