I have a sentence like !=This is great
How do I write the sentence into a .CSV file with "!" in the first column and "This is great" in another column?
You can use the pandas to_csv method:
import pandas as pd
col1 = []
col2 = []
f = '!=This is great'
l1 = f.split('=')
col1.append(l1[0])
col2.append(l1[1])
df = pd.DataFrame()
df['col1'] = col1
df['col2'] = col2
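# index=False and header=False keep the row index and column names out of the file
df.to_csv('test.csv', index=False, header=False)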
Split the text and write it to an output file:
text = open('in.txt').read() # if reading from an input file
text = '!=This is great' # if not reading from an input file
with open('out.csv','w') as f:
    f.write(','.join(text.split('=')))
output:
!,This is great
If you have multiple lines, you will have to loop through the input file and split each one.
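For several lines, the loop could look like the following sketch. The sample lines and the `split_line` helper name are made up for illustration, and `io.StringIO` stands in for the real input and output files.

```python
import io

def split_line(line, sep="="):
    """Split one 'key=value' line at the first separator only."""
    return line.rstrip("\n").split(sep, 1)

# Hypothetical input; in practice this would be open('in.txt')
source = io.StringIO("!=This is great\n?=Is this great\n")
# In-memory stand-in for open('out.csv', 'w')
target = io.StringIO()

for line in source:
    if line.strip():  # skip blank lines
        target.write(",".join(split_line(line)) + "\n")

print(target.getvalue())
```

Splitting only at the first `=` keeps any later `=` inside the sentence intact. Note that this plain join does not quote commas inside the sentence.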
Of course, you could write the file using standard I/O with open() and manually insert a comma delimiter on each line, but Python has the csv standard library module to help with this. You can also specify a dialect if needed.
In [1]: import csv
In [2]: sentence="!=This is great"
In [3]: with open("test.csv", "w", newline='') as f:
   ...:     my_csvwriter = csv.writer(f)
   ...:     my_csvwriter.writerow(sentence.split("="))
With multiple sentences, assuming they are in a list, you can iterate over them while writing.
with open("test.csv", "w", newline='') as f:
    my_csvwriter = csv.writer(f)
    for sentence in sentences:
        my_csvwriter.writerow(sentence.split("="))
The csv library also handles a comma inside a sentence, so you don't have to handle it yourself. For instance, suppose you have:
sentence = "!=Hello, my name is.."
with open("test.csv", "w", newline='') as f:
    my_csvwriter = csv.writer(f)
    my_csvwriter.writerow(sentence.split("="))
# This will be written: !,"Hello, my name is.."
# With those quotes, you can still open the file in Excel without confusion:
# it knows that `Hello, my name is..` belongs in a single column
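To see the quoting at work without opening Excel, here is a small round-trip sketch. It writes the row to an in-memory buffer (an assumption made so the example needs no file on disk) and reads it back with csv.reader.

```python
import csv
import io

sentence = "!=Hello, my name is.."

# In-memory stand-in for test.csv
buffer = io.StringIO()
csv.writer(buffer).writerow(sentence.split("="))
written = buffer.getvalue()

# Read the same text back; the quoted field comes out as one column
rows = list(csv.reader(io.StringIO(written)))

print(repr(written))
print(rows)
```

The reader returns two fields for the row, with the comma preserved inside the second one.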
I have got multiple csv files which look like this:
ID,Text,Value
1,"I play football",10
2,"I am hungry",12
3,"Unfortunately",I get an error",15
I am currently importing the data using the pandas read_csv() function.
df = pd.read_csv(filename, sep = ',', quotechar='"')
This works for the first two rows in my csv file, but I get an error in row 3. The reason is that the 'Text' column contains a quote character followed by a comma before the end of the column.
ParserError: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4
Is there a way to solve this issue?
Expected output:
ID  Text                           Value
1   I play football                10
2   I am hungry                    12
3   Unfortunately, I get an error  15
You can try to fix the CSV using the re module:
import re
import pandas as pd
from io import StringIO
with open("your_file.csv", "r") as f_in:
    s = re.sub(
        r'"(.*)"',
        lambda g: '"' + g.group(1).replace('"', "\\") + '"',
        f_in.read(),
    )
df = pd.read_csv(StringIO(s), sep=r",", quotechar='"', escapechar="\\")
print(df)
Prints:
   ID                          Text  Value
0   1               I play football     10
1   2                   I am hungry     12
2   3  Unfortunately,I get an error     15
One (not so flexible) approach would be to first remove all " quotes from the csv, and then enclose the elements of the specific column in "" quotes (this avoids misinterpreting the "," separator while parsing), like this:
import csv
# Specify the column index (0-based)
column_index = 1
# Open the input CSV file
with open('input.csv', 'r') as f:
    reader = csv.reader(f)
    # Open the output CSV file
    with open('output.csv', 'w', newline='') as g:
        writer = csv.writer(g)
        # Iterate through the rows of the input CSV file
        for row in reader:
            # Replace the " character with an empty string
            row[column_index] = row[column_index].replace('"', '')
            # Enclose the modified element in "" quotes
            row[column_index] = f'"{row[column_index]}"'
            # Write the modified row to the output CSV file
            writer.writerow(row)
This code creates a new, modified csv file.
Your problematic csv row will then look like this:
3,"Unfortunately,I get an error",15
Then you can import the data like you did: df = pd.read_csv(filename, sep = ',', quotechar='"')
To automate this conversion for all csv files within a directory:
import csv
import glob
# Specify the column index (0-based)
column_index = 1
# Get a list of all CSV files in the current directory
csv_files = glob.glob('*.csv')
# Iterate through the CSV files
for csv_file in csv_files:
    # Open the input CSV file
    with open(csv_file, 'r') as f:
        reader = csv.reader(f)
        # Open the output CSV file
        output_file = csv_file.replace('.csv', '_new.csv')
        with open(output_file, 'w', newline='') as g:
            writer = csv.writer(g)
            # Iterate through the rows of the input CSV file
            for row in reader:
                # Replace the " character with an empty string
                row[column_index] = row[column_index].replace('"', '')
                # Enclose the modified element in "" quotes
                row[column_index] = f'"{row[column_index]}"'
                # Write the modified row to the output CSV file
                writer.writerow(row)
This gives the new csv files the same names as the old ones, but ending in "_new.csv" instead of just ".csv".
A possible solution:
df = pd.read_csv(filename, sep=r'(?<=\d),|,(?=\d)', engine='python')
df = df.reset_index().set_axis(['ID', 'Text', 'Value'], axis=1)
df['Text'] = df['Text'].replace('\"', '', regex=True)
Another possible solution:
df = pd.read_csv(StringIO(text), sep='\t')
df[['ID', 'Text']] = df.iloc[:, 0].str.split(',', expand=True, n=1)
df[['Text', 'Value']] = df['Text'].str.rsplit(',', expand=True, n=1)
df = df.drop(df.columns[0], axis=1).assign(
Text=df['Text'].replace('\"', '', regex=True))
Output:
   ID                          Text  Value
0   1               I play football     10
1   2                   I am hungry     12
2   3  Unfortunately,I get an error     15
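The same idea (split once from the left, once from the right, then strip the quotes) can also be sketched with plain string methods, without pandas. This assumes that neither the ID nor the Value column ever contains a comma; `parse_line` is a hypothetical helper name, and the sample text is copied from the question.

```python
raw = '''ID,Text,Value
1,"I play football",10
2,"I am hungry",12
3,"Unfortunately",I get an error",15
'''

def parse_line(line):
    """Split at the first and the last comma; everything between is Text."""
    first, rest = line.split(",", 1)
    middle, last = rest.rsplit(",", 1)
    # Drop every quote character from the text column
    return [first, middle.replace('"', ""), last]

parsed = [parse_line(line) for line in raw.splitlines() if line]
header, body = parsed[0], parsed[1:]

print(header)
for row in body:
    print(row)
```

The resulting lists can be handed to csv.writer or to pd.DataFrame(body, columns=header) if a DataFrame is needed.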
How can I replace the runs of whitespace on every line with a comma? I'm using tabulate and have been trying to figure out how.
Here's my code:
def print_extensions(self):
    i = InternetExplorer(self.os)
    content1 = tabulate(i.extensions(), headers="keys", tablefmt="plain")
    text_file = open("output.csv", "w")
    text_file.write(content1)
    text_file.close()
My CSV output:
where • = whitespace
path•••••name•••••id
C:\Windows•••••Microsoft•••••{CFBFAE00}
Expected CSV output:
path,name,id
C:\Windows,Microsoft,{CFBFAE00}
Probably you'd be better off using Python's standard csv module.
For example:
import csv
data = ["some", "header"], ["and", "data"]
# newline="" lets the csv module control line endings itself
with open("test.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(data)
produces the file
some,header
and,data
Tabulate is really more intended for pretty-printing. I certainly wouldn't suggest producing CSV files by parsing the output of tabulate with regular expressions.
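Since the question passes headers="keys" to tabulate, the data is probably a list of dicts. Under that assumption, csv.DictWriter can write it directly. The `extensions` list below is a made-up stand-in for `i.extensions()`, and the buffer stands in for the output file.

```python
import csv
import io

# Hypothetical stand-in for i.extensions(): a list of dicts
extensions = [
    {"path": r"C:\Windows", "name": "Microsoft", "id": "{CFBFAE00}"},
]

# In practice: open("output.csv", "w", newline="")
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(extensions[0].keys()))
writer.writeheader()          # path,name,id
writer.writerows(extensions)  # one line per dict

print(out.getvalue())
```

The header row comes from the dict keys, so no whitespace ever needs to be replaced.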
I am trying to combine multiple rows in a csv file together. I could easily do it in Excel but I want to do this for hundreds of files so I need it to be as a code. I have tried to store rows in arrays but it doesn't seem to work. I am using Python to do it.
So let's say I have a csv file:
1,2,3
4,5,6
7,8,9
All I want to do is to have a csv file as this;
1,2,3,4,5,6,7,8,9
The code I have tried is this;
fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv",'w')
for line in fin.xreadlines():
    new = line.replace(',', ' ', 1)
    fout.write(new)
fin.close()
fout.close()
Could you please help?
You should be using the csv module for this as splitting CSV manually on commas is very error-prone (single columns can contain strings with commas, but you would incorrectly end up splitting this into multiple columns). The CSV module uses lists of values to represent single rows.
import csv
def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)
data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')
print(data1)
print(data2)
combined = []
for row in data1:
    combined.extend(row)
for row in data2:
    combined.extend(row)
with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)
That code gives you the basis of the approach but it would be ugly to extend this for hundreds of files. Instead, you probably want os.listdir to pull all the files in a single directory, one by one, and add them to your output. This is the reason that I packed the reading code into the return_contents function; we can repeat the same process millions of times on different files with only one set of code to do the actual reading. Something like this:
import csv
import os
def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)
all_files = os.listdir('my_csvs')
combined_output = []
for file in all_files:
    data = return_contents('my_csvs/{}'.format(file))
    for row in data:
        combined_output.extend(row)
with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)
If you are dealing specifically with the CSV file format, I recommend using the csv package for the file operations. If you also use the with...as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all .csv files in that folder.
Here is what you can do:
import csv
import os
PATH = "your folder path"
def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # Open the file that os.listdir found, not a fixed name
            with open(os.path.join(PATH, filename), newline='') as csvfile:
                # QUOTE_NONNUMERIC converts every unquoted field to float
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)
    print(data_list)
if __name__ == '__main__':
    order_list()
Store your data in pandas df
import pandas as pd
df = pd.read_csv('file.csv')
Store the modified dataframe in a new one
df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x)).reset_index()  # replace 'Column_Name' with the name of your column
Write the new dataframe to a new csv
df_2.to_csv("file_modified.csv")
You could do it also like this:
fIn = open("test.csv", "r")
fOut = open("output.csv", "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
If you now want to run it on multiple files, you can run it as a script with arguments:
import sys
fIn = open(sys.argv[1], "r")
fOut = open(sys.argv[2], "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
Assuming you use a Linux system and the script is called csvOnliner.py, you could call it with:
for i in *.csv; do python csvOnliner.py "$i" "changed_$i"; done
With windows you could do it in a way like this:
FOR %i IN (*.csv) DO csvOnliner.py %i changed_%i
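The shell loops above could also be replaced by a loop inside Python, which behaves the same on Linux and Windows. The following is only a sketch: `flatten_csv` and `flatten_all` are hypothetical names, the `changed_` prefix is borrowed from the shell example, and the csv module is used instead of string joins so that quoted fields survive.

```python
import csv
import glob
import os

def flatten_csv(src, dst):
    """Write every value of src as one single row in dst."""
    with open(src, newline="") as f_in:
        values = [value for row in csv.reader(f_in) for value in row]
    with open(dst, "w", newline="") as f_out:
        csv.writer(f_out).writerow(values)

def flatten_all(folder):
    """Flatten each *.csv in folder into changed_<name> in the same folder."""
    for src in glob.glob(os.path.join(folder, "*.csv")):
        name = os.path.basename(src)
        if not name.startswith("changed_"):  # skip outputs of earlier runs
            flatten_csv(src, os.path.join(folder, "changed_" + name))
```

Calling flatten_all(".") would process the current directory.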
I'm trying to combine two lists into a csv, and have it output a line per each line of a second list.
a.csv
1
2
3
b.csv
a,x
b,y
c,z
Output:
c.csv
1|a|x
2|a|x
3|a|x
1|b|y
2|b|y
3|b|y
1|c|z
2|c|z
3|c|z
So for each line of "a" combine each line of "b", and get a list in "c".
Note, I have no need to separate "b" to reorder the columns, keeping the original order is fine.
A loop seems needed, but I'm having zero luck doing it.
Answered (output is not perfect, but ok for what i was needing):
import csv
from itertools import product
def main():
    with open('a.csv', 'rb') as f1, open('b.csv', 'rb') as f2:
        reader1 = csv.reader(f1, dialect=csv.excel_tab)
        reader2 = csv.reader(f2, dialect=csv.excel_tab)
        with open('output.csv', 'wb') as output:
            writer = csv.writer(output, delimiter='|', dialect=csv.excel_tab)
            writer.writerows(row1 + row2 for row1, row2 in product(reader1, reader2))
if __name__ == "__main__":
    main()
Output file:
1|a,x
1|b,y
1|c,z
2|a,x
2|b,y
2|c,z
3|a,x
3|b,y
3|c,z
Yes the "|" is only one of the separators.
It would be nice to know how to get "1|a|x" and so on.
One way is to use pandas:
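import pandas as pd
a = pd.read_csv('a.csv', header=None, names=['n'])
b = pd.read_csv('b.csv', header=None, names=['k', 'v'])
# how='cross' pairs every row of b with every row of a (pandas >= 1.2)
df = b.merge(a, how='cross')[['n', 'k', 'v']]
df.to_csv('out.csv', sep='|', index=False, header=False)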
A native Python approach, using itertools.product:
from itertools import product
# read file a: strip the newline, replace commas with the new delimiter, skip empty lines
with open("a.csv", "r") as fa:
    a = [line.strip().replace(",", "|") for line in fa if line.strip()]
# read file b the same way
with open("b.csv", "r") as fb:
    b = [line.strip().replace(",", "|") for line in fb if line.strip()]
# combine the two lists
c = ["|".join([i, j]) for i, j in product(a, b)]
# write into a new file, one combination per line
with open("c.csv", "w") as f:
    for item in c:
        f.write(item + "\n")
#output
1|a|x
1|b|y
1|c|z
2|a|x
2|b|y
2|c|z
3|a|x
3|b|y
3|c|z
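The question lists the rows in a different order (1|a|x, 2|a|x, 3|a|x, then 1|b|y, and so on). A sketch that reproduces that order with the csv module is shown below. The in-memory strings are stand-ins for a.csv and b.csv, and the only trick is iterating over b in the outer position of product.

```python
import csv
import io
from itertools import product

# In-memory stand-ins for a.csv and b.csv from the question
a_text = "1\n2\n3\n"
b_text = "a,x\nb,y\nc,z\n"

rows_a = list(csv.reader(io.StringIO(a_text)))
rows_b = list(csv.reader(io.StringIO(b_text)))

# In practice: open("c.csv", "w", newline="")
out = io.StringIO()
writer = csv.writer(out, delimiter="|", lineterminator="\n")
# b varies slowest, a fastest; each output row is a's fields then b's fields
writer.writerows(ra + rb for rb, ra in product(rows_b, rows_a))

print(out.getvalue())
```

Because the readers split b.csv on commas, every field ends up separated by "|" in the output.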
I have some data that needs to be written to a CSV file. The data is as follows
A ,B ,C
a1,a2 ,b1 ,c1
a2,a4 ,b3 ,ct
The first column has comma inside it. The entire data is in a list that I'd like to write to a CSV file, delimited by commas and without disturbing the data in column A. How can I do that? Mentioning delimiter = ',' splits it into four columns on the whole.
Just use the csv.writer from the csv module.
import csv
data = [['A','B','C'],
        ['a1,a2','b1','c1'],
        ['a2,a4','b3','ct']]
fname = "myfile.csv"
# In Python 3, open in text mode with newline='' (use 'wb' only in Python 2)
with open(fname, 'w', newline='') as f:
    writer = csv.writer(f)
    for row in data:
        writer.writerow(row)
https://docs.python.org/library/csv.html#csv.writer
No need to use the csv module since the ',' in the first column is already part of your data, this will work:
with open('myfile.csv', 'w') as f:
    for row in data:
        f.write(', '.join(row))
        f.write('\n')
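One way to compare the plain join with csv.writer is to read both results back and count the columns per row. The sketch below does this in memory; the `data` list mirrors the question, and the variable names are chosen for illustration.

```python
import csv
import io

data = [["A", "B", "C"], ["a1,a2", "b1", "c1"], ["a2,a4", "b3", "ct"]]

# Variant 1: plain join, no quoting
joined = "\n".join(", ".join(row) for row in data) + "\n"

# Variant 2: csv.writer, which quotes fields that contain the delimiter
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(data)
quoted = buf.getvalue()

# Number of columns a CSV reader sees in each row
joined_widths = [len(r) for r in csv.reader(io.StringIO(joined))]
quoted_widths = [len(r) for r in csv.reader(io.StringIO(quoted))]

print(joined_widths)
print(quoted_widths)
```

With the plain join, a reader sees four columns in the data rows; with csv.writer, the first column is quoted and every row keeps three columns.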
You could try the below.
Code:
import csv
import re
with open('infile.csv', 'r') as f:
    lst = []
    for line in f:
        lst.append(re.findall(r',?(\S+)', line))
with open('outfile.csv', 'w', newline='') as w:
    writer = csv.writer(w)
    for row in lst:
        writer.writerow(row)
Output:
A,B,C
"a1,a2",b1,c1
"a2,a4",b3,ct