Unable to convert CSV file to tab-delimited text file in Python - python

Instead of manually converting a CSV file to a tab-delimited text file using Excel,
I would like to automate this process with Python.
However, the following code
import csv

with open('endnote_csv.csv', 'r') as fin:
    with open('endnote_deliminated.txt', 'w', newline='') as fout:
        reader = csv.DictReader(fin, delimiter=',')
        writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
returns an error of
ValueError: dict contains fields not in fieldnames: None
May I know where I went wrong?
The csv file is accessible via the following link
Thanks in advance for any insight.

You can use the Python package called pandas to do this:
import pandas as pd
fname = 'endnote_csv'
pd.read_csv(f'{fname}.csv').to_csv(f'{fname}.tsv', sep='\t', index=False)
Here's how it works:
pd.read_csv(fname) - reads a CSV file into a pd.DataFrame object
.to_csv(fname) - writes the pd.DataFrame to the file given by fname
sep='\t' - uses a tab character instead of the ',' used in CSVs
index=False - omits the row numbers
If you want to be a bit more advanced and use the command line only, you can do this:
# csv-to-tsv.py
import sys

import pandas as pd

fnames = sys.argv[1:]
for fname in fnames:
    main_name = '.'.join(fname.split('.')[:-1])
    pd.read_csv(f'{main_name}.csv').to_csv(f'{main_name}.tsv', sep='\t', index=False)
This will allow you to run a command like this from the command line and change all .csv files to .tsv files in one go:
python csv-to-tsv.py *.csv

It is erroring out on comma-separated author names. It appears that the number of columns in the underlying rows exceeds the number of headers.
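A minimal sketch of that failure mode and one workaround: when a row parses to more columns than there are headers, DictReader files the surplus values under the key None, and DictWriter then raises the "dict contains fields not in fieldnames: None" error unless told to drop them with extrasaction='ignore'. (The sample rows below are hypothetical, invented to reproduce the error; the real fix may instead be quoting the author field properly in the source file.)

```python
import csv

# Hypothetical sample mimicking the failure: the second data row has an
# extra, unquoted comma, so it parses to 3 fields against 2 headers.
with open('endnote_csv.csv', 'w', newline='') as f:
    f.write('Author,Title\n')
    f.write('Smith,Paper A\n')
    f.write('Smith, J., Paper B\n')

# extrasaction='ignore' makes DictWriter silently drop the values that
# DictReader collected under the key None, instead of raising ValueError.
with open('endnote_csv.csv', 'r', newline='') as fin, \
        open('endnote_deliminated.txt', 'w', newline='') as fout:
    reader = csv.DictReader(fin, delimiter=',')
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|',
                            extrasaction='ignore')
    writer.writeheader()
    writer.writerows(reader)
```

Note that this silently discards the overflow data, so it is a diagnostic workaround rather than a fix for the malformed rows.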

Related

Making a CSV via Excel gives me '' in front of the first column name

I generated a csv via excel and when printing the key names, I get some weird characters appended to the first key like so:
keys(['row1', 'row2'])
import csv

path = 'C:\\Users\\asdf\\Desktop\\file.csv'
with open(path, 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row.keys())
However, if I just create the csv in the IDE everything works fine and no strange chars are printed. How can I read the excel csv in to chop off the strange characters?
Open the file with the utf-8-sig encoding, which strips the byte-order mark (BOM) that Excel writes at the start of the file:
with open(path, 'r', encoding='utf-8-sig')
this worked
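A self-contained sketch of why this works (the filename and header names here are hypothetical): a UTF-8 BOM decoded as plain utf-8 becomes the character '\ufeff' glued to the first column name, while utf-8-sig consumes it.

```python
import csv

# Simulate an Excel-style CSV: encoding='utf-8-sig' on write prepends
# a UTF-8 byte-order mark to the file.
with open('file.csv', 'w', encoding='utf-8-sig', newline='') as f:
    f.write('row1,row2\n1,2\n')

# Reading with plain utf-8 leaves the BOM attached to the first key...
with open('file.csv', 'r', encoding='utf-8', newline='') as f:
    keys_bad = list(next(csv.DictReader(f)).keys())

# ...while utf-8-sig strips it.
with open('file.csv', 'r', encoding='utf-8-sig', newline='') as f:
    keys_good = list(next(csv.DictReader(f)).keys())

print(keys_bad)
print(keys_good)
```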

How to filter tab delimited text file that selects lines that start with certain string and convert to a CSV

I have a tab-delimited text file and I want to select only the lines that start with a certain string, then convert those lines to a CSV file. I was able to do this, but in the Excel CSV each line from the text file is split across 3 cells in a row, and within each cell there are still tabs. Also, it skips every other row.
I tried replacing tabs with commas, but it didn't work.
# parse APT.txt for airport data
import csv
import itertools

import pandas as pd

airport_data = source
APT_lines = []
for line in open(airport_data):
    if line.startswith('APT'):
        APT_lines.append(line)

df = pd.DataFrame(APT_lines)
df.to_csv('apt.csv', header=False, index=False, quoting=csv.QUOTE_NONE, escapechar=' ')
The csv module in Python handles tab-delimited files as well as comma-separated values. I think you want to do something like this:
import csv

with open(input_file, newline='') as csvfile, open(output_file, 'w+', newline='') as output:
    reader = csv.reader(csvfile, delimiter='\t', quotechar='"')
    writer = csv.writer(output, delimiter=',', quotechar='"')
    for row in reader:
        if len(row) != 0 and row[0].startswith('APT'):
            writer.writerow(row)
(I haven't tested this code and you might find typos in it; but the CSV module is a pleasure to work with. I recommend reading the file directly as CSV, and then using the CSV module with the desired settings to write it back out.)

Replacing Delimiter In CSV Files with Python

I have a folder with several CSV files. These files all contain the box drawing double vertical and horizontal as the delimiter. I am trying to import all these files into python, change that delimiter to a pipe, and then save the new files into another location. The code I currently have runs without any errors but doesn't actually do anything. Any suggestions?
import os

import pandas as pd

directory = 'Y:/Data'
dirlist = os.listdir(directory)
file_dict = {}
x = 0
for filename in dirlist:
    if filename.endswith('.csv'):
        file_dict[x] = pd.read_csv(filename)
        column = file_dict[x].columns[0]
        file_dict[x] = file_dict[x][column].str.replace('╬', '|')
        file_dict[x].to_csv("python/file{}.csv".format(x))
        x += 1
Instead of directly replacing occurrences of the character with the new one (which may also alter escaped occurrences), we can use built-in functionality in the csv library to read the file for us and then write it back out:
import csv

with open('myfile.csv', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='╬')
    writer = csv.writer(outfile, delimiter='|')
    for row in reader:
        writer.writerow(row)
Adapted from the docs
with open(filename) as i:
    with open(filename + '.new', 'w+') as o:
        for line in i.readlines():
            o.write(line.replace('╬', '|'))
or, skip the python, and use sed from your terminal:
$ sed -i 's/╬/|/g' *.csv
Assuming the original delimiter doesn't appear in any escaped strings, this should be slightly faster than using the regular csv module. Pandas seems to do some filesystem voodoo when reading CSVs, so I wouldn't be too surprised if it is just as fast. sed will almost certainly beat them both by far.

Read csv lines and save it as seperate txt file, named as a line - python

I have a problem with some simple code.
I have a csv file with one column and hundreds of rows. I would like code that reads each line of the csv and saves it as a separate txt file. What is important is that each txt file should be named after the line that was read.
Example:
1.Adam
2. Doroty
3. Pablo
will give me adam.txt, doroty.txt and pablo.txt files. Please, help.
This should do what you need on python 3.6
with open('file.csv') as f:  # Open file with hundreds of rows
    for name in f.read().split('\n'):  # Get list of all names
        with open(f'{name.strip()}.txt', 'w') as s:  # Create file per name
            pass
Alternatively you can use built-in CSV library to avoid any complications with parsing csv files:
import csv

with open('names.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        file_name = '{0}.txt'.format(row['first_name'])
        with open(file_name, 'w') as f:
            pass

Excel disregards decimal separators when working with Python generated CSV file

I am currently trying to write a csv file in Python. The format is as follows:
1; 2.51; 12
123; 2.414; 142
EDIT: I already get the above format in my CSV, so the Python code seems OK. It appears to be an Excel issue, which is solved by changing the settings as @chucksmash mentioned.
However, when I try to open the generated csv file with excel, it doesn't recognize decimal separators. 2.414 is treated as 2414 in excel.
import csv

csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";")
writer.writerow(some_array_with_floats)
Did you check that the csv file is generated correctly, as you want? Also, try to specify the delimiter character you're using for the csv file when you import/open your file. In this case, it is a semicolon.
For python 3, I think your above code will also run into a TypeError, which may be part of the problem.
I just modified your open call to use 'w' instead of 'wb', since the array has float and not binary data. This seemed to generate the result that you were looking for.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'w')
An ugly solution, if you really want to use ; as the separator:
import csv
import os
with open('a.csv', 'wb') as csvfile:
    csvfile.write('sep=;' + os.linesep)  # new line
    writer = csv.writer(csvfile, delimiter=";")
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
This will produce:
sep=;
1;2.51;12
123;2.414;142
which is recognized fine by Excel.
I personally would go with , as the separator in which case you do not need the first line, so you can basically:
import csv
with open('a.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)  # default delimiter is `,`
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
And Excel will recognize what is going on.
A way to do this is to specify dialect=csv.excel in the writer. For example:
import csv

a = [[1, 2.51, 12], [123, 2.414, 142]]
csvfile = open('data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";", dialect=csv.excel)
writer.writerows(a)
csvfile.close()
Unless Excel is already configured to use semicolon as its default delimiter, it will be necessary to import data.csv using Data/FromText and specify semicolon as the delimiter in the Text Import Wizard step 2 screen.
Very little documentation is provided for the Dialect class at csv.Dialect. More information about it is at Dialects in the PyMOTW's "csv – Comma-separated value files" article on the Python csv module. More information about csv.writer() is available at https://docs.python.org/2/library/csv.html#csv.writer.
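To make the Dialect machinery mentioned above concrete, here is a minimal sketch (the dialect name 'excel_semicolon' and the filename are hypothetical): csv.register_dialect bundles the delimiter, quoting, and line-terminator settings under one reusable name.

```python
import csv

# Register a dialect: Excel-style quoting and line endings, but with a
# semicolon delimiter instead of a comma.
csv.register_dialect('excel_semicolon', delimiter=';',
                     quotechar='"', lineterminator='\r\n')

with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='excel_semicolon')
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])

# newline='' on read preserves the dialect's \r\n line endings
print(open('data.csv', newline='').read())
```

Once registered, the dialect name can be passed to any csv.reader or csv.writer in the program, which keeps the formatting parameters consistent across files.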
