Failing to delimit a CSV file - Python

I have a csv-file that doesn't delimit. Screenshot of csv-file.
This means that all the data stays in row[0], and does not divide into 6 columns. Does anybody know how to solve this issue?
import csv
n=1048576
id=[]*n
a=[]*n
date=[]*n
b=[]*n
c=[]*n
with open('C:\\Users\\andsc\\data_1.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        id[line_count] = row[0]
        a[line_count] = row[1]
        date[line_count] = row[2]
        b[line_count] = row[3]
        c[line_count] = row[4]
        line_count += 1

You appear to be using a non-US version of Excel. In locales where the comma is used as a decimal separator, Excel expects the semicolon as the column delimiter:
csv_reader = csv.reader(csv_file, delimiter=';')

Firstly, don't do this:
id=[]*n
a=[]*n
...etc...
What you are trying to do is emulate a fixed-length array. That won't work, as you will see if you try this at the Python prompt:
>>> [] * 9
[]
This is because the * really is a multiply, and just as [1] * 3 gives [1, 1, 1] (three repetitions of the list [1]) doing [] * 9 gives 9 repetitions of the empty list, which is just as empty as one repetition.
Instead create empty lists:
id=[]
a=[]
...etc...
Then, in your loop, do not index into these lists, append() new values to them instead:
id.append(row[0])
a.append(row[1])
...etc...
That means you don't need to keep track of line_count, and even if you do need it, use the attribute the reader already provides, csv_reader.line_num (it is an attribute, not a method, so there are no parentheses).
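Putting these points together, the loop might look like the sketch below. The sample data is made up for illustration, io.StringIO stands in for the real file so the snippet runs on its own, and the list is named ids rather than id only to avoid shadowing the built-in id().

```python
import csv
import io

# Hypothetical sample data standing in for data_1.csv (assumed comma-delimited).
sample = io.StringIO("1,x,2020-01-01,3.5,7\n2,y,2020-01-02,4.5,8\n")

# Start with empty lists and let them grow as rows are read.
ids, a, date, b, c = [], [], [], [], []

csv_reader = csv.reader(sample, delimiter=',')
for row in csv_reader:
    ids.append(row[0])
    a.append(row[1])
    date.append(row[2])
    b.append(row[3])
    c.append(row[4])

# line_num is an attribute of the reader: the number of lines read so far.
print(csv_reader.line_num)
print(ids, c)
```

If a fixed-length list really is needed, [None] * n creates one, whereas [] * n is always empty.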
Using Excel screenshots to look at a CSV is often misleading. It is clear that your version of Excel expects the delimiter of the CSV to be a semicolon, not a comma, which is why the data is all in one column. To be 100% sure of what is in the file, open it in a text editor like Notepad or Notepad++. That avoids Excel's aggressive type coercion, which changes anything that looks like a date, or a hexadecimal string, into a number. And above all, do not save the CSV back from Excel and assume the file is still what you expect.
It is clear that the code you presented will not run. It will get an IndexError the first time through the loop. You have to fix the code before it will run, and when you do that you will see that Python really does respect the comma as delimiter.
But opening the input file in Excel has given you a mistaken idea of where the problem is. You are quite right to say that comma is clearly the intended delimiter in the file. But when you open a CSV in Excel, Excel uses your system decimal and delimiter settings, which for European installations of Windows and macOS are usually , and ; respectively.
Excel is not bright enough to figure out on its own that those settings are inappropriate for a given file; it needs help from you. You can change Excel's File | Open behaviour by altering your system settings, but if you change the delimiter to , you will have to change the decimal point to . (for every single application, not just Excel) and it is unlikely you would want to do that.
The workaround is to set it manually for a particular file, by importing the CSV instead of simply opening it. On the Data tab select From Text/CSV and Excel will then try to guess the settings from the first 200 rows. If it guesses wrong you have the opportunity to fix it.
But getting Excel to display the file as you expect has nothing to do with the way Python is reading it.
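To check from Python which delimiter the file really uses, independently of Excel, one option is the standard library's csv.Sniffer. The sketch below is illustrative only: the sample text is invented, and the Sniffer is a heuristic that can guess wrong on unusual data.

```python
import csv

# Invented sample; with a real file you could read the first few KB instead,
# e.g. sample_text = csv_file.read(4096) followed by csv_file.seek(0).
sample_text = "id,a,date,b,c\n1,x,2020-01-01,3,7\n2,y,2020-01-02,4,8\n"

# Restrict the candidates to the delimiters discussed here.
dialect = csv.Sniffer().sniff(sample_text, delimiters=",;\t")
print(repr(dialect.delimiter))

# Parse using the detected dialect.
rows = list(csv.reader(sample_text.splitlines(), dialect))
print(rows[1])
```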

Related

How to replace characters in a csv file

I'm doing some measurements in the lab and want to transform them into some nice Python plots. The problem is the way the software exports CSV files, as I can't find a way to properly read the numbers. It looks like this:
-10;-0,0000026
-8;-0,00000139
-6;-0,000000546
-4;-0,000000112
-2;-5,11E-09
0,0000048;6,21E-09
2;0,000000318
4;0,00000304
6;0,0000129
8;0,0000724
10;0,000268
Separation by ; is fine, but I need every , to be ..
Ideally I would like Python to be able to read numbers such as 6.21E-09 as well, but I should be able to fix that in excel...
My main issue: Change every , to . so Python can read them as a float.
The simplest way would be to treat each value as a string and use the .replace() method to swap the characters. For example:
txt = "0,0000048;6,21E-09"
txt = txt.replace(',', '.')
You could also read the file with a CSV library (I don't know how you are reading it) and, depending on the library, set the delimiter (to ; in this case). CSV stands for comma-separated values and, as the name implies, columns are separated by ',' by default.
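As a rough sketch of the replace approach applied to whole rows (the helper name to_float and the use of io.StringIO in place of the real file are assumptions for illustration), note that float() already understands scientific notation such as 6.21E-09 once the decimal comma has been swapped for a point:

```python
import csv
import io

def to_float(text):
    """Parse a number written with a decimal comma, e.g. '6,21E-09'."""
    return float(text.replace(',', '.'))

# A few lines copied from the exported file in the question.
sample = io.StringIO("-2;-5,11E-09\n0,0000048;6,21E-09\n2;0,000000318\n")

# Split on ';' first, then convert each of the two fields.
points = [(to_float(x), to_float(y)) for x, y in csv.reader(sample, delimiter=';')]
print(points)
```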
You can do whatever you want in Python, for example:
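import csv

# Read the semicolon-delimited file
with open('path_to_csv_file', 'r', newline='') as csv_file:
    data = list(csv.reader(csv_file, delimiter=';'))

# Swap decimal commas for points and parse both columns as floats
# (the first column can hold values like 0,0000048, so int() would fail)
data = [(float(row[0].replace(',', '.')), float(row[1].replace(',', '.'))) for row in data]

# Write the converted values back
with open('path_to_csv_file', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerows(data)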
You could also consider a regex that matches every ',' in the text and replaces each match with '.'.

Why is my csv file separated by " \t " instead of commas (" , ")?

I downloaded data from internet and saved as a csv (comma delimited) file. The image shows what the file looks like in excel.
Using csv.reader in python, I printed each row. I have shown my code below along with the output in Spyder.
import csv
with open('p_dat.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
I am very confused as to why my values are not comma separated. Any help will be greatly appreciated.
As pointed out in the comments, technically this is a TSV (tab-separated values) file, which is actually perfectly valid.
In practice, of course, not all libraries will make a "hard" distinction between a TSV and CSV file. The way you parse a TSV file is basically the same as the way you parse a CSV file, except that the delimiter is different.
There are actually multiple valid delimiters for this kind of file, such as tabs, commas, and semicolons. Which one you choose is honestly a matter of preference, not a "hard" technical limit.
See the specification for csvs. There are many options for the delimiter in the file. In this case you have a tab, \t.
The option is important. Suppose your data had commas in it, then a , as a delimiter would not be a good choice.
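A small sketch of why the delimiter choice matters when the data itself contains commas. The field values are invented. With a comma delimiter the csv module has to quote such a field, whereas with a tab delimiter it can be written as-is:

```python
import csv
import io

row = ["Smith, John", "42"]  # invented values; the first field contains a comma

# Same row written once with a comma delimiter and once with a tab delimiter.
as_csv = io.StringIO()
csv.writer(as_csv, delimiter=',').writerow(row)

as_tsv = io.StringIO()
csv.writer(as_tsv, delimiter='\t').writerow(row)

print(repr(as_csv.getvalue()))
print(repr(as_tsv.getvalue()))

# Reading back with the matching delimiter recovers the original fields.
round_trip = next(csv.reader(io.StringIO(as_tsv.getvalue()), delimiter='\t'))
print(round_trip)
```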
Even though they're named comma-separated values, they're sometimes separated by different symbols (like the tab character that you have currently).
If you want to use Python to view this as a comma-separated file, you can try something like:
import csv
...
with open('p_dat.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')  # split each line on tabs
    for row in reader:
        # row is a list of fields, so join them with commas
        commarow = ",".join(row)
        print(commarow)

Issue with parsing csv from Django web form

I was hoping someone could help me with this. I'm getting a file from a form in Django; this file is a CSV and I'm trying to read it with Python's csv library. The problem is that when I apply the function csv.reader and turn the result into a list in order to print it, I find that csv.reader is not splitting my file correctly.
Here are some images to show the problem.
This is my csv file:
This is my code:
And this is the printed value of the variable file_readed:
As you can see in the picture, it seems to be splitting my file character by character, with some exceptions.
Thank you for any help you can provide.
If you are pulling from a web form, try getting the csv as a string, confirm in a print or debug tool that the result is correct, and then pass it to csv using StringIO.
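from io import StringIO
import csv
# Read the uploaded file's bytes and decode them to text
csv_string = form.files['carga_cie10'].read().decode(encoding="ISO-8859-1")
# Wrap the text in a file-like object so csv.reader iterates over lines
csv_file = StringIO(csv_string)
reader = csv.reader(csv_file, delimiter=',', quotechar='"')
for row in reader:
    print(row)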
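Another thing to check is the line endings in the string you get from the web form. Note that csv.reader ignores the lineterminator argument (it only affects the writer) and recognises both \r and \n as end-of-line, so changing that argument will not help on the reading side. Inspect the string you get from the web form to confirm that the line breaks are where you expect.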
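One likely explanation for the character-by-character output (an assumption, since the original code is only shown as an image) is that csv.reader was handed a plain string. The reader iterates over whatever it is given, and iterating over a str yields one character at a time. A minimal sketch with made-up data:

```python
import csv
from io import StringIO

csv_string = "A00,Cholera\nA01,Typhoid\n"  # made-up content

# Passing the string itself: each character is treated as a separate line.
wrong = list(csv.reader(csv_string))

# Wrapping it in StringIO (or using csv_string.splitlines()) yields real lines.
right = list(csv.reader(StringIO(csv_string)))

print(wrong[:4])
print(right)
```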
That CSV does not seem right: some lines have more fields than others.
CSV stands for comma-separated values, so every line needs the same number of fields separated by commas, or the parsing will be thrown off.
From your lines, you seem to expect 3 columns, but some lines have 2 or 4 fields, and some have an opening " in one field, then a comma, then the closing " in the next field.
Check whether your script works with other CSV files.
Most likely you need to specify the delimiter. Since you haven't stated the delimiter explicitly, I guess the reader is confused.
csv.reader(csvfile, delimiter=',')
However, since some quoted fields contain the comma delimiter, you may also need to change the delimiter used when the CSV file is created, to a tab or something else.
The problem is here:
print(list(file_readed))
'list' is causing printing of every element within the csv as an individual unit.
Try this instead:
with open('carga_cie10') as f:
    reader = csv.reader(f)
    for row in reader:
        print(" ".join(row))
Edit:
import pandas as pd
file_readed = pd.read_csv(file_csv)
print(file_readed)
The output should look clean. Pandas is highly useful in situations where data needs to be read, manipulated, changed, etc.

How to read CSV with column with more than one element in Python

I have the following CSV file:
id;name;duration;predecessors;
10;A;7;;
20;B;10;10;
25;B2;3;10;
30;C;5;10;
40;D;5;20,30, 25;
That is, in the fourth column of the last row I have three elements (20, 30, 25) separated by commas.
I have the following code:
csv_file = open(path_to_csv, 'r')
csv_file_reader = csv.reader(csv_file, delimiter=',')
first_row = True
for row in csv_file_reader:
    if not first_row:
        print(row)
    else:
        first_row = False
but I get a weird output:
['10;A;7;;']
['20;B;10;10;']
['25;B2;3;10;']
['30;C;5;10;']
['40;D;5;20', '30', ' 25;']
Any ideas?
Thanks in advance
You have specified CSV in your description, which stands for Comma Separated Values. However, your data uses semicolons.
Consider specifying the delimiter as ; for the CSV library:
with open(path_to_csv, 'r') as csv_file:
    csv_file_reader = csv.reader(csv_file, delimiter=';')
    ...
And while we're here, note the change to using the with statement to open the file. The with statement lets you open the file in a robust manner: no matter how the block is left (normally, through an exception, etc.), Python guarantees that the file will be closed and its resources released. You don't need to close the file yourself, just exit the block (unindent). It's "Pythonic" and a good habit to get into.
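With the semicolon delimiter in place, the comma-separated predecessors can then be split in a second step. The sketch below is one possible way to do it, using the data from the question inline. io.StringIO stands in for the real file, and the parsed dictionary is a made-up structure for illustration.

```python
import csv
import io

data = """id;name;duration;predecessors;
10;A;7;;
20;B;10;10;
25;B2;3;10;
30;C;5;10;
40;D;5;20,30, 25;
"""

parsed = {}
with io.StringIO(data) as csv_file:
    reader = csv.reader(csv_file, delimiter=';')
    next(reader)  # skip the header row
    # The trailing ';' on each line produces an empty fifth field.
    for task_id, name, duration, predecessors, _trailing in reader:
        # Split the fourth column on commas, ignoring empty entries;
        # int() tolerates the stray space in ' 25'.
        parsed[int(task_id)] = [int(p) for p in predecessors.split(',') if p.strip()]

print(parsed)
print(csv_file.closed)  # the with block has closed the file object
```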
@Antonio, I appreciate the above answer. As we know, a CSV is a file with comma-separated values, and Python's csv module works on that basis by default.
No problem, you can still read the file without using the csv module.
Based on the input you provided, I have written another simple solution that reads the CSV without any Python module (it's OK for simple tasks).
Please read it, try it, and comment if you are not satisfied with the code or if it fails for some of your test cases. I will modify it and make it work.
Data.csv:
id;name;duration;predecessors;
10;A;7;;
20;B;10;10;
25;B2;3;10;
30;C;5;10;
40;D;5;20,30, 25;
Now, have a look at the code below (it finds and prints all the lines whose 4th column has more than one element):
with open("Data.csv") as csv_file:
    for line in csv_file.readlines()[1:]:
        arr = line.strip().split(";")
        if len(arr[3].split(",")) > 1:
            print(line)  # 40;D;5;20,30, 25;

read in one row of csv file (based on input if i can) with DictReader, then format and write to new file

I'm trying to read in a CSV file with many rows and columns; I would like to print one row, in a particular format, to a text file, and do some hashing on the values. So far, I have been able to read in the file, parse through it using DictReader, find the row I want using an if statement, and then print the keys and values. I cannot figure out how to get it into the format I want in the end (Key = Value \n), and I cannot figure out how to write to a file (much less in the format I want) using the value of 'row' obtained below. I've been trying for days and have made a little progress, but cannot get it to work. Here is what I got to work (with much detail left out of the results):
>>>import csv
with open(r"C:\path_to_script\filename_Brief.csv") as infh:
    reader = csv.DictReader(infh)
    for row in reader:
        if row['ALIAS'] == 'Y4K':
            print(row)
result-output
{'Full_Name': 'Jack Flash', 'PHONE_NO': '555 555-1212', 'ALIAS': 'Y4K'}
I'd like to ask the user to input the alias and then use that to determine which row to print. I've done a ton of research but am new-ish to Python, so I am asking for help! I've used pyexcel and xlrd/xlwt, and even thought I'd try pandas, but there is too much to learn. I also got it to format the way I wanted in one test but then could not get the row selection to work; in other words, it prints all the records rather than the row I want. I have 30 Firefox tabs open trying to find an answer! Thanks in advance!
The following may at least be close to what you want (I think):
import csv
with open(r'C:\path_to_script\filename_Brief.csv') as infh, \
        open('new_file.txt', 'wt') as outfh:
    reader = csv.DictReader(infh)
    for row in reader:
        if row['ALIAS'] == 'Y4K':
            outfh.write('Full_Name = {Full_Name}\n'
                        'PHONE_NO = {PHONE_NO}\n'
                        'ALIAS = {ALIAS}\n'.format(**row))
This would write 3 lines formatted like this into the output file for every matching row:
Full_Name = Jack Flash
PHONE_NO = 555 555-1212
ALIAS = Y4K
BTW, the **row notation means basically "take all the entries in the specified dictionary and turn them into keyword arguments for this function call". The {keyword} syntax in the format string refers to any keyword arguments that will be passed to the str.format() method.
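To select the row from an alias typed in by the user, and to write every column as Key = Value without listing the keys by hand, something along these lines might work. This is only a sketch: the function names find_row and format_row are invented, and the inline sample stands in for the real CSV file.

```python
import csv
import io

def find_row(csv_file, alias):
    """Return the first row whose ALIAS column equals alias, or None."""
    for row in csv.DictReader(csv_file):
        if row['ALIAS'] == alias:
            return row
    return None

def format_row(row):
    """Render a row as one 'Key = Value' line per column."""
    return ''.join(f'{key} = {value}\n' for key, value in row.items())

# Invented sample standing in for filename_Brief.csv.
sample = io.StringIO(
    "Full_Name,PHONE_NO,ALIAS\n"
    "Jack Flash,555 555-1212,Y4K\n"
    "Jane Doe,555 555-0000,JD1\n"
)

alias = 'Y4K'  # in a real script this could be: alias = input('Alias: ')
row = find_row(sample, alias)
if row is not None:
    text = format_row(row)
    print(text, end='')
    # To save it instead of printing:
    # with open('new_file.txt', 'w') as outfh:
    #     outfh.write(text)
```

Returning None when nothing matches lets the caller decide how to report an unknown alias.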
