Python CSV problem - python

For the first time I am having a problem loading a CSV into Python.
I am trying to do this. My csv file is identical to his, but longer and with different values.
When I run this,
import collections
path='../data/struc.csv'
answer = collections.defaultdict(list)
with open(path, 'r+') as istream:
    for line in istream:
        line = line.strip()
        try:
            k, v = line.split(',', 1)
            answer[k.strip()].append(v.strip())
        except ValueError:
            print('Ignoring: malformed line: "{}"'.format(line))
print(answer)
Everything runs fine. I get exactly what you would expect.
But when I copy and paste the code from the link, I get an error in both instances.
In the accepted answer, the terminal spits back ValueError: need more than 1 value to unpack
In the second answer, I get AttributeError: 'file' object has no attribute 'split'. It also does not work if you adjust it to take a list.
I feel like the problem is the csv file itself. The head of it is
_id,parent,name,\n
Section,none,America's,\n
Section,none,Europe,\n
Section,none,Asia,\n
Section,none,Africa,\n
Country,America's,United States,\n
Country,America's,Argentina,\n
Country,America's,Bahamas,\n
Country,America's,Bolivia,\n
Country,America's,Brazil,\n
Country,America's,Colombia,\n
Country,America's,Canada,\n
Country,America's,Cayman Islands,\n
Country,America's,Chile,\n
Country,America's,Costa Rica,\n
Country,America's,Dominican Republic,\n
I have read a lot of stuff about csv's, tried the import csv stuff, and still no luck.
Please someone help. Having this kind of problem is the worst.
import re
from collections import defaultdict
parents=defaultdict(list)
path='../data/struc.csv'
with open(path, 'r+') as istream:
    for i, line in enumerate(istream.split(',')):
        if i != 0 and line.strip():
            id_, parent, name = re.findall(r"[\d\w-]+", line)
            parents[parent].append((id_, name))
Traceback (most recent call last):
File "<ipython-input-29-2b2fd98946b3>", line 1, in <module>
runfile('/home/bob/Documents/mega/tree/python/structure.py', wdir='/home/bob/Documents/mega/tree/python')
File "/home/bob/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "/home/bob/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/home/bob/Documents/mega/tree/python/structure.py", line 15, in <module>
for i, line in enumerate(istream.split(',')):
AttributeError: 'file' object has no attribute 'split'

First of all, Python has a dedicated module in its standard library for dealing with CSV files of different flavours. Refer to the documentation.
When the CSV file has headers, csv.DictReader is probably the more intuitive way to parse it:
import collections
import csv
filepath = '../data/struc.csv'
answer = collections.defaultdict(list)
with open(filepath) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        answer[row["_id"].strip()].append(row["parent"].strip())
print(answer)
You can refer to the fields in each row by their names in the header. Here I assumed you would like to use _id and parent, but you get the idea.
Also, dialect=csv.excel_tab can be added as a parameter to DictReader to parse tab-separated files.
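To make the DictReader approach concrete, here is a minimal sketch that groups the name column under each parent value. The SAMPLE data, the children_by_parent helper and the choice of columns are assumptions for illustration only; the sample is read from an in-memory io.StringIO so the snippet runs without the real file.

```python
import collections
import csv
import io

# Hypothetical sample; the real data would come from open('../data/struc.csv').
SAMPLE = (
    "_id,parent,name\n"
    "Section,none,Europe\n"
    "Country,Europe,France\n"
    "Country,Europe,Spain\n"
)


def children_by_parent(csvfile, dialect=csv.excel):
    """Group the 'name' column under each 'parent' value."""
    grouped = collections.defaultdict(list)
    for row in csv.DictReader(csvfile, dialect=dialect):
        grouped[row["parent"].strip()].append(row["name"].strip())
    return dict(grouped)


print(children_by_parent(io.StringIO(SAMPLE)))

# The same helper on a tab-separated copy, using dialect=csv.excel_tab.
tab_sample = SAMPLE.replace(",", "\t")
print(children_by_parent(io.StringIO(tab_sample), dialect=csv.excel_tab))
```

With a real file you would pass the object returned by open(filepath, newline='') instead of the StringIO.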

If you plan on doing any analysis on this data, then I would suggest learning the pandas library. It takes care of all the details that seem to be tripping you up, making opening a CSV file a one-liner.
import pandas as pd
file_path = '../data/struc.csv'
csv_file = pd.read_csv(file_path)

Related

How do I write the time from datetime to a file in Python?

I'm trying to have my Python code write everything it does to a log, with a timestamp. But it doesn't seem to work.
This is my current code:
filePath= Path('.')
time=datetime.datetime.now()
bot_log = ["","Set up the file path thingy"]
with open ('bot.log', 'a') as f:
    f.write('\n'.join(bot_log)%
        datetime.datetime.now().strftime("%d-%b-%Y (%H:%M:%S.%f)"))
print(bot_log[0])
But when I run it, it says:
Traceback (most recent call last):
File "c:\Users\Name\Yuna-Discord-Bot\Yuna Discord Bot.py", line 15, in <module>
f.write('\n'.join(bot_log)%
TypeError: not all arguments converted during string formatting
I have tried multiple things to fix it, and this is the latest one. Is there something I'm doing wrong or missing? I also want the time to be in front of the log message, but I don't think this code would do that (even if it worked).
You need to put "%s" somewhere in the string before applying string formatting to it. Here's a more detailed explanation.
Try this:
filePath= Path('.')
time=datetime.datetime.now()
bot_log = "%s Set up the file path thingy\n"
with open ('bot.log', 'a') as f:
    f.write(bot_log % datetime.datetime.now().strftime("%d-%b-%Y (%H:%M:%S.%f)"))
print(bot_log)
It looks like you want to write three strings to your file as separate lines. I've rearranged your code to create a single list to pass to writelines, which expects an iterable:
filePath= Path('.')
time=datetime.datetime.now()
bot_log = ["","Set up the file path thingy"]
with open ('bot.log', 'a') as f:
    bot_log.append(datetime.datetime.now().strftime("%d-%b-%Y (%H:%M:%S.%f)"))
    f.writelines('\n'.join(bot_log))
print(bot_log[0])
EDIT: From the comments the desire is to prepend the timestamp to the message and keep it on the same line. I've used f-strings as I prefer the clarity they provide:
import datetime
from pathlib import Path
filePath = Path('.')
with open('bot.log', 'a') as f:
    time = datetime.datetime.now()
    msg = "Set up the file path thingy"
    f.write(f"""{time.strftime("%d-%b-%Y (%H:%M:%S.%f)")} {msg}\n""")
You could also look at the logging module which does a lot of this for you.
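For reference, a minimal sketch of what the logging route could look like. The logger name "bot", the make_bot_logger helper and the exact format string are assumptions, not something from the question. Note that the date format here leaves out %f, because logging formats dates with time.strftime, which generally does not understand that directive.

```python
import logging


def make_bot_logger(log_path="bot.log"):
    """Return a logger that appends timestamped lines to log_path."""
    logger = logging.getLogger("bot")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    # The timestamp goes in front of the message, on the same line.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(message)s", datefmt="%d-%b-%Y (%H:%M:%S)")
    )
    logger.addHandler(handler)
    return logger


logger = make_bot_logger()
logger.info("Set up the file path thingy")
```

Every later logger.info(...) call then gets its own timestamped line without any manual string formatting.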

Getting an error: Line Contains Null, Not sure the cause [duplicate]

I am getting an error: line contains NUL. I think it means there's a strange character in my CSV file. But this program and import file worked on a different machine (both Macs), so I don't know if the cause is a different version of Python or how I am running it. From reading the other entries, I am thinking this line may also be the cause:
reader = csv.reader(open(filePath, 'r', encoding="utf-8-sig", errors="ignore"))
Appreciate any help / advice!
paths CWD=/Users/sternit/Downloads/Ten-code-4, CPD=/Users/sternit/Downloads/Ten-code/
Traceback (most recent call last):
File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 145, in <module>
main()
File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 114, in main
playerLists = loadFiles(CPD + "PlayerFiles/")
File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 50, in loadFiles
for n, row in enumerate(reader):
_csv.Error: line contains NUL
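This should work fine:
# text mode, so each line is a str (which csv.reader requires on Python 3)
data_initial = open(filePath, "r", encoding="utf-8-sig", errors="ignore", newline="")
data = csv.reader((line.replace('\0','') for line in data_initial), delimiter=",")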
If the csv module says you have a "NULL" (silly message, should be "NUL") byte in the file you are reading, I would suggest checking what is actually in your file.
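On Python 2, try opening the file in rb mode; it might make the problem go away. This does not apply to Python 3, where binary mode does not accept the encoding and errors arguments and csv.reader needs text rather than bytes:
reader = csv.reader(open(filePath, 'rb'))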
Depending on how the file was generated, it might contain NUL bytes, so you might need to do one of the following:
Open it in an editor to see whether it is a reasonable CSV file; if the file is too big, use nano or head on the command line.
Use another library such as pandas, which could be more robust.
If the problem persists, you can replace every '\x00' with an empty string:
with open(filePath, 'rb') as fi:
    data = fi.read()
# the file was read in binary mode, so both arguments to replace() must be bytes
with open('mynew.csv', 'wb') as fo:
    fo.write(data.replace(b'\x00', b''))
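If you would rather not write a cleaned copy to disk, the same idea can be applied lazily while reading. This is only a sketch for Python 3: strip_nul and read_rows are hypothetical helper names, and the io.StringIO sample stands in for a file opened in text mode with newline="".

```python
import csv
import io


def strip_nul(lines):
    """Yield each text line with NUL characters removed."""
    for line in lines:
        yield line.replace("\x00", "")


def read_rows(text_stream):
    """Parse CSV rows from a text stream, ignoring stray NUL characters."""
    return list(csv.reader(strip_nul(text_stream)))


# Hypothetical sample containing two stray NUL characters.
sample = io.StringIO("name,score\r\nal\x00ice,10\r\nbob\x00,12\r\n", newline="")
rows = read_rows(sample)
print(rows)
```

With a real file you would pass open(filePath, encoding="utf-8-sig", errors="ignore", newline="") to read_rows instead of the StringIO.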

Python Error: iterator should return strings, not bytes (did you open the file in text mode?) [duplicate]

I've been struggling with this simple problem for too long, so I thought I'd ask for help. I am trying to read a list of journal articles from National Library of Medicine ftp site into Python 3.3.2 (on Windows 7). The journal articles are in a .csv file.
I have tried the following code:
import csv
import urllib.request
url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
ftpstream = urllib.request.urlopen(url)
csvfile = csv.reader(ftpstream)
data = [row for row in csvfile]
It results in the following error:
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
data = [row for row in csvfile]
File "<pyshell#4>", line 1, in <listcomp>
data = [row for row in csvfile]
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
I presume I should be working with strings not bytes? Any help with the simple problem, and an explanation as to what is going wrong would be greatly appreciated.
The problem is that urllib returns bytes. As proof, you can download the CSV file with your browser and open it as a regular file, and the problem is gone.
A similar problem was addressed here.
It can be solved by decoding the bytes to strings with the appropriate encoding. For example:
import csv
import urllib.request
url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
ftpstream = urllib.request.urlopen(url)
# decode with the appropriate encoding; splitlines() so csv.reader iterates over lines, not characters
csvfile = csv.reader(ftpstream.read().decode('utf-8').splitlines())
data = [row for row in csvfile]
The last line could also be: data = list(csvfile), which can be easier to read.
By the way, since the CSV file is very big, this can be slow and memory-consuming. It may be preferable to use a generator.
EDIT:
Using codecs, as proposed by Steven Rumbalski, it's not necessary to read the whole file before decoding it. Memory consumption is reduced and speed increased.
import csv
import urllib.request
import codecs
url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
ftpstream = urllib.request.urlopen(url)
csvfile = csv.reader(codecs.iterdecode(ftpstream, 'utf-8'))
for line in csvfile:
    print(line) # do something with line
Note that the list is not created either for the same reason.
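To try the iterdecode approach without touching the network, here is a small sketch in which io.BytesIO stands in for the stream returned by urlopen. The sample bytes and the rows_from_byte_lines helper are made up for the demo.

```python
import codecs
import csv
import io


def rows_from_byte_lines(byte_lines, encoding="utf-8"):
    """Lazily decode an iterable of byte lines and yield parsed CSV rows."""
    yield from csv.reader(codecs.iterdecode(byte_lines, encoding))


# io.BytesIO yields lines of bytes when iterated, like the real response object.
fake_stream = io.BytesIO("File,Citation\r\nfoo.tar.gz,Café J. 2013\r\n".encode("utf-8"))
for row in rows_from_byte_lines(fake_stream):
    print(row)
```

Because the helper is a generator, only one line is decoded and parsed at a time.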
Even though there is already an accepted answer, I thought I'd add to the body of knowledge by showing how I achieved something similar using the requests package (which is sometimes seen as an alternative to urllib.request).
The basis of using codecs.iterdecode() to solve the original problem is still the same as in the accepted answer.
import codecs
from contextlib import closing
import csv
import requests
# note: requests only ships with adapters for http:// and https://, so an ftp:// URL needs an HTTP(S) equivalent
url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'utf-8'))
    for row in reader:
        print(row)
Here we also see the use of streaming provided through the requests package in order to avoid having to load the entire file over the network into memory first (which could take long if the file is large).
I thought it might be useful since it helped me, as I was using requests rather than urllib.request in Python 3.6.
Some of the ideas (e.g. using closing()) are picked from this similar post.
I had a similar problem using the requests package and csv.
The response from the POST request was of type bytes.
In order to use the csv library, I first stored the content as an in-memory text file (in my case the size was small), decoded as UTF-8.
import io
import csv
import requests
response = requests.post(url, data)
# response.content is something like:
# b'"City","Awb","Total"\r\n"Bucuresti","6733338850003","32.57"\r\n'
csv_bytes = response.content
# write in-memory string file from bytes, decoded (utf-8)
str_file = io.StringIO(csv_bytes.decode('utf-8'), newline='\n')
reader = csv.reader(str_file)
for row_list in reader:
    print(row_list)
# Once the file is closed,
# any operation on the file (e.g. reading or writing) will raise a ValueError
str_file.close()
Printed something like:
['City', 'Awb', 'Total']
['Bucuresti', '6733338850003', '32.57']
urlopen will return a urllib.response.addinfourl instance for an ftp request.
For ftp, file, and data urls and requests explicitly handled by legacy
URLopener and FancyURLopener classes, this function returns a
urllib.response.addinfourl object which can work as context manager...
>>> urllib2.urlopen(url)
<addinfourl at 48868168L whose fp = <addclosehook at 48777416L whose fp = <socket._fileobject object at 0x0000000002E52B88>>>
At this point ftpstream is a file-like object. Using .read() would return the whole contents, but csv.reader requires an iterable of lines in this case.
Defining a generator like so:
def to_lines(f):
    line = f.readline()
    while line:
        yield line
        line = f.readline()
We can create our csv reader like so:
reader = csv.reader(to_lines(ftpstream))
And with a url
url = "http://pic.dhe.ibm.com/infocenter/tivihelp/v41r1/topic/com.ibm.ismsaas.doc/reference/CIsImportMinimumSample.csv"
The code:
for row in reader: print row
Prints
>>>
['simpleci']
['SCI.APPSERVER']
['SRM_SaaS_ES', 'MXCIImport', 'AddChange', 'EN']
['CI_CINUM']
['unique_identifier1']
['unique_identifier2']

How to use string as input for csv reader without storing it to file

I'm trying to loop through rows in a csv file. I get csv file as string from a web location. I know how to create csv.reader using with when data is stored in a file. What I don't know is, how to get rows using csv.reader without storing string to a file. I'm using Python 2.7.12.
I've tried to create StringIO object like this:
from StringIO import StringIO
csv_data = "some_string\nfor_example"
with StringIO(csv_data) as input_file:
    csv_reader = reader(csv_data, delimiter=",", quotechar='"')
However, I'm getting this error:
Traceback (most recent call last):
File "scraper.py", line 228, in <module>
with StringIO(csv_data) as input_file:
AttributeError: StringIO instance has no attribute '__exit__'
I understand that the StringIO class doesn't have an __exit__ method, which is called when the with statement finishes doing whatever it does with this object.
My question is: how do I do this correctly? I suppose I can alter the StringIO class by subclassing it and adding an __exit__ method, but I suspect that there is an easier solution.
Update:
Also, I've tried different combinations that came to my mind:
with open(StringIO(csv_data)) as input_file:
with csv_data as input_file:
but, of course, none of those worked.
>>> import csv
>>> csv_data = "some,string\nfor,example"
>>> result = csv.reader(csv_data.splitlines())
>>> list(result)
[['some', 'string'], ['for', 'example']]
You should use the io module instead of the StringIO one, because io.BytesIO for byte string or io.StringIO for Unicode ones both support the context manager interface and can be used in with statements:
from io import BytesIO
from csv import reader
csv_data = "some_string\nfor_example"
with BytesIO(csv_data) as input_file:
    csv_reader = reader(input_file, delimiter=",", quotechar='"')
    for row in csv_reader:
        print row
If you like context managers, you can use tempfile instead:
import tempfile
from csv import reader
with tempfile.NamedTemporaryFile(mode='w') as t:
    t.write(csv_data)  # csv_data is the CSV string from the question
    t.seek(0)
    csv_reader = reader(open(t.name), delimiter=",", quotechar='"')
Compared with passing the string's splitlines() directly to the csv reader, the advantage is that you can write a file of any size and then safely read it with the csv reader without memory issues.
The file will be closed and deleted automatically.
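For anyone landing here on Python 3 (the question itself is about 2.7), io.StringIO supports the with statement directly, so no temporary file or subclassing is needed. A minimal sketch, with a made-up csv_data string:

```python
import csv
import io

# Made-up sample; the second row has a quoted field containing a comma.
csv_data = 'some,string\nfor,"quoted, example"'

# newline="" leaves line endings untouched so the csv module can handle them.
with io.StringIO(csv_data, newline="") as input_file:
    rows = list(csv.reader(input_file, delimiter=",", quotechar='"'))

print(rows)
```

Unlike splitlines(), wrapping the string in StringIO also lets the csv module deal with quoted fields that contain line breaks.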

CSV Should Return Strings, Not Bytes Error

I am trying to read CSV files from a directory that is not in the same directory as my Python script.
Additionally the CSV files are stored in ZIP folders that have the exact same names (the only difference being one ends with .zip and the other is a .csv).
Currently I am using Python's zipfile and csv libraries to open and get the data from the files, however I am getting the error:
Traceback (most recent call last):
  File "write_pricing_data.py", line 13, in <module>
    for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
My code:
import os, csv
from zipfile import *
folder = r'D:/MarketData/forex'
localFiles = os.listdir(folder)
for file in localFiles:
    zipArchive = ZipFile(folder + '/' + file)
    with zipArchive.open(file[:-4] + '.csv') as csvFile:
        reader = csv.reader(csvFile, delimiter=',')
        for row in reader:
            print(row[0])
How can I resolve this error?
It's a bit of a kludge and I'm sure there's a better way (that just happens to elude me right now). If you don't have embedded new lines, then you can use:
import zipfile, csv
zf = zipfile.ZipFile('testing.csv.zip')
with zf.open('testing.csv', 'r') as fin:
    # Create a generator of decoded lines for input to csv.reader
    # (the csv module is only really happy with ASCII input anyway...)
    lines = (line.decode('ascii') for line in fin)
    for row in csv.reader(lines):
        print(row)
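One possible alternative on Python 3, sketched here under the assumption that the files are UTF-8, is to wrap the binary stream from the archive in io.TextIOWrapper. That also copes with newlines embedded in quoted fields. The read_csv_from_zip helper and the in-memory archive below are made up so the snippet can run on its own.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member_name, encoding="utf-8"):
    """Return all rows of one CSV member of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Decode the binary member as text; newline="" lets csv handle line endings.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))


# Build a small archive in memory instead of reading one from disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("testing.csv", 'pair,note\r\nEURUSD,"first line\nsecond line"\r\n')
buffer.seek(0)

rows = read_csv_from_zip(buffer, "testing.csv")
print(rows)
```

In the question's loop, zip_source would be the path folder + '/' + file and member_name would be file[:-4] + '.csv'.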
