How to load CSV file in Jupyter Notebook? - python

I'm new and studying machine learning. I stumble upon a tutorial I found online and I'd like to make the program work so I'll get a better understanding. However, I'm getting problems about loading the CSV File into the Jupyter Notebook.
I get this error:
File "<ipython-input-2-70e07fb5b537>", line 2
student_data = pd.read_csv("C:\Users\xxxx\Desktop\student-intervention-
system\student-data.csv")
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
position 2-3: truncated \UXXXXXXXX escape
and here is the code:
I followed tutorials online regarding this error but none worked. Does anyone know how to fix it?
3rd attempt with r"path"
I've tried also "\" and utf-8 but none worked.
I'm using the latest version of Anaconda
Windows 7
Python 3.7

Use raw string notation for your Windows path. In python '\' have meaning in python. Try instead do string like this r"path":
student_data = pd.read_csv(r"C:\Users\xxxx\Desktop\student-intervention-
system\student-data.csv")
If it doesnt work try this way:
import os
path = os.path.join('c:' + os.sep, 'Users', 'xxxx', 'Desktop', 'student-intervention-system', 'student-data.csv')
student_data = pd.read_csv(path)

Either replace all backslashes \ with frontslashes / or place a r before your filepath string to avoid this error. It is not a matter of your folder name being too long.
As Bohun Mielecki mentioned, the \ character which is typically used to denote file structure in Windows has a different function when written within a string.
From Python3 Documentation: The backslash \ character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character.
How this particularly affects your statement is that in the line
student_data = pd.read_csv("C:\Users\xxxx\Desktop\student-intervention-
system\student-data.csv")
\Users matches the escape sequence \Uxxxxxxxx whereby xxxxxxxx refers to a Character with 32-bit hex value xxxxxxxx. Because of this, Python tries to find a 32-bit hex value. However as the -sers from Users doesn't match the xxxxxxxx format, you get the error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
position 2-3: truncated \UXXXXXXXX escape
The reason why your code works now is that you have placed a r in front of 'C:\Users\xxxx\Desktop\project\student-data.csv'. This tells python not to process the backslash character / as it usually does and read the whole string as-is.
I hope this helps you better understand your problem. If you need any more clarification, do let me know.
Source: Python 3 Documentation

I had the same problem. I tried to encode it with 'Latin-1' and it worked for me.
autos = pd.read_csv('filename',encoding = "Latin-1")

Try changing \ to /: -
import pandas as pd
student_data = pd.read_csv("C:/Users/xxxx/Desktop/student-intervention-
system/student-data.csv")
print(student data)
OR
import pandas as pd
student_data = pd.read_csv("C:/Users/xxxx/Desktop/student-intervention- system/student-data.csv"r)
print(student data)

Try this student_data = pd.read_csv("C:/Users/xxxx/Desktop/student-intervention-
system/student-data.csv").
Replacing backslashes in that code it will work for you.

You probably have a file name with a backlash... try to write the path using two backlashes instead of one.
student_data = pd.read_csv("C:\\Users\\xxxx\\Desktop\\student-intervention-system\\student-data.csv")

Try
pd.read_csv('file_name',encoding = "utf-8")

I found the problem. The problem is my folder name that is really long. I changed my folder name into "project" and the data is now finally loaded! Silly!

Please open notepad, write csv format data into the file and opt 'Save As' to save the file with format .csv.
E.g. Train.csv
Use this file, ensure you mention the same path correctly for the above saved CSV file during python coding.
Import pandas as pd
df=pd.read_csv('C:/your_path/Train.csv')
I've seen people using existing .txt/other format files to covert their format to .csv by just renaming. Which actually does nothing than changing the name of the file. It doesn't become a CSV file at all.
Hope this helps. 🙏🙏

Generally this kind of error arises if there is any space in file path...
df=pd.read_csv('/home/jovyan/binder/kidney disease.csv')
above command will create an error and will get resolved when it is
df=pd.read_csv('/home/jovyan/binder/kidney_disease.csv')
replaced space with underscore

To begin with, this has nothing to do with Jupyter. This is a pure Python problem!
In Python, as in most languages, a backslash is an special (escape) character in strings, for example "foo\"bar" is the string foo"bar without causing a syntax error. Also, "foo\nbar" is the strings foo and bar with a newline in between. There are many more escapes.
In your case, the meaning of \ in your path is literal, i.e. you actually want a backslash appearing in the string.
One option is to escape the backslash itself with another backslash: "foo\\bar" amounts to the string foo\bar. However, in your case, you have several of these, so for readability you might want to switch on "raw string mode", which disables (almost all) escapes:
r"foo\bar\baz\quux\etc"
will produce
foo\bar\baz\quuz\etc
As a matter of programming style, though, if you want your code to be portable, it is better to use os.path.join which knows the right path separator for your OS/platform:
In [1]: import os.path
In [2]: os.path.join("foo", "bar", "baz")
Out[2]: 'foo/bar/baz'
on Windows, that would produce foo\bar\baz.

import pandas as pd
data=pd.read_csv("C:\Users\ss\Desktop\file or csv file name.csv")
just place the csv file on the desktop

Related

Trouble with opening docx file, seems to be Unicode issue

I am a novice to python & this is my first small project
I am having trouble inputting a file directory to open a Word document. I tried this by copying & pasting the directory from my command prompt, but this Error appears after plugging it in. How do I convert the command prompt to UTF-8 or find the directory in Unicode?
#After importing necessary modules for the project, I access the file
from docx import Document
import pandas as pd
import docx
doc = Document('C:\Users\trisy\OneDrive\Desktop\classes\SP_22_courses\CS1110\pye_files\kw_txt.docx')
#Error message
doc = Document('C:\Users\xxx\OneDrive\Desktop\classes\SP_22_courses\xxx\pye_files\kw_txt.docx')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
The problem is caused by the backslashes in that pathname, combined with certain other characters.
In Python, putting \x in a string can have special behavior depending on what x is.
For example, \n does not mean "backslash n"; it means a newline character.
\U is one of these special cases.
To get around this, you have two options:
Use "raw strings". Put an r before the string. r'C:\Users\...' The r tells Python that backslashes should have no special meaning.
Use forward slashes in the file path. 'C:/Users/...' These will work even on Windows.

How I can convert file with any format to text format using Python 3.6?

I am trying to have a converter that can convert any file of any format to text, so that processing becomes easier to me. I have used the Python textract library.
Here is the documentation: https://textract.readthedocs.io/en/stable/
I have install it using the pip and have tried to use it. But got error and could not understand how to resolve it.
>>> import textract
>>> text = textract.process('C:\Users\beta\Desktop\Projects Done With Specification.pdf', method='pdfminer')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Even I have tried using the command without specifying method.
>>> import textract
>>> text = textract.process('C:\Users\beta\Desktop\Projects Done With Specification.pdf')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Kindly let me know how I can get rid of this issue with your suggestion. If it is possible then please suggest me the solution, if there is anything else that can be handy instead of textract, then still you can suggest me. I would like to hear.
The \ character means different things in different contexts. In Windows pathnames, it is the directory separator. In Python strings, it introduces escape sequences. When specifying paths, you have to account for this.
Try any one of these:
text = textract.process('C:\\Users\\beta\\Desktop\\Projects Done With Specification.pdf', method='pdfminer')
text = textract.process(r'C:\Users\beta\Desktop\Projects Done With Specification.pdf', method='pdfminer')
text = textract.process('C:/Users/beta/Desktop/Projects Done With Specification.pdf', method='pdfminer')
The problem is with the string
'C:\Users\beta\Desktop\Projects Done With Specification.pdf'
The \U starts an eight-character Unicode escape, such as '\U00014321`. In your code, the escape is followed by the character 's', which is invalid.
You either need to duplicate all backslashes, or prefix the string with r (to produce a raw string).
Try encoding='utf-8'
textract.process('C:\Users\beta\Desktop\Projects Done With Specification.pdf', encoding='utf-8')
In your case, error is due to invalid path.
Try this and it works:
'C:\Users\beta\Desktop\Projects Done With Specification.pdf'
"OR"
'C:/Users/beta/Desktop/Projects Done With Specification.pdf'
import textract
text = textract.process(r'C:\Users\myname\Desktop\doc\an.docx', encoding='utf-8')
this worked for me. Try.
textract doesn't work for me, when I was trying to convert slurm file output to text file. But simple with open did.
with open('disktest.o1761955', 'r') as f:
txt = f.read()

Python - Must add r when opening a file

I have several .py files and I can open my file everywhere, except in my test.py file (I test scripts and functions there) instead of this:
file = open("C:\Users\User\Desktop\key_values.txt", "r")
I need to use this (with r) to avoid error:
file = open(r"C:\Users\User\Desktop\key_values.txt", "r")
I get this error: (when I try to open a file without r in my test.py script)
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Any idea why is this happening ?
Backslash is an escape character, so you can include characters like "\n" (new line) and "\t" (tab). The r before the string means means "my backslashes are not escape characters".
Interestingly, it looks like your string "C:\Users\User\Desktop\key_values.txt" works ok in python 2 because none of the backslashes are part of anything looking like a known escape sequence. But in python 3, "\Uxxxx" indicates a unicode character. So maybe that is why some of your python files can cope and some can't.
The other answers are OK.. but this a time saving trick:
Try using slashes instead of backslashes:
file = open("C:/Users/User/Desktop/key_values.txt", "r")
It works in Windows. Tried with Python 2.7
Hope this helps

Python "SyntaxError: Non-ASCII character '\xe2' in file" [duplicate]

This question already has answers here:
SyntaxError: Non-ASCII character '\xa3' in file when function returns '£'
(6 answers)
Closed 2 years ago.
I am writing some python code and I am receiving the error message as in the title, from searching this has to do with the character set.
Here is the line that causes the error
hc = HealthCheck("instance_health", interval=15, target808="HTTP:8080/index.html")
I cannot figure out what character is not in the ANSI ASCII set? Furthermore searching "\xe2" does not give anymore information as to what character that appears as. Which character in that line is causing the issue?
I have also seen a few fixes for this issue but I am not sure which to use. Could someone clarify what the issue is (python doesn't interpret unicode unless told to do so?), and how I would clear it up properly?
EDIT:
Here are all the lines near the one that errors
def createLoadBalancer():
conn = ELBConnection(creds.awsAccessKey, creds.awsSecretKey)
hc = HealthCheck("instance_health", interval=15, target808="HTTP:8080/index.html")
lb = conn.create_load_balancer('my_lb', ['us-east-1a', 'us-east-1b'],[(80, 8080, 'http'), (443, 8443, 'tcp')])
lb.configure_health_check(hc)
return lb
If you are just trying to use UTF-8 characters or don't care if they are in your code, add this line to the top of your .py file
# -*- coding: utf-8 -*-
You've got a stray byte floating around. You can find it by running
with open("x.py") as fp:
for i, line in enumerate(fp):
if "\xe2" in line:
print i, repr(line)
where you should replace "x.py" by the name of your program. You'll see the line number and the offending line(s). For example, after inserting that byte arbitrarily, I got:
4 "\xe2 lb = conn.create_load_balancer('my_lb', ['us-east-1a', 'us-east-1b'],[(80, 8080, 'http'), (443, 8443, 'tcp')])\n"
Or you could just simply use:
# coding: utf-8
at top of .py file
\xe2 is the '-' character, it appears in some copy and paste it uses a different equal looking '-' that causes encoding errors.
Replace the '-'(from copy paste) with the correct '-' (from you keyboard button).
Change the file character encoding,
put below line to top of your code always
# -*- coding: utf-8 -*-
I had the same error while copying and pasting a comment from the web
For me it was a single quote (') in the word
I just erased it and re-typed it.
Adding # coding=utf-8 line in first line of your .py file will fix the problem.
Please read more about the problem and its fix on below link, in this article problem and its solution is beautifully described : https://www.python.org/dev/peps/pep-0263/
I got this error for characters in my comments (from copying/pasting content from the web into my editor for note-taking purposes).
To resolve in Text Wrangler:
Highlight the text
Go the the Text menu
Select "Convert to ASCII"
Based on PEP 0263 -- Defining Python Source Code Encodings
Python will default to ASCII as standard encoding if no other
encoding hints are given.
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors)
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
I had the same issue and just added this to the top of my file (in Python 3 I didn't have the problem but do in Python 2
#!/usr/local/bin/python
# coding: latin-1
If it helps anybody, for me that happened because I was trying to run a Django implementation in python 3.4 with my python 2.7 command
I my case \xe2 was a ’ which should be replaced by '.
In general I recommend to convert UTF-8 to ASCII using e.g. https://onlineasciitools.com/convert-utf8-to-ascii
However if you want to keep UTF-8 you can use
#-*- mode: python -*-
# -*- coding: utf-8 -*-
After about a half hour of looking through stack overflow, It dawned on me that if the use of a single quote " ' " in a comment will through the error:
SyntaxError: Non-ASCII character '\xe2' in file
After looking at the traceback i was able to locate the single quote used in my comment.
I had this exact issue running the simple .py code below:
import sys
print 'version is:', sys.version
DSM's code above provided the following:
1 'print \xe2\x80\x98version is\xe2\x80\x99, sys.version'
So the issue was that my text editor used SMART QUOTES, as John Y suggested. After changing the text editor settings and re-opening/saving the file, it works just fine.
I am trying to parse that weird windows apostraphe and after trying several things here is the code snippet that works.
def convert_freaking_apostrophe(self,string):
try:
issuer_rename = string.decode('windows-1252')
except:
issuer_rename = string.decode('latin-1')
issuer_rename = issuer_rename.replace(u'’', u"'")
issuer_rename = issuer_rename.encode('ascii','ignore')
try:
os.rename(directory+"/"+issuer,directory+"/"+issuer_rename)
print "Successfully renamed "+issuer+" to "+issuer_rename
return issuer_rename
except:
pass
#HANDLING FOR FUNKY APOSTRAPHE
if re.search(r"([\x90-\xff])", issuer):
issuer = self.convert_freaking_apostrophe(issuer)
I fixed this using pycharm. At the bottom of pycharm you can see file encoding. I noticed that it is UT-8. I changed it to US-ASCII
I had the same issue but it was because I copied and pasted the string as it is.
Later when I manually typed the string as it is the error vanished.
I had the error due to the - sign. When I replaced it with manually inputting a - the error was solved.
Copied string 10 + 3 * 5/(16 − 4)
Manually typed string 10 + 3 * 5/(16 - 4)
you can clearly see there is a bit of difference between both the hyphens.
I think it's because of the different formatting used by different OS or maybe just different software.
For me the problem had caused due to "’" that symbol in the quotes. As i had copied the code from a pdf file it caused that error. I just replaced "’" by this "'".
If you want to spot what character caused this just assign the problematic variable to a string and print it in a iPython console.
In my case
In [1]: array = [[24.9, 50.5]​, [11.2, 51.0]] # Raises an error
In [2]: string = "[[24.9, 50.5]​, [11.2, 51.0]]" # Manually paste the above array here
In [3]: string
Out [3]: '[[24.9, 50.5]\xe2\x80\x8b, [11.2, 51.0]]' # Here they are!
for me, the problem was caused by typing my code into Mac Notes and then copied it from Mac Notes and pasted into my vim session to create my file. This made my single quotes the curved type. to fix it I opened my file in vim and replaced all my curved single quotes with the straight kind, just by removing and retyping the same character. It was Mac Notes that made the same key stroke produce the curved single quote.
I was unable to find what's the issue for long but later I realised that I had copied a line "UTC-12:00" from web and the hyphen/dash in this was causing the problem. I just wrote this "-" again and the problem got resolved.
So, sometimes the copy pasted lines also give errors. In such cases, just re-write the copy pasted code and it works. On re-writing, it would look like nothing got changed but the error will be gone.
Plenty of good solutions here.
One challenge not really addressed in any of them is how to visually identify certain hard-to-spot non-ASCII characters that resemble other plain ASCII ones. For example, en dashes can appear almost exactly like hyphens and curly quotes look a lot like straight quotes, depending on your text editor's font.
This one-liner, which should work on Mac or Linux, will strip characters not in the ASCII printable range and show you the differences side-by-side:
# assumes Bash shell; for Bourne shell (sh), rearrange as a pipe and
# give '-' as second argument to 'sdiff' instead
sdiff --suppress-common-lines script.py <(tr -cd '\11\12\15\40-\176' <script.py)
The characters \11, \12, and \15 are tab, newline, and carriage return, respectively, in octal; the remaining range is the visible ASCII characters. (hat tip)
Another tip gleaned from this SO thread uses an inverse character class consisting of anything not in the ASCII visible range, and highlights it:
grep --color '[^ -~]' script.py
This should also work fine with the macOS / BSD version of grep.
When I have a similar issue when reading text files i use...
f = open('file','rt', errors='ignore')

EOL Error with file path string in Python

I am trying to create a simple sting that points to a folder which contains a file on my C drive. The string is as follows:
filelocation = "C:\Documents\Folder\"
I am getting an EOL error which I think is being caused by the backslashes. Is it possible to have these backslashes in the string or is there another way of achieving this?
Thanks
Python on Windows supports forward slashes:
filelocation = "C:/Documents/Folder/"
Alternatively, escape each of your \ characters:
filelocation = "C:\\Documents\\Folder\\"
The reason you're getting the error is because of the final \ character - Python is interpreting that as an escape sequence, and it thinks the string has not been terminated. To work around it, use one of my solutions above, or just omit the final \.
On Windows: filelocation = "C:\\Documents\\Folder\\"
On Linux: filelocation = "C:/Documents/Folder/"

Categories

Resources