Trouble with opening docx file, seems to be Unicode issue


I am new to Python, and this is my first small project.
I am having trouble passing a file path to open a Word document. I copied and pasted the path from my command prompt, but the error below appears when I use it. How do I convert the command prompt output to UTF-8, or find the path in Unicode?
#After importing necessary modules for the project, I access the file
from docx import Document
import pandas as pd
import docx
doc = Document('C:\Users\trisy\OneDrive\Desktop\classes\SP_22_courses\CS1110\pye_files\kw_txt.docx')
#Error message
doc = Document('C:\Users\xxx\OneDrive\Desktop\classes\SP_22_courses\xxx\pye_files\kw_txt.docx')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The problem is caused by the backslashes in that pathname, combined with certain other characters.
In Python, putting \x in a string can have special behavior depending on what x is.
For example, \n does not mean "backslash n"; it means a newline character.
\U is one of these special cases.
To get around this, you have two options:
Use "raw strings". Put an r before the string. r'C:\Users\...' The r tells Python that backslashes should have no special meaning.
Use forward slashes in the file path. 'C:/Users/...' These will work even on Windows.
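Both options can be checked quickly in an interpreter. A minimal sketch, assuming a placeholder path (`C:\Users\me\notes.docx` is hypothetical and stands in for the real file):

```python
# Hypothetical placeholder path, used only for illustration.
raw_path = r'C:\Users\me\notes.docx'         # raw string: backslashes kept as-is
escaped_path = 'C:\\Users\\me\\notes.docx'   # doubled backslashes: same text as raw_path
slash_path = 'C:/Users/me/notes.docx'        # forward slashes: no escapes involved

print(raw_path == escaped_path)   # True
print(raw_path)
print(slash_path)
```

The raw string and the doubled-backslash string are the same value; only the way they are written in the source differs.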

Related

How to load CSV file in Jupyter Notebook?

I'm new and studying machine learning. I stumbled upon a tutorial online and I'd like to make the program work so I can understand it better. However, I'm having problems loading the CSV file into the Jupyter Notebook.
I get this error:
File "<ipython-input-2-70e07fb5b537>", line 2
student_data = pd.read_csv("C:\Users\xxxx\Desktop\student-intervention-system\student-data.csv")
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
position 2-3: truncated \UXXXXXXXX escape
and here is the code:
I followed tutorials online regarding this error but none worked. Does anyone know how to fix it?
3rd attempt with r"path"
I've tried also "\" and utf-8 but none worked.
I'm using the latest version of Anaconda
Windows 7
Python 3.7

Use raw string notation for your Windows path. In Python, the backslash \ has a special meaning inside string literals. Write the string as r"path" instead:
student_data = pd.read_csv(r"C:\Users\xxxx\Desktop\student-intervention-system\student-data.csv")
If that doesn't work, try it this way:
import os
path = os.path.join('c:' + os.sep, 'Users', 'xxxx', 'Desktop', 'student-intervention-system', 'student-data.csv')
student_data = pd.read_csv(path)

Either replace all backslashes \ with forward slashes / or place an r before your file path string to avoid this error. It is not a matter of your folder name being too long.
As Bohun Mielecki mentioned, the \ character which is typically used to denote file structure in Windows has a different function when written within a string.
From Python3 Documentation: The backslash \ character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character.
How this particularly affects your statement is that in the line
student_data = pd.read_csv("C:\Users\xxxx\Desktop\student-intervention-system\student-data.csv")
\Users matches the escape sequence \Uxxxxxxxx whereby xxxxxxxx refers to a Character with 32-bit hex value xxxxxxxx. Because of this, Python tries to find a 32-bit hex value. However as the -sers from Users doesn't match the xxxxxxxx format, you get the error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
position 2-3: truncated \UXXXXXXXX escape
The reason why your code works now is that you have placed an r in front of 'C:\Users\xxxx\Desktop\project\student-data.csv'. This tells Python not to process the backslash character \ as it usually does and to read the whole string as-is.
I hope this helps you better understand your problem. If you need any more clarification, do let me know.
Source: Python 3 Documentation
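The behaviour described in this answer can be reproduced without crashing a script by handing the problematic literal to the built-in compile() as text. This is only a sketch: the helper name try_compile and the shortened path are illustrative assumptions, not part of the original code.

```python
# Source text of a string literal containing an unescaped Windows path.
# It is wrapped in a raw string here so that this example itself stays valid.
bad_source = r'"C:\Users\xxxx\Desktop\student-data.csv"'

def try_compile(source):
    """Return the SyntaxError message for source, or None if it compiles."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError as err:
        return str(err)
    return None

message = try_compile(bad_source)
print(message)                          # mentions the 'unicodeescape' codec

# Adding the r prefix to the same literal makes it compile cleanly.
print(try_compile('r' + bad_source))    # None
```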

I had the same problem. I tried to encode it with 'Latin-1' and it worked for me.
autos = pd.read_csv('filename',encoding = "Latin-1")

Try changing \ to /:
import pandas as pd
student_data = pd.read_csv("C:/Users/xxxx/Desktop/student-intervention-system/student-data.csv")
print(student_data)
OR
import pandas as pd
student_data = pd.read_csv(r"C:\Users\xxxx\Desktop\student-intervention-system\student-data.csv")
print(student_data)

Try this:
student_data = pd.read_csv("C:/Users/xxxx/Desktop/student-intervention-system/student-data.csv")
Replacing the backslashes with forward slashes in that code will make it work.

You probably have a file name with a backslash... try to write the path using two backslashes instead of one.
student_data = pd.read_csv("C:\\Users\\xxxx\\Desktop\\student-intervention-system\\student-data.csv")

Try
pd.read_csv('file_name',encoding = "utf-8")

I found the problem. The problem is my folder name that is really long. I changed my folder name into "project" and the data is now finally loaded! Silly!

Open Notepad, write the CSV-formatted data into the file, and choose 'Save As' to save the file with the .csv extension.
E.g. Train.csv
Use this file, and make sure the path in your Python code matches the location of the saved CSV file.
import pandas as pd
df=pd.read_csv('C:/your_path/Train.csv')
I've seen people take existing .txt or other files and try to convert them to .csv just by renaming them. That does nothing more than change the name of the file. It doesn't become a CSV file at all.
Hope this helps. 🙏🙏

This kind of error can also arise if there is a space in the file path.
df=pd.read_csv('/home/jovyan/binder/kidney disease.csv')
The command above raises an error, which is resolved by replacing the space with an underscore:
df=pd.read_csv('/home/jovyan/binder/kidney_disease.csv')

To begin with, this has nothing to do with Jupyter. This is a pure Python problem!
In Python, as in most languages, a backslash is a special (escape) character in strings. For example, "foo\"bar" is the string foo"bar, written without causing a syntax error. Also, "foo\nbar" is the strings foo and bar with a newline in between. There are many more escapes.
In your case, the meaning of \ in your path is literal, i.e. you actually want a backslash appearing in the string.
One option is to escape the backslash itself with another backslash: "foo\\bar" amounts to the string foo\bar. However, in your case, you have several of these, so for readability you might want to switch on "raw string mode", which disables (almost all) escapes:
r"foo\bar\baz\quux\etc"
will produce
foo\bar\baz\quux\etc
As a matter of programming style, though, if you want your code to be portable, it is better to use os.path.join which knows the right path separator for your OS/platform:
In [1]: import os.path
In [2]: os.path.join("foo", "bar", "baz")
Out[2]: 'foo/bar/baz'
On Windows, that would produce foo\bar\baz.
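The platform difference can be seen on any machine, because os.path is simply an alias for one of two standard-library modules: posixpath on Linux and macOS, ntpath on Windows. A small sketch that calls both directly:

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the POSIX (Linux/macOS) implementation of os.path

parts = ("foo", "bar", "baz")

# os.path is one of these two modules depending on the platform,
# so calling them directly shows both results wherever this runs.
print(posixpath.join(*parts))  # foo/bar/baz
print(ntpath.join(*parts))     # foo\bar\baz
```

In ordinary code you would keep calling os.path.join and let Python pick the right module for you.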

import pandas as pd
data=pd.read_csv(r"C:\Users\ss\Desktop\file or csv file name.csv")
Just place the CSV file on the desktop. Note the r prefix, which keeps the backslashes in the path from being read as escapes.

why can't I open/ interact with files through python

I'm new to coding and have started to try out the os module. It occasionally works on specific paths.
example:
but when I try to interact with an individual file this will happen:
print(os.stat('my_file.txt'))
>>>filenotfounderror: [errno 2] no such file or directory found.
'my_file.txt'
or when I try to interact with a path that is not in my cwd then this would happen:
print(os.listdir(C:\folder\folder\folder))
>>>SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
position 2-3: truncated \UXXXXXXXX escape
I don't understand why this is happening and it would be great if someone could explain why this is happening, thanks.

In your first example, Python is telling you that my_file.txt does not exist in the current directory.
Verify that you have a file called my_file.txt, and then check the current working directory of your Python process with os.getcwd().
For your second example: in Python, the backslash \ is a special character that starts escape sequences in a string, for example the linefeed \n or the tab \t.
The error in your example is most likely the result of accidentally forming an invalid escape sequence. Escape the backslash itself, and quote the path, like this:
print(os.listdir('C:\\folder\\folder\\folder'))
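For the first problem, it can help to print exactly where Python is looking. A minimal sketch, assuming the file name from the question; describe_lookup is a hypothetical helper written for this example, not part of the os module:

```python
import os

def describe_lookup(filename):
    """Report where Python looks for a relative filename and whether it exists."""
    cwd = os.getcwd()
    full_path = os.path.join(cwd, filename)
    return cwd, full_path, os.path.exists(full_path)

cwd, full_path, exists = describe_lookup('my_file.txt')
print('Current working directory:', cwd)
print('Python will look for:', full_path)
print('File exists:', exists)
```

If the last line prints False, either move the file into the printed directory or pass the file's full path instead of the bare name.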

Pygame sound not importing correctly [duplicate]

I am using Python 3.1 on a Windows 7 machine. Russian is the default system language, and utf-8 is the default encoding.
Looking at the answer to a previous question, I have attempted to use the "codecs" module, hoping for a little luck. Here are a few examples:
>>> g = codecs.open("C:\Users\Eric\Desktop\beeline.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#39>, line 1)
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#40>, line 1)
>>> g = codecs.open("C:\Python31\Notes.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 11-12: malformed \N character escape (<pyshell#41>, line 1)
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#44>, line 1)
My last idea was, I thought it might have been the fact that Windows "translates" a few folders, such as the "users" folder, into Russian (though typing "users" is still the correct path), so I tried it in the Python31 folder. Still, no luck. Any ideas?

The problem is with the string
"C:\Users\Eric\Desktop\beeline.txt"
Here, \U in "C:\Users... starts an eight-character Unicode escape, such as \U00014321. In your code, the escape is followed by the character 's', which is invalid.
You either need to duplicate all backslashes:
"C:\\Users\\Eric\\Desktop\\beeline.txt"
Or prefix the string with r (to produce a raw string):
r"C:\Users\Eric\Desktop\beeline.txt"
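To see what a well-formed escape of this kind looks like, here is a short sketch using the same code point mentioned above. With exactly eight hexadecimal digits after \U the literal is one character; with the r prefix the same text is ten ordinary characters.

```python
ch = "\U00014321"      # a valid escape: \U followed by exactly eight hex digits
print(len(ch))         # 1
print(hex(ord(ch)))    # 0x14321

raw = r"\U00014321"    # raw string: backslash, U, and eight digits kept as-is
print(len(raw))        # 10
```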

This is a typical error on Windows because the default user directory is C:\Users\<your_user>. When you pass this path as an ordinary string literal to a Python function, you get a Unicode error because \U starts a Unicode escape. If the eight characters after the \U are not hexadecimal digits, this produces an error.
To solve it, just double the backslashes: C:\\Users\\<your_user>...
This will ensure that Python treats each doubled backslash as a single literal backslash.

Prefixing with 'r' works very well, but it needs to be in the correct syntax. For example:
passwordFile = open(r'''C:\Users\Bob\SecretPasswordFile.txt''')
No need for \\ here - maintains readability and works well.
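One syntax detail worth knowing: a raw string literal cannot end with a single backslash, which matters for folder paths written with a trailing separator. The sketch below checks this with compile(); the helper name compiles and the example folder are illustrative assumptions.

```python
def compiles(source):
    """Return True if source is a valid Python expression."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True

# The literal is built from pieces so that this example itself stays valid.
backslash = chr(92)
ends_in_backslash = "r'C:" + backslash + "Users" + backslash + "'"
print(compiles(ends_in_backslash))   # False

# One workaround: keep the raw part and append the final backslash separately.
folder = r'C:\Users' + '\\'
print(folder)                        # C:\Users\
```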

With Python 3 I had this problem:
self.path = 'T:\PythonScripts\Projects\Utilities'
produced this error:
self.path = 'T:\PythonScripts\Projects\Utilities'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
position 25-26: truncated \UXXXXXXXX escape
the fix that worked is:
self.path = r'T:\PythonScripts\Projects\Utilities'
It seems the '\U' was producing an error and the 'r' preceding the string turns off the eight-character Unicode escape (for a raw string) which was failing. (This is a bit of an over-simplification, but it works if you don't care about unicode)
Hope this helps someone

Or you could replace '\' with '/' in the path.

path = pd.read_csv('C:\Users\mravi\Desktop\filename')
The error is caused by the path as written above.
Add 'r' before the path:
path = pd.read_csv(r'C:\Users\mravi\Desktop\filename')
This would work fine.

I had this same error in Python 3.2.
I have a script for sending email that contains:
csv.reader(open('work_dir\uslugi1.csv', newline='', encoding='utf-8'))
When I remove the first character in uslugi1.csv, it works fine.

Referring to the openpyxl documentation, you can make the following changes.
from openpyxl import Workbook
from openpyxl.drawing.image import Image
wb = Workbook()
ws = wb.active
ws['A1'] = 'Insert a xxx.PNG'
# Reload an image
img = Image(r'x:\xxx\xxx\xxx.png')
# Insert to worksheet and anchor next to cells
ws.add_image(img, 'A2')
wb.save(r'x:\xxx\xxx.xlsx')

I had the same error. I just uninstalled and reinstalled the numpy package, and that worked!

I had this error.
I have a main python script which calls in functions from another, 2nd, python script.
At the end of the first script I had a comment block designated with ''' '''.
I was getting this error because of this commenting code block.
I repeated the error multiple times once I found it to ensure this was the error, & it was.
I am still unsure why.

Python - Must add r when opening a file

I have several .py files, and I can open my file from all of them except my test.py file (where I test scripts and functions). There, instead of this:
file = open("C:\Users\User\Desktop\key_values.txt", "r")
I need to use this (with r) to avoid error:
file = open(r"C:\Users\User\Desktop\key_values.txt", "r")
I get this error: (when I try to open a file without r in my test.py script)
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Any idea why this is happening?

Backslash is an escape character, so you can include characters like "\n" (new line) and "\t" (tab). The r before the string means "my backslashes are not escape characters".
Interestingly, it looks like your string "C:\Users\User\Desktop\key_values.txt" works OK in Python 2 because none of the backslashes are part of anything looking like a known escape sequence. But in Python 3, "\Uxxxxxxxx" indicates a Unicode character. So maybe that is why some of your Python files can cope and some can't.

The other answers are OK.. but this a time saving trick:
Try using slashes instead of backslashes:
file = open("C:/Users/User/Desktop/key_values.txt", "r")
It works in Windows. Tried with Python 2.7
Hope this helps
