CSV file can't be read using great expectation

When I run this code in PyCharm using Python:
import great_expectations as ge
df=ge.read_csv("C:\Users\TasbeehJ\data\yellow_tripdata_2019-02.csv")
it gives me this error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
How can I solve it?

The easiest way is to prefix the string with r, so that \ is treated as a literal character and not as the start of an escape sequence.
import great_expectations as ge
df=ge.read_csv(r"C:\Users\TasbeehJ\data\yellow_tripdata_2019-02.csv")
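To see why the r prefix matters without crashing a script, the failing literal can be compiled from source text at run time. This is only a sketch: the helper name literal_compiles is made up for illustration, and the path is the one from the question (the file does not need to exist).

```python
def literal_compiles(source: str) -> bool:
    """Return True if `source` is a valid Python expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# The source text is held in raw strings so the backslashes reach compile()
# unchanged.
plain = r'"C:\Users\TasbeehJ\data\yellow_tripdata_2019-02.csv"'
raw = "r" + plain

print(literal_compiles(plain))  # the \U in \Users is read as a Unicode escape
print(literal_compiles(raw))    # the r prefix turns escape processing off
```

The first call reports a failure and the second a success, which matches the error in the question and the fix above.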

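You can try looking here: https://docs.python.org/3.7/library/csv.html?highlight=csv#module-csv
Maybe you can try writing something like this (assuming the file is in the current working directory):
>>> import csv
>>> with open('yellow_tripdata_2019-02.csv', newline='') as csvfile:
...     for row in csv.reader(csvfile):
...         print(row)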

Related

change directory in python

from PIL import Image
import os
for f in os.listdir('C:\Users\diodi\Pictures'):
    if f.endswith('.jpg'):
        print(f)
I get the error:
for f in os.listdir('C:\Users\diodi\Pictures'):
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
If someone can edit the error message, please do.
I want to print the names of the pictures (jpg) I have in
('C:\Users\diodi\Pictures').
I am using Python 3.7. I know I haven't used the Pillow library yet.

The backslashes are being parsed as escape characters. Use the r prefix to denote a raw string:
for f in os.listdir(r"C:\Users\diodi\Pictures"):
Or escape them with more backslashes:
for f in os.listdir('C:\\Users\\diodi\\Pictures'):
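Putting the fix together, a minimal sketch of the whole task might look like the following. It uses pathlib instead of os.listdir, the function name list_jpgs is hypothetical, and, unlike the question's endswith('.jpg') check, it also matches upper-case extensions such as .JPG.

```python
from pathlib import Path


def list_jpgs(directory):
    """Return the sorted names of the .jpg files directly inside `directory`."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".jpg"
    )


if __name__ == "__main__":
    # Raw string, so \U in \Users is not treated as an escape sequence.
    pictures = Path(r"C:\Users\diodi\Pictures")
    if pictures.is_dir():
        for name in list_jpgs(pictures):
            print(name)
```

The is_dir() guard only keeps the sketch from raising an exception on machines where that folder does not exist.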

How I can convert file with any format to text format using Python 3.6?

I am trying to build a converter that can convert a file of any format to text, so that processing becomes easier for me. I have used the Python textract library.
Here is the documentation: https://textract.readthedocs.io/en/stable/
I installed it using pip and tried to use it, but I got an error and could not understand how to resolve it.
>>> import textract
>>> text = textract.process('C:\Users\beta\Desktop\Projects Done With Specification.pdf', method='pdfminer')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
I have even tried using the command without specifying the method.
>>> import textract
>>> text = textract.process('C:\Users\beta\Desktop\Projects Done With Specification.pdf')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Kindly let me know how I can get rid of this issue. If there is anything else that could be handy instead of textract, please suggest that as well.

The \ character means different things in different contexts. In Windows pathnames, it is the directory separator. In Python strings, it introduces escape sequences. When specifying paths, you have to account for this.
Try any one of these:
text = textract.process('C:\\Users\\beta\\Desktop\\Projects Done With Specification.pdf', method='pdfminer')
text = textract.process(r'C:\Users\beta\Desktop\Projects Done With Specification.pdf', method='pdfminer')
text = textract.process('C:/Users/beta/Desktop/Projects Done With Specification.pdf', method='pdfminer')
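As a quick check that the three spellings above refer to the same file, the strings can be compared directly. This sketch uses only the standard library; PureWindowsPath compares Windows paths without touching the disk, so it runs on any operating system. The variable names are chosen for illustration.

```python
from pathlib import PureWindowsPath

doubled = 'C:\\Users\\beta\\Desktop\\Projects Done With Specification.pdf'
raw = r'C:\Users\beta\Desktop\Projects Done With Specification.pdf'
forward = 'C:/Users/beta/Desktop/Projects Done With Specification.pdf'

# Doubled backslashes and the raw string produce exactly the same text.
print(doubled == raw)

# The forward-slash spelling is a different string...
print(doubled == forward)

# ...but it names the same Windows path.
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```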

The problem is with the string
'C:\Users\beta\Desktop\Projects Done With Specification.pdf'
The \U starts an eight-character Unicode escape, such as '\U00014321'. In your code, the escape is followed by the character 's', which is invalid.
You either need to duplicate all backslashes, or prefix the string with r (to produce a raw string).

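Try encoding='utf-8'. Note that the path still has to be a raw string (or use doubled backslashes); otherwise the SyntaxError is raised before textract is even called:
textract.process(r'C:\Users\beta\Desktop\Projects Done With Specification.pdf', encoding='utf-8')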

In your case, the error is due to the way the path is written.
Try one of these instead:
'C:\\Users\\beta\\Desktop\\Projects Done With Specification.pdf'
or
'C:/Users/beta/Desktop/Projects Done With Specification.pdf'

import textract
text = textract.process(r'C:\Users\myname\Desktop\doc\an.docx', encoding='utf-8')
This worked for me. Give it a try.

textract didn't work for me when I was trying to convert a Slurm output file to text, but a simple with open did:
with open('disktest.o1761955', 'r') as f:
    txt = f.read()

Pygame sound not importing correctly [duplicate]

I am using Python 3.1 on a Windows 7 machine. Russian is the default system language, and utf-8 is the default encoding.
Looking at the answer to a previous question, I have attempted to use the "codecs" module to give me a little luck. Here are a few examples:
>>> g = codecs.open("C:\Users\Eric\Desktop\beeline.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#39>, line 1)
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#40>, line 1)
>>> g = codecs.open("C:\Python31\Notes.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 11-12: malformed \N character escape (<pyshell#41>, line 1)
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#44>, line 1)
My last idea was that it might be because Windows "translates" a few folders, such as the "users" folder, into Russian (though typing "users" is still the correct path), so I tried it in the Python31 folder. Still no luck. Any ideas?

The problem is with the string
"C:\Users\Eric\Desktop\beeline.txt"
Here, \U in "C:\Users... starts an eight-character Unicode escape, such as \U00014321. In your code, the escape is followed by the character 's', which is invalid.
You either need to duplicate all backslashes:
"C:\\Users\\Eric\\Desktop\\beeline.txt"
Or prefix the string with r (to produce a raw string):
r"C:\Users\Eric\Desktop\beeline.txt"

This is a typical error on Windows, because the default user directory is C:\Users\<your_user>. When you pass this path as a string literal to a Python function, you get a Unicode error, because \U starts a Unicode escape. If the next 8 characters after the \U are not hexadecimal digits, this produces an error.
To solve it, just double the backslashes: C:\\Users\\<your_user>...
This ensures that Python treats each doubled backslash as a single literal backslash.

Prefixing with 'r' works very well, but it needs to be in the correct syntax. For example:
passwordFile = open(r'''C:\Users\Bob\SecretPasswordFile.txt''')
No need for \\ here - maintains readability and works well.
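One point about the "correct syntax" is worth illustrating: a raw string literal cannot end in a single backslash, so a folder path written as r'C:\Users\Bob\' does not compile. The sketch below demonstrates this with compile(); the helper name compiles is made up, and the workaround shown is just one of several possible ones.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is a valid Python expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Source text of a raw literal ending in a single backslash: r'C:\Users\Bob\'
trailing = "r'C:\\Users\\Bob\\'"
print(compiles(trailing))

# One workaround: keep the raw prefix for the body and append the final
# backslash as an ordinary escaped literal.
folder = r'C:\Users\Bob' + '\\'
print(folder)
```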

With Python 3 I had this problem:
self.path = 'T:\PythonScripts\Projects\Utilities'
produced this error:
self.path = 'T:\PythonScripts\Projects\Utilities'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 25-26: truncated \UXXXXXXXX escape
the fix that worked is:
self.path = r'T:\PythonScripts\Projects\Utilities'
It seems the '\U' was producing an error and the 'r' preceding the string turns off the eight-character Unicode escape (for a raw string) which was failing. (This is a bit of an over-simplification, but it works if you don't care about unicode)
Hope this helps someone

Or you could replace '\' with '/' in the path.

path = pd.read_csv('C:\Users\mravi\Desktop\filename')
The error is caused by the path as written.
Add r before the path:
path = pd.read_csv(r'C:\Users\mravi\Desktop\filename')
This works fine.

I had this same error in Python 3.2.
I have a script for sending email, and it contains:
csv.reader(open('work_dir\uslugi1.csv', newline='', encoding='utf-8'))
When I remove the first character of the file name uslugi1.csv, it works fine (presumably because the path then no longer contains \u).

Referring to the openpyxl documentation, you can make the following changes:
from openpyxl import Workbook
from openpyxl.drawing.image import Image
wb = Workbook()
ws = wb.active
ws['A1'] = 'Insert a xxx.PNG'
# Reload an image
img = Image(r'x:\xxx\xxx\xxx.png')
# Insert to worksheet and anchor next to cells
ws.add_image(img, 'A2')
wb.save(r'x:\xxx\xxx.xlsx')

I had the same error. I just uninstalled and reinstalled the numpy package, and that worked!

I had this error.
I have a main python script which calls in functions from another, 2nd, python script.
At the end of the first script I had a comment block designated with ''' '''.
I was getting this error because of this commenting code block.
I repeated the error multiple times once I found it to ensure this was the error, & it was.
I am still unsure why.
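A likely explanation, offered as an assumption since the script itself is not shown: a block wrapped in ''' ''' is not a comment but an ordinary string literal, so escape sequences inside it are still parsed. If the "commented-out" text contains something like a Windows path with \U, the whole file fails to compile. The sketch below reproduces this with a made-up path (C:\Users\me\old.txt) and a hypothetical helper named compiles.

```python
# Source of a tiny script whose "comment block" contains a Windows path.
script_with_block = "x = 1\n'''\nold = open('C:\\Users\\me\\old.txt')\n'''\n"

# The same script with the block written as a real # comment.
script_with_hash = "x = 1\n# old = open('C:\\Users\\me\\old.txt')\n"


def compiles(source):
    """Return True if the source text compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(script_with_block))  # the \U inside ''' ''' is still parsed
print(compiles(script_with_hash))   # text after # is ignored entirely
```

If this is what happened, either switching the block to # comments or prefixing the triple-quoted block with r would avoid the error.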

Function throws a SyntaxError: (unicode error)

I am running the following code in python and it's giving me this error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
def filePro(filename):
    f=open(filename,'r')
    wordcount=0
    for lines in f:
        f1=lines.split()
        wordcount=wordcount+len(f1)
    f.close()
    print ('word count:'), str(wordcount)
Please help me.

Unicode literals (string literals in Python 3.x) with a \U or \u escape sequence should take one of the following forms:
>>> u'\U00000061' # 8 hexadecimal digits
'a'
>>> u'\u0061' # 4 hexadecimal digits
'a'
If the escape sequence has too few hexadecimal digits, you get a SyntaxError.
>>> u'\u61'
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-3: truncated \uXXXX escape
>>> u'\U000061'
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-7: truncated \UXXXXXXXX escape
If you mean a literal \ followed by u or U, you had better use a raw string:
>>> r'\u0061'
'\\u0061'
>>> print(r'\u0061')
\u0061
In the code you posted, there's no Unicode escape sequence. You should check other parts of your code.

Not sure, since not much information is provided here, but my guess is that Python is trying to open the file with the wrong encoding. You could open the file with the codecs library, using the correct codec. If I don't know the encoding, or if the file comes from Windows, I usually use 'cp1252', as this can open most files.
import codecs
def filePro(filename):
    f = codecs.open(filename, 'r', 'cp1252')
    wordcount=0
    for lines in f:
        f1=lines.split()
        wordcount=wordcount+len(f1)
    f.close()
    print('word count:', str(wordcount))
Another possibility is that you have a file name that Python interprets as an escape sequence, for example a file name like 'c:\Users\something'; here the \U will be interpreted as the start of a Unicode escape.
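For reference, a tidier version of the word-count function might look like the sketch below. It is not the asker's code: the name count_words and the encoding parameter are introduced here for illustration, the function returns the count instead of printing it, and utf-8 as the default encoding is an assumption.

```python
def count_words(filename, encoding="utf-8"):
    """Count whitespace-separated words in a text file."""
    total = 0
    # The with block closes the file even if reading fails part-way.
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            total += len(line.split())
    return total


# Example call with a made-up Windows path; note the raw string:
# print('word count:', count_words(r'C:\Users\something\notes.txt'))
```

Passing encoding='cp1252' covers the case described above where the file comes from Windows and is not UTF-8.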
