Import error for csv file azure databricks

I have a CSV file that contains a lot of text data. I am trying to import it in Azure Databricks using Python pandas, but it produces a long list of errors, the main one being: ERROR: Internal Python error in the inspect module. However, when I put the file on my local desktop and import it there using Jupyter/Spyder, it loads without any errors.
I have also set the encoding option to UTF-8 when importing it in Azure Databricks, but it still shows the error. Any idea how to tackle this?

Problem solved: I had to pass encoding='cp1252'. I am not sure why this option was needed, but I tried several encodings and this one worked. The text fields contained several symbols and brackets, so this may help others who hit the same problem when importing similar data.
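A plausible reason the option mattered, offered as an assumption rather than a diagnosis: files exported from Windows tools are often saved as Windows-1252 (cp1252), where curly quotes and dashes are single bytes that are not valid UTF-8. The sketch below tries candidate encodings in order and reports the first one that decodes the whole file. The helper name first_working_encoding and the sample text are hypothetical, and only the standard library is used.

```python
import tempfile
from pathlib import Path


def first_working_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes the whole file, else None."""
    raw = Path(path).read_bytes()
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


# Hypothetical sample: curly quotes and an en dash, written as cp1252 bytes.
sample = "id,comment\n1,\u201cquoted\u201d text (with brackets) \u2013 dash\n"

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_bytes(sample.encode("cp1252"))
    detected = first_working_encoding(csv_path)

print(detected)
```

The detected name can then be passed to pandas as pd.read_csv(path, encoding=detected). A clean decode does not prove the encoding is the right one, so it is worth checking a few rows that contain symbols.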

Related

Import a function from another file in Python

I know that this question has already been asked, but the answers to those questions don't fix my problem. Here it is:
When I download some code from GitHub, it's always divided into separate files. I understand that it's important to have organized code, which is why I'd like to do the same.
However, whenever I try importing a function from a file, I always seem to get a ModuleNotFoundError.
The file that I'm trying to import is in the same directory as the file importing the code. This also happens with other code: when I download code from GitHub that is organized into separate files, it returns the same error.
I've tried two different Python installations (Anaconda 3.7.3 and py 3.7.0), but still no luck. FYI, I use Pyzo to run my files.
Here's an example of how I import another file:
from fun import f
I have tried this as well:
import os
os.chdir("C:/Users/amau4/Desktop/test")
from fon import f
How would I go about fixing this? Thanks in advance!
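One way to narrow this down, as a minimal sketch: ask Python whether it can locate the module at all, and compare the working directory with the first entry of sys.path. The helper name describe_import is hypothetical; fun is the module name used in the question.

```python
import importlib.util
import os
import sys


def describe_import(module_name):
    """Collect the facts that decide whether `import module_name` can succeed."""
    spec = importlib.util.find_spec(module_name)
    return {
        "module": module_name,
        "found": spec is not None,
        # Path of the file that would be imported, if any.
        "location": getattr(spec, "origin", None),
        "cwd": os.getcwd(),
        "first_sys_path_entry": sys.path[0] if sys.path else None,
    }


report = describe_import("fun")
for key, value in report.items():
    print(f"{key}: {value}")
```

If found is False, the directory that contains fun.py is not on sys.path. Note that os.chdir only helps when sys.path contains an empty string, which stands for the current directory; some IDE shells do not include it.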

Error writing to parquet file using pyspark

I am working on Windows 10. I installed Spark, and the goal is to use PySpark. I took the following steps:
I installed Python 3.7 with Anaconda -- Python was added to C:\Python37
I downloaded winutils from this link -- winutils was added to C:\winutils\bin
I downloaded Spark -- Spark was extracted to C:\spark-3.0.0-preview2-bin-hadoop2.7
I downloaded Java 8 from AdoptOpenJDK
Under system variables, I set the following variables:
HADOOP_HOME : C:\winutils
SPARK_HOME: C:\spark-3.0.0-preview2-bin-hadoop2.7
JAVA_HOME: C:\PROGRA~1\AdoptOpenJDK\jdk-8.0.242.08-hotspot
And finally, under system path, I added:
%JAVA_HOME%\bin
%SPARK_HOME%\bin
%HADOOP_HOME%\bin
In the terminal:
So I would like to know why I am getting this warning:
unable to load native-hadoop library... And why I couldn't bind on port 4040...
Finally, inside Jupyter Notebook, I am getting the following error when trying to write to a Parquet file. This image shows a working example, and the following one shows the code with errors:
And here is DataMaster__3.csv on my disk:
And the DaterMaster_par2222.parquet:
Any help is much appreciated!!
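The environment setup listed in the question can be sanity-checked with a small script. This is a sketch under the assumption that the three variables and winutils.exe are laid out as described above; it does not explain the warnings, it only rules out a misconfigured variable. The helper name check_spark_env is hypothetical.

```python
import os
from pathlib import Path


def check_spark_env(env=None):
    """Return a list of human-readable problems found in the Spark-related variables."""
    env = os.environ if env is None else env
    problems = []
    for name in ("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")
    # On Windows, winutils.exe is expected in the bin folder under HADOOP_HOME.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("winutils.exe not found under HADOOP_HOME\\bin")
    return problems


for problem in check_spark_env():
    print(problem)
```

An empty result means the variables point at existing directories; it says nothing about whether the Spark, Hadoop, and Java versions are compatible with each other.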

If you are writing the file in CSV format, I have found that the best way to do that is the following approach:
LCL_POS.toPandas().to_csv(<path>)
There is another way to save it directly without converting to pandas, but the output ends up split into multiple files with odd names, so I tend to avoid it. If you are happy to split the file up, it's much better to write a Parquet file, in my opinion.
LCL_POS.repartition(1).write.format("com.databricks.spark.csv").option("header", "true").save(<path>)
Hope that answers your question.

How to prevent an import of a module in jupyter notebook?

I have set up a Jupyter notebook server for multiple users to run notebooks. I want to provide modules that can fetch data and do some pre-processing. Since the data and the data processing code are proprietary, I don't want users to be able to read the source code, which they could do by importing the inspect module.
I have two questions:
Is there a way to prevent the inspect module from loading? I have seen this in Quantopian notebooks, where importing the inspect module throws an error.
Are there other ways to prevent access to the source code of the modules?
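For the first question, one mechanism that exists in standard Python is to store None under the module's name in sys.modules, which makes a later import statement for that name raise ImportError. The sketch below demonstrates it on the small stdlib module colorsys instead of inspect, because many libraries import inspect themselves. Whether Quantopian used this approach is not known; the helper names are hypothetical.

```python
import sys


def block_module(name):
    """Make later `import name` statements fail by storing None in sys.modules."""
    previous = sys.modules.pop(name, None)
    sys.modules[name] = None
    return previous


def unblock_module(name, previous=None):
    """Undo block_module, restoring the previously loaded module if there was one."""
    sys.modules.pop(name, None)
    if previous is not None:
        sys.modules[name] = previous


# Demonstrated on a small stdlib module rather than on inspect itself.
saved = block_module("colorsys")
try:
    import colorsys
    blocked = False
except ImportError:
    blocked = True
finally:
    unblock_module("colorsys", saved)

print(blocked)
```

This only stops a plain import. Code that already holds a reference to the module is unaffected, and a user who can run arbitrary Python can remove the sys.modules entry again, so it should not be treated as a way to keep source code secret.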

Splitting python code into different files (added picture of directory)

As my Python code is getting longer, I would like to split it into different files for better organization. In the same folder I created 2 files, 'firstfile.py' and '2ndfile.py'.
Below is the code of 'firstfile'
import pandas as pd
df=pd.DataFrame({'a':[2,4],'b':[2,1]})
Below is the code of '2ndfile'
import firstfile
print(firstfile.df)
Why does the error below appear when I run '2ndfile'?
ImportError: No module named 'firstfile'
Hi, I tried the suggestions below, including the dot import, but it still does not work. Below is a screenshot of my directory. Is it related to some sys.path problem? I am currently using Spyder 2 with Python 3.5.

Try importing it with from . import firstfile. You may be on Python 3, which doesn't allow implicit relative imports.

Switching from Spyder 2 with Python 3.5 to PyCharm solved the issue.
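On the sys.path question raised above: a module is importable by its plain name when its directory is on sys.path, which may not be the case when an IDE runs the script from a different working directory. A minimal sketch, assuming that is the cause here, adds the directory explicitly. The module name firstfile_demo is a hypothetical stand-in for firstfile, and a plain dict replaces the DataFrame so the example needs no third-party packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for firstfile.py, written to a scratch directory.
    (Path(tmp) / "firstfile_demo.py").write_text("df = {'a': [2, 4], 'b': [2, 1]}\n")

    # Put the directory on sys.path so the plain module name can be resolved.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        firstfile_demo = importlib.import_module("firstfile_demo")
        value = firstfile_demo.df
    finally:
        sys.path.remove(tmp)

print(value)
```

In a real script the directory would typically be that of the importing file, for example Path(__file__).resolve().parent, rather than a temporary folder.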

History saving thread error when trying to open Pandas

I just installed IPython on a remote desktop at work. I had to create a shortcut on my desktop to connect to IPython because the remote desktop does not have internet access. I am able to successfully open the IPython notebook. However, when I try to import pandas
import pandas as pd
I get this error that I have never seen before
The history saving thread hit an unexpected error (OperationalError('database or disk is full',)).History will not be written to the database.
Does this error relate to how it was installed on the remote desktop?

I suffered from this problem for a long time. My dirty fix was to simply restart the kernel and carry on with my work. However, I did find a way that eliminated it for good. This question seems to have different answers for different users, so I'll try to list them all based on answers elsewhere (all links at the end).
The issue seems to be caused by the nbsignatures.db file, and removing it solves the problem. You may find the file in any one of these locations:
~/.local/share/jupyter/nbsignatures.db (I found mine here)
~/.ipython/profile_default/security/nbsignatures.db
~/Library/Jupyter/nbsignatures.db
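The locations above can be checked in one pass. This sketch only reports which of the candidate files exist and leaves deletion to the reader; the helper name existing_signature_dbs is hypothetical.

```python
from pathlib import Path

# Candidate locations taken from the list above.
CANDIDATES = (
    "~/.local/share/jupyter/nbsignatures.db",
    "~/.ipython/profile_default/security/nbsignatures.db",
    "~/Library/Jupyter/nbsignatures.db",
)


def existing_signature_dbs(candidates=CANDIDATES):
    """Return the candidate paths that exist on this machine, with ~ expanded."""
    expanded = (Path(p).expanduser() for p in candidates)
    return [p for p in expanded if p.is_file()]


for path in existing_signature_dbs():
    # Inspect the result first; remove a file with path.unlink() once sure.
    print(path)
```

Stopping the notebook server before removing the file avoids deleting a database that is still open.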
All links:
https://github.com/ipython/ipython/issues/9293
IPython Notebook error: Error loading notebook
