I am working with Pandas for the first time and don't know much about it.
While trying to read an Excel file, Visual Studio Code shows the error "missing dependency xlrd". I don't know what to do.
Info:
Anaconda and VS Code are installed on the same drive, and the Excel file is on that drive too. I am using Windows 10 64-bit.
Your description is very short; it would help if it were a little more detailed. Try installing the module:
pip install xlrd
If you are using Python 3:
pip3 install xlrd
If you are using conda:
conda install -c anaconda xlrd
There may be multiple Python versions on your system, so the requirement can be satisfied for one and not for the other. I faced this problem, and running pip through python3 rather than calling pip3 worked for me. Try this too:
python3 -m pip install xlrd
It should work then; otherwise, upgrade:
pip3 install --upgrade pandas
pip3 install --upgrade xlrd
I hope this will work.
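import xlrd  # only checks that xlrd is importable; pandas chooses the reader engine itself
import pandas as pd
sp = pd.ExcelFile("data.xlsx")
print(sp.parse(sp.sheet_names[0]))  # parse the first sheet into a DataFrame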
If it still doesn't work after the upgrade, my guess is that there is another problem that your description doesn't reveal. (Please include the full error message in the question as a code block, not as an image.)
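To check whether you are in the "multiple Python versions" situation described earlier, you can ask the interpreter that actually runs your script which packages it can see. This is a minimal sketch under stated assumptions: the helper names is_installed and required_package are hypothetical, and the extension-to-package table assumes a recent pandas (1.2 or later) with xlrd 2.0 or later, where xlrd only reads legacy .xls files and .xlsx files are read through openpyxl instead.

```python
import importlib.util
import sys
from pathlib import Path

# Assumption: pandas >= 1.2 and xlrd >= 2.0. xlrd 2.x only reads legacy .xls
# files; .xlsx/.xlsm workbooks are read through openpyxl instead.
ENGINE_PACKAGE = {".xls": "xlrd", ".xlsx": "openpyxl", ".xlsm": "openpyxl"}


def is_installed(module_name):
    """True if the *current* interpreter can import module_name (it is not imported here)."""
    return importlib.util.find_spec(module_name) is not None


def required_package(path):
    """Return the name of the package pandas needs to read this Excel file."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_PACKAGE:
        raise ValueError(f"not an Excel extension this sketch knows: {path!r}")
    return ENGINE_PACKAGE[suffix]


if __name__ == "__main__":
    excel_path = "data.xlsx"  # hypothetical file name
    needed = required_package(excel_path)
    # pip must install packages into *this* interpreter for the imports to work.
    print("Python executable:", sys.executable)
    for name in ("pandas", needed):
        print(f"{name}: {'installed' if is_installed(name) else 'MISSING'}")
```

If a package shows as MISSING, install it with the python3 -m pip form shown above, using the same executable that the script printed.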
First make sure you have all the required libraries installed.
pip install pandas
Pandas also requires the NumPy library (pip normally installs it automatically as a pandas dependency):
pip install numpy
In order to work with Pandas in your script, you will need to import it into your code. This is done with one line of code:
import pandas as pd
Pandas also provides an ExcelFile class, which is useful when you want to read several sheets from one workbook. ExcelFile is part of Pandas itself, so you can import it directly from Pandas:
from pandas import ExcelFile
Note the path where your Excel file is saved, for example: /Users/Desktop/file.xlsx
Rather than writing the path inside the read_excel call, keep the code clean by storing the path in a variable:
file_path = '/Users/Desktop/file.xlsx'
The read_excel function takes the file path of an Excel workbook and returns a DataFrame object with its contents.
Put it all together and assign the DataFrame object to a variable named “df”:
df = pd.read_excel(file_path)
Lastly, to view the DataFrame, print the result: add a print statement at the end of your script, passing the DataFrame variable as the argument:
print(df)
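A frequent cause of failure at this step is a path that doesn't point where you think it does. A minimal sketch that checks the path before handing it to read_excel; the helper name resolve_data_file is hypothetical. Relative names are resolved against the current working directory by default, which is also what pandas does with a bare file name.

```python
from pathlib import Path


def resolve_data_file(name, base_dir=None):
    """Return the absolute path of an existing file, or raise a clear error.

    Relative names are resolved against base_dir, which defaults to the
    current working directory (the same place pandas looks for a bare name).
    """
    path = Path(name).expanduser()
    if not path.is_absolute():
        base = Path(base_dir) if base_dir is not None else Path.cwd()
        path = base / path
    path = path.resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Excel file not found: {path}")
    return path


if __name__ == "__main__":
    # Tip: pass base_dir=Path(__file__).parent to look next to the script instead.
    try:
        file_path = resolve_data_file("file.xlsx")
    except FileNotFoundError as err:
        print(err)
    else:
        print("Found", file_path)  # next step: df = pd.read_excel(file_path)
```

The error message prints the fully resolved path, which makes it obvious when the script is running from a different folder than expected.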
Related
Is there any way in Python to read an Excel file like a data provider in TestNG?
I have a test method (using Python's unittest framework), and from this test I call another method that actually reads the Excel sheet. I want something like a data provider, so that each data row is treated as a new test case.
You could use pandas to read Excel files or CSV files.
import pandas as pd
excel_data = pd.read_excel('test_file.xlsx')
csv_data = pd.read_csv('test_file.csv')
The result is a DataFrame.
Use pandas to read Excel files in Python. From your question, I assume you don't know about pandas.
If you added Python to your PATH during installation, install pandas with pip in the terminal:
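Reading the sheet covers only half of the question; the other half is turning each row into its own test case. unittest has no TestNG-style data provider, but one common pattern is to generate one test method per data row. A minimal sketch, assuming hypothetical test data, class and helper names; the rows come from an inline CSV string so the example is self-contained, and with pandas you could instead use `rows = pd.read_excel("test_file.xlsx").to_dict("records")`.

```python
import csv
import io
import unittest

# Hypothetical test data; in practice you could load it with pandas, e.g.
# ROWS = pd.read_excel("test_file.xlsx").to_dict("records")
CSV_DATA = """a,b,expected
1,2,3
10,5,15
-4,4,0
"""
ROWS = list(csv.DictReader(io.StringIO(CSV_DATA)))


class AdditionTest(unittest.TestCase):
    """Test methods are attached below, one per data row."""


def _make_test(row):
    # A factory function so each generated test captures its own row.
    def test(self):
        self.assertEqual(int(row["a"]) + int(row["b"]), int(row["expected"]))
    return test


# Generate one test method per row so each row is reported as its own test case.
for index, row in enumerate(ROWS):
    setattr(AdditionTest, f"test_row_{index}", _make_test(row))


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AdditionTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

A lighter alternative is a single test that loops over the rows inside `with self.subTest(row=row):`; failing rows are then reported individually, but the runner counts them as one test.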
py -m pip install pandas
The Python code is:
import pandas as pd
df=pd.read_excel('Data.xlsx')
print(df.head()) # This will print the first 5 rows.
If you want to use Jupyter Notebook, install it from the terminal:
py -m pip install notebook
This should work, but you need pandas installed through pip. For more advanced functionality, please at least update your question with what you want: what does a data provider do that you want to replicate in Python? Specify the function.
See the pandas documentation: https://pandas.pydata.org/docs/
I have tried to read an Excel file using pandas, but I haven't been able to. I am using Python 3.8. I want to turn the Excel file into a list in Python and then use that list in an option box via tkinter, but I can't do that without being able to read the file.
The code I'm using is:
import pandas as pd
df = pd.read_excel (r'downloads\Clients - Nybble HelpDesk.xlsx')
print (df)
The error I'm receiving is:
Traceback (most recent call last):
File "C:\Users\Natasha\OneDrive - Nybble.co.uk LTD\Desktop\excel export.py", line 1, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
In the command line, run pip install pandas to install pandas first, then re-run your code.
Other installation information is available here:
https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html
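If pip install pandas succeeds but the ModuleNotFoundError persists, pip probably belongs to a different Python installation than the one running your script. A minimal sketch (the helper name pip_install_command is hypothetical) that builds the equivalent "python -m pip install" command for the interpreter that is actually running. It only prints the command, so nothing is installed by accident.

```python
import sys


def pip_install_command(*packages):
    """Build a pip command that installs into the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    cmd = pip_install_command("pandas")
    # Print the command instead of running it; quote parts that contain spaces
    # (e.g. "C:\Program Files\...") so it can be pasted into a terminal.
    # To run it from Python instead: subprocess.check_call(cmd)
    print(" ".join(f'"{part}"' if " " in part else part for part in cmd))
```

Running the printed command from the same environment as your script (for example, the VS Code terminal) guarantees that pandas ends up where the script can import it.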
You can save time by downloading the Anaconda distribution and using the Spyder IDE. Anaconda already includes pandas, so you won't need to install it, and it comes with a whole host of other useful packages that you will likely use day to day.
See the link: https://docs.anaconda.com/anaconda/navigator/tutorials/pandas/
First install pandas.
If you are using the Python shell, open Command Prompt and type pip install pandas.
If you are using Anaconda, open the Anaconda Prompt and type conda install pandas.
The pandas library is now installed.
Now for your code:
import pandas as pd
excel_file = 'filepath.xls'
df = pd.read_excel(excel_file)
print(df)
Now try it, and be sure to check the path of your file. If you use a bare file name as above, save the Excel file in the same folder as your .py file and run the script from that folder, because relative paths are resolved against the current working directory.
I am a very new Python user and have been having trouble loading an Excel file to play around with in Python.
I am using Python 3.7 on Windows 10 and have tried things like the import statement and running pip and pip3 install commands. I am confused about how to do this, and none of the links I've read online are helping.
pip install pandas
pip3 install pandas
import pandas
I just want to load an Excel file into Python. I'm embarrassed that it's causing me this much stress.
First of all, you have to import pandas (assuming it is installed; as far as I know, Anaconda usually includes it):
import pandas as pd
To read several sheets into separate DataFrames (tables):
xls = pd.ExcelFile("your_file.xlsx")  # replace with the path of your file
df_schema = pd.read_excel(xls, sheet_name=xls.sheet_names)
df_schema is a dictionary of DataFrames in which each key is a sheet name and each value is that sheet's DataFrame.
To read a single sheet (the first one, by default), the following should work:
xls = pd.ExcelFile("your_file.xlsx")  # replace with the path of your file
df = pd.read_excel(xls)
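Once you have the dictionary of sheets (from sheet_name=xls.sheet_names, or sheet_name=None, which reads every sheet), you can work with each sheet separately or stack them into one table. A minimal sketch: the sheet names and data below are hypothetical stand-ins for what read_excel would return, so the example runs without an Excel file.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by
# pd.read_excel(path, sheet_name=None): keys are sheet names, values are DataFrames.
sheets = {
    "January": pd.DataFrame({"item": ["pens", "paper"], "qty": [3, 5]}),
    "February": pd.DataFrame({"item": ["pens"], "qty": [7]}),
}

# Look at each sheet separately...
for name, frame in sheets.items():
    print(name, frame.shape)

# ...or stack them into one DataFrame, keeping the sheet name as a column.
combined = pd.concat(sheets, names=["sheet", "row"]).reset_index(level="sheet")
print(combined)
```

pd.concat uses the dictionary keys as the outer index level; reset_index then turns that level into an ordinary "sheet" column.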
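When I tried to read a pickle file that was saved by a former version of pandas, it raised an ImportError:
ImportError: No module named 'pandas.core.internals.managers'; 'pandas.core.internals' is not a package
There were no hits on Stack Overflow, so I would like to share my solution to this particular problem.
This error comes from the way the pickle file was saved: a pickle stores the module paths of the pandas classes it contains, so a file written by one pandas version can reference a module (here pandas.core.internals.managers) that another version does not have. If your pandas version differs from the one that wrote the file (for example, after an update), you can get this import error.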
I was facing the same error when I was using pandas version 0.23.4.
I installed pandas 0.24.1 explicitly with:
pip3 install pandas==0.24.1
This solved my problem (I was using Python 3.5).
I had the same problem, but for me, it seemed to come from the pickle package / interaction with the pandas package.
I had Pandas version 0.23.4.
I saved some pickle files with pandas.DataFrame.to_pickle, using Python 3.6.6 and pandas 0.23.4.
Then I upgraded to Python 3.7.2 (still with pandas 0.23.4) and was unable to read those pickle files with pandas.read_pickle.
Next, I upgraded pandas to pandas 0.24.1, and it worked for me. I can read those files again.
Updating pandas is the best solution in most cases. However, if you cannot update your pandas version and you need to load pandas objects that were pickled with a newer version, you can add an entry to the class location map as below. (Note that _class_locations_map is a private pandas internal, so this workaround may not carry over to other versions.)
from pandas.compat.pickle_compat import _class_locations_map
_class_locations_map.update({
('pandas.core.internals.managers', 'BlockManager'): ('pandas.core.internals', 'BlockManager')
})
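Several answers here come down to upgrading pandas. Before loading an old pickle, you can check the installed version without importing pandas, using importlib.metadata (Python 3.8 or later). A minimal sketch: the helpers version_tuple and pandas_at_least are hypothetical names, and the simple parser only compares the leading major.minor.patch numbers (for full version comparisons, the third-party packaging library is the usual choice).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Parse the leading 'major.minor.patch' numbers of a version string."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    # Missing minor/patch parts count as 0, so "1.5" becomes (1, 5, 0).
    return tuple(int(group or 0) for group in match.groups())


def pandas_at_least(minimum):
    """True if pandas is installed and its version is at least `minimum`."""
    try:
        installed = version("pandas")
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # 0.24.1 is the version that resolved the pickle ImportError in this thread.
    if pandas_at_least("0.24.1"):
        print("pandas is at least 0.24.1")
    else:
        print("pandas is missing or older than 0.24.1; try: pip install --upgrade pandas")
```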
If you use the conda package manager:
conda update pandas
To be clear, I'm very new to Python coding, so I'm not exactly sure what's going wrong.
Yesterday, whilst following a tutorial on calling Python from R, I successfully installed and used several Python packages (e.g., NumPy, pandas, matplotlib).
But today, when trying to run the exact same code, I'm getting an error when trying to import pandas (NumPy is importing without any errors). The error states:
ModuleNotFoundError: No module named 'pandas'
I'm not sure what's going on!?
I'm using RStudio (running on a Mac). Here's a code snippet of how I'm doing it:
library(reticulate)
os <- import("os") # Setting directory
os$getcwd()
repl_python() #used to make it interactive
import numpy as np  # Load numpy package
import pandas as pd # Load pandas package
At this point, it's throwing me an error. I've tried googling the answer and searching here, but to no avail.
Any suggestions as to how I'd fix this problem, or what is going on?
Thanks
Possibly the Python path used by reticulate changed when you restarted RStudio. Here is how to set the path manually (file path for Linux or Mac):
library(reticulate)
path_to_python <- "~/anaconda3/bin/python"
use_python(path_to_python)
https://stackoverflow.com/a/45891929/4549682
You can check your Python path with py_config(): https://rstudio.github.io/reticulate/articles/versions.html#configuration-info
I recommend using Anaconda for your Python distribution (you might have to use Anaconda anyway for reticulate, not sure). Download it from here: https://www.anaconda.com/distribution/#download-section
Then you can create the environment for reticulate to use:
conda_create('r-reticulate', packages = "python=3.5")
I use Python 3.5 for some specific packages, but you can change that version or leave it as just 'python' for the latest version.
https://www.rdocumentation.org/packages/reticulate/versions/1.10/topics/conda-tools
Then you want to install the packages you need (if they aren't already) with
conda_install('r-reticulate', packages = 'numpy')
This is how I use a package like numpy:
np <- import('numpy')
np$arange(10)
You need to set the second argument of use_python, required, to TRUE. For example:
use_python("/users/my_user/Anaconda3/python.exe", required = TRUE)
Don't forget required = TRUE.