I am trying to run an example from the ThinkStats2 code. I copied everything from GitHub, and the file structure is exactly the same as in the GitHub repository.
chap01soln.py
from __future__ import print_function

import numpy as np
import sys

import nsfg
import thinkstats2


def ReadFemResp(dct_file='2002FemResp.dct',
                dat_file='2002FemResp.dat.gz',
                nrows=None):
    """Reads the NSFG respondent data.

    dct_file: string file name
    dat_file: string file name

    returns: DataFrame
    """
    dct = thinkstats2.ReadStataDct(dct_file)
    df = dct.ReadFixedWidth(dat_file, compression='gzip', nrows=nrows)
    CleanFemResp(df)
    return df


def CleanFemResp(df):
    """Recodes variables from the respondent frame.

    df: DataFrame
    """
    pass


def ValidatePregnum(resp):
    """Validate pregnum in the respondent file.

    resp: respondent DataFrame
    """
    # read the pregnancy frame
    preg = nsfg.ReadFemPreg()

    # make the map from caseid to list of pregnancy indices
    preg_map = nsfg.MakePregMap(preg)

    # iterate through the respondent pregnum series
    for index, pregnum in resp.pregnum.iteritems():
        caseid = resp.caseid[index]
        indices = preg_map[caseid]

        # check that pregnum from the respondent file equals
        # the number of records in the pregnancy file
        if len(indices) != pregnum:
            print(caseid, len(indices), pregnum)
            return False

    return True


def main(script):
    """Tests the functions in this module.

    script: string script name
    """
    resp = ReadFemResp()

    assert(len(resp) == 7643)
    assert(resp.pregnum.value_counts()[1] == 1267)
    assert(ValidatePregnum(resp))

    print('%s: All tests passed.' % script)


if __name__ == '__main__':
    main(*sys.argv)
I am getting an ImportError, as shown below:
Traceback (most recent call last):
File "C:\wamp\www\ThinkStats_py3\chap01soln.py", line 13, in <module>
import nsfg
File "C:\wamp\www\ThinkStats_py3\nsfg.py", line 14, in <module>
import thinkstats2
File "C:\wamp\www\ThinkStats_py3\thinkstats2.py", line 34, in <module>
import thinkplot
File "C:\wamp\www\ThinkStats_py3\thinkplot.py", line 11, in <module>
import matplotlib
File "C:\Python34\lib\site-packages\matplotlib\__init__.py", line 105, in <module>
import six
ImportError: No module named 'six'
All the files are in the src folder; there are no packages under src. I tried adding packages for nsfg and thinkstats, but another error appeared. I also tried upgrading from Python 2.7 to Python 3.4, but I still get the same error. I know I need to install the six package for matplotlib, but why am I getting an import error on nsfg?
You are getting the import error on nsfg because it imports matplotlib indirectly: nsfg imports thinkstats2, which imports thinkplot, which imports matplotlib, which depends on the six module. That module is not installed on your computer, so the whole chain of imports fails.
Most probably you do not have the six module; you can try installing it with pip install six.
Or get it from here, unzip it, and install it with python setup.py install.
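To see which links in that import chain are actually missing, a quick check along these lines may help. This is only a sketch: missing_modules is a hypothetical helper name, and the module list is taken from the traceback above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Names taken from the traceback; anything printed here still needs installing.
    print(missing_modules(["six", "matplotlib", "numpy"]))
```

Run it with the same interpreter that runs chap01soln.py, since each Python installation has its own site-packages.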
I'm a new learner in data science. I haven't figured out why I got an attribute error. I am using Python 3.8.3 in Visual Studio Code, and I installed pandas in the terminal (pip install Pandas). I don't know what the problem is. Any help will be appreciated.
import pandas as pd
df=pd.DataFrame()
print(df)
All I did was create an empty DataFrame, and I got this:
Traceback (most recent call last):
File "c:/Users/Fatma Elik/Documents/VS Code/BTK/Pandas_dataframe.py", line 20, in <module>
import pandas as pd
File "C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\__init__.py", line 180, in <module>
import pandas.testing
File "C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\testing.py", line 5, in <module>
from pandas._testing import (
File "C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\_testing.py", line 404, in <module>
RANDS_CHARS = np.array(list(string.ascii_letters + string.digits), dtype=(np.str_, 1))
AttributeError: module 'string' has no attribute 'ascii_letters'
Secondly I tried this instead and I got an attribute error again:
import pandas as pd
s1=pd.Series([3,2,0,1])
s2=pd.Series([0,3,7,2])
data=dict(apples=s1,oranges=s2)
df=pd.DataFrame(data)
print(df)
I Ctrl+clicked on string and found that I had previously created a .py file with that name. I had searched for it with the Windows 10 file search before, but could not find it. Another simple mistake again :)
It's a little weird, the "string" module is a part of the standard library. Could you try this code?
from string import ascii_letters
print(ascii_letters)
Check whether it works. If it doesn't, can you open this file:
"C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\string.py"
and see whether it contains:
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
ascii_letters = ascii_lowercase + ascii_uppercase
These lines should be on lines 25 to 27. If you can't find them, try upgrading or reinstalling Python.
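A quicker way to check which file Python actually loads for string is to ask the import system. A minimal sketch, assuming Python 3 (module_origin is a hypothetical helper name): if the printed path points into your project folder rather than the Python installation's lib directory, a local string.py is shadowing the standard library module.

```python
import importlib.util


def module_origin(name):
    """Return the file the import system would load for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print(module_origin("string"))
```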
import numpy as np
import matplotlib.pyplot as plt


def main():
    x = np.arange(0, 5, 0.1)
    y = np.sin(x)
    plt.plot(x, y)


if __name__ == '__main__':
    main()
Traceback (most recent call last):
File "/Users/tim/workspace/Python/MachineLearn/test.py", line 2, in <module>
import matplotlib.pyplot as plt
File "/usr/local/lib/python2.7/site-packages/matplotlib/pyplot.py", line 115, in <module>
_backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/usr/local/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 63, in pylab_setup
[backend_name], 0)
File "/Applications/PyCharm.app/Contents/helpers/pycharm_matplotlib_backend/backend_interagg.py", line 11, in <module>
from datalore.display import display
File "/Applications/PyCharm.app/Contents/helpers/pycharm_display/datalore/display/__init__.py", line 1, in <module>
from .display_ import *
File "/Applications/PyCharm.app/Contents/helpers/pycharm_display/datalore/display/display_.py", line 5, in <module>
from urllib.parse import urlencode
ImportError: No module named parse
Process finished with exit code 1
=================
Python: 2.7.16
PyCharm Professional: 2019.2
=================
By the way, the code works when run in console mode.
Simple answer: disable "show plots in scientific window" (Settings -> Tools -> Python Scientific), downgrade PyCharm, or move your project to Python 3.
Remember to add plt.show() to your code.
A slightly more complicated option: write your own import hooks so that urllib.parse and urllib.request (requested on the next line of display_.py) can be found. You can read more here: https://xion.org.pl/2012/05/06/hacking-python-imports/
(I'm not familiar enough with the Python 2 import system to write it.)
For python 2 use
from urlparse import urlparse
If you need to write code which is Python2 and Python3 compatible you can use the following import
try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse
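The same pattern extends to urlencode, which is the name the traceback above fails on. A short sketch, assuming nothing beyond the standard library; the example.com URL and the query values are made up for illustration:

```python
try:
    # Python 3 location
    from urllib.parse import urlencode, urlparse
except ImportError:
    # Python 2 locations
    from urllib import urlencode
    from urlparse import urlparse

# Hypothetical query and URL, for illustration only.
query = urlencode({"q": "pandas"})
parts = urlparse("https://example.com/search?" + query)
print(parts.netloc, parts.query)
```

Note that this only helps in code you control; it does not change what PyCharm's own helper files import.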
In your PyCharm project:
press Ctrl+Alt+S to open the settings
in the left column, select Project Interpreter
at the top right there is a list of Python binaries found on your system; pick the right one
finally, click the + button to install additional Python modules; in your case the parse module is reported as missing, so install that one
As mentioned by @Grzegorz Bokota, the problem comes from PyCharm's "scientific view mode". This mode lets you visualise graphs and therefore calls matplotlib, probably an incompatible version of it if you are using Python 2. This bug has been identified here, and it seems that we just have to wait for the next release for it to be fixed.
I am trying to make some code a bit more modular in Python and am running into one issue which I'm sure is straightforward, but I can't seem to see what the problem is.
Suppose I have a script, say MyScript.py:
import pandas as pd
import myFunction as mF
data_frame = mF.data_imp()
print(data_frame)
where myFunction.py contains the following:
def data_imp():
    return pd.read_table('myFile.txt', header = None, names = ['column'])
Running MyScript.py in the command line yields the following error:
Traceback (most recent call last):
File "MyScript.py", line 5, in <module>
data_frame = mF.data_imp()
File "/Users/tomack/Documents/python/StackQpd/myFunction.py", line 2, in data_imp
return pd.read_table('myFile.txt', header = None, names = ['column'])
NameError: name 'pd' is not defined
You need to import pandas in your function or script myFunction:
def data_imp():
    import pandas as pd
    return pd.read_table('myFile.txt', header = None, names = ['column'])
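The reason is that every module has its own namespace: importing pandas in MyScript.py does not make pd visible inside myFunction.py. The sketch below illustrates this with the standard library's math module standing in for pandas, and with two in-memory modules (hypothetical names broken and fixed) built from source strings so that it runs as a single file. A module-level import at the top of myFunction.py is the more conventional fix than an import inside the function.

```python
import math  # imported in the "script" namespace only
import types


def load_module(name, source):
    """Build a module object from source text, with its own global namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Stand-in for myFunction.py without its own import.
broken = load_module(
    "broken",
    "def circle_area(r):\n    return math.pi * r ** 2\n",
)

# Same function, but the module imports what it uses at the top.
fixed = load_module(
    "fixed",
    "import math\n\ndef circle_area(r):\n    return math.pi * r ** 2\n",
)

try:
    broken.circle_area(1.0)
    error_message = None
except NameError as exc:
    # The script's own `import math` is not visible inside `broken`.
    error_message = str(exc)

print(error_message)
print(fixed.circle_area(1.0))
```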
The answers here are right: your myFunction module indeed lacks the pandas import.
Stated more broadly, this question also covers circular imports, where the two usual remedies are:
use import pandas as pd rather than from pandas import something
put the import inside the function that needs it
I'm getting this error when trying to import a module from the Prov package.
Here is the contents of my file:
#!/usr/bin/env
import sys
egg_path='/Library/Python/2.7/site-packages/prov-1.5.0-py2.7.egg/prov'
sys.path.append(egg_path)
#... rest of code
import model as prov
def main():
    # Create a new provenance document
    d1 = ProvDocument()  # d1 is now an empty provenance document
    # Declaring namespaces for various prefixes used in the example
    d1.add_namespace('now', 'http://www.provbook.org/nownews/')
    d1.add_namespace('nowpeople', 'http://www.provbook.org/nownews/people/')
    d1.add_namespace('bk', 'http://www.provbook.org/ns/#')
    # Entity: now:employment-article-v1.html
    e1 = d1.entity('now:employment-article-v1.html')
    # Agent: nowpeople:Bob
    d1.agent('nowpeople:Bob')
And here is the output:
Traceback (most recent call last):
File "prov.py", line 6, in <module>
import model as prov
File "/Library/Python/2.7/site-packages/prov-1.5.0-py2.7.egg/prov/model.py", line 25, in <module>
from prov import Error, serializers
ImportError: cannot import name Error
Any ideas or fixes? I installed Prov using easy_install prov.
You need to rename your own file, prov.py. It prevents the third-party library from being imported because the module names conflict.
Also make sure prov.pyc is removed.
I found the error. The file I was importing into was also called prov.py. It was a circular dependency issue.
Thank you guys for such quick responses!
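To catch this kind of name clash early, one can scan the script's folder for files named after the packages being imported. A rough sketch (find_shadowing_files is a hypothetical helper, and the names passed in are whichever modules the script imports); it also reports leftover .pyc files, since under Python 2 a stale prov.pyc can keep shadowing the library after prov.py has been renamed.

```python
import os


def find_shadowing_files(directory, module_names):
    """Return files in `directory` named like one of `module_names` (.py or .pyc)."""
    hits = []
    for name in module_names:
        for ext in (".py", ".pyc"):
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                hits.append(os.path.basename(candidate))
    return sorted(hits)


if __name__ == "__main__":
    # Check the current working directory for files that would shadow "prov".
    print(find_shadowing_files(os.getcwd(), ["prov"]))
```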
I am using PyCharm (1.5.4) as my Python IDE on MacOS 10.6.4. I am tinkering with some code to manipulate stock price data. As part of that, I want to import price data from Yahoo using the DataReader function that comes with Pandas 0.6.0. The code is as follows:
http://www.statalgo.com/2011/09/08/pandas-getting-financial-data-from-yahoo-fred-etc/
from pandas import ols, DataFrame
from pandas.stats.moments import rolling_std
from pandas.io.data import DataReader
import datetime
sp500 = DataReader("^GSPC", "yahoo", start=datetime.datetime(1990, 1, 1))
sp500_returns = sp500["adj clos"].shift(-250)/sp500["adj clos"] - 1
gdp = DataReader("GDP", "fred", start=datetime.datetime(1990, 1, 1))["value"]
gdp_returns = (gdp/gdp.shift(1) - 1)
gdp_std = rolling_std(gdp_returns, 10)
gdp_standard = gdp_returns / gdp_std
gdp_on_sp = ols(y=sp500_returns, x=DataFrame({"gdp": gdp_standard}))
sp500.plot()
gdp.plot()
When I run the code I get the following error:
Traceback (most recent call last):
File "/Users/MyName/PycharmProjects/test/mytest", line 3, in <module>
from pandas.io.data import DataReader
ImportError: No module named data
I see that PyCharm does not know how to resolve the reference 'data'.
My python paths are set as follows:
import sys
from pprint import pprint as pp
pp(sys.path)
['/private/var/folders/st/stQUFIfOG28bmpY9dCspTk+++TI/-Tmp-',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scikits.statsmodels-0.3.1-py2.7.egg',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python27.zip',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/plat-darwin',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/plat-mac',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/plat-mac/lib-scriptpackages',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/lib-tk',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/lib-old',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/lib-dynload',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages',
'/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/PIL']
What is puzzling is that PyCharm can resolve pandas.stats.moments but can't resolve pandas.io.data. I checked that both directories have the __init__.py file (the files are blank).
At this point I am not sure how to move forward. Greatly appreciate the help.
UPDATE:
$ cat __egginst__.txt
# egginst metadata
egg_name = 'pandas-0.3.0-3.egg'
prefix = '/Library/Frameworks/EPD64.framework/Versions/7.1'
installed_size = 1454562
rel_files = [
'EGG-INFO/pandas/__egginst__.txt',
'lib/python2.7/site-packages/pandas-0.3.0-3.egg-info',
Seems like deleting PyCharm's python interpreter configuration and re-configuring solved the problem. Strange... but fixed
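When an IDE and the command line disagree like this, it can help to print which interpreter is running and which copy of the package gets imported; the egg metadata above lists pandas-0.3.0 while the question mentions 0.6.0, so two installations may have been involved. A small sketch (describe_module is a hypothetical helper; run it with the interpreter PyCharm is configured to use):

```python
import importlib
import sys


def describe_module(name):
    """Import `name` and report its version and location, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "interpreter": sys.executable,
        "version": getattr(module, "__version__", "unknown"),
        "file": getattr(module, "__file__", "built-in"),
    }


if __name__ == "__main__":
    print(describe_module("pandas"))
```

If the reported file or version differs between PyCharm and the terminal, the two are using different interpreters or different site-packages directories.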