python scrapy conversion to exe file using pyinstaller


I am trying to convert a Scrapy script to an exe file.
The main.py file looks like this:
from scrapy.crawler import CrawlerProcess
from amazon.spiders.amazon_scraper import Spider
spider = Spider()
process = CrawlerProcess({
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'data.csv',
    'DOWNLOAD_DELAY': 3,
    'RANDOMIZE_DOWNLOAD_DELAY': True,
    'ROTATING_PROXY_LIST_PATH': 'proxies.txt',
    'USER_AGENT_LIST': 'useragents.txt',
    'DOWNLOADER_MIDDLEWARES': {
        'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
        'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'random_useragent.RandomUserAgentMiddleware': 400
    }
})
process.crawl(spider)
process.start() # the script will block here until the crawling is finished
The Scrapy script looks like any other. I am using pyinstaller.exe --onefile main.py to convert it to an exe file. When I try to run main.exe from the dist folder, it starts outputting errors:
FileNotFoundError: [Errno 2] No such file or directory: '...\\scrapy\\VERSION'
I can fix it by creating a scrapy folder inside the dist folder and copying the VERSION file there from lib/site-packages/scrapy.
After that, many other errors occur, but I can fix them by copying over some Scrapy libraries.
In the end it outputs this error:
ModuleNotFoundError: No module named 'email.mime'
I don't even know what it means. I have never seen it before.
I am using:
Python 3.6.5
Scrapy 1.5.0
pyinstaller 3.3.1

I had the same situation.
Instead of trying to make PyInstaller include this file (all my attempts failed), I decided to change part of the Scrapy code to avoid the error.
I noticed that the \scrapy\VERSION file is used in only one place: \scrapy\__init__.py
I decided to hardcode the value from \scrapy\VERSION by changing \scrapy\__init__.py:
#import pkgutil
__version__ = "1.5.0" #pkgutil.get_data(__package__, 'VERSION').decode('ascii').strip()
version_info = tuple(int(v) if v.isdigit() else v
                     for v in __version__.split('.'))
#del pkgutil
After this change there is no need to store the version in an external file.
Since nothing references the \scrapy\VERSION file any more, that error no longer occurs.
After that I got the same FileNotFoundError: [Errno 2] for the \scrapy\mime.types file.
The situation is the same for \scrapy\mime.types: it is used only in \scrapy\responsetypes.py
...
#from pkgutil import get_data
...
    def __init__(self):
        self.classes = {}
        self.mimetypes = MimeTypes()
        #mimedata = get_data('scrapy', 'mime.types').decode('utf8')
        mimedata = """
Copypaste all 750 lines of \scrapy\mime.types here
"""
        self.mimetypes.readfp(StringIO(mimedata))
        for mimetype, cls in six.iteritems(self.CLASSES):
            self.classes[mimetype] = load_object(cls)
This change resolved the FileNotFoundError: [Errno 2] for the \scrapy\mime.types file.
I agree that hardcoding 750 lines of text into Python code is not the best solution.
After that I started to receive ModuleNotFoundError: No module named scrapy.spiderloader. I added "scrapy.spiderloader" to PyInstaller's hidden imports parameter.
The next issue was ModuleNotFoundError: No module named scrapy.statscollectors, which I handled the same way.
The final version of the pyinstaller command for my Scrapy script contains 46 hidden imports; after that I got a working .exe file.
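With dozens of hidden imports the command line gets long, so it may help to generate it. The following is only a sketch: build_pyinstaller_args is a hypothetical helper, the VERSION source path is a placeholder, and the answer above does not confirm that --add-data works for Scrapy's data files in this setup. The --hidden-import and --add-data options themselves are standard PyInstaller flags.

```python
import os


def build_pyinstaller_args(script, hidden_imports=(), data_files=()):
    """Return the argument list for a one-file PyInstaller build (hypothetical helper)."""
    args = ["pyinstaller", "--onefile"]
    for module in hidden_imports:
        # One --hidden-import flag per module that PyInstaller's analysis misses.
        args.append("--hidden-import=" + module)
    for source, dest in data_files:
        # SRC and DEST are joined with os.pathsep (';' on Windows, ':' elsewhere).
        args.append("--add-data=" + source + os.pathsep + dest)
    args.append(script)
    return args


if __name__ == "__main__":
    args = build_pyinstaller_args(
        "main.py",
        hidden_imports=["scrapy.spiderloader", "scrapy.statscollectors"],
        # Placeholder path: point this at the real VERSION file in site-packages.
        data_files=[("path/to/site-packages/scrapy/VERSION", "scrapy")],
    )
    print(" ".join(args))
```

The resulting list can be passed to subprocess.run(args) or pasted into a build script, which keeps the hidden imports in one editable place.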

Related

python remoteconfig unable to parse file from Gitlab

I am trying to get remoteconfig working, following this guide:
https://pypi.org/project/remoteconfig/
As a control, I have this code that works:
config.read('./config.ini')
for section in config:
    print(section)
When I put the same config file in a remote Gitlab, this code does not work:
from remoteconfig import config
config.read('https://myorg.org/path/repo/~/blob/app/config.ini')
for section in config:
    print(section)
What could I be doing wrong here? The error msg I am getting is:
configParser.MissingSectionHeaderError: File contains no section headers
So it seems like it's reaching the file path (network/connectivity OK), but not liking what's in that file or possibly the file format? The same exact file works with localconfig.

For now I am going to use the 'gitlab' pip module and simply consume the API for the file (with a private_token):
f = project.files.get(file_path='path/file', ref='master')
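One possible explanation for the MissingSectionHeaderError, not confirmed anywhere in this thread, is that the blob URL returns a web page (HTML, or a sign-in page for a private repository) instead of the raw file. The sketch below uses only the standard configparser module and made-up sample text to show that HTML input produces exactly this error; has_section_headers is a hypothetical helper.

```python
import configparser


def has_section_headers(text):
    """Return True if text parses as INI, False if the section header is missing."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        return False
    return True


# Made-up sample inputs: a real INI file and the start of an HTML page.
ini_text = "[app]\nname = demo\n"
html_text = "<!DOCTYPE html>\n<html><body>Sign in</body></html>\n"

print(has_section_headers(ini_text))
print(has_section_headers(html_text))
```

Printing the first few lines of whatever the URL actually returns is a quick way to check whether this is what happens in your case.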

Python erroneously adding file extension to path string

I'm currently struggling with the following problem:
My folder structure looks like this:
master
  - resources
      customFile.fmu
  fileCallingFMU.py
While executing fileCallingFMU.py I pass a path string like
path = "./resources/customFile.fmu"
My script contains a super() call to which I pass the path variable. But every time I execute the script, it fails with an exception:
Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: b'2021-11-16_./resources/customFile.fmu.txt'
File "[projectFolder]\fileCallingFMU.py", line 219, in __init__
super().__init__(path, config, log_level)
File "[projectFolder]\fileCallingFMU.py", line 86, in <module>
env = gym.make(env_name)
My pressing question now is:
Why and how does Python change the path variable by adding a date prefix and a .txt file extension?
I hope someone can enlighten me on this one...
EDIT
I'm trying to get the example of ModelicaGym running.
My fileCallingFMU.py contains the following code:
path = "./resources/customFile.fmu"
env_entry_point = 'cart_pole_env:JModelicaCSCartPoleEnv'
config = {
    'path': path,
    'm_cart': m_cart,
    'm_pole': m_pole,
    'theta_0': theta_0,
    'theta_dot_0': theta_dot_0,
    'time_step': time_step,
    'positive_reward': positive_reward,
    'negative_reward': negative_reward,
    'force': force,
    'log_level': log_level
}
from gym.envs.registration import register
env_name = env_name
register(
    id=env_name,
    entry_point=env_entry_point,
    kwargs=config
)
env = gym.make(env_name)
The full code for the entryPoint can be found here.

As jjramsey pointed out, the problem was buried within the ModelicaGym library.
The logger could not create a proper log file because the model name was not stored correctly in the self.model_name variable.
The source of the error lay in the line
self.model_name = model_path.split(os.path.sep)[-1]
because os.path.sep is a backslash on Windows, so the split could not separate my forward-slash path string
"./resources/customFile.fmu"
After changing it to
".\\resources\\customFile.fmu"
everything works as expected.
Thanks again!
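The behaviour described above can be reproduced without ModelicaGym. This sketch uses the standard ntpath module (the Windows implementation behind os.path) so it gives the same result on any platform; the variable names are illustrative only.

```python
import ntpath
from pathlib import PureWindowsPath

model_path = "./resources/customFile.fmu"

# Splitting on the Windows separator only: the string contains no backslash,
# so the "file name" is still the whole path.
naive_name = model_path.split("\\")[-1]

# ntpath.basename (os.path.basename on Windows) accepts both '/' and '\\'.
base_name = ntpath.basename(model_path)

# pathlib's Windows flavour also treats both characters as separators.
pathlib_name = PureWindowsPath(model_path).name

print(naive_name)
print(base_name)
print(pathlib_name)
```

Using os.path.basename or pathlib in the library would make both path spellings work, whereas the workaround in this answer changes the caller's string instead.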

pyinstaller Error starting service: The service did not respond to the start or control request in a timely fashion

I have been searching for a solution for a couple of days without success.
We have a Windows service built to copy some files from one location to another.
So I built the code shown below with Python 3.7.
The full code can be found on GitHub.
When I run the service using Python, everything works fine: I can install the service and also start it.
I use these commands:
Install the service:
python jis53_backup.py install
Run the service:
python jis53_backup.py start
I then compile this code using PyInstaller with the command:
pyinstaller -F --hidden-import=win32timezone jis53_backup.py
After the exe is created, I can install the service but when trying to start the service I get the error:
Error starting service: The service did not respond to the start or
control request in a timely fashion
I have gone through multiple posts on Stack Overflow and on Google related to this error, but without success. I don't have the option to install Python 3.7 on the PCs that need to run this service. That's why we are trying to get an .exe build.
I have made sure to have the path updated according to the information that I found in the different questions.
Image of path definitions:
I also copied the pywintypes37.dll file.
From -> Python37\Lib\site-packages\pywin32_system32
To -> Python37\Lib\site-packages\win32
Does anyone have any other suggestions on how to get this working?
'''
Windows service to copy a file from one location to another
at a certain interval.
'''
import sys
import time
from distutils.dir_util import copy_tree
import servicemanager
import win32serviceutil
import win32service
from HelperModules.CheckFileExistance import check_folder_exists, create_folder
from HelperModules.ReadConfig import (check_config_file_exists,
                                      create_config_file, read_config_file)
from ServiceBaseClass.SMWinService import SMWinservice
sys.path += ['filecopy_service/ServiceBaseClass',
             'filecopy_service/HelperModules']
class Jis53Backup(SMWinservice):
    _svc_name_ = "Jis53Backup"
    _svc_display_name_ = "JIS53 backup copy"
    _svc_description_ = "Service to copy files from server to local drive"
    def start(self):
        self.conf = read_config_file()
        if not check_folder_exists(self.conf['dest']):
            create_folder(self.conf['dest'])
        self.isrunning = True
    def stop(self):
        self.isrunning = False
    def main(self):
        self.ReportServiceStatus(win32service.SERVICE_RUNNING)
        while self.isrunning:
            # Copy the files from the server to a local folder
            # TODO: build function to trigger only when a file is changed.
            copy_tree(self.conf['origin'], self.conf['dest'], update=1)
            time.sleep(30)
if __name__ == '__main__':
    if sys.argv[1] == 'install':
        if not check_config_file_exists():
            create_config_file()
    if len(sys.argv) == 1:
        servicemanager.Initialize()
        servicemanager.PrepareToHostSingle(Jis53Backup)
        servicemanager.StartServiceCtrlDispatcher()
    else:
        win32serviceutil.HandleCommandLine(Jis53Backup)

I was also facing this issue after compiling with PyInstaller. For me, the issue was that I was building the paths to the config and log files dynamically, for example:
curr_path = os.path.dirname(os.path.abspath(__file__))
configs_path = os.path.join(curr_path, 'configs', 'app_config.json')
opc_configs_path = os.path.join(curr_path, 'configs', 'opc.json')
log_file_path = os.path.join(curr_path, 'logs', 'application.log')
This worked fine when I started the service using python service.py install/start. But after compiling it with PyInstaller, it always gave me the error about not starting in a timely fashion.
To resolve this, I changed all the dynamic paths to static ones, for example:
configs_path = 'C:\\Program Files (x86)\\ScantechOPC\\configs\\app_config.json'
opc_configs_path = 'C:\\Program Files (x86)\\ScantechOPC\\configs\\opc.json'
debug_file = 'C:\\Program Files (x86)\\ScantechOPC\\logs\\application.log'
After compiling with PyInstaller, it now works without any error. It looks like a dynamically built path does not point to the actual location of the files once the code is compiled, and that causes the error.
Hope this solves your problem too. Thanks
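An alternative to hardcoding absolute paths, offered here as a sketch rather than as what this answer did, is to resolve paths relative to the executable when the program is frozen. PyInstaller sets sys.frozen on the bundled program, and in one-file mode __file__ points into a temporary extraction folder, which would explain why paths based on it miss the files. get_base_dir is a hypothetical helper, and the configs/logs layout is copied from the example above.

```python
import os
import sys


def get_base_dir(module_file):
    """Return the folder to resolve config and log paths against."""
    if getattr(sys, "frozen", False):
        # Frozen build: use the folder that contains the .exe itself.
        return os.path.dirname(os.path.abspath(sys.executable))
    # Normal interpreter run: use the folder that contains the script.
    return os.path.dirname(os.path.abspath(module_file))


if __name__ == "__main__":
    # In a real module you would usually pass __file__ here.
    base_dir = get_base_dir(sys.argv[0])
    configs_path = os.path.join(base_dir, "configs", "app_config.json")
    log_file_path = os.path.join(base_dir, "logs", "application.log")
    print(configs_path)
    print(log_file_path)
```

This assumes the configs and logs folders are deployed next to the executable; if they live elsewhere, a fixed path as in the answer is still the simpler choice.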

Running Python script with scrapy import from Node child process

I'm attempting to get a simple scraper up and running to gather data and would like to use Python Scrapy. The rest of the app will be built with Node.js/Express, so I would like to call this script on demand when I need fresh data.
The Python code runs fine locally through PyCharm, but I am seeing issues when it is run as a script.
1. Through Node, when I run the server locally and hit /name, it fails with "no module named 'scrapy'".
2. When I run the server through the Anaconda prompt, this works fine and scrapy is imported with no error.
I have installed scrapy via conda at the location the Express server is being run from, for both 1 and 2.
From what I've read this may have to do with Scrapy's need for the Twisted reactor, but as I'm new to Python it's not clear to me what the Anaconda terminal is doing differently, and what I would need to do on the Node side in order to use scrapy.
Nodejs:
app.get('/name', callName);
function callName(req, res) {
    console.log("test");
    var spawn = require('child_process').spawn;
    const pyProg = spawn('python', ['pythonscript.py']);
    pyProg.stdout.on('data', function(data) {
        console.log(data.toString());
        res.write(data);
        res.end('end');
    });
}
//Print URL for accessing server
console.log('Server running at http://127.0.0.1:8000/')
app.listen(process.env.PORT || 8000, () => console.log("Listening on " + (process.env.PORT || 8000)));
Python script:
try:
    import sys
    import scrapy
    data = "python starting"
    print(data)
    sys.stdout.flush()
except Exception as exception:
    print(exception, False)
    print(exception.__class__.__name__ + ": " + str(exception))
Update:
When running import scrapy from the Anaconda interpreter (the other from the comments resulted in "no module found"):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "\Anaconda3\lib\site-packages\scrapy\__init__.py", line 34, in <module>
    from scrapy.spiders import Spider
  File "\Anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 10, in <module>
    from scrapy.http import Request
  File "\Anaconda3\lib\site-packages\scrapy\http\__init__.py", line 11, in <module>
    from scrapy.http.request.form import FormRequest
  File "\Anaconda3\lib\site-packages\scrapy\http\request\form.py", line 11, in <module>
    import lxml.html
  File "\Anaconda3\lib\site-packages\lxml\html\__init__.py", line 54, in <module>
    from .. import etree
ImportError: DLL load failed: The specified module could not be found.
So this looks to be not just interpreter related, but perhaps something additional to do with the environment variables Anaconda sets for its terminal?
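To compare the two launch methods, it can help to have the spawned script report which interpreter it is and what it can see. This is a diagnostic sketch only, using standard library calls; describe_environment is a hypothetical helper, and it does not by itself fix the DLL error above.

```python
import importlib.util
import os
import sys


def describe_environment(module_name):
    """Report which interpreter is running and whether module_name can be located."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
        # First few PATH entries; the Anaconda prompt typically adds its own folders here.
        "path_head": os.environ.get("PATH", "").split(os.pathsep)[:3],
    }


if __name__ == "__main__":
    for key, value in describe_environment("scrapy").items():
        print(key, "=", value)
    sys.stdout.flush()
```

If the output differs between the plain Node launch and the Anaconda prompt launch (a different executable, or missing Anaconda folders in PATH), that points to the environment rather than to Scrapy or Twisted.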

py2exe/py2app and docx don't work together

Installed docx on Windows 7 here:
D:\Program Files (x86)\Python27\Lib\site-packages as shown below:
Installed docx on OS X at /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/docx-0.0.2-py2.7.egg-info as shown below:
Following is the sample script (named as docx_example.py), which runs absolutely fine on the python interpreter:
#!/usr/bin/env python
'''
This file makes an docx (Office 2007) file from scratch, showing off most of python-docx's features.
If you need to make documents from scratch, use this file as a basis for your work.
Part of Python's docx module - http://github.com/mikemaccana/python-docx
See LICENSE for licensing information.
'''
from docx import *
if __name__ == '__main__':
    # Default set of relationships - these are the minimum components of a document
    relationships = relationshiplist()
    # Make a new document tree - this is the main part of a Word document
    document = newdocument()
    # This xpath location is where most interesting content lives
    docbody = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
    # Append two headings and a paragraph
    docbody.append(heading('''Welcome to Python's docx module''',1) )
    docbody.append(heading('Make and edit docx in 200 lines of pure Python',2))
    docbody.append(paragraph('The module was created when I was looking for a Python support for MS Word .doc files on PyPI and Stackoverflow. Unfortunately, the only solutions I could find used:'))
    # Add a numbered list
    for point in ['''COM automation''','''.net or Java''','''Automating OpenOffice or MS Office''']:
        docbody.append(paragraph(point,style='ListNumber'))
    docbody.append(paragraph('''For those of us who prefer something simpler, I made docx.'''))
    docbody.append(heading('Making documents',2))
    docbody.append(paragraph('''The docx module has the following features:'''))
    # Add some bullets
    for point in ['Paragraphs','Bullets','Numbered lists','Multiple levels of headings','Tables','Document Properties']:
        docbody.append(paragraph(point,style='ListBullet'))
    docbody.append(paragraph('Tables are just lists of lists, like this:'))
    # Append a table
    docbody.append(table([['A1','A2','A3'],['B1','B2','B3'],['C1','C2','C3']]))
    docbody.append(heading('Editing documents',2))
    docbody.append(paragraph('Thanks to the awesomeness of the lxml module, we can:'))
    for point in ['Search and replace','Extract plain text of document','Add and delete items anywhere within the document']:
        docbody.append(paragraph(point,style='ListBullet'))
    # Search and replace
    print 'Searching for something in a paragraph ...',
    if search(docbody, 'the awesomeness'): print 'found it!'
    else: print 'nope.'
    print 'Searching for something in a heading ...',
    if search(docbody, '200 lines'): print 'found it!'
    else: print 'nope.'
    print 'Replacing ...',
    docbody = replace(docbody,'the awesomeness','the goshdarned awesomeness')
    print 'done.'
    # Add a pagebreak
    docbody.append(pagebreak(type='page', orient='portrait'))
    docbody.append(heading('Ideas? Questions? Want to contribute?',2))
    docbody.append(paragraph('''Email <python.docx@librelist.com>'''))
    # Create our properties, contenttypes, and other support files
    coreprops = coreproperties(title='Python docx demo',subject='A practical example of making docx from Python',creator='Mike MacCana',keywords=['python','Office Open XML','Word'])
    appprops = appproperties()
    contenttypes = contenttypes()
    websettings = websettings()
    wordrelationships = wordrelationships(relationships)
    # Save our document
    savedocx(document,coreprops,appprops,contenttypes,websettings,wordrelationships,'docx_example.docx')
Following is the setup script (named as docx_setup.py) to create the standalone (.app in Mac OSX and .exe in Windows 7):
import sys,os
# Globals: START
main_script='docx_example'
dist_dir_main_path=os.path.abspath('./docx-bin')
compression_level=2
optimization_level=2
bundle_parameter=1
skip_archive_parameter=False
emulation_parameter=False
module_cross_reference_parameter=False
ascii_parameter=False
includes_list=['lxml.etree','lxml._elementpath','gzip']
# Globals: STOP
# Global Functions: START
def isDarwin():
    return sys.platform=='darwin'
def isLinux():
    return sys.platform=='linux2'
def isWindows():
    return os.name=='nt'
# Global Functions: STOP
if isDarwin():
    from setuptools import setup
    # Setup distribution directory: START
    dist_dir=os.path.abspath('%s/osx' %(dist_dir_main_path))
    if os.path.exists(dist_dir):
        os.system('rm -rf %s' %(dist_dir))
    os.system('mkdir -p %s' %(dist_dir))
    # Setup distribution directory: STOP
    APP = ['%s.py' %(main_script)]
    OPTIONS={'argv_emulation': False,
             'dist_dir': dist_dir,
             'includes': includes_list
            }
    print 'Creating standalone now...'
    setup(app=APP,options={'py2app': OPTIONS},setup_requires=['py2app'])
    os.system('rm -rf build')
    os.system('tar -C %s -czf %s/%s.tgz %s.app' %(dist_dir,dist_dir,main_script,main_script))
    os.system('rm -rf %s/%s.app' %(dist_dir,main_script))
    print 'Re-distributable Standalone file(s) created at %s/%s.zip. Unzip and start using!!!' %(dist_dir,main_script)
elif isWindows():
    from distutils.core import setup
    import py2exe
    # Setup distribution directory: START
    dist_dir=os.path.abspath('%s/win' %(dist_dir_main_path))
    if os.path.exists(dist_dir):
        os.system('rmdir /S /Q %s' %(dist_dir))
    os.system('mkdir %s' %(dist_dir))
    # Setup distribution directory: STOP
    OPTIONS={'compressed': compression_level,
             'optimize': optimization_level,
             'bundle_files': bundle_parameter,
             'dist_dir': dist_dir,
             'xref': module_cross_reference_parameter,
             'skip_archive': skip_archive_parameter,
             'ascii': ascii_parameter,
             'custom_boot_script': '',
             'includes': includes_list
            }
    print 'Creating standalone now...'
    setup(options = {'py2exe': OPTIONS},zipfile = None,windows=[{'script': '%s.py' %(main_script)}])
    print 'Re-distributable Standalone file(s) created in the following location: %s' %(dist_dir)
    os.system('rmdir /S /Q build')
Now comes the real problem.
Following is the error posted on Mac OS X console after trying to use the docx_example.app, created using the command python docx_setup.py py2app:
docx_example: Searching for something in a paragraph ... found it!
docx_example: Searching for something in a heading ... found it!
docx_example: Replacing ... done.
docx_example: Traceback (most recent call last):
docx_example: File "/Users/admin/docx-bin/osx/docx_example.app/Contents/Resources/__boot__.py", line 64, in <module>
docx_example: _run('docx_example.py')
docx_example: File "/Users/admin/docx-bin/osx/docx_example.app/Contents/Resources/__boot__.py", line 36, in _run
docx_example: execfile(path, globals(), globals())
docx_example: File "/Users/admin/docx-bin/osx/docx_example.app/Contents/Resources/docx_example.py", line 75, in <module>
docx_example: savedocx(document,coreprops,appprops,contenttypes,websettings,wordrelationships,'docx_example.docx')
docx_example: File "docx.pyc", line 849, in savedocx
docx_example: AssertionError
docx_example: docx_example Error
docx_example Exited with code: 255
Following is the error posted in docx_example.exe.log file in Windows 7 after trying to use the docx_example.exe, created using the command python docx_setup.py py2exe:
Traceback (most recent call last):
File "docx_example.py", line 75, in <module>
File "docx.pyo", line 854, in savedocx
WindowsError: [Error 3] The system cannot find the path specified: 'D:\\docx_example\\docx_example.exe\\template'
As you can see, both OS X and Windows 7 are referring to something similar here. Please help.

I have found a solution.
In api.py, change
_thisdir = os.path.split(__file__)[0]
to
_thisdir = r'C:\Python27\Lib\site-packages\docx'
or wherever your docx package is installed.

What's going on (at least for py2exe) is something similar to this question.
The documentation on data_files is here.
What you basically have to do is change
setup(options = {'py2exe': OPTIONS},zipfile = None,windows=[{'script': '%s.py' %(main_script)}])
to
import glob
data_files = [
    # data_files expects a list of file paths, so expand the wildcard with glob
    ('template', glob.glob('D:/Program Files (x86)/Python27/Lib/site-packages/docx-template/*')),
]
setup(
    options={'py2exe': OPTIONS},
    zipfile=None,
    windows=[{'script': '%s.py' %(main_script)}],
    data_files=data_files
)
The exact place where the template files are may be wrong above, so you might need to adjust it.
But there may be several other sets of data_files you need to include. You may want to retrieve them programmatically with an os.listdir or os.walk type of command.
As mentioned in the other post, you will also have to change
bundle_parameter=1
to
bundle_parameter=2
at the top of the file.
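The os.walk suggestion above could look roughly like the sketch below. collect_data_files is a hypothetical helper, the folder names in the demo are made up, and the demo part is written for Python 3 (the function body itself uses nothing version-specific). It returns the (target_dir, [files]) pairs that setup(data_files=...) expects.

```python
import os
import tempfile


def collect_data_files(source_root, target_root):
    """Walk source_root and return [(target_dir, [file paths]), ...] for data_files."""
    data_files = []
    for current_dir, _subdirs, filenames in os.walk(source_root):
        if not filenames:
            continue
        # Mirror the sub-folder layout of source_root under target_root.
        relative = os.path.relpath(current_dir, source_root)
        target_dir = target_root if relative == "." else os.path.join(target_root, relative)
        sources = [os.path.join(current_dir, name) for name in sorted(filenames)]
        data_files.append((target_dir, sources))
    return data_files


if __name__ == "__main__":
    # Build a tiny throwaway tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "word"))
        for name in ("a.xml", os.path.join("word", "b.xml")):
            with open(os.path.join(root, name), "w") as handle:
                handle.write("<x/>")
        for entry in collect_data_files(root, "template"):
            print(entry)
```

In the setup script you would call it with the real template folder, for example data_files=collect_data_files(template_dir, 'template'), where template_dir is whatever path holds the docx template on your machine.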

You can solve the entire problem by using this API, which is based on python-docx. The advantage of this API is that it doesn't have the savedocx function, so you will not get any other AssertionError.
For the WindowsError: [Error 3] The system cannot find the path specified: 'D:\\docx_example\\docx_example.exe\\template' error, you need to edit the api.py file in the docx egg folder, which is located in the Python folder of the system (on my computer: C:\Python27\Lib\site-packages\python_docx-0.3.0a5-py2.7.egg\docx)
Change this:
_thisdir = os.path.split(__file__)[0]
_default_docx_path = os.path.join(_thisdir, 'templates', 'default.docx')
To this:
thisdir = os.getcwd()
_default_docx_path = os.path.join(thisdir, 'templates', 'default.docx')
The first version takes the path of the running program itself and appends the templates folder to it:
C:\myfiles\myprogram.exe\templates\default.docx
The solution uses only the current working directory, not the path of the running program:
C:\myfiles\templates\default.docx
Hope it helps!
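The two paths above can be reproduced with the standard ntpath module, which applies Windows path rules on any platform. This is an illustration only: the __file__ value is a hypothetical example of what a module bundled inside the executable might report, and the second half uses the executable's own folder rather than os.getcwd(), since the working directory depends on where the program is launched from.

```python
import ntpath

# Hypothetical value of __file__ for a module bundled inside the executable.
bundled_module_file = r"C:\myfiles\myprogram.exe\api.pyc"

# Original approach: the directory part of __file__ is the .exe itself.
this_dir = ntpath.split(bundled_module_file)[0]
broken_path = ntpath.join(this_dir, "templates", "default.docx")

# Alternative: start from the folder that contains the executable.
exe_path = r"C:\myfiles\myprogram.exe"
exe_dir = ntpath.dirname(exe_path)
working_path = ntpath.join(exe_dir, "templates", "default.docx")

print(broken_path)
print(working_path)
```

In a real frozen program the executable path would come from sys.executable instead of a literal string.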

Instead of changing some library file, I find it easier and cleaner to tell python-docx explicitly where to look for the template, i.e.:
document = Document('whatever/path/you/choose/to/some.docx')
This effectively solves the py2exe and docx path problem.
