Are there any known issues with Python parallel processing on Windows? [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
This is perhaps not the right place to ask this question but I was unsure where else to go with it. My MSc dissertation is aimed at trying to use parallel computing to speed up stochastic biology simulations. I've written some standard format code for parallelisation of the simulations.
This code works on 7 out of 9 of my simulations but seems to get stuck when using pool.map on the other two.
Because I'm working from home, with hardware and software constraints, the only operating systems available to me are Windows Subsystem for Linux and Windows PowerShell, and I've tried both. My supervisor suggested that this issue might be related to my operating system and that Windows might not be as good at parallelisation as Linux, but I'm struggling to find any hard evidence to back this up.
So is anyone aware of any papers or links to other posts that might provide a bit of information about whether or not Windows has issues with Python parallel processing?
Cheers

I don't know about formal papers, but I have had a lot of practical success using dask (dask.org). Whether this is a good answer depends a lot on whether you just want results or are doing deep research.
My team and I began using dask about a year ago to parallelize large Pandas/numpy jobs that were taking hours to run (if they didn't run out of memory). Using dask, we were able to cut these down to minutes with successful (i.e., identical) results.
Lots of RAM is still recommended, but the parallelization capabilities and the process dashboard/feedback are a great step forward.
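On the Windows versus Linux part of the question, one documented difference is that Windows has no fork, so multiprocessing starts workers with the "spawn" method: each child re-imports the main module, and the mapped function and its arguments must be picklable. Whether that explains the two stuck simulations is not something the question gives enough detail to decide (WSL behaves like Linux here), so treat the following as a minimal sketch of the portable pool.map layout, not a diagnosis. The names simulate and run_all are hypothetical stand-ins for the asker's code.

```python
import multiprocessing as mp
import random


def simulate(seed):
    """Hypothetical stand-in for one stochastic simulation run."""
    rng = random.Random(seed)  # per-task RNG so each run is reproducible
    return sum(rng.random() for _ in range(1000)) / 1000


def run_all(seeds, processes=2):
    """Map simulate over the seeds using a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(simulate, seeds)


if __name__ == "__main__":
    # With "spawn" (the Windows default) every worker re-imports this module,
    # so pool creation has to sit behind this guard. The worker function must
    # also live at module level so that it can be pickled.
    results = run_all(range(8))
    print(results)
```

If a script only hangs for some simulations, checking that everything passed to pool.map (and everything it returns) can be pickled is a reasonable first step.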

Related

The Best Python Framework For Data Science [closed]

Closed 6 months ago.
My goal is to land a job in Data Science, and I would like to ask people who already work in this field for advice: which Python framework (Flask or Django) should I master or focus on?
My plan is to create machine learning projects, deploy them to a server, and present them as my experience, since I don't have any actual work experience in this field. But I don't want to make the mistake of spending hours and hours mastering a framework that no one uses and then having to learn another.
Thank You.
Both are good options.
Flask for small scope.
Django is complete and has features for almost everything out of the box.
You might also include in your stack: pandas, Spark, TensorFlow, Apache Beam, Google Cloud Dataflow, and other related tools.
Start with small projects from courses and tutorials to build a portfolio, and always go to the official documentation to tie things together.
The most important one is Python. Getting really good with Python is the most important prerequisite.
Then learn the data science Python libraries: first NumPy, and then Pandas.
After that, move on to advanced tools like TensorFlow, or the programming language R.
One of the best places to learn more about these technologies is freecodecamp.org, which offers free courses. First do the course on Python computing, then the one on TensorFlow; both are great.

Python tools/libraries for Semantic Web: state of the art? [closed]

Closed 7 years ago.
What are the best (more or less mature, supporting more advanced logic, having acceptable performance, scalable to some extent) open source Semantic Web libraries and tools (RDF storage, reasoning, rules, queries) for Python nowadays? Historically, Python tools (cwm) were among the first to appear, but it still seems that everyone uses Java back-ends for performance and Python as a mere client, if at all. My purpose is to learn the technology and possibly use it in a production system later if it proves itself up to the task. The task is not yet defined, but as I see it, it's building a knowledge base, linked with some external resources, and a customized facet-navigable web front-end.
If some building blocks based on Python are not good enough, then what is the suitable piece from the Java/C/C++/whatever world?
A typical stack is also of interest, if there are one or two clear winners.
Thanks.
A survey of Python libraries and tools for Semantic Web programming is available here. It includes libraries for working with RDF as well as Python-friendly triple stores.
Toby Segaran's book Programming the Semantic Web also has a lot of programming examples in Python.
You could check out pyswip, which works with SWI-Prolog. I hope it fits your requirements. :)
To name some, check out RDFLib and CubicWeb.

Python: Programming 8051 [closed]

Closed 1 year ago.
Can I program 8051 using Python?
I'm not finding any way to program the 8051 in a Python environment.
If anybody knows, please help me.
There is Python-on-a-Chip, but note its "disclaimer":
"The PyMite VM DOES NOT HAVE:
A built-in compiler
Any of Python's libraries (no batteries included)
A ready-to-go solution for the beginner (you need to know C and how to work with microcontrollers)"
Thus, if the questioner's goal in using Python was to avoid dealing with the strangeness of the 8051, this may not help.
In particular, the 8051 is a "Harvard" style architecture, with separate RAM and ROM codespaces, and with very limited internal RAM, and larger external RAM that can be accessed only via loading the special DPTR register and then reading or writing indirectly, plus there's no external RAM stack support, nor intrinsic support for stack-based variables. Thus, most "general purpose" high-level languages need lots of customization and reworking to run on the 8051.
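To make the separate memory spaces concrete, here is a toy Python model of them. It is only an illustration under stated assumptions, not an emulator: the class and method names are hypothetical, the sizes assume a base 8051 (128 bytes of internal RAM, up to 64 KiB of external RAM), and the two movx methods mimic the MOVX @DPTR, A and MOVX A, @DPTR instructions.

```python
class Toy8051Memory:
    """Illustrative model of the 8051's separate address spaces."""

    def __init__(self, code):
        self.code = bytes(code)           # program ROM: its own read-only space
        self.iram = bytearray(128)        # small internal RAM (base 8051)
        self.xram = bytearray(64 * 1024)  # external RAM, up to 64 KiB
        self.dptr = 0                     # 16-bit data pointer register
        self.acc = 0                      # 8-bit accumulator

    def set_dptr(self, addr):
        """Load the data pointer, truncated to 16 bits."""
        self.dptr = addr & 0xFFFF

    def movx_write(self):
        """Mimic MOVX @DPTR, A: store the accumulator to external RAM."""
        self.xram[self.dptr] = self.acc & 0xFF

    def movx_read(self):
        """Mimic MOVX A, @DPTR: load the accumulator from external RAM."""
        self.acc = self.xram[self.dptr]

    def store_external(self, addr, value):
        """Every external write is: load DPTR, load A, then MOVX."""
        self.set_dptr(addr)
        self.acc = value & 0xFF
        self.movx_write()

    def load_external(self, addr):
        """Every external read is: load DPTR, MOVX, then use A."""
        self.set_dptr(addr)
        self.movx_read()
        return self.acc


mem = Toy8051Memory(code=b"\x00")
mem.store_external(0x1234, 0xAB)
print(hex(mem.load_external(0x1234)))
```

Even in this toy form, a single external-RAM access takes several steps, which is the overhead a high-level language runtime would pay on every variable that does not fit in internal RAM.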
A good 8051-specific C compiler can hide many of these low-level details, but you wind up burning lots of cycles to do things that are single instructions on desktop CPUs and even on most newer embedded controller architectures. Even if you can live with that level of inefficiency, you still need to sort out the various memory spaces and other specifics.
So, getting Python to work on the 8051 is likely to be a challenging project even for someone deeply familiar with its quirky architecture. If your goal is to put Python onto the 8051 to avoid needing to learn these quirks, I'm not sure that is possible. (But, I suppose the C compilers keep getting better and better...)
Python-on-a-Chip looks about as close as you're going to get. It can run on some things that are just a bit beefier than the 8051.

Solution for distributing MANY simple network tasks? [closed]

Closed 7 years ago.
I would like to create some sort of a distributed setup for running a ton of small/simple REST web queries in a production environment. For each 5-10 related queries which are executed from a node, I will generate a very small amount of derived data, which will need to be stored in a standard relational database (such as PostgreSQL).
What platforms are built for this type of problem set? The nature, data sizes, and quantities seem to contradict the mindset of Hadoop. There are also more grid based architectures such as Condor and Sun Grid Engine, which I have seen mentioned. I'm not sure if these platforms have any recovery from errors though (checking if a job succeeds).
What I would really like is a FIFO type queue that I could add jobs to, with the end result of my database getting updated.
Any suggestions on the best tool for the job?
Have you looked at Celery?
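Celery needs a message broker and its own worker processes, so it is not shown here. As a rough idea of the shape being asked for (a FIFO queue of jobs, workers that retry failures, results ending up in a relational database), here is a standard-library sketch. It is not Celery's API: fetch is a hypothetical stand-in for the REST queries, and an in-memory sqlite3 database stands in for PostgreSQL.

```python
import queue
import sqlite3
import threading


def fetch(job):
    """Hypothetical stand-in for a group of REST queries; returns derived data."""
    return len(job)


def run_jobs(jobs, workers=4, max_retries=2):
    """Run jobs from a FIFO queue, retry failures, and store the results."""
    todo = queue.Queue()  # FIFO queue of (job, attempt) pairs
    done = queue.Queue()  # finished (job, value) pairs
    for job in jobs:
        todo.put((job, 0))

    def worker():
        while True:
            item = todo.get()
            if item is None:  # sentinel: no more work
                todo.task_done()
                break
            job, attempt = item
            try:
                done.put((job, fetch(job)))
            except Exception:
                if attempt < max_retries:
                    todo.put((job, attempt + 1))  # re-queue for another try
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    todo.join()  # returns once every job, including retries, is processed
    for _ in threads:
        todo.put(None)
    for t in threads:
        t.join()

    # Write from a single thread; sqlite3 connections are not shared here.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE results (job TEXT PRIMARY KEY, value INTEGER)")
    while not done.empty():
        db.execute("INSERT INTO results VALUES (?, ?)", done.get())
    db.commit()
    return db


db = run_jobs(["a", "bb", "ccc"])
print(sorted(db.execute("SELECT job, value FROM results")))
```

A real deployment would spread the workers over several machines and persist the queue, which is the part a tool like Celery is meant to take care of.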

Looking for knowledge base integrated with bug tracker in python [closed]

Closed 5 years ago.
Ok, I am not sure I want to use Request Tracker and RTFM, which is a possible solution.
I'd like to have a knowledge base with my bug tracker/todo list, so that when I solve a problem, I would have a record of its resolution for myself or others later.
What python based solutions are available?
A highly flexible issue tracker in Python I would recommend is "Roundup":
http://roundup.sourceforge.net/.
An example of its use can be seen online at http://bugs.python.org/.
Try Trac
I have experience using probably 20-30 different bug trackers, installed or hosted. So far, if you expect to deal with a large number of bugs and want to spend less time coding the issue tracker itself, my advice is to get Atlassian Jira, which is free for open-source projects.
Yes, it's not Python, it is Java, starts slowly and requires lots of resources. At the same time, RAM is far less expensive than your own time and if you want to extend the system you can do it in Python by using https://pypi.python.org/pypi/jira-python/
Do you think that Jira is the most used bug tracker for no reason? It wasn't the first on the market; in fact, it is quite new compared with others.
Once deployed you can focus on improving the workflows instead of patching the bug tracker.
One of its best features is the ability to link to external issues and see their status without having to click on them. As a warning, if you are coming from another tracker you may discover that there are some design limitations, like the fact that a bug can have only a single assignee. Don't be scared by this; if you look further you will find that there are ways to assign tickets to groups of people.
