How difficult is it to make Python web applications scale?
-
I would be interested to learn about any challenges people face when scaling Python websites, apart from those that are common to every large-scale web application (such as databases). In particular, I am thinking about ultra-large scale. (Is YouTube still written in Python?)

Do people find it beneficial to switch to Stackless, PyPy or Jython instead of CPython?

Is the global interpreter lock an obstacle to scaling Python in practice? The mod_wsgi documentation claims it is not. However, I wonder how this works out for other deployment solutions. Also, is it still true if you outsource your static file handling to a different server or a CDN and use Python code (think of a framework like Django) to handle URLs?

Here is the relevant quote from the mod_wsgi documentation:

"Although contention for the global interpreter lock (GIL) in Python can causes issues for pure Python programs, it is not generally as big an issue when using Python within Apache. This is because all the underlying infrastructure for accepting requests and mapping the URL to a WSGI application, as well as the handling of requests against static files are all performed by Apache in C code. While this code is being executed the thread will not be holding the Python GIL, thus allowing a greater level of overlapping execution where a system has multiple CPUs or CPUs with multiple cores. This ability to make good use of more than processor, even when using multithreading, is further enchanced by the fact that Apache uses multiple processes for handling requests and not just a single process. Thus, even when there is some contention for the GIL within a specific process, it doesn't stop other processes from being able to run as the GIL is only local to a process and does not extend across processes."

Source: http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
-
Answer:
I expect you would see a much greater scalability boost from an async Python framework (like Tornado, http://developers.facebook.com/blog/post/301/) than from Stackless/PyPy or from worrying about the GIL. Beyond that, it depends entirely on the nature of the application: specifically, how stateful the application is and how much communication and synchronization is necessary between web server instances. Web applications are very rarely computationally intensive. If yours happens to be one of the exceptions, at least Python makes it moderately easy to call a C library.
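To illustrate why an async framework helps with I/O-bound handlers, here is a minimal sketch. It uses the standard library's asyncio as a stand-in rather than Tornado's own API, and the names handle_request, serve_batch and timed_batch are hypothetical. The sleep call is only an assumed placeholder for a database or HTTP round trip.

```python
import asyncio
import time


async def handle_request(request_id: int, io_delay: float = 0.1) -> str:
    """Simulate a handler that spends most of its time waiting on I/O."""
    # asyncio.sleep stands in for a database or HTTP call (an assumption
    # made for illustration); while it waits, the loop runs other handlers.
    await asyncio.sleep(io_delay)
    return f"response {request_id}"


async def serve_batch(n_requests: int) -> list:
    """Run n_requests handlers concurrently on a single thread."""
    return await asyncio.gather(*(handle_request(i) for i in range(n_requests)))


def timed_batch(n_requests: int) -> tuple:
    """Return (responses, elapsed_seconds) for one concurrent batch."""
    start = time.perf_counter()
    responses = asyncio.run(serve_batch(n_requests))
    return responses, time.perf_counter() - start
```

Calling timed_batch(50) should take roughly as long as one simulated wait instead of fifty of them, because all handlers wait concurrently in one thread. A handler that does heavy computation would not benefit in the same way.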
Jason Kane at Quora
Other answers
Scaling a web application written in Python uses the same techniques as in other languages, taking the features of the VM into account. The most scalable design for a given VM maximizes concurrency, which means taking advantage of multitasking, including asynchronous code. In CPython, this translates into running several processes (or threads, if the harm from the GIL is not noticeable), each hosting the same application built on an event loop (a user-space cooperative scheduler).

The Global Interpreter Lock (GIL) is harmful when the executed code is CPU intensive (like CPU-based MPEG decompression), because even on a multicore machine only one thread per CPython process executes Python code at any point in time. However, blocking functions such as I/O operations release the GIL and allow other threads of the same process to run, as decided by the kernel scheduler. That is why the GIL may not be noticeable in a multithreaded Python application running on CPython. Note that gunicorn supports multiple processes, but with its default synchronous worker class each process runs a single thread. For more information about Python's GIL, have a look at https://wiki.python.org/moin/GlobalInterpreterLock

Asynchronous code, like any other code that takes advantage of multitasking, has a bad reputation. In its case this is mostly because of JavaScript, where asynchronous code was historically based only on callbacks, and anonymous callback functions led to spaghetti code. This is changing with the advent of greenlets (cf. Gevent), generator-based coroutines using "yield" and "yield from" (cf. asyncio), and ES6 generators, cf. http://tirania.org/blog/archive/2013/Aug-15.html. Callback-based asynchronous code is also nicer to deal with in Python because you have a real class-based object-oriented type system.

It is false to say that nobody considers the GIL harmful in such a context, since processes are more costly in terms of memory than threads.
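The point that blocking calls release the GIL can be checked with a small experiment. This is only a sketch: time.sleep is an assumed stand-in for a blocking socket or database call, and blocking_io_task and run_in_threads are hypothetical names.

```python
import threading
import time


def blocking_io_task(results: list, index: int, delay: float) -> None:
    # time.sleep is a blocking call that releases the GIL, standing in for
    # a socket read or a database round trip (an illustrative assumption).
    time.sleep(delay)
    results[index] = index


def run_in_threads(n_threads: int, delay: float = 0.2) -> tuple:
    """Start n_threads blocking tasks and return (results, elapsed_seconds)."""
    results = [None] * n_threads
    threads = [
        threading.Thread(target=blocking_io_task, args=(results, i, delay))
        for i in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start
```

With eight threads and a 0.2 second wait each, the elapsed time should be close to 0.2 seconds rather than 1.6, because the waits overlap. If the task were a pure-Python CPU loop instead, the threads would take turns holding the GIL and the overlap would disappear, which is the case where multiple processes help.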
If there were no GIL, and with proper thread management code (some kind of thread arbiter), processes with multiple threads would scale better. Processes are managed by the kernel and the init system. That said, at larger scale, distributed process management is preferred because it is easier to integrate with load-balancing infrastructure, cf. realtime diagnostic tools for distributed systems.

When people move to PyPy, Stackless or Jython, it is not always because of the GIL:

- PyPy is a JIT compiler with a GIL. PyPy is chosen mostly for the speed gain. It is also said to require less memory: for instance, PyPy can work out the equivalent of __slots__ by itself and avoid the need for a dict on every instance of a non-builtin type (using __slots__ also makes attribute resolution faster). PyPy also aims to remove the GIL through transactional memory, cf. http://pypy.org/tmdonate.html
- Jython doesn't have a GIL, and it can also take advantage of all the programs developed for the JVM.
- Stackless Python has a GIL but aims to avoid the need to rely on threads for concurrency. It is similar to an event-loop solution, except that its solution, called microthreads, is more generic and more powerful than a simple event loop. For instance, you can pickle such a microthread and send it to another process, possibly on another machine. I have encountered that feature, the readily available ability to move a task to another machine, only in the Erlang VM (a.k.a. BEAM) and in the Termite Scheme VM, which is inspired by BEAM. Stackless is used extensively in CCP's EVE Online game, on both the client and the server side. The only Stackless-based web framework I know of is http://www.nagare.org/ and I'm not sure it takes advantage of the "task pickling" feature. Check out https://pypi.python.org/pypi/greenlet for microthreads on CPython.

So yes, people can find it beneficial to move to other Python VMs, but not always, and not only, to get rid of the GIL.
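To give a feel for what a microthread is, here is a minimal sketch of cooperative scheduling built on plain generators. It is not the Stackless or greenlet API; worker, run_round_robin and demo are hypothetical names, and a plain CPython generator cannot be pickled, so the task-migration feature described above is not reproduced.

```python
from collections import deque


def worker(name, steps, log):
    """A microthread: does a little work, then yields control voluntarily."""
    for step in range(steps):
        log.append((name, step))
        yield  # cooperative switch point


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished; do not reschedule it
        ready.append(task)


def demo():
    """Interleave two workers and return the order in which they ran."""
    log = []
    run_round_robin([worker("a", 2, log), worker("b", 3, log)])
    return log
```

Calling demo() interleaves the two workers step by step on a single OS thread. No locks are needed, because a task only loses control at an explicit yield.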
Scaling a web application will probably first require addressing the more general problem of dealing with a larger code base. This is generally said to be a weak point of Python as a language, which I mostly attribute to a lack of tooling, a lack of knowledge of the available tooling, and FUD. Most of the time, when a company moves away from Python (or similar languages) because of a growing code base, it chooses a statically typed language, for these reasons:

a) A large code base is said to be easier to manage in a static language
b) A large potential labor pool, cf. http://c2.com/cgi/wiki?ArmyOfProgrammers
c) "Static languages are faster"
d) More readily available tools

The major trade-off is a reduction in readily available expressive power, including metaprogramming. This can be mitigated with proper tooling or with suitable languages like Scala, which is used at Twitter to replace Ruby (and Java), see http://monkey.org/~marius/talks/twittersystems/.

I won't argue that "Python is easier to write, hence there are fewer bugs, hence faster iteration, hence better products". Instead I argue that all of the points above are addressed or can be addressed, except the Army Of Programmers, which is dealt with separately afterwards:

a) "A large code base written in a static language is easier to manage": this is mostly due to faster feedback from the editor or compiler, which take advantage of static typing. In Python you can use lint tools like pylint with pylint-brain, which among other things do type inference. In the long run, from the language design perspective, it is argued that pluggable and optional static typing is a better solution, cf. http://lambda-the-ultimate.org/node/1311

c) "Static languages are faster": I don't think that's a major argument, in the sense that it doesn't take into account the "reduction of expressive power" that is usually an integral part of moving to a statically typed language.
But still, Python is slower than C and JVM-based languages, even if PyPy and others try to address that and make Python competitive. There is also active research into compiling Python to optimized machine code, such as Cython, the Numba compiler and others.

d) "More readily available tools": this is backed by three entangled ideas:

- avoiding the need for a polyglot team
- easing the pain of interacting with other tools
- and actually having more tools available in the language. A lot of software targeting would-be web-scale infrastructure is written, and only available, in Java.

The last argument, "Army Of Programmers", is true, but:

- Finding Python programmers is easier than before.
- I can't back up my claim with much experience in Java; that said, I think that Python as a language is easier to learn and master than Java.
- It's easier to find a good programmer who knows Python and is willing to write Python than a good programmer who is willing to write Java.

So yes, Python can technically be part, and is part, of web-scale products, even if most of the time we cannot be sure that it is not just there to replace Makefiles, cf. http://en.wikipedia.org/wiki/Programming_languages_used_in_most_popular_websites

Choosing Python for a web-scale application must go through the same (or better) scrutiny that led some big names to move to a statically typed language. Thanks to Jython, I can write database queries in Python for a Java-based database. I know some Java anyway, and I don't mind using Hadoop and other software billed as "web scale".

I think Python is a better tool for dealing with asynchronous code than JavaScript, among the many advantages Python as a language has over JavaScript. That said, except for Tornado, there is no framework ready-made to deal with the c10k problem. The classic frameworks were not meant for the realtime web, and their attempts to catch up have been unsuccessful, impossible or of marginal success. When I write marginal success, I think of Gevent, Python 3 and Stackless.
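As a sketch of the optional static typing mentioned under point a), the snippet below uses standard-library type annotations. The interpreter ignores them at runtime. An external checker can read them to give editor or compiler style feedback (mypy is one such tool, named here as an assumption since the answer does not mention it). User and contact_line are hypothetical names.

```python
from typing import Optional


class User:
    """A small record type with annotated attributes."""

    def __init__(self, name: str, email: Optional[str] = None) -> None:
        self.name = name
        self.email = email


def contact_line(user: User) -> str:
    """Return a display string; the annotations let tools check callers."""
    if user.email is None:
        return user.name
    return f"{user.name} <{user.email}>"
```

A call such as contact_line("Ada") still runs until it fails on the missing attribute, but a static checker can flag it before the code is executed, which is the kind of fast feedback usually credited to statically typed languages.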
Amirouche Boubekki
Really, it's not that difficult. It all depends on the architecture of the website. Optimizations like these can be done first:

- Nginx for static content and Apache for dynamic content
- Tornado for serving APIs, since its async nature makes it easy to scale
- A CDN for serving content
- Compressed JS and CSS
- Proper use of vertical and horizontal scaling
- Load balancers and auto-scaling features

Then you can think of other optimizations.
Shobhit Jain