It's a question of whether you want to use one big, fat lock (aka the GIL) or lots of fine-grained locks. The former is easier to implement than the latter. If you want to know more about CPython's GIL, you should check out David Beazley's GIL talks ([1] & [2]). I think video recordings of his talks are available on YouTube as well.
Btw, in CPython extensions you're allowed to release the GIL, do a bunch of computation, and let other Python threads do their job. When you're done with the computation, you can reacquire the GIL and proceed as normal. This is what CPython does whenever you make a blocking syscall. Extensions like NumPy release the GIL and do their computation with multiple threads. But the GIL becomes a huge issue if you're trying to do a lot of CPU-intensive work in pure Python code.
[1] - www.dabeaz.com/python/UnderstandingGIL.pdf
[2] - www.dabeaz.com/python/GIL.pdf
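The effect described above is easy to see for yourself. A rough sketch comparing the same pure-Python CPU work done sequentially versus split across two threads (exact timings will vary by machine and Python version, but on a GIL-based CPython the threaded run is not faster):

```python
import threading
import time

def count(n):
    # pure-Python CPU work; under the GIL these loops cannot
    # run in parallel across threads
    while n > 0:
        n -= 1

N = 5_000_000

start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

# With a GIL the threaded version is typically no faster, and is
# often slower due to lock handoff overhead between the threads.
print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
```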
Java was designed from the beginning (more or less) to be multithreaded. A global lock makes truly parallel multithreading impossible; it means one thread can hog the interpreter and starve the others of resources. Even system threads could potentially be starved.
GILs are simple to implement, and effective, but limited. The Java designers had much higher hopes for their threading model.

The Global Interpreter Lock (GIL) is a mechanism used in some programming language interpreters, notably CPython (the reference implementation of Python) and MRI (Matz’s Ruby Interpreter), to ensure that only one thread executes Python or Ruby bytecode at a time. This simplifies memory management and eliminates the complexities of concurrent access to objects. However, it also limits the ability to fully utilize multi-core processors for CPU-bound tasks.
Reasons JVM Does Not Have a GIL:
- Architecture and Design Philosophy:
- The JVM was designed with a focus on enabling high-performance, multi-threaded applications. Java's concurrency model is built around threads, and the JVM provides robust support for multi-threading without a GIL.
- The JVM leverages native operating system threads, allowing it to take advantage of multi-core architectures effectively.
- Garbage Collection:
- The JVM employs sophisticated garbage collection techniques that are designed to work in a concurrent environment. The garbage collector can run concurrently with application threads, which reduces the need for a GIL to manage memory safely.
- Different garbage collection algorithms (like G1, ZGC, and Shenandoah) can operate concurrently with application threads, allowing for better performance in multi-threaded environments.
- Synchronization Mechanisms:
- Java provides a rich set of synchronization primitives (like synchronized blocks, volatile variables, and higher-level constructs from the java.util.concurrent package) that allow developers to manage concurrency without a GIL.
- These primitives are designed to work seamlessly with the underlying thread model, enabling fine-grained control over synchronization and resource access.
- Language Features:
- Java was designed from the ground up with multi-threading in mind, allowing developers to create highly concurrent applications. In contrast, Python and Ruby were originally designed for simpler, single-threaded applications and later adapted to support concurrency, leading to the introduction of the GIL as a way to simplify that adaptation.
- Performance Considerations:
- The absence of a GIL in the JVM allows for better performance in CPU-bound applications, as it can utilize multiple cores effectively. This is particularly beneficial for applications that require high throughput and low latency.
Conclusion:
In summary, the JVM does not have a GIL because it was designed to support multi-threading from the beginning, employs efficient garbage collection strategies, and provides robust synchronization mechanisms that allow for concurrent execution without the need for a GIL. In contrast, Python and Ruby's GIL arose from their need to simplify memory management and object access in a multi-threaded environment, which can lead to performance bottlenecks in multi-core scenarios.
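The fine-grained model the JVM uses can be sketched in Python itself: instead of one interpreter-wide lock, each shared object carries its own lock, roughly analogous to a Java synchronized block. This is only an illustration of the idea, not how CPython actually works internally:

```python
import threading

class SynchronizedCounter:
    """One lock per object -- the fine-grained locking model.
    Only this object is locked during an update; other objects
    can be mutated by other threads at the same time."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # critical section guarded by this object's own lock
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

counter = SynchronizedCounter()

def hammer():
    for _ in range(10_000):
        counter.increment()

threads = [threading.Thread(target=hammer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000 -- no lost updates
```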
It's a design decision. Either give one global lock to the running thread, or lock each resource a thread is using. The JVM uses the latter approach. The JVM implementation of Python (Jython) does not introduce a GIL. I'm not sure if the same is true for JRuby.
It's a very tough decision to make. Since threads are handled differently per OS, the implementors have to prioritize between the two: a cross-platform language or a cross-platform execution environment. It's already a tremendous effort to implement a language, much more for an execution environment.
Python (the language) doesn't need a GIL (which is why it can perfectly be implemented on JVM [Jython] and .NET [IronPython], and those implementations multithread freely). CPython (the popular implementation) has always used a GIL for ease of coding (esp. the coding of the garbage collection mechanisms) and of integration of non-thread-safe C-coded libraries (there used to be a ton of those around;-).
The Unladen Swallow project, among other ambitious goals, does plan a GIL-free virtual machine for Python -- to quote that site, "In addition, we intend to remove the GIL and fix the state of multithreading in Python. We believe this is possible through the implementation of a more sophisticated GC system, something like IBM's Recycler (Bacon et al, 2001)."
A2A
The other answers here are good, but they focus on the excuses for the existence of a GIL and not so much on the why.
TL;DR
Threads are hard and error-prone, and true threading is not always a win; simple mistakes can have a devastating effect on performance, up to and including a deadlock that completely halts the program. The GIL avoids nearly all of this.
Why can only one thread own the interpretation routine?
restated
Why can only one thread run at a time?
When multiple threads run concurrently, they can access the same data at the same time. Consider a Python list, which is actually backed by an underlying array of data. If two threads were to append to the same list at the same time, which thread wins? There is only one last bucket, so the thread that arrived a picosecond behind the other would likely win the append, and the first thread's data would be lost. Even worse, suppose the append triggers a dynamic resize of the array: allocate a new array that is larger than the previous one, copy the data from the original, and replace the original with the new copy. If two threads execute that routine at the same time, what is the result going to be? Two new arrays allocated, two copies made, and then finally the replacement, where only one of the threads wins, again with data loss. This is only one of many situations where something that seems simple becomes very complex with threading.
Simply put, two threads cannot run at one time because when they both write to the same data structures, those structures will become corrupt and data will be lost.
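This is exactly what the GIL prevents in CPython: a bare list.append is effectively atomic, because each append executes while holding the interpreter lock. A small experiment (CPython-specific behavior) shows that many threads can append to one shared list without losing a single element:

```python
import threading

shared = []

def appender(thread_id, n):
    # In CPython, list.append runs under the GIL, so concurrent
    # appends neither corrupt the list nor lose elements.
    for i in range(n):
        shared.append((thread_id, i))

N = 10_000
threads = [threading.Thread(target=appender, args=(tid, N)) for tid in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000 -- every append survived
```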
But Java and other languages allow it!
Very true, but if you look closely at Java and the other languages that allow true threading, you will find that those languages have different versions of the same type of Collection, one that is “Thread Safe” and another that is not. A snip from the ArrayList JavaDoc.
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.)
ArrayList (Java Platform SE 8)
In order for Python to provide a non-GIL capability, it would then need to provide a "thread-safe" version of List, Dict, Set, and any other collection that could be used. If the existing ones were simply made thread safe, every application that uses those collections would slow down, because the work to make a collection thread safe is not free. If new data structures were created instead, that might work, but then developers would need to know when to use which. Perhaps that isn't too hard a requirement, but then consider all of the existing Python libraries you are now using which assume a GIL. They could no longer be used, because they would not be thread safe.
Thread Safe?
What does it mean to be thread safe? It means that the code is aware that multiple threads of execution are running concurrently, and that any action which might create a problem in that scenario is synchronized. To be synchronized means that only one thread at a time can execute a critical block of code: code that, if two threads executed it simultaneously, would corrupt data or otherwise cause harm. Each thread must wait for the previous thread to complete before it can run. In the case of the list above, the first thread to request the append would obtain a lock, resize the list, append its data, and release the lock. The second thread, given the lock, would see that the resize is no longer required and would only append its data, after the data of thread 1. However, just as with the GIL, thread 2 had to stop execution and wait.
Deadlocks anyone?
That is all fine and good, and could even be invisible to the developer, albeit with an inherent slowdown on every access to the data. But what happens if you have two threads and two different collections, and both threads need to modify both collections, but not in the same block of code? Thread 1 first modifies collection A and then collection B; in thread 2 the order is reversed, with thread 2 modifying collection B first. In the simple scenario, each thread makes its modification to the first collection; then each thread modifies the second collection without knowing the other thread has already changed it. While data is not lost, thanks to our fancy synchronized collections, data may be incorrect in both cases, because each thread assumed the data was consistent across both collections throughout its operation. In the more complicated scenario, you try to fix this by having each thread lock both collections before releasing either, ensuring that modifications are consistent. But when thread 1 locks collection A and thread 2 locks collection B, and thread 1 now needs collection B to proceed while thread 2 needs collection A, you have a classic deadlock, and both threads, along with any other threads that need those same resources, will never be able to continue. This is a simple illustration; the situations that happen in the real world can be much more complex and subtle.
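The classic fix for the deadlock described above is a global lock ordering: every thread acquires the locks in the same order, so the circular wait can never form. A minimal sketch, with two locks standing in for the two collections:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

def worker(name):
    # Both threads take lock_a BEFORE lock_b. If one thread took
    # them in the opposite order, each could end up holding one
    # lock while waiting forever for the other -- a deadlock.
    with lock_a:
        with lock_b:
            results.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # ['t1', 't2'] -- both finished, no deadlock
```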
Why do people use threads in Python?
Often the need for threads is created by an external resource that, if waited on, would slow a single-threaded application to a halt. Asynchronous IO is a potential way to stay in a single thread in such cases, but asynchronous IO can be devastatingly complex, and even when implemented correctly it makes code very difficult to read. Threads, on the other hand, let you write simple code without blocking the entire process on IO from the external resource. In this case the GIL imposes no penalty: the thread yields to the other available threads in the program while it waits for IO, and when the IO completes the thread is reawakened and continues processing as though it never stopped. This is the case for network calls, disk reads, database calls, and external API calls. Often IO is the vast majority of the time spent in a program, and the GIL makes it all easy without imposing an undue performance penalty.
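The IO case is easy to demonstrate. Because a waiting thread releases the GIL, several simulated IO waits overlap instead of adding up (time.sleep releases the GIL just like a real blocking syscall, so it makes a convenient stand-in for a network or disk call):

```python
import threading
import time

def fake_io_call():
    # sleep releases the GIL, so the other threads
    # run while this one is blocked "on IO"
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io_call) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Five 0.2s waits overlap: total wall time is ~0.2s, not ~1.0s
print(f"{elapsed:.2f}s")
```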
But my workload is CPU bound with little IO!
In that case, the GIL will be an issue for using threads. However, it is not an issue for using Python. Python provides an alternative multiprocessing module that uses a very similar interface to the threading interface and lets you easily add enough processes to maximize CPU utilization on any server. Each process gets its own GIL, so there is no issue with concurrency.
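A minimal sketch of that approach: spreading CPU-bound work across a process pool, where each worker process has its own interpreter and its own GIL and so runs truly in parallel. (The guard around the entry point matters on platforms that spawn rather than fork worker processes.)

```python
from multiprocessing import Pool

def cpu_work(n):
    # runs in a separate process with its own GIL, so several
    # of these calls can execute in parallel on separate cores
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_parallel():
    # four worker processes, each handed one chunk of work
    with Pool(processes=4) as pool:
        return pool.map(cpu_work, [100_000] * 4)

if __name__ == "__main__":
    print(run_parallel())
```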
To be fair, there are GIL-less Pythons (and probably Rubies).
But, have you seen what .NET looks like on the inside, where everything is made to be fast and GIL-free? It’s a nightmare, it could only have been written by people who were getting paid for it. The GIL makes the job of the runtime, compiler and interpreter writers much easier.
P.S. Also keep in mind CPython is a reference implementation - yes, it’s a working program, but it’s also supposed to be documentation. You can’t make it efficient without also making it ugly and unsuitable to be used as documentation.
I don't know how many people use Jython ... but I'm pretty sure it's a tiny fraction of those who are using CPython and it's probably a pretty small fraction of those who are using Python with NumPy.
(NumPy/SciPy drives a significant portion of Python usage ... and Jython doesn't support it nor the rest of its ecosystem).
But, why do people use NumPy and Numba and the rest of that ecosystem if CPython is crippled by the GIL?
Oh. It isn't! The performance isn't crippled by the GIL. Portions of NumPy are compiled C, written to release the GIL at a fine-grained level ... and also to use the SIMD features of those processors which have them.
How about other areas of CPython usage? How often does multi-threading performance really make a difference?
Systems administration tasks (such as those using the Paramiko (ssh) module, including Fabric and Ansible, and various others) are generally dominated by start-up times for new processes and network connection latencies. Threading isn't usually an issue for those ... and multiprocessing or distributed processes (various job queues using Redis and others) are generally preferable to multithreading in those applications.
I find that most questions about the Python GIL display ignorance and a bit of laziness about the issues around threading, processing, and distributed processing. People hear just enough about the GIL to decide that it must be some sort of insurmountable issue which should discourage the use of Python (or CPython) across the board.
The fact is that there are only a narrow range of applications in which multi-threading is potentially preferable to other approaches ... and fewer where it's optimal.
On the other hand there is a huge range of applications for a programming language which is generally easy to learn, relatively easy to read, includes a broad and useful range of standard libraries (batteries are included) and has huge collections of freely available modules, frameworks, libraries and tools.
Jython is Python. But CPython is the tool with that huge corpus of available software (only some of which is portable to Jython). The CPython ecosystem is far more extensive than Jython's.
Also, the start-up time for Jython is practically crippling for most of the simple administrative and utility use-cases at which Python excels. On my MacBook I can run time python -c 'import sys; sys.exit' in under three hundredths of a second (0.029 seconds). Jython generally takes closer to 2 seconds (between 1.81 and 1.99 seconds) to run the same code. That might not seem fair, but administrative and other command-line utilities are a use case where start-up time is important, and one-second load times would be completely disruptive.
The CPython interpreter uses a GIL because, perhaps counter-intuitively, it offered the best performance and the simplest design for the time when it was written.
When the CPython interpreter was originally written, the most common threading model was a single CPU-intensive main producer thread with multiple IO-busy consumer threads. Machines were typically single-core, so having a GIL in this situation is fine, especially since the interpreter itself means there is a lot of shared data between threads, all of which is reference counted. Remember, everything in Python is a reference-counted object, including integers and floats.
Having a discrete lock on every object would be lock-intensive and slower: a re-implementation of CPython with discrete locks per object was around 5% slower.
To use discrete locks (as the JVM does), a few things need to happen:
- Massively reduce the amount of shared reference counted objects - some recent changes and proposed changes move in that direction - sub-interpreters for example.
- A rework of the CPython core to use discrete locks.
- A rework of all Python libraries that manipulate the GIL.
These are all big projects and until item 3 is complete the GIL can’t disappear.
Yes, Java can indeed be used as an implementation language for creating a Python-like programming language with its own compiler or interpreter.
Here are the key points on how this can be accomplished:
1. Lexical Analysis:
This is the first stage where the source code is divided into tokens. Java provides libraries such as ANTLR (Another Tool for Language Recognition) which can help in writing lexical analyzers.
2. Syntax Analysis (Parsing):
After tokenization, the next step is parsing, where the tokens are transformed into a syntax tree. Again, tools like ANTLR or JavaCC (Java Compiler Compiler) can be used to generate parsers.
3. Semantic Analysis:
This involves checking the syntax tree for semantic errors. This can be custom-implemented in Java by traversing the syntax tree and applying rules specific to your Python-like language.
4. Intermediate Code Generation:
In this step, the syntax tree is transformed into an intermediate representation (IR). This IR can then be optimized before generating the final output. Java can be used to create and manipulate this IR.
5. Optimization:
Various optimizations can be applied to the intermediate code to improve performance. Java’s rich set of libraries and efficient algorithms can be used to implement these optimizations.
6. Code Generation:
The intermediate code is then converted into target code. This can be bytecode (if you're building a virtual machine) or machine code (if you're building a compiler). Java can generate Java bytecode dynamically using libraries like ASM or BCEL (Byte Code Engineering Library).
7. Interpreter (if applicable):
If you decide to create an interpreter instead of (or in addition to) a compiler, Java can be used to write the interpreter that executes the IR or the source code directly.
8. Runtime Environment:
If your language requires a runtime environment, Java can be used to create this as well. This includes memory management, garbage collection, etc., leveraging Java’s own runtime capabilities.
Example Tools and Libraries:
- ANTLR:
A powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used for language development.
- JavaCC:
Another parser generator for Java that can help in building compilers and interpreters.
- ASM:
A Java library for generating and manipulating bytecode, useful if your target is Java bytecode.
- BCEL:
Another library for bytecode manipulation.
Steps in Detail:
1. Design the Language Grammar:
- Define the syntax and semantics of your Python-like language.
- Use ANTLR or JavaCC to create a grammar file that defines the language structure.
2. Implement the Lexer and Parser:
- Use the grammar file with ANTLR/JavaCC to generate the lexer and parser.
- Integrate the generated lexer and parser with your Java application.
3. Build the Abstract Syntax Tree (AST):
- Create classes in Java representing the nodes of your AST.
- Use the parser to build an AST from the source code.
4. Semantic Analysis and Transformation:
- Implement visitor or traversal patterns to analyze and transform the AST.
- Ensure type checking, scope resolution, and other semantic checks.
5. Generate Intermediate Representation (IR):
- Define an intermediate representation for your language.
- Implement code generation to transform the AST into IR.
6. Optimize the IR:
- Apply optimization techniques to improve the IR.
- Implement common optimizations like dead code elimination, constant folding, etc.
7. Generate Target Code:
- Convert the optimized IR into target code (e.g., Java bytecode).
- Use libraries like ASM to generate the bytecode if targeting the JVM.
8. Implement the Runtime:
- Develop runtime support for your language (e.g., memory management, standard libraries).
- Leverage Java's runtime features for efficient implementation.
By following these steps, you can create a Python-like language using Java as the implementation language. The choice of Java provides strong type safety, a rich set of libraries, and the ability to generate and manipulate bytecode, making it a suitable choice for building compilers and interpreters.
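The front half of that pipeline (lexing, parsing, evaluation) fits in a small sketch. This toy is written in Python rather than Java purely for brevity, and handles only integer addition, multiplication, and parentheses; the same structure translates directly to an ANTLR- or JavaCC-generated front end:

```python
import re

# lexical analysis: split the source into number and operator tokens
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    tokens = []
    for number, op in TOKEN.findall(src):
        tokens.append(int(number) if number else op)
    return tokens

class Parser:
    """Recursive-descent parser and evaluator for the grammar:
    expr   := term ('+' term)*
    term   := factor ('*' factor)*
    factor := NUMBER | '(' expr ')'"""
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() == "+":
            self.advance()
            value += self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            assert self.advance() == ")", "expected ')'"
            return value
        return tok  # a number token

def evaluate(src):
    return Parser(tokenize(src)).expr()

print(evaluate("1 + 2 * (3 + 4)"))  # 15
```

A full implementation would build an explicit AST between the parser and the evaluator (steps 3-5 above) rather than evaluating during the parse, but the tokenizer/parser split is the same.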
There’s massive inertia in the system.
And the more software we have, and the more legacy systems we have to support and fix and extend, the slower it evolves.
It’s no different from in the 90s when everyone was asking why COBOL was still around when we had much better languages.
Well, COBOL still is around. (About five years ago, a friend of mine told me that the bank she worked at tried to replace an old COBOL system with C++ and couldn’t because C++ was too slow. Think about that for a second.)
So … finally … in the late 90s, there was a big shift. And Java caught that wave. Java was the winner at a time of huge expansion, and many enterprises deciding to finally bite the bullet and upgrade their 20 or 30 year old architectures.
It is massively locked in.
Now, there are places where that doesn't matter: areas which are new and fast-moving, where legacy issues don't matter much. For example, in robotics, in mobile, and in web services, Java is losing ground to other languages.
Google just made Kotlin an official language for programming Android. I’d be interested to know why Kotlin and not Scala or Clojure. My hunch is that both those languages are seen as bigger and heavier. And perhaps Clojure is still too exotic and difficult for people. (Despite being the nicest language on offer today)
I expect slow encroachment by Clojure (and Kotlin and Scala etc.) where more and more newbuild for the JVM is done using them. But there won’t be the big wave of adoption we saw with Java.
Partly, also, because Java was heavily pushed by a corporate sponsor, Sun, whereas Clojure and Scala etc. are coming from the small companies and communities. I’d guess that Datomic don’t have salesmen going into enterprises promising that Clojure will solve all their problems, the way Sun used to promote Java.
Erlang was designed to be concurrent and distributed from the get-go: the assumption, before the language existed, was that it would permit multiple, independent, processes running on collections of devices. So, the architecture is completely different and much more amenable to concurrent programming.
I use Python all the time, and I like it, but it’s just not good in this department. I’m not a Ruby user, but I have a friend who uses it all the time, and I’m confident he would say the same thing: it’s great when you need to whip out a useful, sequential program, and it has fantastic libraries for all kinds of things. But in Erlang (and Elixir, of course, because it also runs on the Beam), concurrent, distributed things are just much easier, because they are supported top to bottom.
Python and Ruby were designed as normal, sequential, imperative languages. Programs in these languages are thought of as conventional, stateful programs, and the interpreter itself is a conventional, stateful program that carries both the interpreter state and the program state. The two are intertwined. There is a global environment that you import things into and can extend at will, for example. When concurrent threads were bolted on, you not only had the shared state of the program threads; all the threads were running in an interpreter with lots of global state, and they were modifying their own memory state and the interpreter state, too.
The GIL is a rather crude solution to this problem: one giant lock that protects all the interpreter’s state. Since so many things modify it, effectively only one thread can actually be interpreted at a time.
A giant lock was not a new idea. Some operating systems did the same thing (see Giant lock - Wikipedia) as they moved onto SMP systems. I recall getting a big server machine back in the 1990s that had a global file system lock that mount requests had to acquire. This made loop-back mounting problematic (yes, we deadlocked the file system while others were using the server). Linux had a big kernel lock as it migrated to SMP systems: user programs were concurrent, but kernel code was not. The big kernel lock was finally eliminated in 2011. (Stay tuned for an echo of this story.)
The Beam, which is the modern Erlang virtual machine, was designed not to interpret a sequential language in a sequential environment. Rather it was designed to be something that implemented pre-emptive, shared nothing, concurrent processes in user space. In many ways, the Erlang runtime system (ERTS) is a mini-operating system that runs in support of an interpreter. Since the language is mostly functional and since processes don’t share anything, there is a lot less contention for shared resources. All that happens in the runtime system. Again, the interpreter was specifically designed to be run to interpret multiple concurrent processes, so they did what any good designer of a concurrent system does: eliminate as much shared state as possible and protect what you can’t eliminate, but they also designed the language semantics to make that easier. For example, there isn’t a global environment that all processes share and can add new variables to whenever they like. If it doesn’t exist, you don’t have to protect it! Each process has its own stack and heap, so there is no global garbage collection. Let me say that again: every Erlang process can run, allocate memory, and run the GC without coordinating with other Erlang processes or with the VM. Again, separate, non-shared things don’t require synchronization.
In fact, ERTS took some time to support SMP. Erlang programs ran concurrently without multi-core support until 2006. Here’s something really cool: when the runtime system started to support multicore computing, applications didn’t have to change. They just ran faster. That’s because of the insulation of the runtime system from the programs it runs — it’s more like an OS. Note the parallel here with Linux: first, a single scheduler could run multiple processes on multicore machines, though the ERTS was not itself distributed across cores; then the ERTS was eventually distributed across cores and user programs just reaped the benefits. This is what Linux would do some years later.
Today, the ERTS runs an Erlang VM scheduler per core and migrates Erlang processes from core to core based on load.
On a final note, the fact that distribution across machines was there from the start affected design decisions, too. Joe Armstrong once said, when asked why they didn’t have shared memory, something like, “in a system that runs on thousands of telephone switches all over Europe, where is the shared memory?” The idea that you can write a program that starts and monitors processes on other computers leads you to make pretty clear separations between processes and to be very clear about how they can communicate. If a process is to run without worrying about whether another process it talks to is on the same computer or one on another floor or in another city, then communication via messages is a natural choice.
If you’re interested in the gory details, look at the The Beam Book: The Erlang Runtime System. I haven’t read the whole thing (yet), but the few times I’ve looked at it, I’ve always learned something interesting.
I only know a bit of Python, but I assume Ruby is similar in this regard.
Both reference counting as a memory-management method and the Global Interpreter Lock were design choices made primarily to keep the implementation of CPython simple. Like Linux, Python and Ruby started as private initiatives by a single person.
Today, when scaling up typically means using many small cores in the cloud rather than a single, multi-core CPU, threading performance has become less of an issue, and initiatives like asyncio and the async/await syntax become more interesting.
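As a minimal, hedged sketch (the function names here are illustrative, not from any particular library), this is roughly what the async/await style looks like: many waits overlap on a single thread, so the GIL never becomes a point of contention.

```python
# Minimal async/await sketch: several simulated I/O waits overlap on one
# thread, so there is no GIL contention at all.
import asyncio

async def fetch(i: int) -> int:
    await asyncio.sleep(0.1)  # stand-in for a network or disk wait
    return i * i

async def main() -> list:
    # gather() runs all coroutines concurrently and preserves argument order
    return await asyncio.gather(*(fetch(i) for i in range(4)))

squares = asyncio.run(main())
print(squares)  # -> [0, 1, 4, 9]
```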
The GIL is per-interpreter, so it doesn't limit scalability that much - you can just spawn multiple processes if you want to use multiple cores. You can then use message passing to communicate between the processes - there is an overhead because you're no longer in a shared address space, but there are some benefits like increased reliability (independent copies of data; fits FP "view of the world", languages like Erlang love it) and flexibility (ZeroMQ can work over a network, probably with no changes to code, except if you need to make your program accommodate the higher latencies of some links).
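A minimal sketch of that process-plus-message-passing style, using only the standard library (the worker name is my own, and the fork start method used here is POSIX-only):

```python
# Each process has its own interpreter and GIL; data moves by message
# passing (pickled copies), not shared memory.
import multiprocessing as mp

def square_worker(inbox, outbox):
    """Receive numbers, send back squares; a None sentinel stops the loop."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)

ctx = mp.get_context("fork")  # fork avoids re-import issues; POSIX-only
inbox, outbox = ctx.Queue(), ctx.Queue()
worker = ctx.Process(target=square_worker, args=(inbox, outbox))
worker.start()
for n in range(5):
    inbox.put(n)      # each message is copied into the child process
inbox.put(None)       # sentinel: tell the worker to exit
results = sorted(outbox.get() for _ in range(5))
worker.join()
print(results)  # -> [0, 1, 4, 9, 16]
```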
It should also be noted that many things happen in parallel despite the GIL, like I/O and code that relies on some libraries (for example NumPy, a popular Python library, does all the number crunching outside the GIL and Python - your Python program will be utilizing the CPU very well, with just small constant overheads for getting data "across the border").
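A small sketch of the I/O side of that claim, using time.sleep as a stand-in for a blocking call that releases the GIL:

```python
# Blocking calls release the GIL, so I/O-bound threads genuinely overlap.
import threading
import time

def blocking_io():
    time.sleep(0.2)  # stand-in for a blocking syscall (network, disk, ...)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Four 0.2 s waits complete in roughly 0.2 s total, not 0.8 s,
# because the GIL is released during each sleep.
print(f"elapsed: {elapsed:.2f}s")
```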
About why they don't remove it, like the other answers mentioned, there are implementations without a GIL - but CPython and MRI are reference implementations, and a GIL makes the interpreter and standard library much simpler. Here is an example from Nobody understands the GIL that shows the same program in Ruby, JRuby and Rubinius:
array = []
5.times.map do
  Thread.new do
    1000.times do
      array << nil
    end
  end
end.each(&:join)
puts array.size

$ ruby pushing_nil.rb
5000
$ jruby pushing_nil.rb
4446
$ rbx pushing_nil.rb
3088
Note the sizes reported by JRuby and Rubinius are random - you'll get a different number on every execution. The number reported by MRI will be correct every time.
The same example will probably work fine in languages that never had a GIL - like C++, Java, C#, etc., but their implementations make sure that elementary operations are atomic - this adds tons of code everywhere, from reference counting, to maintaining the size of dynamic arrays, to even utility functions like date-time operations.
Now, an unsafe interpreter isn't so tragic - you will need some synchronization for your own code anyway, so you could easily make it so only one thread adds to that array at a time. If your application happens to be bottlenecked on the GIL, and you're using a reference implementation, it might be worth it to try JRuby or IronPython.
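The Ruby example above, rewritten in Python with an explicit lock (a sketch; with the lock, the count is correct on any implementation, GIL or not):

```python
import threading

array = []
lock = threading.Lock()

def push_nils():
    for _ in range(1000):
        with lock:  # explicit synchronization instead of relying on a GIL
            array.append(None)

threads = [threading.Thread(target=push_nils) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(array))  # -> 5000, deterministically
```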
Conclusion: For you, the GIL is bad, except if you're working on the interpreter or using the interpreter as documentation (CPython is surprisingly readable, MRI is probably the same). However, lots of applications don't bottleneck on the GIL, so you can often use the reference implementations - for the rest, there are alternative implementations without GILs.
P.S. Heavy edit thanks to User-11110385825575310654 :)
In theory, yes. The JVM executes Java bytecode, and as long as you can transform the scripting language into proper bytecode (as javac does for Java), the JVM can achieve performance similar to Java's. In practice, however, this is not easy, especially for dynamic languages like Python. The first part of a blog post by Charles Nutter (author of JRuby) (http://blog.headius.com/2008/09/first-taste-of-invokedynamic.html?m=1) describes this problem in detail. Dynamic typing limits the optimizations the JVM can do. In Jython (an implementation of Python on the JVM), every runtime object has to be an instance of PyObject or one of its subclasses. A simple integer addition x + y needs to first check x's and y's types, then unbox x and y from PyObject to int, add them, and finally box the result back into a PyObject. This is definitely slower than Java, which only needs to perform a single int addition. Runtime method binding is another big factor that slows things down - neither reflection nor invokedynamic is good enough performance-wise.
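A hedged sketch in plain Python of the boxing overhead described above; PyIntBox is my illustrative stand-in for Jython's PyObject hierarchy, not its real API:

```python
class PyIntBox:
    """Illustrative boxed integer, standing in for Jython's PyObject."""
    def __init__(self, value):
        self.value = value

def dynamic_add(x, y):
    # 1. check the runtime types
    if not (isinstance(x, PyIntBox) and isinstance(y, PyIntBox)):
        raise TypeError("unsupported operand types")
    # 2. unbox to primitive ints
    a, b = x.value, y.value
    # 3. add the primitives (the only step statically-typed Java needs)
    result = a + b
    # 4. box the result back up
    return PyIntBox(result)

print(dynamic_add(PyIntBox(2), PyIntBox(3)).value)  # -> 5
```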
I worked on another Python implementation on the JVM about two years ago, called Zippy (https://bitbucket.org/ssllab/zippy), which uses Oracle Labs' Truffle infrastructure. During that time, we compared the performance of various Python implementations on selected benchmarks, and Jython was far from Java-like performance. On the benchmarks we used, Jython didn't even outperform CPython. I don't know the current status of Jython, but I think it still needs a lot of work to improve. In my opinion, the JVM is a great platform for implementing high-level languages. It offers GC, multithreading, and other runtime services almost for free. But to reach peak, Java-like performance on the JVM, a lot of work is still needed.
Perhaps they have, outside of mainstream industry.
It's common when learning computing to 'bond with' a language. Somehow, the way it expresses ideas, and the way you think about solutions lines up so well it seems like an extension of your mind.
It becomes your go to language. It has perceived and real benefits over others. We won't mention the weaknesses; we never do.
Industry doesn't care a fig.
- Can I get lots of programmers?
- Are they cheap?
- Is the language robust?
- Does it have a support system of IDEs, test tools, books, tutorials?
- Is everyone else using it?
This creates an inertia around that which is popular and 'good enough'.
It seems to change on a roughly 25-year cycle to me. In 1992, we wrote GUI apps in C, and C++ was 'the future'. The web hadn't been invented yet, so we didn't care about web frameworks.
Maybe Clojure and Scala will win, but currently they don't seem to have that momentum behind them.
But the answer to your reasonable question is that industry trends towards lowest risk, not highest intellectual value.
The Python GIL does support multi-threading - hence the standard threading module.
What the GIL does is ensure that each Python OPCODE is atomic - i.e. two OPCODEs can’t run at the same time.
If you have two threads there is NOTHING to stop the thread scheduler running an OPCode from one thread, and then an opcode from another thread - that is exactly what multi-threading on a single CPU does - each machine code instruction is atomic.
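You can see that interleaving risk directly with the dis module: even a simple increment compiles to several opcodes, each atomic on its own but not atomic together (exact opcode names vary by CPython version):

```python
import dis

def bump(n):
    n += 1
    return n

instructions = list(dis.get_instructions(bump))
for ins in instructions:
    print(ins.opname)
# Typically: load n, load the constant 1, add, store n, then return --
# a thread switch can land between any two of these.
```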
Some opcodes can be long running, which can occasionally make multi-threading feel less reactive than some would like.
I acknowledge that the GIL prevents threads of the same process from executing Python bytecode in parallel across the cores of a multi-core CPU, and this limits the effectiveness of some multi-threading patterns (for instance, using a thread to ‘delegate’ CPU-heavy work). But other patterns do work effectively - for instance, a producer thread feeding multiple I/O-heavy consumer threads; and if you need to ‘delegate’ CPU-heavy work, you can use a multi-processing architecture.
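A sketch of the producer/consumer pattern mentioned above, using the thread-safe queue module (the doubling "work" here stands in for an I/O call):

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def producer(count, n_consumers):
    for i in range(count):
        tasks.put(i)
    for _ in range(n_consumers):
        tasks.put(None)  # one poison pill per consumer

def consumer():
    while True:
        item = tasks.get()
        if item is None:
            break
        # pretend this doubling is an I/O call (HTTP fetch, DB query, ...)
        results.put(item * 2)

n_consumers = 3
threads = [threading.Thread(target=producer, args=(10, n_consumers))]
threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(results.get() for _ in range(results.qsize()))
print(total)  # -> 90, i.e. 2 * sum(range(10))
```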
However the idea that the CPython GIL prevents multi-threading is a myth.
There are several, including Jython, IronPython, and PyPy-STM (Software Transactional Memory - PyPy documentation).
However, it’s very likely that your question is misguided. The GIL is one of those issues people raise as an objection to Python without any serious foundation in real-world software engineering, and in ignorance of the many ways to work at scale despite the global interpreter locking done by (C)Python in each interpreter process.
If you implement a design which scales across multiple processes with the multiprocessing module, then your Python code can leverage the fine grained locking which has been built into a modern operating system kernel.
If you use appropriate abstractions of your interprocess (and inter-thread) communications, then your design can be implemented to run across multiple nodes and scale across clusters.
These techniques go way beyond the gains to be had by eliminating the GIL.
Also if you’re using native binary modules such as those at the core of NumPy and some other SciPy components then some of those already work around the GIL (can already scale to multiple processors without GIL contention) and some of them are capable of using CUDA and other GPU interfaces and libraries to offload much of their vector processing away from the main CPUs (where the GIL is irrelevant).
So, the question becomes: why do you think you’d see some advantage from a version of Python without the GIL?
Hello,
the only programming language which doesn’t need any work to be executed is… binary (processor-specific) machine code.
And you’ll need to make sure you access the correct memory address where it lives in order to run it.
I won’t even speak about assembler, because it also requires “more work” to be understood by the processor.
I’m really speaking about an unintelligible (or nearly so) sequence of bits, read as ‘1’ (if current flows) and ‘0’ (if not), which is the only thing the processor can deal with…
By the way, you are completely wrong in thinking that the OS can execute Ruby or Python directly…
Both are interpreted languages, just like PHP and others, and require an interpreter to be executed.
If you don’t install the correct interpreter, you have no way to execute them.
You may have the impression that the OS executes such code without any further requirement (when it is saved in a file with the correct extension) thanks to “file association”, which lets the OS know it has to launch the correct interpreter in order to execute the file…
C++ is a compiled language. That means that you’ll need to compile again every file having changed since the last time it has been compiled.
But the final result is a file containing (processor-specific) binary instructions ready to be used by the processor.
Java sits midway between compiled and interpreted languages:
The code you wrote is “compiled” to produce an “intermediate” code which can then be… interpreted by the virtual machine.
Some people say that it gives you the best of both worlds (compilation vs. interpretation)…
I’m not sure it doesn’t give you the worst of them 😉
At the moment I think two main reasons:
- Existing projects expect the GIL to be there, it would break compatibility to remove it.
- Investment. Python doesn’t have a massive corporate sponsor like Java or C#, it’s a lot of work, and someone is going to have to pay for it.
I have extensive experience with Python. The GIL is not as big a problem as you might think. Long before the GIL starts showing its ugly head, you have garbage collectors slowing you down, and writing files to disk or sending data over the network doesn’t tax the GIL. In the largest system I have been involved in, it is the database that keeps slowing things down.
There are ways to see if you are hitting the GIL, you can google them. First make sure that is the thing slowing you down. Almost always, it is not the case.
As pointed out below, Jython exists. Unfortunately, it often lags the current version of Python considerably. It has gotten to 2.7, but only because Python itself got stuck at 2.7 for way too long. That puts it only 9 serious versions behind HEAD…
But the assertion of greater speed does not bear out. Jython can be a lot faster, for extremely simple things. But it is in general significantly slower, and the slowdown increases with complexity.
Python is already converted to its own notion of bytecode, skewed to make Python in particular execute quickly, and there are implementations that compile down to native code. At the same time, Python interfaces easily to C++, so when a standard library module written in Python performs too poorly, it can be rewritten for speed and distributed on a per-platform basis. So the execution speed of individual instructions or library-function calls is not the reason Python is slow. It is slow because of the very-high-level design that comes with its origins as a scripting language.
If an integer variable carries enough extra information to know its type (rather than compiling in information about the type, because we know nothing about the type of anything at compile time), and you can hook an identifier lookup at three different levels (__get__, __getattr__, and __getattribute__), and the object creation process has four independent stages (the metaclass, the class __call__, __new__, and __init__) none of which can generally be skipped, and virtually everything is a hashmap including your list of globals and module contents (so calling dict() or list() is several times slower than creating {} or []), and polymorphism is accomplished by testing what constructor made you (often several times)…
You pay for all that. And it has nothing to do with the bytecode. You simply cannot meet all of the flexibility expectations laid upon you and compile Python into anything quick, because the design decisions are all made for straightforwardness and intuitive cohesion, not for speed.
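Two of those lookup hooks can be seen in a few lines (a sketch, not the full protocol: the descriptor hook __get__ is a third layer not shown here):

```python
class Hooked:
    present = "found normally"

    def __getattribute__(self, name):
        # runs for EVERY attribute access on the instance
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # runs only after normal lookup has failed
        return "fallback for " + name

obj = Hooked()
print(obj.present)  # -> found normally
print(obj.missing)  # -> fallback for missing
```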
What’s the difference between C++ and Java? Both are type-safe, object-oriented languages (not strictly object-oriented in either case), allowing some elements of functional programming style (lambdas). The difference is their usual environment: C++ is normally compiled into machine code, while Java is normally compiled into bytecode (an intermediate code) first and then, during runtime, compiled in several passes, optimizing more and more, into machine code.
In general, the language definition itself doesn’t specify this behaviour (the Java ecosystem does, but not the Java language): both languages, C++ and Java, can be interpreted, compiled or using an intermediate code later on just-in-time compiled into machine code.
The very same is true for Python and Ruby.
Also: “can be” doesn’t mean it is usually done: usually, C++ is compiled at build time to machine code and then executed, while Java is usually compiled into an intermediate code (bytecode in case of Java) that is then bit by bit compiled by the JIT (Just In Time) compiler into machine code.
Both languages have prominent ways of being executed differently: C++ can, as C#, be compiled into “common intermediate language”, which is similar to Java’s bytecode and will also be interpreted and just-in-time compiled then. And Java now has GraalVM which can create machine code and directly executable binaries during build time as well. GraalVM hosts a whole bunch of other languages as well, including Python and Ruby. And then there’s JShell, which could be thought of an interpreter as well (however, there’s bytecode etc. under the hood as well).
Of course you’ll need a C++ compiler and/or the JDK or GraalVM installed on your computer to have it running. However it’s not unusual for a Linux distribution to either have one on board or have it easily installed.
The very same thing is true for Python or Ruby: their interpreters have to be there as well, however it is even more usual to have it on board with a regular Linux distribution. But look at Windows: no Python, no Ruby, no Java, no C++ out of the box. And thinking about JRuby, Ruby can even run on the Java VM.
Note that with Java 21, writing shell-script-like programs (with an initial shebang line) is becoming even easier. For now it requires a flag, since it’s a preview feature, but you no longer need to define a class, access modifiers (public, private, etc.) or static:
The file helloworld:
#!/usr/bin/env -S java --enable-preview --source 21

void main() {
    greet();
}

void greet() {
    System.out.println("Hello world.");
}
Make it executable with chmod +x helloworld and run ./helloworld:
Hello world.
Where’s the difference to Python or Ruby now?
(Thanks to a colleague for this hello world example…)
The only languages directly executed by the operating system I can think of are the old BASIC interpreters on 8-bit home computers such as the Commodore 64: its BASIC interpreter sort of *was* the operating system, and it actually interpreted, generously seen, a kind of intermediate code as well: it didn’t read “p”, “r”, “i”, “n”, “t” when executing a “print” command in a program; instead it pre-interpreted (calling this compiling would go a step too far) those five letters (or any other BASIC command) into a token that was then stored. That saved space and brought speed, both of which were essential to get anything done beyond a hello-world program… (However, even on those machines other languages could be used, then making use of only the half of the operating system that took care of I/O and of those parts that disguised themselves as an excuse for a file system…)
Note that there are (or were) a couple of very specialized pieces of hardware able (after a more or less simple interpreting step) to execute high-level languages directly: there are processors capable of executing Java bytecode directly, and there are processors capable of executing Forth directly.
So, you're diving into Python, and you've hit upon the infamous Global Interpreter Lock, or GIL, in your studies or maybe interviews. It's a bit of a hot topic, especially when we talk about multithreading in Python. Let's break it down!
Understanding the GIL:
Alright, first things first - what's the Global Interpreter Lock, and why does it exist? Well, in Python, the GIL is like a traffic cop for your threads. It stands guard over your Python code and ensures only one thread executes Python bytecode at a time. Sounds a bit restrictive, right? But it's there for a reason - Python's memory management isn't thread-safe.
Now, here's the kicker - this lock can sometimes be a bottleneck, especially in CPU-bound tasks where multiple threads could be utilized more efficiently. But don't worry, we'll get to how we can dance around it!
Multithreading in Python:
So, when you're working with multithreading in Python, the GIL comes into play. It's like having multiple chefs in a kitchen, but only one of them can actively chop veggies at a time. Not exactly a parallel cooking party, right? Threads are awesome for I/O-bound tasks like web scraping or fetching data from a database, but for CPU-bound tasks, they can be a bit hamstrung by the GIL.
Working Around the GIL:
Now, let's get into the nitty-gritty of how we can work around this GIL issue.
1. Multiprocessing Module:
Python gives us the `multiprocessing` module, which is like threading but with separate processes. Each process gets its own Python interpreter and memory space, bypassing the GIL. It's a handy way to parallelize CPU-bound tasks.
from multiprocessing import Pool

def cpu_bound_task(x):
    # Your CPU-intensive task here
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        result = pool.map(cpu_bound_task, range(10))
2. Asyncio for I/O-Bound Tasks:
When it comes to I/O-bound tasks, the `asyncio` module can be a game-changer. It allows you to write asynchronous code that doesn’t get blocked by the GIL, making efficient use of your CPU.
import asyncio

async def io_bound_task():
    # Your I/O-bound task here
    await asyncio.sleep(1)
    return "Task complete"

async def main():
    tasks = [io_bound_task() for _ in range(5)]
    await asyncio.gather(*tasks)

asyncio.run(main())
3. Using External Libraries:
Some external libraries are designed to release the GIL during certain operations. For example, the `numpy` library releases the GIL during numerical operations, making it suitable for parallelizing certain tasks.
import numpy as np

def parallel_numpy_task():
    array = np.random.rand(1000000)
    result = np.sum(array)
    return result
4. Cython and Extensions:
If you're up for a bit more advanced stuff, using Cython or writing extensions in languages like C can help. These languages can execute code without the GIL interference.
# Cython example
# my_module.pyx
def cython_cpu_bound_task():
    # Your Cythonized CPU-bound task here
    return result

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("my_module.pyx")
)
5. Jython or IronPython:
If you're feeling adventurous, you could explore Jython (for the JVM) or IronPython (for .NET). These Python implementations run on virtual machines that don’t have a GIL, offering parallelism without the GIL bottleneck.
So, there you have it! The GIL can be a bit of a hurdle, especially in CPU-bound scenarios, but Python provides various ways to navigate around it. Whether it's multiprocessing, asyncio, using external libraries, diving into Cython, or exploring alternative Python implementations, you've got options to make your code run more smoothly.
By
- baking the weak memory consistency models of networks into the language
- not sharing any memory except for message boxes
- making messages immutable
In Ruby and Python, you see a sequentially consistent threading model. In a language where everything might be shared via shared memory, that is not efficiently possible on modern hardware, so on many architectures you end up with a GIL. On the x86-64 family a GIL is not necessary to simulate sequential consistency, so there is actually a hypothetical lock-free implementation of Python for x86-64 (the Python programs would still use locks, but the Python runtime wouldn't). Such implementations might be somewhat faster than locking implementations, but without measurements I cannot tell.
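The distinction between the runtime's lock and the program's locks can be made concrete. This sketch shows why Python code still needs its own locks even under a GIL: the GIL serializes individual bytecodes, not whole operations, so a read-modify-write on shared state can lose updates (time.sleep(0) is used here only to force a thread switch at the worst moment).

```python
# Sketch: the GIL protects the interpreter's internals, not your data.
import threading, time

counter = 0
lock = threading.Lock()

def racy(n):
    global counter
    for _ in range(n):
        tmp = counter
        time.sleep(0)        # force a thread switch between read and write
        counter = tmp + 1    # may overwrite another thread's update

def safe(n):
    global counter
    for _ in range(n):
        with lock:           # a program-level lock makes the update atomic
            counter += 1

def run(worker, n=1000, threads=4):
    global counter
    counter = 0
    ts = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

print(run(racy))   # usually far less than 4000: updates were lost
print(run(safe))   # always 4000
```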
In D, C++, Java, C#, and others, not everything can be shared; only the things annotated as shared (depending on the language, via keywords such as atomic or volatile) are shared between threads. Thus a global execution lock is also not necessary, and it can be replaced by other means of synchronization which, when only a few accesses are shared, are considerably faster.
There are three things you need to understand for this question’s answer.
- Interpreted vs compiled languages
- Late vs Early binding
- Dependency Hell
Interpreted vs Compiled Languages
Python and Ruby are interpreted.
C and C++ are compiled
First let’s make one thing absolutely clear - Python and Ruby do not need a virtualenv tool any more than C or C++ (C/++) do. So why is virtualenv an essential tool for the former but not the latter?
Let’s look at the real difference. Python and Ruby need a runtime but C/++ executables are standalone.
This is because Ruby and Python are interpreted while C/++ is compiled. Read more about this distinction here: Compiled language vs interpreted language?
Late vs Early Binding
Python and Ruby bind late.
C and C++ bind early by default.
Second, you need to understand that compiled languages generally use early binding and interpreted languages use late binding. This isn’t a strict rule; you can do late binding in C/++, but early binding is the default.
What does early and late binding actually mean?
Inside your computer every function has an address in memory. If you want to run that function you need to know what that address is. This is called the function’s reference.
With early binding the compiler looks up the reference for that function when the executable is compiled. The function and the rest of the code are bound together at that early stage and cannot be changed.
With late binding that reference is not looked up until it is needed at runtime, much later.
Read more here: What is the difference between early binding and late binding in C++?
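Late binding is easy to see in Python itself. A minimal sketch: the name greet is looked up each time caller() runs, so rebinding the name changes what gets called after the fact.

```python
# Sketch: names are resolved at call time in Python (late binding).
def greet():
    return "version 1"

def caller():
    return greet()        # 'greet' is looked up here, at runtime

print(caller())           # version 1

def greet():              # rebinding the name...
    return "version 2"

print(caller())           # version 2: caller() now finds the new function
```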
Dependency Hell
Late binding can lead to dependency hell. Virtualisation solves this problem.
The third thing to understand is hell created by lots of late bound dependencies. With modern code we use lots of shared libraries. If you and I both parse some XML then we might use a shared library.
Shared libraries get updated all the time. Sometimes those changes break existing code. You can read more here: A definitive guide to API-breaking changes in .NET
Let’s make up a pretend XML library that you and I use. It has a function called XMLPARSE that takes an XML string and returns a dictionary.
There are 2 versions of XMLPARSE and they are not compatible. The dictionary returned from version 1 is nothing like the dictionary returned from version 2.
If we used early binding, the default in C/++, then this isn’t a problem. The two versions have different references, so no problem.
With Ruby and Python the binding is late. If version 2 and 1 have the same name, then we can only have one of them available.
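That "only one of them" constraint is visible in the interpreter itself: Python's import system keeps exactly one module object per name in a process-wide cache, so two versions under the same name cannot coexist. A small sketch using a stdlib module (the XMLPARSE library above is imaginary):

```python
# Sketch: sys.modules holds one module per name, so a second import of
# the same name just returns the cached module.
import sys
import json
import json as json_again

print(json is json_again)            # True: both names hit the same cache entry
print(sys.modules["json"] is json)   # True
```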
Let’s imagine that you and I each have a library up on GitHub. Mine uses version 1 of XMLPARSE. Yours uses version 2.
Now somebody wants to use both libraries at the same time.
They install my code, which lists V1 of XMLPARSE as a dependency. So they install V1 and all works fine.
Then they install your code. They already have XMLPARSE so they think all is good. It fails and they realise it needs V2 so they upgrade. Now my code is broken. How can they fix my code without breaking yours?
This is a massive problem in any environment with shared libraries. It’s called dependency hell: Dependency hell - Wikipedia
This is where virtualenv comes to the rescue. Virtualenv allows you to create isolated virtual environments for code to run in.
This means that my library can be placed into one virtual environment alongside version 1 of XMLPARSE while yours is put in a separate virtual environment with version 2.
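A sketch of that isolation using the standard library's venv module (XMLPARSE is still the imaginary package from above; with_pip=False just keeps the demo fast):

```python
# Sketch: two isolated environments; each has its own site-packages, so
# env_v1 could pin XMLPARSE v1 while env_v2 pins v2.
import os, tempfile, venv

root = tempfile.mkdtemp()
for name in ("env_v1", "env_v2"):
    venv.create(os.path.join(root, name), with_pip=False)

# Each environment is a self-contained directory tree
print(sorted(os.listdir(root)))  # ['env_v1', 'env_v2']
```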
In conclusion, the actual answer
Virtualenv solves the problem of dependency hell. A problem that is created by late binding.
Ruby and Python are both late binding languages, so they have this problem. This makes virtualenv an indispensable tool for deploying these languages.
C/++ is early bound by default so this is less of a problem but it still exists. For example, Windows is written mostly in C but it has late bound Dynamic Link Libraries (DLLs). Dependency hell was a massive problem, although it was generally called DLL Hell: DLL Hell - Wikipedia
If you look at the solutions section of that Wikipedia article you will find the same solution as the one provided by virtualenv: Application virtualization - Wikipedia
Python normally does not run on JVM.
The most used Python implementation (CPython) uses its own virtual machine, though you can run Jython (a JVM-native Python) or try to use GraalVM to run Python code.
The difference in performance lies in the static vs dynamic nature of the languages in question. The types of the objects you reference in your program (including ‘primitive’ types like ints or doubles) are known statically in Java (as is each object’s layout), but are inherently dynamic in Python. In the latter case some of the work the compiler could do at compile time is shifted to runtime.
Python is often used in scenarios where most of the heavy lifting (like numeric manipulations) is performed outside of the language itself in heavily optimised external libraries (like numpy). Even java can't always beat this kind of numerical performance (but it can use the same libraries).
BTW, Python running on GraalVM can be faster than “standard” Python when executing pure Python code (but somewhat lags on versions). But the price of calling external libraries is much worse.
“What is the difference between C++ and Java? Does C++ need an interpreter or compiler to be executed, or does it execute directly by the OS like Python and Ruby?”
There are two kinds of compiler, not just one.
A byte compiler parses the code, and for each piece of code a number is emitted.
A native compiler parses the code, and for each piece of code a piece of native code is emitted.
- C++ is compiled using a native compiler. The native code is then run directly on the operating system.
- Python (at least the CPython version) is byte compiled. The byte compiled code is then run on an interpreter.
- Java is byte compiled. It is then run on an interpreter. When the code is run a lot, it is native compiled, so it runs faster. This process of byte compilation followed by native compilation is called JIT (just in time compilation).
- Ruby used to be like Python, byte compiled, but now it is more like Java, JIT compiled
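The byte-compilation step for CPython can be seen directly with the built-in compile(): the source becomes a code object whose co_code attribute is the raw bytecode stream the interpreter loop then executes.

```python
# Sketch: CPython byte-compiles source into a code object; co_code is
# the bytecode that the interpreter loop runs.
code = compile("x = 1 + 2", "<example>", "exec")

print(type(code.co_code))   # <class 'bytes'>: the raw bytecode
namespace = {}
exec(code, namespace)       # the "interpretation" step: run the bytecode
print(namespace["x"])       # 3
```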
Which Python virtual machine? There is no standardized VM for Python.
CPython comes with its own implementation, but it is not documented for use by other languages. Languages have no reason or incentive to target it, and no guarantee that it won’t change dramatically between releases (breaking outside implementations). It’s intended as an internal reference, and was never meant as a target for other languages to use. It also lacks any compelling performance or technical advantages over the JVM or other widely available options. In short, it’s all downside with no upside.
The PyPy implementation of python uses a JIT/rpython stack that is documented and does have interesting performance characteristics. Consequently, it has had numerous languages built on it: Javascript, Scheme, Ruby, IO, Prolog, PHP, etc. But it’s a much younger and less mature project than the JVM, with far less corporate backing.
There are other Python implementations, many of which also use VMs that are targeted by a lot of languages (including Python implementations that use the JVM itself, or Javascript VMs, or the Microsoft/C# CLR)
Also remember: for quite a long time Java was bundled in most web browsers as a target client language: if you wanted to have your code run on the client-side, then the JVM and Javascript were your only major options. And Javascript was pretty low performance for a long time.
So a lot of languages targeted the JVM as one of two choices, and the only one with decent performance and server-side support (this was long before the emergence of Node and other options that really made Javascript viable on the server side).
It's actually pretty brilliant. Compiled languages are always going to be faster than interpreted languages. But they have the downside that you need to build a compiler for every target operating system, and Java began life as an embedded language to be run on devices. So they came up with this hybrid concept where you compiled your source into byte code, which was machine independent. The bytecode was easily mapped to most machine languages so it was then very easy to build interpreters for any target platform. Of cou...
I've been in the internet technology space for close to twenty years, and it never occurred to me to question the difference in terminology.
I suspect it has to do with two things - usage and marketing.
Traditionally, JavaScript was embedded in something. Initially, just in a browser. Then in the web application server with SSJS and Microsoft's ASP. Then in various applications, like Flash and so on. There was no standalone script processor for running generic scripts from, say, the command line. It may be that it was seen as something sitting alongside a rendering engine (terminology which predates JavaScript). So architecturally, you had a document rendering engine and a script execution engine sitting side by side in the browser.
On the other hand, Ruby and Python were built as standalone interpreters much like Perl or even the various command shells, which could be used for generic scripting and could be invoked from the command line, although they were certainly more commonly invoked behind a web server. Only with something like node.js do we have something similar for JavaScript that can be invoked from the command line, and even that is more of a full-fledged server application framework.
Secondly, as dynamic DOM manipulation, Ajax, and the other things that became HTML5 grew more popular, JavaScript performance in the browser became more and more important. Moreover, as web browser rendering frameworks became more convergent (for example, Safari, Chrome, and Opera all used WebKit), JavaScript performance and its execution model became a differentiating factor, so each manufacturer tried to use it to market their browsers, using names like Nitro and V8 to get your heart pumping.
Together, these two things have probably contributed to us calling one an engine and the other an interpreter.
A virtual machine is a virtual computing environment with a specific set of atomic well defined instructions that are supported independent of any specific language and it is generally thought of as a sandbox unto itself. The VM is analogous to an instruction set of a specific CPU and tends to work at a more fundamental level with very basic building blocks of such instructions (or byte codes) that are independent of the next. An instruction executes deterministically based only on the current state of the virtual machine and does not depend on information elsewhere in the instruction stream at that point in time.
An interpreter on the other hand is more sophisticated in that it is tailored to parse a stream of some syntax that is of a specific language and a specific grammar that must be decoded in the context of the surrounding tokens. You can't look at each byte or even each line in isolation and know exactly what to do next. The tokens in the language can't be taken in isolation like they can relative to the instructions (byte codes) of a VM.
A Java compiler converts Java language into a byte-code stream no different than a C compiler converts C Language programs into assembly code. An interpreter on the other hand doesn't really convert the program into any well defined intermediate form, it just takes the program actions as a matter of the process of interpreting the source.
Another test of the difference between a VM and an interpreter is whether you think of it as being language independent. What we know as the Java VM is not really Java specific. You could make a compiler from other languages that result in byte codes that can be run on the JVM. On the other hand, I don't think we would really think of "compiling" some other language other than Python into Python for interpretation by the Python interpreter.
Because of the sophistication of the interpretation process, this can be a relatively slow process....specifically parsing and identifying the language tokens, etc. and understanding the context of the source to be able to undertake the execution process within the interpreter. To help accelerate such interpreted languages, this is where we can define intermediate forms of pre-parsed, pre-tokenized source code that is more readily directly interpreted. This sort of binary form is still interpreted at execution time, it is just starting from a much less human readable form to improve performance. However, the logic executing that form is not a virtual machine, because those codes still can't be taken in isolation - the context of the surrounding tokens still matter, they are just now in a different more computer efficient form.
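CPython's .pyc files are exactly this kind of pre-parsed intermediate form. A sketch with the stdlib py_compile module:

```python
# Sketch: py_compile writes the pre-compiled bytecode form (.pyc) so
# later runs skip the parsing and tokenizing work.
import os, py_compile, tempfile

src = os.path.join(tempfile.mkdtemp(), "mod.py")
with open(src, "w") as f:
    f.write("VALUE = 42\n")

pyc = py_compile.compile(src)   # returns the path of the .pyc it wrote
print(os.path.exists(pyc))      # True
print(pyc.endswith(".pyc"))     # True
```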
For some reason, people keep trying to picture Java as an arcane and dying language. This is especially true amongst non-Java developers.
I agree that Java isn't a language designed in the 21st century and it was stale for a while (until Java 8 came out), but it has taken a considerable position in enterprise software, kind of like that of C in operating systems and C++ in games. So there may be hundreds of languages on top of the JVM, and some may even produce apps with the same level of guarantees, but Java itself will not die. Java is immortal now.
PS: I'm not a java developer and there are dozens of languages I'd prefer working over java.. But I've got to respect the position java earned for itself.
Java and Python are at opposite ends of the spectrum on this issue.
- Java uses private for safety and security: you can mix code from two different sources, and library A can’t abuse the internals of library B. Well, actually, it can, but at least Java tries. Not that this has ever come up in real code anyway. Python, meanwhile, assumes programmers are consenting adults; a convention telling people which members are private and shouldn’t be messed with is good enough. Most other modern OO languages are in between: private is there to protect against accidentally abusing another class—which turns out to be helpful a lot more often than Python thinks—but if you really want to reinterpret_cast and pull the values out, you can.
- Java also uses private members, with getters and setters, to force people to write encapsulated classes. Writing set_x is supposed to be such a pain that you’ll think twice about whether your x should really be settable, because probably it shouldn’t. Of course nobody does think twice, especially since IDEs just do it for all your members automatically, so it’s just making everyone’s code more verbose and obfuscated for no good reason. Meanwhile, in Python, if you think your x should be settable, you just make it a public member. Most other modern OO languages are in between—they discourage public members, but aren’t dogmatic about it.
- Java’s getter/setter idiom also means that, even if you didn’t really need a setter in version 1.0 of your library, if you turn out to need one in version 1.1 (say, to validate the value), you already have a setter, so you don’t break the API. Python solves that with @property. Most other modern OO languages do the same. Java, every new version, the users suggest properties, and the architects refuse to add them, because it will encourage people to write exactly the kind of bad code everyone always writes.
- Java has a strong separation between interface and implementation, even having separate syntax and semantics for interfaces and classes. Python goes for flexibility—interfaces (ABCs), enums, etc., they’re all just classes, and you can hook classes together in almost any way you can imagine. Private members are a big part of implementation hiding, so Java needs them; Python doesn’t have that need. (Or, if you occasionally do, you do it manually; or, if you need to do a lot of it, you write a mixin class or metaclass that automates the manual stuff. But it rarely comes up.)
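The @property point from the list above, sketched: x ships as a plain attribute in v1.0 of a hypothetical class, and v1.1 adds validation without changing any call sites.

```python
# Sketch: @property lets a plain attribute grow validation later
# without breaking the callers' `p.x = value` syntax.
class Point:
    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        if value < 0:                      # the "v1.1" validation
            raise ValueError("x must be non-negative")
        self._x = value

p = Point(3)
p.x = 5                  # same syntax as when x was a bare attribute
print(p.x)               # 5
```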
It’s worth noting that this whole thing is really much less of a big deal than people thought 20-odd years ago. Deep hierarchies of classes with protected methods all over the place are really important for GUIs, and complex simulations, and… just about nothing else. So, fortunately, the holy wars over whether private is necessary or sometimes useful or evil are pretty much dead.
And meanwhile, there are other, much more massive, differences between Python and Java classes. Java classes are as much about member layout in memory as about functionality; Python classes normally don’t even define members at all; you just set them as needed (often in __init__, but there’s nothing magic about that), and as for member layout, the object is just a namespace, like globals, with its members (usually) in a dict. Python classes make all lookup dynamic, rather than having virtual and override and final. Python has multiple inheritance (with a linearized resolution order), and uses it for all kinds of things; Java doesn’t allow it. Python has metaclasses and hooks all over the place to let you customize how classes and their instances get built; Java keeps as much of that consistent as it can rather than flexible. And so on. Compared to these differences, public/protected/private is nothing.
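The "object is just a namespace" point above, sketched:

```python
# Sketch: a Python instance is just a namespace; members appear when
# assigned and live in an ordinary dict.
class Box:
    pass

b = Box()
b.anything = 1            # no declaration needed: the member is created now
b.other = "two"

print(b.__dict__)         # {'anything': 1, 'other': 'two'}
```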
It Depends (tm).
The simple answer is that Python and Java have different design philosophies.
The more complete answer is either:
- Too complicated for you to understand, or
- To be found in your notes that you took in class. You did take notes, didn't you, to help you answer the assignment questions?
Actually, maybe both of the above apply to you.