How does a high level programming language (such as Java, C, C++) work?
-
How are instructions parsed? What does a compiler do? How does it work? How does the information get translated into bits and bytes that run on a chip in the end? How do these translated instructions get executed on the chip, and how is the information compiled back and presented as console output to the user? Etc.
-
Answer:
Computers only understand binary, which is a really clunky way for humans to work. The original computers were programmed by literally toggling switches on the machine itself. Each generation of languages has sought to make programming easier and more powerful.

It starts with the programmer, who decides which language will do the best job for the task at hand. There are some major categories of languages, and compiled versus interpreted is a big one; which you use depends on your needs. Compiled programs are translated ahead of time into the binary machine code that runs on your specific OS/hardware. Interpreted programs are typically compiled to byte code, and when they run, a runtime of some sort "interprets" that byte code, converting it to actual machine instructions. Compiled programs generally run faster because they skip the middle man. Interpreted programs run on more than a single platform/OS, allow changes on the fly, and are more tolerant of upgrades in hardware or OS. Java, most forms of Basic, and Python are examples of languages normally run through an interpreter or virtual machine; C, Fortran, and assembly are examples of compiled languages.

So, depending on your language choice, you start to code. If you use a third-generation procedural language, you write out your basic code design and then compile it to byte code or binary. If you are using an OOP language there is an extra step: your classes need to be resolved, and external dependencies need to be resolved as well. Non-OOP languages often have dependencies too, but the overhead for them is considerably less. OOP (object-oriented programming) has the advantage of reusing a great deal of code; its disadvantages are more complex code and slower compile times. The OOP portions generally end up either as external references or broken down into literal code, removing all OOP features from the compiled output. In an interpreted language a class is left as a stub pointing to an external reference; in a compiled language it is included in the code, and depending on the optimization method it may be written literally into the output on every call.

Let's talk about compiling now. Interpreted languages compile down into byte code: shorthand notation for calls to built-in functions, with your own code tucked in. When run, this expands into the dynamic functions, with your code describing what to do with them. In a compiled language, the code is first compiled into object files. (This is a totally different meaning of "object" from the one in OOP.) Most modern compilers hide the .obj files from users, so you likely will not see or use one of these files today; in the past they were very useful files at times. Object files are similar to byte code in nature: they hold the machine code for your functions plus symbol and relocation information describing what still needs to be filled in. These object files are OS- and version-specific. An .obj from a Windows machine will not work on Linux and vice versa; it will contain references to functions that do not exist on the other platform, or that have different calling conventions and names. There are other distinctions as well: for example, memory is addressed and handled differently in different OSes, and the object files have to use the specific memory model for that platform.

The next step is linking. The linker resolves the addresses of your functions, resolves dependencies between object files, and wraps all your objects into the executable or library you are building. Before the linker can do its job, though, the compiler's output has to pass through the assembler, so let's back up a step.
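To make those stages concrete, here is a classic C program with the gcc commands (one common toolchain; other compilers use different flags) that expose each step of the pipeline, from preprocessing through assembling to linking:

```c
/* hello.c - a program to walk through the compile/link pipeline.
 *
 * Each stage below can be run separately with gcc:
 *   gcc -E hello.c -o hello.i   # preprocess (expand #include etc.)
 *   gcc -S hello.i              # compile to assembly (hello.s)
 *   gcc -c hello.s              # assemble to an object file (hello.o)
 *   gcc hello.o -o hello        # link objects and libraries -> executable
 *
 * The object file hello.o is the OS-specific intermediate described
 * above: it contains machine code plus an unresolved reference to
 * printf that the linker fills in from the C library.
 */
#include <stdio.h>

int main(void)
{
    printf("Hello, world\n");
    return 0;
}
```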
The assembler's job is to convert the compiler's output into machine instructions. Each computer contains a set of basic instructions that make a computer a computer, and these are the foundation of any language. There are two main approaches to designing that instruction set. One is a richer set of instructions, which is to say that many operations are built directly into the processor; the other is to keep the instruction set simple and include only the bare necessities, making the compiler build the missing functionality out of simpler instructions. RISC (reduced instruction set computing) is the name of the keep-it-simple approach; CISC (complex instruction set computing) is the richer one. Motorola's PowerPC chips, for example, are RISC designs; Intel's x86 chips are CISC.

The compiler first lowers your code to assembly. These .asm files are stripped of comments and most human-friendly names and are the next closest thing to machine language. In the old days people would hand-optimize this code for better performance; I know of nobody who does that today except for a few very specialized tasks. The assembler then takes the assembly and converts it into machine language. Binary is not machine language; it is only the format machine language is stored in. If someone built chips that understood ternary, the machine language would be the same but the format would be different. Machine language is a series of op codes identifying specific processor instructions, together with the data (operands) those instructions use; some of those instructions, such as software interrupts, call into the BIOS. It is formatted to fit that processor's registers and memory model, so machine language for a 32-bit Intel machine is different from machine language for a 64-bit Intel, a Sun, or a Motorola chip.

The results are saved to a file, which is marked as an executable by the OS. In Windows, an executable is any file with a certain extension; in Linux, an executable is a file with a certain permission set. Unless told otherwise by the OS, the chip will treat any file it encounters as data and will expect another running program to tell it what to do with that data.

Now a user executes the program. What happens first depends on whether it is interpreted or compiled. If it is a compiled executable, the loader brings it into memory, allocates what it needs, and prepares it to run. The low-level details involve registers and BIOS calls: certain registers are for data, indexing, and function parameters. The stack is where your program keeps its working state and how the CPU knows where to return. Think of the stack like a stack of plates: it is LIFO, last in, first out, so the last thing pushed on is the first thing popped off. The stack holds saved register contents and return addresses, and executing assembly frequently means pushing the contents of certain registers and popping them back later on. The CPU itself works through the instruction stream one instruction at a time: it fetches an instruction, does what it says, then fetches the next. What it does depends on the instruction and on the contents of certain registers. A software interrupt like int 13h, for example, will do one thing if the AH register holds one value and a totally different thing if it holds another, because AH selects which function of that BIOS service you want.

The program continues like this until it is interrupted. In multitasking OSes the scheduling is controlled by the OS; if an app wrests control from the OS, the system will crash.
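The stack-of-plates idea is easy to demonstrate. Here is a minimal sketch in C of a LIFO stack with push and pop, the same discipline the CPU follows when it saves and restores register contents (the array-based stack is just an illustration, not how the hardware implements it):

```c
/* stack.c - a last-in-first-out stack, like the CPU's call stack.
 * Push saves a value on top; pop removes the most recent one,
 * which is why the last plate stacked is the first one taken off.
 */
#include <stdio.h>

#define STACK_SIZE 16

static int stack[STACK_SIZE];
static int top = 0;                     /* index of the next free slot */

static void push(int value)
{
    if (top < STACK_SIZE)
        stack[top++] = value;           /* place value, grow the stack */
}

static int pop(void)
{
    return top > 0 ? stack[--top] : -1; /* take the most recent value */
}

int main(void)
{
    /* Save three "register contents", then restore them. */
    push(10);
    push(20);
    push(30);
    int a = pop(), b = pop(), c = pop();
    printf("%d %d %d\n", a, b, c);      /* prints: 30 20 10 */
    return 0;
}
```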
Another frequent cause of crashes is an OS that does not protect its memory very well (like some Microsoft products): memory can be corrupted, intentionally or unintentionally, by other programs, which leads to garbage or malicious instructions getting onto the stack.

What you see on the console is what the running programs tell the console to display. Most modern graphics cards have a processor onboard; on some, that processor is better than the main CPU of computers a decade ago. Every device on a computer, virtual or real, is reached through a port. Each device has an address and an interrupt request (IRQ) line; the IRQ is how the device talks back to your computer. Since the classic PC architecture limits the number of IRQ lines, they are frequently shared. A common problem with shared IRQs is that the OS gets confused about which device the data on a shared IRQ is coming from; this is called an IRQ conflict. To the computer, the port is a sort of chute: it throws stuff through it and grabs stuff back when the IRQ bell rings. To write to a hard drive, for example, the CPU runs routines that first describe the data to the drive and then throw the data down the chute. The CPU will check back later for errors but will often continue to shovel data before checking. The hard drive itself is expected to know how to write that data to the disk and check it for corruption.

When it comes to video, most video cards have drivers that let the CPU send shorthand commands to the card. This means that nowadays much of the processing that once happened on the CPU can be offloaded to the video card, along with functionality CPUs never could manage. The CPU shovels instructions and raw data at the video card for interpretation, and the card translates this into what you see by sending specific signals to the monitor. Some of those signals are themselves shortcuts that the card assumes all monitors know how to handle. In modern OSes the monitor is detected and what is sent to the video card is adjusted for the monitor type; if the monitor is unknown, a generic set of instructions that all monitors are supposed to understand is used instead, which is why an off-brand monitor can slightly slow video display. The console display itself lives in an array of data that the video card scans out; some applications manipulate this array directly, though it's uncommon today.

To learn more about all this, learn how to program in assembly. It is as close to machine language as most people get, and most people don't want to get that close :)

As for which languages are compiled versus interpreted: many Windows executables depend on runtime libraries, and managed ones run byte code on a virtual machine, which blurs the line; Visual C++, for example, compiles to machine code but still requires its runtime libraries. Server-side languages are interpreted, sometimes in several steps. A PHP script, for example, is handed by the web server to the PHP module, which interprets it; the output (usually HTML) is passed back to the web server and then to your browser, which finally renders it for display. ColdFusion, ASP, .NET, and so on go through a similar process. Only client-side scripts avoid the server round trip by running directly on your computer.
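The "array of data" behind the console display was, on classic PCs, the VGA text buffer at address 0xB8000: two bytes per screen cell, a character followed by a color attribute. A normal program on a modern OS can't touch that address, so this sketch just simulates the layout in an ordinary C array to show the idea:

```c
/* vidmem.c - simulating the classic VGA text-mode buffer layout.
 * Real hardware mapped this at physical address 0xB8000: each
 * screen cell is two bytes, a character then a color attribute.
 * A modern OS won't let a normal program touch that address, so
 * we fake the buffer with an ordinary array.
 */
#include <stdio.h>

#define COLS 80

static unsigned char screen[2 * COLS];  /* one simulated screen row */

int main(void)
{
    const char *msg = "Hi";
    for (int i = 0; msg[i] != '\0'; i++) {
        screen[2 * i]     = (unsigned char)msg[i]; /* character byte */
        screen[2 * i + 1] = 0x07;          /* attribute: grey on black */
    }

    /* Dump the cells the way a debugger would show video memory. */
    for (int i = 0; i < 4; i += 2)
        printf("cell %d: char='%c' attr=0x%02X\n",
               i / 2, screen[i], screen[i + 1]);
    return 0;
}
```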
sasidhar... at Yahoo! Answers
Other answers
There are many subjects, studied over four years, just to understand everything you have mentioned. In brief: a high-level language is human-readable. It is converted to assembly language, a human-readable representation of the bits and bytes (e.g. ADD 2,4). Assembly language is specific to a processor (Intel x86 assembly differs from SPARC or Alpha machine language). The assembly language is then converted into machine language (bits and bytes).
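To illustrate, here is a tiny C function along with, in a comment, the kind of x86-64 assembly a typical compiler might emit for it; the exact instructions vary by compiler, target, and optimization level:

```c
/* add.c - a one-line high-level function.
 * Compile with: gcc -S add.c   (emits add.s, the assembly listing)
 */
int add(int a, int b)
{
    return a + b;
    /* A typical optimizing x86-64 compiler might lower this to:
     *     add:
     *         lea eax, [rdi+rsi]   ; eax = a + b (args arrive in edi/esi)
     *         ret                  ; result travels back in eax
     * On SPARC or Alpha the mnemonics and registers would differ,
     * because assembly is specific to the processor.
     */
}
```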
prad i
Too many questions in one thread. Break 'em up, please.
Northwoods Thaw
The topic of your question is covered in college courses, where a significant portion of the course involves designing and building a compiler. A compiler (as of the mid 80s when I took the course) is made up of 3 basic parts.

1. The scanner. The scanner reads each character in the program and groups the characters together into "tokens." Tokens will be reserved words in the language, such as "if", "for", "class", etc. It also groups symbols, such as "<", "<=", "==", etc. Most characters left over are comments and identifiers, such as class names, method names, and variables. Scanners define their tokens using regular expressions, which can be converted into finite automata that sort them out. Using a human analogy, the scanner is like a person seeing a bunch of letters and organizing them into words.

2. The parser. The parser takes the tokens and organizes them using a grammar. Most modern computer languages are defined using a "context-free grammar," also called a BNF grammar. It makes sure that if you start a block with a { it ends with a }. It groups for loops, if statements, switch statements, etc. The parser determines whether your program is consistent with the language; if it isn't, the parser will generate a syntax error. Back to the human analogy: when people look at sentences, there's a grammar to them, such as N-V-N, N-LV-N, N-V-N-N, etc. By parsing the sentence, people can tell whether it follows the rules of the language or whether the words are just random.

3. The semantic analyzer. The semantic analyzer takes the parsed program and puts meaning to it. It is at this step that the semantic analyzer generates the object code: the code that really executes on your computer. It could be machine code, or higher-level assembly-like code such as the Java Virtual Machine uses. Back to humans: we put meaning to the sentence and decide what the words mean and what message is being conveyed.

The compiler kind of does all of this in parallel. As the scanner determines a token, it passes it to the parser. As the parser completes a section of the grammar, it passes it to the semantic analyzer. It does this until all characters are read. People do the same as they read a sentence: we see the letters, form them into words, parse the sentence a word at a time, and construct the meaning as we go. After the compiler does all of this, it still doesn't mean your program is going to do what you want it to do; however, it will do exactly what you told it to do, even if that's not what you really meant.
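To give a flavor of the scanner stage, here is a minimal hand-written tokenizer sketch in C; it groups characters into number, identifier/keyword, and symbol tokens (a real scanner would typically be generated from regular expressions, as described above):

```c
/* scan.c - a minimal hand-written scanner sketch.
 * It groups raw characters into tokens: numbers, identifiers
 * (some of which are reserved words like "if"), and symbols.
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *src = "if (count <= 10) count = count + 1;";
    const char *p = src;

    while (*p) {
        if (isspace((unsigned char)*p)) {        /* skip whitespace */
            p++;
        } else if (isdigit((unsigned char)*p)) { /* number token */
            const char *start = p;
            while (isdigit((unsigned char)*p)) p++;
            printf("NUMBER   %.*s\n", (int)(p - start), start);
        } else if (isalpha((unsigned char)*p)) { /* identifier or keyword */
            const char *start = p;
            while (isalnum((unsigned char)*p)) p++;
            int len = (int)(p - start);
            if (len == 2 && strncmp(start, "if", 2) == 0)
                printf("KEYWORD  if\n");
            else
                printf("IDENT    %.*s\n", len, start);
        } else {                                 /* symbol, e.g. <= or = */
            const char *start = p;
            if ((*p == '<' || *p == '>' || *p == '=') && p[1] == '=')
                p += 2;                          /* two-char operator */
            else
                p += 1;
            printf("SYMBOL   %.*s\n", (int)(p - start), start);
        }
    }
    return 0;
}
```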
MarleyTheCat
The language provides you with some built-in things, like keywords and functions that perform predefined tasks. You develop a program using these prebuilt pieces, which are defined by the language developer in the computer's own bits and bytes. Whenever you run a program, it is compiled (in the case of C and C++) or compiled to byte code and then interpreted (in the case of Java). This step makes sure there are no syntactical mistakes in the program and produces compiled code (an .obj file for C/C++, or byte code for Java). This compiled code is then executed, i.e., the instructions are performed as written. The compilation is done by the compiler, which translates the high-level language down to machine level. The instructions are given to the processor chip as command words.

Let's first see how a chip works. Every processor has an assembly language associated with it; its commands are like written words standing in for series of bits. A command word is a series of bits in which each bit or group of bits selects a specific function. The processor receives the command word, reads each field, and produces the desired output as a series of bits sent back to the requesting device. Physically, the processor chip works with voltages: a 0 bit represents a low or zero voltage and a 1 bit represents a higher positive voltage (the exact levels depend on the chip). On receiving the result from the chip, the software converts it into the desired form and shows the output on the screen. I have tried to explain to the best of my ability; I hope it clears your doubts.
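The idea that each group of bits in a command word selects a specific function can be shown with a toy example. This C sketch decodes a made-up 16-bit instruction word into opcode and register fields; the field layout is invented for illustration and doesn't match any real chip's encoding:

```c
/* decode.c - decoding a made-up 16-bit instruction word.
 * Layout (invented for illustration):
 *   bits 12-15: opcode     bits 8-11: dest register
 *   bits 4-7:   src register    bits 0-3: unused
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t word = 0x1230;            /* opcode=1, dest=r2, src=r3 */

    unsigned opcode = (word >> 12) & 0xF;
    unsigned dest   = (word >> 8)  & 0xF;
    unsigned src    = (word >> 4)  & 0xF;

    /* A real control unit does this with gates rather than C code:
     * the opcode bits select which functional unit fires. */
    if (opcode == 1)
        printf("ADD r%u, r%u\n", dest, src);
    else
        printf("unknown opcode %u\n", opcode);
    return 0;
}
```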
gulabo
There is a lot of theory behind that. To understand all the concepts you would have to study digital systems, microcontroller applications, electrical and electronic circuits, amplifiers, and transistors. In short, the answer to your question is way too lengthy and cannot be explained in one sitting! All I can explain to you is this: the commands you type in C/C++ programming get converted to computer language, digital bits '0' and '1'. The computer receives these signals and processes them; the input is fed in and an output signal is generated in digital form, i.e. '0' and '1'. This is then converted back to a user-readable form. That is the actual process. The processing inside the CPU makes use of various transistors and gates that add, subtract, multiply, and divide the bits; multiplexers and demultiplexers are used, and there are several other devices involved. You don't have to worry about it now; you will get your chance once you are into engineering studies.
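As a small taste of how gates add bits, here is a sketch in C of a one-bit full adder built from XOR, AND, and OR, the same logic a hardware adder implements with transistors:

```c
/* adder.c - a one-bit full adder expressed with logic operations.
 * sum       = a XOR b XOR carry_in
 * carry_out = (a AND b) OR (carry_in AND (a XOR b))
 */
#include <stdio.h>

int main(void)
{
    /* Try every combination of the three input bits. */
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            for (int cin = 0; cin <= 1; cin++) {
                int sum  = a ^ b ^ cin;
                int cout = (a & b) | (cin & (a ^ b));
                printf("a=%d b=%d cin=%d -> sum=%d cout=%d\n",
                       a, b, cin, sum, cout);
            }
    return 0;
}
```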
Night Wolf