How to write a parser in C#?

How do I write a Parser in C#?

  • How do I go about writing a parser (recursive descent?) in C#? For now I just want a simple parser that parses arithmetic expressions (and reads variables?), though later I intend to write an XML and HTML parser (for learning purposes). I am doing this because of the wide range of areas in which parsers are useful: web development, programming language interpreters, in-house tools, game engines, map and tile editors, etc. So what is the basic theory of writing parsers? And how do I implement one in C#? Is C# the right language for parsers? (I once wrote a simple arithmetic parser in C++ and it was efficient; will JIT compilation prove equally good?) Any helpful resources and articles are welcome, and best of all code examples (or links to code examples). Note: Out of curiosity, has anyone answering this question ever implemented a parser in C#?

  • Answer:

    I have implemented several parsers in C#, both hand-written and tool-generated. A very good introductory tutorial on parsing in general is http://compilers.iecc.com/crenshaw/ - it demonstrates how to build a recursive descent parser, and the concepts are easily translated from his language (I think it was Pascal) to C# by any competent developer. This will teach you how a recursive descent parser works, but it is completely impractical to write a full programming language parser by hand.

    You should look into some tools to generate the code for you if you are determined to write a recursive descent parser (http://en.wikipedia.org/wiki/Recursive_descent_parser): TinyPG (http://www.codeproject.com/KB/recipes/TinyPG.aspx), Coco/R (http://ssw.jku.at/coco/) or Irony (http://irony.codeplex.com/). Keep in mind that there are now other ways to write parsers that usually perform better and have easier definitions, e.g. Pratt parsers (http://en.wikipedia.org/wiki/Pratt_parser) or monadic parsing (http://sandersn.com/blog//index.php/2009/06/27/monadic_parsing_in_python).

    On the topic of whether C# is up for the task: C# has some of the best text libraries out there. A lot of the parsers today (in other languages) have an obscene amount of code to deal with Unicode etc. I won't comment too much on JITted code because it can get quite religious, but you should be just fine. IronJS (http://ironjs.wordpress.com/) is a good example of a parser/runtime on the CLR (even though it's written in F#) and its performance is just shy of Google V8.

    Side note: markup parsers are completely different beasts compared to language parsers. In the majority of cases they are written by hand, and at the scanner/parser level they are very simple. They are not usually recursive descent, and especially in the case of XML it is better if you don't write a recursive descent parser (to avoid stack overflows, and because a 'flat' parser can be used in SAX/push mode).
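To make the recursive descent idea concrete for the arithmetic case in the question, here is a minimal sketch. It is not taken from the Crenshaw tutorial or from any of the tools above; the class name `ExprParser`, the grammar and the variable dictionary are all assumptions made for illustration. Each grammar rule becomes one method, and operator precedence falls out of which method calls which.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical example class. Assumed grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | IDENT | '(' expr ')' | '-' factor
public sealed class ExprParser
{
    private readonly string _text;
    private readonly IDictionary<string, double> _vars;
    private int _pos;

    public ExprParser(string text, IDictionary<string, double> vars = null)
    {
        _text = text;
        _vars = vars ?? new Dictionary<string, double>();
    }

    // Parses and evaluates the whole input; trailing garbage is an error.
    public double Parse()
    {
        double value = Expr();
        SkipSpaces();
        if (_pos < _text.Length)
            throw new FormatException($"Unexpected '{_text[_pos]}' at position {_pos}");
        return value;
    }

    private double Expr()
    {
        double left = Term();
        while (true)
        {
            if (Accept('+')) left += Term();
            else if (Accept('-')) left -= Term();
            else return left;
        }
    }

    private double Term()
    {
        double left = Factor();
        while (true)
        {
            if (Accept('*')) left *= Factor();
            else if (Accept('/')) left /= Factor();
            else return left;
        }
    }

    private double Factor()
    {
        if (Accept('-')) return -Factor();          // unary minus
        if (Accept('('))
        {
            double inner = Expr();                   // recursion happens here
            if (!Accept(')')) throw new FormatException("Expected ')'");
            return inner;
        }
        SkipSpaces();
        int start = _pos;
        if (_pos < _text.Length && char.IsLetter(_text[_pos]))
        {
            // Variable name: look its value up in the supplied dictionary.
            while (_pos < _text.Length && char.IsLetterOrDigit(_text[_pos])) _pos++;
            string name = _text.Substring(start, _pos - start);
            if (!_vars.TryGetValue(name, out double v))
                throw new KeyNotFoundException($"Unknown variable '{name}'");
            return v;
        }
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at position {_pos}");
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    private void SkipSpaces()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }

    // Skips whitespace, then consumes the expected character if it is next.
    private bool Accept(char expected)
    {
        SkipSpaces();
        if (_pos < _text.Length && _text[_pos] == expected) { _pos++; return true; }
        return false;
    }
}
```

With this sketch, `new ExprParser("1 + 2 * 3").Parse()` evaluates to 7, and `new ExprParser("(x + 1) * 2", new Dictionary<string, double> { { "x", 4 } }).Parse()` evaluates to 10. It evaluates while parsing to stay short; a real parser would more likely build a tree and evaluate it separately.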

ApprenticeHacker at Stack Overflow

Other answers

Well... where to start with this one. First off, "writing a parser" is a very broad statement, especially given the question you're asking. Your opening statement was that you wanted a simple arithmetic "parser". Technically that's not a parser, it's a lexical analyzer, similar to what you might use when creating a new language ( http://en.wikipedia.org/wiki/Lexical_analysis ). I understand exactly where the confusion of them being the same thing may come from, however. It's important to note that lexical analysis is ALSO what you'll want to understand if you're going to write language/script parsers; this is strictly not parsing, because you are interpreting the instructions as opposed to making use of them.

Back to the parsing question. This is what you'll be doing if you're taking a rigidly defined file structure and extracting information from it. In general you really don't have to write a parser for XML / HTML, because there are already a ton of them around. Even more so if you're parsing XML produced by the .NET runtime: then you don't even need to parse, you just need to "serialise" and "de-serialise".

In the interests of learning, however, parsing XML (or anything similar, like HTML) is very straightforward in most cases. If we start with the following XML:

    <movies>
      <movie id="1">
        <name>Tron</name>
      </movie>
      <movie id="2">
        <name>Tron Legacy</name>
      </movie>
    </movies>

we can load the data into an XElement as follows:

    XElement myXML = XElement.Load("mymovies.xml");

XElement.Load returns the 'movies' root element itself, so 'myXML' already is the root (a 'Root' property is only available if you load an XDocument instead). More interesting, however, is that you can easily use LINQ to get the nested tags:

    var myElements = from p in myXML.Elements("movie") select p;

This will give you a sequence of XElements, each containing one '<movie>...</movie>' element, which you can get at using something like:

    foreach (var v in myElements)
    {
        Console.WriteLine(string.Format("ID {0} = {1}", (int)v.Attribute("id"), (string)v.Element("name")));
    }

For anything other than XML-like data structures, I'm afraid you're going to have to start learning the art of regular expressions. A tool like "Regular Expression Coach" ( http://weitz.de/regex-coach/ ), or one of the more up-to-date similar tools, will help you immensely. You'll also need to become familiar with the .NET regular expression objects; http://www.codeproject.com/KB/dotnet/regextutorial.aspx should give you a good head start. Once you know how your regex stuff works, in most cases it's a simple case of reading in the files one line at a time and making sense of them using whichever method you feel comfortable with. A good free source of file formats for almost anything you can imagine can be found at http://www.wotsit.org/
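As a rough illustration of the "one line at a time plus regular expressions" approach, here is a small sketch. The file format (lines of `key = value`, with `#` starting a comment line) and the `LineParser` class are assumptions invented for this example, not a format or API mentioned above; only `Regex`, `Match` and `TextReader` are real .NET types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper for an assumed "key = value" line format.
public static class LineParser
{
    // Named groups capture the key and the value; surrounding whitespace is dropped.
    private static readonly Regex KeyValue =
        new Regex(@"^\s*(?<key>\w+)\s*=\s*(?<value>.*?)\s*$", RegexOptions.Compiled);

    public static Dictionary<string, string> Parse(TextReader reader)
    {
        var result = new Dictionary<string, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            // Skip blank lines and comment lines.
            if (trimmed.Length == 0 || trimmed.StartsWith("#", StringComparison.Ordinal))
                continue;

            Match m = KeyValue.Match(line);
            if (!m.Success)
                throw new FormatException("Cannot parse line: " + line);

            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

For example, `LineParser.Parse(new StringReader("# demo\nwidth = 640\ntitle = Tron Legacy"))` returns a dictionary mapping "width" to "640" and "title" to "Tron Legacy". For real files you would pass a `StreamReader` instead of a `StringReader`.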

shawty

C# is almost a decent functional language, so it is not such a big deal to implement something like Parsec in it. Here is one example of how to do it: http://jparsec.codehaus.org/NParsec+Tutorial. It is also possible to implement a combinator-based packrat parser (http://pdos.csail.mit.edu/~baford/packrat/thesis/) in a very similar way, but this time keeping a global parsing state somewhere instead of doing purely functional stuff. In my (very basic and ad hoc) implementation it was reasonably fast, but of course a code generator like http://code.google.com/p/peg-sharp/ must perform better.
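To give a feel for what "something like Parsec" can look like in C#, here is a very small combinator sketch. It is not the NParsec API and not the implementation mentioned above; the `Parser<T>` delegate, the `Combinators` class and the `SumDemo` class are hypothetical names made up for illustration. It is purely functional and does no memoisation, so it is not a packrat parser.

```csharp
using System;
using System.Collections.Generic;

// A parser takes the input and a position. It returns null on failure,
// otherwise the parsed value together with the next position.
public delegate Tuple<T, int> Parser<T>(string input, int pos);

public static class Combinators
{
    // Matches a single character satisfying the predicate.
    public static Parser<char> Char(Func<char, bool> pred) =>
        (s, i) => i < s.Length && pred(s[i]) ? Tuple.Create(s[i], i + 1) : null;

    // Applies p one or more times. p must consume input when it succeeds,
    // otherwise this loop would never terminate.
    public static Parser<List<T>> Many1<T>(Parser<T> p) => (s, i) =>
    {
        var items = new List<T>();
        for (var r = p(s, i); r != null; r = p(s, i))
        {
            items.Add(r.Item1);
            i = r.Item2;
        }
        return items.Count > 0 ? Tuple.Create(items, i) : null;
    };

    // Select and SelectMany let C# query syntax sequence parsers.
    public static Parser<U> Select<T, U>(this Parser<T> p, Func<T, U> f) => (s, i) =>
    {
        var r = p(s, i);
        return r == null ? null : Tuple.Create(f(r.Item1), r.Item2);
    };

    public static Parser<V> SelectMany<T, U, V>(
        this Parser<T> p, Func<T, Parser<U>> next, Func<T, U, V> project) => (s, i) =>
    {
        var r1 = p(s, i);
        if (r1 == null) return null;
        var r2 = next(r1.Item1)(s, r1.Item2);
        return r2 == null ? null : Tuple.Create(project(r1.Item1, r2.Item1), r2.Item2);
    };
}

public static class SumDemo
{
    // One or more digits, converted to an int.
    private static readonly Parser<int> Number =
        Combinators.Many1(Combinators.Char(char.IsDigit))
                   .Select(digits => int.Parse(new string(digits.ToArray())));

    // number '+' number, evaluated on the fly.
    public static readonly Parser<int> Sum =
        from a in Number
        from plus in Combinators.Char(c => c == '+')
        from b in Number
        select a + b;
}
```

Calling `SumDemo.Sum("12+30", 0)` yields a tuple whose `Item1` is 42 and whose `Item2` is 5 (the position after the match), while `SumDemo.Sum("12-30", 0)` yields null. A packrat variant along the lines described above would additionally cache each parser's result per input position.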

SK-logic

I know that I am a little late, but I just published a parser/grammar/AST generator library named Ve Parser. You can find it at http://veparser.codeplex.com or add it to your project by typing 'Install-Package veparser' in the Package Manager Console. This library is a kind of recursive descent parser that is intended to be easy to use and flexible. As its source is available, you can learn from its source code. I hope it helps.

Sam
