Perl Programming

Perl 6 Grammars and Regular Expressions

An anonymous reader writes "Perl 6 is finally coming within reach. This article gives you a tour of the grammars and regular expressions of the Perl 6 language, comparing them with the currently available Parse::RecDescent module for Perl 5. Find out what will be new with Perl 6 regular expressions and how to make use of the new, powerful incarnation of the Perl scripting language."
  • by Zorilla ( 791636 ) on Monday November 08, 2004 @01:23PM (#10755961)
    HXGF*&#$()#P*&ULJKDFHV)(&*#$utrhk:jlhdsf(p*&#$OJDF >KLJDFP)(*$#&pyu:

    Crap, I think I just accidentally programmed a web browser in Perl
    • by b12arr0 ( 3064 ) * on Monday November 08, 2004 @01:27PM (#10756017) Homepage
      Uh, that's not a web browser, it's clearly a web server.
    • Re:Perl goodness (Score:3, Insightful)

      by Black Perl ( 12686 )
      Just so people know, Perl gets its reputation for being line noise largely from its early adoption of regular expressions. For example:
      But now this syntax has made it into just about every other language. And so now you can accidentally program a web browser in any language.
      • Re:Perl goodness (Score:5, Insightful)

        by jandrese ( 485 ) * <kensama@vt.edu> on Monday November 08, 2004 @03:10PM (#10757171) Homepage Journal
        There are two things about regular expressions:
        1. Perl chose a keystroke-efficient syntax that makes them unreadable to anybody who doesn't know how to read them. It also made them very compact and easy to write for anybody who does. They look very intimidating, but underneath they are usually easier to understand than the C-like Perl code surrounding them.
        2. They are amazingly useful. Seriously, if you have never learned about Regular Expressions you owe yourself a lesson in how they work and what they do. I've seen people spend days working on stuff that can be written (more efficiently!) as a regular expression in a matter of minutes. Pattern matching is the sort of thing that every general-purpose language should have; it is a shame that the basic Regular Expression library that comes with most Unixes is such a piece of crap. Who wants to deal with the arcane invocation method, the extremely limited syntax, or the syntactic sugar like "[[:digit:]]{2}:[[:space:]][[:space:]]*[[:alpha:]]*" when you could write "\d{2}:\s+\w*"?
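        A minimal Perl 5 sketch of the comparison in point 2: Perl accepts POSIX bracket classes inside [...], so both spellings can be checked side by side. Because \w also matches digits and the underscore, the longhand version below assumes [[:alnum:]_]* rather than [[:alpha:]]* so that the two patterns really agree; the sample strings are hypothetical and ASCII-only.

```perl
use strict;
use warnings;

# Longhand POSIX bracket classes (valid inside Perl character classes).
my $posix     = qr/[[:digit:]]{2}:[[:space:]][[:space:]]*[[:alnum:]_]*/;
# The same pattern with Perl's shorthand classes.
my $shorthand = qr/\d{2}:\s+\w*/;

# Hypothetical ASCII samples: two that match, two that do not.
my @samples = ("12: abc", "99:\tfoo_bar", "1: x", "42:nospace");

for my $s (@samples) {
    my $p = $s =~ $posix     ? 1 : 0;
    my $q = $s =~ $shorthand ? 1 : 0;
    printf "%-14s posix=%d shorthand=%d\n", $s, $p, $q;
}
```

        On these inputs both columns agree; the shorthand is simply a third of the length.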
        • Re:Perl goodness (Score:4, Insightful)

          by ajs ( 35943 ) <ajs@ajsBOYSEN.com minus berry> on Monday November 08, 2004 @05:34PM (#10759420) Homepage Journal
          Perl chose a keystroke-efficient syntax that makes [regular expressions] unreadable

          No, it most certainly did not. Regular expressions as they exist in Perl today are a direct descendant of POSIX regular expressions which derive from the original work done by Ken Thompson (which resulted in the grep program, which stands for "global regular expression print"). That syntax further dates back to the giants in the field of computational theory, and was specialized only slightly for text matching.

          grep, awk, sed, ed, vi, emacs, and dozens of other programs and languages for Unix used this notation before Perl came along and adopted it, so let's not pretend that this syntax is somehow Perl's doing.

          The extended regular expression syntax of today IS perl's doing and in almost all cases it has been a process of making regular expressions both more powerful and more readable, culminating in Perl6's rule syntax which is highly readable by comparison.
        • I'm sure you're right. In pursuit of a similar time saving conciseness I'm going to start dropping all my variable names down to a single letter.

      • Re:Perl goodness (Score:3, Insightful)

        A lot of people don't seem to know that you don't have to use slashes as your delimiters. I use curly braces, myself, which would make the example you gave a little clearer:
        And of course, you could always use the x modifier (i.e., s{}{}x) to split the regular expression across multiple lines and document it.
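        A small sketch combining both tips, assuming a hypothetical task of rewriting ISO dates (YYYY-MM-DD) as DD/MM/YYYY: with curly-brace delimiters the slashes in the replacement need no backslashes, and the x modifier lets the pattern span several commented lines. The function name iso_to_dmy is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical example: turn YYYY-MM-DD into DD/MM/YYYY.
sub iso_to_dmy {
    my ($text) = @_;
    $text =~ s{
        \b (\d{4})   # year
        -  (\d{2})   # month
        -  (\d{2})   # day
        \b
    }{$3/$2/$1}gx;   # the '/' in the replacement needs no escaping here
    return $text;
}

print iso_to_dmy("Released 2004-11-08."), "\n";
```

        The call above prints "Released 08/11/2004."; with slash delimiters the replacement would have to be written \/ twice.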
      • Re:Perl goodness (Score:3, Insightful)

        by bedessen ( 411686 )
        When you see a regular expression like that it's a good indicator that the person that wrote it wasn't very familiar with how to write good REs. The above suffers from "leaning toothpick syndrome." If you are trying to match the '/' character, then don't use it as the delimiter of the RE. For example, compare the following REs, which are equivalent:




        Using ',' for the delimiter of the RE means you don't have to backslash-quote the forward slash to use it in a match.
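        The two REs compared above did not survive, so here is a hypothetical pair in the same spirit, matching a path prefix first with slash delimiters and then with commas. Both match the same strings; only the second avoids the toothpicks.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# Slash delimiters: every '/' in the path must be escaped ("leaning toothpicks").
my $toothpicks = $path =~ m/^\/usr\/local\/bin\// ? 1 : 0;

# Comma delimiters: the same regex with no escaping at all.
my $commas = $path =~ m,^/usr/local/bin/, ? 1 : 0;

print "toothpicks=$toothpicks commas=$commas\n";
```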


    • by warrax_666 ( 144623 ) on Monday November 08, 2004 @02:52PM (#10756993)
      ... so when I need a webserver, I just

      $ cat /dev/urandom | perl

      It usually works in 3 tries or less.
    • Re:Perl goodness (Score:5, Interesting)

      by Anonymous Coward on Monday November 08, 2004 @03:22PM (#10757319)
      A web browser? That's:
      perl -MHTML::Strip -MIO::All -e 'print HTML::Strip->new->parse(io($ARGV[0])->scalar )'
      A web server? That's:
      perl -MIO::All -e 'io(":8080")->fork->accept->(sub { $_[0] < io(-x $1 ? "./$1 |" : $1) if /^GET \/(.*) / })'
  • Grammar (Score:4, Insightful)

    by dprust ( 316840 ) * on Monday November 08, 2004 @01:23PM (#10755965)
    It is good to see PERL focussing on what makes it great. There is no other language, IMHO, that handles text input as well as PERL does. Adding this level of processing just makes it even more powerful.
    • Re:Grammar (Score:2, Funny)

      by Anonymous Coward
      Perl is made great by its ability to provoke raging flame wars over ephemeral points:

      It's "perl", you strongly-typed, weak-minded illiterate; not "PERL". Everyone knows that as of Sep. 17, 1999, the perl community decided arbitrarily (and overnight) that it isn't an acronym anymore.
    • Yeah, now that I've RTFA, I realize just how cool these advances are. They've basically taken some of LISP and built it into Perl, but added a few extensions and predefined strings on top of it. Besides looking MUCH cleaner and being MUCH easier to read/maintain, it should be much more powerful for programmers that know LISP.
  • hrm... (Score:5, Funny)

    by Anonymous Coward on Monday November 08, 2004 @01:24PM (#10755973)
    "...zztop-wants-a-perl-necklace dept."

    i do not think that means what you think it means.
  • Big problem (Score:2, Interesting)

    by Smallpond ( 221300 )
    Perl 6 will support Perl 5 regular expressions by using the :p5 modifier.

    Meaning that it is not backward compatible without modifying your source code.

    Note to those who are going to respond "Just install both!": look at the first line of your perl scripts.
    • Re:Big problem (Score:3, Informative)

      by WWWWolf ( 2428 )

      The idea of :p5 is not just that you can take Perl 5 code and modify it to make it work.

      The idea is that if you don't bother to write a zillion-rule grammar to match whatever you're trying to match, you can still use the P5-style regular expressions you know and love. It's another case of Not Swatting A Fly With The Nuke.

    • Re:Big problem (Score:5, Informative)

      by Speare ( 84249 ) on Monday November 08, 2004 @01:37PM (#10756138) Homepage Journal
      Um, ALL PERL CODE IS TREATED AS PERL5 CODE unless you use a specific Perl 6 keyword in your script. Perl 6 interpreters will not require you to modify your Perl 5 scripts AT ALL.

      Therefore, only Perl 6 scripts that want to use Perl 5 regular expression syntax need the :p5 modifier.

      Don't get your knickers in a bunch.

    • Re:Big problem (Score:5, Informative)

      by Zaak ( 46001 ) on Monday November 08, 2004 @02:06PM (#10756432) Homepage
      Meaning that it is not backward compatible without modifying your source code.

      Thus spake Larry Wall in Apocalypse 5:
      ...we took several large steps in Perl 5 to enhance regex capabilities. We took one large step forwards with the /x option, which allowed whitespace between regex tokens. But we also took several large steps sideways with the (?...) extension syntax. I call them steps sideways, but they were simultaneously steps forward in terms of functionality and steps backwards in terms of readability. At the time, I rationalized it all in the name of backward compatibility, and perhaps that approach was correct for that time and place. It's not correct now, since the Perl 6 approach is to break everything that needs breaking all at once.

      And unfortunately, there's a lot of regex culture that needs breaking.

      And from Apocalypse 1:
      It would be rather bad to suddenly give working code a brand new set of semantics. The answer, I believe, is that it has to be impossible by definition to accidentally feed Perl 5 code to Perl 6. That is, Perl 6 must assume it is being fed Perl 5 code until it knows otherwise.

      In other words, it is backwards compatible, it isn't backwards compatible, and when you install Perl 6, you are installing both.

    • The first line of most of my Perl programs is


      I admit that's an ancient version of Perl, but unfortunately that's what I'm stuck with here. At home it might say perl5.8.5 or so.

      I realized a long time ago [perl.org] that I'd better have every program I wrote tied to specific installation of a specific version of Perl, to avoid problems in installing future versions or new modules. Has nothing to do with Perl 6; it's just good configuration management. I can at any time install another

    • Re:Big problem (Score:4, Informative)

      by ajs ( 35943 ) <ajs@ajsBOYSEN.com minus berry> on Monday November 08, 2004 @05:22PM (#10759240) Homepage Journal
      As others have pointed out, Perl 6 interpreters (at least the default one that is Parrot-based) will hand your code off to Ponie [poniecode.org] or something like it by default. You will have to start your program with the module keyword or the use 6 statement to force Perl 6 behavior, or use a special binary (e.g. something like /usr/bin/perl6).

      The :p5 modifier is not there for backward compatibility so much as to allow the programmer to choose the model of regular expression to use. There are trade-offs. Here are two Perl 5 regular expressions:
      which are written in Perl 6:
      m{^[\w+\d|\S+[\'s]?]$ }
      Note that Perl 5 syntax is actually a bit nicer for the first one, so you can continue to use Perl 5 syntax there. In the second case, the new bracket-operator is very handy for enclosing sub-expressions that don't have to be remembered in the positional variables (the same as the Perl 5 (?:...) operator). You can even mix them:
      $r1 = rx:p5{[a-z][A-Z]+};
      $r2 = rx{[\w+\d|\S+[\'s]?]};
      $r3 = rx{^[<$r1>|<$r2>]$};
      Perl 6 is about making the things that you're going to need to do the most often much easier and much more supportable in very large projects. Relax and enjoy it, it's going to be a great ride.
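      For comparison, the Perl 6 composition above has a rough Perl 5 analogue using qr// objects, with (?:...) playing the role of Perl 6's [...]. This is a sketch under that assumption, not a description of how Perl 6 itself handles <$r1>:

```perl
use strict;
use warnings;

# Perl 5 spelling of the two sub-patterns; (?:...) is the non-capturing group.
my $r1 = qr/[a-z][A-Z]+/;
my $r2 = qr/(?:\w+\d|\S+(?:'s)?)/;

# Interpolated qr// objects keep their own grouping, so the alternation is safe.
my $r3 = qr/^(?:$r1|$r2)$/;

for my $word ("aBC", "abc1", "Larry's", "two words") {
    printf "%-10s %s\n", $word, ($word =~ $r3 ? "matches" : "no match");
}
```

      Note that \S+ on its own already accepts any run of non-space characters, so the optional 's does not change what matches; it is kept only to mirror the expression above.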
  • by winkydink ( 650484 ) * <sv.dude@gmail.com> on Monday November 08, 2004 @01:36PM (#10756117) Homepage Journal
    What does Perl6 offer a satisfied Perl5 user? Is it faster? Smaller?

    To this user, the last several releases (5.x) have looked more like opportunities for continuing royalty streams for perl authors (new versions of old books) than significant releases.

    • by Speare ( 84249 ) on Monday November 08, 2004 @01:41PM (#10756180) Homepage Journal
      From what I've seen, it's more amenable to modular libraries and structured design. As for basic scripting where you may not even use a "package" statement, you probably won't care.
    • See

      http://it.slashdot.org/comments.pl?sid=128918&cid=10756138
    • by WWWWolf ( 2428 ) <wwwwolf@iki.fi> on Monday November 08, 2004 @01:55PM (#10756339) Homepage

      Yeah, Perl 5 hasn't changed that much over time. But it has been around for a while. Perl 6 is just different.

      From what I have seen from the announcements, the Perl 6 syntax looks far cleaner, probably more consistent and less ugly. Some of the new tricks look genuinely handy. For example, if it seems like type checking would be a good idea, you can have it if you want it, even at compile time!

      Especially the regular expressions side seems pretty interesting, as noted in this article. Regular expressions have always been a poor but effective replacement for grammar-based parsing, and now finally Perl is going to have both integrated. There's probably going to be less whining about line noise.

      And then there's something that I find especially interesting, though it hasn't been explained in detail yet: Complete tuning of the object system. In case you haven't noticed, Perl 5's object system is a complete and utter mess that looks and smells like it has been added as an afterthought, and rest assured it's going to be changed radically for better in Perl 6. I'm definitely waiting eagerly to see what Perl 6's take is going to look like - I sure hope it's something like Ruby, only it smells like a camel =)

      • In case you haven't noticed, Perl 5's object system is a complete and utter mess that looks and smells like it has been added as an afterthought

        If you even consider it an object system. I use it daily and I'm still skeptical about calling it object-oriented programming. It really reminds me of abstract data types (ADTs) in C, with some new 'features' added to make them slightly easier. Not that I don't like it; I still find it very useful, but....
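        To make the ADT comparison concrete, here is a minimal Perl 5 class written the traditional way, using only core Perl 5.10+ (the Counter package, its start option and its methods are hypothetical). The object is just a hash reference that bless has tagged with a package name, and methods are ordinary subs that receive the object as their first argument:

```perl
use strict;
use warnings;

package Counter;

# Constructor: build a plain hash, then bless it into this package.
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} // 0 };   # '//' needs Perl 5.10+
    return bless $self, $class;
}

# Methods are plain subs; the invocant arrives as the first argument.
sub increment { my $self = shift; return ++$self->{count} }
sub count     { my $self = shift; return $self->{count} }

package main;

my $c = Counter->new(start => 5);
$c->increment;
print ref($c), " ", $c->count, "\n";   # the package name, then the count
$c->{count} = 100;                      # nothing stops callers poking the hash directly
print $c->count, "\n";
```

        The last two lines are the ADT-in-C feeling in a nutshell: encapsulation is a convention, not something the language enforces.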

    • You left out the other two possible improvements.

      Programs/Languages can be:



      More powerful (more available features)

      Simpler to use/understand

      The article mainly focuses on the last two, which you did not mention. Perl 6 is trying to be a little bit more powerful and a little bit easier to use/understand.

      • If I am a current, satisfied user, the last two are less important than the first two, as current & satisfied implies that I have all the features I need and already know how to use/understand them.
    • You have to divide further. Let me illustrate:

      Reasons to convert to Ponie (Perl 5 on Parrot):
      • Access to code written in other high-level languages without glue code.
      • Just-in-time compilation to machine code (no interpretation unless you eval a string at run-time!)
      • Cleaner access to C and C++ libraries without glue code.

      Reasons to convert from Ponie to Perl 6:

      • Vastly superior OO model, especially when trying to interface to multiple large object trees.
      • Debuggability improvements throughout the l
    • > What does Perl6 offer a satisfied Perl5 user? Is it faster? Smaller?

      It features better support for key paradigms, including object-oriented
      programming (finally, a real object model), functional programming (we're
      getting continuations), and even some improvements for contextual programming.
      In other words, Perl6 will be a substitute not just for Perl5 but also for
      Scheme and Smalltalk.

      Also, the whole Parrot thingydoo is going to allow software written in one
      language to seamlessly use libraries written i
  • aw hell (Score:5, Funny)

    by The Unabageler ( 669502 ) <josh.3io@com> on Monday November 08, 2004 @01:39PM (#10756158) Homepage
    I'm going to have to rewrite my sig.
  • by Sebastopol ( 189276 ) on Monday November 08, 2004 @01:43PM (#10756205) Homepage
    I'm surprised by the regex grammar. It looks a lot like how I use boost::spirit::rule for parsing regex in C++:


    # note this is just a language example, not an accurate name matcher
    grammar Names
    rule name :w { };
    rule singlename { + };

    C++::boost::spirit--- // rule for parsing a token string
    rule split = *(*space_p >>
    (+graph_p)[append(tok)] >>

    msg "Parsing input\n"; // 1. Parse declarations
    while (!header_ok && getline(input, line) && input.good())
    parse(line.c_str(), split);

    There are even grammar classes in Spirit.

    I sure hope perl6 is faster! ;-)
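    For Perl 5 readers, regexes since 5.10 can approximate a small grammar with named subpatterns declared in a (?(DEFINE)...) block and called with (?&name). This is an analogue only, neither Perl 6 rule syntax nor Spirit, and like the example above it is not an accurate name matcher: it assumes a hypothetical rule that a name is two or more alphabetic words.

```perl
use 5.010;   # (?(DEFINE)...), (?&name) and say need Perl 5.10 or later
use strict;
use warnings;

# Hypothetical grammar: a "name" is two or more alphabetic words.
my $full_name = qr{
    (?(DEFINE)
        (?<singlename> [[:alpha:]]+ )                             # one word
        (?<name>       (?&singlename) (?: \s+ (?&singlename) )+ ) # two or more
    )
    ^ (?&name) \z
}x;

for my $s ("Larry Wall", "Mary Ann Jones", "Cher", "R2 D2") {
    say "$s: ", ($s =~ $full_name ? "name" : "not a name");
}
```

    The DEFINE block never matches anything by itself; it only holds the named rules, much like a grammar declaration.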

    • Perl 6 is probably producing a GLR parser, as Parse::RecDescent is a GLR parser (it means it would be slower, but more flexible).

      Isn't Spirit a LALR parser? Or an LL(1) parser?

      It's not going to be faster than Spirit, because GLR parsers are slower than every other kind of parser.

      On the other hand, you don't have to do all the weird stuff you have to do with Spirit because it's mostly just syntactic sugar on top of C++, and therefore uses only C++ syntax (which doesn't look much like the natural pseudoco
      • The intent is that grammars default to recursive descent, but that it be possible to ask for various kinds of optimizations via pragma. The grammar for parsing Perl 6 itself will be a hybrid between top-down and bottom-up techniques to maximize both speed and flexibility.
    • by ajs ( 35943 ) <ajs@ajsBOYSEN.com minus berry> on Monday November 08, 2004 @05:51PM (#10759671) Homepage Journal
      Perl 6 will probably not be faster than boost, but keep in mind that you also gain the power of a fully dynamic programming language in Perl 6's rules. Rules act as closures and can also contain Perl 6 code. Hypothetical variables are really going to blow people's minds (I know they took me a while to grasp, and when I did, I just sat around saying "wow" for a while :-)...)
  • Regular Expressions to Context Sensitive (at least) Parsing. I'm not a big Perl Geek, but I use it on a daily basis.
  • Adoption (Score:5, Interesting)

    by base_chakra ( 230686 ) * on Monday November 08, 2004 @01:47PM (#10756251)
    Years ago, Eric Raymond wrote [python.org]:
    "Perl XS is acknowledged to be a nasty mess. My guess is the Perl guys would drop it like a hot rock for our [Python's] stuff --
    that would be as clear a win for them as co-opting Perl-style regexps was for us." [emphasis added]
    Maybe I misinterpreted ESR's intended message, but it would be disappointing if hypercompetition prevented Perl's already-influential regex extensions from exerting a positive influence on other platforms. Raymond seems to imply that the Python team only grudgingly included support for Perl-style regex. I understand that development teams in similar niches each want to make a big splash in the industry; hopefully Python's great increase in popularity has softened the survivalist attitude that seems to characterize this Raymond quote from Python-Dev. Evolving regex can benefit everyone.

    Note to those ready to mod me Troll/Flamebait: I'm not trying to pick on Python, I just happened to be acquainted with this candid quote.
    • Re:Adoption (Score:5, Insightful)

      by Black Perl ( 12686 ) on Monday November 08, 2004 @02:26PM (#10756687)
      Yes, you did misinterpret the message. Eric Raymond was a former Perl programmer, and is now a Python programmer. He was saying that Python's native-code-binding facility is superior to Perl's XS, and it would benefit Perl to adopt it. He mentions that Python benefited from adopting Perl's regex syntax. Nowhere does he say or imply it was "grudgingly" done.

      By the way, not long after he wrote that, Perl coders started using the Inline:: modules like Inline::C [cpan.org] instead of XS; Inline::C is very easy to use. I do not know if this was an adoption of Python's technique, but I don't think so.
      • Yes, you did misinterpret the message. Eric Raymond was a former Perl programmer, and is now a Python programmer. He was saying that Python's native-code-binding facility is superior to Perl's XS, and it would benefit Perl to adopt it.

        Thanks for mentioning that. You are absolutely right, and shortly after I posted the message I stuck my foot in my mouth when I saw to my horror that I had gotten it totally backwards and maligned Eric Raymond in the process!! Another casualty of the rush to post while the
    • Re:Adoption (Score:4, Informative)

      by kavau ( 554682 ) on Monday November 08, 2004 @03:00PM (#10757074) Homepage
      I don't know the context of the quote, but to me it reads more like this: "Python benefited greatly from adopting Perl technology in the past. I hope the Perl guys will be as open-minded as we are."

      Not much hypercompetition there, if you ask me. But then, it might as well be me who misunderstood the quote.

      • Re:Adoption (Score:3, Interesting)

        by ajs ( 35943 )
        Exactly. Python and Perl are not really competitors in the strictest sense. They both build on each other. In many ways, I think Larry would have made some of the choices that Python did, had he started out in the 80s knowing what he knows now, and that's evidenced by how much of Perl 6 draws from Python (as well as Ruby, Scheme, LISP, Smalltalk, C++ and Java).

        Of course, the basic approaches to language design follow different philosophies (Perl's is one of inclusion, Python's is one of exclusion... both a
  • by imnoteddy ( 568836 ) on Monday November 08, 2004 @01:53PM (#10756329)
    I can understand a desire for adding grammars that are more powerful than regular expressions in Perl 6 but it opens up a whole new can of worms.

    The grammars appear to be in a class called "context-free grammars" (CFGs). Some CFGs are ambiguous in the sense that a given "sentence" can be derived in more than one way, i.e. it has more than one parse tree. Traditional tools such as yacc/bison tell you where there is ambiguity in your rules; even then it isn't always easy to remove the ambiguity (trust me on this). If the Perl 6 system doesn't help the programmer debug the grammar, he/she will not be happy when the parsing doesn't work as expected.

    In addition, the article ends the description of features with "And much more...". It appears that Perl 6 grammars are more powerful than CFGs. If they can simulate a Turing machine...
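    A small illustration of the ambiguity problem, using a hypothetical grammar E -> E '-' E | NUM. The Perl 5 sketch below enumerates every parse tree for a list of operands joined by '-', together with the value each tree computes. Yacc/bison would typically report this grammar as a shift/reduce conflict unless precedence declarations resolve it; a parser that silently picks one tree can give either answer.

```perl
use strict;
use warnings;

# Hypothetical ambiguous grammar:  E -> E '-' E | NUM
# parses($nums, $lo, $hi) returns every [tree, value] pair the grammar can
# derive for the operands $nums->[$lo .. $hi] joined by '-'.
sub parses {
    my ($nums, $lo, $hi) = @_;
    return ([ $nums->[$lo], $nums->[$lo] ]) if $lo == $hi;   # E -> NUM

    my @out;
    # E -> E '-' E: try every '-' as the top-level operator.
    for my $split ($lo .. $hi - 1) {
        for my $left (parses($nums, $lo, $split)) {
            for my $right (parses($nums, $split + 1, $hi)) {
                push @out, [ "($left->[0] - $right->[0])",
                             $left->[1] - $right->[1] ];
            }
        }
    }
    return @out;
}

printf "%-16s = %d\n", @$_ for parses([1, 2, 3], 0, 2);
```

    For 1 - 2 - 3 it prints two trees: (1 - (2 - 3)) = 2 and ((1 - 2) - 3) = -4. With four operands there are already five trees, and the count keeps growing (the Catalan numbers).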

    • What bugs me is they don't describe the type of parser being generated. Parse::RecDescent does just what it says... it generates recursive descent parsers. However, recursive descent parsers are not as powerful as the bottom-up parsers generated by, for example, Yacc/Bison (LL vs LR).
      • However, recursive descent parsers are not as powerful as the bottom-up parsers generated by, for example, Yacc/Bison (LL vs LR).

        That's backwards. Recursive descent with backtracking can parse all LL(k) grammars for arbitrary k. OTOH, yacc/bison can only parse LR(1) which, although sufficient for most realistic grammars, definitely is not as general as a full LL(k) method.

        Left-recursive grammars are a red herring -- you can always eliminate the recursion, and with backtracking you can deal with arbitrar
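        To illustrate the left-recursion point: a rule such as expr -> expr '-' NUM | NUM sends a naive recursive descent parser into infinite recursion, but it can be rewritten as expr -> NUM ('-' NUM)* and parsed with a loop. A minimal Perl 5 sketch of that rewrite (the grammar and the parse_expr name are hypothetical); folding inside the loop keeps subtraction left-associative, as the original left-recursive rule intended:

```perl
use strict;
use warnings;

# Left-recursive form:   expr -> expr '-' NUM | NUM
# Rewritten form:        expr -> NUM ( '-' NUM )*
sub parse_expr {
    my ($input) = @_;
    my @tok = $input =~ /\d+|-/g;   # crude tokenizer: integers and '-', all else skipped
    my $pos = 0;

    # NUM: consume one integer token or fail.
    my $num = sub {
        my $t = $tok[$pos];
        die "expected a number at token $pos\n"
            unless defined $t && $t =~ /^\d+$/;
        $pos++;
        return $t;
    };

    my $value = $num->();
    while ($pos < @tok && $tok[$pos] eq '-') {
        $pos++;
        $value -= $num->();   # fold to the left: ((a - b) - c) ...
    }
    die "unexpected token at $pos\n" if $pos < @tok;
    return $value;
}

print parse_expr("10 - 3 - 2"), "\n";   # left-associative: (10 - 3) - 2
```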

    • Perl 6 grammars are a full citizen of the language on a level with subroutines and classes (loosely speaking, in Perl 6, rule : grammar :: method : class, actually). They're effectively Turing-complete as a result, since Perl 6 is obviously Turing-complete.

      Perl 5 "regexps", by contrast, are more of a specialized second language bolted onto the side (I use quotes since Perl 5 regexps are already marginally more powerful than "pure" regexps).
      • I use quotes since Perl 5 regexps are already marginally more powerful than "pure" regexps

        Are you sure? I looked into this because my instinct told me you were right and I wanted to know how much more powerful, but then I found this line in the Camel Book: "The Perl Engine uses a nondeterministic finite-state automaton (NFA) to find a match" (Programming Perl 2nd ed., page 60). If correct, that would suggest that Perl regexps and "pure" automata regexps are equivalent.
        • Re:Yes. (Score:3, Interesting)

          Don't forget that perl patterns support recursion. Perl can match (at least some) context-free grammars...

          #match context free grammar --[0{N}1{N}] (e.g. 01, 0011, 000111, etc.)
          $cf = qr/01|0(??{$cf})1/;

          print "matched $init\n" if ($init=~m/^$cf$/)

          ...and (some) context-sensitive grammars...

          #match context sensitive grammar --[0{N}1{N}2{N}] (012, 001122, 000111222,...)
          $t=qr/12|1(??{$t})2/;
          $cs=qr/(?=$h 2+$)(?=0+$t$)/x;

          print "matched $init\n" if ($init=~m/^
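          Since the last snippet is cut off, here is a self-contained Perl 5 version of the same idea that runs under strict. It is a sketch, not the poster's exact code: the variable names are hypothetical, and the 0^n 1^n sub-pattern is written out again so the lookahead trick has everything it needs.

```perl
use strict;
use warnings;

# 0^n 1^n (n >= 1): context-free. (??{ ... }) evaluates $zero_one again at
# match time, which is what lets the pattern refer to itself.
my $zero_one;
$zero_one = qr/01|0(??{ $zero_one })1/;

# 1^m 2^m (m >= 1), built the same way.
my $one_two;
$one_two = qr/12|1(??{ $one_two })2/;

# 0^n 1^n 2^n (n >= 1): not context-free. Both lookaheads start at position 0:
# the first forces #0s == #1s (then only 2s), the second forces #1s == #2s.
my $zero_one_two = qr/^(?=$zero_one 2+$)(?=0+$one_two$)/x;

for my $s ("01", "0011", "001122", "0011222", "012") {
    printf "%-8s 0^n1^n:%-3s 0^n1^n2^n:%s\n", $s,
        ($s =~ /^$zero_one$/ ? "yes" : "no"),
        ($s =~ $zero_one_two ? "yes" : "no");
}
```

          The $) inside the patterns is not interpolated (Perl deliberately skips $( , $) and $| in patterns), so each $ there is an ordinary end-of-string anchor.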

  • Pet Project (Score:3, Funny)

    by orlyonok ( 729444 ) on Monday November 08, 2004 @02:15PM (#10756553)
    I'm seriously studying the possibility of tackling a worthy coding project, the rewriting of the entire LINUX kernel in a language very much but not unlike C, and was considering doing it in C-INTERCAL, but after seeing things like this http://ozonehouse.com/mark/blog/code/PeriodicTable.html [ozonehouse.com], I changed my mind and will use PERL 6 instead.
  • by Anonymous Coward on Monday November 08, 2004 @03:46PM (#10757684)
    I get sick of the 'standard' backlash every time a Perl article is posted. Why do people have such a problem with Perl? It's an excellent, high-level general purpose programming language with a huge range of extension modules available [cpan.org]. I have personally used Perl for many projects, as do TicketMaster [ticketmaster.com], ValueClick [valueclick.com], Morgan Stanley [morganstanley.com] and Ryanair [ryanair.com] and I've also learnt a lot about software engineering and computing through Perl.

    Yes, it does include a lot of symbols, but there is payback to learning them, and really most programs won't use much beyond $ % # () [] {}. Unlike some languages [java.com], Perl is not what I would describe as a 'bondage' language. If you want to program sloppy, you can program sloppy. That's fine by Perl. And this generosity is what gives Perl its bad reputation. This is funny since I and most knowledgeable Perl programmers can write perfectly clear and maintainable code. The way we do this is no secret: it's just by commenting appropriately, using meaningful identifier names and following the Perl style guidelines [cpan.org].

    People can mock Perl all they like, but it is still a widely used powerful programming language and I am more productive in it than any other language. As a parting comment, a Cisco employee once told me (off the record of course!) that "Cisco would fall apart without Perl".

    • Why do people have such a problem with Perl? It's an excellent, high-level general purpose programming language with a huge range of extension modules available. I have personally used Perl for many projects, as do TicketMaster, ValueClick, Morgan Stanley and Ryanair

      How compelling, you just named some of Slashdot's favorite companies.

    • Personally I have a love/hate relationship with Perl.

      The purist Computer Scientist in me loathes it. It is ugly, dangerous and has a weird botched together syntax.

      Just as I self-righteously convince myself of these self-evident facts, some really cool trick saves the day and the wild inner hacker in me starts telling the CS part to stop being such a bore :)

      Perl is a great language, though like all powerful tools it can be dangerous if misused. It is sometimes ugly, and just as often beautiful.

      All that said
      • The purist Computer Scientist in me loathes it.

        "Much as I hate to say it, the Computer Science view of language design has gotten too inbred in recent years. The Computer Scientists should pay more attention to the Linguists, who have a much better handle on how people prefer to communicate."

        --Larry Wall
      • Dangerous? (Score:3, Interesting)

        by TheLink ( 130905 )
        How's perl dangerous?

        In my experience (as an IT security guy), C and PHP are more dangerous than perl.

        C - "runs arbitrary code of the attacker's choice" given _common_ stupid programmer mistakes.

        PHP - developers fond of features that encourage bad/insecure ways of doing things - e.g. magic quotes, global track vars. Take away such popular PHPisms and PHP starts to look like perl ;).