
Python 3.0 Released

Posted by samzenpus
from the break-out-the-cigars dept.
licorna writes "The 3.0 version of Python (also known as Python3k and Python3000) was released just a few hours ago. It's the first ever intentionally backwards-incompatible Python release."

  • Libraries (Score:5, Interesting)

    by explodymatt (1408163) on Thursday December 04, 2008 @08:03AM (#25987653)
    Python 3 being out is great; they've fixed a few things that allowed bad programming. But does anyone know how long it will take for the libs to start getting ported, especially numpy and scipy?
  • And now to wait (Score:2, Interesting)

    by Anonymous Coward on Thursday December 04, 2008 @08:15AM (#25987745)

    Sounds great! Now to wait a few weeks while smart people find and fix all the security holes, so I can go and safely get version 3.1.

  • by Ancient_Hacker (751168) on Thursday December 04, 2008 @08:18AM (#25987767)

    Yes, Python 3.0 is a break.

    But in the past and foreseeable future, Python has been exceedingly helpful, much more than most languages, during upgrades.

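    Usually one has several months to try out new features: they're in the current version but turned off until you ask for them with "from __future__ import ..." (2.6 also ships a "future_builtins" module with the 3.x versions of builtins such as map() and zip()).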

    Plus there's often a backward-compatibility feature in the next version to restore the old behavior.

    Not to mention the -3 command-line option in 2.6, which warns about the parts of your old program that will need changing for version 3.

    But sometimes the changes are so big they can't be encompassed by a compiler switch. Such it is with 3.0.
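    A minimal sketch of the opt-in route described above, assuming a module that has to run on both 2.6 and 3.0: the __future__ imports switch on 3.0 semantics under 2.6, and 3.0 accepts them as no-ops. The average() helper is hypothetical, purely for illustration.

```python
# Opt in to Python 3 behaviour from a Python 2.6 module.
# Python 3.0 accepts these imports too; they simply have no further effect there.
from __future__ import division, print_function, unicode_literals


def average(values):
    """Hypothetical helper: '/' is true division thanks to the division import."""
    return sum(values) / len(values)


# With unicode_literals this is a unicode string on 2.6 as well, so the escape is decoded.
word = "caf\u00e9"

print(average([1, 2]), len(word))  # prints: 1.5 4
```

    Without the division import, 2.6 would compute 3 / 2 as 1; with it, both versions agree.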

     

  • Re:Libraries (Score:4, Interesting)

    by gzipped_tar (1151931) on Thursday December 04, 2008 @08:21AM (#25987789) Journal

    IIRC numpy and scipy have dependencies on other libraries that are not 2.6-clean. They also have a lot of issues of their own, so migrating isn't currently a priority for them.

    I can't remember where I read about that... and I'm too lazy to dig it out of their Trac :-P

  • by makapuf (412290) on Thursday December 04, 2008 @08:31AM (#25987885)

    But sometimes the changes are so big they can't be encompassed by a compiler switch. Such it is with 3.0.

    While I agree with your post, here the problem is not implementation but syntax and backward compatibility within a given Python version.
    The idea is that some needed changes cannot be made backward-compatible (new keywords, ...). So you group them and call that a new version of the language. I suspect you could implement most of it with compiler switches.
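    One concrete case of the keyword problem, checked with the standard keyword module under Python 3: nonlocal, True, False and None are reserved words in 3.0 but were not keywords in 2.x, while print and exec went the other way and became plain functions. A 2.x program that used nonlocal as a variable name cannot be kept valid by any switch that also keeps the new keyword.

```python
import keyword

# Reserved words in Python 3.0 that were not keywords in 2.x.
became_keywords = ["nonlocal", "True", "False", "None"]
# Statements in 2.x that are ordinary functions in 3.0.
became_functions = ["print", "exec"]

for name in became_keywords + became_functions:
    print(name, keyword.iskeyword(name))
```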

  • Yay, Unicode! (Score:5, Interesting)

    by shutdown -p now (807394) on Thursday December 04, 2008 @08:44AM (#25988007) Journal

    Reworked Unicode support is a big deal. It was there before, of course (unlike Ruby - meh), but all those Unicode strings vs 8-bit strings, and the associated comparison issues, complicated things overmuch. Not to mention the ugly u"" syntax for Unicode string literals which was too eerily like C++ in that respect. Good to see it move to doing things the Right Way by clearly separating strings and byte arrays, and standardizing on Unicode for the former.
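    A short Python 3 sketch of that separation: str holds Unicode code points, bytes holds one particular encoding of them, converting between the two is always an explicit encode/decode, and mixing them is an error rather than a silent conversion. The sample text is arbitrary.

```python
text = "na\u00efve caf\u00e9"      # str: a sequence of Unicode code points
data = text.encode("utf-8")        # bytes: one particular encoding of that text

print(len(text), len(data))        # 10 code points, 12 bytes
assert data.decode("utf-8") == text

# Mixing the two types raises instead of guessing an encoding.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```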

    Now, if only we could convince Matz that his idea for Unicode support in Ruby 2.0 - where every string is a sequence of bytes with an associated encoding, so every string in the program can have its own encoding (and two arbitrary objects of type "string" may not even be comparable as a result) - is a recipe for disaster, and get him to take a hint from Python 3...

  • print function (Score:4, Interesting)

    by togofspookware (464119) on Thursday December 04, 2008 @09:03AM (#25988181) Homepage

    The first thing mentioned on the 'what's new' page (http://docs.python.org/dev/3.0/whatsnew/3.0.html) is that you'll have to change your code from

        print x, y, z,

    to

        print(x, y, z, end=" ")

    I can see the value of making things more consistent, but it seems to me whenever they update things in Python, it's usually to make programming in it a little bit harder.

    Why not make print a function, but then change the language to not require parentheses for any function call? You'd still have to use them when calling a function with zero arguments, and in sub-expressions, but to not require parens for top-level function calls would, if nothing else, make playing around in interactive mode or with short scripts a lot more pleasant.

    Granted, I come from a Ruby background, so I may not know what I'm talking about. My experience with Python is trying to write some scripts on my OLPC, where the craptacular rubber keyboard made typing parentheses all the more agonizing. I finally caved and installed Ruby so I could get some work done. Maybe people who prefer Python really like typing parens. And underscores.
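    For reference, a sketch of how the common 2.x print forms map onto the 3.0 function. Output goes to an io.StringIO buffer only so the result can be inspected; end=" " is the closest equivalent of the 2.x trailing comma (which suppressed the newline and left a space before the next output).

```python
import io

out = io.StringIO()  # capture output so the result can be inspected

print(1, 2, 3, end=" ", file=out)   # 2.x: print >>out, 1, 2, 3,
print(4, file=out)                  # 2.x: print >>out, 4
print(1, 2, 3, sep="-", file=out)   # no direct 2.x print-statement equivalent

print(repr(out.getvalue()))         # '1 2 3 4\n1-2-3\n'
```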

  • Re:print function (Score:3, Interesting)

    by gzipped_tar (1151931) on Thursday December 04, 2008 @09:19AM (#25988315) Journal

    The IPython (nothing Apple-related) interactive shell hacked the Python lexer to allow exactly this. You type this at the shell prompt:

    foo a, b, c

    it will be interpreted as a call foo(a, b, c).

    IPython still has some bugs with this feature, though. It can be turned off, but I still prefer it for interactive use, just as you've mentioned.

    Anyway, I think the current Python syntax is OK.

  • Re:print function (Score:4, Interesting)

    by maxume (22995) on Thursday December 04, 2008 @09:23AM (#25988375)

    I would say that it makes typing python a little bit harder, but I would also argue that it makes programming python easier, not harder (it eliminates print as a statement, but it also eliminates special syntax that existed only for redirecting print output, and makes it trivial to change the default behavior of print within a module (by defining a local print function)).
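    A sketch of the module-local override described above; the "[app]" prefix and the log buffer are illustrative assumptions. Because print is now an ordinary name, a module-level def print shadows the built-in for that module only, and the original stays reachable through the builtins module.

```python
import builtins
import io

log = io.StringIO()  # hypothetical destination; could just as well be sys.stderr or a file


def print(*args, **kwargs):
    """Module-local print: prefix each line and default to the log buffer."""
    kwargs.setdefault("file", log)
    builtins.print("[app]", *args, **kwargs)


print("starting", 3)
print("done", file=log)  # redirection is just a keyword argument, no '>>' syntax
```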

  • by m50d (797211) on Thursday December 04, 2008 @09:51AM (#25988675) Homepage Journal
    It does make it a pain in the ass to play around and test with because often cut-n-paste (from random sources) completely fucks up the indention which you then have to fix.

    Cut-n-paste is not a good way to learn.

    Between Python's extremely verbose syntax (not very script-friendly-like)

    It's not extremely verbose; take a look at Java if you want that. If you compare with e.g. perl, yes it's longer, but the difference is because it's using words rather than random characters, which in my book is worth it for the ease of remembering wtf to write. Compare it with Ruby or, *struggles to think of another scripting language* TCL, say, and the verbosity is pretty similar.

    and relatively poor performance...

    Really? It's not going to win races against C, but performance is very much on a par with, say, Perl (which, yes, has a lot of improvements coming in v6, but that's not here yet), and ahead of other similar languages. Couple that with the fact that it's easier to bind from Python than from any of the alternatives, and you end up with code that in practice is as fast as you could write anywhere (because you use e.g. NumPy, which just binds to the fastest libraries available for doing what it does).
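    To make the "push the hot loop into compiled code" point concrete without NumPy, a rough sketch using only the standard library: the built-in sum() runs its loop in C, while the hypothetical py_sum() helper runs it in the interpreter. Absolute timings depend on the machine, so the printed numbers are illustrative only.

```python
import timeit

data = list(range(100000))


def py_sum(values):
    """Hypothetical helper: add up values with an interpreted Python loop."""
    total = 0
    for v in values:
        total += v
    return total


# Both produce the same answer; sum() just runs its loop in C.
assert py_sum(data) == sum(data)

loop_time = timeit.timeit(lambda: py_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)
print("Python loop: %.3f s, built-in sum(): %.3f s" % (loop_time, builtin_time))
```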

    Of course python does sacrifice some things - but the ease of code writing and most of all maintainability are well worth it in most cases, in my experience.

  • ubuntu make fail (Score:2, Interesting)

    by rla3rd (596810) on Thursday December 04, 2008 @11:18AM (#25989883)
    Too bad it doesn't install from source out of the box, even with libgdbm-dev installed:

    make
    running build
    running build_ext

    Failed to find the necessary bits to build these modules:
    _dbm
    To find the necessary bits, look in setup.py in detect_modules() for the module's name.

    See the bug here [python.org]. Why they would announce a release that won't build on a major distribution such as Ubuntu baffles me.
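    After a source build, one way to see which optional extension modules were skipped (such as _dbm in the log above) is to try importing the stdlib wrappers that depend on them. A minimal sketch; has_module() is a hypothetical helper, and importlib.import_module needs Python 3.1 or later.

```python
import importlib


def has_module(name):
    """Hypothetical helper: return True if `name` can be imported here."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# In Python 3, dbm.ndbm wraps the _dbm extension and dbm.gnu wraps _gdbm;
# dbm.dumb is pure Python and always present.
for name in ("dbm.ndbm", "dbm.gnu", "dbm.dumb"):
    print(name, "available" if has_module(name) else "missing")
```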
  • Re:Yay, Unicode! (Score:3, Interesting)

    by shutdown -p now (807394) on Thursday December 04, 2008 @01:08PM (#25991733) Journal

    since methods exist to examine what the encoding of a string is, and to change it, how would there be a disaster unless the coder was sloppy?

    Assume a simple case: a function taking two strings as arguments. In Ruby 2.0, you cannot safely concatenate those two strings, or even compare them (because their encodings may be incompatible). You cannot properly interpret them, because the set of possible encodings is not closed (the client may pass you a string with an encoding he defined himself). You cannot convert them to some common encoding that is safe to process, because there may not be a common encoding (Ruby intends to support some Japanese encodings that do not have a well-defined Unicode mapping). You cannot even safely pass them on to another library function, because it may not be able to handle a string in an arbitrary encoding for the reasons mentioned above. In effect, strings in Ruby 2.0 are "arrays of characters", where a "character" is some opaque value from which no meaning can be derived in the general case.

    Note that the above means that this Ruby code has a bug of sorts for 1.9.1+:

    def foo(str)
      if str == "abc" # oops! who says str's encoding is compatible with ASCII?
        puts "got abc"
      end
    end

    Cute, eh?
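    For contrast, a sketch of the same situation under Python 3's model, where encodings exist only at the byte boundary: the three byte strings below differ, but once each is decoded with its own codec the resulting str objects compare equal, so a comparison like the one in foo above is always well defined. The variable names are illustrative.

```python
word = "caf\u00e9"

utf8 = word.encode("utf-8")
latin1 = word.encode("latin-1")
utf16 = word.encode("utf-16")

# As bytes, the three encodings have different contents.
print(utf8, latin1, len(utf16))

# Decoded at the boundary, they are the same text and compare equal.
decoded = [utf8.decode("utf-8"), latin1.decode("latin-1"), utf16.decode("utf-16")]
print(all(s == word for s in decoded))  # True
```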

  • Re:Libraries (Score:3, Interesting)

    by blincoln (592401) on Thursday December 04, 2008 @01:10PM (#25991757) Homepage Journal

    I can't help but think it was designed by someone who was pissed off that people didn't format their code the way he formatted his code. Since his way was obviously the "right" way, why not write a language that forces you to do it that way? Problem solved!

    This is actually the main reason I haven't worked with Python beyond tweaking a few existing scripts. The funny thing is that (unless I'm misremembering the syntax) I already code using that style in other languages. But the idea of forcing that style on everyone annoys me enough to put me off of the language as a whole.

    I was really hoping that 3.0 would remove that petty stupidity. Doing so would even retain backwards compatibility with prior versions!
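    For what it's worth, 3.0 moved in the opposite direction: a block whose meaning depends on the tab width (a tab on one line, eight spaces on the next) is rejected with TabError, where 2.x only complained when run with -t or -tt. A small check using compile() on a made-up snippet:

```python
ambiguous = "if True:\n\tx = 1\n        y = 2\n"  # tab, then eight spaces

try:
    compile(ambiguous, "<snippet>", "exec")
    rejected = False
except TabError as exc:
    rejected = True
    print("rejected:", exc.msg)
```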

  • Re:Yay, Unicode! (Score:3, Interesting)

    by shutdown -p now (807394) on Thursday December 04, 2008 @01:15PM (#25991845) Journal

    If I understand Unicode correctly, the entire point is that Unicode provides a code point space, which defines all the possible characters available.

    You understand almost correctly :) The problem here is: what is a "possible character"? It is in many ways a political issue, and apparently some people aren't happy about the way Unicode handled some characters. One particular sore point is Han unification [wikipedia.org]: basically, Unicode assigned a single code point to each Han character, whether it's used in Chinese, Japanese, or Korean, even where the preferred glyphs differ. The Japanese were particularly unhappy about it.
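    A quick way to see the unification from Python 3: the character below (U+76F4, an often-cited example whose preferred glyph differs between Chinese and Japanese typography) is one code point with one name, and nothing in the str records which language it belongs to; choosing the glyph is left to fonts and language tagging.

```python
import unicodedata

zhi = "\u76f4"  # the same code point whether the text is Chinese, Japanese or Korean

print(hex(ord(zhi)), unicodedata.name(zhi))
```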

  • Re:Libraries (Score:3, Interesting)

    by steveha (103154) on Thursday December 04, 2008 @02:03PM (#25992631) Homepage

    I wonder if Fortran may eventually be replaced by Python.

    A few years ago, when I was first getting into Python, I read an article where a guy from a science research lab talked about his lab's transition from Fortran to Python. Python has some nifty heavy-duty math modules, written in C; and everyone at the lab who tried out the Python stuff strongly preferred it to Fortran.

    Since C code is doing all the heavy lifting, it's nice and fast. Since Python is interactive, scientists can use it as a really-powerful desk calculator. And since Python has a clean and friendly syntax, it's easier to write and debug Python programs than Fortran.

    I really wish I had saved a copy of that article, or at least its URL. I've tried Google searching for it, and I find many hits on using Python in labs but I haven't found the article.

    steveha

  • Re:Yay, Unicode! (Score:3, Interesting)

    by shutdown -p now (807394) on Thursday December 04, 2008 @03:22PM (#25993755) Journal

    The statement is an error as the types don't match. Quite a few people claimed this in response to my previous posts.

    They are correct. "UTF-8 String" is not really a UTF-8 constant; it's just a plain Unicode string now. It makes sense, too, as comparing a byte array with a string is not a generally well-defined operation. And yes, of course, it's a breaking change, and it is on the changelog [python.org].

    Now you can still have byte array literals if you want them, but they are opt-in via "b" prefix (much like Unicode strings were opt-in via "u" in 2.x). So:

    if byte_string == b"UTF-8 constant":

    works.
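    A sketch of what that looks like in practice: in Python 3, == between bytes and str does not raise, it simply returns False (running with python -b turns this into a warning and -bb into an error), while concatenating the two raises TypeError. Decoding explicitly is the way to compare them as text.

```python
raw = b"UTF-8 constant"
text = "UTF-8 constant"

print(raw == text)                   # False: bytes and str never compare equal
print(raw == b"UTF-8 constant")      # True
print(raw.decode("utf-8") == text)   # True after an explicit decode

try:
    raw + text
except TypeError:
    print("cannot concatenate bytes and str")
```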

    It also appears to be impossible to make an unadorned string constant that contains an *invalid* UTF-8 encoding, since the translation is done at compile time, so no changes to the current encoding will help.

    Well, if it's invalid, it's no longer UTF-8, right? So not a valid Unicode string anyway - why would you want it to pretend to be one? You can still make a byte array like that (though of course it will fail if you then try to decode it as if it was UTF-8 - because it's not).

    In Python 2.0 and in most other languages "\xC2\xA2" is a cent-sign

    True for Python 2.0, false for "most other languages". It's not true for most post-Java mainstream and/or generally well-known languages (C#, VB, Haskell, R6RS - to name a few). So Python is simply standardizing on what's already widely accepted. Of course, it also makes most sense when you deal with Unicode strings - forget about bytes, work with codepoints. In-memory representation of the string shouldn't be your concern, anyway.

    Also the documentation claims that b"\u00A2" is invalid, but that makes it really difficult to make byte string constants containing arbitrary UTF-8 in a more readable way.

    Well, of course it's invalid - it's a byte array, not a string! And why do you think that it would have to be UTF-8 even if it was allowed? Why not UTF-16 or UCS4?

    Of course, nothing stops you from using str.encode, e.g. "\u00A2".encode("utf-8"), which is quite explicit about what's going on and yet short enough. By the way, if you omit the argument to encode, Python 3 uses UTF-8 (its default string encoding), not the locale's encoding, so spell the encoding out whenever you need something else.
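    A short Python 3 sketch of the points above: a cent sign is one code point, its byte form depends on the encoding you ask for, and "\xC2\xA2" in a str literal denotes two code points rather than a UTF-8 sequence.

```python
cent = "\u00A2"                                 # one code point, U+00A2 CENT SIGN

print(cent.encode("utf-8"))                     # b'\xc2\xa2'
print(cent.encode("utf-16-le"))                 # b'\xa2\x00'
print(b"\xc2\xa2".decode("utf-8") == cent)      # True

# In a Python 3 str literal, "\xC2\xA2" is two code points, U+00C2 and U+00A2.
print(len("\xC2\xA2"), "\xC2\xA2" == "\u00C2\u00A2")  # 2 True
```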

  • Re:Yay, Unicode! (Score:3, Interesting)

    by spitzak (4019) on Thursday December 04, 2008 @05:59PM (#25995869) Homepage

    Reading the changelog, it sure does sound like b"abc"=="abc" will produce an error. I do find this extremely surprising, as I would think it would break enormous amounts of software.

    It sounds like Python 3.0 will throw an error if you read a file that contains invalid UTF-8, until the program is rewritten to read the file as "bytes". Then it will throw errors when you convert the bytes to "str", until you rewrite the functions reading the files to return bytes instead of str. Then the users will hit this problem in that their code will no longer compile. I can't see this being any good.

    Checking the web pages, I am certainly not alone in this worry. A more popular solution, however, seems to be to stop throwing errors. The conversion to Unicode would instead translate invalid bytes to U+DCxx (i.e., unpaired UTF-16 low surrogates). This would avoid the exceptions and also make the translation lossless. I have examined this before, and it has a big problem: the translation of (possibly invalid) UTF-16 to UTF-8 is no longer lossless (imagine the UTF-16 had a sequence of these invalid symbols that actually matches a valid UTF-8 encoding), which might lead to bad security holes.
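    The scheme described above was later adopted as the "surrogateescape" error handler (PEP 383, Python 3.1), so it can be tried directly. The sketch shows both halves of the argument: bytes -> str -> bytes round-trips losslessly, but a str that already contains escape surrogates spelling a valid UTF-8 sequence does not survive str -> bytes -> str. The sample data is made up.

```python
raw = b"caf\xe9 ok"  # Latin-1 'é', which is not valid UTF-8

# Invalid bytes become lone surrogates U+DC80..U+DCFF instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))                                             # 'caf\udce9 ok'
print(text.encode("utf-8", errors="surrogateescape") == raw)   # True: lossless

# The worry: escaped surrogates that happen to spell a valid UTF-8 sequence.
tricky = "\udcc2\udca2"
as_bytes = tricky.encode("utf-8", errors="surrogateescape")    # b'\xc2\xa2'
print(as_bytes.decode("utf-8", errors="surrogateescape") == tricky)  # False: now a cent sign
```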

    if it's invalid, it's no longer UTF-8, right?

    You are parroting the same crap used by people who don't like UTF-8 and try to make it more difficult than it really is. It is indeed UTF-8; having errors in it does not make it not UTF-8, any more than a misspelled word makes this post not English.

    It's not true for most post-Java mainstream and/or generally well-known languages

    You seem to have forgotten languages called "C" and "C++". I heard they were pretty popular...

    I think you might also check exactly what some of those languages do, you can't put more than \xff into most of them so they are actually doing exactly what I am saying, except they are assuming ISO-8859-1 as the encoding. If the encoding can be changed to UTF-8 then it would work exactly like I am stating. (if values greater than 0xff are accepted they could ignore the encoding and you would remain compatible).

    What you are saying is that there is no difference between \x and \u, which seems pretty stupid to me.

    The main reason I want this is so that a string constant can be changed between bytes and unicode by just changing the 'b' to a 'u'. This is also why I want \uXXXX to work in byte strings.

    On b"\u00A2": Well, of course it's invalid - it's a byte array, not a string! And why do you think that it would have to be UTF-8 even if it was allowed? Why not UTF-16 or UCS4?

    The compiler is already assuming UTF-8 when it parses u"ab¢", so I see no reason it can't assume UTF-8 here as well.

  • Re:Yay, Unicode! (Score:3, Interesting)

    by spitzak (4019) on Friday December 05, 2008 @04:12PM (#26007353) Homepage

    If your input file is supposed to be UTF-8 text, and is not, then surely it's an error?

    UTF-8 with errors is STILL UTF-8. It just is not "valid UTF-8" which is a mostly uninteresting subset. The set of UTF-8 strings is every single possible byte sequence. The set of "valid UTF-8" strings is a SUBSET that a tiny portion of software (mostly validators) should have to care about.

    People are trying to make this far more difficult than it really is by somehow saying that we must restrict ourselves to that subset at a very low level. That is wrong and is the main reason why there is so much confusion about UTF-8. Nobody seems to care that UTF-16 can have illegal sequences (Python handles them without complaint) and nobody cared for 10 years that the Japanese encodings could have illegal sequences. But for some reason UTF-8 brings out this complaint over and over again. I suspect the problem is that people have invested too much effort in UTF-16 and don't want to admit they made a huge mistake, and the only way is to try to make UTF-8 hard.

    But, of course, as soon as you want to start treating it as an actual string - so that you can say things such as "give me the 10th character" (and not "10th byte") - it has to be valid, otherwise all string-specific operations would simply be undefined.

    Well of course. Therefore THAT function should throw the damn exception! Not every single string manipulation!!!!

    Also you amazingly did the same bogus example of "move by 10 characters" I have seen before. Please look at real software and you will see that NOBODY EVER MOVES BY "10 CHARACTERS". 1 maybe. Otherwise the only use EVER of such code is because "10 characters" was previously calculated by another function looking at the EXACT SAME STRING and therefore a byte offset or UTF-16 word offset or whatever will work just as well.
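    A sketch of the approach argued for here, in Python 3 terms: keep the data as bytes, do the structural work (splitting, searching, slicing at offsets found on the same data) at the byte level, where invalid sequences cannot raise, and decode only the piece that actually has to be treated as text. The record format and field names are hypothetical.

```python
record = b"user=caf\xe9;id=42"  # hypothetical record; the \xe9 byte is not valid UTF-8

# Byte-level parsing never raises on bad encodings.
fields = dict(part.split(b"=", 1) for part in record.split(b";"))
print(fields[b"id"])  # b'42'

# Offsets computed on the same bytes work just as well as character offsets.
start = record.find(b"id=") + len(b"id=")
print(record[start:])  # b'42'

# Only the operation that needs real characters has to deal with invalid data.
try:
    name = fields[b"user"].decode("utf-8")
except UnicodeDecodeError:
    name = fields[b"user"].decode("utf-8", errors="replace")
print(ascii(name))  # 'caf\ufffd'
```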

    L"\xC2\xA2" is not a cent sign in either C or C++. It's a wide string with two characters.

    It is byte values converted using ISO-8859-1 encoding. What I want is the ability to change that encoding.

    The compiler isn't assuming UTF-8, the code which reads the file as a sequence of characters (before lexing, much less parsing, takes place) does that.

    That is wrong, because it would not be possible to create a byte string containing an invalid UTF-8 sequence. This would break any software that has a string constant with ISO-8859-1 encoding in it (the programmer will still need to put a 'b' in front of it, but that is a lot easier and more readable than going and replacing all the foreign letters with \x sequences).
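    For reference, Python 3 does reject non-ASCII characters inside a bytes literal, so ISO-8859-1 text pasted into b"..." will not compile; it has to be written with \x escapes or produced by encoding a str. A small check using compile() on a made-up snippet:

```python
source = 'label = b"caf\u00e9"\n'  # the compiled source contains a literal é inside b"..."

try:
    compile(source, "<snippet>", "exec")
    accepted = True
except SyntaxError:
    accepted = False
print("non-ASCII bytes literal accepted:", accepted)  # False

# Two ways to get the same ISO-8859-1 bytes that do compile:
print(b"caf\xe9" == "caf\u00e9".encode("latin-1"))  # True
```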

    In any case, I don't see any reason why the lexer should assume a different locale than the parser. That would be pretty confusing.
