Programming Python

Julia Users Most Likely To Defect To Python for Data Science (zdnet.com) 32

The open-source project behind Julia, a programming language for data scientists, has revealed which languages users would shift to if they decided no longer to use Julia. From a report: Julia, a zippy programming language that has roots at MIT, has published the results of its 2020 annual user survey. The study aims to uncover the preferences of those who are building programs in the language. [...] Last year, 73% of Julia users said they would use Python if they weren't using Julia, but this year 76% nominated Python as the other language. MATLAB, another Julia rival in statistical analysis, saw its share of Julia users as a top alternative language drop from 35% to 31% over the past year, but C++ saw its share on this metric rise from 28% to 31%. Meanwhile, R, a popular statistical programming language with a dedicated crowd, also declined from 27% to 25%.
This discussion has been archived. No new comments can be posted.

  • by jythie ( 914043 ) on Wednesday August 26, 2020 @01:35PM (#60443107)
    Python, R, and MATLAB I can see, but C++ kinda surprises me. Maybe it is getting popular among people doing GPU heavy work?
    • Re:C++? (Score:4, Insightful)

      by damn_registrars ( 1103043 ) <damn.registrars@gmail.com> on Wednesday August 26, 2020 @01:42PM (#60443135) Homepage Journal

      Python, R, and MATLAB I can see, but C++ kinda surprises me. Maybe it is getting popular among people doing GPU heavy work?

      My guess was most of them learned C++ in undergrad and then started using Julia in their jobs; hence they'd go back to C++ before picking up another language. I guess I could write a Perl script to try to better parse their responses if I can find their raw data anywhere...

    • by gweihir ( 88907 )

      Python, R, and MATLAB I can see, but C++ kinda surprises me. Maybe it is getting popular among people doing GPU heavy work?

      Does not really make sense. Most would get that as a library and creating Python modules in C (or C++) is not really hard.

    • by Ksevio ( 865461 )

      Probably to pick up a C++ library for faster processing

    • by goombah99 ( 560566 ) on Wednesday August 26, 2020 @04:17PM (#60443841)

      Julia is JIT-compiled and really emphasizes types. Put those together and you get something that doesn't behave the way you expect a typed language to. Specifically, you can write loosely, not declaring types if you like. But under the hood, the first time you call a function with a new type signature it gets compiled on the fly for that signature. So inside each function the types are perfectly known, and all the speed benefits, memory benefits, type safety and dispatch rules apply. You get it all for free even though you are writing as sloppily as in Python or any duck-typed language.
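The per-signature compilation described above can be pictured with a toy Python model. This is only a sketch of the bookkeeping, not how Julia is implemented: the decorator name `specialize_by_signature` and the `specializations` attribute are made up for illustration, and nothing is actually compiled.

```python
from typing import Any, Callable, Dict, Tuple


def specialize_by_signature(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Toy model: keep one entry per distinct tuple of argument types."""
    cache: Dict[Tuple[type, ...], Callable[..., Any]] = {}

    def wrapper(*args: Any) -> Any:
        signature = tuple(type(a) for a in args)
        if signature not in cache:
            # A real JIT would generate machine code for this signature here.
            # This sketch only records that the signature has been seen.
            cache[signature] = fn
        return cache[signature](*args)

    wrapper.specializations = cache  # hypothetical attribute, for inspection
    return wrapper


@specialize_by_signature
def add(a, b):
    # Written loosely, with no declared types.
    return a + b


add(1, 2)      # first (int, int) call creates an entry
add(3, 4)      # second (int, int) call reuses it
add(1.0, 2.0)  # (float, float) creates a second entry
```

After these three calls the cache holds two entries, one per type signature, which is the sense in which the same loosely written function ends up with a separate specialized version for each combination of argument types.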

      What's even nicer, or rather amazing, about that is that it doesn't have to know about a type ahead of time when the function is written. So for example, you get nearly all of autograd (automatic differentiation) for free. Every function you ever wrote can simply be passed a new type, a tracked float, instead of a float and it will recompile for that. Or for any mixture of tracked and untracked vars you pass in your call. Thus you don't need to work inside a walled garden like tensorflow types to be able to get this benefit. You already have it without re-writing any code.

      This is why hideously complex algorithms that already exist (like say multivariate numerical integration) all can use autograd in Julia but break in python.
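As a rough illustration of the "tracked float" idea, here is a minimal forward-mode differentiation sketch in plain Python. The names `Tracked`, `_lift`, `polynomial` and `derivative` are hypothetical and chosen for this example only. It shows the concept for pure-Python arithmetic; the commenter's point is that such a type typically does not carry through library routines written against their own fixed numeric types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """Hypothetical 'tracked float': a value together with its derivative."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Tracked(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Tracked(self.value * other.value,
                       self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__  # multiplication is commutative


def _lift(x):
    """Wrap plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Tracked) else Tracked(float(x))


def polynomial(x):
    # Written with no knowledge of Tracked: 3x^2 + 2x + 1
    return 3 * x * x + 2 * x + 1


def derivative(f, x):
    """Evaluate f at a tracked input seeded with derivative 1."""
    return f(Tracked(x, 1.0)).deriv
```

Calling `polynomial(2.0)` runs the ordinary float path, while `derivative(polynomial, 2.0)` pushes a `Tracked` value through the same unmodified function and reads off the slope.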

      Of course you can also write typed code. In fact all your function declarations and undeclared instantaneous variables already have a type: it's called the "Any" type and it is the global supertype of all others. But you can declare in the function def or in a variable declaration that it must be an int or a float or whatever. So all the typing mechanics are present, however much you want to take advantage of them. Thus, for example, all functions can be dispatched on their type signature.

      And finally 1. all the numerical stuff isn't an afterthought like numpy, 2. you don't need bandaids like numba for optimization and 3. the package manager is built into the language so the interfaces are language aware (even so, it can also access all of conda or other package managers under the hood as well).

      Where Julia drives people crazy is really just one, I think very bad, design choice. For some reason they wanted it to be like Fortran 90 or Matlab. Thus besides idiosyncrasies like variables that can be Unicode Greek letters, and arrays that start at 1 not 0 by default, the really pernicious distraction is that there's no classes.
      Really. there's no classes.
      It breaks your brain because it does exactly the opposite of encapsulation. For example, normally you'd think that any object should have a "to_string" method so Print statements would be able to call that. In julia, you instead overload the system keyword (like say print) with a function that gets the dispatch when the to_string method would be needed.
      At first you just want to slit your wrists. But after a month or so you find this isn't bonkers at all. Indeed it is actually kind of a class after all. It looks like a class. You create a type. then of course you can't do much with it unless you have some type specific methods. So you write underneath that. and after a while it just looks like a Class code block. That to_string method is right there in that code block just like it would be in a class.
      So it really looks just like a class. There's just no class keyword wrapping it.

      If you are familiar with add-in style of writing python or javascript objects, it looks like that.
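A loose Python analogue of this "type first, methods added underneath" style can be sketched with `functools.singledispatch` from the standard library. The `Point` type and the `to_string` function are hypothetical names for this example. Note one real difference: `singledispatch` selects an implementation from the type of the first argument only, whereas Julia dispatches on the types of all arguments.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Point:
    """A plain data type with no methods of its own."""
    x: float
    y: float


@singledispatch
def to_string(obj) -> str:
    # Fallback used for any type without a registered implementation.
    return repr(obj)


@to_string.register
def _(obj: Point) -> str:
    # Added "underneath" the type, outside any class body.
    return f"Point({obj.x}, {obj.y})"


@to_string.register
def _(obj: int) -> str:
    # The same function can be extended for existing types too.
    return f"int:{obj}"
```

Here `to_string(Point(1.0, 2.0))` picks the `Point` implementation and `to_string("a")` falls back to `repr`, so the behavior lives in a generic function extended per type rather than in a method inside the class.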

      And it is way way, way way, way way, more sane than for example C++ with its bizarro rules about what has to be in the header, what in the .cpp, and how constants get declared in classes. C++ has to be the most brain breaking language ever and people get used to it.

      So julia is quite awesome. wickedly fast. Just as easy as python's sloppy type. But it's a rough month to get over the confusion of indexing from 1 and the add-in style of overload based method development.

      Julia would be about the perfect language if they had just not tried that matlab/fortran 90 retro style. However I think this probably is also why it works. Without the class wrapper, all those functions get JIT compiled right away, not latently like in a class.

      • by goombah99 ( 560566 ) on Wednesday August 26, 2020 @04:39PM (#60443931)

        In Python if you need high performance the standard is to use things like numpy, scipy, tensorflow etc. There is even stuff like cython to code some performance critical parts in c which don't already have high performance versions.

        Those are band aids. You can't use any type of variable with numpy, or tensorflow or numba. Those libs have to know the type of the variable beforehand and normally it has to be a type declared in the library or a helper library. For example, TensorFlow can't do autograd on any var that isn't a TensorFlow var and calls the pre-written TensorFlow methods. You can't just do a tensor multiply on any type. Julia can. Sure you can flounder around creating your own types that obey the interface for TensorFlow but that's throwing the work on to you. When you are lucky someone has done this for you by say teaching TensorFlow to dress numpy arrays.

          Julia does the work of recompiling the other algorithms for your new type. There's nothing else like it.

      • Julia was designed to cater to mathematicians in the first place so arrays that begin with 1 make more sense. In reality 0-based arrays are an unfortunate baggage from an implementation detail in old languages. It's more logical to begin to count with 1 than with 0. It would be very simple to add object notation to Julia to have a.function(b, c, d) instead of function(a, b, c, d). Dylan, which is a language with a similar design to Julia, has it. This would make the code look more familiar without losing Julias
    • Having used Julia, Python, R, MATLAB and C/C++, I cannot see the motivation to switch to R. It's a nasty language. Its main benefit was direct support for statistics that other languages didn't support directly. But that lead has been lost to Python.

      • by jythie ( 914043 )
        Yeah, learning R was one of the most surreal experiences in my career, and I have trouble picturing why people would use it unless they were coming in from SAS or something. On the other hand I've worked several projects where the other team(s) were really into R, so it seems to appeal to people with certain backgrounds.
        • I was using it to do some statistical tests my wife needed when she was doing her PhD. The alternatives used in the department were SPSS and SAS which were GUI driven statistical packages for people who can't program. Those packages could not do MANOVA (Multivariate ANOVA). R could. I didn't think it was going to be a big thing, but of course a boat load of data-handling cruft grew up around the core analysis code. R had horrible data handling features. I ended up formatting data in python to make the R cod

  • It makes sense... (Score:4, Interesting)

    by Junta ( 36770 ) on Wednesday August 26, 2020 @02:16PM (#60443231)

    Python basically displaced Fortran, which had had a surprisingly extended life in technical computing.

    But the convenience of python appealed to those folks despite it perhaps not being as purpose designed for technical computing.

    Julia is basically 'Ok, but what if we did Fortran *now*, what would it look like?' and finally there was a language geared toward technical computing but modernized in a pretty compelling way.

    Things are a bit more than that, but attempts to paint Julia in a different light, I think, detract from the purpose of the language.

    • But will your python code still run in 10 years? (https://developers.slashdot.org/story/20/08/25/1958241/will-your-code-run-ten-years-from-now)

      FORTRAN code from the 1970's will still compile and run on modern FORTRAN compilers.

    • by Creepy ( 93888 )

      My problem with Python is that often I pull a program with dependencies and they haven't listed any of the dependencies and don't have a proper package to get them (not that python ever had good dependency handling until recently). I usually start building the package, find I'm missing some library, download that library and start to build again, rinse, repeat. Then there are built-in python bindings in some Linux distributions that force a certain version, but I need a newer one in the path. If you can get Robot Framework

    • by jma05 ( 897351 )

      Julia is more of a modern, open source MATLAB, more than a modern Fortran.

  • by tonique ( 1176513 ) on Wednesday August 26, 2020 @02:28PM (#60443271)
    An anecdote: I wanted to try some maths testing large numbers, using arbitrary precision. The calculation involves a formula for integers (n) for each integer (r=3-1,000,000) and then calculating another value and testing whether that is an integer. Perl could calculate n=1-10,000 in 10 seconds, and Python (under Sage [sagemath.org]) could calculate n=1-10,000 in 12 seconds. Julia, which I certainly don't know the best practices of, could calculate n=1-1,000,000 in 4 seconds. My recommendation is that if you need lots of number crunching, take a look at Julia.
    • In Python if you need high performance the standard is to use things like numpy, scipy, tensorflow etc. Basically, don't reinvent the wheel. There is even stuff like cython to code some performance critical parts in c which don't already have high performance versions.

      • Thanks. I tried not to imply I know even Python too well...
      • by g01d4 ( 888748 )

        Basically, don't reinvent the wheel

        Exactly. Scientific programming languages are a collection of built-in routines that easily integrate with those written externally. Some of the more basic languages used for scientific programming were ones that quickly built up a large external collection creating something of a network effect. A lot of code in a scientific application consists of glue that binds these routines together. I think in large part the availability of external routines is what drives popularit

        • Part of the reason I used Python for my PhD work is so many high performance libraries available. You can use numpy and scipy backed by MKL, Tensorflow with CUDA etc. Most of the performance ends up in the low level code and Python essentially ends up for command and control. We even use python on the supercomputers here and see no real performance impact from doing it.

          Since the performance ends up pretty much the same as coding it all in C++, C or Fortran but written far faster and more maintainable it is

      • In Python if you need high performance the standard is to use things like numpy, scipy, tensorflow etc. Basically, don't reinvent the wheel. There is even stuff like cython to code some performance critical parts in c which don't already have high performance versions.

        Those are a total band aid. You can't use any type of variable with numpy, or tensorflow or numba. Those libs have to know the type of the variable and normally it has to be a type declared in the library. For example, TensorFlow can't do autograd on any var that isn't a TensorFlow var and calls the pre-written TensorFlow methods. Julia can. Sure you can flounder around creating your own types that obey the interface for TensorFlow but that's throwing the work on to you. Julia does the work of re

        • There's nothing else like it.

          That's debatable considering how many things it stole from Lisp. And things like transparent automatic differentiation are practically a weekend homework in Lisp.

    • My recommendation is that if you need some heavy number crunching call a BLAS library -- or the library best suited for task at hand. It shouldn't matter much if you call it from Perl, Python, or Julia.

      • Most of the calls in numpy and scipy that can be implemented with BLAS are implemented with BLAS. If you use the anaconda release of python the default is to link to a high performance BLAS like MKL.

        You are right though in pretty much any language you will call BLAS and the language that you coded the call in matters very little.

  • I was an architect in my company's data team in a previous role and experimented with both R and Julia. My "home" language is Python and I always end up comparing each language to it, especially since pandas/numpy and graphic libraries are so mature. I can do R and Matlab (and Octave) easily enough, but there were some adoption issues in my group (primarily Python/Java folks).

    Julia is pretty nice. It's trivially easy to get graphs going (https://www.digitalhermit.com/mathematics/learning-julia). There were

  • I'm pretty sure that these statistics mostly reflect what languages Julia developers were using prior to starting to use Julia. If you came from Python, wouldn't you just go back to using Python?
