In-Database R Coming To SQL Server 2016

theodp writes: Wondering what kind of things Microsoft might do with its purchase of Revolution Analytics? Over at the Revolutions blog, David Smith announces that in-database R is coming to SQL Server 2016. "With this update," Smith writes, "data scientists will no longer need to extract data from SQL Server via ODBC to analyze it with R. Instead, you will be able to take your R code to the data, where it will be run inside a sandbox process within SQL Server itself. This eliminates the time and storage required to move the data, and gives you all the power of R and CRAN packages to apply to your database." It'll no doubt intrigue Data Scientist types, but the devil's in the final details, which Microsoft was still cagey about when it talked the not-exactly-glitch-free talk (starts @57:00) earlier this month at Ignite. So brush up your R, kids, and you can see how Microsoft walks the in-database walk when the SQL Server 2016 public preview rolls out this summer.
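
For readers who haven't lived the status quo the summary describes, the extract-then-analyze round trip looks roughly like this in R. A minimal sketch using the RODBC package; the DSN name, table, and column are made-up placeholders:

library(RODBC)

# Today's pattern: copy the data out of SQL Server over ODBC, then
# analyze it client-side in R. The DSN and query are illustrative only.
ch <- odbcConnect("SQLServerDSN")
sales <- sqlQuery(ch, "SELECT Amount FROM dbo.Sales")
odbcClose(ch)

summary(sales$Amount)  # analysis starts only after the full extract lands in R's memory

In-database R, as described above, inverts this: the R script ships to the server and runs in a sandboxed process next to the data, so the extract step, with its time and storage cost, disappears.
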
  • by Anonymous Coward

    Check out http://alteryx.com/ which is already doing in-database R with Oracle and Hadoop (SparkR). It's great that Microsoft is joining the club, but they aren't exactly the first.

    • by Anonymous Coward

      PostgreSQL has had PL/R since 2003.

      • Re:Alteryx (Score:5, Interesting)

        by Skinkie ( 815924 ) on Saturday May 16, 2015 @02:26PM (#49706749) Homepage
        MonetDB has a nice comparison on different in and out of database performance: https://www.monetdb.org/conten... [monetdb.org]
      • PostgreSQL has had PL/R since 2003.

        Which is nice, but doesn't really do anything for you if you're not using PostgreSQL, for example those using SQL Server.

        • Re:Alteryx (Score:5, Insightful)

          by phantomfive ( 622387 ) on Saturday May 16, 2015 @04:33PM (#49707367) Journal

          for example those using SQL Server.

          Though to be fair, that was a questionable decision to begin with. You just don't get any value for your subscription fees.

          Databases are one area where open source is beating closed source.

          • by Shados ( 741919 )

              Vertica says HELLO! Even though it's -absurdly- expensive, it runs circles around anything open source.

            Though in general, large (really large) databases is an area where you actually want commercial support, because things can go wrong in the most fucked up ways.

            Open source dbs have companies doing that support, but few have the kind of manpower I'd want when things go very sour.

              • Vertica says HELLO! Even though it's -absurdly- expensive, it runs circles around anything open source.

              Vertica is a data warehouse

              Open source dbs have companies doing that support, but few have the kind of manpower I'd want when things go very sour.

              If you sincerely need help from Oracle/Microsoft/HP to deal with your database problems, then your technical expertise isn't very high.

              • by Shados ( 741919 )

                The line is so thin between data warehouses and transactional DBs. Heck, in this case the only difference is how the data is stored and which types of queries are fast and which are slow. You can insert, run SQL (we use Postgres as a mock to run persistence-layer tests, because it's so close to Vertica), all in real time. Close enough.

                And even the biggest of big data giants sometimes end up with issues where you need help. When you need to write a patch for your RDBMS, it's nice to be able to have a vendor to do it, open source or not. Not many companies keep Postgres core developers in house.

                • The line is so thin between data warehouses and transactional DBs. Heck, in this case the only difference is how the data is stored and which types of queries are fast and which are slow.

                  No, that is actually the difference lol

                • And even the biggest of big data giants sometimes end up with issues where you need help. When you need to write a patch for your RDBMS, it's nice to be able to have a vendor to do it, open source or not. Not many companies keep Postgres core developers in house.

                  I'm interested though, is this an issue you've run into?

                • by Bengie ( 1121981 )
                  Derp, the only difference between a transactional database and a data warehouse is the data structures and algorithms... herpa derpa.

                  And the only difference between a train and a semi is the engine and the body.
          • by Bengie ( 1121981 )
            When I was doing research into databases and total cost of ownership, Postgres was pretty much the best until about $100k, then MS-SQL caught up and it was pretty much a tie. MySQL was pretty bad the entire way through. There were a few other databases, but they were both uncommon and never better.

            With Postgres and MS-SQL being pretty much a tie on TCO, just choose whichever best fits your situation. Postgres does have a low barrier to entry and can do some pretty nifty things, but those things increa
            • Postgres was pretty much the best until about $100k, then MS-SQL caught up and it was pretty much a tie.

              How did MS-SQL catch up?

  • How about introducing schema-less tables in 2016? Are we going to have to store fuzzy data in a silly full-text-search-enabled field forever?
    • SQL Server 2016 will have a JSON column type, so it's most of the way there.

  • by Cassini2 ( 956052 ) on Saturday May 16, 2015 @02:05PM (#49706625)

    The problem with R is that everything is a vector. When you hit something as big as a multi-terabyte database, the vector doesn't fit in memory anymore. An interpreted language like R (and even many compiled languages) expects memory accesses to be quick. However, if every data access requires a SQL call, then the R/SQL Server marriage will be very slow. I'm sure they will be able to do some small demonstrations that look quick, but once the database becomes large, things will get very slow.

    On the good news side, there are some operations, like average and standard deviation, that reduce to loops of sums. Those should map onto SQL queries relatively well.
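
    To make the "loops of sums" point concrete, here is a small illustrative sketch (mine, not the parent's): the mean and sample standard deviation need only three scalar aggregates, which a single query along the lines of SELECT COUNT(x), SUM(x), SUM(x*x) could compute without ever shipping the vector to R.

    # Mean and sample standard deviation from three scalar aggregates.
    # In-database, n, s1, and s2 could come back from one SQL query;
    # here a random vector stands in for a column we'd rather not extract.
    x  <- rnorm(1e6)
    n  <- length(x)   # COUNT(x)
    s1 <- sum(x)      # SUM(x)
    s2 <- sum(x^2)    # SUM(x*x)

    m <- s1 / n                            # mean
    s <- sqrt((s2 - s1^2 / n) / (n - 1))   # sample standard deviation

    stopifnot(isTRUE(all.equal(m, mean(x))), isTRUE(all.equal(s, sd(x))))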

    On the bad news side, a popular operation is to build a covariance matrix. With a large data set, it is easy to create a covariance matrix that does not fit into RAM.
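
    As a back-of-envelope illustration (the figures are mine, not the parent's): a dense covariance matrix is p-by-p doubles, so its size grows with the square of the number of columns, regardless of row count.

    # Memory needed for a dense p x p covariance matrix in double precision.
    p <- 2e5            # e.g., 200,000 variables in a wide data set
    bytes <- p^2 * 8    # 8 bytes per double
    bytes / 2^30        # ~298 GiB, far beyond typical single-machine RAM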

    R would be a better match for a distributed database (NoSQL, MongoDB), where the memory requirements of the vectors could be split across multiple computers. Though that, too, might require some changes to R.

    • by Anonymous Coward

      The problem with R is that everything is a vector. When you hit something as big as a multi-terabyte database, the vector doesn't fit in memory anymore.

      library(bigmemory)

      Create, store, access, and manipulate massive matrices. Matrices are, by default, allocated to shared memory and may use memory-mapped files. Packages biganalytics, synchronicity, bigalgebra, and bigtabulate provide advanced functionality.
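
      A minimal sketch of the pattern (the dimensions and file names are placeholders of mine, not from the package docs):

      library(bigmemory)

      # A file-backed big.matrix lives on disk rather than in R's heap;
      # pages are memory-mapped in only as they are touched.
      x <- filebacked.big.matrix(nrow = 1e6, ncol = 2, type = "double",
                                 backingfile = "x.bin", descriptorfile = "x.desc")
      x[1, ] <- c(3.14, 2.72)   # writes go to the backing file
      x[1, 1]                   # reads map the needed page on demand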

    • by jbolden ( 176878 )

      RDBMS engines are designed to take routines of in-memory, row-by-row or group-by-group statistical operations and figure out good (even optimal) disk/memory organizations for them. That's one of the things they are very, very good at.

  • DBAs won't like it and will disable it in most corporate environments. In effect, this lets the users/developers "inside" their precious servers, where the DBAs are the ultimate power, in a way they can't fully control (and let's face it, Control Freak is a job requirement for a DBA). Add to that the potential to bring a server to its knees with a badly written fragment of code and the possibility of security holes in a new component, and they will have all the ammo they need to convince their bosses that it is a Bad Idea.

    • I would imagine one would only be performing data mining/statistical analysis on the data warehouse server, not the transactional database server.

  • Wouldn't Microsoft need to release SQL Server under the GPL by including R?
    • An implementation of R is GPL, but that doesn't extend to all independent implementations, such as the one MS is writing to do this.

    • Re:Isn't R GPL? (Score:5, Informative)

      by lakeland ( 218447 ) <lakeland@acm.org> on Saturday May 16, 2015 @04:47PM (#49707437) Homepage

      No - MS will only need to release any changes they make to R.

      This sort of thing comes up quite often and largely comes down to coupling. If Microsoft included R code in the binary of SQL Server then they would run into complications. However as long as they keep R on its own and arrange interprocess communication sensibly, they will not be affected by the GPL.

      It's quite likely MS will modify R, e.g., writing low-level routines for getting data out of SQL without needing to go via ODBC, and those sorts of changes will need to be released. It's also possible MS will want things like .RData readers for putting data into SQL and similar, and they might choose to do a clean-room implementation of such bits, rather than calling out to R for the loading code, in order to avoid too-tight coupling.

      Incidentally, this has been done before. The PgR project gives Postgres (BSD) tight coupling with R (GPL) without requiring Postgres to be relicensed. Tableau also released similar features, though they don't add much value at this stage.

      • Oracle already does this too, embedding R as part of Oracle Advanced Analytics, but only if your boss can afford to sell your kidney. Looks like MS is falling behind.
  • by alen ( 225700 ) on Saturday May 16, 2015 @02:53PM (#49706903)

    Expect it to be in the Enterprise edition at $7,000 per physical CPU core.

  • I'm curious whether it will be exposed via OLAP - when I was doing some proteomics work with MS OLAP some years back, the retrieval speed was stellar, but the math libraries were pathetic, which seemed pretty sad for something allegedly aimed at analytics. (Yes, I know, most people assumed business analytics, but there's an awful lot of potential for scientific analysis, especially with large, messy datasets.)

    • I'm guessing they'll slowly phase out OLAP.

      OLAP got its stellar retrieval speed through lots of precomputation and that just isn't compatible with where the whole big data stuff is going. I'd guess instead they will bring in a NoSQL database as a per-table query engine and use that as the OLAP replacement.

  • by Anonymous Coward

    Embrace, Extend, Extinguish [wikipedia.org]

    Microsoft is doing it again: just like they did to Lotus 1-2-3 and WordPerfect, just like they did to Java with their J++ before getting spanked, just like they tried to do with C++, and just like they're trying to do with porting Android and iOS apps to their OS. They're creating a Roach Motel of software, in which the developer or user can check into the Microsoft Roach Motel OS, but they sure cannot check out.

    What is so egregiously evil about this? They're taking an Open Source product

  • Being able to remotely transmit commands in a new general-purpose programming language to the server that stores your irreplaceable data? What could possibly go wrong?

    Also, how do you say "Robert'); DROP TABLE Students;" in R?
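
    The non-joke answer is the same in R as anywhere else: bind parameters rather than pasting strings. A hedged sketch using the DBI package (the odbc driver and DSN are placeholders; this assumes a DBI backend that supports parameter binding):

    library(DBI)

    con <- dbConnect(odbc::odbc(), dsn = "SQLServerDSN")  # placeholder connection

    name <- "Robert'); DROP TABLE Students;--"

    # Unsafe: pasting the value into the SQL string lets it be parsed as SQL.
    # dbGetQuery(con, paste0("SELECT * FROM Students WHERE name = '", name, "'"))

    # Safer: the value is bound as a parameter and never parsed as SQL.
    dbGetQuery(con, "SELECT * FROM Students WHERE name = ?", params = list(name))

    dbDisconnect(con)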

  • Wake me up when SQL Server comes with an MP3 player built in.
