Databases Programming Math

SQL and NoSQL are Two Sides of the Same Coin 259

Posted by Unknown Lamer
from the dualistic-monadic-monsters-from-the-sea dept.
An anonymous reader writes "NoSQL databases have become a hot topic with their promise to solve the problem of distilling valuable information and business insight from big data in a scalable and programmer-friendly way. Microsoft researchers Erik Meijer and Gavin Bierman ... present a mathematical model and standardized query language that could be used to unify SQL and NoSQL data models." Unify is not quite right: the article shows that the relational SQL and key-value NoSQL models are mathematically dual, and provides a monadic query language for the key-value side, which the authors have coined coSQL.
This discussion has been archived. No new comments can be posted.

  • by lakeland (218447) <lakeland@acm.org> on Thursday April 07, 2011 @06:05PM (#35751528) Homepage

    And we DO still use SQL. And we do so because it works

    I disagree.

    There are some things which are fast for a computer to do but are slow and awkward to express in SQL. You can see quite a few of them in SAS, supported by data steps (e.g. decent RBAR support). Another: say I want to get the latest transaction for each customer. I have to do something like

    select customer_id,max(txn_datetime) latest_txn_datetime
    from fact_txn
    group by customer_id

    -- I've now got the latest transaction time but because system reversals and the like are often set to happen at the same time as the transaction I have to next...

    select fact_txn.customer_id, max(fact_txn.txn_key) latest_txn_key
    from fact_txn
    join (select customer_id, max(txn_datetime) latest_txn_datetime
          from fact_txn
          group by customer_id) latest_txn
    on (fact_txn.customer_id = latest_txn.customer_id)
    where fact_txn.txn_datetime = latest_txn.latest_txn_datetime
    group by fact_txn.customer_id

    -- Now we've got the right transaction key, but we still have to join again to get the actual transactions

    select fact_txn2.*
    from fact_txn fact_txn2
    where exists (
        select 1
        from (select fact_txn.customer_id, max(fact_txn.txn_key) latest_txn_key
              from fact_txn
              join (select customer_id, max(txn_datetime) latest_txn_datetime
                    from fact_txn
                    group by customer_id) latest_txn
              on (fact_txn.customer_id = latest_txn.customer_id)
              where fact_txn.txn_datetime = latest_txn.latest_txn_datetime
              group by fact_txn.customer_id) txn_keys_of_interest
        where fact_txn2.txn_key = txn_keys_of_interest.latest_txn_key);

    I can tidy this up a bit if needed, using CTEs or whatnot, but the simple fact remains that a lot of the time I want to write a one-liner like:

    select * from fact_txn having txn_key = max(txn_key) over (PARTITION by customer_id order by txn_datetime)
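    For what it's worth, the wished-for one-liner is roughly expressible today with a window function in a subquery. A minimal runnable sketch, using Python's sqlite3 (window functions need SQLite 3.25+); the fact_txn column set and the sample data here are invented for the demo:

```python
import sqlite3

# In-memory demo: latest transaction per customer via ROW_NUMBER().
# Schema and rows are made up; "amount" is an assumed column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_txn (
    txn_key      INTEGER PRIMARY KEY,
    customer_id  INTEGER,
    txn_datetime TEXT,
    amount       REAL
);
INSERT INTO fact_txn VALUES
    (1, 100, '2011-04-01 10:00', 25.0),
    (2, 100, '2011-04-02 09:30', 40.0),
    (3, 100, '2011-04-02 09:30', -40.0),  -- reversal at the same timestamp
    (4, 200, '2011-04-01 12:00', 10.0);
""")

# Rank each customer's transactions newest-first, breaking timestamp
# ties by txn_key, then keep only rank 1.
rows = conn.execute("""
SELECT txn_key, customer_id, txn_datetime, amount
FROM (
    SELECT fact_txn.*,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY txn_datetime DESC, txn_key DESC
           ) AS rn
    FROM fact_txn
)
WHERE rn = 1
ORDER BY customer_id
""").fetchall()

for row in rows:
    print(row)
```

    The txn_key DESC tie-breaker handles the same-timestamp reversal case in a single pass, instead of the three-level join spelled out in the comment.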

  • by nahdude812 (88157) * on Friday April 08, 2011 @05:57AM (#35755388) Homepage

    Like Angel'o'sphere said, if you can adapt your database, the problem becomes trivial. Make sure that at least for a given customer, each subsequent transaction ID is greater than the prior transaction ID (if this is not already the case, then add a new field populated by a sequence so that you have a field where it is the case).

    Here's the solution with a sub-select (because it's easier to read, it can be converted to a join for efficiency):
    SELECT
            transactions.fieldNames
    FROM transactions
    WHERE
            (transactions.customerID, transactions.transactionID) IN (
                    SELECT customerID, MAX(transactionID)
                    FROM transactions
                    GROUP BY customerID
            )

    If, as you suggest, you need it for specific date ranges, then add those to the sub-select. Like I said, for most RDBMSs this would be faster if converted to a join (and basically every sub-select can be converted to a join). Some RDBMSs would convert it to a join as part of execution planning anyway (I believe Postgres and Oracle do this).
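    As a sanity check of the tuple-IN pattern, here is a self-contained run using Python's sqlite3 (row values need SQLite 3.15+); the column set and sample data are made up for the demo:

```python
import sqlite3

# Toy transactions table; "amount" is an assumed column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transactionID INTEGER PRIMARY KEY,
    customerID    INTEGER,
    amount        REAL
);
INSERT INTO transactions VALUES
    (1, 100, 25.0),
    (2, 100, 40.0),
    (3, 200, 10.0);
""")

# Because transactionID increases monotonically per customer,
# MAX(transactionID) per customer identifies the latest transaction.
rows = conn.execute("""
SELECT transactionID, customerID, amount
FROM transactions
WHERE (customerID, transactionID) IN (
    SELECT customerID, MAX(transactionID)
    FROM transactions
    GROUP BY customerID
)
ORDER BY customerID
""").fetchall()

print(rows)  # one row per customer: its highest transactionID
```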

    Arguments like these actually only serve to strengthen the RDBMS case over NoSQL. Database engineers have been solving these problems easily and efficiently for years, but a new generation likes to think in new patterns. Not that there's anything wrong with that, except there is a certain tendency to try to put a square peg in a round hole, complain when it doesn't fit right, and draw a sigh from the guys who've been carving pegs so they fit snugly all along.

    Key/value storage does have advantages over traditional RDBMS designs (assuming the RDBMS is designed and utilized properly), but those advantages are things like linear scalability; there are very few cases where a task on the K/V side is substantially faster to complete than a properly designed solution on the RDBMS side, at least not until you are talking tens or hundreds of billions of records on 100+ CPU clusters (this is the linear scalability advantage).
