Forgot your password?
typodupeerror
Databases

Diagramming Tool For SQL Select Statements 156

Posted by kdawson
from the unique-like-a-snowflake dept.
alxtoth writes "Snowflake is a new BSD-licensed tool that parses SQL Select statements and generates a diagram. It shows parts of the underlying SQL directly in the diagram. For example: x=30, GROUP BY (year), SUM (sales), HAVING MIN (age) > 18. The primary reason for the tool was to avoid Cartesian joins and loops in SQL written by hand, with many joined tables. The database will execute such a statement, if syntactically correct, resulting in runaway queries that can bring the database down. If you sit close to the DBAs, you can hear them screaming... "
This discussion has been archived. No new comments can be posted.

Diagramming Tool For SQL Select Statements

Comments Filter:
  • by Shados (741919) on Sunday August 03, 2008 @07:55PM (#24461071)

    No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

    Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

    So at worse, you're slowing down your own localhost development database engine for everyone else trying to access it (read: no one).

    Not much for the DBA to scream about...

  • by hkz (1266066) on Sunday August 03, 2008 @07:59PM (#24461105)

    A link to an alpha project on Sourceforge that was created three days ago and doesn't even have its own website? That apparently outputs LaTeX tables instead of something readable without having to compile it first, like HTML, SVG, or even indented text? I know it's silly to expect every story to be about a cure for cancer, but come on...

  • by Anonymous Coward on Sunday August 03, 2008 @08:12PM (#24461201)

    No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

    A real ACID-compliant database, no. MySQL, maybe.

    Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

    Unfortunately sometimes you do need to run new queries against production servers. Of course, with a real database like MSSQL or Oracle, you can see how a query will execute, what path the optimizer will follow, and what the cost of the query will be.

  • by Hackerlish (1308763) on Sunday August 03, 2008 @08:18PM (#24461237)

    Watch out! Anyone pointing out how a kdawson story isn't news gets moderated down as a troll. I can't even work out how this got out of the firehose.

  • by LSD-OBS (183415) on Sunday August 03, 2008 @08:23PM (#24461267)

    Yup, not cool.

    Word to the wise: if you're going to actually start advertising a project, please make sure you have some binaries built for some common relevant platforms, and make sure you have some decent information online even if it's just an ugly page with screenshots or examples of what it does.

    In this case, we're talking about some scripts written in Python. At least let people know this on the front page, and list the project dependancies! ie, GraphViz, or whatever.

    This way, your potential users won't immediately discard it due to a lack of compelling information, and your potential (future) developers can see how far you've got and maybe get inspiration to chip in and help!

    That said, this sounds like it should be a great tool for beginner or intermediate SQL users, and I look forward to throwing a few of our mammoth 12-table-join queries at for much fun.

  • Re:Or... (Score:2, Informative)

    by mino (180832) on Sunday August 03, 2008 @08:33PM (#24461329) Homepage

    That, and not using medium and low duty databases lile MSSQL and MySQL can go a very long way to keeping users happy.

    Honestly, to describe MSSQL as "medium and low duty" is pretty rich. You'd best believe I'm happy to bash MS as much as the next guy but SQL Server is a high-performing, highly maintainable, high-availability database and doesn't deserve to be mentioned in the same sentence as MySQL.

    Hell, MSSQL might actually be the only truly good product MS make -- in fact, it probably is. It's not a toy and people who assume it is, just because it comes from MS (I'm not saying this is what you're doing, but people DO do this) just show that they don't know what they're talking about.

  • by MBCook (132727) <foobarsoft@foobarsoft.com> on Sunday August 03, 2008 @08:39PM (#24461369) Homepage

    Explain/describe exists in MySQL, it's just very hard to do.

    Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster). Is it possible to take MySQL down? Easily. It can be surprisingly easy to lock the server completely. Even when you select off one set of tables (A) and want to insert into another set (B, possibly in a different schema/DB) it is possible to have things locked. It's very easy. We haven't seen a crashing bug in MySQL in a while (fun: a query that formated dates with the date format function could reliably crash MySQL 4.0 or 4.1 (don't remember which).

    Does explain help? No. On Oracle it may help. In Postgres it seems to help. I have no experience with MSSQL. In MySQL you have to watch out. While it can be useful, it is very limited.

    It's row counts can be horribly useless. It can list 1.2 million rows when in fact it can take a fraction of a second to get the data because it's all in an index in memory.

    Worse: it will run the query for you. Under some circumstances (using a subquery can do it, using more than one level of subquery is almost guaranteed to do it) it will just run the inner query and then use that to produce results. This means that describe/explain can lock the database and take hours to return (if you had a query that was bad enough and didn't kill the describe/explain). It's all the fun of running the real query, without the results actually presented to you.

    Note: We're using 5.0 (since 5.1 isn't production ready yet). Some of this may be fixed.

  • Are you serious? (Score:2, Informative)

    by SpasticWeasel (897004) on Sunday August 03, 2008 @08:51PM (#24461437)
    So SQL Server has had a graphical execution plan view for ever, and it's better than this lameness. But of course its not free, and we all know that free software is better, even when it sucks. Seriously, compare this to the real tools included with a serious RDBMS, and I have to question why this was even posted. It's almost farcical.
  • Re:WTF (Score:3, Informative)

    by ahmusch (777177) on Sunday August 03, 2008 @08:52PM (#24461443)
    Really? Most of us would call recursive SQL "looping" SQL, and something like this in Oracle is recursive:

    SELECT LPAD(' ', 2*LEVEL, ' ' ) || ename empName, dname, job,
    sys_connect_by_path( ename, '/' ) cbp
    FROM emp e, dept d
    WHERE e.deptno = d.deptno
    CONNECT BY PRIOR empno = mgr
    ORDER SIBLINGS BY job;

    Heck, even ANSI finally got into recursive SQL using the WITH clause:

    with TransClosedEdges (tail, head) as
    ( select tail, head from Edges
    union all
    select e.tail, ee.head from Edges e, TransClosedEdges ee
    where e.head = ee.tail
    )
    select distinct * from TransClosedEdges;

    Now let's imagine queries with multiple levels of nesting using such clauses - after all, any SELECT statement can generally be used in any FROM clause.

    Now, perhaps you're Chris Date or Fabian Pascal and are truly concerned with the completeness of SQL as implementing the relational model. For the rest of us, however, recursive SQL can answer interesting questions without getting into the nastiness of procedural code.

    Oh, and considering the default join in virtually any SQL database is a nested-loop join, I'd say all databases loop by default. And a statement as innocuous as :

    select * from a, b, c;

    Can absolutely crater cpu and I/O performance. If each has 1,000 rows and there's not enough memory, there's 1,000,001 table scans. Hope your disk is fast.

  • Re:EXPLAIN (Score:3, Informative)

    by Craig Ringer (302899) on Sunday August 03, 2008 @09:23PM (#24461595) Homepage Journal

    Another comment here revealed part of why someone might think a tool like this was useful:

    In MySQL, EXPLAIN apparently works more like PostgreSQL's EXPLAIN ANALYZE (and related features in other RDBMSs). MySQL's EXPLAIN actually executes the query rather than just running it through the query planner. The documentation [mysql.com] even warns that data modification is possible with EXPLAIN in some circumstances.

    If your database gives you no way to ask the query planner what it will do without actually executing the query, something like this begins to look faintly useful. Personally, though, I can't imagine voluntarily using such a database.

  • by russotto (537200) on Sunday August 03, 2008 @09:33PM (#24461649) Journal

    Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster).

    Oh yes, Oracle can be brought to a grinding halt (even on substantial hardware) by a big nasty query. It may not be crashed, but it's nonresponsive. Especially annoying when there is no need for the cartesian product; Oracle's pessimizer just chose to do one when something else was MUCH more appropriate. Alas this tool would not catch that situation (but EXPLAIN PLAN does).

  • by killjoe (766577) on Sunday August 03, 2008 @09:55PM (#24461765)

    Depends on your database. I know I have been able to bring SQL server down with a query.

    Try this...

    begin transaction

    update rows set a=b where x=y

    commit transaction.

    On your workstation this could run really fast because you only have ten records. On the production database server this could crush the server if you had a few million records effected.

  • by GoofyBoy (44399) on Sunday August 03, 2008 @10:05PM (#24461847) Journal

    >In practice, there's no difference.

    To a DBA its a big difference.

    1. Just a massive slow down - login (SQL Server there is a Dedicated Administrator Connection, don't think I've had problems connecting with a problem Oracle db as long as I can get on the OS (partly because sessions are processes)), and just kill the process. The DB should clean everything up. (as long as its not a toy db; I'm looking at you MySQL.)
    2. A crash - then you have to go through a whole number of steps to bring it up and then verify the data is ok, then let everyone back in. There may be an backup involved if you are unlucky. You definitely want to figure out what happened.

  • by Shados (741919) on Sunday August 03, 2008 @10:40PM (#24462103)

    Maybe my poor queries writing skills are bad :) Because I've seriously -tried- before... cross joins on 100+ tables, all of which containing several douzen gigs of data, totally multiple terabytes...the scheduling was good enough to give the query very low priority, leaving the server ok.

    If you use (in SQL Server at least) the default settings, that will basically render your database useless... but if you use the newer locking strategies from 2005 (which had been available in Oracle for ages), the tables won't be locked, and everything will be fine. Laggish for sure, but the server definately won't tank.

    Somewhere I used to work for, I would even run millions of inserts, continually, on our staging server (which was shared among a 50 or so devs) because I was testing an ETL routine... the server was slower for sure when I messed up and did a multi cross join on my insert source on a friday evening, but it never brought it down.

  • by Craig Ringer (302899) on Monday August 04, 2008 @02:08AM (#24463221) Homepage Journal

    craig:~$ psql
    Welcome to psql 8.3.3, the PostgreSQL interactive terminal.
    craig=> set statement_timeout=1000;
    SET
    craig=> SELECT generate_series(0,100000000000000000);
    ERROR: canceling statement due to statement timeout

  • by Craig Ringer (302899) on Monday August 04, 2008 @02:19AM (#24463279) Homepage Journal

    I think you might've missed the point.

    The term SELECT statement generally refers to the whole statement, including FROM, WHERE, HAVING, etc clauses.

    This is pretty clear in context, as it'd be nonsensical to produce a graphical explain tool for the result field list in the SELECT clause its self.

    That's why the parent said SELECT statement not SELECT clause .

    As it happens the same issues regarding the need for planner knowledge etc are true for DML like INSERT, UPDATE and DELETE. It's not about SELECT at all, but rather any non-DDL query.

  • by Blakey Rat (99501) on Monday August 04, 2008 @10:24AM (#24467327)

    Ditto. I downloaded it to take a look and see how good it was at parsing T-SQL, since we have a few saved T-SQL queries with WHILE loops in them. I gave up after seeing it's... nothing. Just a Python script. It requires Graphviz, Python, and Pyparsing (even though it comes with pyparsing!? WTF!), and even more damning is that you can't use it for ad-hoc queries, the query has to be saved into a file first.

    Someone slap a GUI on this that lets you paste in a query, and bundle all the requirements along with the package, and then we might have something. Right now, I'll just stick with MS SQL Server's query grapher.

Little known fact about Middle Earth: The Hobbits had a very sophisticated computer network! It was a Tolkien Ring...

Working...