
Visualizing Complex Data Sets?

markmcb writes "A year ago my company began using SAP as its ERP system, and there is still a great deal of focus on cleaning up the 'master data' that ultimately drives everything the system does. The issue we face is that the master data set is gigantic and not easy to wrap one's mind around. As powerful as SAP is, I find it does little to aid with useful visualization of data. I recently employed a custom solution using Ruby and Graphviz to help build graphs of master data flow from manual extracts, but I'm wondering what other people are doing to get similar results. Have you found good out-of-the-box solutions in things like data warehouses, or is this just one of those situations where customization has to fill a gap?"
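For readers wondering what that kind of extract-to-Graphviz step might look like, here is a minimal sketch in Python rather than Ruby. It assumes the manual extract has already been reduced to (from object, to object, relationship) rows; the row layout and the material/plant values below are hypothetical, not SAP's actual table structure.

```python
def quote(s: str) -> str:
    """Quote a string as a DOT identifier, escaping embedded double quotes."""
    return '"' + s.replace('"', '\\"') + '"'


def to_dot(edges, name="master_data"):
    """Render (source, target, label) triples as a Graphviz digraph in DOT syntax."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    seen = set()
    for src, dst, label in edges:
        # Collapse duplicate rows from the extract into a single edge.
        if (src, dst, label) in seen:
            continue
        seen.add((src, dst, label))
        lines.append(f"  {quote(src)} -> {quote(dst)} [label={quote(label)}];")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical extract rows: (from object, to object, relationship)
rows = [
    ("Material 100-200", "Plant 1000", "extended to"),
    ("Plant 1000", "Storage Loc 0001", "stocks in"),
    ("Material 100-200", "Plant 1000", "extended to"),  # duplicate row
]
print(to_dot(rows))
```

The output can be rendered with Graphviz's `dot` tool, e.g. `python extract_to_dot.py | dot -Tsvg -o master_data.svg` (the script and file names are placeholders).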
  • by Anonymous Coward on Monday January 19, 2009 @10:53PM (#26524337)
    Portraits of complex networks

    Abstract: We propose a method for characterizing large complex networks by introducing a new matrix structure, unique for a given network, which encodes structural information; provides useful visualization, even for very large networks; and allows for rigorous statistical comparison between networks. Dynamic processes such as percolation can be visualized using animations. Applications to graph theory are discussed, as are generalizations to weighted networks, real-world network similarity testing, and applicability to the graph isomorphism problem.

    • by Anonymous Coward on Monday January 19, 2009 @11:55PM (#26524819)
      If you think that's worth looking at... how about this?
    • Abstract: We propose a method for characterizing large complex networks by introducing a new matrix structure, unique for a given network, which encodes structural information

      I should probably go follow your link, but on the face of it, this sounds like a 1960s paper about the adjacency matrix :)

    • by tucuxi ( 1146347 )

      Truly nice way of getting high-level information from graphs. I'll probably use it sometime in the future, as I actually work with graph visualization and was looking for ways to visually compare related graphs.

      And for those too lazy to actually open the PDF and look at the pictures: the idea is to draw histograms of the 1st-order vertex degree up to the nth-order vertex degree, where the order-l degree of a vertex is defined as the number of other vertices that can be reached in exactly l forward hops, as found by a breadth-first search.
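The portrait described above can be sketched in a few lines. This is a minimal reading of the idea, not the paper's code: it assumes the graph is given as an adjacency dict in which every node is a key (directed edges are followed forward), runs a breadth-first search from every node, and records, for each distance l, how many nodes have exactly k nodes at that distance. The function names and the small example graph are illustrative only.

```python
from collections import Counter, deque


def bfs_shell_sizes(adj, start):
    """Return a list whose entry l is the number of nodes at exactly l hops from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    sizes = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        sizes[d] += 1
    return sizes


def network_portrait(adj):
    """Return {l: Counter({k: number of nodes with exactly k nodes at distance l})}."""
    shells = {node: bfs_shell_sizes(adj, node) for node in adj}
    max_depth = max(len(s) for s in shells.values()) - 1
    portrait = {}
    for depth in range(max_depth + 1):
        # Nodes whose BFS never reaches this depth contribute k = 0.
        portrait[depth] = Counter(s[depth] if depth < len(s) else 0 for s in shells.values())
    return portrait


# A small path graph a - b - c - d, stored as an undirected adjacency dict.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
for depth, row in network_portrait(path).items():
    print(depth, dict(sorted(row.items())))
```

Row l = 1 is just the ordinary degree histogram; the higher rows are what add information beyond it. Running a BFS from every node costs O(V·(V+E)), so this naive version is only practical for moderately sized graphs.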

  • get rich slow (Score:1, Insightful)

    by it_begins ( 1227690 )
    SAP's "German engineering" stems from the philosophy that the more efficient, the better.

    Unfortunately, this means that it is much too utilitarian (which is ultimately why products like PeopleSoft are making headway).

    If you find that you have developed a good product to help with operating SAP, you can sell it as a third-party add-on. Many of the popular add-ons were created out of a sense of frustration with the "mother product".

  • PtolemyPlot (Score:2, Informative)

    by technofix ( 588726 )
    PtolemyPlot and Java.
  • by Anonymous Coward on Monday January 19, 2009 @10:59PM (#26524393)

    Have fun!



      Seems to be excellent inspiration for visualizing data.

      Can't open the page because it could not connect to the server "visualcomplexity.com".

      It apparently advocates the RIAA approach: pretend it's not there at all.

  • I have experience rendering massive datasets via OpenGL, and this is where a lot of visualization still happens in government and big business.

    These can be incorporated into other general off-the-shelf visualization tools or just be used standalone on any major platform, as long as the machine has the horsepower, including, not surprisingly, a powerful GPU.

    The first computer I started doing visualization on was an SGI. Imagine that.
  • by zappepcs ( 820751 ) on Monday January 19, 2009 @11:03PM (#26524421) Journal

    How are you supposed to handle the data if you do not understand it? Sure, there can be too much to see/think about at one time, but if you don't understand it, how can you visualize it usefully?

    I am asking because I have a problem: where I work, I understand the data and I make efforts to visualize it for others. The trouble starts when they don't understand the data and its sources and limitations, so what they see in my visualization is all they know of it, and they make assumptions about it. I've even had people worry that the network was down because there were holes in the collected data, which then showed up in the visualizations.

    If anyone has some good URLs for such thinking, I'd be grateful.

    I simply do not understand how you can visualize data for people if you yourself do not understand it.

    • Re: (Score:3, Interesting)

      Here is a *sample* of some of my early work that I did long ago when I was just starting out. I don't have any mature, 100% working screenshots, but you get the idea.

      The lat, lon and depth values are courtesy of NOAA and freely available. This is a screenshot of a real-time frame in OpenGL of the world, with each vertex pair colored by depth. You can rotate it, probe it and do a few other things.

    • If I give you a scatterplot of X vs. Y, you can instantly see what kind of relationship exists (if any). You might notice an exponential relation, a linear relation, or that they are exactly the same. For more statistical things, you might notice two distinct groupings, correlations, or completely uncorrelated data.

      So your boss doesn't know what the relationship between X and Y is, but he wants to save $ and increase profits. You tell him that based on this relationship that you proved exists (whate

      • by TapeCutter ( 624760 ) on Tuesday January 20, 2009 @04:44AM (#26526301) Journal
        Sorry but I think the GP is spot on.

        What you are doing in your post is investigating the data until you UNDERSTAND what is useful and then presenting (visualising) it for your boss, who probably adds another layer of "visualization" for his boss, etc. (i.e. you are acting as a human visualisation tool that the boss can use to visualise the output of silicon visualisation tools).

        To scale your simple X/Y plot of two variables up to corporate size, you propose using a visualization tool that UNDERSTANDS database structures and UNDERSTANDS the fact that to plot strings against integers you need a default transform, etc. You are handed a bunch of DBs with hundreds of tables, thousands of columns and countless transaction transforms ferrying data from one DB to another.

        So you start with all possible pairs to see if there is a nice easy curve that can relate them. You get 10,000 statistically significant relationships - the problem posed in TFS is how do you now visualize all those graphs to find the relevant relationships without UNDERSTANDING the data.

        As to TFS, visualization relies on data mining, which will never be "solved" because given enough data you can always add one more level of UNDERSTANDING (see: Gödel). This is not to say that trying to solve it is pointless. On the contrary, Google News is an excellent and accessible example of how far things have progressed in the last couple of decades.

        Simply presenting multiple known facts/relationships in an easily accessible format takes a deep UNDERSTANDING of the data. Even if you do UNDERSTAND the facts/relationships, creating the format is an art that has few masters.
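To make the "start with all possible pairs" point concrete, here is a rough Python sketch of a brute-force correlation screen. The column names and values are invented, and Pearson's r only catches linear relationships, so treat it as a stand-in for "a nice easy curve", not a general method. The number of pairs grows as n(n-1)/2: 1,000 columns already give 499,500 pairs.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no defined correlation; treat it as 0
    return cov / sqrt(vx * vy)


def screen_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            hits.append((a, b, round(r, 3)))
    return hits


# Invented columns standing in for a few of the "thousands of columns".
cols = {
    "qty": [1, 2, 3, 4, 5],
    "value": [10, 20, 30, 40, 50],
    "returns": [5, 4, 3, 2, 1],
    "noise": [3, 1, 4, 1, 5],
}
n = len(cols)
print(n * (n - 1) // 2, "pairs screened")
print(screen_pairs(cols))
```

The screen duly flags `qty`/`value`, but it cannot tell you whether that relationship is a business fact or an artifact of how the extract was built; that judgment is the UNDERSTANDING the comment is talking about.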
    • You should do some 'Exploratory Data Visualization'. Use GGobi: it's free, works fast, can handle lots of data, and reads lots of different formats.

      There are a bunch of different options, but when I'm trying to figure out some basic data relationships, I use the scatterplot matrix (a 2D matrix of 2D plots) or the 2D tour.

  • R language (Score:5, Informative)

    by QuietLagoon ( 813062 ) on Monday January 19, 2009 @11:03PM (#26524423)
    There was a thread about the R language a couple of weeks ago. Look it up and read it....
    • Re:R language (Score:5, Informative)

      by koutbo6 ( 1134545 ) on Monday January 19, 2009 @11:11PM (#26524503)
      I second that. If you are visualizing graphs, be sure to get the igraph package, which can be used with R, Python, C, or Ruby.
      Processing is another package geared towards data visualization, which Java developers might find easier to use.
    • I third this. It is a great way to look at data and determine meaningfulness of said data.
      Too many times I've seen people jump to conclusions about a small part of a data set without looking at the larger set of data.
      Some people don't understand it until you put their fingers on the chart and make them follow the line.
    • by flynt ( 248848 )

      Also agree about R. Also, consider hiring a statistical consultant. They do this kind of thing all the time.

    • by rev063 ( 591509 )

      I'd also suggest R. One of the problems with visualizing complex data sets is that, almost by definition, the prepackaged graphics tools don't allow you to create custom-designed graphics that suit the particular data-set you're working with. But with a bit of programming in R you can get amazing results.

      There are some R packages that can help too. I write about one of them, ggplot, here. (Disclaimer: I work for a company that provides support for R.)

  • by Mithrandir ( 3459 ) on Monday January 19, 2009 @11:12PM (#26524509) Homepage

    The infovis community has been dealing with these subjects for years. There are many different visualisation techniques around. Here's a list of the past conferences and their papers.

    Plenty of good products out there, but the one that I like most is from Tableau Software.

    • Tableau Desktop is an interactive analysis and visualization product that connects to relational and cube data sources to help people see and understand their data. There was a webinar (slides in PDF) back in November 2008 covering Blastrac Global's success in using Tableau with their ERP system.

      Disclaimer: I work at Tableau Software, so I encourage you to see for yourself with a free trial.
  • I used XGobi for a lot of things back in the day. It gave me the ability to 'see' and understand high-dimensional data sets quite easily when I was looking at computer vision research.

  • Spotfire (Score:3, Informative)

    by DebateG ( 1001165 ) on Monday January 19, 2009 @11:24PM (#26524583)
    I work in biology, and we use Spotfire DecisionSite to visualize and analyze a lot of our massive genetic data. It's a very powerful program that I barely know how to use. It seems to have packages able to analyze pretty much anything you want, and you can even write your own scripts to help things along.
  • by Shados ( 741919 ) on Monday January 19, 2009 @11:28PM (#26524607)

    Wouldn't any everyday cube browser, along with any tool to detect base dimensions in a data warehouse schema, do the trick? You may have to add a few custom dimensions on your own depending on how shitty the master data is (I don't think that can be helped, no matter the solution: if a dimension is "these two fields multiplied together times a magic number appended to the value of another table", you need to know; no tool will guess), but aside from that?

    That's usually what I do anyway. I dump my data in a data warehouse, use whatever built-in wizard can auto-generate dimensions, then play with them in a cube browser. It has worked even for the pretty archaic, home-made, multi-thousand-table, unnormalized ERP systems I've had to work with in the past.
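What the cube browser does under the hood is, roughly, aggregate a measure over every combination of dimensions. The Python sketch below mimics SQL's `GROUP BY CUBE` on a list of dicts; the field names (`plant`, `material`, `qty`) are hypothetical, and a real OLAP engine would precompute and index these aggregates rather than rescanning the facts for every grouping.

```python
from collections import defaultdict
from itertools import combinations


def cube(facts, dimensions, measure):
    """Aggregate `measure` over every subset of `dimensions`, like SQL's GROUP BY CUBE.

    Returns {grouping: {key_tuple: total}}, where grouping is a tuple of dimension
    names and the empty tuple () holds the grand total.
    """
    result = {}
    for size in range(len(dimensions) + 1):
        for grouping in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in facts:
                key = tuple(row[d] for d in grouping)
                totals[key] += row[measure]
            result[grouping] = dict(totals)
    return result


# Hypothetical fact rows pulled from an extract.
facts = [
    {"plant": "1000", "material": "BOLT", "qty": 5},
    {"plant": "1000", "material": "NUT", "qty": 3},
    {"plant": "2000", "material": "BOLT", "qty": 7},
]
c = cube(facts, ["plant", "material"], "qty")
print(c[()])          # grand total
print(c[("plant",)])  # rolled up by plant
```

Drilling down in a cube browser amounts to moving from a shorter grouping tuple to a longer one.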

  • Not sure what you can use to create a visualization, but the information you need is in the IMG.

    I don't have a need to develop a visualization of the whole of our SAP implementation, just my little FI-CO corner of it, and that's a big enough pain.

  • by Anonymous Coward on Monday January 19, 2009 @11:35PM (#26524649)

    Your ERP isn't supposed to directly analyze the data. You're supposed to use a Business Intelligence software package for that. This being SAP, I believe they'll try to sell you Hyperion.

    • They'll push you towards SAP BW. A beast at best. Very strong tool, but like anything SAP, difficult to master.
    • by afidel ( 530433 )
      Not likely after Oracle bought Hyperion. We're a JDE shop and went with OBIEE since our major modules aren't yet supported in Hyperion, and with the acquisition no one knows what the timeline will be for adding them, so we are rolling our own with OBIEE. So far the framework seems to be good; we're just not sure how it will do when it hits production loads, since we are still in early development.
      • Disclosure - I work for Oracle, though not for the OBIEE team. The people I know that work with OBIEE repeatedly claim the best scaling BI architecture in the industry. They even went so far as to describe in fairly deep detail the specific technical reasons for it, which I believe added up to being able to horizontally scale the software components to meet your specific performance needs.
        • by afidel ( 530433 )
          Oh, if we give Oracle enough money I'm sure we can scale to whatever level we would want, the problem is bloody cost. The fact that OBIEE AND the database are licensed per core is almost criminal. Six figures per CPU is just so expensive it blows my mind, especially since most of the business logic in OBIEE is being handled by our own custom code.
  • I'll try to answer your question without the key info needed: "What is the data your modeling?"

    You're on the right track...

    Either way, from experience i'd say you're answer is "this just one of those situations where customization has to fill a gap"

    Be warned though, out of the box solutions do exactly what's on the box. Anything else is going to be modeled by you, or customized (usually at a high rate), by the vendor.

    That being said, I've used Oracle's solution.

    • Hey, you managed to use both "your" and "you're" both correctly and incorrectly in one post!

      I think you must be taking the idea of stochastic grammar somewhere it doesn't belong...

  • by Anonymous Coward

    Just take the first 65k rows, dump them into Excel and create a pivot table.

  • Can I suggest you look at Centruflow, which is an application designed to analyse dynamic data in a nice, user-friendly way.
  • ...which I just took offline for a quick database upgrade. Er, sorry, will be back online soon!

  • Take a look at Prefuse. I haven't used it myself (I considered it for a project), but it may have the right mix of a good Java API and flexibility/customizability that you're looking for. As a bonus, it's BSD licensed. YMMV. Good luck.
  • by sleeponthemic ( 1253494 ) on Tuesday January 20, 2009 @12:01AM (#26524867) Homepage
    Into a matrix screensaver.
    • by gringer ( 252588 )

      I'm not sure if you're jesting about that, but I wrote a patch for xscreensaver to allow just that functionality for the xmatrix hack. Looks great for things like 'ps -eo command'. See any version from 5.04 onwards.

      I wrote this after noticing the DNA encoding and thinking, "hey. Wouldn't it be great if I could feed this with real DNA sequence?"

      Unfortunately, I had to send the same stream to each of the feeders on the screen, which means it can only show one vertical line of data, rather than 40 or so.

      • I am joking. When I read the article I thought back to the bit where one of the guys in the ship is looking at the matrix visualisation and making out like he could read and translate the scrolling data just as if he were watching television.

        Good idea, by the way.
  • I've had really good success using an information visualization tool called Starlight on a number of projects like this. Everything from process modeling to military intelligence. It's a commercial spin-out from the DOE PNL lab information visualization research in Washington State.

  • I'm sure it's my mathematics background, but when I saw the headline I assumed the author would be discussing something involving the square root of negative one, to which my response was, "Silly author, you can't visualize four dimensions. (Sober.)"
    • Re: (Score:2, Insightful)

      by SillyPerson ( 920121 )


      You have a mathematical background and can not visualize four dimensions? Here is how you do it: Just visualize the problem in n dimensions and then set n=4.

  • IBM data explorer (Score:3, Interesting)

    by shish ( 588640 ) on Tuesday January 20, 2009 @12:15AM (#26524959) Homepage
    I have no idea how I stumbled across this, but it looks very pretty...
  • Hello, visualizing large data sets can be readily solved if you have the following items available:

    Both tools combined allow you to easily visualize large data sets and adjust the resolution of your data.

  • You could use Pentaho with one of the SAP plugins.
  • by FurtiveGlancer ( 1274746 ) <AdHocTechGuy AT aol DOT com> on Tuesday January 20, 2009 @12:27AM (#26525013) Journal
    Large, highly complex data sets are best described on the back of four cocktail napkins or on a fixed white board in a shared conference room. ~
  • by hemp ( 36945 )

    Take a look at Essbase. It is now owned by Oracle and is used by finance departments at most Fortune 100 companies.

    As you have no doubt discovered, SAP is great for transaction-level detail, but kinda sucks at the big picture and at doing "what ifs". Essbase's tight integration with MS Excel and very cool reporting tools make it much easier to analyze your data than looking at spending reports from SAP.

    Mainly implemented by budgeting and finance groups, Essbase is not

  • It all depends on what output you need.

    DAD software has the ability to customize data types, supports multiple inheritance of objects, and lets you define different relationship types.

    You can then trace along object relationships bringing back a dynamic graphic depending on what you want to show (and spit out to PDF).
  • Have a look at Processing, and the book Visualizing Data by Ben Fry.
  • What kind of data is it? What are you trying to figure out by looking at the data? What type of people will be looking at it? Depending on these answers, I may recommend one of the leading BI tools on the market: IBM Cognos, SAP Business Objects, or MicroStrategy. These COTS solutions are focused on visualizing masses of data, usually for some type of pattern discovery or decision making.
    • Normally, BI and data warehousing tools in general are used for visualising things like sales of beer vs temperature or cost of toilet roll relative to the same quarter last year. Basically, transactional data and trends.

      I think what the OP is looking for is something to generically draw interrelationships, either between tables or between the different views on the material master, BoMs and the like. In either case the problem is that when you do it with any non-trivial data you'll need to hire a basketb

  • a company that competed with SAP. This is a problem that is industry-wide.

    The solution you probably want is to make sure your SAP is set up to use a common relational database, then use another tool (Crystal Reports, Seagate, etc.) to visualize your data in ways that are not already built-in to your ERP system.
    • I'm a SAP consultant and have "cleaned-up" several data sets over the years. I'm lucky in that all of my customers are running it on Unix with DB2.

      I wrote a series of PHP scripts that go through everything and present inside a somewhat simplified web interface. I also use Crystal Reports to provide "cleaner" copies.

      But, at the end of the day, it's more of a brute-force exercise than anything. Providing a simpler interface than R/3 is the first step, but you have to have users that are willing to use it. What I

      • I was trying to offer a "simple" solution... although of course it has its drawbacks.

        Our biggest customer complaint was that we did not use a "common" relational database (SQL Server, Oracle, etc.). Therefore, we had to provide all the data views the customer might desire which is not feasible.

        The simplest solution to this dilemma was to modify our system to use a database that the customer could access themselves... read-only of course. Of course the relational databases were slower, and the modifica
  • I first saw a video of Hans Rosling, who had some very unique ways of visualizing data that would otherwise be useless to a simple mind such as mine.

    After I watched that, I found a piece of software called Tableau. I downloaded the trial version, and really liked how easy it made visualizing data for me. I can take the data I have, and Tableau will see how it's connected and allow you to generate visual reports of the data. I'm not saying that it'll work for everything, but it certainly does what I need i

  • Depending upon the problem domain, a very useful (albeit expensive) set of tools is StarLight, written for the US Government.

    highly recommended if you've got tough visualization problems. this tends to get used for the *really* interesting visualization challenges.

  • I am pimping my own employer's product here, and I'm admittedly biased, but we've got a phenomenal web-based/SaaS solution to this exact problem. We've done work for clients with billions and billions of rows of data (like 50+GB) and we've got a unique database that can generate reports in seconds that could take upwards of fifteen minutes on a SQL-backed solution. You can take any report, drill down arbitrarily into the data below, flip through the datasets, arbitrarily flip axes, filter out unwanted da

  • You haven't stated what you're needing this for, I assume it's not just for your own consumption. I work in Business Intelligence (Kimball Method Dimensional Modeling etc) and we use PeopleSoft ERP in our workplace. We have found that the best way of displaying/using this type of eclectic data is to model it in star schemas and put it in data cubes. This way the people who use the data can really use the data for analytical purposes... any other way just makes more work for us IT people, this is great fo
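A minimal sketch of the star-schema idea, assuming made-up table names (`fact_sales`, `dim_material`, `dim_date`) and using Python's built-in `sqlite3` purely for illustration: the fact table holds measures plus keys, and each query joins out to the dimensions for the attributes it groups by. A real Kimball-style warehouse would have many more dimensions, surrogate-key management and slowly changing dimension handling.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
SCHEMA = """
CREATE TABLE dim_material (material_key INTEGER PRIMARY KEY, material TEXT, category TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales   (material_key INTEGER REFERENCES dim_material,
                           date_key INTEGER REFERENCES dim_date,
                           amount REAL);
"""


def build(conn):
    """Create the schema and load a few invented rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_material VALUES (?, ?, ?)",
                     [(1, "BOLT", "Fasteners"), (2, "NUT", "Fasteners"), (3, "PUMP", "Machines")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2008, 4), (2, 2009, 1)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 900.0), (1, 2, 120.0)])


def sales_by_category_and_quarter(conn):
    """Typical star join: facts in the middle, grouping attributes from the dimensions."""
    return conn.execute("""
        SELECT m.category, d.year, d.quarter, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_material m ON m.material_key = f.material_key
        JOIN dim_date d     ON d.date_key = f.date_key
        GROUP BY m.category, d.year, d.quarter
        ORDER BY m.category, d.year, d.quarter
    """).fetchall()


conn = sqlite3.connect(":memory:")
build(conn)
for row in sales_by_category_and_quarter(conn):
    print(row)
```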
  • Cytoscape (Score:5, Informative)

    by adamkennedy ( 121032 ) on Tuesday January 20, 2009 @02:24AM (#26525705) Homepage

    I had a similar situation to yours recently, except I was trying to detangle a horridly complex product substitution graph for a logistics company.

    I used a bunch of Perl to crunch the raw databases into various abstract graph structures, but instead of graphviz or something created by/for developers, I found that the best software for graph visualisation is the stuff that the genetics and bio people use.

    The standout for me was a program called Cytoscape, which can import enormous graph datasets and then gives you literally dozens of different automated layout algorithms to play with (most of which I'd never heard of, but it's easy to just go through them one at a time till something works).

    It's got lots of plugins for talking to genetics databases and such, but if you ignore all that and use Perl/Ruby/whatever for the data production part of the problem, it's a great way to visualise it.
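As an example of that split, the data-production step can end by writing a file in Cytoscape's Simple Interaction Format (SIF), where each line reads `source<TAB>relationship<TAB>target`. The sketch below uses Python instead of Perl; the `subst` relationship label and the SKU names are made up. Tab delimiters are used because, as far as I know, Cytoscape then allows spaces inside node names.

```python
def to_sif(edges, relation="subst"):
    """Return tab-delimited SIF text: one 'source<TAB>relation<TAB>target' line per unique edge."""
    lines = [f"{a}\t{relation}\t{b}" for a, b in sorted(set(edges))]
    return "\n".join(lines) + "\n"


# Hypothetical product-substitution pairs: (product, acceptable substitute).
subs = [("SKU-1", "SKU-2"), ("SKU-2", "SKU-3"), ("SKU-1", "SKU-2")]  # last row duplicates the first
print(to_sif(subs), end="")
```

Saved to a `.sif` file, the output can be imported as a network in Cytoscape, after which the layout algorithms can be tried one by one.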

  • Looks like SAP tricked another sucker.

    A company I worked at several years ago migrated to SAP. It took several hundred million dollars and 6 years, AND the company's main branch was already using SAP. All to replace an MVS system that cost under $5M a year to run, did more, and was much faster.

    SAP is NOT a business application. It's a programming environment where you get to build and customize your own. Then those German Wunderkids break your customizations every time there is an SAP change.

    A "good" business

    • Re: (Score:3, Insightful)

      by Hognoxious ( 631665 )

      SAP is NOT a business application. It's a programming environment where you get to build and customize your own.

      Looks like you work for another company that tried to reimplement their old system word for word and step by step in SAP.

      And customization has a specific meaning in SAP that doesn't involve any coding. It appears you don't know that, which doesn't improve your credibility.

      A "good" business software package allows you to customize "it" to match your business processes.

      If you want that, don't use a

  • How about sql-fairy? It's open source and very complete.
  • Of all the products out there, Business Objects strikes me as the best solution to quickly engage you and provide strictly the useful information you're looking for. They were also recently acquired by SAP, so I would recommend you ask someone at your company about the corporate availability of their products. Maybe get in touch with the SAP account executive. If your company doesn't already have the availability to use the product, you would probably qualify for some reduced-price incentive. The Busine
  • Have you looked at data mining solutions? Someone mentioned Pentaho already, but there's also:

    RapidMiner

    Orange Data Miner

    all of which are packed with enterprisey features. But you may have to learn some stats. Once you get past what you can do with the prepackaged stats methods, head for R, or write a RapidMiner plugin in Python.

  • Check out Stephen Few's blog. Good info there. For my money, Tableau is the way to go. It's cheap enough and easy to implement. And it reads practically everything. The visualizations are cool too.
  • There's a product called JMP. It's relatively inexpensive. It's great at visualizing (especially statistical) data. They've got a 30-day trial. Check it out.

    Disclosure: I have an association with JMP's parent company.

    • by d3vi1 ( 710592 )

      It also runs on Linux, Mac OS and Windows. It works great with large data sets and supports a lot of data sources (ranging from XLS files to SAS databases).

  • SAP MDM ? BI ? (Score:2, Insightful)

    by obUser ( 1095169 )
    Since you've already bought licences for SAP ERP, you could get a bargain on the Master Data Mgt component. It also offers support to control the Master Data harmonisation process, which you probably need if you have such a large amount of data.
  • Professor Hans Rosling used an impressive tool to visualize data from UN and other sources to debunk myths about the third world. I don't know what the tool is called, whether it is available, open source or what not. I got the impression from the presentation that it was specially written for the task. No idea if the data sets qualify as complex, but if I had to visualize data I'd certainly check out if this tool is available.
  • General Dynamics offers a product called CoMotion that allows you to visually explore your data and find interesting patterns and trends.

    CoMotion is a commercial fork of Visage, a collaborative visualization platform designed at Carnegie Mellon University and MAYA Design.

  • I don't use their product but [] makes a data visualization tool and has a good blog about it, with some interesting Java dev tips thrown in. It might be overkill for the data discussed in the article summary, but sounds pretty badass.

    One really interesting blog article talks about something called the "VAST Interactive Challenge", which as near as I can tell is a competition for data visualization tools to go head-to-head a

  • Check out OpenDX; its visualization capabilities are way beyond Graphviz's and it provides a GUI. It's an open source version of IBM's famous Visualization Data Explorer (initially released in 1991), which IBM converted into an open source project a couple of years ago.

    Quoting the site: "OpenDX is a uniquely powerful, full-featured software package for the visualization of scientific, engineering and analytical data: Its open system design is built on familiar standard interface environments. And its sophis
