Cloud Databases

The Joys and Hype of Hadoop

theodp writes "Investors have poured over $2 billion into businesses built on Hadoop," writes the WSJ's Elizabeth Dwoskin, "including Hortonworks Inc., which went public last week, its rivals Cloudera Inc. and MapR Technologies, and a growing list of tiny startups. Yet companies that have tried to use Hadoop have met with frustration." Dwoskin adds that Hadoop vendors are responding with improvements and additions, but for now, "It can take a lot of work to combine data stored in legacy repositories with the data that's stored in Hadoop. And while Hadoop can be much faster than traditional databases for some purposes, it often isn't fast enough to respond to queries immediately or to work on incoming information in real time. Satisfying requirements for data security and governance also poses a challenge."
This discussion has been archived. No new comments can be posted.


  • Well No Shi... (Score:5, Informative)

    by bigdady92 ( 635263 ) on Wednesday December 17, 2014 @02:00PM (#48619471) Homepage
    Hadoop is not a magic thing that can all of a sudden produce reams of new data sets. The setup, on an enterprise scale, takes thousands or tens of thousands of dollars in hardware. Then you have the Map/Reduce jobs to create as well as pointing all your data to the new clusters. Then the tweaking starts, and then your pointy haired Boss or Accounting PencilTwit comes to you and demands results for all of this capital expense you just had them buy for some pinhead to get a better dashboard in sales.

    Hadoop, done right, takes many departments working together in a big enterprise. Small shops may have one guy who is both SA and programmer who could get the job done well enough to make a difference. Furthermore, you NEED a full install from a big vendor. Installing Hadoop from open source is a nightmare, and the big vendors make it painfully simple to get the job done quickly. Can you do it by hand? Sure. Do you have the time? Not when you have other projects to work on and you can spend the company's capital to get the install and config done in 1/10th the time.
    /Cloudera Certified
    //A year later and they still don't know how to get data through the pipeline
    ///Setting up the hardware was a BLAST!
    • by Xyrus ( 755017 )

      This also assumes the data and the domain you're trying to apply Hadoop to are ones Hadoop is actually effective for. A lot of PHBs and such are pretty ignorant about which problems Hadoop can be applied to.

    • "The setup, on an enterprise scale, takes thousands or tens of thousands of dollars in hardware"

      You are off by at least two orders of magnitude, at least by any reasonable definition of "enterprise".

      An enterprise-grade Hadoop cluster that is dealing with enterprise workloads is going to start roughly in the mid-six figures and grow into the low seven or eight figures over time and scale. Scale is not cheap.

  • by Anonymous Coward on Wednesday December 17, 2014 @02:03PM (#48619513)

    Check out the job postings in central Maryland near BWI: Java, Hadoop, TS/SCI with full-scope poly. Hundreds of postings.

    There is only one customer near BWI that requires the last.

  • by Code Herder ( 937988 ) on Wednesday December 17, 2014 @02:08PM (#48619589)
    I used to be a big fan of Hadoop until I gave Apache Spark a try. My god, the speed, ease of use, and install simplicity were just ridiculous. Words failed me the first time I used it: I got it installed and working in under 2 hours, and it was so blazingly fast it was almost a joke.

    For people who took a look a few years back: it has matured a lot, from an interesting prototype to something I now use in production on my clients' data. Documentation is still a bit sketchy for niche functions, but it has improved a lot as well.

    https://spark.apache.org/ [apache.org]
    • by Anonymous Coward

      Running Spark on HDFS seems to be a pretty good idea, though, and you'll still need a YARN setup.

      Or you can push your Spark deployment onto Mesos.
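For anyone who hasn't seen what either framework is actually computing: the canonical demo for both Hadoop and Spark is a word count. The core map/reduce idea can be sketched in plain Python, with no cluster at all; this is purely illustrative, not how you'd write it against either API:

```python
from collections import Counter
from functools import reduce

def map_phase(line):
    # Map step: emit per-line word counts (stand-in for (word, 1) pairs)
    return Counter(line.split())

def reduce_phase(acc, partial):
    # Reduce step: merge partial counts into the accumulator
    acc.update(partial)
    return acc

lines = ["to be or not to be", "that is the question"]
counts = reduce(reduce_phase, map(map_phase, lines), Counter())
print(counts["to"])  # 2
```

The frameworks' value is not this logic but running it across thousands of machines, with shuffling and fault tolerance handled for you.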

  • by Anonymous Coward on Wednesday December 17, 2014 @02:11PM (#48619635)

    The reason they're running into problems is they haven't fully embraced the synergy in B2B ROI cloud possibilities. If they utilize agile scrum development, they will be able to be on the bleeding edge of viral blog immersion while reaching convergence with real-time content management crowdsourcing.

    • You forgot vertical integration. :-p

    • The reason they're running into problems is they haven't fully embraced the synergy in B2B ROI cloud possibilities. If they utilize agile scrum development, they will be able to be on the bleeding edge of viral blog immersion while reaching convergence with real-time content management crowdsourcing.

      The first ten words made sense and were in actual English. You're doing it wrong.

  • by mveloso ( 325617 )

    I remember Cloudera saying that most people use Hadoop for ETL. Not sure if you've checked, but Hadoop is like the ne plus ultra of ETL tools. It's worth a look if you have to transform lots and lots of data.

    • by ionrock ( 516345 )
      I remember Cloudera saying that most people use hadoop for ETL. Not sure if you've checked, but hadoop is like the ne plus ultra of ETL tools. It's worth a look if you have to transform lots and lots of data.

      The problem is you still have to Extract data from other systems, Transform it to make it suitable for Hadoop, and Load it into HDFS (or S3). Once that data is available to Hadoop, it becomes extremely powerful.

      Practically all analytical systems have the
    • by sfcat ( 872532 )

      I remember Cloudera saying that most people use hadoop for ETL. Not sure if you've checked, but hadoop is like the ne plus ultra of ETL tools. It's worth a look if you have to transform lots and lots of data.

      Um, for what purpose? After you use it as an "ETL" tool, the idea is that afterwards you can query it, analyze it, etc. Traditionally you used an ETL tool to get data into a database then used tools that spoke SQL to analyze the data. With Hadoop, you have to write all your ETL tools yourself. So using Hadoop as an ETL tool is really a bridge to nowhere.
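To make the E/T/L split above concrete, here is a minimal pipeline sketch in plain Python. Everything in it is made up for illustration (the CSV fields, the JSON target); the point is only the shape: parse the legacy format, normalize it, then emit something a downstream system such as HDFS or S3 could ingest:

```python
import csv
import io
import json

# Hypothetical source: a CSV export from a legacy system
raw = "id,amount,currency\n1,10.50,usd\n2,3.00,eur\n"

def extract(text):
    # Extract: parse rows out of the legacy format
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: normalize types and casing before loading
    return [{"id": int(r["id"]),
             "amount": float(r["amount"]),
             "currency": r["currency"].upper()} for r in rows]

def load(rows):
    # Load: newline-delimited JSON, the kind of flat record
    # format you would then write into HDFS or S3
    return "\n".join(json.dumps(r) for r in rows)

output = load(transform(extract(raw)))
print(output.splitlines()[0])
```

At cluster scale the transform step is what Hadoop parallelizes; the extract and load edges are exactly where the parent comments say the pain lives.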

  • by Anonymous Coward

    That means it is better.

  • Hadoop is good at generally running massive queries over tons of data in a relatively efficient amount of time. I say efficient and not fast, because the requests can vary from well-structured queries over grid data sets to massive, bloated, ugly queries that would be massive, bloated, and ugly in any DBMS environment. If you want to talk about regulation, etc., I think you're barking up the wrong tree with Hadoop. If you're concerned with regulation, seed the DB with unique though meaningless data when importing and avo

  • Comment removed based on user account deletion
    • by cruff ( 171569 )

      ... back in my day we played nethack on the VAX-785!

      I started out playing rogue on the Vaxen. Then there was plain hack. Those were the days. Still play nethack now and again.

  • Paywalled (Score:4, Informative)

    by LordLimecat ( 1103839 ) on Wednesday December 17, 2014 @02:54PM (#48620057)

    Since when is it acceptable to post articles that are paywalled?

    We're not even going to pretend to care about the article?

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      Reading TFA before responding is considered bad form.

    • Yeah, obnoxious. People ought to browse submissions in private browsing mode or something. Then if they happen to have a sub to a paywall site they'd see the article the same way people who don't would.

  • And while Hadoop can be much faster than traditional databases for some purposes,

    If by "some purposes" you mean "idiots who don't know how to design a relational database", then sure.

  • Apple bought out Beats for $3B and change. They make middling, overpriced headphones that come in a variety of colors. Facebook dropped $19B on an app that sends messages. Facebook dropped $1B on a company that makes Polaroids on your smartphone.

    $2B of investments into multiple companies that are working on a technology platform that provides methods for sifting though vast amounts of certain types of business data, running on low-cost, commodity hardware and backed by an open source project seems positivel

  • by michaelmalak ( 91262 ) <michael@michaelmalak.com> on Wednesday December 17, 2014 @04:11PM (#48620953) Homepage
    Free nasdaq.com mirror [nasdaq.com] of this particular article.
  • by Anonymous Coward

    So, Hadoop is a framework for processing embarrassingly parallel tasks (running the same function on a massive amount of aligned data chunked into pieces) using Java.

    This seems like a cluster-fuck (pun intended) to me that could be done as well or faster in an ordinary cluster environment with less software and memory overhead. For those in HPC, am I missing something? This also seems to have a very narrow scope of usage, so you're getting a lot of mess for moderate returns.
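The "same function over independent chunks" pattern the parent describes can be sketched with nothing but the Python standard library. This is a toy, single-machine stand-in (squaring numbers in place of any per-record function) meant only to show the shape of the work Hadoop distributes:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # The "map" step: the same function applied independently to each chunk
    return sum(x * x for x in chunk)

data = list(range(100))
# Split the data into independent chunks of 25 records each
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

total = sum(partials)  # the "reduce" step: combine partial results
print(total)  # 328350
```

The HPC commenter's point stands: for a job this regular, MPI or a plain job scheduler does the same thing. What Hadoop adds is data locality, re-execution of failed chunks, and a distributed filesystem underneath.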

    • Nope, you've missed nothing. It's over-hyped crap that only gained initial popularity because someone did it in Java, and enterprises like Java.

    • by godrik ( 1287354 )

      The main interest of Hadoop is that it makes it easy to do out of core computation if the computations are loosely coupled and are mostly IO-bound. For anything else, Hadoop is probably not the right tool and is overhyped and typically inefficient.
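"Out-of-core" here means processing data too large to fit in memory by streaming it in pieces. A minimal single-file sketch of that idea, using only the standard library (the newline count is an arbitrary example of a loosely coupled, IO-bound aggregation):

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=1 << 16):
    # Stream a file in fixed-size pieces instead of loading it whole
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

def count_newlines(path):
    # Each chunk is processed independently, then the partials are summed:
    # the same loosely coupled, IO-bound shape Hadoop scales out
    return sum(chunk.count(b"\n") for chunk in read_in_chunks(path))

# Tiny demo on a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"alpha\nbeta\ngamma\n")
print(count_newlines(f.name))  # 3
os.remove(f.name)
```

Hadoop applies the same streaming-and-merging pattern, but spread across a cluster with the chunks living on HDFS.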
