Apache Hadoop Has Failed Us, Tech Experts Say (datanami.com) 150

It was the first widely-adopted open source distributed computing platform. But some geeks running it are telling Datanami that Hadoop "is great if you're a data scientist who knows how to code in MapReduce or Pig...but as you go higher up the stack, the abstraction layers have mostly failed to deliver on the promise of enabling business analysts to get at the data." Slashdot reader atcclears shares their report: "I can't find a happy Hadoop customer. It's sort of as simple as that," says Bob Muglia, CEO of Snowflake Computing, which develops and runs a cloud-based relational data warehouse offering. "It's very clear to me, technologically, that it's not the technology base the world will be built on going forward"... [T]hanks to better mousetraps like S3 (for storage) and Spark (for processing), Hadoop will be relegated to niche and legacy statuses going forward, Muglia says. "The number of customers who have actually successfully tamed Hadoop is probably less than 20 and it might be less than 10..."

One of the companies that supposedly tamed Hadoop is Facebook...but according to Bobby Johnson, who helped run Facebook's Hadoop cluster before co-founding behavioral analytics company Interana, the fact that Hadoop is still around is a "historical glitch. That may be a little strong," Johnson says. "But there's a bunch of things that people have been trying to do with it for a long time that it's just not well suited for." Hadoop's strengths lie in serving as a cheap storage repository and for processing ETL batch workloads, Johnson says. But it's ill-suited for running interactive, user-facing applications... "After years of banging our heads against it at Facebook, it was never great at it," he says. "It's really hard to dig into and actually get real answers from... You really have to understand how this thing works to get what you want."

Johnson recommends Apache Kafka instead for big data applications, arguing "there's a pipe of data and anything that wants to do something useful with it can tap into that thing. That feels like a better unifying principal..." And the creator of Kafka -- who ran Hadoop clusters at LinkedIn -- calls Hadoop "just a very complicated stack to build on."
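
(A minimal sketch of the "pipe of data" pattern Johnson describes, in Python, using the third-party kafka-python client; the topic name, broker address, and payloads are hypothetical illustrations, not details from the article.)

    from kafka import KafkaProducer, KafkaConsumer
    import json

    # One producer feeds the pipe...
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("clickstream", {"user": 42, "action": "page_view"})
    producer.flush()

    # ...and any number of independent consumers can tap into it.
    consumer = KafkaConsumer(
        "clickstream",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)  # {'user': 42, 'action': 'page_view'}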
  • MapReduce is great (Score:4, Insightful)

    by Anonymous Coward on Saturday March 25, 2017 @05:44PM (#54109737)

    If 1) you have a staff of elite programmers like Google or Facebook, who have CS degrees from top universities and are accustomed to picking up new programming languages and tools on a continuing basis; AND
    2) your business has a pressing need to crunch terabytes of logs or document data with no fixed schema and continually changing business needs.

    For the average Fortune 500 (or even IT) shop, not so much. A '90s style data warehouse accessible through SQL queries works much better.

    • by gweihir ( 88907 )

      I have to say that the more I learn about the coders at Google, the less impressed I am with their quality. The really good ones are leaving, are thinking about leaving, or left a while ago. What remains are the mediocre ones who somehow managed to get in.

      • by Anonymous Coward on Saturday March 25, 2017 @06:17PM (#54109891)

        You've done an incredible amount of work to reach this conclusion. Congrats. Did you use map-reduce on your data set?

      • by Anonymous Coward

        If you make people jump through hoops like circus animals to come work at your company you only get the desperate, or the ones who want the job as a status symbol.

        • by Nkwe ( 604125 )

          If you make people jump through hoops like circus animals to come work at your company you only get the desperate, or the ones who want the job as a status symbol.

          Or the ones who like being made to jump through hoops like a circus animal. I guess if you are into that it's okay; who am I to judge?

          • by Anonymous Coward

            I went through the process a few years ago for an SRE position. It was exactly the same process used at most other tech companies: a couple of screening interviews, and a half to two-thirds of a day of on-site, one-on-one, specific tech interviews with people who *seriously* know their stuff.

            The campus is overall a weird cult, and they don't have other offices in places I want to live (maybe Pittsburgh, someday), so I don't work there. But they haven't done the really weird interviews that they used to be

        • by gweihir ( 88907 ) on Saturday March 25, 2017 @07:18PM (#54110147)

          Indeed. I went through their "interview process" a while back at the request of a friend who was there and desperately wanted me for his team. Interestingly, I failed to get hired, and I think it is because I knew a lot more about the questions they asked than the people who created (and asked) them. For example, on (non-cryptographic) hash functions my answer was not to write them yourself, because they would always be pretty bad, and to instead use the ones by Bob Jenkins, or, if things are slow anyway because there is a disk access in the path, to use a crypto hash. While that is what you do in reality if you have more than small tables, that was apparently very much not what they wanted to hear. They apparently wanted me to start to mess around with the usual things you find in algorithm books. Turns out I did, way back, but when I put 100 million IP addresses into such a table, it performed abysmally. My take-away is that Google prefers to hire highly intelligent but semi-smart people with semi-knowledge and little experience, and that experienced and smart people fail their interviews unless they prepare to give dumber answers than they can. I will never do that.

          On the plus side, my current job is way more interesting than anything Google would have offered me.

          • I have heard plenty of stories like this.
            And I have to say, while the questions Google asks in an interview are relevant for their business, they are rather simple.
            I guess I would fail an interview, too.
            On the other hand, I work freelance, so big companies are rarely interesting.

            • by lucm ( 889690 )

              The problem those companies face is that they grew so fast that they're struggling with past technical decisions that are difficult to revert (e.g. Twitter and their initial RoR architecture). The wheels keep turning so they end up having to build sophisticated layers on top of their legacy garbage.

              We've all been there. Someone (maybe even you) builds a throwaway Excel macro or Wordpress-driven monstrosity just to address a temporary need that is not worth spending more than 2h on, and first thing you know

          • +1. To get hired at Google, you have to be smart enough to get through the tests but not smart enough to know that what you're working on has already been looked at by two dozen other people at different times, and that there's a pretty good solution already available. Instead, you get to reinvent the wheel yourself from scratch, but yours is going to be bigger, better, faster, newer, and web scale, because you're Google. I declined to work there; it would have driven me nuts to work in such an environment.
          • by Kjella ( 173770 ) on Sunday March 26, 2017 @12:04AM (#54111091) Homepage

            For example, on (non-cryptographic) hash-functions my answer was to not do them yourself, because they would always be pretty bad, and to instead use the ones by Bob Jenkins, or if things are slow because there is a disk-access in there to use a crypto hash. While that is what you do in reality if you have more than small tables, that was apparently very much not what they wanted to hear. They apparently wanted me to start to mess around with the usual things you find in algorithm books.

            No offense, but "I'd rather just use a library" seriously brings into question what you bring to the table and whether you'll just be searching experts-exchange for smart stuff other people have done. Like everybody knows you shouldn't use homegrown cryptographic algorithms, but if a cryptologist can't tell me what an S-box is and points me to a library instead, it doesn't really tell me anything about his skill, except that he didn't want to answer the question. In fact, dodging the question like that would be a pretty big red flag.

            Don't get me wrong, you can get there. But start off with roughly what you'd do if you had to implement it from scratch, what's difficult to get right, then suggest implementations you know or alternative ways to solve it. Because they're not that stupid that they think this is some novel issue nobody's ever looked at before or found decent answers to. They want to test if you have the intellect, knowledge and creativity to sketch a solution yourself. Once you've done that, then you can tell them why it's probably not a good idea to reinvent the wheel.

            • by gweihir ( 88907 ) on Sunday March 26, 2017 @01:52PM (#54113567)

              No offense, but you miss the point entirely. What I answered is very far from "use a library". First, it is an algorithm, not a library. That difference is very important. Second, it is a carefully selected algorithm that performs much better than what you commonly find in "libraries" in almost all situations. And third, the hash functions by Bob Jenkins (and the newer ones by DJB, for example) are inspired by crypto, but much faster in exchange for reduced security assurances. In fact, so fast that they can compete directly with the far worse things commonly in use. "Do not roll your own crypto" _does_ apply, though.

              So while I think you meant to be patronizing, you just come across as incompetent. A bit like the folks at Google, come to think of it...

              • Or to put it another way: There are better ways to determine someone's understanding of wheels than asking them to make one.

                If I were interviewing a candidate and wanted, for some reason, some sense of that person's understanding of hash functions, I'd hope for more than "just use a library", but I also wouldn't be looking for an Introduction to Algorithms exposition on them. gweihir's original post comes pretty close to the sweet spot: it shows an understanding of the problem domain, some sense of approache

          • Interesting comments on this thread, thanks. I've learned a lot.

            fwiw, I have a network engineering background and Hadoop always seemed like a clusterfsk to me...good to learn the actual story isn't far from my impressions.

        • If you make people jump through hoops like circus animals to come work at your company

          They jump through hadoops, not hoops. That's how they show they're qualified to work with it.

      • by Anonymous Coward on Saturday March 25, 2017 @08:06PM (#54110315)

        That's because the mediocre programmers are the ones giving the interviews. A close friend interviewed last year only to sit in front of a bunch of know-it-all elitists. One douche rambled on about how he wishes there were monads in C++ and how great functional design is. Now my friend and his roommate are CS geeks whose idea of spare time is doing shit like building a Lisp interpreter in C++ just for fun. So he asked Mr. Monad if the project used a functional approach, which was a solid no. The idiot just wanted to show off the fact that he knew what functional programming is, and wasted time. My friend passed on the Google job for a big local company doing back-end dev work. The job pays as well as Google, without the pompous know-nothings, and with the ability to work remotely. Fuck working for Google.

        • by gweihir ( 88907 )

          Quite possibly these people are vastly overestimating their own skills because they "work at Google". Fortunately, I did not run into socially inept interviewers, but judging by the questions asked, the interviewers did not have more than surface knowledge. That is not how you interview somebody with advanced skills and experience, because people on that level rarely run into things they have not seen before in some form and need to solve at an elementary level. I think this happened to me once in the last 5 years

    • by Mitreya ( 579078 )

      1) you have a staff of elite programmers like Google or Facebook, who have CS degrees from top universities and are accustomed to picking up new programming languages and tools on a continuing basis;

      I disagree.
      MapReduce is actually great for teaching people about parallel processing! I have been able to teach a distributed computing course to non-CS (primarily data science) MS students because it achieves parallelization without most of the complexities associated with distributed query processing. With Hadoop streaming, all you need is basic knowledge of Python (or similar) to write your own custom jobs, even without Hive/Pig/etc.
      That to me is one of the greatest accomplishments of MapReduce. Bringi
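
      (For readers who haven't seen one: a minimal sketch of the kind of Hadoop streaming job described above, in Python. The word-count task, script names, and paths are illustrative, not anything from the parent comment.)

          #!/usr/bin/env python
          # mapper.py -- reads raw text on stdin, emits "word<TAB>1" pairs.
          import sys

          for line in sys.stdin:
              for word in line.split():
                  print(f"{word}\t1")

          #!/usr/bin/env python
          # reducer.py -- streaming delivers input sorted by key, so counts
          # for each word arrive contiguously and can be summed in one pass.
          import sys

          current_word, count = None, 0
          for line in sys.stdin:
              word, _, value = line.rstrip("\n").partition("\t")
              if word != current_word:
                  if current_word is not None:
                      print(f"{current_word}\t{count}")
                  current_word, count = word, 0
              count += int(value)
          if current_word is not None:
              print(f"{current_word}\t{count}")

      (Submitted with something like: hadoop jar hadoop-streaming.jar -input /logs -output /counts -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py; the jar path varies by distribution.)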

      • > MapReduce is actually great for teaching people about parallel processing! I

        And about how _not_ to do it. The underlying expense and architecture mistake "scalability" for actual throughput in processing. It's proven extremely unstable in tasks larger than a small proof of concept, and in any task I've encountered in which the actual data has to be successfully processed and verified within a specified deadline.

        • by Mitreya ( 579078 )

          The underlying expense and architecture mistake "scalability" for actual throughput in processing. It's proven extremely unstable in tasks larger than a small proof of concept

          Can you elaborate on some reasons?
          I was part of a research paper some time ago, and MapReduce does have the advantage of being able to resume (rather than restart) queries on failure, and it handles ad-hoc queries better (compared to an RDBMS).

          • > Can you elaborate on some reasons?

            It has suffered from a problem common to various object-oriented projects: because it refuses to acknowledge the existence of lower-level structures, such as the very real storage hardware and real network connections necessary to propagate the data among the nodes for effective access, it didn't scale. Backup of results from well-delineated processing steps, which is critical for debugging or re-running new versions of particular processing steps, wound up

    • When Hadoop arrived on the scene it had the exact smell of typical programmer make-work projects, like Ruby and SEO.
  • How about: "Hadoop served many people well for a long time, but it is time for it to be deprecated now." ?
  • "It's very clear to me, technologically, that it's not the technology base the world will be built on going forward"... [T]hanks to better mousetraps like S3 (for storage) and Spark (for processing), Hadoop will be relegated to niche and legacy statuses going forward, Muglia says.

    My 4th grade English teacher used to say, "A bad workman blames his tools."

    Sounds relevant to me here.

    • Hadoop is not tools, it is one particular tool. Some tools are just bad -- I give you the magnetic stud finder as an example.

    • by somenickname ( 1270442 ) on Saturday March 25, 2017 @08:59PM (#54110543)

      My 4th grade English teacher used to say, "A bad workman blames his tools."

      Sounds relevant to me here.

      Apparently your 4th grade English teacher has never tried to use a hammer covered in spikes that arrived in a box labeled "Screwdriver".

      • by Tablizer ( 95088 )

        Sounds like a Catholic school punishment tool.

      • Home Depot saw your order for a meat tenderizer [amazon.ca] and did their best to help [wmctv.com]...

      • My 4th grade English teacher used to say, "A bad workman blames his tools."

        Sounds relevant to me here.

        Apparently your 4th grade English teacher has never tried to use a hammer covered in spikes that arrived in a box labeled "Screwdriver".

        You found Windows! Don't forget that the handle's splinters each carry a different painful virus

    • My 4th grade English teacher used to say, "A bad workman blames his tools."

      Did your English teacher also explain the concept of the cliché?

      This particularly tiresome one, of dubious provenance (Wikiquote cites numerous variations from a host of sources), is surely mentioned at least a few times in the comments for any thread about deficiencies in a product. It seems terribly unlikely that anyone is reading it here for the first time.

      It's a splendid example of sophomoric thinking. Yes, poor workers often blame tools. So do good ones, with reason. It's as uncompelling a maxim

  • It has not (Score:4, Insightful)

    by gweihir ( 88907 ) on Saturday March 25, 2017 @05:58PM (#54109801)

    What has happened instead is that quite a few "tech experts" did not understand what it actually was and had completely unrealistic expectations. Map-reduce is nice when you a) have computing power coming out of your ears and b) have very specific computing tasks. That means that in almost all cases, this technology is a bad choice and that was rather obvious to any actual expert right from the start.

    • by lucm ( 889690 )

      What has happened instead is that quite a few "tech experts" did not understand what it actually was and had completely unrealistic expectations. Map-reduce is nice when you a) have computing power coming out of your ears and b) have very specific computing tasks.

      Spot on. Hadoop is meant to run on a shitload of commodity computers, which is something most organizations don't have - if you can afford a shitload of commodity computers your sysadmins will probably choose to buy high-end SAN and top notch blade servers, and virtualize everything.

      You can see it immediately when you install a packaged version like Hortonworks; the wizard will put data on all your volumes because it assumes you're running on a bunch of low-end servers with shitty RAID or even JBOD - but if

    • by Xyrus ( 755017 )

      Precisely. Hadoop was marketed as a big data panacea and everyone tried to apply it to everything only to discover that it really wasn't a panacea and really wasn't a good solution to the problems they were throwing at it. In addition, it's not particularly easy to use and you can spend a considerable amount of time just in configuring, tweaking, and maintaining the system.

      Hadoop, like any other tool, has it's uses. But like any other tool if you try to apply it outside of what it was really intended to be

  • That feels like a better unifying principal

    They're choosing someone to lead the merger of some high schools?

    Fucking hell, unless you chew your tongue when you talk they don't even sound the same.

  • by Anonymous Coward on Saturday March 25, 2017 @06:16PM (#54109879)

    Did nobody explain to the original poster that Spark in serious deployments is built on top of Hadoop? Or that Kafka uses the Hadoop (YARN) scheduler and is generally used to sink data to HDFS files, also built on top of Hadoop? This is kind of like someone saying that TCP/IP is no longer relevant because we now have DNS....
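
    (A rough sketch of the Kafka-to-HDFS "sink" pattern mentioned above, in Python, using the third-party kafka-python and hdfs (WebHDFS) packages. The broker, namenode, topic, and paths are hypothetical, and production setups would typically use Kafka Connect's HDFS sink rather than a hand-rolled consumer like this.)

        from kafka import KafkaConsumer
        from hdfs import InsecureClient
        import time

        consumer = KafkaConsumer("events", bootstrap_servers="broker:9092")
        client = InsecureClient("http://namenode:9870", user="etl")  # WebHDFS endpoint

        batch = []
        for message in consumer:
            batch.append(message.value.decode("utf-8"))
            if len(batch) >= 10000:  # flush in batches; HDFS dislikes tiny files
                path = f"/raw/events/batch-{int(time.time())}.txt"
                client.write(path, data="\n".join(batch), encoding="utf-8")
                batch = []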

  • Just say Pachyderm (Score:4, Informative)

    by awilden ( 110846 ) on Saturday March 25, 2017 @06:16PM (#54109881)
    People should check out these guys: http://pachyderm.io/ [pachyderm.io] The power of Hadoop, but you choose whatever programming language you think is best for you.
  • As far as I know, most people are using Apache Spark for new projects.
    • by lucm ( 889690 )

      As far as I know, most people are using Apache Spark for new projects.

      Spark is a framework that includes ETL, in-memory computing and a machine learning library - a typical case of wheel reinventing.

      Those "most" people you mention probably only use the machine learning part, and on a fairly small data set. In theory, Spark RDDs can scale to "petabytes" (so they say), but I've never seen it work on even TB-level volumes of data, while Hadoop scales to practically unlimited volumes (Yahoo used to run a 40,000-node cluster).

      Spark is awesome but it's not a replacement for Hadoop for distributed

      • Good to know.
      • If you had actually worked with Spark, you would know it is based on Hadoop, just saying.

        • by lucm ( 889690 )

          If you had actually worked with Spark, you would know it is based on Hadoop, just saying.

          Even a retard with low-speed internet access can look this up on Wikipedia and prove you wrong. Are you trolling or just stupid?

          • Are we talking about the same Spark: http://spark.apache.org/ [apache.org] ??
            Why so angry?

            • by lucm ( 889690 )

              Yes. Spark can optionally run on Hadoop, which is not the same thing as being based on Hadoop. So before implying that other people would "know" something if they had worked with Spark, make sure that the thing in question is true.

              • It is the other way around.

                Spark runs by default on Hadoop; it was designed on top of Hadoop.

                Perhaps it can run on other things, too. I never saw one doing it, though.

                What would be an example of such an "other file system"?

  • When your software integration prevents your software from being used in conjunction with a variety of other platforms, you drastically reduce the number of users and in turn the number of developers that will work on it. As you integrate software more and more, you exponentially decrease the number of developers interested in making tools to make operation of your software easier. I'm not saying that making a system that works with everything will attract more developers but I am saying that making an ov

  • Idiotic babble (Score:5, Insightful)

    by lucm ( 889690 ) on Saturday March 25, 2017 @07:40PM (#54110227)

    People who bash Hadoop without understanding, at a very minimum, its moving parts obviously have no experience with it.

    Hadoop is not one thing. It's three:

    1) a distributed filesystem (HDFS)
    2) a job scheduler (Yarn)
    3) a distributed computing algorithm (MapReduce)

    Many tools like HBase or Accumulo *need* HDFS. That's a core component and there's no equivalent in Spark. Anyone saying HDFS is obsolete is a clueless idiot.
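
    (A tiny illustration of that dependency, assuming the third-party happybase client and a running HBase Thrift gateway; the host, table, and cell names below are made-up placeholders.)

        import happybase

        # Connect to the HBase Thrift gateway (hypothetical host).
        connection = happybase.Connection("hbase-thrift-host")
        table = connection.table("user_events")

        # HBase persists these cells in HFiles on HDFS; take HDFS away
        # and there is nothing underneath this API.
        table.put(b"user42", {b"cf:last_action": b"page_view"})
        print(table.row(b"user42"))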

    Anyways the Spark vs Hadoop narrative is bullshit. A serious Spark setup usually runs on top of a Hadoop cluster, and often you can't get away entirely from MapReduce (or its actual successor, Tez) because Spark runs in-memory and doesn't scale as much; for some workloads you need the read-crunch-save aspect of MapReduce because there's just too much data, and MapReduce is also more resilient as you don't lose as much when a node crashes during a job. Spark is more advanced and has actual analytics capabilities thanks to a powerful ML library (while Hadoop is just distributed computing), but it's not a case of either/or.

    For instance a common approach is to use Hadoop jobs to trim down your data (via Pig or other blunt tool) to a point where you can run machine learning algorithms on Spark.
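
    (A hedged sketch of that two-stage pattern in PySpark, assuming an upstream Pig/MapReduce job has already written a trimmed feature table to HDFS; the paths, column names, and k value are illustrative.)

        from pyspark.sql import SparkSession
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.clustering import KMeans

        spark = (SparkSession.builder
                 .appName("ml-on-trimmed-data")
                 .master("yarn")  # run on the existing Hadoop cluster
                 .getOrCreate())

        # Output of the upstream trim job: now small enough for in-memory work.
        df = spark.read.csv("hdfs:///warehouse/trimmed/features.csv",
                            header=True, inferSchema=True)

        features = VectorAssembler(inputCols=["visits", "spend"],
                                   outputCol="features").transform(df)
        model = KMeans(k=5, featuresCol="features").fit(features)
        model.transform(features).write.parquet("hdfs:///warehouse/segments")

        spark.stop()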

    As for Kafka, it's just a fucking message queue. It's fast and very powerful, but comparing it to Hadoop is like saying you should use Linux instead of MySQL.

    Whoever considers buying services from those Snowflake morons, run away.

  • I need to scratch Hadoop off my list of technologies that I need to read about because everyone else in the office is reading a particular Big Data book?
  • If you look at the list of technical experts:
    1 Bob Muglia - head of a startup competitor that is trying to market a data analytics product and steer some of that Hadoop investment into his fold. His sales model is "look how easy we are." What you should be asking is how much it costs and how you get your data back.
    2 Bobby Johnson - co-founder of an analytics company trying to steer some of that Hadoop investment into his pocket.
    This is a beat-up driven by people who wish they had a slice of the Hadoop pie

  • So I have a Hadoop stack and a team of 4 data scientists. It takes them a month to develop an interface for new data... How do I get this dev time down? With new data sets coming in on a weekly basis, this team would need to grow 10X to keep up. In the meantime, the average user needs to wait a month for access to new streams of info. That leaves our business a month behind on current trends that can definitely be predicted from the data streams. So what do I need to do?? Hire 36 new data scientists or cha
    • by Anonymous Coward

      Hire someone competent with actual software development skills? Most data scientists I've met were glorified or relabeled data analysts. Some minor stats background and maybe they can hack together a script. That's fine and really valuable for analyzing large datasets and formatting the results into pretty figures for decision-makers to look at.

      If your data is too complex for their basic ETL skills and it's taking a month to build interfaces, hire one competent and expensive developer to build those interfa

  • After only 5 minutes with Hadoop I could figure out it was nothing but a giant boondoggle. It only took until the end of that afternoon to be completely sure. Now, what... 3, 4 years later, the rest of the industry is starting to figure it out en masse? Seems about right.

    • You should give MongoDB a try. It might impress you for a whole 6 minutes before you realize that the developers are a bunch of assholes.
  • Perhaps the issue here is about unreasonable expectations.

    No software, Hadoop or otherwise, will magically extract meaning from a huge dump of data. That takes work, whatever tool you use.

    This rant reminds me of the people who purchased an enterprise service bus to interconnect IT applications, only to discover that instead of interconnecting applications, they now needed to interconnect applications with the enterprise service bus. No problem solved for free.

  • '"I can't find a happy Hadoop customer. It's sort of as simple as that," says Bob Muglia, CEO of Snowflake Computing, which develops and runs a cloud-based relational data warehouse offering' slashdot [slashdot.org]

    Here's Bob Muglia, while at Microsoft, describing how to "add additional semantics" to Outlook, that is, perform a detailed analysis of Lotus Notes and then clone it into Outlook.

    "Notes/Domino R5 is very scary. We all saw the demo. Exchange has worked with teams around the company to put together a very det
  • I read a great article where one guy compared Hadoop to tools such as grep. In many fundamental ways he was able to use UNIX command-line tools to wildly outperform Hadoop on what I would consider the larger end of a typical company's data set.

    To me Hadoop was the classic solution desperately in quest of a problem. The worst part of that being so many people who jumped onto Hadoop and thought they were ass kickers for doing so.

    The simple reality is that for most corporate datasets the too
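
    (In the spirit of that article: a minimal single-machine tally in Python. For data that fits on one box, one streaming pass with an in-memory counter often beats standing up a cluster; the log file name and field position are hypothetical.)

        from collections import Counter

        counts = Counter()
        with open("access.log") as logfile:
            for line in logfile:
                fields = line.split()
                if len(fields) > 8:
                    counts[fields[8]] += 1  # e.g. tally HTTP status codes

        for status, n in counts.most_common(10):
            print(status, n)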
  • Isn't MongoDB supposed to be similar to Hadoop? Do the same pitfalls for Hadoop apply to MongoDB?

  • by vile8 ( 796820 ) on Sunday March 26, 2017 @01:13AM (#54111231) Homepage
    Hadoop starts with a vastly distributable and resilient file system (HDFS) which enables, as a base, technologies that include things like HBase (columnar stores), Impala (Parquet example), Titan (graphs), Spark (lord, everything.. it's the emacs of data frameworks), or the latest projects which completely change the paradigm of how you look at data, at unbelievable speeds. (Who the hell runs MapReduce and expects real-time performance?... it's a full disk scan across distributed stores... and fairly sane from that perspective.)

    If you don't have problems that relate to these paradigms... don't use it. Seriously. Just because it's new doesn't mean it fits every situation. It's not MySQL/MariaDB/PostgreSQL... if you think it's even remotely close to that simple you should run for the hills. If you have a significantly large (not talking hundreds of megs or even a couple gigs... you need to be thinking in billions of rows here) configuration management problem then it's a great base to layer other projects on top of to solve your problem.

    Also, I found a large number of problems to solve using timestamped individual data cells that CANNOT be done using traditional SQL methodologies. Lexicographic configuration models, analytics (obv), massive backup history, just to name a few. If the management and installation of the cluster are scary... well... not everything in CS is easy... especially when it gets to handling the world's largest datasets... so this probably isn't really your problem... call the sysadmins and ask them (politely) to help. Believe it or not, the main companies have wizards which can help get you going across clusters... and even manage them visually (not that I ever would... UIs are for people who can't type).

    When people (or just this CEO) say it doesn't deliver on its promise, you are likely trying to solve a problem wholly inappropriately. I have personally used it to solve problems like making real-time recommendations in under 200ms across several gigs of personal data daily (totalling easily into terabytes). (No, you don't use MapReduce... think harder... but you DO use HDFS.)

    So what promise were you told?

    Other than real time (as illustrated above), you can do archiving, ETL of course, and things like enabling SQL lookups, or RRDs... using a number of toolkits or Spark. Seriously, this is one of the best things since sliced bread when it comes to processing and managing real big data problems. Check out the Lambda processing model when you get a chance... you might be impressed, or be utterly confused. Lambda (not talking about programming lambdas, nor AWS Lambda) applies multiple Apache technologies to solve historical and real-time problems together in a sane manner. Also, managing massively distributed backups is much simpler with HDFS.

    Honestly, outside of Teradata implementations, there is nowhere in the world you can get this kind of data resiliency, efficiency, or management. Granted it doesn't have the 20+ years of chops in HUGE datasets Teradata does, nor the support... but it's open source and won't cost you much to try.

    Long long story short: what the hell! I feel like programmers today are constantly... whining... about complexity. It seems like a trend to say "well I couldn't use it for my project so that means no one really does.. they are just trying to look cool." To which I would have to reply... you're an idiot. Yes it's complex... if you understand storage / manipulation / migration / replication / indexing... you should be impressed, to say the very very least. If you don't, please go read the changelog, README, and any note-based install guides, or do some research on the commercial companies using this technology successfully... instead of making up figures and claiming it's gospel.

    Any commercial solution will cost you ... well... millions just to get started solving the problems Hadoop nailed out of the gate.

    If Hadoop seems large and frightening just wait until y
  • I think many of the 'unhappy customers' the article refers to are companies where somebody who didn't quite understand the technology pushed Hadoop as a replacement for (expensive) proprietary software like Oracle, only to be sorely disappointed, especially on interactive performance.
    I've been working with Hadoop since 2007 and have successfully deployed it for multiple clients. First of all, you really want to see if the use case makes sense; sometimes you're just better off with an RDBMS like MySQL. Some comp

  • Big Data is a nice word. The fact that the concept is useful for roughly 5 ginormous global internet companies and beyond pointless for everybody else is probably something that 99.9% of the people making the final decisions on which technology stack gets used have zero clue about. They haven't got the faintest idea what big data actually means and what problems solutions like Hadoop actually address.

    I'd bet money that 99 of 100 scenarios in which Hadoop is used would run better with some unspectacular

  • In other news, bandwagon jumpers are shocked to discover that the cool new doohickey they read about in Tech Fashion Trends Magazine doesn't actually magically fix every problem you throw at it.

    Computer technology has been around and commonplace for several decades now. It isn't news that this stuff is complicated, and it's getting even more complicated with each passing year.

    And yet while a client would never demand a builder use this specific kind of scaffolding and cement to build with because they read

  • Imagine a Beowulf cluster of these!

  • Are these people for real?

    The whole article screams, "I don't know what I'm doing but I love jumping on bandwagons."

    Apache Hadoop and Kafka are two completely different tools, intended for two COMPLETELY different workloads.

    So if you used Hadoop when you should have used Kafka, that doesn't mean Hadoop is bad. It means you haven't done your job and properly vetted the tools available for suitability.