Insecure Hadoop Servers Expose Over 5 Petabytes of Data (bleepingcomputer.com)
An anonymous reader quotes the security news editor at Bleeping Computer:
Improperly configured HDFS-based servers, mostly Hadoop installs, are exposing over five petabytes of information, according to John Matherly, founder of Shodan, a search engine for discovering Internet-connected devices. The expert says he discovered 4,487 instances of HDFS-based servers available via public IP addresses and without authentication, which in total exposed over 5,120 TB of data.
According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent... The countries that exposed the most HDFS instances are by far the US and China, but this should be of no surprise as these two countries host over 50% of all data centers in the world.
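As a quick sanity check on the ratios quoted above, the arithmetic can be sketched in a few lines of Python. The four input figures come straight from the summary; the variable names and the per-server averages are illustrative additions, and the averages assume data is spread evenly across servers, which it almost certainly is not.

```python
# Figures quoted in the summary (Matherly / Shodan).
hdfs_servers = 4_487
hdfs_tb = 5_120          # 5,120 TB is 5 PB at 1,024 TB per PB
mongo_servers = 47_820
mongo_tb = 25

# "HDFS servers leak 200 times more data compared to MongoDB servers"
data_ratio = hdfs_tb / mongo_tb

# "...which are ten times more prevalent"
server_ratio = mongo_servers / hdfs_servers

# Naive per-server averages (assumption: even spread, illustration only).
tb_per_hdfs_server = hdfs_tb / hdfs_servers
tb_per_mongo_server = mongo_tb / mongo_servers
per_server_ratio = tb_per_hdfs_server / tb_per_mongo_server

print(f"exposed data ratio (HDFS/MongoDB):  {data_ratio:.1f}x")
print(f"server count ratio (MongoDB/HDFS):  {server_ratio:.1f}x")
print(f"average exposed per HDFS server:    {tb_per_hdfs_server:.2f} TB")
print(f"average exposed per MongoDB server: {tb_per_mongo_server * 1000:.2f} GB")
print(f"per-server ratio (HDFS/MongoDB):    {per_server_ratio:.0f}x")
```

So both "200 times" and "ten times" are roundings of the quoted figures, and on average an exposed HDFS instance holds roughly three orders of magnitude more data than an exposed MongoDB instance.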
dumbass millennials (Score:3, Insightful)
And yet companies keep hiring younger people and getting rid of experienced pros who understand security.
Also, why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials that configured these so poorly?
Re: (Score:2, Interesting)
At my company, some idiot developer used a public facing URL to put PDFs of our customers' health insurance claims so that he didn't have to write an on-demand report generator to display that same information in an HTTPS session. Even though the file names were pseudo-random, Yahoo quickly crawled it and made the information searchable. It went on for years until a customer called in and asked why his information was found on a Yahoo search.
That inexpensive off-shore developer cost the company millions.
Re: (Score:2)
At my company, some idiot developer used a public facing URL to put PDFs of our customers' health insurance claims so that he didn't have to write an on-demand report generator to display that same information in an HTTPS session. Even though the file names were pseudo-random, Yahoo quickly crawled it and made the information searchable.
So not only was private information made publicly available, the PDF files were in a directory that was marked browseable by the web server? That's extra nice.
Re: (Score:2)
Thanks to H1Bs you don't need to offshore to get the incompetence, now you can bring it in-house.
(That's sort of tongue in cheek, but based on numerous real-world experiences interacting with low-wage devs brought in from overseas. Some of these guys should have been paying their employers rather than being paid, to make up for the amount of damage they were causing).
Re: (Score:2)
why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials that configured these so poorly?
Baby Boomers - Destroying the ecosystem.
Gen X - Destroying the global economic system.
Millennials - Not giving any fucks because they are the worst paid generation.
I'm glad you're focused on the right things here. ;)
Re: dumbass millennials (Score:2)
Nice. That's how I feel about C and Perl.
Re: (Score:1, Troll)
Because nobody competent would be using Hadoop in the first place.
Re: (Score:2)
It's a distributed data storage/processing system. Whether it's useful depends on your project.
A good programmer makes sure that their storage and database backend is replaceable and good backend projects make sure that they support at least somewhat standard methods and functions.
The problem with most of these implementations is they're relatively expensive for small setups. You need 3 dedicated nodes at least to make it "work" well enough and it still has huge amounts of overhead compared to a classic sys
MongoDB is webscale (Score:2)
Why not use MongoDB? MongoDB is a webscale database that scales.
https://www.youtube.com/watch?... [youtube.com]
Re: (Score:1)
Imagine you wanted a database to search petabytes of terabyte-sized files. Now imagine you learned nothing about databases and only knew Java, so naturally started over from scratch, blissfully free of any external normalizing influences.
Re: (Score:1)
It's nothing new. Read a book on how to optimize MySQL. I've worked with Hadoop myself. Any notion that it's better for ANYTHING other than creating a giant boondoggle is utter fiction.
Re: (Score:2)
Very badly, don't you remember when you had to train him?
Hang on, that was Hardeep.
Not saying this isn't a problem, but (Score:1)
Big Data is, by definition, huge volumes of mundane data, usually in unstructured or semi-structured format, which have a very low density of interesting or useful information. But, when aggregated over hundreds of TB, some useful patterns can sometimes be gleaned. Now, are the hackers going to ship the terabytes of data out of the datacenter and hope nobody notices what amounts to a DoS attack?
Yes, there should be protection, but it's like heavy equipment and materials being left unattended at a constructio
A hacker stealing a copy of that data (Score:2)
will have to make a run to Best Buy for a few more thumb drives.
Hadoop = Insecure (Score:1)
My experience is a couple of years old, but when I did a deep dive into Hadoop a serious flaw quickly came to light:
Hadoop was NEVER designed for security.
Want to own a Hadoop server? Create a hadoop account on your own box and connect to it. Bang, you are "root" on a Hadoop install.
Hadoop installs should only be implemented in a secured environment and use restricted VPN connections into it. Anyone who allows the "Internet" to connect to a Hadoop install is an idiot.
This security "flaw" in design is
Re: Hadoop = Insecure (Score:2)
I don't see that as being a flaw at all. Most software should be written like that.
The problem is with the people who use the software assuming that random special purpose projects like Hadoop have planned for security or are competent to do so. Just assume it's all insecure unless there's good reason to think otherwise, and access it via VPN or SSH.
I really don't get this part (Score:2)
"According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent..."
Was this statement actually intended as a bragging point for MongoDB? I've looked at this statement several times, and I can't come up with any other spin. Seriously - if somebody threw this line out there trying to sell me on his preferred piece of software, I'd immediately leave and vow to never use
Code is like a leaky boat (Score:2)
Fix it now or it costs you 2 orders of magnitude more when the (code) boat sinks.
How many are supposed to be accessible? (Score:2)
How many of those servers are actually supposed to be accessible, and how many of them are accessible only because they exist on a network with insufficient protection and oversight?
Not really a problem (Score:5, Funny)