Insecure Hadoop Servers Expose Over 5 Petabytes of Data (bleepingcomputer.com)
Date: 2017-06-04
An anonymous reader quotes the security news editor at Bleeping Computer: Improperly configured HDFS-based servers, mostly Hadoop installs, are exposing over five petabytes of information, according to John Matherly, founder of Shodan, a search engine for discovering Internet-connected devices. The expert says he discovered 4,487 instances of HDFS-based servers available via public IP addresses and without authentication, which in total exposed over 5,120 TB of data.
According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent... The countries that exposed the most HDFS instances are by far the US and China, but this should be of no surprise as these two countries host over 50% of all data centers in the world.
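The ratios in the summary can be checked roughly from the quoted figures alone. This is only a sketch: the variable names are made up for illustration, and it assumes 1 PB = 1,024 TB (which is how 5,120 TB comes out to exactly 5 PB).

```python
# Figures as quoted in the summary above (attributed to Matherly / Shodan).
hdfs_instances = 4_487
hdfs_exposed_tb = 5_120
mongodb_instances = 47_820
mongodb_exposed_tb = 25

# Assumption: binary units, 1 PB = 1,024 TB.
hdfs_exposed_pb = hdfs_exposed_tb / 1024

# "HDFS servers leak 200 times more data": ratio of exposed volume.
data_ratio = hdfs_exposed_tb / mongodb_exposed_tb

# "ten times more prevalent": ratio of exposed instance counts.
prevalence_ratio = mongodb_instances / hdfs_instances

# Average exposure per HDFS instance, in TB.
avg_tb_per_hdfs = hdfs_exposed_tb / hdfs_instances

print(f"HDFS exposure: {hdfs_exposed_pb:.1f} PB")
print(f"HDFS/MongoDB data ratio: {data_ratio:.1f}x")
print(f"MongoDB/HDFS instance ratio: {prevalence_ratio:.1f}x")
print(f"Average per HDFS instance: {avg_tb_per_hdfs:.2f} TB")
```

So "200 times" and "ten times" are both rounded down slightly from what the raw counts give.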
dumbass millennials (Score:2, Insightful)
And yet companies keep hiring younger people and getting rid of experienced pros who understand security.
also why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials that configured these so poorly?
At my company, some idiot developer put PDFs of our customers' health insurance claims at a public-facing URL so that he didn't have to write an on-demand report generator to display that same information in an HTTPS session. Even though the file names were pseudo-random, Yahoo quickly crawled them and made the information searchable.
So not only was private information made publicly available, the PDF files were in a directory that was marked browseable by the web server? That's extra nice.
why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials that configured these so poorly?
Baby Boomers - Destroying the ecosystem.
Gen X - Destroying the global economic system.
Millennials - Not giving any fucks because they are the worst paid generation.
I'm glad you're focused on the right things here.
;)
It's a distributed data storage/processing system. Whether it's useful depends on your project.
A good programmer makes sure that their storage and database backend is replaceable, and good backend projects make sure that they support at least somewhat standard methods and functions.
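To make the "replaceable backend" point concrete, here is a minimal sketch, not anyone's actual code. `BlobStore`, `InMemoryStore`, `LocalDirStore` and `archive_report` are hypothetical names invented for illustration; a real HDFS or database-backed class would need its own client library, which is left out here.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Backend that keeps everything in a dict (handy for tests)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalDirStore:
    """Backend that writes each blob as a file under one directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def archive_report(store: BlobStore, name: str, body: bytes) -> bytes:
    """Application code: only knows the interface, not the backend."""
    store.put(name, body)
    return store.get(name)


if __name__ == "__main__":
    print(archive_report(InMemoryStore(), "report.txt", b"hello"))
    with tempfile.TemporaryDirectory() as tmp:
        print(archive_report(LocalDirStore(Path(tmp)), "report.txt", b"hello"))
```

Swapping the backend means writing one more class with the same two methods; `archive_report` does not change.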
The problem with most of these implementations is they're relatively expensive for small setups. You need 3 dedicated nodes at least to make it "work" well enough and it still has huge amounts of overhead compared to a classic sys
MongoDB is webscale (Score:2)
Why not use MongoDB? MongoDB is a webscale database that scales.
https://www.youtube.com/watch?... [youtube.com]
Imagine you wanted a database to search petabytes of terabyte-sized files. Now imagine you learned nothing about databases and only knew Java, so you naturally started over from scratch, blissfully free of any external normalizing influences.
A hacker stealing a copy of that data (Score:2)
will have to make a run to Best Buy for a few more thumb drives.