NYT: Healthcare.gov Project Chaos Due Partly To Unorthodox Database Choice
First time accepted submitter conoviator writes "The NY Times has just published a piece providing more background on the healthcare.gov software project. One interesting aspect: 'Another sore point was the Medicare agency's decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem.'" The story does not say that MarkLogic's software is bad in itself, only that the choice meant increased complexity on the project.
MarkLogic = NoSQL (Score:5, Interesting)
MarkLogic is an XML repository, not a RDBMS (Score:5, Interesting)
"Some people, when confronted with a problem, think 'I know, I'll use XML.' Now they have two problems."
-JWZ
MarkLogic is an XML database, not a relational database, so if your data primarily consists of XML content then it's the right tool for the job. Sounds like the vendor building the system had a favorite hammer and decided that a rather traditional database problem looked like a nail.
MarkLogic itself is fine if your data fits neatly into an XML schema, but with healthcare.gov that tree is probably enormous and hard to optimize for DB activity.
Re:MarkLogic = NoSQL (Score:5, Interesting)
A customer at work has a MarkLogic database. I don't know what version it is, but I assure you IT IS HORRIBLE. It's like... an XML database or some shit. Just awful, and extremely confusing to use.
A couple years ago I was asked to set up automatic database backups. The only documentation I could find for backing up the database required logging into a web interface, on an obscure port, that I had no idea existed, and clicking a backup button. I literally had to write a Perl script to fake a browser and do this just so they could get a database dump.
Do not use this product. Please.
Re:Blow to NoSQL movement (Score:2, Interesting)
The agency imposing their "unconventional" choice of DB, and this sentence:
"They ordered CGI technicians to drive from their offices near Dulles International Airport in Virginia to the agency headquarters near Baltimore to review their code with government supervisors."
sum up everything that went wrong.
Basically you had a bunch of arrogant idiots who thought themselves experts (which they may well be, _in the healthcare domain_) who not only ignored the views of the _software/IT_ experts ("technician" my a**, senior software engineer, more like), who they saw as plebes, but proceeded to question their work.
When will those guys realize that software developers are not blue collar workers?
That they went for the usual bigco contractors and their hordes of yes-men certainly didn't help.
Re:follow the money (Score:2, Interesting)
Because relational dbs are old and stuff. We don't tie onions to our belts anymore either, grandpa. NoSQL and XML are the future! A hipster I met at Starbucks told me so.
Re: MarkLogic = NoSQL (Score:5, Interesting)
MarkLogic, afaik, is the only ACID-compliant NoSQL solution that exists.
US Federal Government Systems Projects (Score:5, Interesting)
Re:Blow to NoSQL movement (Score:5, Interesting)
I ran into this recently at a company whose new head of engineering was talking to me (an outsider) about the technology problem they had to solve and I thought it sounded very traditional and simple except they'd need to carefully plan for horizontal scaling.
Basically a potentially huge number of devices (in the range of millions) would be reporting in periodic data that had to be stored and potentially evaluated in real-time. The data was quite easily swim laned by geolocation and the data had no appreciable inter-related significance. So basically, one piece of data from one device had nothing to do with any other device's information except in the general sense that can come from a more heuristic correlation of their data.
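To make the "swim laned by geolocation" point concrete, here is a minimal Python sketch of that kind of partitioning. It is only an illustration under stated assumptions: the 10-degree grid, the ShardedStore class, and the shard_key helper are all hypothetical and not anything from the project described. Because readings from different devices are unrelated, each grid cell can be stored and scaled on its own.

```python
from collections import defaultdict
from typing import Tuple

# Hypothetical grid size: one shard ("swim lane") per 10x10 degree cell.
CELL_DEGREES = 10


def shard_key(lat: float, lon: float) -> Tuple[int, int]:
    """Map a coordinate to the grid cell that owns it (floor division)."""
    return (int(lat // CELL_DEGREES), int(lon // CELL_DEGREES))


class ShardedStore:
    """In-memory stand-in for a set of independent per-region stores."""

    def __init__(self):
        # shard key -> device id -> list of readings
        self.shards = defaultdict(lambda: defaultdict(list))

    def record(self, device_id, lat, lon, reading):
        # A write touches exactly one shard; no cross-shard coordination.
        self.shards[shard_key(lat, lon)][device_id].append(reading)

    def readings(self, device_id, lat, lon):
        # Read without creating empty entries for unknown shards or devices.
        shard = self.shards.get(shard_key(lat, lon), {})
        return list(shard.get(device_id, []))


store = ShardedStore()
store.record("dev-1", 38.95, -77.45, 1.0)
store.record("dev-2", 39.29, -76.61, 2.0)
store.record("dev-3", 51.5, -0.1, 3.0)
```

In a real deployment each shard key would map to a separate database or host, and a device that moves between cells would need extra handling that this sketch ignores.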
I should mention that the new engineering head and I had already (together) handled a situation very similar to this at a previously successful software company.
Well, the new engineering head had inherited an external architect who had different ideas. All of these different ideas involved things like Cassandra over Hadoop, AMP/Spark, BDAS. He showed me a diagram of the technologies he wanted to integrate and I'd never heard of almost half of them (and I deal with scaling issues all the time). The diagram had about 15 different technologies stacked together. It was crazy - all to solve a relatively simple data volume problem.
Almost needless to say, I advised otherwise, and afaik they're going the big data way because it will make it easier for them to pull in VC money, since shockingly few VCs actually evaluate technology before they put money in (I do this for VCs also, and other VCs wonder how I get paid to do this, lol.)
It is not the tool, stupid (Score:3, Interesting)
It is NEVER the tool, but who and why it was chosen.
Furthermore, MarkLogic is a legitimate NoSQL vendor with strong ACID support.
So the question is "how does MarkLogic screw up the site?" Without the answer to that question, we should all refrain from pointing fingers at what is merely a small piece of a huge software project.
NoSQL is NOT the new hotness. It's been out there for at least 5 years and many successful projects are using it, so for those who think NoSQL shouldn't be used: wake up and breathe some fresh air.
Re:follow the money (Score:5, Interesting)
Also, the Federal Government fscking LOVES XML. Like, a lot. The things I saw in the new "protocols" they're deploying for international air-traffic control data stuff... Wow...
Re:follow the money (Score:5, Interesting)
The chart's hosted on the National Republican Congressional Committee's website. I would take it with a heaping tablespoon of salt, if I were you. It's safe to say that chart was designed to look as scary and confusing as possible.
Nothing wrong with the tech. (Score:4, Interesting)
I've used it, personally, to implement a public-facing website. That site endured the dreaded 'slashdot effect' several times. No failures.
When implemented properly, MarkLogic is damn near unkillable. It will slow down, it will reject connections when queues are full, but it will not fall over. Naturally, this assumes proper underpinnings and capacity calculations. With MarkLogic, those are actually documented.
Mandatory disclosure: I do not have and have never had any association with MarkLogic other than as a paying customer.
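The "reject connections when queues are full, but not fall over" behavior described above is a general load-shedding pattern. The Python sketch below illustrates it with a bounded queue from the standard library. It says nothing about how MarkLogic is actually implemented; the BoundedServer class and its method names are hypothetical.

```python
import queue


class BoundedServer:
    """Toy server that sheds load instead of growing without bound."""

    def __init__(self, max_pending):
        # A fixed-size queue caps memory use no matter how many clients arrive.
        self.pending = queue.Queue(maxsize=max_pending)

    def accept(self, request):
        """Queue the request, or refuse it immediately if the queue is full."""
        try:
            self.pending.put_nowait(request)
            return True
        except queue.Full:
            return False  # caller sees a rejection, server stays healthy

    def drain_one(self):
        """Take the oldest pending request, or None if nothing is waiting."""
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None


server = BoundedServer(max_pending=2)
results = [server.accept(r) for r in ("a", "b", "c")]  # third one is refused
```

Picking max_pending is exactly the capacity calculation the comment mentions: too small and healthy traffic gets refused, too large and latency grows before anything is shed.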
Re: follow the money (Score:4, Interesting)
For those wondering about the link between MUMPS and government healthcare, my vague recollections from years ago when I worked as a developer for health + insurance software: the old MUMPS language included its own looks-like-all-in-memory database system (essentially just a recursive map of string to object, either a value or another map -- the JSON comparison is fair) which made serialization simple. The language got used to build some early health IT systems, including the one for the VA (VistA) and its IHS derivative (RPMS). That stuff's available for free, by the way, through FOIA. The projects have sufficient inertia that they still use the same data-store (at least at the API level). InterSystems Caché, for example, is a MUMPS-compatible database with some relational features (and SQL parsing) thrown on top. They bill themselves as post-relational, but yeah, it's a network database pretending to be a relational database.
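As a rough illustration of the "recursive map of string to object" described above, and of why the JSON comparison is fair, here is a small Python sketch. This is not MUMPS syntax and not the VistA data model; the set_node and get_node helpers and the sample keys are made up for the example.

```python
import json


def set_node(root, path, value):
    """Store value at the given key path, creating nested maps as needed."""
    node = root
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def get_node(root, path, default=None):
    """Walk the key path; return default if any step is missing."""
    node = root
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical sample data: every key is a string, every leaf is a value.
store = {}
set_node(store, ["PATIENT", "123", "NAME"], "DOE,JANE")
set_node(store, ["PATIENT", "123", "VISIT", "2013-11-01"], "checkup")

# Serialization is trivial because the whole store is just nested maps.
round_tripped = json.loads(json.dumps(store))
```

The sketch assumes a path never runs through an existing leaf value; set_node would fail in that case rather than overwrite it.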
It kind of makes sense to continue using network databases for health data -- in a privacy-conscious world, it's not insane to isolate patient data into a document-oriented storage system, because you're not planning to relate data willy-nilly. We were somewhat frustrated that the HL7 interchange format tended to assume things were hierarchical, where we had seen potential graphs and coded for them -- but nobody wanted our better-related data. They prefer to re-enter the data in each place, and prevent things from being synchronized -- it protects the data from unexpected changes. So if all the systems and agencies you're integrating with have this attitude anyway, and you're constantly worried about data-interchange, I can see how you might come to the conclusion that a document-oriented, XML-backed storage engine would be a good idea.