Programming

Can We Replace YAML With an Easier Markup Language? (chrisshort.net) 161

On his personal blog, Red Hat's Chris Short (also a CNCF Cloud Native Ambassador) told his readers that "We kinda went down a rabbit hole the other day when I suggested folks check out yq. ("The aim of the project is to be the jq or sed of yaml files.")

"First, there's nothing wrong with this project. I like it, I find the tool useful, and that's that. But the great debate started over our lord and savior, YAML."

And then he shares what he learned from a bad experience reading the YAML spec in 2012: It was not an RFC, which I am fond of reading, but something about the YAML spec made me sad and frustrated. Syntax really mattered. Whitespace really mattered... It is human-readable because you see the human-readable words in the scalars and structures, but there was something off-putting about YAML. It was a markup language claiming not to be a markup language. I held the firm belief that markup languages are supposed to make things simpler for humans, not harder (XML is the antithesis of markup languages, in my opinion)...

Close to ten years later, I see YAML in the same somewhat offputting light... I hope that a drop in replacement is possible. The fact that we need tools like yq does show that there is some work to be done when it comes to wrangling the YAML beast at scale... Incrementally, YAML is better than XML but, it sucks compared to something like HTML or Markdown (which I can teach to execs and children alike)...

Yes, balancing machine and human readability is hard. The compromises suck, but, at some point, there's enough compute to run a process to take in something 100% human-readable and make it 100% machine-readable... There will always be complexity and a need to understand the tool you're using. But, YAML gives us an example that there can and should be better things.

In a comment on the original submission, Slashdot reader BAReFO0t writes "Binary markup or GTFO." UTF8 is already binary. Hell, ASCII is already binary numbers, not directly readable, but mapped to vector drawings or bitmap images ... that again are rendered to pixel values, that are then turning on blinkenlights or ink blots or noises that a human can actually recognize directly.

So why not extend it to structure, instead of just letters (... and colors ... and sound pressures...)? EBML's core [Extensible Binary Meta Language] is the logical choice.

If all editors always display it as, say XML, just like they all convert numbers into text-shaped blinkenlights too, people will soon call it "plain, human readable" too...

This discussion has been archived. No new comments can be posted.

  • by GlennC ( 96879 ) on Sunday October 25, 2020 @05:38PM (#60647680)

    You want Yet Another Markup Language?

    • by Striek ( 1811980 ) on Sunday October 25, 2020 @06:21PM (#60647816)

      Obligatory XKCD [xkcd.com]

    • by Junta ( 36770 ) on Sunday October 25, 2020 @06:36PM (#60647882)

      Of course, YAML stands for YAML Ain't Markup Language (and it isn't one, it's a data serialization language).

      • by frisket ( 149522 )
        >>YAML is better than XML...

        ...for some value of "better".

        When I can write my thesis in YAML then I'll look at it more closely as a markup language.

        Right now it's for config files, and I'm sure it's very happy where it is.

        --
        How do we persuade new users that spreading fonts across the page like peanut butter across hot toast is not necessarily the route to typographic excellence? [c.t.t]

        • by Junta ( 36770 )

          Most people are aware of XML for its role in data serialization, rather than as the basis for a markup language (e.g. like HTML).

          Of course, by itself generic XML won't help you write your thesis either, as the syntax of tags, logistics of processing, and things like namespaces and some other features (perhaps too many, e.g. enables for XML injection attacks) are covered. However it does not prescribe any semantics for any tag, so it'd have to go to a markup language based on XML that further defines semantics for t

      • Of course, YAML stands for YAML Ain't Markup Language (and it isn't one, it's a data serialization language).

        Actually, I would suggest that YAML is an extremely poor choice for data serialization. YAML has many features to make the text human readable, such as semantic whitespace and numerous lexical constructs, but those features complicate machine-to-machine data transfer.

        YAML is very much a markup language.

        • by Junta ( 36770 )

          Except generally 'markup' means it is pretty good at marking up text. YAML is pretty terrible for that, and is really only well suited for describing key-value pairs and lists with nesting.

          So decent choice for a config file or a serialization where one end might be a shell script or something.

          I would have loved back in the day if games used yaml for their save file format... for... reasons...

    • This. Also, if I may add, all Markup Languages are doomed in the sense that there will be at least some special characters. This creates two problems: One, which characters to use as the special ones (from the ASCII subset) differs according to one's aesthetic (for example, I personally find YAML's choice to use space and newline a pleasing one, even if it makes the Unix weenies sad) and second, you will need some kind of escape sequence so the content can have those special characters. So, it's essentially
      • by nagora ( 177841 )

        This. Also, if I may add, all Markup Languages are doomed in the sense that there will be at least some special characters. This creates two problems: One, which characters to use as the special ones (from the ASCII subset) differs according to one's aesthetic (for example, I personally find YAML's choice to use space and newline a pleasing one, even if it makes the Unix weenies sad)

        Don't be silly.

        and second, you will need some kind of escape sequence so the content can have those special characters.

        Wow. If only there was some well-known convention for that.

        I increasingly get the feeling that computing has been taken over by people fresh out of university who think that three (part-time) years looking at the subject while drinking heavily has in some way made them into experts when they have barely scraped the surface of what is now a very large field.

    • Yes, Another Markup Language.
    • No they want YAYML. Just call it YAY! for short

    • by Livius ( 318358 )

      I recommend Yet One More Markup Language instead.

  • This is the one time I disagree with Betteridge's law of headlines.
  • with LISP
  • by EditorDavid ( 4512125 ) Works for Slashdot on Sunday October 25, 2020 @05:43PM (#60647698)
    "XML was designed to be human readable and is perfectly fine for humans," a commenter wrote to Chris on Twitter [twitter.com]. They added "the problem is some humans."

    Chris responded, "I'm one of those humans."
    • It doesn't really make sense to read XML completely manually, though. You're just having to ignore bunches of it. It makes a lot more sense to at least use a formatting tool and turn it into styled text. But if you're going that far, why not something with a tree view?

    • by _xeno_ ( 155264 )

      XML may be human readable (or not, it's really easy to create an entirely unreadable but valid XML file using entities and never using whitespace), but what it's not is human writable.

      No one writes XML purely by hand. Everyone uses an IDE to take care of some of the "housekeeping" for them, by which I mostly mean the end tags. It's these end tags that make Spring configuration files infamous, since you'd end up with places that have tags that read like <org.apache.some.ClassName.property>3</org.apac

      • But how is in XML less manageable than in HTML? In the 20+ years I've been writing XML things, I've never found an interface that was easier to use or make sense of than a simple text editor.
        • by pjt33 ( 739471 )

          Depends on your target audience. I'm quite happy writing XML files by hand, but the last time I built a system which required complex user configuration I looked around and found a graphical XML editor "for dummies" [sourceforge.net], as it were, which let you supply an XSD and then enforced it. It would have driven me nuts to use it, but for the end users of my software it was much less painful than trying to figure out what was wrong with a malformed raw XML document.

        • That was my thought when I read the article. HTML and XML are both SGML grammars. HTML (sans XHTML) just doesn't make you end a relatively small number of elements with a closing or self-closing element. So the window between "better than XML" and "worse than HTML" seems incredibly small.
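
The difference described above can be seen with two parsers from Python's standard library. This is only a sketch: the snippet, the `TagLogger` class, and the variable names are made up for illustration, and `html.parser` is a lenient tokenizer rather than a full browser-grade HTML parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML-style markup: <br> is never closed.
snippet = "<p>first line<br>second line</p>"


class TagLogger(HTMLParser):
    """Hypothetical helper that records start and end tags as they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
logger.feed(snippet)
logger.close()
print(logger.events)  # the HTML tokenizer accepts the unclosed <br>

# The same text is not well-formed XML, so an XML parser refuses it.
try:
    ET.fromstring(snippet)
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print("well-formed XML:", xml_ok)

# Self-closing the element is enough to satisfy the XML parser.
xhtml_ok = ET.fromstring("<p>first line<br/>second line</p>") is not None
print("with <br/>:", xhtml_ok)
```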
      • Uhm... I write XML regularly by hand.... It's so easy... People can bitch all they want, but if there is something that's very easy to read and write (by hand) it's XML. But then again, I like structure...
    • Way too verbose. I personally prefer json2 over yaml!
      And yes yaml is not ideal but it is one of the better solutions there is.

    • Comment removed based on user account deletion
      • by AmiMoJo ( 196126 )

        JSON is so close to being good but stupid stuff like the choice of where you need commas and where you don't makes it annoying and brittle.

        • This is why I really do like YAML. It has 3 key advantages over JSON:

          - Solves the "trailing comma" problem
          - Allows comments (this is huge)
          - Because it is explicitly line-based, it is better suited for source control in general
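
The first two points can be checked against a strict JSON parser. The sketch below uses Python's standard `json` module (an assumption of this example; the comment names no particular parser) and a hypothetical `try_parse` helper. The YAML side is not shown, since that would need a third-party library.

```python
import json


def try_parse(text):
    """Return the parsed value, or a short message if the parser rejects it."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"rejected: {exc.msg}"


strict_ok = try_parse('{"ports": [80, 443]}')
trailing_comma = try_parse('{"ports": [80, 443,]}')
with_comment = try_parse('{"ports": [80, 443]} // web ports')

print(strict_ok)
print(trailing_comma)
print(with_comment)
```

The exact error wording varies between Python versions, so only the fact of rejection should be relied on.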

          • This is why I really do like YAML. It has 3 key advantages over JSON:

            - Solves the "trailing comma" problem
            - Allows comments (this is huge)
            - Because it is explicitly line-based, it is better suited for source control in general

            You can just solve those by allowing the two first in your parser, and the third is already allowed and is a question of coding style.

          • we use both. yaml for static config files, easy to read and easy to write. and json for data serialisation by machines. xml is too verbose and hard to read, even when you use a tool to format it as best as possible. closing tags suck.
          • This is why I really do like YAML. It has 3 key advantages over JSON:

            - Solves the "trailing comma" problem
            - Allows comments (this is huge)
            - Because it is explicitly line-based, it is better suited for source control in general

            This assumes that users enter data from a text editor. That is possibly an antiquated interface. For example, my electronic design CAD uses an XML format, but I hardly ever manipulate the CAD data with a text editor. In fact, I screwed up a working file by trying to be clever with some copy and paste in the text editor.

      • by _xeno_ ( 155264 )

        It was designed to be expressed using text, but that's not the same thing at all.

        It was designed to be a simplification of SGML to make parsing easier and more consistent, thereby making it easier to add new tags to existing structured content while continuing to be able to parse it using existing parsers. A lot of the weirdness with things like the DTD are due to this SGML heritage.

        I don't know if they managed to keep XML a strict subset of SGML but that was sort of the original intention. It literally wasn't designed for people, it was designed to be easy to parse. (Did it succeed? We

      • by skids ( 119237 )

        Yeah, Apple "dict" files are an atrocity. Use tag names where appropriate and just have consistent rules to convert them to hashes. Jeez.

    • by hey! ( 33014 )

      The problem with XML is more features than anyone needs -- which is not the same as having features that nobody needs. All of those features are useful some of the time, but only a minuscule fraction of those features are useful *most* of the time.

      In a very broad set of applications XML used to be used for, it is gross overkill. When you can do some of the things that you'd do with language features of XML with programming conventions, usually that makes your life a lot simpler. Except when it doesn't.

  • Barefoot is mental (Score:5, Interesting)

    by nagora ( 177841 ) on Sunday October 25, 2020 @05:47PM (#60647714)

    "If all editors always display it as, say XML, just like they all convert numbers into text-shaped blinkenlights too, people will soon call it "plain, human readable" too"

    Here's a tip: if you invent something and require EVERYONE ELSE to do work to make it usable, then your invention is shit and you are a lazy cunt.

    This is exactly the problem with Python whitespace: it's such a bad idea that your editor needs to understand it and step in to help you. That's not "plain text" and neither is some fucking binary blob that will require "Version x or later" of vim or nano or whatever to edit when something goes wrong.

    • One of the reasons I love Python is its whitespace handling, you inconsiderate c...

      • by mfearby ( 1653 ) on Sunday October 25, 2020 @05:58PM (#60647748) Homepage

        The main reason I haven't used Python IS its whitespace handling. Meaningful whitespace is an abomination.

        • Meaningful whitespace is only a symptom of the problem: the entire philosophy of Python is "There is only one way to do things."
      • by nagora ( 177841 ) on Sunday October 25, 2020 @06:00PM (#60647752)

        One of the reasons I love Python is its whitespace handling, you inconsiderate c...

        It doesn't handle whitespace: it requires you to handle it. Braindead.

        • And you really should be mindful and pay attention to indentation. Are you a savage that writes all the code without indentation?

          • by nagora ( 177841 )

            And you really should be mindful and pay attention to indentation. Are you a savage that writes all the code without indentation?

            The issue is not indentation, it is mandated amounts of indentation and parsers which break because they see an extra space somewhere. That's braindead fragile pointless bollocks and a clear sign of a poor designer behind it.

            • Haskell has an interesting attitude to indentation. The formal syntax uses tokens like semicolon statement terminators, but you can write most Haskell code without that, and use semantic indentation, a bit like Python. I do not think the implementations of syntax are anything like the same between Haskell and Python.

      • As a hardware guy it makes little sense to me. Once the compiler/interpreter gets the code it doesn't care about whitespace. One extra space on a line makes the whole thing bomb out?
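
For Python specifically, which extra spaces matter can be tested with the built-in `compile()`. This is a rough sketch; the `compiles` helper and the sample strings are invented for the example, and it says nothing about YAML's rules.

```python
def compiles(source):
    """Return True if Python accepts the source, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


samples = {
    "four spaces per level": "if True:\n    x = 1\n    y = 2\n",
    "two spaces per level": "if True:\n  x = 1\n  y = 2\n",
    "one stray leading space": "if True:\n    x = 1\n     y = 2\n",
    "trailing spaces": "if True:\n    x = 1   \n",
    "extra spaces inside a line": "if True:\n    x   =   1\n",
}

results = {label: compiles(source) for label, source in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'IndentationError'}")
```

In this sketch only leading whitespace that is inconsistent within a block is rejected. The indentation width itself, trailing spaces, and spacing inside a line are all accepted.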

        • by Wolfrider ( 856 )

          Fucking whitespace in YAML *at the end of a line* and indentation rules drove me nuts, who implements a standard THAT BAD? ...And I used to program in COBOL!

          • What drove me nuts in my 5 minute try at yaml was that it was for swagger, so we could document our APIs. WTF should I need what amounts to another programming language to document an internal use REST API for the next dev? OK, if you want the testing and whatnot that swagger provides.... How about a nice form with a button "add GET", "add POST", etc. with some text fields and let everything else be generated?

      • Sorry, but the whitespace in Python is really ugly, a really REALLY bad way of having to write your code. It's way too easy to make a mistake. Having {} or begin/end or End if/End select etc. is much better.
    • Leading white space in Python might be aesthetically displeasing to some, but using a control character to end strings is an 11+ digit dollar mistake.

    • by AmiMoJo ( 196126 )

      That criticism applies to Unicode too. The editor needs to understand it, and even worse it needs to understand various languages too.

      So let's get rid of Unicode and replace it with something that works for both human languages and for structured data.

      • by nagora ( 177841 )

        I don't believe that there is any need for a config file format that is not editable as text; this is quite different from the situation before Unicode. Binary configs just make every disaster harder to recover from. The cost/benefit ratio is radically different between the two examples.

        • by AmiMoJo ( 196126 )

          It would be editable as text. There would simply be extra characters that allow for creating structure.

          If you look at Unicode there are some characters designed for formatting, for example. A dozen different types of space, tabs etc. The goal of a text processor being able to handle Unicode as easily as ASCII was missed but that doesn't mean we can't learn from Unicode's mistakes and do better.

    • by AmiMoJo ( 196126 )

      I just remembered that this already exists. Google created Protocol Buffers.

      https://developers.google.com/... [google.com]

      It's not a bad system but could have done more to support embedded systems.

    • That's not "plain text" and neither is some fucking binary blob that will require "Version x or later" of vim or nano or whatever to edit when something goes wrong.

      It only took about 25 years for Microsoft to add support for more than 64K characters or non-DOS newlines to their Notepad editor. So people using a standard install of Windows might expect to be able to view these binary blobs by the second half of this century.

    • by HiThere ( 15173 )

      Python whitespace has problems when implemented as spaces, but when you use tabs at the line start, it becomes as reasonable as braces. (I.e., not perfect, for sure, but easy enough to handle and quite useful.)
      (I'm well aware this is a minority opinion, but it's my opinion, and I've held it for over a decade.)

    • Yes, I think someone should explain to Guido that Bjarne Stroustrup's idea of overloaded whitespace was just a joke.
  • by koinu ( 472851 ) on Sunday October 25, 2020 @05:59PM (#60647750)
    Well, there is YAML... so what? You know there is also TOML or JSON or XML or INI files. Pick one, whichever you like. Or make your own syntax, but don't complain about trivialities like these.
  • Even ASCII has code points for meta-information, e.g. "start of heading", "start of text", "record separator". This never works. Non-printable code points always end up as unused baggage reminding us of this folly. Computers are fast, storage space is cheap and parsers are a solved problem. If you invent a "blink" UTF code point, I'll reconsider my stance on cruel and unusual punishment.
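
For reference, this is roughly how those code points were meant to be used. The sketch assumes ASCII 30 (record separator) and ASCII 31 (unit separator) as delimiters; the `encode` and `decode` helpers and the sample rows are hypothetical.

```python
UNIT_SEP = "\x1f"    # ASCII 31, "unit separator": between fields
RECORD_SEP = "\x1e"  # ASCII 30, "record separator": between rows


def encode(rows):
    """Join fields and rows with the ASCII separators; no quoting or escaping."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in rows)


def decode(blob):
    """Split a blob produced by encode() back into a list of rows."""
    return [record.split(UNIT_SEP) for record in blob.split(RECORD_SEP)]


rows = [
    ["name", "note"],
    ["yq", 'jq, but for "YAML"'],
    ["multi", "line one\nline two"],
]
blob = encode(rows)
print(repr(blob))
print(decode(blob) == rows)
```

Commas, quotes, and newlines need no escaping here, but the data must never contain the separators themselves, and the blob is awkward to view or type in an ordinary editor, which is the practical objection raised above.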
  • by Hizonner ( 38491 ) on Sunday October 25, 2020 @06:26PM (#60647842)

    Syntax and whitespace really matter in lots of formats invented by humans, for humans.

    Yes, I can decode the shitty mangled English syntax that a lot of people use on Slashdot, but that doesn't mean it's easy or reliable. It matters.

    As for whitespace, the reason you see so much significant whitespace these days is that humans had already started routinely adding rigidly formatted whitespace to languages where it did not matter to the computer, because it made reading it easier for the humans. You can't easily read anything with a block structure unless it has indentation. And if that indentation is what you use to decode it, then it had better damned well be what the computer uses to decode it.

    HTML has a bunch of markup intended to create significant white space for human consumption.

    • by vux984 ( 928602 )

      "And if that indentation is what you use to decode it, then it had better damned well be what the computer uses to decode it."

      Agree to disagree.

      In particular, there is NO DIFFERENCE between your position and this hypothetical one:

      You can more easily read program code with syntax highlighting, keywords in one color, functions in another, string constants in another, etc. Since that color coding assists and ultimately matters to how humans decode the document, then it had better damned well be what th

      • by Hizonner ( 38491 ) on Sunday October 25, 2020 @07:58PM (#60648056)

        If you're going to go that route, why represent the code as a text file at all? Make it a binary AST on disk, and make the editor convert it to and from text. It could work, except that you'd have to retool the entire world to support it.

        However, as long as programs are text, you have to require humans to enter some text to tell a compiler where, say, a block of statements ends. If you do that using some random characters like curly braces, some human is going to have to type those. That human (or more likely the text editor) is also going to type whitespace to make the program readable. You're requiring the work to happen twice, and putting two different representations of the same information into the file on disk.

        For that matter, if you wanted, say, Python to have curly braces and semicolons and no indentation at all, you could have your editor present it that way. Nobody cares if you do that, so long as you write out a properly formatted Python file in the end (and make sure you don't change things that are significant to SCM when you don't mean to).

        So why are extraneous printable syntax characters, or whatever else you want to do to indicate where things begin and end in code, any less "code presentation preferences" than whitespace? If you really, truly hate significant whitespace that much, you can have what you want as a simple matter of programming.

        The only real difference between indentation and explicit printable separators is that at least some kind of indentation seems to be an almost universal human preference... even in languages where all the sugar characters have to be there anyway. People are actually willing to do double work to get indentation even when it doesn't matter to the compiler. It's important enough that basically all large projects have formal rules about really picayune details.

        A simple editor can only really maintain one or the other, so it makes sense to use the one that most people rely on, and not to require manually maintaining two different representations of the same information in the disk file. A complicated editor can do whatever you want once it loads that file.

        Since that color coding assists and ultimately matters to how humans decode the document, then it had better damned well be what the computer uses to decode it.

        If that color coding were part of the actual program text, as opposed to being something that the editor tags on when you load the file and that vanishes when you save the file and close the buffer, then that would in fact be right. If the convention in the language were that variable names were entered in red, and you entered them in green, and that were going to show up on my screen when I viewed the code, then that should be a compiler error.

        But in fact the color isn't part of the program in any way. It's ephemeral stuff, added at load time and stripped at save time. What the color coding does is actually to make the machine's view more visible to the human.
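
The idea of keeping a syntax tree as the real program and treating the text as one rendering of it can be tried out with Python's standard `ast` module. This is a small sketch, assuming Python 3.9 or later for `ast.unparse`; the two `area` snippets are made up for the example.

```python
import ast

# Two spellings of the same function with different presentation choices.
style_a = "def area(w,h):\n        return w*h\n"
style_b = "def area( w, h ):\n  return (w * h)\n"

tree_a = ast.parse(style_a)
tree_b = ast.parse(style_b)

# ast.dump() leaves out line and column positions by default, so only
# the structure of the program is compared.
same_structure = ast.dump(tree_a) == ast.dump(tree_b)
print("same tree:", same_structure)

# Rendering the tree back to text gives one canonical presentation.
canonical = ast.unparse(tree_a)
print(canonical)
```

Indent width, spacing, and redundant parentheses vanish in the tree, while the block structure that the indentation expressed is kept. Comments are lost in this round trip, which is one reason the sketch is not a practical storage format as it stands.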

        • by vux984 ( 928602 )

          "If you're going to go that route, why represent the code as a text file at all?"

          Because there are innumerable advantages to having source code stored as plain-text.

          "But in fact the color isn't part of the program in any way. It's ephemeral stuff, added at load time and stripped at save time."

          You actually could setup an IDE to do that. There's just nothing really gained AT ALL by doing it.

          At work, we have our IDEs set to delete trailing whitespace at line endings at save, we usually tick the box to ignore

    • Syntax and whitespace really matter in lots of formats invented by humans, for humans.

      I recently bought a philosophy/economics book, and it is very difficult to read: Adam Smith's "Wealth of Nations". There is no table of contents, no index, and paragraphs do not start with an indent. I thought this was just some Amazon cheapness, but I am given to understand that the original book was published like that.

      From an esthetic point of view, things like page layout are obviously important for human viewers. I get rather annoyed with excessive line lengths and unsuitable font choices. But these present

  • It's only difficult when you get into the nitty gritty.

    Also, it isn't a markup language, it's a way to organize data into structures. It's like JSON. Basically you have YAML, JSON, and MsgPack as decent serialization choices depending on the situation (the first is the most human readable, but a bit fragile if a human tries to modify it carelessly, JSON is human friendly enough, and msgpack is optimized for best performance when machine processed).

    Markdown is generally the most straightforward for 'markup'

  • by subreality ( 157447 ) on Sunday October 25, 2020 @07:25PM (#60648002)

    YAML and JSON store data structures. JSON handles the basics. YAML allows more complicated data structures, possibly creating headaches if you didn't need such flexibility.

    XML, HTML, Markdown, etc, are document markup languages. They allow adding metadata (perhaps regarding structure, or display information) to text.

    Either group can be bludgeoned into trying to do the other's job. Most egregiously, XML was widely used to store data structures back before JSON existed. It was terrible, because that's not what it was meant for.

    So when the author says:

    It was a markup language claiming not to be a markup language. I held the firm belief that markup languages are supposed to make things simpler for humans, not harder (XML is the antithesis of markup languages, in my opinion)...

    What I hear is:

      * You don't understand why YAML isn't a markup language.
      * You don't understand why XML is a markup language.
      * "simpler for humans" is nice, but orthogonal.
      * You have no business championing for change when you don't understand the above points.
      * Especially when there are already many other choices already available to you.

  • by AReilly ( 9339 ) on Sunday October 25, 2020 @07:32PM (#60648006)

    Writing parsers is easy. Even with a standard markup or serialization language you need to pull in an enormous, buggy support library and _still_ write code that understands what's in the data file, range-checks it and sanitizes it. You're 90% of the way to a bespoke data representation already.
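
As an illustration of that second step, here is a minimal sketch of the code that still has to be written after a standard parser has done its part. The `load_settings` function and the `host`/`port` keys are hypothetical, and Python's `json` module stands in for whichever format is in use.

```python
import json


def load_settings(text):
    """Parse a JSON settings document, then type-check, range-check and clean it."""
    data = json.loads(text)  # the library only guarantees well-formed JSON
    if not isinstance(data, dict):
        raise ValueError("top level must be an object")

    port = data.get("port")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError("port must be an integer")
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")

    host = data.get("host", "localhost")
    if not isinstance(host, str) or not host.strip():
        raise ValueError("host must be a non-empty string")

    return {"host": host.strip(), "port": port}


print(load_settings('{"host": " example.org ", "port": 8080}'))
```

None of these checks depends on the file format, which is the sense in which the parsing library covers only part of the job.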

    The whole idea that we need one common data representation so that we don't have to keep writing parsers and serializers is bunk.

    And get off my lawn...

    • The entire point of having a standardized serialization format should be to use external tools with it.

      Things like yq, jq, xpath, xquery, xslt, schema validators, etc.
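
A small sketch of what such tooling buys you: because the format is standard, a generic query can pull values out without a bespoke parser. The example uses the limited XPath subset in Python's `xml.etree.ElementTree`; the `services` document is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<services>
  <service name="web"><port>80</port><port>443</port></service>
  <service name="db"><port>5432</port></service>
</services>
""")

# Select the <port> children of the service whose name attribute is "web".
web_ports = [int(p.text) for p in doc.findall("./service[@name='web']/port")]

# Collect the name attribute of every service.
names = [s.get("name") for s in doc.findall("service")]

print(web_ports)
print(names)
```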

      Inventing another standard, to save keystrokes, or to look prettier, and then bickering about tooling years later, then more arguments about how schemas should be implemented years after that is SO BACKASSWORD.

      If you're making a new "standard" data serialization format and you're not already planning for search and extraction, transformation,

  • I successfully avoided YAML until recently, when I started setting up Docker Compose and Home Assistant, both of which rely on it heavily.

    I would be perfectly happy with JSON, if comments were allowed.

  • I enjoy YAML for situations where you need to have several layers of embedded serialization. When you start seeing \\\\" and you've spent the last hour trying to figure out why de-serialization doesn't fail, but something is wrong with the final result. YAML is great for this. No escaping, just indentation. I would rather deal with white space sensitivity than the hell of multi-layer escaping.
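
The escaping pile-up is easy to reproduce. In this sketch, Python's `json` module stands in for any quote-and-escape format, and a small payload is serialized inside itself three times; the payload and the `inner` key are made up for the example.

```python
import json

payload = {"msg": 'say "hi"'}

# Each layer stores the previous layer's serialized text as a string value,
# so every quote and backslash from the layer below must be escaped again.
layer1 = json.dumps(payload)
layer2 = json.dumps({"inner": layer1})
layer3 = json.dumps({"inner": layer2})

for depth, text in enumerate([layer1, layer2, layer3], start=1):
    print(depth, "backslashes:", text.count("\\"))
    print(text)

# Unwrapping has to mirror the wrapping exactly, one loads() per layer.
recovered = json.loads(json.loads(json.loads(layer3)["inner"])["inner"])
print(recovered == payload)
```

The number of backslashes more than doubles with each layer. A block-style format that nests by indentation avoids this particular growth, which is the trade-off the comment describes.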
  • "Binary markup or GTFO... If all editors always display it as, say XML..."

    So you want to discard the human-readable format because you expect to always use complex editing software that will render the data in a human-readable format...
    You want to discard the human-readable format because someone else will render it in a human-readable format for you.
    Am I missing something here that can resolve this self-contradiction?

  • YAML is already as easy as it can possibly get. I know that's hugely disappointing, but it's true. Anything you do in an attempt to make it easier will only make it harder to use.

    That said, we probably shouldn't be using YAML. I have a lot of software that depends on it, and it's often a source of mis-configuration bugs and wasted time. We repeatedly run into problems that XML validation would have caught. And if you're worse than XML, then you've failed in some serious way.

  • I've never programmed Cobol, but having a syntax rely on exact indentation was a bad idea then.

    Having something that relies on your personal definition and preferences and forcing everybody else to do it the same way is even worse now. Especially if you want the files to be used by people who may not have fully learned how to do it. I've been using computers since the early 80's: ZX Spectrum, C64, Amiga, PCs w/ DOS/Windows 3.11 and up/Minix/SCO/*IX/Linux, Apollo Domain Unix, SGI ... pr

    • by garry_g ( 106621 )

      P.S. - why do we need "human-readable" files anyway? Who defines what "human readable" is anyway?

      These files need to be machine-readable, as they are typically configuration files. Just make sure you have decent editors that make the sources (e.g. XML, JSON, etc.) easily understandable and allow for simple, syntactically and structurally correct editing ...

      I hate the trend for "participation trophies" and dumbing down things like language, spelling, punctuation etc. in order to make it easier for the bott

  • by 1s44c ( 552956 )

    Is anyone else getting the impression that the CNCF are a bunch of hipster fake IT people that just use the CNCF name as a brand to pretend legitimacy?

    Anyone else getting that impression? Or is it just me?

  • by Greyfox ( 87712 )
    The problem isn't the markup, the problem is the engineering decisions required to build a program that does what you want are still hard, because programming is hard. Whatever markup you build for yourself or decide to use, you're still going to have to decide how to serialize and deserialize your objects and interact between the various component parts. You know, the hard bits. Having 30 markup languages or libraries, none of which do exactly what you want isn't helping. You still have to evaluate each on
  • How much fucking easier can you get? YAML is "drooling on yourself while eating paste" easy as it is, I'm pretty sure that it's far too much effort for far too little gain at this point to try for even easier (you know, diminishing returns and all that.)

    Seriously, at this point, the sanest option for "easier" is Microsoft Word or something functionally equivalent. I'd say Markdown but even that's more complicated than YAML, and Markdown is also "drooling on yourself while eating paste" easy.
