5 XML Alternatives to Consider in 2018

It’s been two decades since Extensible Markup Language (XML), a markup language for encoding documents, made its debut. While XML has enjoyed considerable success during that time, many programmers (me included) dislike it.

XML was designed to be backwards compatible with the Standard Generalized Markup Language (SGML), which makes it a lot heavier than necessary for simply exchanging structured data between programs. Although meant to be both human- and machine-readable, XML ends up more readable to software than to humans. Even worse, it bulks up data to an excessive degree. (It’s also fiddly to edit.)
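To get a rough feel for that overhead, the sketch below serializes the same record as XML and as compact JSON using only Python’s standard library, then compares the byte counts. The trade record and its field names are invented for illustration; real payloads will differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names and values are made up for this sketch.
trade = {
    "id": "T-1001",
    "commodity": "Brent",
    "volume_bbl": "50000",
    "price_usd": "68.25",
}


def to_xml_bytes(record: dict) -> bytes:
    """Serialize a flat dict as <trade><key>value</key>...</trade>."""
    root = ET.Element("trade")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root)  # bytes, no added whitespace


def to_json_bytes(record: dict) -> bytes:
    """Serialize the same dict as JSON with no optional whitespace."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


xml_bytes = to_xml_bytes(trade)
json_bytes = to_json_bytes(trade)
print(f"XML:  {len(xml_bytes)} bytes")
print(f"JSON: {len(json_bytes)} bytes")
```

Every XML element repeats its name in the closing tag, so the gap tends to widen as field names get longer.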

Despite its flaws, XML remains commonplace in many businesses. In my last job maintaining oil-trading software, traders entered trades into a client application. The trade data was sent to the server as XML, where it was converted into SQL and run against a database. Some trades carried a large amount of data, so sending them as XML made for a lot of traffic. It worked, but I always felt it could have been quicker had a more compact transmission format been used instead.

Given its pervasiveness, XML will probably be around for some time to come. For instance, it’s used a lot in .NET, in config and other files. The problem of handling structured data is not a new one; this wheel has been reinvented many times. The most obvious and well-known XML alternative is probably JSON; here are some others you might not have heard of:

YAML (YAML Ain’t Markup Language)

YAML is a data serialization language with an emphasis on human readability; it’s better at that than JSON, although parsing still requires some effort. In any case, YAML is simpler than XML. Here’s part of an example:

--- !<tag:clarkevans.com,2002:invoice>
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
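Two features in this snippet are easy to miss: `&id001` places an anchor on the bill-to mapping and `*id001` is an alias that reuses it, while the `|` after `lines` starts a literal block that keeps its line breaks. As a rough sketch, written by hand in plain Python rather than produced by any particular YAML library, the loaded data would look something like this:

```python
# Hand-written approximation of the structure a YAML loader would return
# for the invoice above; this is not the output of a real parser.
bill_to = {
    "given": "Chris",
    "family": "Dumars",
    "address": {
        # The literal block scalar (|) preserves the line breaks.
        "lines": "458 Walkman Dr.\nSuite #292\n",
        "city": "Royal Oak",
        "state": "MI",
        "postal": 48046,
    },
}

invoice = {
    "invoice": 34843,
    "date": "2001-01-23",  # many loaders would turn this into a date object
    "bill-to": bill_to,
    "ship-to": bill_to,    # the alias *id001 points at the anchored mapping
}

# The alias does not copy the data: both keys refer to the same object.
print(invoice["ship-to"] is invoice["bill-to"])  # True
```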

There are YAML libraries for C/C++, Ruby, Python, Java, Perl, C#, Golang, PHP, OCaml, JavaScript, and others.

Protocol Buffers

Created by Google, protocol buffers are (in the company’s words) “XML, but smaller, faster, and simpler.”

Protocol buffers take a different approach from hand-edited text files. You describe the structure of the data to be serialized in a .proto file. On the overview page, there’s a simple example of XML:

  <person>
    <name>John Doe</name>
    <email>jdoe@example.com</email>
  </person>

And then the equivalent protocol buffer message, shown in its human-readable text format:

person {
  name: "John Doe"
  email: "jdoe@example.com"
}

When you compile the .proto specification with Google’s protocol buffer compiler, it generates code for your language. Protocol buffers currently support generated code in Java, Python, Objective-C, and C++; with the new proto3 language version, they also work with Go, JavaNano, Ruby, and C#.

In its binary encoding, the protocol buffer message would probably be 28 bytes long and take around 100-200 nanoseconds to parse, compared to 69 bytes for the XML version (with whitespace removed) and 5,000-10,000 nanoseconds to parse.
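The size figures can be checked by hand. The sketch below does not use the protobuf library; it encodes two string fields directly, following the documented wire format (a tag, a length, then the UTF-8 bytes). The field numbers 1 and 3 are an assumption, since the article does not show the .proto file; any field number below 16 yields the same one-byte tag.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one length-delimited (wire type 2) string field."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Assumed field numbers: name = 1, email = 3.
message = (
    encode_string_field(1, "John Doe")
    + encode_string_field(3, "jdoe@example.com")
)

xml = "<person><name>John Doe</name><email>jdoe@example.com</email></person>"

print(len(message), len(xml.encode("utf-8")))  # 28 69
```

Each field costs only two bytes of overhead here, whereas the XML version spends most of its length on tag names.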

AXON

AXON aims to combine the best of JSON, XML, and YAML, though it is most akin to JSON. The implementation is a Python library, with source and usage examples on GitHub. Like YAML, its statement form uses indentation to distinguish hierarchical levels:
    axon
      name: "AXON is eXtended Object Notation"
      short_name: "AXON"
      python_library: "pyaxon"
      atomic_values
        int: [0 -1 17]
        float: [3.1428 1.5e-17]
        decimal: [10D 1000.35D -1.25E+6D]
        bool: [true false]
        string: "abc ??? ???"
        multiline_string: "one

ConfigObj

Though not updated since 2014, the Python library ConfigObj is handy for creating and reading configuration files. There’s in-depth documentation on Read the Docs.

It produces files like the following. This is a key-value system in which the number of square brackets around a section name sets its nesting level; the indentation is optional and only aids readability:

# initial comment
keyword1 = value1
keyword2 = value2

[section 1]
keyword1 = value1
keyword2 = value2

    [[sub-section]]
    # this is in section 1
    keyword1 = value1
    keyword2 = value2
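To make the nesting rule concrete, here is a minimal sketch of a parser for just the subset shown above. It is not ConfigObj’s implementation, and `parse_nested_ini` is a name invented for this example. The real library also handles lists, quoting, validation, and writing files back out.

```python
SAMPLE = """\
# initial comment
keyword1 = value1
keyword2 = value2

[section 1]
keyword1 = value1
keyword2 = value2

    [[sub-section]]
    # this is in section 1
    keyword1 = value1
    keyword2 = value2
"""


def parse_nested_ini(text: str) -> dict:
    """Parse key = value lines grouped under [bracketed] section headers."""
    root: dict = {}
    stack = [root]  # stack[d] is the dict for the open section at depth d
    for raw in text.splitlines():
        line = raw.strip()  # indentation carries no meaning here
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            depth = len(line) - len(line.lstrip("["))  # count leading brackets
            if depth > len(stack):
                raise ValueError(f"section nested too deeply: {line!r}")
            name = line.strip("[]").strip()
            del stack[depth:]  # close any sections at this depth or deeper
            section: dict = {}
            stack[-1][name] = section
            stack.append(section)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


config = parse_nested_ini(SAMPLE)
print(config["section 1"]["sub-section"]["keyword1"])  # value1
```

With ConfigObj itself, the parsed file is likewise accessed like a nested dictionary.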

OGDL

Short for Ordered Graph Data Language, OGDL is another text format that describes trees or graphs, using indentation to express nesting. OGDL is simple and clean. Here’s an example:

eth0
  ip
    192.168.1.1
  gateway
    192.168.1.10
  mask
    255.255.255.0
  timeout
    20

There are implementations for C, Go, Java and Perl. In Go, you would read it with code like this:

g := ogdl.FromFile("config.g")
ip := g.Get("eth0.ip").String()
to := g.Get("eth0.timeout").Int64(60)
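For readers without a Go toolchain, the same idea can be sketched in Python. This is an illustration of indentation-based tree parsing, not an OGDL implementation: it ignores quoting, commas, parentheses, and the rest of the specification, and the helper names `parse_indented_tree` and `get` are invented for this example.

```python
SAMPLE = """\
eth0
  ip
    192.168.1.1
  gateway
    192.168.1.10
  mask
    255.255.255.0
  timeout
    20
"""


def parse_indented_tree(text: str) -> dict:
    """Turn space-indented lines into nested dicts keyed by line text."""
    root: dict = {}
    stack = [(-1, root)]  # (indent width, children dict) of open ancestors
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # Pop back to the nearest ancestor that is indented less than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        node: dict = {}
        stack[-1][1][raw.strip()] = node
        stack.append((indent, node))
    return root


def get(tree: dict, path: str, default=None):
    """Follow a dotted path and return the first child of the final node."""
    node = tree
    for part in path.split("."):
        if part not in node:
            return default
        node = node[part]
    return next(iter(node), default)


tree = parse_indented_tree(SAMPLE)
print(get(tree, "eth0.ip"))                # 192.168.1.1
print(int(get(tree, "eth0.timeout", 60)))  # 20
```

As in the Go snippet, the second argument to `get` is a fallback used when the path is missing.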

Further Exploration

While researching this article, I found a link to a now-defunct webpage (still available in the Wayback Machine) that links to a page with 26 XML alternatives, including some I’ve covered here. A few links will have rotted, but it might still be worth a look.
