It’s been two decades since Extensible Markup Language (XML), a markup language for encoding documents, made its debut. While XML has enjoyed considerable success during that time, many programmers (myself included) dislike it.
It’s useful to keep in mind the difference between a programming language (like Python) and a markup language such as XML, which is used to describe data and to define data structures and static user interfaces. XML is similar in some ways to HTML, although it’s more powerful because it can add context to data. If you’re building a web page, and you need to define elements such as buttons and images, you’re going to need to know your way around XML.
XML was designed to be backward compatible with the Standard Generalized Markup Language (SGML), which makes it a lot heavier than necessary for just exchanging structured data between programs. Although designed to be both human- and machine-readable, XML ends up more readable to software than to humans. Even worse, it bulks up data to an excessive degree. (It’s also fiddly to edit.)
Despite its flaws, XML remains commonplace in many businesses. In my last job maintaining oil-trading software, traders entered trades into a client application. The trade data was sent to the server as XML, where it was converted into SQL and run against a database. Some trades carried a large amount of data, so sending them as XML made for a lot of traffic. It worked, but I always felt it could have been quicker had a more compact transmission format been used instead.
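To get a rough feel for that overhead, the sketch below serializes one made-up trade record as XML and as compact JSON using only Python’s standard library, then compares the byte counts. The field names and values are hypothetical (they are not taken from the trading system described above), and the exact ratio will vary with the data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical trade record; field names and values are made up for illustration.
trade = {
    "id": "T-1001",
    "commodity": "Brent",
    "volume_bbl": "50000",
    "price_usd": "61.25",
}


def to_xml(record: dict) -> bytes:
    """Serialize the record as a flat <trade> element with one child per field."""
    root = ET.Element("trade")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode").encode("utf-8")


def to_json(record: dict) -> bytes:
    """Serialize the record as compact JSON (no optional whitespace)."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


xml_bytes = to_xml(trade)
json_bytes = to_json(trade)
print(len(xml_bytes), "bytes as XML")
print(len(json_bytes), "bytes as JSON")
```

XML repeats every field name in an opening and a closing tag, which is where most of the extra bytes come from in a record like this one.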
Given its pervasiveness, XML will probably be around for some time to come. For instance, it’s used a lot in .NET, in config and other files. The problem of handling structured data is not a new one; this wheel has been reinvented many times. The most obvious and well-known XML alternative is probably JSON; here are some others you might not have heard of:
YAML (YAML Ain’t Markup Language)
YAML is a data serialization language with an emphasis on human readability; it’s better at that than JSON, although parsing still requires some effort. In any case, YAML is simpler than XML. Here’s part of an example:
    --- !<tag:clarkevans.com,2002:invoice>
    invoice: 34843
    date   : 2001-01-23
    bill-to: &id001
        given  : Chris
        family : Dumars
        address:
            lines: |
                458 Walkman Dr.
                Suite #292
            city    : Royal Oak
            state   : MI
            postal  : 48046
    ship-to: *id001

There are YAML libraries for C/C++, Ruby, Python, Java, Perl, C#, Go, PHP, OCaml, JavaScript, and others.
Protocol Buffers
Created by Google, Protocol Buffers are, in the company’s words, “XML, but smaller, faster, and simpler.”
Protocol Buffers take a different approach: you write a specification for the data to be serialized in a .proto file. On the overview page, there’s a simple example of XML:
    <person>
      <name>John Doe</name>
      <email>jdoe@example.com</email>
    </person>
And then the Protocol Buffer equivalent:
    person {
      name: "John Doe"
      email: "jdoe@example.com"
    }
That snippet is the human-readable text form of a message, not the .proto specification itself. When you compile the .proto file with Google’s protocol buffer compiler, it generates code for your language. Protocol Buffers currently support generated code in Java, Python, Objective-C, and C++; with the new proto3 language version, they also work with Go, JavaNano, Ruby, and C#.
Encoded in the binary format, the Protocol Buffer message would probably be 28 bytes long and take around 100-200 nanoseconds to parse, compared to 69 bytes for the XML version (with whitespace removed) and 5,000-10,000 nanoseconds to parse.
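The two byte counts can be checked by hand. The sketch below is not Google’s library; it is a minimal hand-rolled encoder for string fields in the Protocol Buffers binary wire format, written only to show where the sizes come from. It assumes the message has `name` as field 1 and `email` as field 2 (the field numbers are my assumption), and that the XML is a `<person>` element with `<name>` and `<email>` children.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag, payload length, then UTF-8 payload."""
    length_delimited = 2  # wire type used for strings
    payload = text.encode("utf-8")
    tag = (field_number << 3) | length_delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Assumed field numbers: name = 1, email = 2.
message = (
    encode_string_field(1, "John Doe")
    + encode_string_field(2, "jdoe@example.com")
)

# The XML example with all whitespace between tags removed.
xml = "<person><name>John Doe</name><email>jdoe@example.com</email></person>"

print(len(message))              # 28
print(len(xml.encode("utf-8")))  # 69
```

Each string costs only two bytes of overhead here (one tag byte and one length byte), whereas the XML spells out every element name twice.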
AXON
More akin to JSON, AXON aims to combine the best of JSON, XML and YAML. The library is written for Python, with source and usage examples on GitHub. Like YAML, it uses indentation to distinguish hierarchical levels:
Statement form:

    axon
      name: "AXON is eXtended Object Notation"
      short_name: "AXON"
      python_library: "pyaxon"
      atomic_values
        int: [0 -1 17]
        float: [3.1428 1.5e-17]
        decimal: [10D 1000.35D -1.25E+6D]
        bool: [true false]
        string: "abc ??? ???"
        multiline_string: "one
ConfigObj
Though not updated since 2014, the Python library ConfigObj is handy for creating and reading configuration files. There’s in-depth documentation on Read the Docs.
It produces files like the following; this is a key-value system, combined with nested sections for hierarchy levels:
    # initial comment
    keyword1 = value1
    keyword2 = value2

    [section 1]
    keyword1 = value1
    keyword2 = value2

        [[sub-section]]
        # this is in section 1
        keyword1 = value1
        keyword2 = value2
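To make the nesting rule concrete, here is a small standard-library sketch that reads a file in this style into nested dictionaries, using the number of square brackets around a section name as its depth. It is not ConfigObj’s own parser and skips most of what the real library does (inline comments, quoting, lists, validation); the function name `parse_nested_ini` is made up for this example.

```python
def parse_nested_ini(text: str) -> dict:
    """Parse key = value lines, using the bracket count as nesting depth."""
    root: dict = {}
    stack = [root]  # stack[d] is the section currently open at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("["):
            depth = len(line) - len(line.lstrip("["))
            name = line.strip("[]").strip()
            if depth > len(stack):
                raise ValueError(f"section nested too deeply: {raw!r}")
            del stack[depth:]          # close deeper and sibling sections
            section: dict = {}
            stack[-1][name] = section  # attach to the parent section
            stack.append(section)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


sample = """\
# initial comment
keyword1 = value1
keyword2 = value2

[section 1]
keyword1 = value1
keyword2 = value2

    [[sub-section]]
    # this is in section 1
    keyword1 = value1
    keyword2 = value2
"""

config = parse_nested_ini(sample)
print(config["section 1"]["sub-section"]["keyword2"])  # value2
```

Note that the indentation in the sample is cosmetic in this sketch: only the bracket count decides where a section belongs.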
OGDL
Short for Ordered Graph Data Language, OGDL is another format that represents trees or graphs of text, using indentation for hierarchy. OGDL is simple and clean. Here’s an example:

    eth0
      ip 192.168.1.1
      gateway 192.168.1.10
      mask 255.255.255.0
      timeout 20
There are implementations for C, Go, Java and Perl. In Go, you would read it with code like this:
    g := ogdl.FromFile("config.g")
    ip := g.Get("eth0.ip").String()
    to := g.Get("eth0.timeout").Int64(60)
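There is no Python implementation in that list, but the idea behind a dotted-path lookup such as "eth0.ip" is easy to sketch with the standard library. The code below handles only a tiny subset of OGDL (plain words, indentation with spaces, no quoting, commas or parentheses), and the names `parse_ogdl` and `get` are made up for this example. Attaching deeper-indented lines below the last word of the parent line is a choice made for this sketch, not something checked against the OGDL specification.

```python
def parse_ogdl(text: str) -> dict:
    """Parse a small OGDL subset into nested dicts (node name -> children)."""
    root: dict = {}
    stack = [(-1, root)]  # (indentation, children dict) of the open ancestors
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        while stack[-1][0] >= indent:
            stack.pop()  # leave nodes that are not ancestors of this line
        children = stack[-1][1]
        for word in raw.split():
            # Each word on a line becomes a child of the word before it.
            children = children.setdefault(word, {})
        stack.append((indent, children))
    return root


def get(tree: dict, path: str, default=None):
    """Follow a dotted path and return the name of the first child found there."""
    node = tree
    for part in path.split("."):
        if part not in node:
            return default
        node = node[part]
    return next(iter(node), default)


text = """\
eth0
  ip 192.168.1.1
  gateway 192.168.1.10
  mask 255.255.255.0
  timeout 20
"""

config = parse_ogdl(text)
print(get(config, "eth0.ip"))            # 192.168.1.1
print(int(get(config, "eth0.timeout")))  # 20
print(get(config, "eth0.mtu", "1500"))   # 1500 (falls back to the default)
```

Values come back as strings, so numeric settings such as the timeout need an explicit conversion.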
Further XML Exploration
During the research for this article, I found a link to a now-defunct webpage (still available in the Wayback Machine) that links to a page with 26 XML alternatives, including some I’ve covered here. A few links will have rotted, but it might still be worth a look.
If you’re an Android developer, check out this extensive Android Authority walkthrough of how XML can work for you; it covers everything from syntax to the language’s use outside of layout files. It’s also worth noting how Google regards sitemap-related XML. Although alternatives exist, it’s always good to know the fundamentals of XML itself, given its pervasive use. (Also, make sure to check out our XML vs. JSON Comparison.)
You missed HJSON (hjson.org).
YAML == YAML Ain’t Markup Language…. it literally says so on that URL you linked……..
There is also ELDF (eldf.org).