Perfection Kills

by kangax

Exploring Javascript by example

Tag is not an element. Or is it?

June 1st, 2010

It’s interesting how widely some misconceptions spread around. The one I noticed recently is the “issue” of elements vs. tags. The problem is that people say tags when they mean elements, and do it so often that it’s not clear if the distinction is still relevant.

Or if anyone even cares anymore.

Elements vs. tags

If you look at section 3 of HTML 4.01, “On SGML and HTML”, there’s an explicit note about elements not being tags. In HTML 4.01, <p>foo bar</p> is an element, not a tag. An element consists of a start tag, content, and an end tag. In the case of <p>foo bar</p>, <p> is the start tag, foo bar is the content, and </p> is the end tag.

In other words, elements consist of tags.
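
To make the three parts concrete, here is a minimal sketch in JavaScript. Note that splitElement is a hypothetical helper made up for this illustration; it only understands the markup of a single, non-nested element and is in no way a real HTML parser.

```javascript
// Hypothetical helper (not part of any library): splits the markup of ONE
// simple, non-nested element into its start tag, content, and end tag.
function splitElement(markup) {
  // 1: start tag, 2: element name, 3: content, 4: end tag (may be missing)
  var pattern = /^(<([a-z][a-z0-9]*)\b[^>]*>)([\s\S]*?)(<\/\2\s*>)?$/i;
  var match = pattern.exec(markup);
  if (!match) {
    return null; // the string does not start with a start tag
  }
  return {
    startTag: match[1],
    content: match[3],
    endTag: match[4] || null // null when there is no end tag in the markup
  };
}

var parts = splitElement('<p>foo bar</p>');
// parts.startTag is '<p>', parts.content is 'foo bar', parts.endTag is '</p>'
```

The whole string is the element; the tags are only the pieces at its edges.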

Optional tags

The distinction between tags and elements becomes slightly less clear once we start dealing with elements that have optional tags, as defined by HTML 4.01. For example, <p> or <td> elements don’t have to have end tags. They could very well exist without them. When a parser finds <p>foo bar in markup, it still creates an element. There’s no </p> end tag, but the parser doesn’t really need it; the <p> start tag already denotes what kind of element it is.


  <p>foo bar

  <tr>
    <td>baz
    <td>qux
  </tr>

But that’s not all.

Some elements, besides having optional end tags, have an empty content model, which means that they can’t have any content at all. And when an element is not allowed to have any content and has an optional tag, it’s called an empty element. Not only are end tags optional in such elements, but they must be completely omitted. These, unfortunately, are not some obscure elements, but very useful ones like <br>, <link>, <img>, <input>, <meta> and a few others.

What’s interesting is that <br> is still an element, only an element that consists of a start tag only. It’s just that its content and end tag must never be present. The fact that <br>, <img> or other empty elements consist of start tags only makes things rather confusing.

And we’re not even talking about elements with both tags optional — <html>, <head>, <body>. Those could exist without any visible traces at all, and are only created based on the context.


  <html>
    <!--
            There's no HEAD start tag, no HEAD end tag, and no HEAD content here.
            Yet, HEAD element is still created implicitly.
            This happens because content model of HTML element is defined as `head, body`,
            which means that both elements should be present in HTML element in that order.
            As soon as BODY start tag is found, even if HEAD tags are not present,
            HEAD element is created automatically.  -->
    <body>
    ...
    </body>
  </html>

Which confusion?

So which practical implications does this confusion actually have?

For one, saying something like “insert an image after a <p> tag” ranges from “wrong” to “ambiguous”, since we can’t insert anything but a chunk of text after a <p> tag, and a <p> tag can be either a start one (<p>) or an end one (</p>). In this case, a better way would be to say “insert an <img> tag after a start <p> tag”:


  <p>
    <img ...> <!-- IMG tag is inserted after a start P tag -->
    ...
  </p>

in which case the <img> element would become a child of the <p> element. Or we could say “insert an <img> tag after an end <p> tag”:


  <p>
    ...
  </p>
  <img ...> <!-- IMG tag is inserted after an end P tag -->

in which case the <img> element would be a sibling following the <p> one.

Of course, most of the time, what people really mean by “insert an image after a <p> tag” is the second version. It’s just that “element” is accidentally replaced with “tag”. An even better way, and one that avoids mentioning tags in the first place, is to say “insert an <img> element after a <p> element”. This version leaves no room for incorrect interpretation.
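
The two literal readings of “after a <p> tag” can be sketched on plain markup strings. This is only an illustration under stated assumptions: insertAfterTag is a hypothetical helper, the file name a.png is made up, and the plain string search would not find a start tag that carries attributes.

```javascript
// Hypothetical helper: inserts `snippet` right after the first occurrence
// of `tag` (a start tag such as '<p>' or an end tag such as '</p>').
// Plain string search; good enough for this tiny example only.
function insertAfterTag(markup, tag, snippet) {
  var index = markup.indexOf(tag);
  if (index === -1) {
    return markup; // tag not found, leave the markup untouched
  }
  var end = index + tag.length;
  return markup.slice(0, end) + snippet + markup.slice(end);
}

var markup = '<p>text</p>';
var img = '<img src="a.png" alt="">';

// After the START tag: the img element ends up as a child of the p element.
var asChild = insertAfterTag(markup, '<p>', img);
// asChild is '<p><img src="a.png" alt="">text</p>'

// After the END tag: the img element ends up as a sibling following p.
var asSibling = insertAfterTag(markup, '</p>', img);
// asSibling is '<p>text</p><img src="a.png" alt="">'
```

Same sentence, two different documents; saying “element” removes the need to guess.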

Global confusion

What’s interesting about all this is not so much the finer points of difference between tags and elements, but just how widely this misconception prevails. A Google search returns 480,000 results for “div tag”, but only 137,000 for “div element”. For an empty element, such as img, the difference is even scarier: “img tag” returns 959,000 results, while “img element” returns only 48,200. Elements are confused with tags everywhere, from blogs, articles, and mailing lists to books, references, and frameworks.

Pedantry or an important distinction?

Once you start thinking about the distinction, the edges become somewhat blurry. Are all of the examples above really wrong?

When describing “image_tag”, Ruby on Rails documentation says “Returns an html image tag …”. The returned string, “<img …>”, can actually very well be considered an image (start) tag. Yes, the string represents an element, but since the element is empty, it’s also a string that consists of an <img> tag only, and so can probably be called an “image” tag.

At the same time, “javascript_include_tag” already crosses the line of correctness. Its documentation still says “Returns an html script tag”, but the method returns a string that can only be considered an element, “<script type="text/javascript" src="…"></script>”, since there’s now a start tag, content (empty), and an end tag.

w3schools is just plain wrong [1], saying things like “The <div> tag defines a division or a section in an HTML document.” or “The <div> tag is often used to group block-elements to format them with styles.” Tags do not define a division; they represent elements, and it is elements that have a certain semantic meaning (in this case, division).

In some popular articles, we can find phrases like “… the nearer ancestor of our <footer> tag is the <body> tag …”, in which case it’s pretty clear that “tag” is not the right word at all; tags cannot be ancestors, but elements can.

However, saying that a “browser supports <video> tag” is technically not wrong, since browsers supporting the <video> element most definitely can parse and understand <video> tags as well (it is by recognizing video tags that they are able to create video elements in the DOM).

Speaking of DOM…

What about DOM?

Before I knew the difference between tags and elements, I would always think in terms of tags when talking about HTML, and in terms of elements when talking about the DOM. It just made sense that HTML, being a markup language, consists of tags, while the HTML DOM (or rather, the document available for scripting) is a tree-like structure consisting of elements and other kinds of nodes. I knew that the browser parses HTML markup (and so tags), and then creates a tree-like structure to represent a document, in which case tags essentially become elements. The fact that elements are not just kinds of nodes, but are also chunks of text in markup seemed very strange when I first found out about it.

It seems that this is exactly how most people think about tags vs. elements: tags exist in HTML (text), and elements in the document (DOM). This would explain why tags prevail in discussions about HTML, or markup in general, and why elements are mostly mentioned in the context of scripting, rendering, etc.

Nevertheless, I believe that keeping terminology straight is important. Things should be called what they really are, to avoid the ambiguity that we’ve seen in the previous examples. A method named something like forEachTag should not iterate over each element, and vice versa; technical discussions, articles, and documentation should really strive to use proper terms.
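
As a rough sketch of what a method honestly named forEachTag could do, here is a version that walks over tags in a markup string rather than over elements. The implementation is an assumption made for illustration: it relies on a simple regular expression, so comments and attribute values containing “>” are not handled.

```javascript
// Hypothetical forEachTag: calls `callback` once for every start tag or
// end tag found in a markup string, in source order.
function forEachTag(markup, callback) {
  var tagPattern = /<\/?[a-z][^>]*>/gi;
  var match;
  while ((match = tagPattern.exec(markup)) !== null) {
    callback(match[0]);
  }
}

var tags = [];
forEachTag('<p>foo<br>bar</p>', function (tag) {
  tags.push(tag);
});
// tags is ['<p>', '<br>', '</p>']: three tags, yet only two elements (p and br)
```

A forEachElement over the same markup would be expected to visit two things, not three, which is exactly why the name matters.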

What now?

Attempts at demystification have already been made in the past, yet the effect is barely visible. So I wonder: why? Is it too unintuitive to speak in terms of elements in the context of HTML, or is this a lack of explanation and exposure of the subject? Does the distinction even matter? Or does it matter in technical discussions only? Does it make sense to distinguish these two entities, or should we just try to infer the exact meaning based on the context, as seems to be done right now? Are we all simply used to the word “tag”, and don’t care about the difference most of the time?

What do you think?

[1] …which is not surprising, considering the number of other misconceptions on that site, such as classifying HTML comments as tags.

Categories: don'ts, html

Comments (21)

  1. Mathias Bynens said:

    So I wonder — why? Is it too unintuitive to speak in terms of elements in context of HTML, or is this a lack of explanation and exposure of the subject?

    People misusing the terms ‘tag’ and ‘element’ just haven’t read this article yet. (Or, you know, those other two you linked to from back in 2004.)

    I agree correct nomenclature is important — I even like to be corrected whenever I make a mistake — so it’s definitely a good thing you’re bringing it to people’s attention once again.

  2. Johan Arensman said:

    Very clear article! I’m building a reference website and took a look at w3schools etc. for the correct naming of items. Looking back I think I’ve made the right choice but that’s more luck than wisdom in my case. I’ll definitely keep this in mind when creating functions and variables in the future.

  3. Mariusz Nowak said:

    I also understand element as object in DOM tree, and tag as a string in html/xml document.
    That’s why I’d rather say “element is described by tags” instead of “element consist of tags”.

    RoR documentation talks about output strings (their HTML processing is not object oriented) that’s why they refer to tags.

    I think we should speak “tags” only when talking about string output, and “elements” when thinking objects.
    Generally, using “tags” for all cases looks unprofessional to me. It reminds me of the old HTML days, back when JavaScript was the world’s most misunderstood language ;-)

  4. Thomas Broyer said:

    I wholeheartedly agree with Mariusz!

    Element is a notion of the Infoset (HTML4 was written before the Infoset was formalized, but it’s undoubtedly implicit that tags are parsed into a tree) while tag is related to how the Infoset is represented in HTML or XML (those being only *serializations* of an Infoset, i.e. a “bag of characters”).

    In XML, an element is always represented with a pair of start and end tags, or a single start-end tag; while in HTML (and SGML), some tags may be omitted (in some cases, such as a table’s tbody, both the start and end tags can be omitted, but the element still is present, just not visible when serialized to HTML).

    Hopefully, HTML5 makes it clearer (now that the Infoset and DOM have been formalized).

    (note that although HTML5 defines how to parse “tag soup”, i.e. things like <b>bold <i>italicized</b> text</i>, they in the end build a tree; this is only “error recovery”, this construct cannot come from serialization of an Infoset)

  5. bobince said:

    It seems that this is exactly how most of the people think about tags vs. elements. Tags exist in HTML (text), and elements – in document (DOM).

    I wish it were that harmless! A lot of authors seem to have the idea that the HTML is the live version of the document, and that scripted changes directly alter the HTML. This leads to over-use of `innerHTML` rather than direct DOM properties/methods, and horrors like:

        element.innerHTML+= '<table><tr><td>';
        element.innerHTML+= 'Hello';
        element.innerHTML+= '</td></tr></table>';
        // oh no! why is Hello outside the table?
        // and why did my event handlers suddenly stop working?

    considering the amount of other misconceptions on that site, such as classifying HTML comments as tags

    And much worse stuff like the beginner-level misunderstandings of text handling, and scripting tutorials full of security holes.

    w3schools is a disaster area; it always pains me to see people linking to it as definitive. Anyone got a less consistently-wrong general-webdev resource I can redirect people to?

  6. Paul Irish said:

    > Anyone got a less consistently-wrong general-webdev resource I can redirect people to?

    Bobince, the MDC is quickly becoming the de facto resource among my circles: https://developer.mozilla.org/en/HTML/Element

    It’s especially nice that anyone can edit the pages.

  7. "Cowboy" Ben Alman said:

    I wonder if you could say that, in an HTML (or XML or XHTML or *ML) context, elements or nodes are delimited by tags, much like in JavaScript, strings literals are delimited by single- or double-quotes. Maybe that’s pushing it, since the tags are more than just delimiters.. but they are still this tangible structure, used only in definition-of-an-element-as-*ML.

    Still, I agree wholeheartedly that people should use correct terminology and best practices when presenting code or information, especially when used in a learning context, as minds absorbing information aren’t just taking in the direct subject of the article in question, but also the manner in which it is presented, along with whatever supporting language or code is used.

    Of course, I’ve considered this kind of thing before, because it really is a big deal. With the ever-growing “database” of not-necessarily-accurate that is the internet, we, the “professionals,” really need to take it upon ourselves to help people actually understand how things work.

  8. Andrew Vit said:

    About using the word “tag” in method names: what if you try reading it as a verb, as in “please tag this content” (“image-tag this url”)? You could substitute “mark up” as a synonymous verb, and method names like “image_markup” would be no more or less correct. It’s all just shorthand.

    Method names are often chosen to describe the process, not the result. It’s the addition of tags (process) that implicitly creates the elements (result).

  9. Joel said:

    Glad someone else feels the same!

    As you’ve eloquently stated: the tag refers to the characters, the element to the meaning that is wrapped around it.

    Hence there are HTML Elements (as Paul Irish linked to with the MDC page), and they are implemented using tags – either a single self-closing tag, such as to create an image element as mentioned, or with a separate closing tag, such as a paragraph.

  10. html said:

    tags and elements are different, sure!

  11. Bobby Jack said:

    Nice article! Once we’re done with this one, can we move on to getting people not to say “alt tags” ;-)

  12. Animal said:

    I agree with the article. Element and tag mean completely different things. They exist in wholly different contexts.

    A tag is a textual string that exists in textual HTML (or XML) source code.

    Tags do not exist in a document. Only elements (and other Nodes)

    It’s the same issue people have with “JSON object” in javascript which I’ve seen a lot. JSON is a string. Tags are strings.

  13. Dave Doolin said:

    I’m really happy to see someone else carrying this torch. Personally, I prefer precision whenever possible.

    But, as you note, people concerned about this are in the minority. Google results prove it.

    I’ve made my peace with it. Language changes, evolves, and the real meaning of a word in any social sense that matters is the meaning ascribed, implicit or not, by the group (or really, by whoever sets the discourse).

  14. Troy III said:

    1.
    cite:
    <p>foo bar</p> is an element, not a tag. An element consists of a start tag, content, and an end tag. In case of <p>foo bar</p>, <p> is a start tag, foo bar is content, and </p> is an end tag.

    In other words, elements consist of tags.

    Perhaps in xhtml!
    In HTML, <P> is the Tag of Element P
    P is still an element regardless of a closing tag or content. <DIV> is the Tag of DIV Element regardless of its content or a closing tag. A higher level block element will terminate/close it automatically, although there is only one element strong enough to do that.

    2.
    cite:
    “For one, saying something like “insert an image after a <p> tag” is ranging from “wrong” to “ambiguous” “.

    You can’t insert an inline element <img> [equivalent of a character] “after” a line-break element <P> – but inside of it. The P element expands indefinitely, There is no After until you break it with another block level element. Or arbitrarily close it with its optional end-mark </P>.

    3.

    cite:
    At the same time, “javascript_include_tag” already crosses the line of correctness. It still uses “Returns an html script tag”, but already returns a string that can only be considered an element, “<script type="text/javascript" src="…"></script>”, since there’s now a start tag, content (empty), and an end tag.


    See the point made under 1.
    In addition to it, – returned is a Tag not an Element; plus it is a string, – meaning it’s an HTML code containing a Script [markup] Tag. And it cannot be considered something that it isn’t.

    4.
    cite:
    “However, saying that “browser supports <video> tag” is technically not wrong, since browsers supporting <video> element, most definitely can parse and understand <video> tags as well (it is by recognizing video tags that they are able to create video elements in DOM).”

    In fact, noting that the “browser supports <video> tag” is plain wrong!
    Any browser will support every Tag possible including <mytag>, <anytag> etc. The question is: will it support the video Element?
    -Perhaps ‘recognizing’ is more appropriate word in this case.

  15. Gunnar Bittersmann said:

    Full ACK. I can’t count how many times I have already pointed to Jens Meiert’s article saying the same in plain German.

    BTW: How many accessabilistas does it take to fix a missing alt tag? Ten. One to fix it, nine to complain that they’re called alt attributes. ;-)

    As for w3schools.com, I wish somebody would finally capture the site and redirect to w3fools.com.

  16. James said:

    Hi,
    Great blog – would you be able to send me an email so I can discuss the possibility of you joining the DZone MVB program. See here for more details.

    Regards
    James

  17. Alistair B said:

    I agree with Mariusz. I don’t think <p>asdf</p> is an element, it becomes an element when it is loaded into a DOM tree. I think it is correct to talk about the ‘div tag’ or ‘using an img tag’, because the tags are the tools we use to get the browser (or whatever) to create elements. The contents of a p element are not part of the tool (the tag) itself, but are how we use it. Perhaps you could even say that there are 2 meanings for tag. div tag can be the tool, or div tag can be the actual text in an xml document.

    ‘The fact that elements are not just kinds of nodes, but are also chunks of text in markup seemed very strange when I first found out about it.’ Is there a reference that supports this? As, as you say, most developers don’t seem to talk in this way.

  18. kangax (article author) said:

    @Alistair B
    The reference is linked to from the first paragraph. In HTML4.01 it’s in this section. In HTML5, it’s in this one.

  19. Alistair B said:

    Thank you

    The HTML4 reference does appear to be in line with my thinking referring to ‘element types’ eg. ‘Each element type declaration generally describes three parts: a start tag, content, and an end tag.’ (Note: it doesn’t say the ‘element’ but the ‘element type declaration’ suggesting that this by itself is not an element until the declaration has been processed and loaded into the DOM).

    The HTML5 reference seems to have changed its terminology a bit and makes no mention of ‘element type declaration’. It also talks about using elements eg. ‘Contexts in which this element can be used’ (so I am clearly wrong there). Still, it doesn’t really explicitly state that elements are also chunks of text in markup.

Trackbacks

  1. What the Heck is an HTML TITLE element? | Website In A Weekend said:

    [...] 12/12/2010: Perfection Kills has a great article on the difference between HTML tags and elements.] Would you like more? Send me a [...]

  2. CSS Vocabulary said:

    [...] is similar confusion with HTML terms. My hope is this makes it easier for everyone to communicate exactly what they mean when taking [...]
