Attribute for arbitrary data

Post by Cerbera » Mon May 14, 2007 6:36 am

The Problem
I'm seeking a long-term solution to the problems outlined in Web Standards Project: hAccessibility, namely human-unfriendly text being placed in title attributes because HTML4 has no attribute for arbitrary data.

Background
The Microformats community are producing ways to interlace web content with small amounts of machine-readable information. But there are problems:
  • HTML4 has many attributes for machine-readable information (such as href and cite) but most elements are not allowed to carry them.
  • A suitable attribute does not exist for some types of data (e.g. digital timestamps and geographic locations).
Until now, Microformats have tended to put arbitrary data in a title attribute.

title in HTML4
However, the title attribute is explicitly defined as being for human-readable information in HTML4:
W3C wrote:
title = text [CS]
This attribute offers advisory information about the element for which it is set.

(Source: HTML4: 7.4.3 The title attribute.)
The "text" link points to this:
W3C wrote:
6.3 Text strings

A number of attributes ( %Text; in the DTD) take text that is meant to be "human readable".

(Source: HTML4: 6.3 Text Strings.)
So it seems arbitrary data is not currently permitted in title attributes. Given the problems it can pose to users (as outlined by the Web Standards Project) this seems like a useful constraint.

Meta Data in HTML4
HTML4 does define a mechanism for supplying arbitrary data. The <meta> element:
W3C wrote:
Code:
<!ELEMENT META - O EMPTY               -- generic metainformation -->
<!ATTLIST META
  %i18n;                               -- lang, dir, for use with content --
  http-equiv  NAME           #IMPLIED  -- HTTP response header name  --
  name        NAME           #IMPLIED  -- metainformation name --
  content     CDATA          #REQUIRED -- associated information --
  scheme      CDATA          #IMPLIED  -- select form of content --
  >

(Source: HTML4: The META element.)
However, <meta> cannot be interlaced with content, and interlacing data with content is the purpose of Microformats.

Proposed Solution
I propose creating an attribute for arbitrary data:
  • Values for this attribute MAY be convenient for humans to read but this is NOT REQUIRED.
  • It MAY be presented to users but this is NOT REQUIRED (like how href is presented in GUI status bars whilst cite is not).
  • The attribute could logically be called content as its purpose is similar to <meta content> in HTML4. Other names might be better.
  • The attribute would be allowed on lots of elements, perhaps any element permitted in <body>?
This would provide Microformat authors with a place to put arbitrary data which doesn't fit in existing allowed attributes whilst still using the nearest semantically correct HTML element.

Allowing the scheme attribute on the same elements may also be useful. It would carry the Microformat class value, preventing class-based selection (e.g. via CSS or DOM) in cases where that is undesirable. This purpose is similar to <meta scheme> in HTML4.


Solution Examples

Digital Timestamps
Code:
The party is at
<abbr class="dtstart" title="20051010T10:10:10-0100">10 o'clock on the 10th</abbr>.
Would become:
Code:
The party is at
<time class="dtstart" datetime="20051010T10:10:10-0100">10 o'clock on the 10th</time>.
Or:
Code:
The party is at
<time scheme="dtstart" datetime="20051010T10:10:10-0100">10 o'clock on the 10th</time>.
("10 o'clock on the 10th" is not literally an abbreviation, so <abbr> seemed inappropriate. HTML5's <time datetime> element seems like a better match than HTML4 offers.)
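For comparison, here is a hedged sketch of the same markup with the timestamp written in extended ISO 8601 form (hyphens in the date, a colon in the offset), which is the notation the HTML datetime attribute accepts as later standardized. The instant is the same as in the examples above; only the notation is assumed to differ.

```html
<!-- Same instant as above, in extended form: 2005-10-10 at 10:10:10, offset -01:00 -->
The party is at
<time class="dtstart" datetime="2005-10-10T10:10:10-01:00">10 o'clock on the 10th</time>.
```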

Digital Geographic Locations
Code:
<abbr class="geo" title="30.300474;-97.747247">Austin, Texas</abbr>
Would become:
Code:
<span class="geo" content="30.300474;-97.747247">Austin, Texas</span>
Or:
Code:
<span scheme="geo" content="30.300474;-97.747247">Austin, Texas</span>
("Austin, Texas" is not literally an abbreviation, so <abbr> seemed inappropriate.)
Cerbera
<h4>
 
Posts: 34
Joined: Wed Feb 21, 2007 1:04 pm

Post by zcorpan » Mon May 14, 2007 11:36 am

(Also see the thread starting with http://lists.whatwg.org/pipermail/whatw ... 10800.html )
zcorpan
<article>
 
Posts: 807
Joined: Tue Feb 06, 2007 8:29 pm
Location: Sweden

Post by Cerbera » Tue May 15, 2007 12:56 am

The x_ prefix seems like a good idea for naming. It's similar to experimental MIME types, language codes and vendor-specific CSS.

Post by JoeGermuska » Sun Jun 17, 2007 12:09 pm

This is a general problem in which I'm interested.

I would argue for an attribute name other than "content", because the content of an element is what is between the tags. I assume "content" was suggested by analogy with the <meta> element, but I don't think that is reason enough to use it.

Perhaps "meta"?

I'm also not sure how I feel about the "scheme" attribute. It seems limiting, especially if you had more than one kind of meta information. If you had a single attribute with a syntax like "IDREFS", which could contain any number of whitespace-separated metadata points, you wouldn't have to worry about coordinating between the two attributes.


For example, you might want a unique identifier for the place (in this case, from the Getty Thesaurus of Geographic Names) but also want to provide a geocode:

Code:
<span meta="geo:30.300474;-97.747247 getty-id:7013346">Austin, Texas</span>


This idea that authors would add disambiguating unique IDs to elements in documents is the real use case I had in mind when I put a "watch" on this thread. Of course, the problem of managing and interpreting those ids is a thing of its own (and not really an HTML5 problem), but if there were a place to put them, it would be a good starting point.

I'm not sure if I think the x_ style is appropriate to HTML5. It doesn't quite feel like a good fit. But I haven't spent much time thinking about it yet...
JoeGermuska
<h6>
 
Posts: 1
Joined: Wed May 23, 2007 7:07 pm
Location: Chicago, IL

Post by jheacock » Fri Mar 07, 2008 5:01 pm

Why not relax the specification of the existing <meta /> element to allow it to be used as an inline (display: none) element in the body of the document?
Note that the 'http-equiv' attribute would have no meaning in this context.

Current browsers could continue to ignore it as invalid, while any newer browser could include the node in the DOM for access by scripts and microformat processors. The use of the three attributes (name, content, and scheme) allows easy extension for future uses without making a change to the HTML5 specification.
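
A minimal sketch of what this suggestion might look like, reusing the geo value from earlier in the thread. The name value "geo" is an assumption made for illustration, and the markup is hypothetical: HTML4 allows <meta> only in <head>.

```html
<!-- Hypothetical markup for the suggestion above; not valid HTML4 -->
<!-- name="geo" is an illustrative assumption, not a defined metadata name -->
<span>
  Austin, Texas
  <meta name="geo" content="30.300474;-97.747247">
</span>
```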
jheacock
<h6>
 
Posts: 1
Joined: Fri Mar 07, 2008 3:25 pm
Location: New Jersey, USA

Post by zcorpan » Tue Nov 10, 2009 2:59 pm

<meta> and <link> are allowed in <body> now when used together with microdata.
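
A minimal sketch of that pattern, assuming an untyped microdata item and reusing the values from earlier in the thread. The property names "geo" and "getty-id" are illustrative assumptions, not part of any published vocabulary.

```html
<!-- Sketch only: "geo" and "getty-id" are made-up property names on an untyped item -->
<span itemscope>
  Austin, Texas
  <meta itemprop="geo" content="30.300474;-97.747247">
  <meta itemprop="getty-id" content="7013346">
</span>
```

Here the itemprop attribute is what makes <meta> acceptable inside <body>; the human-readable text stays in the element content and the machine-readable values stay out of title.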

