Microdata Tutorial

What is Microdata?

Microdata is a way of embedding machine-readable information in your web page. More specifically, it's a way of instructing a machine how to extract information that already exists in your web page.

Why would you want to do this? There are lots of reasons. Here are a few possible ones:

  • You can mark up the date of an event, so a browser can automatically offer to add it to a visitor's calendar.
  • You can mark up your contact information, so a browser can automatically offer to add it to a visitor's address book.
  • You can mark up a restaurant review on your blog, so search engines can find it, associate it with the correct restaurant, and show it when people search for that restaurant.
  • You can mark up your blog post, so feed readers can just look at your HTML and automatically extract a feed from it.
  • You can mark up your resume, so that when you use a site like LinkedIn you can just point it at your resume page and it can automatically extract all the information it needs.
  • You can mark up a recipe, so a user's shopping-list app can discover what ingredients are needed and add them to the week's shopping list.

There are many ways in which making information easier for a machine to discover and read can be useful. Many of these uses won't be obvious at first, but will arise naturally as more data becomes available on the web. Microdata lets you simply indicate where in your web page this data lives. You typically won't have to redesign your page at all to add Microdata to it; at most, you may need a few <span>s wrapping particular bits of information. By keeping the information in the page and just indicating how to extract it, Microdata ensures that the data is as fresh as possible - there's no way for you to update some information on the page and accidentally forget to update the corresponding entry in some data structure elsewhere.

How do I use Microdata?

Basic usage of Microdata is extremely simple, and requires only two attributes: itemscope and itemprop.

itemscope

itemscope is a boolean attribute. It doesn't have an = sign or a value after it; you just put it on the element and leave it alone. It indicates that the element is a Microdata Container, which means that it contains related pieces of data. For example, the name of a restaurant, its location, your rating, and your review comments are all different pieces of data that together form your review as a whole. The closest element in your markup that contains all of these things should be your container, and should receive the itemscope attribute.

Example:

<h1>My blog!</h1>
<article itemscope>
	<h2>I went to McDonalds; a Review</h2>
	<p>I went to the McDonalds on Main Street, and it sucked!</p>
	<p>0/5, would not go again.</p>
</article>

itemprop

Inside of a Microdata Container, you have various bits of information that you want to express in a machine-readable way. You indicate these pieces of data using the itemprop attribute. The itemprop attribute takes the name of the property as its value. For example, if you are reviewing a restaurant, you would use itemprop=rating to indicate that this bit of text is a rating.

Example:

<h1>My blog!</h1>
<article itemscope>
	<h2>I went to McDonalds; a Review</h2>
	<p>I went to the <span itemprop=item>McDonalds on Main Street</span>, and it sucked!</p>
	<p><span itemprop=rating>0/5</span>, would not go again.</p>
</article>

Once you've indicated where a piece of data is and what kind of data it is (by putting an itemprop attribute on the element that holds it), how does the machine know what the data's value actually is? To make things easy, Microdata reads the value in different ways depending on which element the itemprop appears on.

If itemprop appears on a:

element that also has the itemscope attribute : The 'data' is a Microdata item itself, with its own set of properties. For example, in your restaurant review, you may include the location of the restaurant, marked up as its own item using the hCard vocabulary. This takes priority over all of the rules below.

<meta> element : The data is whatever appears in the content attribute of the <meta> element. This is used when you need to include some data that isn't actually in the text of your page; for example, maybe you didn't actually put the date you went to McDonalds in your blog post, but you want to indicate that in your official review so other people know exactly when it sucked.

<a>, <area>, <audio>, <embed>, <iframe>, <img>, <link>, <object>, <source>, or <video> element : The data is the URL in the element's href, src, or data attribute, as appropriate (href for <a>, <area>, and <link>; data for <object>; src for the rest). This way you can easily include a picture of yourself in your resume, and let machines know the URL of that picture without having to write the URL out in the text of your page.

<time> element : The data is the time in the element's datetime attribute. This lets you, for example, just say "last week" in your blog post about when you went to McDonalds, but still indicate the exact date you went.

anything else : The data is whatever's in the text of the element.
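To make the rules above concrete, here is a small Python sketch of the value lookup. The names property_value and URL_ATTRIBUTE are hypothetical, and the function assumes the element has already been parsed into a tag name, an attribute dict, and its text. Two details go slightly beyond the list above: relative URLs are resolved against the page's address with urljoin (the base_url parameter is an assumption of this sketch), and a <time> without a datetime attribute falls back to its text, which is what the HTML specification does.

```python
from urllib.parse import urljoin

# Which attribute holds the value for each URL-valued element (per the list above).
URL_ATTRIBUTE = {
    "a": "href", "area": "href", "link": "href",
    "object": "data",
    "audio": "src", "embed": "src", "iframe": "src",
    "img": "src", "source": "src", "video": "src",
}

def property_value(tag, attrs, text, base_url=""):
    """Return the value of an element carrying itemprop (hypothetical helper).

    tag      -- lowercase element name, e.g. "time"
    attrs    -- dict of the element's attributes
    text     -- the element's text content
    base_url -- page URL used to resolve relative links (assumed parameter)
    Returns None when the value is a nested item that must be parsed on its own.
    """
    if "itemscope" in attrs:
        return None  # nested item: takes priority over every other rule
    if tag == "meta":
        return attrs.get("content", "")
    if tag in URL_ATTRIBUTE:
        raw = attrs.get(URL_ATTRIBUTE[tag])
        return urljoin(base_url, raw) if raw is not None else ""
    if tag == "time":
        # Use datetime if present, otherwise fall back to the visible text.
        return attrs.get("datetime", text)
    return text

print(property_value("time", {"itemprop": "date", "datetime": "2010-12-25"}, "last week"))
# 2010-12-25
print(property_value("img", {"itemprop": "photo", "src": "me.jpg"}, "",
                     base_url="http://example.com/resume/"))
# http://example.com/resume/me.jpg
```

The "photo" property name and the example.com address are illustrative only.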

You can have multiple things with the same itemprop in a single Microdata Container (they just all get popped into an array together), and you can have a single element with multiple itemprop values on it (if you're using the hCard vocabulary to mark up a business's location and contact information, you'd write the business name as <span itemprop="fn org">Foo Corp</span>).
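Both behaviors fall out naturally if every property is stored as a list. Here's a minimal Python sketch, assuming a plain dict stands in for an item's properties; add_property is a hypothetical helper, not part of any library.

```python
def add_property(properties, itemprop, value):
    """Record value under every name in an itemprop attribute (hypothetical helper)."""
    # itemprop holds space-separated names; drop repeats but keep their order.
    for name in dict.fromkeys(itemprop.split()):
        # Each name maps to a list, so repeated properties simply accumulate.
        properties.setdefault(name, []).append(value)
    return properties

props = {}
add_property(props, "fn org", "Foo Corp")
add_property(props, "tel", "555-0100")
add_property(props, "tel", "555-0199")
print(props)
# {'fn': ['Foo Corp'], 'org': ['Foo Corp'], 'tel': ['555-0100', '555-0199']}
```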

That's it!

Yup, Microdata is that simple. Pop an itemscope on some container to group together bits of data into a coherent whole, put itemprop on elements to indicate where the data is and what type of data it is, and you're done. A machine can now go through and find all the Microdata on your page and extract it.

For example, the Microdata in the previous example would be extracted into this bit of JSON:

{
  "items": [
    {
      "properties": {
        "item": [
          "McDonalds on Main Street"
        ],
        "rating": [
          "0/5"
        ]
      }
    }
  ]
}
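If you'd like to watch that extraction happen, here is a deliberately limited Python sketch built on the standard library's html.parser. It only handles flat items whose values come from element text (no nesting, itemref, itemtype, or itemid), and the class and function names are made up for illustration; it is not a conforming Microdata parser.

```python
import json
from html.parser import HTMLParser

class FlatMicrodataParser(HTMLParser):
    """Sketch: flat items only; every itemprop value is the element's text."""

    # Void elements never get an end tag, so they are not pushed on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.items = []       # finished items, in document order
        self.stack = []       # one entry per open element
        self.current = None   # properties of the item being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        attrs = dict(attrs)
        entry = {"tag": tag, "scope": False, "prop": None}
        if "itemscope" in attrs and self.current is None:
            self.current = {}
            entry["scope"] = True
        elif "itemprop" in attrs and self.current is not None:
            names = list(dict.fromkeys((attrs["itemprop"] or "").split()))
            entry["prop"] = (names, [])  # (property names, collected text)
        self.stack.append(entry)

    def handle_data(self, data):
        # Text counts toward every itemprop element that is still open.
        for entry in self.stack:
            if entry["prop"] is not None:
                entry["prop"][1].append(data)

    def handle_endtag(self, tag):
        if not any(entry["tag"] == tag for entry in self.stack):
            return  # stray end tag; ignore it
        while self.stack:
            entry = self.stack.pop()
            if entry["prop"] is not None:
                names, parts = entry["prop"]
                for name in names:
                    self.current.setdefault(name, []).append("".join(parts))
            if entry["scope"]:
                self.items.append({"properties": self.current})
                self.current = None
            if entry["tag"] == tag:
                break

def extract_microdata(html):
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return {"items": parser.items}

REVIEW = """<h1>My blog!</h1>
<article itemscope>
  <h2>I went to McDonalds; a Review</h2>
  <p>I went to the <span itemprop=item>McDonalds on Main Street</span>, and it sucked!</p>
  <p><span itemprop=rating>0/5</span>, would not go again.</p>
</article>"""

print(json.dumps(extract_microdata(REVIEW), indent=2))
```

Running it on the review markup should print the same JSON as above. A real parser works on the full DOM tree rather than a stream of tags, which is what makes nesting and itemref manageable.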

But wait, there's more

You didn't really think that was all, did you? What I described above is indeed all that you need to use Microdata, but there is more if you need it.

itemtype - What type of data do you have?

itemprop gives a name to a particular piece of data. But how do you tell the computer what the type of the entire blob of data is? For example, how does the computer know that you just wrote a review? It could possibly try to guess that you did, based on the types of itemprops you specified, or the text in your page. But if you want to make the job a bit easier, you can specify this manually with the itemtype attribute.

You can put itemtype on any element that already has an itemscope attribute. The vocabulary you're using defines which itemprop names make sense and what its itemtype value should be; under the current HTML specification, that value must be an absolute URL. For example, if you're using the hCard vocabulary to mark up your name and contact information, you would use itemtype="http://microformats.org/profile/hcard" on your Microdata Container to indicate that, hey, this data is an hCard.

Example:

<h1>My blog!</h1>
<article>
	<!-- posts and stuff -->
</article>
<address itemscope itemtype="http://microformats.org/profile/hcard">
	<p>My name is <span itemprop=fn>Tab Atkins</span>, and I wrote this blog.</p>
	<p>You can contact me <a itemprop=email href="mailto:jackalmage@gmail.com">by email</a>, if you want.</p>
</address>

itemid - What are you talking about?

With some types of data, you want it to be clear exactly what you're referring to. For example, if you're writing a book review and marking it up with Microdata, you want a search engine to be able to tell exactly which book you're reviewing, so it can show your review when people search for that book. The itemid attribute exists for this. If the vocabulary you're using (indicate it with itemtype!) has some way to refer to a unique identifier for its subject, you just put that identifier in the itemid attribute on your Microdata Container. For example, a book review using the hReview vocabulary can use itemid to point to the ISBN of the book, precisely identifying the book you're reviewing.
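itemtype and itemid also show up in the extracted JSON: in the format the HTML specification uses, an item gains a "type" array (one entry per itemtype token) and an "id" string alongside "properties". A minimal Python sketch, where item_json is a hypothetical helper, the review vocabulary URL is made up, and the ISBN and rating are illustrative only:

```python
import json

def item_json(properties, itemtype=None, itemid=None):
    """Build one entry of the "items" array (hypothetical helper).

    Sketch only: a real parser would also resolve itemid as a URL.
    """
    item = {}
    if itemtype:
        # itemtype may list several space-separated types; keep their order.
        item["type"] = list(dict.fromkeys(itemtype.split()))
    if itemid:
        item["id"] = itemid
    item["properties"] = properties
    return item

card = item_json(
    {"fn": ["Tab Atkins"], "email": ["mailto:jackalmage@gmail.com"]},
    itemtype="http://microformats.org/profile/hcard",
)
review = item_json(
    {"rating": ["5/5"]},                                 # illustrative value
    itemtype="http://vocab.example.net/book-review",     # hypothetical vocabulary URL
    itemid="urn:isbn:0-330-34032-8",                     # illustrative ISBN
)
print(json.dumps({"items": [card, review]}, indent=2))
```

Note that the email value is the link's href, following the <a> rule described earlier.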

itemref - But my page is organized wrong!

Microdata works by grouping together a bunch of properties into a Microdata Container - some element that contains a bunch of things with itemprops on them. But sometimes your page doesn't make this easy. For example, you may be listing out a bunch of concert dates for the bands you like, with each concert filling in a column in a table. If each concert were a row in the table, this would be easy - just put itemscope on the <tr> and you're done. But there is no element that acts as a good Microdata Container for a single concert - the closest container is the <table> element, and it contains all the concerts.

In these cases, you can use itemref to pull in elements that aren't inside your Microdata Container but that contain itemprop data belonging to it. The itemref attribute goes on your Microdata Container element (the one with the itemscope attribute) and holds a space-separated list of the ids of the elements that should be included.

Example:

<h1>Upcoming concerts!</h1>
<table>
	<col itemscope itemref="a0 a1 a2">
	<col itemscope itemref="b0 b1 b2">
	<tr>
		<th>Who</th>
		<td id=a0 itemprop=band>Tub Ring</td>
		<td id=b0 itemprop=band>Cake</td>
	</tr>
	<tr>
		<th>When</th>
		<td id=a1><time itemprop=date datetime=2010-01-01>The Big Fool Day!</time></td>
		<td id=b1><time itemprop=date datetime=2010-12-25>Jingle Bells!</time></td>
	</tr>
	<tr>
		<th>Where</th>
		<td id=a2 itemprop=location>Atlantis!</td>
		<td id=b2 itemprop=location>The North Pole!</td>
	</tr>
</table>
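Under the hood, a Microdata parser gathers an item's properties by "crawling": it starts from the container's children plus the elements named in itemref, walks down from there, and doesn't descend into nested itemscope elements, since their contents belong to the nested item. Below is a rough Python sketch of that crawl over a hand-built, simplified version of the concert table. Element and crawl_properties are hypothetical names; a browser's real tree would also contain implied <colgroup> and <tbody> elements, which don't change the result.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """Hypothetical, minimal stand-in for a DOM element."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""

def iter_tree(root):
    """Yield root and all its descendants in document order."""
    yield root
    for child in root.children:
        yield from iter_tree(child)

def crawl_properties(item, root):
    """Return the itemprop elements belonging to `item`, in document order."""
    by_id = {}
    for el in iter_tree(root):
        if "id" in el.attrs:
            by_id.setdefault(el.attrs["id"], el)  # first element with an id wins
    pending = list(item.children)
    for ref in item.attrs.get("itemref", "").split():
        if ref in by_id:
            pending.append(by_id[ref])
    seen = {id(item)}  # guards against itemref loops
    found = []
    while pending:
        el = pending.pop()
        if id(el) in seen:
            continue
        seen.add(id(el))
        if "itemprop" in el.attrs:
            found.append(el)
        if "itemscope" not in el.attrs:
            pending.extend(el.children)  # nested items keep their own properties
    order = {id(el): i for i, el in enumerate(iter_tree(root))}
    found.sort(key=lambda el: order[id(el)])
    return found

table = Element("table", children=[
    Element("col", {"itemscope": "", "itemref": "a0 a1 a2"}),
    Element("col", {"itemscope": "", "itemref": "b0 b1 b2"}),
    Element("tr", children=[
        Element("th", text="Who"),
        Element("td", {"id": "a0", "itemprop": "band"}, text="Tub Ring"),
        Element("td", {"id": "b0", "itemprop": "band"}, text="Cake"),
    ]),
    Element("tr", children=[
        Element("th", text="When"),
        Element("td", {"id": "a1"}, [Element("time", {"itemprop": "date", "datetime": "2010-01-01"})]),
        Element("td", {"id": "b1"}, [Element("time", {"itemprop": "date", "datetime": "2010-12-25"})]),
    ]),
    Element("tr", children=[
        Element("th", text="Where"),
        Element("td", {"id": "a2", "itemprop": "location"}, text="Atlantis!"),
        Element("td", {"id": "b2", "itemprop": "location"}, text="The North Pole!"),
    ]),
])

first_concert = crawl_properties(table.children[0], table)
print([(el.attrs["itemprop"], el.attrs.get("datetime", el.text)) for el in first_concert])
# [('band', 'Tub Ring'), ('date', '2010-01-01'), ('location', 'Atlantis!')]
```

Note that td id=a1 has no itemprop itself; it is referenced only so the crawl can reach the <time> inside it.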

That really is it

That's all there is to Microdata. The syntax is simple and easy, but gives you some powerful tools if you need them. Most of the work is in developing and spreading good vocabularies. The Microformats community does a great job of this, and it has a ton of vocabularies ready-made to help you mark up important data. Head over there to see if there's something you can use, or join them to help develop something new that they're missing!
