Hopefully at this point you're thinking...
If you're not thinking this, either you weren't paying attention or we failed ;-)
TimBL has delivered unto us the four commandments of Linked Data:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
This encourages the re-use of your data in wonderful and unexpected ways
Tom Heath from Talis has outlined the following steps:
Hopefully we've more or less covered this already
It really is all about applying TimBL's four rules to encourage re-use of data
For now, let's just remember that...
There are plenty of manifestos and guides to read; linkeddata.org is a good place to start
Cool, we understand the principles
Next, we have a good think about which things in our data we want to publish
Try to avoid reinventing ontologies: have a look around and see what might work
MO (the Music Ontology) is great, but maybe we want some shortcuts
We're short-circuiting FRBR
This will shock and appall Yves
But anyone can say anything on the Web of Data
If you're gonna hack, hack it responsibly using open.vocab: this way all our terms dereference and we still obey rule 3
Cool, we understand our data
Choosing URIs for your data is arguably the most important step
Of course we'll use HTTP URIs, in case you weren't sure
And remember: cool URIs don't change
Use a namespace that you control
Try to keep the URIs free of clutter such as file extensions
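As a rough illustration of these guidelines, here is a small Python checker that flags candidate thing URIs which aren't HTTP URIs, fall outside our namespace, or carry file extensions or query strings. The host and base path are assumptions based on our example site, and the extension list is illustrative, not exhaustive.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed namespace we control (our example site); substitute your own.
OUR_HOST = "classical.catfishsmooth.net"
OUR_BASE_PATH = "/resource/"

# Illustrative list of extensions that usually mark a document, not a thing.
DOCUMENT_EXTENSIONS = {".rdf", ".html", ".htm", ".xml", ".ttl", ".n3", ".php"}


def uri_problems(uri):
    """Return a list of reasons why `uri` may be a poor name for a thing."""
    problems = []
    parts = urlsplit(uri)
    # Rule 2: HTTP URIs, so people can look the name up (https accepted too).
    if parts.scheme not in ("http", "https"):
        problems.append("not an HTTP URI")
    # Stay inside a namespace we control.
    if parts.hostname != OUR_HOST or not parts.path.startswith(OUR_BASE_PATH):
        problems.append("outside our namespace")
    # Keep clutter out: no file extensions or query strings.
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in DOCUMENT_EXTENSIONS:
        problems.append(f"file extension {extension}")
    if parts.query:
        problems.append("query string")
    return problems


print(uri_problems("http://classical.catfishsmooth.net/resource/track/842"))      # []
print(uri_problems("http://classical.catfishsmooth.net/resource/track/842.rdf"))  # ['file extension .rdf']
```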
Decide whether you want hash or slash URIs
This is largely a matter of preference, but it's probably better to stick with one or the other
Note: if you have a very small dataset, hash URIs give you the option of publishing a single flat file (which is really easy)
| URI | Identifies |
|---|---|
| http://dbpedia.org/resource/New_York_City | Thing |
| http://dbpedia.org/data/New_York_City | RDF data |
| http://dbpedia.org/page/New_York_City | HTML page |
| http://revyu.com/people/tom | Thing |
| http://revyu.com/people/tom/about/rdf | RDF data |
| http://revyu.com/people/tom/about/html | HTML page |
| http://www.bbc.co.uk/music/artists/db4624cf#artist | Thing |
| http://www.bbc.co.uk/music/artists/db4624cf.rdf | RDF data |
| http://www.bbc.co.uk/music/artists/db4624cf.html | HTML page |
Note: these URIs have been shortened to fit on this slide
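The slash-URI patterns in the table boil down to simple string rewriting. This sketch derives the RDF and HTML document URIs for a thing URI using either the DBpedia-style or the Revyu-style layout; the function name and its `style` parameter are our own hypothetical choices, not part of either site.

```python
def document_uris(thing_uri, style="dbpedia"):
    """Return (rdf_uri, html_uri) for a thing URI (hypothetical helper)."""
    if style == "dbpedia":
        # /resource/X identifies the thing; /data/X and /page/X describe it.
        marker = "/resource/"
        if marker not in thing_uri:
            raise ValueError(f"expected a /resource/ URI, got {thing_uri}")
        return (thing_uri.replace(marker, "/data/", 1),
                thing_uri.replace(marker, "/page/", 1))
    if style == "revyu":
        # The documents hang off the thing URI under /about/.
        base = thing_uri.rstrip("/")
        return base + "/about/rdf", base + "/about/html"
    raise ValueError(f"unknown style: {style}")


print(document_uris("http://dbpedia.org/resource/New_York_City"))
# ('http://dbpedia.org/data/New_York_City', 'http://dbpedia.org/page/New_York_City')
```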
Cool, we've picked out some cool URIs
RDFa is an increasingly popular way to publish Linked Data
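For a flavour of RDFa, this sketch renders an HTML snippet whose RDFa 1.1 attributes (`prefix`, `about`, `typeof`, `property`) state that an artist is a `mo:MusicArtist` with a `foaf:name`. The artist URI reuses our example namespace and the name is a placeholder.

```python
from html import escape

# Prefix mappings for the Music Ontology and FOAF, in RDFa 1.1 `prefix` syntax.
PREFIXES = "mo: http://purl.org/ontology/mo/ foaf: http://xmlns.com/foaf/0.1/"


def artist_rdfa(artist_uri, name):
    """Render HTML stating, via RDFa attributes, that <artist_uri>
    is a mo:MusicArtist whose foaf:name is `name`."""
    return (
        f'<div prefix="{PREFIXES}" about="{escape(artist_uri)}" typeof="mo:MusicArtist">\n'
        f'  <h1 property="foaf:name">{escape(name)}</h1>\n'
        "</div>"
    )


print(artist_rdfa("http://classical.catfishsmooth.net/resource/artist/123", "Example Artist"))
```

The same page serves humans and machines, which is why RDFa avoids content negotiation altogether.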
If you have a rather small and static dataset (like an ontology), and you've decided to use a hash URI scheme (or slash URIs plus mod_rewrite), it's easy: just publish a single flat file!
Now only one HTTP request is needed to grab all your data (good if it's small, bad if it's large)
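Why only one request? HTTP clients strip the `#fragment` from a URI before requesting it, so every hash URI in the vocabulary dereferences to the same document. A quick check with Python's `urllib.parse.urldefrag`; the vocabulary URIs are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical hash URIs for terms in a small vocabulary published as one file.
TERMS = [
    "http://example.org/vocab#Composer",
    "http://example.org/vocab#composed",
    "http://example.org/vocab#premiered",
]


def documents_to_fetch(uris):
    """Return the set of document URLs a client must GET to dereference `uris`.

    The fragment never goes over the wire, so hash URIs sharing a base
    all resolve to the same document.
    """
    return {urldefrag(uri).url for uri in uris}


print(documents_to_fetch(TERMS))  # {'http://example.org/vocab'}
```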
Linked Data content negotiation can be a bit tricky (ask the BBC guys), but don't worry
There are plenty of tools already out there to help; we'll discuss a few of them shortly...
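To show what those tools are doing for slash URIs, here is a minimal sketch of the 303-redirect recipe from the W3C note "Cool URIs for the Semantic Web". It assumes a DBpedia-style `/resource/`, `/data/`, `/page/` layout, and the Accept parsing is deliberately crude.

```python
# Assumed layout: /resource/ for things, /data/ for RDF, /page/ for HTML.
RDF_TYPES = {"application/rdf+xml", "text/turtle"}


def wants_rdf(accept_header):
    """Crude check: does the Accept header list an RDF media type?
    (Ignores q-values; a production server should honour them.)"""
    media_types = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    return bool(media_types & RDF_TYPES)


def negotiate(path, accept_header):
    """Return (status, location) for a GET on `path`."""
    if path.startswith("/resource/"):
        local_name = path[len("/resource/"):]
        target = "/data/" if wants_rdf(accept_header) else "/page/"
        # 303 See Other: a thing isn't a document, so redirect to one describing it.
        return 303, target + local_name
    if path.startswith(("/data/", "/page/")):
        return 200, None  # serve the document itself
    return 404, None


print(negotiate("/resource/track/842", "application/rdf+xml"))  # (303, '/data/track/842')
```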
There are several really good open-source triple stores
In our example we're using 4Store
Our workflow is rather simple:
We use Python scripts to generate various bits of the RDF
We use Sonic Annotator to generate audio features RDF
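Our actual scripts aren't shown here. As a hypothetical sketch of the idea, this standard-library-only function emits N-Triples for a track using the Music Ontology's `mo:Track` class and Dublin Core's `dc:title`; the track ID and title are placeholders.

```python
# Assumed namespaces: our example resource space, the Music Ontology and Dublin Core.
BASE = "http://classical.catfishsmooth.net/resource/"
MO = "http://purl.org/ontology/mo/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(uri):
    return f"<{uri}>"


def literal(text):
    # Escape backslashes first, then quotes and line breaks, as N-Triples requires.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def track_triples(track_id, title):
    """Return N-Triples lines typing a track as mo:Track and giving its title."""
    subject = iri(f"{BASE}track/{track_id}")
    return [
        f"{subject} {iri(RDF_TYPE)} {iri(MO + 'Track')} .",
        f"{subject} {iri(DC + 'title')} {literal(title)} .",
    ]


print("\n".join(track_triples(842, "Example title")))
```

Write the output to a `.nt` file and it is ready to load into the store. Older N-Triples parsers expect ASCII with `\uXXXX` escapes for non-ASCII titles; N-Triples 1.1 allows plain UTF-8.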
4Store gives us several options for loading the RDF, including a command-line utility:
$ 4s-import mytriplestore /path/to/myfile.rdf
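Another option is 4Store's HTTP server, `4s-httpd`. As we understand its documentation, a PUT to `/data/<graph URI>` replaces that graph with the request body; check the docs for your version. This sketch only builds the request; the port, graph URI and helper name are assumptions.

```python
import urllib.request

# Assumption: 4s-httpd is running locally on port 8000 for our store.
ENDPOINT = "http://localhost:8000"


def put_graph_request(graph_uri, rdf_bytes, content_type="application/rdf+xml"):
    """Build (but don't send) a PUT replacing `graph_uri` with `rdf_bytes`.

    The graph URI is appended to /data/ as-is (a '#' in it would be
    treated as a fragment, so avoid hash graph URIs here).
    """
    url = f"{ENDPOINT}/data/{graph_uri}"
    return urllib.request.Request(
        url, data=rdf_bytes, method="PUT", headers={"Content-Type": content_type}
    )


# Usage (not run here):
#   with open("/path/to/myfile.rdf", "rb") as f:
#       req = put_graph_request("http://classical.catfishsmooth.net/graph/tracks", f.read())
#   urllib.request.urlopen(req)
```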
But... when we look up http://classical.catfishsmooth.net/resource/track/842 we get:
That is not cool
We've picked out cool URIs and just dumped them in our 4Store
But they don't dereference
Don't worry, Pubby is a good solution
It's a Java-based server app that pulls RDF descriptions from a SPARQL endpoint and handles content negotiation in a nice Linked Data fashion
Caveat: Pubby doesn't work with 4Store "out of the box"; we had to do some hacking because 4Store can't handle DESCRIBE queries (yet)
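We won't reproduce our Pubby hack here. One common way around a store that lacks DESCRIBE, sketched under that assumption, is to rewrite `DESCRIBE <uri>` as a CONSTRUCT that fetches the resource's outgoing and incoming triples, then send it as a SPARQL Protocol GET with a `query` parameter. The endpoint URL is hypothetical.

```python
from urllib.parse import urlencode


def describe_as_construct(resource_uri):
    """Build a CONSTRUCT query approximating DESCRIBE <resource_uri>.

    Returns outgoing and incoming triples; DESCRIBE output is
    implementation-defined, so this is only one common interpretation.
    """
    r = f"<{resource_uri}>"
    return (
        "CONSTRUCT {\n"
        f"  {r} ?p ?o .\n"
        f"  ?s ?q {r} .\n"
        "} WHERE {\n"
        f"  {{ {r} ?p ?o }} UNION {{ ?s ?q {r} }}\n"
        "}"
    )


def sparql_get_url(endpoint, query):
    # SPARQL Protocol: the query travels in the `query` parameter of a GET.
    return endpoint + "?" + urlencode({"query": query})


query = describe_as_construct("http://classical.catfishsmooth.net/resource/track/842")
print(sparql_get_url("http://localhost:8000/sparql/", query))
```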
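Check it with curl (-I fetches only the response headers, so you should see Pubby's 303 redirect to the RDF document; drop -I and add -L to follow it and get the RDF itself):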
$ curl -I -H "Accept: application/rdf+xml" \
http://classical.catfishsmooth.net/resource/track/842
Cool, we've got our infrastructure
When we're talking about music and Linked Data we're almost always talking about MusicBrainz
They provide URIs for artists, tracks, and albums
If your dataset deals with any of these things, you will want to create owl:sameAs links to MusicBrainz URIs in your dataset:
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://classical.catfishsmooth.net/resource/artist/123>
owl:sameAs
<http://musicbrainz.org/artist/123> .
Luckily, there are some tools to help
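Those tools are far smarter, but purely to illustrate the idea, this deliberately naive linker matches local artist names against MusicBrainz names after stripping accents and case, and emits `owl:sameAs` N-Triples only for unambiguous matches. The input dictionaries and their URIs are hypothetical placeholders.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(name):
    # Strip accents, case and extra whitespace so "Dvořák" matches "Dvorak".
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(ascii_only.casefold().split())


def same_as_links(local_artists, mb_artists):
    """local_artists: {local URI: name}; mb_artists: {MusicBrainz URI: name}.

    Emit owl:sameAs N-Triples where exactly one MusicBrainz name matches.
    """
    index = {}
    for mb_uri, name in mb_artists.items():
        index.setdefault(normalise(name), []).append(mb_uri)
    links = []
    for local_uri, name in local_artists.items():
        candidates = index.get(normalise(name), [])
        if len(candidates) == 1:  # skip ambiguous names rather than guess
            links.append(f"<{local_uri}> <{OWL_SAME_AS}> <{candidates[0]}> .")
    return links
```

Name matching alone will happily link two different people who share a name, so real linkers also compare things like works, dates or releases.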
Now we've just got to publicize our data
Help others discover and index your data
Apply a license or waiver to your data set
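One way to cover both of these steps is a voiD description of the dataset, which tells crawlers where the SPARQL endpoint is, points at an example resource, and states the licence. This sketch emits one in Turtle; the dataset and endpoint URIs are hypothetical, and CC0 is just one possible waiver.

```python
# Hypothetical dataset and endpoint URIs; CC0 is one possible waiver.
DATASET = "http://classical.catfishsmooth.net/dataset"
ENDPOINT = "http://classical.catfishsmooth.net/sparql/"
EXAMPLE = "http://classical.catfishsmooth.net/resource/track/842"
LICENSE = "http://creativecommons.org/publicdomain/zero/1.0/"


def void_description(dataset, endpoint, example, license_uri):
    """Return a small voiD description of the dataset in Turtle."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    void:exampleResource <{example}> ;",
        f"    dcterms:license <{license_uri}> .",
    ])


print(void_description(DATASET, ENDPOINT, EXAMPLE, LICENSE))
```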