From Algorithm to Structured-Data

An introduction to using RDF in audio analysis

Overview: The tools we use

Overview: Ontologies

  • Audio Feature Ontology (AF):
    An extension of MO for publishing content-based audio descriptors.
  • Vamp Plugin Ontology:
    Describe audio feature extractors in RDF.
  • Vamp Transform Ontology:
    Vamp feature extractor configuration (plugin parameters).
  • LV2 Plugin Schema:
    Describe audio processor plugins and their input/output requirements.

Audio Features in RDF?   Why bother?

1.69004874e+00 3.03187215e-01 1.33318623e+00 -1.30552702e-02 8.32073038e-01 -3.62798702e-01 2.55154253e-01 -2.42700580e-01 4.05045464e-02 -2.58310827e-01 4.68943617e-01 1.77893445e-01 3.98363671e-01 -1.06080685e-01 2.08837166e-01 9.14348219e-02 -4.25139800e-01 -2.61554461e-01 2.11715669e-01 2.43940887e-01 -8.34046236e-05 -3.30750712e-01 -2.83539983e-01 -2.72162695e-01 -1.63518323e-01 -3.24927376e-01 -2.15598345e-01 3.20141531e-02 1.94742101e-01 1.04970093e-01 5.77336165e-02 -1.33649254e-01 4.05435412e-03 -2.40341400e-01 -2.13628811e-01 5.67761768e-02 -7.14260596e-03 82.74982802 103.48549016 54.35183501 41.91234708 62.91679459 45.55888886 18.39474018 22.20581866 3.70123911 8.00125797 19.38406733 12.98726784 10.33309185 9.88495179 14.79919427 12.59098558 8.14752679 13.33978282 19.40150629 8.75662233 8.96523693 13.06456267 15.09590791 17.74723727 14.89982264 15.71060977 22.70530114 26.39947967 21.06203171 30.87444089 39.83333877 57.93335017 56.53057888 53.4724693 52.11037484 73.067626 75.25670636 50.9927027 37.25109373 23.1124413 6.74092448e-01 1.52131089e+00 -6.58556692e-02 4.16046037e-01 -6.71808209e-01 -1.75452651e-01 -4.90723450e-01 -6.27474258e-02 -2.32073936e-01 7.51359037e-01 3.79243787e-01 6.56707377e-01 -5.07695345e-02 1.86687629e-01 8.18389681e-03 -7.79910213e-01 -3.74565011e-01 2.80968256e-01 4.56856532e-01 1.60420662e-01 -2.49590554e-01 -4.31253394e-01 -3.24418707e-01 -2.31215271e-01 -6.31837837e-01 -2.79277761e-01 1.29780486e-01 3.78556299e-01 2.65721050e-01 2.49854946e-01 2.52861990e-02 -1.83222913e-02 -2.62697655e-01 -3.27161231e-01 -9.84714954e-02 -1.41848368e-01 45.55888886 18.39474018 22.20581866 3.70123911 8.00125797 19.38406733 12.98726784 10.33309185 9.88495179 14.79919427 12.59098558 8.14752679 13.33978282 19.40150629 8.75662233 8.96523693 13.06456267 15.09590791 17.74723727 14.89982264 6.74092448e-01 1.52131089e+00 -6.58556692e-02 4.16046037e-01 -6.71808209e-01 -1.75452651e-01 -4.90723450e-01 -6.27474258e-02 -2.32073936e-01 7.51359037e-01 3.79243787e-01 6.56707377e-01 -5.07695345e-02 1.86687629e-01 8.18389681e-03 -7.79910213e-01 -3.74565011e-01 2.80968256e-01 4.56856532e-01 1.60420662e-01 -2.49590554e-01 -4.31253394e-01 -3.24418707e-01 -2.31215271e-01 -6.31837837e-01 -2.79277761e-01 1.29780486e-01 3.78556299e-01 2.65721050e-01 2.49854946e-01 2.52861990e-02 -1.83222913e-02 -2.62697655e-01 -3.27161231e-01 -9.84714954e-02 -1.41848368e-01 -2.49590554e-01 -4.31253394e-01 -3.24418707e-01 -2.31215271e-01 -6.31837837e-01 -2.79277761e-01 1.29780486e-01 3.78556299e-01 2.65721050e-01 2.49854946e-01 2.52861990e-02 -1.83222913e-02 -2.62697655e-01 -3.27161231e-01 -9.84714954e-02 -1.41848368e-01

Audio Features in RDF?   Why bother?

1.69004874e+00 3.03187215e-01 1.33318623e+00 -1.30552702e-02 8.32073038e-01 -3.62798702e-01 2.55154253e-01 -2.42700580e-01 4.05045464e-02 -2.58310827e-01 4.68943617e-01 1.77893445e-01 3.98363671e-01 -1.06080685e-01 2.08837166e-01 9.14348219e-02 -4.25139800e-01 -2.61554461e-01 2.11715669e-01 2.43940887e-01 -8.34046236e-05 -3.30750712e-01 -2.83539983e-01 -2.72162695e-01 -1.63518323e-01 -3.24927376e-01 -2.15598345e-01 3.20141531e-02 1.94742101e-01 1.04970093e-01 5.77336165e-02 -1.33649254e-01 4.05435412e-03 -2.40341400e-01 -2.13628811e-01 5.67761768e-02 -7.14260596e-03 82.74982802 103.48549016 54.35183501 41.91234708 62.91679459 45.55888886 18.39474018 22.20581866 3.70123911 8.00125797 19.38406733 12.98726784 10.33309185 9.88495179 14.79919427 12.59098558 8.14752679 13.33978282 19.40150629 8.75662233 8.96523693 13.06456267 15.09590791 17.74723727 14.89982264 15.71060977 22.70530114 26.39947967 21.06203171 30.87444089 39.83333877 57.93335017 56.53057888 53.4724693 52.11037484 73.067626 75.25670636 50.9927027 37.25109373 23.1124413 6.74092448e-01 1.52131089e+00 -6.58556692e-02 4.16046037e-01 -6.71808209e-01 -1.75452651e-01 -4.90723450e-01 -6.27474258e-02 -2.32073936e-01 7.51359037e-01 3.79243787e-01 6.56707377e-01 -5.07695345e-02 1.86687629e-01 8.18389681e-03 -7.79910213e-01 -3.74565011e-01 2.80968256e-01 4.56856532e-01 1.60420662e-01 -2.49590554e-01 -4.31253394e-01 -3.24418707e-01 -2.31215271e-01 -6.31837837e-01 -2.79277761e-01 1.29780486e-01 3.78556299e-01 2.65721050e-01 2.49854946e-01 2.52861990e-02 -1.83222913e-02 -2.62697655e-01 -3.27161231e-01 -9.84714954e-02 -1.41848368e-01 45.55888886 18.39474018 22.20581866 3.70123911 8.00125797 19.38406733 12.98726784 10.33309185 9.88495179 14.79919427 12.59098558 8.14752679 13.33978282 19.40150629 8.75662233 8.96523693 13.06456267 15.09590791 17.74723727 14.89982264 6.74092448e-01 1.52131089e+00 -6.58556692e-02 4.16046037e-01 -6.71808209e-01 -1.75452651e-01 -4.90723450e-01 -6.27474258e-02 -2.32073936e-01 7.51359037e-01 3.79243787e-01 6.56707377e-01 -5.07695345e-02 1.86687629e-01 8.18389681e-03 -7.79910213e-01 -3.74565011e-01 2.80968256e-01 4.56856532e-01 1.60420662e-01 -2.49590554e-01 -4.31253394e-01 -3.24418707e-01 -2.31215271e-01 -6.31837837e-01 -2.79277761e-01 1.29780486e-01 3.78556299e-01 2.65721050e-01 2.49854946e-01 2.52861990e-02 -1.83222913e-02 -2.62697655e-01 -3.27161231e-01 -9.84714954e-02 -1.41848368e-01 -2.49590554e-01 -4.31253394e-01 -3.24418707e-01 -2.31215271e-01 -6.31837837e-01 -2.79277761e-01 1.29780486e-01 3.78556299e-01 2.65721050e-01 2.49854946e-01 2.52861990e-02 -1.83222913e-02 -2.62697655e-01 -3.27161231e-01 -9.84714954e-02 -1.41848368e-01

Audio Features in RDF?   Why bother?



Will your friend's machine know how to deal with those numbers?


Data without semantics is meaningless for others!

Audio Features in RDF?   Why bother?

Main use-cases:

What are the important factors?

Audio Features in RDF?   Why bother?

Audio Features in RDF !

Some existing standards

Arguably, these all describe some aspects of music or audio:

Problems

Audio Feature Ontology

Audio Feature Ontology at a glance

Audio Feature Ontology

Key points:

Basic feature types:

Audio Feature Ontology

How complete is it ?

AF Examples (1): Onsets

A note onset on the signal timeline:


@prefix tl: <http://purl.org/NET/c4dm/timeline.owl#>.
@prefix af: <http://purl.org/ontology/af/>.
	
:signal_timeline a tl:Timeline .
:onset_23 a af:Onset;
	    event:time [ 
	        a tl:Instant ;
	        tl:timeline :signal_timeline ;
	        tl:at "PT1.710S"^^xsd:duration ;
	    ] .

AF Examples (2): Segments

A key segment:


	
:signal_timeline a tl:Timeline .
:key_segment_1 a af:Segment;
		rdfs:label """Bb major""" ;
		af:feature "11" ;
		event:time [
			a tl:Interval ;
			tl:timeline :signal_timeline ;
			tl:start "PT30.1S";
			tl:duration "PT200S";
		] .

AF Examples (3): Dense features

Describing a Chromagram: first express the temporal relations between the audio and the features.



:signal_timeline a tl:Timeline .
:feature_timeline a tl:DiscreteTimeLine .

:feature_timeline_map a tl:UniformSamplingWindowingMap ;
    tl:rangeTimeLine :feature_timeline ;
    tl:domainTimeLine :signal_timeline ;
    tl:sampleRate "44100"^^xsd:int ;
    tl:windowLength "65536"^^xsd:int ;
    tl:hopSize "8192"^^xsd:int .

AF Examples (3): Dense features

Describing a Chromagram: express the Chromagram signal itself.


@prefix mo: <http://purl.org/ontology/mo/>.

:feature a af:Chromagram ;
    mo:time [
        a tl:Interval ;
        tl:timeline :feature_timeline ;
    ] ;
	af:dimensions "12 0" ;
	af:value "0.00701192 0.00501109 ... 0.00446376" .

Is that all?

We can use MO to link feature data to recordings and other stuff on the Semantic Web

Vamp Plugins

Recall: Vamp plugins take audio input and return structured data (but not RDF!).

Vamp Plugins

Key points:

VamPy: An easy way to write Vamp plugins

Vampy is a Python wrapper for the Vamp Plugin API

Key Features:

Vampy example plugins available here.

VamPy Example

The process() function, where the feature extraction happens. Given a super-class computing MFCCs, one could write:


	def process(self,inputbuffers,timestamp):
		fftsize = self.m_blockSize
		complexSpectrum = inputbuffers[0]
		magnitudeSpectrum = abs(complexSpectrum)[0:fftsize/2]

		melSpectrum = self.warpSpectrum(magnitudeSpectrum)
		melCepstrum = self.getMFCCs(melSpectrum,cn=True)

		outputs = FeatureSet()
		outputs[0] = Feature(melCepstrum)
		outputs[1] = Feature(melSpectrum)
		return outputs

Vamp Plugin Ontology

Back to ontologies :-)

Vamp Transform Ontology

An open-ended model to describe an unlimited set of parameters.

Vamp Transform Ontology Example (1)

Setting up a Chromagram plugin. The basic settings:


@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix vamp: <http://purl.org/ontology/vamp/>.
@prefix : <#>.

:transform a vamp:Transform ;
	vamp:plugin  ;
	vamp:step_size "8192"^^xsd:int ; 
	vamp:block_size "65536"^^xsd:int ; 
	vamp:output  ;

Vamp Transform Ontology Example (2)

Setting up a Chromagram plugin. Parameter bindings:


    vamp:parameter_binding [
        vamp:parameter [ vamp:identifier "tuning" ] ;
        vamp:value "440"^^xsd:float ;
    ] ;
    vamp:parameter_binding [
        vamp:parameter [ vamp:identifier "minpitch" ] ;
        vamp:value "12"^^xsd:float ;
    ] ;
    vamp:parameter_binding [
        vamp:parameter [ vamp:identifier "maxpitch" ] ;
        vamp:value "96"^^xsd:float ;
    ] .

LV2 plugins: using RDF in audio processing?

LV2 is the successor of the LADSPA audio processor plugin API for Linux.

A quick recoup

So far we looked at:

Sonic Annotator

A command line Vamp plugin host that outputs RDF

Key features:

Using Sonic Annotator

(1) Create an RDF transform skeleton:


$ sonic-annotator -s vamp:vamp-example-plugins:fixedtempo:tempo > test.n3

(2) Edit the file if necessary and run the feature extractor:


$ sonic-annotator -t test.n3 audio_file.wav -w rdf --rdf-stdout

This will dump the results on the standard output.

A detailed tutorial is available here.

Sonic Visualiser

A Vamp host, Audio Player and Feature Visualiser

Sonic Visualiser

Sonic Visualiser can import audio features expressed using the AF Ontology.

CLAM Aggregator

CLAM is an audio analysis software framework which includes tools for
music annotation, such as CLAM Annotator.

CLAM Aggregator is an CLAM Annotator extension:

Putting it all together

An eco-system for using Semantic Web technologies in audio analysis:

Putting it all together

SAWA: A proof of concept application on the Web that brings these technologies together.

SAWA: An easy way to get started

Key features:

SAWA: An easy way to get started

How it's built:

Demos

(1). Using SAWA and Sonic Visualiser

(2). RDF-ize your algorithm using Sonic Annotator and VamPy (demo+hands-on)

Image credits

ezdyelectricity

Chris Cannam: Vamp Plugin Programmer's Guide.

Sonic Annotator tutorial.

[any material that should appear in print but not on the slide]

and now publishing Linked Data