Vocal Music Data Exchange Format – Publication #2

Introduction

Following Publication #1 (you can download it here: http://europeanvoices.org/filedepot_download/22/4), I talked with many different people: potential users, members of groups, booking agents, IT professionals, music professionals and other interested people.

While talking it through over and over again and looking at it from different angles, I came to the conclusion that, besides focussing on data types and formats, I have to start a parallel thread that is at least as important: data management.

Thesis

Similar to the nature of the Internet, where we are aiming to bring our concept to life, data has to be published and provided to the system in a decentralized way.

The how

The content provider, who is most likely also the rights owner of the information (artists, groups, teachers, organizations), has to be able to place it wherever they want, and “the other parties” pull it from there. The only thing the publisher has to do is let the receiving party know where to find it. This is basically the same approach as for RSS or similar feeds.

The why

As with the feeds mentioned above, the publisher places and updates the information in one place only. This is a huge advantage compared to today, where they have to keep the information in sync in every place it is published.

I mean: these people are artists, focussed on their art, not bookkeepers or IT people. Simplifying their life is one of the major topics here.

On the other side, for the receiving parties: how often do major changes happen in a group while you still show the outdated information, just because you are not aware of them?

Here you just decide how often you check for updates, and your web server pulls all changes accordingly.
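A minimal sketch of this pull approach in Python could look like the following. It is an illustration only: the function name `pull_updates`, the feed URL and the payload are assumptions, and the download is passed in as a function so that no particular transport or feed format is implied. The receiving side keeps a digest of what it saw last and stores a feed again only when its content has changed.

```python
import hashlib
from typing import Callable, Dict


def pull_updates(
    feeds: Dict[str, str],
    last_seen: Dict[str, str],
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Fetch every registered feed and return only those whose content changed.

    feeds     maps a publisher name to the URL the publisher told us about.
    last_seen maps a publisher name to the digest of the content stored last;
              it is updated in place.
    fetch     downloads one URL (injected, so the transport can be swapped).
    """
    changed = {}
    for publisher, url in feeds.items():
        content = fetch(url)
        digest = hashlib.sha256(content).hexdigest()
        if last_seen.get(publisher) != digest:
            last_seen[publisher] = digest
            changed[publisher] = content
    return changed


# Stand-in for a real download: a dict lookup (hypothetical URL and payload).
published = {"https://vocalexpress.example/feed": b"concert list, version 1"}
feeds = {"VocalExpress": "https://vocalexpress.example/feed"}
seen: Dict[str, str] = {}

first = pull_updates(feeds, seen, published.__getitem__)   # new feed: stored
second = pull_updates(feeds, seen, published.__getitem__)  # unchanged: nothing to do

# The publisher updates the data in its one place only ...
published["https://vocalexpress.example/feed"] = b"concert list, version 2"
# ... and the next scheduled pull picks the change up.
third = pull_updates(feeds, seen, published.__getitem__)
```

How often `pull_updates` runs is entirely up to the receiving party, for example once a day from a scheduled job.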

The challenge with distributed storage

While thinking the process through, and keeping in mind that seamlessly connecting several data objects that potentially come from different sources is the key challenge, I again compared it to the mechanics of the Internet as a highly distributed data pool.

I recognized that there is one central authority that does not follow the distributed approach: the Domain Name System (DNS), which administers the assignment of domain names to IP addresses (in the end: the computer the domain’s information is stored on).

If you request www.cnn.com, for example, the system checks in the register which IP address is linked to the domain name and advises the requester to retrieve the information from there.

Central Registration Authority needed

We need a central authority solely to ensure the uniqueness of information. This authority will be used to generate and provide a unique identifier for artist information (let’s call it AUID = Artist Unique IDentifier).
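As a rough sketch of what such an authority could do, here is a tiny in-memory registry in Python. Everything in it is an assumption made for illustration: the class name `AuidRegistry`, the use of random UUIDs as AUIDs and the idea of storing the feed location next to the identifier (similar to how a domain name points to an address). The actual artist data stays with the publisher.

```python
import uuid
from typing import Dict, Optional


class AuidRegistry:
    """Hypothetical central authority: issues AUIDs and remembers where
    the information for each AUID is published."""

    def __init__(self) -> None:
        self._feed_by_auid: Dict[str, str] = {}

    def register(self, feed_url: str) -> str:
        """Issue a new AUID for the artist whose data lives at feed_url."""
        auid = uuid.uuid4().hex
        # Random UUIDs practically never collide, but the registry is the one
        # place where uniqueness is guaranteed, so it checks anyway.
        while auid in self._feed_by_auid:
            auid = uuid.uuid4().hex
        self._feed_by_auid[auid] = feed_url
        return auid

    def resolve(self, auid: str) -> Optional[str]:
        """Return the feed location for an AUID, or None if it is unknown."""
        return self._feed_by_auid.get(auid)


registry = AuidRegistry()
# Two different groups that happen to share the name "VocalExpress"
# (hypothetical feed URLs).
auid_a = registry.register("https://vocalexpress.example/feed")
auid_b = registry.register("https://vocal-express-two.example/feed")
```

The name of the group plays no role here; two groups with the same name simply get two different AUIDs.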

Why does it make sense? Imagine the data of a concert by a group called VocalExpress is passed several times from one platform to another. Unfortunately, it ends up at a place where a group of that name is already registered.

How do I know if it is the same group or just one with the same name?

With the AUID attached it is easy to find out. If it matches the AUID of the group I already have in my data, I can simply link the information together. If it does not, I can either refuse the record or automatically request the matching group’s information via the central authority, add that group to my database and link the concert to it.
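The decision described above can be sketched in a few lines of Python. The function name `link_concert`, the record fields (`group_auid`, `title`, `name`) and the sample AUIDs are assumptions for illustration; `lookup_group` stands in for the request to the central authority.

```python
from typing import Callable, Dict, Optional


def link_concert(
    concert: dict,
    known_groups: Dict[str, dict],
    lookup_group: Callable[[str], Optional[dict]],
) -> str:
    """Attach an incoming concert to a group by AUID, never by name.

    Returns "linked", "imported" or "refused".
    """
    auid = concert["group_auid"]
    if auid in known_groups:
        # Same AUID as a group already in the database: just link.
        known_groups[auid].setdefault("concerts", []).append(concert["title"])
        return "linked"
    group = lookup_group(auid)  # ask the central authority
    if group is None:
        return "refused"
    # Unknown locally but known to the authority: add the group, then link.
    known_groups[auid] = {**group, "concerts": [concert["title"]]}
    return "imported"


# Local database already holds a group called VocalExpress (AUID "A1").
known = {"A1": {"name": "VocalExpress"}}
# The authority also knows a different group with the same name (AUID "B2").
authority = {"B2": {"name": "VocalExpress"}}

r1 = link_concert({"title": "Spring Concert", "group_auid": "B2"}, known, authority.get)
r2 = link_concert({"title": "Summer Concert", "group_auid": "A1"}, known, authority.get)
r3 = link_concert({"title": "Mystery Gig", "group_auid": "C3"}, known, authority.get)
```

After these calls the database holds two separate groups named VocalExpress, each with its own concert, and the record with the unknown AUID was refused.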

Next steps

I have decided to start with a proof of concept now because I want to get the first pieces of information flying as soon as possible, rather than wait until we have the 100% solution carved in stone. I will keep you posted.

Feel free to get in contact with me in case you want to comment, participate, help or just give some input to incorporate in this piece of work. You can reach me at volker@europeanvoices.org

Volker

P.S. Join the #ACAtech forum to participate in the discussion. You can download the complete article here: http://europeanvoices.org/filedepot_download/22/7

Introduction

I have been dealing with vocal music related data for quite a while now. It started back in 2002, when I wrote a web-based administration backend for CASA to maintain addresses, groups, memberships and the famous arrangement library, followed by a full integration of the www.casa.org website in 2004.

The main goal for me was to link as much information together as possible instead of keying it in multiple times.

In the following years I developed this approach further for my own website project (www.acappella-online.de), at first in my own PHP system, which in 2007 was replaced by a system based on Drupal.

The main challenges I faced every single time were the following:

1)    Which data do I link together, and how, to avoid redundant (and inconsistent) data input and generate maximum value for the reader?

2)    How do I encourage groups and agencies to provide their data for my records?

For 1) it was beneficial that I have been writing software for ages and know quite well how to normalize data to avoid redundancy. In addition, I checked other websites with such information and analysed the missing pieces, pitfalls and other circumstances that make it difficult to deal with the data.

Just two examples:

–       if the group names are mentioned just in the description of the event, you cannot generate a list of concerts of that group.

–       if the location is not given, you cannot search for events in your region

For 2) it is important to understand that groups already have a lot of places on the web they have to maintain. Think about the group’s website, Facebook, Twitter, MySpace, YouTube and more. Why the heck should they key it in another time on my page?

In the beginning I solved that problem in a way that needed a lot of individual programming: as most websites in those days were hand-coded and usually had simple HTML structures, I wrote site-grabbing robots that visited the concert schedule pages of the groups, extracted the event data and pushed it into my database.

As this approach was not maintainable with a growing number of group sites, I decided to stop it when I moved to my Drupal-driven site.

The remaining argument (coming back to my initial question) for why people should maintain data on my platform is that it has high visibility in the scene and on Google and other search engines.

Having read this you may ask: “And?? What’s the problem of this guy now???”

I can tell you what it is: in our global world of the Internet nothing is local or regional anymore. People from the US and Asia visit European vocal festivals (and vice versa), groups tour worldwide, fans search for information more globally, and much more. This means that the problem is not just how to convince people to key in data on the vocal music website of one (the home) country. In addition, it is potentially worthwhile to be present in the databases of organizations representing the continent as well as in global information stores for vocal music. (Honestly, this is not restricted to vocal music, but I personally will focus on it.)

Status Quo

At the moment I’m designing the database and information store structures for EVA (European Voices Association), a non-profit organization to foster vocal music in Europe – see www.europeanvoices.org – and I find myself once again facing my all-time favourite data exchange problems as outlined above.

After thinking about it for a while I decided on a three-phase approach to nail it down:

1)    Write down the generic concept of a global data exchange format for vocal music data based on the analysis of needs in the user base

2)    Publish the concept and discuss it with organizations and website owners worldwide and finally decide on a format.

3)    Implement it for EVA and potentially write data connectors/tools to be used by other website owners

Kicking Off Phase 1

What you usually do first when you start a software design is a stakeholder analysis to find out who is potentially interested in the outcome of your project. Once you are clear about your stakeholders, you get a much better view of the different angles from which the results will be looked at later, and it helps a lot during requirements definition.

Let’s have a look at every single stakeholder to identify the expectations and derive the needs for the concept. I have focussed on major interests only, to keep the list from getting too long and too diverse.

–       The Artist/Group

  • Wants to spread the word about its information and activities
  • Wants to find teachers and business partners

–       The Fan

  • Wants to get information about his favourite group
  • Wants to get information about activities in his area
  • Wants to get information about activities and groups when travelling
  • Wants to get information about organizations and festivals

–       The Teacher

  • Wants to spread the word about his offerings, skills and activities
  • Wants to get information about artists interested in his offering

–       The Company

  • Wants to spread the word about its offerings
  • Wants to get information about artists interested in its offering
  • Wants to get information about events to support, take part in or offer services for

–       The Organization

  • Wants to provide as complete a view as possible of its core interests in the area it covers
  • Wants to provide information beneficial to other stakeholders

–       The Media

  • Wants to get information about interesting activities to publish

If you drill those needs down you can see that the basics can be categorized into 2 types of information: Address and Calendar

Item | Address | Calendar
Group Information | x | -
Group Event | x | x
Teacher Information | x | -
Workshop | x | x
Booking Agency | x | -
Event/Group Management | x | -
Recording Studio | x | -
Equipment vendor | x | -
Festival | x | x
Organization | x | -
School | x | -
Media Appearance | - | x

Only the area of media (CDs/DVDs/MP3/…) is not represented in one of the two main buckets. In addition I have to say that, for sure, the group information, for example, consists of much more than the address, but the address is an important part.

If we look at the identified data blocks at first glance, it seems quite simple and straightforward: we can put them all in an address database and a calendar and we are done. But this holds at first glance only. If you dig a bit deeper you can identify the third important component needed: linking information.

What good is it to have the group information if it is not linked to the concerts of the group? Would I have to key in the group information for every single event? What good is it to have a festival listed without being able to list the concerts and workshops during that festival? Linking information is the key!

Below is an exemplary list of links between data objects:

Object | Link | Object
Artist/Group | Represented by | Booking Agent
Artist/Group | Represented by | Management
Concert | Performed by | Artist/Group
Concert | Happens at | Festival
Workshop | Held by | Teacher
Workshop | Happens at | Festival
CD/DVD | Recorded by | Artist/Group
Web Link | Shows | Artist/Group
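To make the value of such links concrete, the list above can be written down as plain (object, link, object) triples. The following Python sketch is only an illustration: the `Link` type, the helper `sources_of` and the identifiers such as `concert:1` are made up for this example and are not part of any agreed format.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    source: str    # identifier of the object the link starts from
    relation: str  # e.g. "performed by", "happens at"
    target: str    # identifier of the object the link points to


# Hypothetical sample data following the link list above.
links = [
    Link("concert:1", "performed by", "group:vocalexpress"),
    Link("concert:1", "happens at", "festival:1"),
    Link("workshop:1", "held by", "teacher:1"),
    Link("workshop:1", "happens at", "festival:1"),
    Link("concert:2", "performed by", "group:vocalexpress"),
]


def sources_of(all_links: List[Link], relation: str, target: str) -> List[str]:
    """Return every object that points to `target` via `relation`."""
    return [l.source for l in all_links if l.relation == relation and l.target == target]


# All concerts of one group, without the group data being keyed in per event.
concerts_of_group = sources_of(links, "performed by", "group:vocalexpress")
# Everything that takes place during one festival.
festival_programme = sources_of(links, "happens at", "festival:1")
```

Both questions raised above (the concerts of a group, the programme of a festival) become a simple filter once the links exist as data instead of being buried in a description text.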

With this information I’m able to design the basic data model with all the information to be exchanged, BUT:

The creation and transport of data is not the biggest challenge!

The biggest challenge in that environment is the definition of uniqueness and the so called “leading system”.

To illustrate the problem imagine the following situation:

A group XYZ is added to a database in the US. At the same time a group XYZ is added to a database in Europe. What defines the valid entry? Is that the same group or two different groups? In case it is the same group, what do we do if changes were made in both places? Which change is the correct one? In case they are two different groups, how do we make sure that the events attached to the US group XYZ are not moved to the European XYZ when a data transfer of concerts between both databases takes place?

Honestly, I haven’t found the silver bullet yet. There are multiple possibilities I will think through during the further development of the concept. Just as a heads-up, here are a few possible approaches:

–       One big global central database acting as the master and everything is added and edited there and just the output will be provided to the local databases.

–       Globally defined unique identifiers per database instance to identify the source of information and to make sure to update the right record

–       Clear matching criteria to identify duplicates. Clear process to handle those exceptions, like: no overwrite of duplicate and alert importing party about the duplicate.

–       Mark imported data (origin other than own data) as read only. Updates can only be made on the source side.

–       Enable the owner of information (e.g. the group) to decide what database holds their “master” of data.

–       Have one global database instance doing the importing, consolidating and data cleansing. This database is used as the communication master (uploads go to this DB only, downloads to local DBs come from this DB only)

–       The group(s) maintain the data on their own website, are able to provide the data in the exchange format (RSS newsfeed-like) and register this feed with all other databases (CASA, EVA, Festivals, … ) where they want this data to be present
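Several of these approaches can be combined. The Python sketch below illustrates three of them under stated assumptions: record keys that carry the database instance as a prefix (such as `eu:1`), a deliberately naive matching criterion (same name, ignoring case and spaces) and imported records that are read-only on the importing side. All names, keys and sample cities are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

LOCAL = "local"


@dataclass
class GroupRecord:
    name: str
    city: str
    origin: str = LOCAL  # database instance the record was first entered in


def normalize(name: str) -> str:
    return "".join(name.lower().split())


def is_possible_duplicate(a: GroupRecord, b: GroupRecord) -> bool:
    """Hypothetical matching criterion: same name, ignoring case and spaces."""
    return normalize(a.name) == normalize(b.name)


def import_group(db: Dict[str, GroupRecord], key: str,
                 incoming: GroupRecord, alerts: List[str]) -> bool:
    """Import a record; on a possible duplicate, do not overwrite, just alert."""
    for existing_key, existing in db.items():
        if existing_key != key and is_possible_duplicate(existing, incoming):
            alerts.append(f"possible duplicate: {key} vs {existing_key}")
            return False
    db[key] = incoming
    return True


def update_city(db: Dict[str, GroupRecord], key: str, city: str) -> None:
    """Imported data is read-only here; updates belong on the source side."""
    record = db[key]
    if record.origin != LOCAL:
        raise PermissionError(f"{key} is owned by {record.origin}; edit it there")
    record.city = city


db = {"eu:1": GroupRecord("XYZ", "Berlin")}
alerts: List[str] = []
accepted_duplicate = import_group(db, "us:7", GroupRecord("xyz", "Boston", origin="us"), alerts)
accepted_new = import_group(db, "us:8", GroupRecord("Other Group", "Austin", origin="us"), alerts)

update_city(db, "eu:1", "Hamburg")  # own record: allowed
try:
    update_city(db, "us:8", "Dallas")  # imported record: refused
    edit_refused = False
except PermissionError:
    edit_refused = True
```

Note that name matching alone cannot decide whether the two XYZ entries are the same group; it can only raise the alert so that a person, or a shared identifier, settles the question.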

Ok – let’s make it happen. This is definitely not a one-man show. I see it as an effort of the community and I invite everybody interested to become a part of this journey, be it as a contributor of fields needed in the database or objects of interest I forgot to include, or as a potential stakeholder I missed who can explain what you see as your explicit needs and benefits. I’m open to any comment, help and support to make this happen.

I will now start writing down the object structures in detail as the basis for further discussion.

To summarize the advantages and benefits if we get this off the ground:

  • Maintain once, but be present in multiple places. Reduces a lot of effort for groups, artists, teachers, website owners
  • Exchange of information and datamining for the big organizations and websites. With this concept it is possible to generate the most complete data pool EVER for vocal music.
  • Useful information filtering or searching/finding as all data using this format have a guaranteed data quality and content. All interested people can search their area, their country or region.
  • Potentially also better usability across the participating platforms, as with a similar data structure the access to the data is most likely comparable too.

All in all there are tons of benefits for all of us in the vocal music scene, and it will allow us to reach a level of service for the community we never had before (and maybe no other art form has).

Feel free to get in contact with me in case you want to comment, participate, help or just give some input to incorporate in this piece of work. You can reach me at volker@europeanvoices.org

Volker